What Is Throughput?
Throughput is the volume of work a system successfully completes in a given period — most commonly expressed as requests per second (RPS) or jobs per minute. It measures the productive capacity of an API or pipeline, as opposed to latency, which measures how fast a single unit of work completes.
Think of it like a highway: latency is how long it takes one car to drive from A to B, while throughput is how many cars per hour pass through the toll booth.
Throughput vs. Latency
These metrics often pull against each other under optimization pressure:
- High concurrency increases throughput but can increase per-request latency as resources are shared.
- Serial processing keeps latency low but limits throughput.
- Batching dramatically increases throughput for bulk operations at the cost of added latency per item.
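The tradeoffs above can be made concrete with a back-of-the-envelope model. The 100 ms service time below is an illustrative assumption, not a measured number:

```typescript
// Back-of-the-envelope: how concurrency trades latency for throughput.
// Assumes each request occupies a fixed 100 ms of service time.
const perRequestMs = 100;

function idealThroughput(concurrency: number): number {
  // With N requests in flight, up to N complete per 100 ms window.
  return concurrency * (1000 / perRequestMs); // requests per second
}

console.log(idealThroughput(1));  // 10 RPS: serial keeps latency low but caps throughput
console.log(idealThroughput(10)); // 100 RPS: parallelism raises throughput, though per-request latency may grow under contention
```

Real systems fall short of this ideal once shared resources saturate, which is exactly the latency cost the list above describes.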
For web scraping and knowledge extraction workloads, throughput is often the primary concern: you want to process as many URLs as possible in the shortest time, and individual request latency is secondary.
Measuring Throughput
Throughput is straightforward to measure:
RPS = total_requests / elapsed_seconds
For a batch extraction job that processes 1,000 URLs over 10 minutes:
RPS = 1000 / 600 = 1.67 requests per second
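The same arithmetic as a small helper:

```typescript
// Sustained throughput: total completed work divided by elapsed time.
function throughputRps(totalRequests: number, elapsedSeconds: number): number {
  return totalRequests / elapsedSeconds;
}

// The batch job above: 1,000 URLs over 10 minutes (600 s).
console.log(throughputRps(1000, 600).toFixed(2)); // "1.67"
```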
In practice, measure throughput under sustained load — not just for a short burst — because systems often perform well initially but degrade as caches fill, queues back up, or connection pools are exhausted.
Throughput Limits in KnowledgeSDK
KnowledgeSDK enforces rate limits per API key (knowledgesdk_live_*) using a token bucket algorithm. Your plan determines:
- Burst capacity — how many requests you can fire in rapid succession.
- Sustained rate — the long-term average the system will allow.
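A client-side token bucket can mirror this server-side limiter so your submissions stay within both dimensions. The sketch below is illustrative — the burst and rate values are placeholders, not your plan's actual limits:

```typescript
// Minimal client-side token bucket: holds at most `burst` tokens,
// refilled continuously at `ratePerSec`. Each request spends one token.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private burst: number, private ratePerSec: number) {
    this.tokens = burst;
  }

  tryAcquire(): boolean {
    const now = Date.now();
    // Refill proportionally to elapsed time, capped at burst capacity.
    this.tokens = Math.min(
      this.burst,
      this.tokens + ((now - this.lastRefill) / 1000) * this.ratePerSec
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Placeholder values: bursts of 20 requests, 5 requests/second sustained.
const bucket = new TokenBucket(20, 5);
```

When `tryAcquire()` returns false, back off briefly before retrying rather than hammering the API and collecting 429 responses.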
For high-throughput bulk extraction, the recommended pattern is:
- Call POST /v1/extract/async for each URL — this returns a jobId immediately.
- KnowledgeSDK processes jobs in the background using worker queues.
- Receive results via the callbackUrl webhook rather than polling.
This approach decouples your submission throughput from the extraction processing rate, letting you queue thousands of URLs quickly without waiting for each one to complete.
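On the receiving end, a webhook handler might look like the sketch below. The payload fields (jobId, status, result) are assumptions for illustration — consult the actual webhook schema:

```typescript
import http from "node:http";

// Assumed payload shape for an async extraction result — not the
// documented schema, just a plausible stand-in for this sketch.
interface ExtractionWebhook {
  jobId: string;
  status: "completed" | "failed";
  result?: unknown;
}

function handleWebhook(payload: ExtractionWebhook): string {
  if (payload.status === "completed") {
    // Persist or process payload.result here.
    return `stored ${payload.jobId}`;
  }
  return `job ${payload.jobId} failed`;
}

const server = http.createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const payload = JSON.parse(body) as ExtractionWebhook;
    handleWebhook(payload);
    // Acknowledge quickly so the sender stops retrying; do slow work after.
    res.writeHead(200);
    res.end("ok");
  });
});

// server.listen(3000); — start listening in your app
```

Returning 200 immediately and deferring heavy processing keeps your webhook endpoint from becoming its own throughput bottleneck.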
Factors That Affect Throughput
- Concurrency — how many parallel requests your client sends simultaneously.
- Connection reuse — HTTP/2 multiplexing and persistent connections eliminate per-request setup overhead.
- Payload size — larger request and response bodies consume more bandwidth and serialization time.
- Server-side queuing — background job workers add throughput by parallelizing work across multiple processes.
- Caching — repeated requests for the same URL (e.g., the same domain's sitemap) hit cache and complete faster, increasing effective throughput.
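The caching factor can also be approximated client-side. Below is a minimal sketch of a request-coalescing cache — it omits the TTLs and size bounds production code would need:

```typescript
type Fetcher = (url: string) => Promise<string>;

// Wraps any fetcher so repeated requests for the same URL (e.g. a
// domain's sitemap) share one in-flight promise instead of re-fetching.
function makeCachedFetch(fetchText: Fetcher): Fetcher {
  const cache = new Map<string, Promise<string>>();
  return (url) => {
    const hit = cache.get(url);
    if (hit) return hit; // duplicate request completes from cache
    const pending = fetchText(url);
    cache.set(url, pending);
    return pending;
  };
}

// Wire it to the platform fetch (Node 18+):
const cachedFetchText = makeCachedFetch((url) =>
  fetch(url).then((r) => r.text())
);
```

Because the cache stores the promise rather than the resolved body, concurrent requests for the same URL are coalesced into a single network call.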
Optimizing Client-Side Throughput
The example below uses p-queue to cap the client at 10 parallel submissions:

import PQueue from "p-queue";

const queue = new PQueue({ concurrency: 10 }); // 10 parallel requests
const urls = ["https://example.com", "https://another.com" /* ...more */];

const jobs = urls.map((url) =>
  queue.add(async () => {
    const res = await fetch("https://api.knowledgesdk.com/v1/extract/async", {
      method: "POST",
      headers: {
        "x-api-key": process.env.KNOWLEDGE_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        url,
        callbackUrl: "https://yourapp.com/webhooks/knowledge",
      }),
    });
    if (!res.ok) throw new Error(`Submission failed (${res.status}) for ${url}`);
    return res.json();
  })
);

const results = await Promise.all(jobs);
console.log(`Submitted ${results.length} jobs`);
Tune concurrency to stay within your plan's rate limits while maximizing submission speed.