Infrastructure & DevOps · Beginner

Also known as: requests per second, RPS

Throughput

The number of requests or operations a system can process per unit of time, a key performance metric for scraping and search APIs.

What Is Throughput?

Throughput is the volume of work a system successfully completes in a given period — most commonly expressed as requests per second (RPS) or jobs per minute. It measures the productive capacity of an API or pipeline, as opposed to latency, which measures how fast a single unit of work completes.

Think of it like a highway: latency is how long it takes one car to drive from A to B, while throughput is how many cars per hour pass through the toll booth.

Throughput vs. Latency

Optimizing for one of these metrics often comes at the expense of the other:

  • High concurrency increases throughput but can increase per-request latency as resources are shared.
  • Serial processing keeps latency low but limits throughput.
  • Batching dramatically increases throughput for bulk operations at the cost of added latency per item.
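
The tradeoff can be sketched with back-of-the-envelope arithmetic. The 200 ms per-request latency here is an assumed figure for illustration, not a measured one:

```typescript
// Assumed per-request latency of 200 ms (illustrative, not measured).
const latencyMs = 200;

// Serial processing: one request at a time, so throughput is bounded by latency.
const serialRps = 1000 / latencyMs; // 5 RPS

// With 10-way concurrency, throughput scales roughly linearly until shared
// resources (CPU, sockets, rate limits) start pushing per-request latency up.
const concurrency = 10;
const concurrentRps = serialRps * concurrency; // 50 RPS, ignoring contention

console.log(serialRps, concurrentRps); // → 5 50
```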

For web scraping and knowledge extraction workloads, throughput is often the primary concern: you want to process as many URLs as possible in the shortest time, and individual request latency is secondary.

Measuring Throughput

Throughput is straightforward to measure:

RPS = total_requests / elapsed_seconds

For a batch extraction job that processes 1,000 URLs over 10 minutes:

RPS = 1000 / 600 = 1.67 requests per second

In practice, measure throughput under sustained load — not just for a short burst — because systems often perform well initially but degrade as caches fill, queues back up, or connection pools are exhausted.
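
A minimal sketch of sustained-throughput measurement (the `ThroughputMeter` name and shape are illustrative, not part of any SDK):

```typescript
// Counts completed requests and reports RPS over the full elapsed window,
// not just the most recent burst.
class ThroughputMeter {
  private completed = 0;
  constructor(private readonly startedAt: number = Date.now()) {}

  record(): void {
    this.completed += 1;
  }

  // Timestamps in milliseconds; injectable for deterministic measurement.
  rps(now: number = Date.now()): number {
    const elapsedSeconds = (now - this.startedAt) / 1000;
    return elapsedSeconds > 0 ? this.completed / elapsedSeconds : 0;
  }
}

// The worked example from the text: 1,000 URLs over 10 minutes (600,000 ms).
const meter = new ThroughputMeter(0);
for (let i = 0; i < 1000; i++) meter.record();
console.log(meter.rps(600_000).toFixed(2)); // → "1.67"
```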

Throughput Limits in KnowledgeSDK

KnowledgeSDK enforces rate limits per API key (knowledgesdk_live_*) using a token bucket algorithm. Your plan determines:

  • Burst capacity — how many requests you can fire in rapid succession.
  • Sustained rate — the long-term average the system will allow.
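
A minimal token bucket sketch shows how these two numbers interact; this is illustrative TypeScript, not KnowledgeSDK's actual server-side implementation:

```typescript
// Token bucket: `capacity` models burst capacity, `refillRate` (tokens per
// second) models the sustained average rate.
class TokenBucket {
  private tokens: number;

  constructor(
    private readonly capacity: number,
    private readonly refillRate: number,
    private lastRefill: number = Date.now()
  ) {
    this.tokens = capacity; // start full, so a fresh key can burst immediately
  }

  // Returns true if the request is allowed; timestamps in ms, injectable
  // so behavior is deterministic.
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const bucket = new TokenBucket(5, 1, 0); // burst of 5, 1 request/second sustained
const results = Array.from({ length: 6 }, () => bucket.tryConsume(0));
console.log(results); // → [ true, true, true, true, true, false ]
```

The first five requests drain the burst capacity instantly; further requests only succeed as refill tokens accrue at the sustained rate.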

For high-throughput bulk extraction, the recommended pattern is:

  1. Call POST /v1/extract/async for each URL — this returns a jobId immediately.
  2. KnowledgeSDK processes jobs in the background using worker queues.
  3. Receive results via the callbackUrl webhook rather than polling.

This approach decouples your submission throughput from the extraction processing rate, letting you queue thousands of URLs quickly without waiting for each one to complete.
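
The receiving end of step 3 can be sketched with Node's built-in `http` module. The payload shape (`jobId`, `status`, `result`) is an assumption for illustration; check the KnowledgeSDK webhook docs for the actual fields:

```typescript
import { createServer } from "node:http";

// Assumed webhook payload shape; verify against the actual webhook docs.
interface WebhookPayload {
  jobId: string;
  status: "completed" | "failed";
  result?: unknown;
}

function parseWebhookPayload(body: string): WebhookPayload {
  const payload = JSON.parse(body) as WebhookPayload;
  if (!payload.jobId) throw new Error("webhook payload missing jobId");
  return payload;
}

const server = createServer((req, res) => {
  if (req.method === "POST" && req.url === "/webhooks/knowledge") {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      const { jobId, status } = parseWebhookPayload(body);
      console.log(`Job ${jobId}: ${status}`);
      res.writeHead(200).end(); // acknowledge fast; do heavy work off the request path
    });
  } else {
    res.writeHead(404).end();
  }
});

// Port 0 picks any free port here; a real deployment binds the fixed port
// behind your public callbackUrl.
server.listen(0);
```

Responding 200 immediately and processing the result asynchronously keeps the webhook endpoint itself from becoming a throughput bottleneck.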

Factors That Affect Throughput

  • Concurrency — how many parallel requests your client sends simultaneously.
  • Connection reuse — HTTP/2 multiplexing and persistent connections eliminate per-request setup overhead.
  • Payload size — larger request and response bodies consume more bandwidth and serialization time.
  • Server-side queuing — background job workers increase throughput by parallelizing work across multiple processes.
  • Caching — repeated requests for the same URL (e.g., the same domain's sitemap) hit cache and complete faster, increasing effective throughput.
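
The caching factor can be sketched with a promise-memoizing map (`cachedFetch` and `fetcher` are illustrative names, not SDK APIs):

```typescript
// Memoize fetches by URL so repeated requests (e.g. the same domain's sitemap)
// skip the network entirely, raising effective throughput.
const cache = new Map<string, Promise<string>>();

async function cachedFetch(
  url: string,
  fetcher: (u: string) => Promise<string>
): Promise<string> {
  let pending = cache.get(url);
  if (!pending) {
    // Caching the promise (not the resolved value) also dedupes concurrent
    // requests for the same URL while the first is still in flight.
    pending = fetcher(url);
    cache.set(url, pending);
  }
  return pending;
}
```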

Optimizing Client-Side Throughput

import PQueue from "p-queue";

const queue = new PQueue({ concurrency: 10 }); // 10 parallel requests

const urls = ["https://example.com", "https://another.com" /* ...more */];

const jobs = urls.map((url) =>
  queue.add(async () => {
    const res = await fetch("https://api.knowledgesdk.com/v1/extract/async", {
      method: "POST",
      headers: {
        "x-api-key": process.env.KNOWLEDGE_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url, callbackUrl: "https://yourapp.com/webhooks/knowledge" }),
    });
    return res.json();
  })
);

const results = await Promise.all(jobs);
console.log(`Submitted ${results.length} jobs`);

Tune concurrency to stay within your plan's rate limits while maximizing submission speed.
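
p-queue can also enforce a sustained rate directly via its `interval` and `intervalCap` options. As a dependency-free alternative, a minimal pacing sketch (`paceSubmissions` is an illustrative helper, not an SDK function):

```typescript
// Space submissions so the send rate stays at or below `rps`.
// `submit` stands in for the fetch call shown above.
async function paceSubmissions<T>(
  items: T[],
  rps: number,
  submit: (item: T) => Promise<unknown>
): Promise<void> {
  const gapMs = 1000 / rps;
  for (const item of items) {
    void submit(item); // fire-and-forget: pacing bounds the send rate, not completion
    await new Promise((resolve) => setTimeout(resolve, gapMs));
  }
}

// e.g. paceSubmissions(urls, 5, submitUrl) keeps submissions at roughly 5 RPS
```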

Related Terms

  • Latency — The time delay between sending an API request and receiving the response, a critical metric for real-time AI applications.
  • Rate Limiting — A control mechanism that restricts how many API requests a client can make within a given time window.
  • Token Bucket — A rate limiting algorithm that allows bursts of traffic up to a bucket capacity while enforcing a sustained average request rate.