Rate Limiting

A control mechanism that restricts how many API requests a client can make within a given time window.

What Is Rate Limiting?

Rate limiting is a server-side policy that caps how many requests a client can send to an API within a defined time window. Once a client exceeds its quota, the server rejects additional requests — typically with an HTTP 429 Too Many Requests response — until the window resets.

Rate limiting protects API infrastructure from overload, prevents abuse, and ensures fair resource distribution across all users.

Why APIs Implement Rate Limits

Stability — unbounded request volume can exhaust CPU, memory, or downstream service connections.
Cost control — web scraping, AI inference, and vector search are expensive operations. Limits prevent runaway costs.
Fairness — without limits, a single heavy user can degrade performance for everyone else on the same plan.
Security — rate limits slow down credential-stuffing attacks and denial-of-service attempts.

KnowledgeSDK enforces rate limits per API key (knowledgesdk_live_*), so each tenant's request budget is tracked independently.

Common Rate Limiting Strategies

Fixed Window

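The simplest approach: count requests in a fixed time slot (e.g., 0:00–0:59). The counter resets at the top of each minute. Easy to implement, but it allows bursts at window boundaries: a client can spend its full quota in the last seconds of one window and again in the first seconds of the next, sending up to twice the limit in a short span.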
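
To make the counter reset concrete, here is a minimal in-memory sketch of a fixed-window limiter. The class name, the per-key map, and the injected `nowMs` clock are assumptions made for illustration; a production limiter would usually keep its counters in a shared store.

```typescript
// Hypothetical fixed-window limiter (illustrative only, single process, in memory).
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counts = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request for `key` is allowed at time `nowMs`.
  allow(key: string, nowMs: number = Date.now()): boolean {
    // Align the window to a multiple of windowMs (e.g. the top of each minute).
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    let entry = this.counts.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // A new window has started: the counter resets.
      entry = { windowStart, count: 0 };
      this.counts.set(key, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count++;
    return true;
  }
}
```

With a limit of 2 per minute, two requests at 0:59 and two more at 1:00 are all accepted, which is the boundary burst in action.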

Sliding Window

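A rolling window that looks back exactly N seconds from the current moment. Smoother than fixed window, but in its exact form it requires more memory, because the server has to remember when each recent request arrived rather than keep a single counter.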
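
One way to implement a rolling window is to keep a log of timestamps per client. The sketch below assumes that approach; the class name and the injected `nowMs` clock are hypothetical, and the per-request timestamp array is where the extra memory goes.

```typescript
// Hypothetical sliding-window (timestamp log) limiter, in memory.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly log = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request for `key` is allowed at time `nowMs`.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only the timestamps that still fall inside the look-back window.
    const recent = (this.log.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs); // only accepted requests count against the quota
    this.log.set(key, recent);
    return allowed;
  }
}
```

Unlike a fixed window, a client that sends its quota at 0:59 is still limited at 1:00, because those requests remain inside the 60-second look-back.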

Token Bucket

Tokens accumulate in a bucket at a steady rate. Each request consumes one token. Clients can burst up to the bucket's capacity before being throttled. This is the algorithm KnowledgeSDK uses internally.
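
The refill-and-consume cycle can be sketched as follows. This is a generic illustration under assumed names and parameters (`capacity`, `refillPerSec`, an injected `nowMs` clock), not KnowledgeSDK's actual implementation.

```typescript
// Hypothetical token bucket: refills lazily whenever a request arrives.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full so a new client can burst immediately
    this.lastRefillMs = nowMs;
  }

  // Returns true and spends one token if a token is available at `nowMs`.
  tryConsume(nowMs: number = Date.now()): boolean {
    // Add the tokens earned since the last call, capped at the bucket's capacity.
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

A bucket with capacity 3 and a refill rate of 1 token per second accepts a burst of three requests, rejects the fourth, and accepts one more a second later.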

Leaky Bucket

Requests enter a queue (the bucket) and are processed at a fixed rate. Requests that arrive while the queue is full overflow and are dropped. Produces very smooth output, but adds latency because queued requests wait their turn.
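
A minimal sketch of the queue-based variant described here, assuming a bounded in-memory queue that some external timer drains. The class and method names (`LeakyBucket`, `offer`, `leak`) are hypothetical.

```typescript
// Hypothetical leaky bucket: a bounded FIFO queue drained at a fixed rate.
class LeakyBucket<T> {
  private readonly capacity: number;
  private readonly queue: T[] = [];

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Returns false when the bucket is full and the request is dropped.
  offer(request: T): boolean {
    if (this.queue.length >= this.capacity) return false;
    this.queue.push(request);
    return true;
  }

  // Releases the oldest queued request, or undefined if the bucket is empty.
  // The caller decides the drain rate by how often it calls this.
  leak(): T | undefined {
    return this.queue.shift();
  }
}
```

Calling `leak()` from a timer, for example once every 100 ms via `setInterval`, gives a steady output of at most 10 requests per second regardless of how bursty the input is.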

Reading Rate Limit Headers

Well-designed APIs communicate limits through response headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 43
X-RateLimit-Reset: 1711929600
Retry-After: 30

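Retry-After tells the client how many seconds to wait before retrying, and X-RateLimit-Reset gives the time at which the quota resets (a Unix timestamp in seconds in the example above). Wait at least that long before retrying, and fall back to exponential backoff when neither header is present, rather than hammering the API.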
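
As a sketch of how a client might turn these headers into a wait time, the helper below checks Retry-After first and X-RateLimit-Reset second. The function name and the `HeaderSource` type are hypothetical, and it assumes X-RateLimit-Reset is a Unix timestamp in seconds, as in the sample headers; some APIs use other formats. Retry-After can be either a number of seconds or an HTTP date, so both forms are handled.

```typescript
// Minimal shape shared by fetch's Headers object and simple test doubles.
type HeaderSource = { get(name: string): string | null };

// Returns how many milliseconds to wait, or null if no usable header is present.
function waitMsFromHeaders(headers: HeaderSource, nowMs: number = Date.now()): number | null {
  const retryAfter = headers.get("Retry-After");
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
    // Not a number: try to read it as an HTTP date.
    const dateMs = Date.parse(retryAfter);
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  }

  // Assumption: the reset header holds a Unix timestamp in seconds.
  const reset = headers.get("X-RateLimit-Reset");
  if (reset !== null && reset.trim() !== "") {
    const resetSec = Number(reset);
    if (Number.isFinite(resetSec)) return Math.max(0, resetSec * 1000 - nowMs);
  }

  return null; // caller should fall back to exponential backoff
}
```

A caller can pass `res.headers` from a fetch response directly and use its own backoff schedule when the result is null.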

Handling 429 Errors in Your Code

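async function extractWithRetry(url: string, apiKey: string) {
  const maxAttempts = 5;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch("https://api.knowledgesdk.com/v1/extract", {
      method: "POST",
      headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ url }),
    });

    if (res.status === 429) {
      // Don't sleep after the final attempt; give up immediately.
      if (attempt === maxAttempts - 1) break;
      // Retry-After is normally a number of seconds. Fall back to 10 s
      // when the header is missing or cannot be parsed as a positive number.
      const header = res.headers.get("Retry-After");
      const parsed = header === null ? NaN : Number(header);
      const retryAfter = Number.isFinite(parsed) && parsed > 0 ? parsed : 10;
      // Double the wait on each repeated 429, never waiting less than the server asked.
      await new Promise((r) => setTimeout(r, retryAfter * 1000 * 2 ** attempt));
      continue;
    }

    // Surface other failures instead of parsing an error body as a result.
    if (!res.ok) throw new Error(`Extract request failed with HTTP ${res.status}`);
    return res.json();
  }
  throw new Error("Rate limit retries exhausted");
}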

Best Practices

Cache aggressively. If the same URL will be extracted multiple times, cache the result rather than re-calling the API.
Use async endpoints for bulk work. POST /v1/extract/async offloads processing to background jobs and reduces synchronous request pressure.
Spread requests over time. Add deliberate delays between batch operations instead of firing all requests simultaneously.
Monitor X-RateLimit-Remaining. Slow down proactively before hitting zero rather than reacting to 429 errors.
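
The last two practices can be combined: use the remaining budget to decide how long to pause between requests. The helper below is a minimal sketch of that idea. The function name is hypothetical, and it assumes X-RateLimit-Reset is a Unix timestamp in seconds.

```typescript
// Hypothetical pacing helper: spread the remaining quota evenly over the
// time left until the window resets.
function pacingDelayMs(remaining: number, resetEpochSec: number, nowMs: number = Date.now()): number {
  const msUntilReset = Math.max(0, resetEpochSec * 1000 - nowMs);
  // Budget already spent: wait for the reset instead of provoking a 429.
  if (remaining <= 0) return msUntilReset;
  // Otherwise leave an equal gap between each of the remaining requests.
  return Math.floor(msUntilReset / remaining);
}
```

In a batch loop, a client would read `X-RateLimit-Remaining` and `X-RateLimit-Reset` from each response, pass them to this helper, and sleep for the returned number of milliseconds before sending the next request.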