What Is Rate Limiting?
Rate limiting is a server-side policy that caps how many requests a client can send to an API within a defined time window. Once a client exceeds its quota, the server rejects additional requests — typically with an HTTP 429 Too Many Requests response — until the window resets.
Rate limiting protects API infrastructure from overload, prevents abuse, and ensures fair resource distribution across all users.
Why APIs Implement Rate Limits
- Stability — unbounded request volume can exhaust CPU, memory, or downstream service connections.
- Cost control — web scraping, AI inference, and vector search are expensive operations. Limits prevent runaway costs.
- Fairness — without limits, a single heavy user can degrade performance for everyone else on the same plan.
- Security — rate limits slow down credential-stuffing attacks and denial-of-service attempts.
KnowledgeSDK enforces rate limits per API key (knowledgesdk_live_*), so each tenant's request budget is tracked independently.
Common Rate Limiting Strategies
Fixed Window
The simplest approach: count requests in a fixed time slot (e.g., 0:00–0:59). The counter resets at the top of each minute. Easy to implement but can allow bursts at window boundaries.
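As an illustration, a fixed-window counter fits in a few lines. This is a hypothetical in-memory sketch, not KnowledgeSDK's implementation; the class and parameter names are invented for the example.

```typescript
// Fixed-window counter: allows `limit` requests per `windowMs` time slot.
// Illustrative sketch only -- a real deployment would track this per API key.
class FixedWindowLimiter {
  private count = 0;
  private windowIndex = 0;

  constructor(private limit: number, private windowMs: number) {}

  allow(now: number = Date.now()): boolean {
    const currentWindow = Math.floor(now / this.windowMs);
    if (currentWindow !== this.windowIndex) {
      // A new window has started: reset the counter.
      this.windowIndex = currentWindow;
      this.count = 0;
    }
    return ++this.count <= this.limit;
  }
}
```

The boundary-burst problem follows directly: a client can spend its full quota at the end of one window and again at the start of the next, doubling its effective short-term rate.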
Sliding Window
A rolling window that looks back exactly N seconds from the current moment. Smoother than fixed window but requires more memory.
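One common way to implement this is a sliding-window log: keep a timestamp per request and count only those inside the window. The sketch below is illustrative and shows where the extra memory cost comes from, since the log grows with request volume.

```typescript
// Sliding-window log: admits a request only if fewer than `limit` requests
// occurred in the last `windowMs` milliseconds. Hypothetical example.
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  allow(now: number): boolean {
    // Evict timestamps that have aged out of the rolling window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= now - this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```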
Token Bucket
Tokens accumulate in a bucket at a steady rate. Each request consumes one token. Clients can burst up to the bucket's capacity before being throttled. This is the algorithm KnowledgeSDK uses internally.
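A minimal token-bucket sketch looks like the following. The capacity and refill rate here are illustrative; KnowledgeSDK's actual bucket parameters are not documented in this section.

```typescript
// Token bucket: holds at most `capacity` tokens, refilled continuously at
// `refillPerSec` tokens per second. Each request consumes one token.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(private capacity: number, private refillPerSec: number, now = 0) {
    this.tokens = capacity; // start full, so clients can burst immediately
    this.lastRefill = now;
  }

  tryConsume(now: number): boolean {
    // Refill based on elapsed time, capped at the bucket's capacity.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

The capacity sets the maximum burst size, while the refill rate sets the sustained throughput, which is why this algorithm handles bursty clients gracefully.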
Leaky Bucket
Requests enter a queue (the bucket) and are processed at a fixed rate. Excess requests overflow and are dropped. Produces very smooth output but adds latency.
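The queue-and-drain behavior can be sketched as a bounded queue; a timer elsewhere would call `leak()` at the fixed processing rate. This is an invented example, not any library's API.

```typescript
// Leaky bucket: a bounded queue. Requests that arrive when the queue is
// full overflow and are dropped; a fixed-rate timer drains the queue.
class LeakyBucket<T> {
  private queue: T[] = [];

  constructor(private capacity: number) {}

  // Enqueue a request; returns false when the bucket overflows.
  offer(item: T): boolean {
    if (this.queue.length >= this.capacity) return false;
    this.queue.push(item);
    return true;
  }

  // Called on a fixed interval to process the oldest queued request.
  leak(): T | undefined {
    return this.queue.shift();
  }
}
```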
Reading Rate Limit Headers
Well-designed APIs communicate limits through response headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 43
X-RateLimit-Reset: 1711929600
Retry-After: 30
Use Retry-After or X-RateLimit-Reset to implement exponential backoff in your client rather than hammering the API.
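A small helper can turn those headers into a wait time. This sketch assumes the `X-RateLimit-*` convention shown above (header names vary between APIs) and that `X-RateLimit-Reset` is a Unix timestamp in seconds.

```typescript
// Derive a wait duration (ms) from rate-limit response headers.
// Prefers Retry-After; falls back to X-RateLimit-Reset; defaults to 0.
function waitMsFromHeaders(headers: Headers, nowMs: number = Date.now()): number {
  const retryAfter = headers.get("Retry-After");
  if (retryAfter !== null) return Number(retryAfter) * 1000; // seconds -> ms

  const reset = headers.get("X-RateLimit-Reset");
  if (reset !== null) {
    // Assumed here to be a Unix timestamp in seconds.
    return Math.max(0, Number(reset) * 1000 - nowMs);
  }
  return 0; // no rate-limit info: no imposed wait
}
```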
Handling 429 Errors in Your Code
async function extractWithRetry(url: string, apiKey: string) {
  for (let attempt = 0; attempt < 5; attempt++) {
    const res = await fetch("https://api.knowledgesdk.com/v1/extract", {
      method: "POST",
      headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ url }),
    });
    if (res.status === 429) {
      // Honor Retry-After (in seconds), doubling the wait on each attempt.
      const retryAfter = Number(res.headers.get("Retry-After") ?? 10);
      await new Promise((r) => setTimeout(r, retryAfter * 1000 * 2 ** attempt));
      continue;
    }
    return res.json();
  }
  throw new Error("Rate limit retries exhausted");
}
Best Practices
- Cache aggressively. If the same URL will be extracted multiple times, cache the result rather than re-calling the API.
- Use async endpoints for bulk work. POST /v1/extract/async offloads processing to background jobs and reduces synchronous request pressure.
- Spread requests over time. Add deliberate delays between batch operations instead of firing all requests simultaneously.
- Monitor X-RateLimit-Remaining. Slow down proactively before hitting zero rather than reacting to 429 errors.
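The proactive slow-down can be reduced to a pure decision function: given the remaining budget and reset time from the last response, compute how long to pause before the next request. The threshold and names below are illustrative, not part of the KnowledgeSDK API.

```typescript
// Decide how long (ms) to pause before the next request, based on the
// X-RateLimit-Remaining and X-RateLimit-Reset values of the last response.
// `minRemaining` is an arbitrary safety threshold chosen for this example.
function proactiveDelayMs(
  remaining: number,
  resetUnixSec: number,
  nowMs: number,
  minRemaining = 5,
): number {
  if (remaining >= minRemaining) return 0; // plenty of budget left
  // Budget is nearly exhausted: wait until the window resets.
  return Math.max(0, resetUnixSec * 1000 - nowMs);
}
```

Calling this after each response and sleeping for the returned duration keeps a batch job under the limit without ever triggering a 429.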