What Is the Token Bucket Algorithm?
The token bucket is a rate limiting algorithm that models request capacity as a bucket filled with tokens. Each API request consumes one token. Tokens refill at a fixed rate over time. When the bucket is empty, further requests are rejected until tokens regenerate.
The key advantage over simpler algorithms is that it explicitly supports bursting: a client that has been idle accumulates tokens up to the bucket's maximum capacity, and can then spend that stockpile in a rapid burst before being throttled.
How It Works
Imagine a bucket with a capacity of 100 tokens. Tokens are added at a rate of 10 per second. When a request arrives:
- Check how many tokens have accumulated since the last request.
- If there is at least 1 token available, allow the request and deduct 1 token.
- If the bucket is empty, reject the request with HTTP 429 Too Many Requests.
```
bucket_capacity = 100 tokens
refill_rate     = 10 tokens/second

time=0s: bucket = 100 (full)
  → send 100 requests instantly — all succeed, bucket = 0
time=1s: bucket = 10 (refilled)
  → send 10 requests — all succeed, bucket = 0
time=2s: bucket = 10
  → send 15 requests — 10 succeed, 5 rejected (429)
```
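The trace above can be replayed with a minimal in-memory sketch. The clock is injected as a function so the timeline can be simulated deterministically; the class and variable names are illustrative, not part of any library:

```typescript
// Minimal in-memory token bucket used to replay the trace above.
class SimulatedBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillRate: number, // tokens per second
    private now: () => number   // injected clock, in seconds
  ) {
    this.tokens = capacity;
    this.lastRefill = now();
  }

  tryConsume(): boolean {
    const t = this.now();
    // Refill lazily based on elapsed time, capped at capacity
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (t - this.lastRefill) * this.refillRate
    );
    this.lastRefill = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Replay the trace: capacity 100, refill 10 tokens/second.
let clock = 0;
const simBucket = new SimulatedBucket(100, 10, () => clock);

const burst = (n: number) =>
  Array.from({ length: n }, () => simBucket.tryConsume()).filter(Boolean).length;

console.log(burst(100)); // t=0s: 100 requests → 100 succeed
clock = 1;
console.log(burst(10));  // t=1s: 10 requests → 10 succeed
clock = 2;
console.log(burst(15));  // t=2s: 15 requests → 10 succeed, 5 rejected
```

Note that refill happens lazily on each call, by multiplying elapsed time by the refill rate; no background timer is needed.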
Token Bucket vs. Other Algorithms
| Algorithm | Burst Allowed | Complexity | Smoothness |
|---|---|---|---|
| Fixed Window | Yes (boundary spikes) | Low | Poor |
| Sliding Window | Limited | Medium | Good |
| Token Bucket | Yes (controlled) | Medium | Good |
| Leaky Bucket | No | Medium | Excellent |
- Fixed window resets the counter at the start of each minute, allowing double the rate at window boundaries (end of minute N + start of minute N+1).
- Leaky bucket enforces a perfectly smooth output rate — requests drip out at a fixed pace regardless of arrival pattern. No bursting is allowed.
- Token bucket strikes the best balance for most APIs: it smooths out the sustained rate while accommodating legitimate bursts from idle clients.
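The fixed-window boundary spike is easy to demonstrate numerically. The sketch below (illustrative; the 10-requests-per-minute limit is an assumption for the demo) keys a counter by the window's start time and shows how two bursts 200 ms apart, straddling a minute boundary, both pass:

```typescript
// Fixed-window counter: each one-minute window gets a fresh counter.
// Resetting at the boundary allows up to 2x the limit in a short span.
const LIMIT = 10;          // requests per window (illustrative)
const WINDOW_MS = 60_000;  // one-minute windows
const counters = new Map<number, number>();

function fixedWindowAllow(nowMs: number): boolean {
  const windowStart = Math.floor(nowMs / WINDOW_MS) * WINDOW_MS;
  const count = counters.get(windowStart) ?? 0;
  if (count >= LIMIT) return false;
  counters.set(windowStart, count + 1);
  return true;
}

// 10 requests at t=59.9s (end of window N) and 10 more at t=60.1s
// (start of window N+1): all 20 are allowed within 200 ms.
let allowed = 0;
for (let i = 0; i < 10; i++) if (fixedWindowAllow(59_900)) allowed++;
for (let i = 0; i < 10; i++) if (fixedWindowAllow(60_100)) allowed++;
console.log(allowed); // → 20
```

A token bucket with the same sustained rate would cap that 200 ms span at roughly the bucket capacity, not double the window limit.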
Implementation in Redis
Token bucket state is typically stored in Redis for distributed systems, where multiple API server instances share the same rate limit per API key:
```typescript
import Redis from "ioredis";

const redis = new Redis();

// Note: this read-modify-write sequence is not atomic. Two server
// instances handling the same API key concurrently can both read the
// old state and over-admit. Production limiters typically move this
// logic into a Redis Lua script (EVAL) so it runs atomically.
async function consumeToken(
  apiKeyId: string,
  capacity: number,
  refillRate: number
): Promise<boolean> {
  const now = Date.now() / 1000; // seconds
  const key = `rate:${apiKeyId}`;

  const [tokens, lastRefill] = await redis.hmget(key, "tokens", "last_refill");
  const storedTokens = parseFloat(tokens ?? String(capacity));
  const storedLastRefill = parseFloat(lastRefill ?? String(now));

  // Calculate how many tokens have accumulated since the last refill
  const elapsed = now - storedLastRefill;
  const newTokens = Math.min(capacity, storedTokens + elapsed * refillRate);

  if (newTokens < 1) {
    // Not enough tokens — reject. State is not written back here; the
    // next call recomputes the refill from the stored timestamp.
    return false;
  }

  // Consume one token and persist state
  await redis.hmset(key, {
    tokens: String(newTokens - 1),
    last_refill: String(now),
  });
  await redis.expire(key, 3600); // evict idle keys after an hour

  return true;
}
```
How KnowledgeSDK Uses Token Buckets
KnowledgeSDK applies token bucket rate limiting per API key (knowledgesdk_live_*). Your plan tier determines:
- Bucket capacity — the maximum burst size (e.g., 50 requests for Starter, 200 for Pro).
- Refill rate — the sustained average (e.g., 5 requests/second for Starter).
This means if your application has been idle overnight and then kicks off a batch extraction job in the morning, it can submit a rapid burst of POST /v1/extract/async requests before the limiter engages — without needing artificial delays between calls.
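Two useful quantities follow directly from these parameters: how quickly a sustained overload drains a full bucket, and how long an idle client takes to earn back its full burst allowance. A quick calculation with the Starter-tier figures above (the 25 req/s send rate is an illustrative assumption):

```typescript
// Starter tier, figures from the plan description above.
const capacity = 50;   // tokens: maximum burst size
const refillRate = 5;  // tokens per second: sustained rate

// Time for an empty bucket to refill completely, i.e. how long a
// client must stay idle to regain its full burst allowance.
const fullRefillSeconds = capacity / refillRate;
console.log(fullRefillSeconds); // → 10

// A client sending at a constant rate above refillRate drains a full
// bucket in capacity / (sendRate - refillRate) seconds.
const sendRate = 25; // requests per second (illustrative)
const drainSeconds = capacity / (sendRate - refillRate);
console.log(drainSeconds); // → 2.5
```

After the bucket drains, the client is held to the refill rate (5 req/s on Starter) until it idles long enough to accumulate tokens again.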
Client-Side Token Bucket
You can implement a client-side token bucket to self-throttle before hitting server limits:
```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillRate: number // tokens per ms
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  async consume(): Promise<void> {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;

    if (this.tokens < 1) {
      // Sleep until exactly one token has accumulated, then spend it.
      const waitMs = (1 - this.tokens) / this.refillRate;
      await new Promise((r) => setTimeout(r, waitMs));
      // Advance lastRefill past the sleep so the wait time is not
      // counted again as refill on the next call.
      this.lastRefill = Date.now();
      this.tokens = 0;
    } else {
      this.tokens -= 1;
    }
  }
}

// Allow bursts up to 20, sustained at 5 RPS
const bucket = new TokenBucket(20, 5 / 1000);

for (const url of urls) {
  await bucket.consume(); // self-throttle before each request
  submitExtractionJob(url); // your application's request function
}
```
This pattern prevents your client from triggering server-side 429 errors in the first place.