
Also known as: token bucket algorithm

Token Bucket

A rate limiting algorithm that allows bursts of traffic up to a bucket capacity while enforcing a sustained average request rate.

What Is the Token Bucket Algorithm?

The token bucket is a rate limiting algorithm that models request capacity as a bucket filled with tokens. Each API request consumes one token. Tokens refill at a fixed rate over time. When the bucket is empty, further requests are rejected until tokens regenerate.

The key advantage over simpler algorithms is that it explicitly supports bursting: a client that has been idle accumulates tokens up to the bucket's maximum capacity, and can then spend that stockpile in a rapid burst before being throttled.

How It Works

Imagine a bucket with a capacity of 100 tokens. Tokens are added at a rate of 10 per second. When a request arrives:

  1. Check how many tokens have accumulated since the last request.
  2. If there is at least 1 token available, allow the request and deduct 1 token.
  3. If the bucket is empty, reject the request with HTTP 429 Too Many Requests.

Tracing this configuration over time:

bucket_capacity = 100 tokens
refill_rate     = 10 tokens/second

time=0s:  bucket = 100 (full)
          → send 100 requests instantly — all succeed, bucket = 0
time=1s:  bucket = 10 (refilled)
          → send 10 requests — all succeed, bucket = 0
time=2s:  bucket = 10
          → send 15 requests — 10 succeed, 5 rejected (429)
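
The timeline above can be replayed with a minimal in-memory bucket that takes an injectable clock, so the arithmetic is deterministic. This is an illustrative sketch, not KnowledgeSDK's implementation; the class and function names are hypothetical.

```typescript
// Minimal token bucket with an injectable clock (seconds), so the
// worked example above can be replayed deterministically.
class SimBucket {
  private tokens: number;
  private last: number;

  constructor(
    private capacity: number,
    private refillRate: number, // tokens per second
    private now: () => number   // clock in seconds
  ) {
    this.tokens = capacity;
    this.last = now();
  }

  tryConsume(): boolean {
    const t = this.now();
    // Refill for elapsed time, capped at capacity
    this.tokens = Math.min(this.capacity, this.tokens + (t - this.last) * this.refillRate);
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Replay the timeline: capacity 100, refill 10/s.
let clock = 0;
const bucket = new SimBucket(100, 10, () => clock);

const burst = (n: number) =>
  Array.from({ length: n }, () => bucket.tryConsume()).filter(Boolean).length;

console.log(burst(100)); // t=0s: all 100 succeed, bucket drained
clock = 1;
console.log(burst(10));  // t=1s: 10 tokens refilled, 10 succeed
clock = 2;
console.log(burst(15));  // t=2s: 10 succeed, 5 rejected
```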

Token Bucket vs. Other Algorithms

Algorithm        Burst Allowed          Complexity  Smoothness
---------------  ---------------------  ----------  ----------
Fixed Window     Yes (boundary spikes)  Low         Poor
Sliding Window   Limited                Medium      Good
Token Bucket     Yes (controlled)       Medium      Good
Leaky Bucket     No                     Medium      Excellent
  • Fixed window resets the counter at the start of each minute, allowing double the rate at window boundaries (end of minute N + start of minute N+1).
  • Leaky bucket enforces a perfectly smooth output rate — requests drip out at a fixed pace regardless of arrival pattern. No bursting is allowed.
  • Token bucket strikes the best balance for most APIs: it smooths out the sustained rate while accommodating legitimate bursts from idle clients.
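
The boundary-spike problem in the first bullet is easy to demonstrate. The sketch below is a hypothetical fixed-window limiter (10 requests per 60-second window, keyed by floor(t / 60)), not anything from the KnowledgeSDK API:

```typescript
// Hypothetical fixed-window limiter: 10 requests per 60-second window.
// The counter is keyed by the window index, so it resets at every boundary.
const LIMIT = 10;
const counts = new Map<number, number>();

function allow(t: number): boolean {
  const window = Math.floor(t / 60);
  const used = counts.get(window) ?? 0;
  if (used >= LIMIT) return false;
  counts.set(window, used + 1);
  return true;
}

// 10 requests at t=59s (end of window 0) and 10 more at t=60s
// (start of window 1) all pass: 20 requests in about one second,
// double the intended sustained rate.
let passed = 0;
for (let i = 0; i < 10; i++) if (allow(59)) passed++;
for (let i = 0; i < 10; i++) if (allow(60)) passed++;
console.log(passed); // 20
```

A token bucket with capacity 10 and refill 10/60s would have allowed the first 10 and throttled the second batch.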

Implementation in Redis

Token bucket state is typically stored in Redis for distributed systems, where multiple API server instances share the same rate limit per API key:

import Redis from "ioredis";

const redis = new Redis();

async function consumeToken(apiKeyId: string, capacity: number, refillRate: number): Promise<boolean> {
  const now = Date.now() / 1000; // seconds
  const key = `rate:${apiKeyId}`;

  // NOTE: this read-modify-write sequence is not atomic. Two server
  // instances can read the same state and both consume the last token.
  // Production systems typically wrap this logic in a Redis Lua script
  // (EVAL) so it executes atomically on the server.
  const [tokens, lastRefill] = await redis.hmget(key, "tokens", "last_refill");

  // First request for this key: start with a full bucket
  const storedTokens = parseFloat(tokens ?? String(capacity));
  const storedLastRefill = parseFloat(lastRefill ?? String(now));

  // Add tokens for the time elapsed since the last refill, capped at capacity
  const elapsed = now - storedLastRefill;
  const newTokens = Math.min(capacity, storedTokens + elapsed * refillRate);

  if (newTokens < 1) {
    // Not enough tokens: reject. State is intentionally not persisted here,
    // so tokens keep accruing from the last successful refill.
    return false;
  }

  // Consume one token and persist state (hset; hmset is deprecated)
  await redis.hset(key, {
    tokens: String(newTokens - 1),
    last_refill: String(now),
  });
  await redis.expire(key, 3600); // let idle keys expire after an hour

  return true;
}
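
The refill arithmetic is the part most worth unit-testing, and it can be factored into a pure function independent of Redis. A sketch, with illustrative names that are not part of the KnowledgeSDK API:

```typescript
// The refill step from the Redis example, as a pure function.
interface BucketState {
  tokens: number;
  lastRefill: number; // seconds
}

function refill(
  state: BucketState,
  now: number,
  capacity: number,
  refillRate: number
): BucketState {
  const elapsed = now - state.lastRefill;
  return {
    // Accrue elapsed * rate tokens, never exceeding capacity
    tokens: Math.min(capacity, state.tokens + elapsed * refillRate),
    lastRefill: now,
  };
}

// Drained bucket (capacity 100, refill 10/s): 10 tokens back after 1s,
// and capped at capacity after a long idle period.
console.log(refill({ tokens: 0, lastRefill: 0 }, 1, 100, 10).tokens);   // 10
console.log(refill({ tokens: 0, lastRefill: 0 }, 600, 100, 10).tokens); // 100
```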

How KnowledgeSDK Uses Token Buckets

KnowledgeSDK applies token bucket rate limiting per API key (knowledgesdk_live_*). Your plan tier determines:

  • Bucket capacity — the maximum burst size (e.g., 50 requests for Starter, 200 for Pro).
  • Refill rate — the sustained average (e.g., 5 requests/second for Starter).

This means if your application has been idle overnight and then kicks off a batch extraction job in the morning, it can submit a rapid burst of POST /v1/extract/async requests before the limiter engages — without needing artificial delays between calls.
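
Using the example Starter numbers above (capacity 50, refill 5/s), the time to push a batch of n requests through the limiter is roughly max(0, (n − capacity) / refillRate): the first 50 go through in the burst, the rest arrive at the refill rate. A quick sketch of that arithmetic (the plan numbers are the illustrative figures from the text, not guaranteed limits):

```typescript
// Approximate seconds for an idle client to complete a batch of n requests:
// the first `capacity` requests fit in the burst, the remainder wait for refill.
function batchSeconds(n: number, capacity: number, refillRate: number): number {
  return Math.max(0, (n - capacity) / refillRate);
}

console.log(batchSeconds(50, 50, 5));  // 0  (whole batch fits in the burst)
console.log(batchSeconds(300, 50, 5)); // 50 (250 extra requests at 5/s)
```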

Client-Side Token Bucket

You can implement a client-side token bucket to self-throttle before hitting server limits:

class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillRate: number // tokens per ms
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  async consume(): Promise<void> {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;

    if (this.tokens < 1) {
      // Sleep until exactly one token has accrued, then consume it.
      const waitMs = (1 - this.tokens) / this.refillRate;
      await new Promise((r) => setTimeout(r, waitMs));
      // The token that accrued during the sleep is consumed by this call.
      // Advance lastRefill past the sleep, otherwise the next call would
      // credit the wait time a second time.
      this.tokens = 0;
      this.lastRefill = Date.now();
    } else {
      this.tokens -= 1;
    }
  }
}

// Allow bursts up to 20, sustained at 5 RPS
const bucket = new TokenBucket(20, 5 / 1000);

for (const url of urls) {
  await bucket.consume(); // self-throttle before each request
  submitExtractionJob(url);
}

This pattern prevents your client from triggering server-side 429 errors in the first place.

Related Terms

  • Rate Limiting: a control mechanism that restricts how many API requests a client can make within a given time window.
  • Throughput: the number of requests or operations a system can process per unit of time, a key performance metric for scraping and search APIs.
  • API Key: a secret token passed in HTTP headers or query parameters to authenticate requests to an API service.
  • Token, Tokenization
