technical · March 19, 2026 · 12 min read

Web Scraping Rate Limiting: Production Best Practices for 2026

Learn why rate limiting is critical for production web scraping, with strategies for request queues, exponential backoff, and distributed rate limiting.

Rate limiting is one of those topics that every web scraping engineer eventually learns the hard way. You write a fast scraper, hit a site with 50 concurrent requests, and wake up the next morning to find your IP banned, your data pipeline dead, and an angry email from your CTO.

This guide covers everything you need to know about rate limiting for production web scraping in 2026: why it matters, how to implement it correctly, and how to avoid the pitfalls that trip up even experienced engineers.

Why Rate Limiting Matters

Ethics and Respect for Server Resources

Servers have finite capacity. When your scraper sends hundreds of requests per second to a site, you're consuming CPU, bandwidth, and memory that was provisioned for real users. At extreme levels, aggressive scrapers constitute a denial-of-service attack — even if that was never your intent.

This isn't just a legal concern (though it can be — see the CFAA). It's an engineering ethics issue. The websites you scrape are often built and maintained by small teams with limited infrastructure budgets. A runaway scraper can cause real harm.

Respectful scraping means treating a web server the way you'd want your own servers treated: send requests at a sustainable pace, cache aggressively, and don't hammer endpoints that clearly can't handle the load.

Getting Blocked Is Expensive

From a purely selfish engineering perspective, ignoring rate limits is a false economy. The time you save by scraping fast is dwarfed by the time you spend:

  • Debugging why 40% of your responses are 429s or CAPTCHAs
  • Rotating proxies and managing IP pools
  • Rewriting your scraper after your IP block gets burned
  • Rebuilding missing data after an incomplete run

A well-rate-limited scraper that runs reliably is worth far more than a fast scraper that breaks constantly.

Detection Fingerprinting

Modern anti-bot systems don't just look at request rate. They look at patterns. A scraper that sends exactly one request every 500ms is actually easier to detect than one with natural variance. Rate limiting done right includes jitter — randomized delays that approximate human browsing behavior.
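As a minimal sketch, a jittered scheduler draws each wait from a window around the target interval instead of using a fixed tick (the 500ms base and ±50% spread here are illustrative, not tuned values):

```typescript
// Jittered delay: instead of a fixed 500ms tick, draw each wait
// uniformly from [base * (1 - spread), base * (1 + spread)].
function jitteredDelayMs(baseMs: number, spread = 0.5): number {
  const min = baseMs * (1 - spread);
  const max = baseMs * (1 + spread);
  return min + Math.random() * (max - min);
}

// Every wait is different, so no fixed cadence emerges
const delays = Array.from({ length: 5 }, () => jitteredDelayMs(500));
console.log(delays.map(d => Math.round(d)));
```

Anti-bot systems looking for metronome-regular traffic see nothing periodic to latch onto.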

The Request Rate Limiting Stack

Production rate limiting has several layers, each solving a different part of the problem.

Layer 1: Global Rate Limits

The simplest form — limit your total requests per second across all targets. This is easy to implement but blunt. A global limit of 10 req/s means you could still slam a single slow domain with all 10 if you're not careful.

import { KnowledgeSDK } from '@knowledgesdk/node';
import Bottleneck from 'bottleneck';

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGE_API_KEY });

// Global limiter: max 10 requests per second
const limiter = new Bottleneck({
  maxConcurrent: 5,
  minTime: 100, // minimum 100ms between requests
});

const urls = [
  'https://example.com/page1',
  'https://example.com/page2',
  // ...hundreds more
];

const results = await Promise.all(
  urls.map(url =>
    limiter.schedule(() => client.scrape({ url }))
  )
);

Layer 2: Per-Domain Rate Limits

The correct approach for scraping multiple domains is per-domain limiting. This lets you crawl site-a.com and site-b.com concurrently without either site seeing more than your per-domain limit.

import { KnowledgeSDK } from '@knowledgesdk/node';
import Bottleneck from 'bottleneck';

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGE_API_KEY });

// One limiter per domain
const limiters = new Map<string, Bottleneck>();

function getLimiter(domain: string): Bottleneck {
  if (!limiters.has(domain)) {
    limiters.set(domain, new Bottleneck({
      maxConcurrent: 2,
      minTime: 500, // 2 req/s per domain
      reservoir: 30, // max 30 requests per minute
      reservoirRefreshAmount: 30,
      reservoirRefreshInterval: 60 * 1000,
    }));
  }
  return limiters.get(domain)!;
}

async function scrapeWithRateLimit(url: string) {
  const domain = new URL(url).hostname;
  const limiter = getLimiter(domain);
  return limiter.schedule(() => client.scrape({ url }));
}

Layer 3: Exponential Backoff with Jitter

When you hit a 429 or 503, don't retry immediately. Use exponential backoff with jitter to spread out retry attempts:

// Assumes the `client` instance created in the earlier examples
async function scrapeWithBackoff(
  url: string,
  maxRetries = 5,
  baseDelayMs = 1000
): Promise<string> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const result = await client.scrape({ url });
      return result.markdown;
    } catch (error: any) {
      const isRetryable =
        error.status === 429 ||
        error.status === 503 ||
        error.code === 'ECONNRESET';

      if (!isRetryable || attempt === maxRetries) {
        throw error;
      }

      // Exponential backoff with full jitter
      const exponentialDelay = baseDelayMs * Math.pow(2, attempt);
      const maxDelay = Math.min(exponentialDelay, 30000); // cap at 30s
      const jitter = Math.random() * maxDelay;

      console.log(`Retry ${attempt + 1}/${maxRetries} for ${url} in ${Math.round(jitter)}ms`);
      await new Promise(resolve => setTimeout(resolve, jitter));
    }
  }
  throw new Error(`Failed after ${maxRetries} retries: ${url}`);
}

The "full jitter" approach (where the delay is random(0, exponential_cap), as above) performed best on total client work in the AWS Architecture Blog's classic 2015 analysis of backoff strategies, which remains the industry reference, and it is also the simplest of the jitter variants to implement.
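The jitter strategies differ only in the delay formula. A side-by-side sketch (base and cap values are illustrative):

```typescript
// Full jitter: delay = random(0, min(cap, base * 2^attempt))
function fullJitter(baseMs: number, attempt: number, capMs = 30000): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.random() * ceiling;
}

// Equal jitter: half the exponential delay is fixed, half is random
function equalJitter(baseMs: number, attempt: number, capMs = 30000): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return ceiling / 2 + Math.random() * (ceiling / 2);
}
```

Note that full jitter can produce very short delays; counterintuitively, that is exactly what spreads a retry storm most evenly across time.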

Layer 4: Request Queues

For large-scale scraping operations, a simple in-process rate limiter isn't enough. You need a persistent queue that survives restarts and supports multiple workers.

// Using Bull (Redis-backed queue) with rate limiting
import Queue from 'bull';
import { KnowledgeSDK } from '@knowledgesdk/node';

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGE_API_KEY });

const scrapeQueue = new Queue('scrape', {
  redis: { host: 'localhost', port: 6379 },
  limiter: {
    max: 10,        // max 10 jobs
    duration: 1000, // per 1000ms
  },
});

scrapeQueue.process(async (job) => {
  const { url } = job.data;
  const result = await client.scrape({ url });
  return { markdown: result.markdown, url };
});

// Add URLs to queue
async function queueUrls(urls: string[]) {
  const jobs = urls.map(url => ({
    data: { url },
    opts: {
      attempts: 3,
      backoff: { type: 'exponential', delay: 2000 },
      removeOnComplete: 100,
    },
  }));

  await scrapeQueue.addBulk(jobs);
}

Distributed Rate Limiting

When you scale beyond a single process — multiple workers across multiple machines — you need distributed rate limiting backed by a shared state store.

The naive approach (each process maintains its own in-memory limiter) breaks down immediately: three workers each allowing 10 req/s means 30 req/s to the target domain.

Redis-Based Token Bucket

The token bucket algorithm with Redis as the state store is the standard approach:

import Redis from 'ioredis';
import { KnowledgeSDK } from '@knowledgesdk/node';

const redis = new Redis();
const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGE_API_KEY });

async function acquireToken(domain: string, tokensPerSecond: number): Promise<boolean> {
  const key = `rate_limit:${domain}`;
  const now = Date.now();
  const windowMs = 1000;
  const maxTokens = tokensPerSecond;

  // Lua script for atomic token bucket check
  const script = `
    local key = KEYS[1]
    local now = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local max_tokens = tonumber(ARGV[3])

    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or max_tokens
    local last_refill = tonumber(data[2]) or now

    -- Refill tokens based on elapsed time
    local elapsed = now - last_refill
    local new_tokens = math.min(max_tokens, tokens + (elapsed / window * max_tokens))

    if new_tokens >= 1 then
      redis.call('HMSET', key, 'tokens', new_tokens - 1, 'last_refill', now)
      redis.call('EXPIRE', key, 60)
      return 1
    else
      return 0
    end
  `;

  const result = await redis.eval(script, 1, key, now, windowMs, maxTokens);
  return result === 1;
}

async function scrapeWithDistributedLimit(url: string) {
  const domain = new URL(url).hostname;

  // Wait until we have a token
  while (!(await acquireToken(domain, 2))) { // 2 req/s per domain
    await new Promise(resolve => setTimeout(resolve, 100));
  }

  return client.scrape({ url });
}

Reading and Respecting Rate Limit Headers

Most production sites that want to be scraped responsibly will tell you their limits. Pay attention to these headers:

  • Retry-After: seconds to wait before retrying (after a 429)
  • X-RateLimit-Limit: total requests allowed in the window
  • X-RateLimit-Remaining: requests remaining in the current window
  • X-RateLimit-Reset: Unix timestamp when the window resets
  • RateLimit-Policy: IETF draft standard (newer sites)

async function scrapeRespectingHeaders(url: string) {
  try {
    return await client.scrape({ url });
  } catch (error: any) {
    if (error.status === 429) {
      const retryAfter = error.headers?.['retry-after'];
      if (retryAfter) {
        // Retry-After may also be an HTTP date; this handles the seconds form
        const delayMs = parseInt(retryAfter, 10) * 1000;
        console.log(`Rate limited. Waiting ${retryAfter}s`);
        await new Promise(resolve => setTimeout(resolve, delayMs));
        return scrapeRespectingHeaders(url); // recurses, so it waits again on repeated 429s
      }
    }
    throw error;
  }
}
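You can also pause proactively when the remaining quota hits zero, before the server ever sends a 429. A sketch, assuming the X-RateLimit-* convention from the list above (the lowercased header keys and injectable clock are illustrative choices):

```typescript
// Given the remaining-quota and reset headers, compute how long to pause.
// Returns 0 when quota remains or headers are absent; otherwise the
// number of milliseconds until the window resets.
function pauseForWindowMs(
  headers: Record<string, string | undefined>,
  nowMs: number = Date.now()
): number {
  const remaining = parseInt(headers['x-ratelimit-remaining'] ?? '', 10);
  const resetSec = parseInt(headers['x-ratelimit-reset'] ?? '', 10);
  if (Number.isNaN(remaining) || Number.isNaN(resetSec)) return 0;
  if (remaining > 0) return 0;
  return Math.max(0, resetSec * 1000 - nowMs);
}

console.log(pauseForWindowMs(
  { 'x-ratelimit-remaining': '0', 'x-ratelimit-reset': '1700000010' },
  1_700_000_000_000
)); // → 10000
```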

Robots.txt and Crawl-Delay

Before writing a single line of scraping code, check the site's robots.txt. Beyond the Allow/Disallow directives, many sites include a Crawl-delay directive:

User-agent: *
Crawl-delay: 10

This means wait 10 seconds between requests. Ignoring this is both disrespectful and increasingly detectable. Parse and honor it:

import robotsParser from 'robots-parser';

async function getRobotsDelay(domain: string): Promise<number> {
  try {
    const response = await fetch(`https://${domain}/robots.txt`);
    const text = await response.text();
    const robots = robotsParser(`https://${domain}/robots.txt`, text);
    const delay = robots.getCrawlDelay('KnowledgeSDKBot')
      || robots.getCrawlDelay('*')
      || 1; // default 1 second
    return delay * 1000; // convert to ms
  } catch {
    return 1000; // default if robots.txt unreachable
  }
}

How KnowledgeSDK Handles Rate Limiting

KnowledgeSDK's scraping infrastructure manages several layers of rate limiting on your behalf:

Proxy Pool Management: Requests are distributed across a large pool of rotating IPs, so even high-volume scraping doesn't look like it's coming from a single source. This means a burst of 100 requests to example.com is spread across many IPs, none of which hit rate limits.

Adaptive Throttling: The infrastructure monitors response codes in real time. If a domain starts returning elevated 429 rates, requests to that domain are automatically throttled until the rate normalizes.

Retry Logic: Transient failures (network timeouts, 503s) are retried automatically with exponential backoff before returning an error to your code. You don't need to implement basic retry logic yourself.

JavaScript Execution Queuing: For JS-heavy sites that require headless browser rendering, requests are queued through a browser pool rather than spawning unlimited concurrent browser instances.

What KnowledgeSDK does not manage is your application-level business logic: which URLs to prioritize, how to sequence your crawl, or domain-specific limits you've agreed to in a site's terms of service. That logic belongs in your code.

import { KnowledgeSDK } from '@knowledgesdk/node';

const client = new KnowledgeSDK({
  apiKey: process.env.KNOWLEDGE_API_KEY,
});

// KnowledgeSDK handles retries, proxies, and JS rendering
// You handle which URLs to scrape and in what order
const result = await client.scrape({
  url: 'https://example.com/products',
});

console.log(result.markdown);

Get your API key at knowledgesdk.com/setup.

Production Monitoring

A production rate limiter needs observability. Track these metrics:

  • Request rate per domain — are you within your targets?
  • 429 rate — rising 429s indicate you need to slow down
  • Queue depth — a growing queue means your rate limit is too conservative or you have too many URLs
  • Retry rate — high retry rates indicate instability

// Minimal metrics wrapper
class RateLimitedScraper {
  private metrics = {
    requests: 0,
    retries: 0, // incremented by your backoff helper in a fuller implementation
    rate_limited: 0,
    errors: 0,
  };

  async scrape(url: string) {
    this.metrics.requests++;
    try {
      return await scrapeWithBackoff(url);
    } catch (error: any) {
      if (error.status === 429) this.metrics.rate_limited++;
      else this.metrics.errors++;
      throw error;
    }
  }

  getMetrics() {
    return { ...this.metrics };
  }
}
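To act on a rising 429 rate automatically, one option is additive-increase/multiplicative-decrease, the same idea behind TCP congestion control. A sketch with illustrative step sizes and bounds:

```typescript
// AIMD rate controller: nudge the rate up while responses are clean,
// cut it sharply when 429s appear. Step sizes are illustrative.
class AdaptiveRate {
  constructor(
    private reqPerSec = 0.5, // conservative start: 1 req / 2s
    private readonly maxReqPerSec = 5
  ) {}

  current(): number {
    return this.reqPerSec;
  }

  // Call once per monitoring window with that window's 429 fraction
  observe(rate429: number): void {
    if (rate429 === 0) {
      this.reqPerSec = Math.min(this.maxReqPerSec, this.reqPerSec + 0.5);
    } else {
      this.reqPerSec = Math.max(0.5, this.reqPerSec / 2);
    }
  }
}
```

Feed `observe()` from the 429 counter above each time you flush metrics, and use `current()` to set your limiter's rate for the next window.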

What to Rate Limit vs. What to Cache

Not every duplicate request needs to be retried — it needs to be cached. Before rate limiting, ask: should this URL even be fetched again?

A smart scraping pipeline combines rate limiting with aggressive caching:

  1. Check your local cache (database, Redis, S3) before making any request
  2. If cached and fresh enough for your use case, return the cached content
  3. If not cached, acquire a rate limit token, scrape, store in cache
  4. When you do re-scrape, use If-Modified-Since or ETag headers to skip unchanged content
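The steps above can be sketched with an in-memory cache standing in for Redis or a database (`fetchPage`, the TTL, and the cache shape are all illustrative):

```typescript
interface CacheEntry {
  content: string;
  fetchedAt: number;
}

const cache = new Map<string, CacheEntry>();
const TTL_MS = 60 * 60 * 1000; // "fresh enough" threshold: 1 hour

async function cachedScrape(
  url: string,
  fetchPage: (url: string) => Promise<string>,
  nowMs: number = Date.now()
): Promise<{ content: string; fromCache: boolean }> {
  const hit = cache.get(url);
  // Steps 1-2: serve from cache if fresh enough
  if (hit && nowMs - hit.fetchedAt < TTL_MS) {
    return { content: hit.content, fromCache: true };
  }
  // Step 3: (acquire a rate-limit token here), fetch, then store
  const content = await fetchPage(url);
  cache.set(url, { content, fetchedAt: nowMs });
  return { content, fromCache: false };
}
```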

KnowledgeSDK caches extraction results automatically, so calling scrape() on the same URL multiple times doesn't always result in multiple outbound requests.

Frequently Asked Questions

Q: What's a safe default rate limit for unknown sites?

Start at 1 request per 2 seconds for a single domain (0.5 req/s). This is conservative enough that it won't cause issues on even lightly-resourced sites. Increase gradually if you need more throughput and the site's responses remain healthy.

Q: Does rate limiting protect against IP bans?

Rate limiting reduces the probability of IP bans but doesn't eliminate it. Sites use many signals beyond request rate: user-agent strings, TLS fingerprints, behavioral patterns, and more. KnowledgeSDK's infrastructure handles these concerns at the platform level.

Q: Should I rate limit across subdomains separately?

Generally yes. api.example.com and www.example.com likely share backend infrastructure. Treat them as the same domain for rate limiting purposes unless you have specific knowledge otherwise.
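A naive sketch of that grouping keeps the last two labels of the hostname. It is deliberately simplistic: it mishandles multi-part public suffixes like .co.uk, where a public-suffix-list library is the robust choice:

```typescript
// Collapse subdomains onto a shared rate-limit key by keeping the
// last two labels of the hostname. NOTE: wrong for multi-part
// suffixes like example.co.uk; use a public-suffix list in production.
function rateLimitKey(url: string): string {
  const host = new URL(url).hostname;
  const labels = host.split('.');
  return labels.length <= 2 ? host : labels.slice(-2).join('.');
}

console.log(rateLimitKey('https://api.example.com/v1')); // → example.com
console.log(rateLimitKey('https://www.example.com/'));   // → example.com
```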

Q: What's the difference between rate limiting and throttling?

Rate limiting is proactive (you choose not to exceed a threshold). Throttling is reactive (someone else is slowing you down). Good scraping implementations use both: self-imposed rate limits to be respectful, and adaptive throttling to respond when servers push back.

Q: How do I handle sites with no explicit rate limits documented?

Check robots.txt for a Crawl-delay directive. If there is none, start conservative (1 req/2s), monitor 429 rates, and dial up slowly. If you're seeing a 0% 429 rate at 1 req/s, try 2 req/s, then 5 req/s. The goal is finding the knee of the curve — the fastest rate that still gets clean responses.

Conclusion

Rate limiting isn't optional for production web scraping. It protects target servers, extends the lifespan of your proxies and IPs, and makes your pipeline more reliable. The right approach layers global limits, per-domain limits, exponential backoff, and distributed state management.

KnowledgeSDK handles the infrastructure-level concerns automatically — proxy rotation, adaptive throttling, and retry logic — so you can focus on the application logic that's specific to your use case.

Ready to build a production-grade scraping pipeline? Get your API key at knowledgesdk.com/setup and start with the @knowledgesdk/node SDK or the knowledgesdk Python package.

Try it now

Scrape, search, and monitor any website with one API.

Get your API key in 30 seconds. First 1,000 requests free.
