knowledgesdk.com/blog/webhook-vs-polling-web-monitoring
Technical · March 19, 2026 · 13 min read

Webhooks vs Polling for Web Change Detection: Developer Guide

Compare webhooks and polling for website change detection. Learn when to use each, production patterns for idempotency, retries, and signature verification.

You've built a pipeline that scrapes a competitor's pricing page, a documentation site, or a regulatory database. The content changes occasionally, and you need to know when it does. How do you detect those changes?

There are two approaches: polling (you check repeatedly on a schedule) and webhooks (you get notified when something changes). Both work. Neither is always correct. This guide explains the tradeoffs and shows you how to build production-grade implementations of each.

Polling: The Simple Approach

Polling means your system makes periodic requests to check for changes. Every 5 minutes, every hour, every day — your scheduler fires, you scrape the URL, you compare to the previous version, you act if different.

import { KnowledgeSDK } from '@knowledgesdk/node';
import crypto from 'crypto';

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGE_API_KEY });

// In-memory store (use a database in production)
const previousHashes = new Map<string, string>();

async function checkForChanges(url: string): Promise<boolean> {
  const result = await client.scrape({ url });
  const hash = crypto
    .createHash('sha256')
    .update(result.markdown)
    .digest('hex');

  const previousHash = previousHashes.get(url);

  if (previousHash && previousHash !== hash) {
    console.log(`Content changed at ${url}`);
    previousHashes.set(url, hash);
    return true;
  }

  previousHashes.set(url, hash);
  return false;
}

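// Run on a schedule
setInterval(async () => {
  const urls = ['https://example.com/pricing', 'https://example.com/docs'];
  for (const url of urls) {
    try {
      await checkForChanges(url);
    } catch (err) {
      // One failed scrape should not crash the process or skip the remaining URLs
      console.error(`Check failed for ${url}:`, err);
    }
  }
}, 5 * 60 * 1000); // every 5 minutes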

Why Polling Is Tempting

Polling is simple. There's no infrastructure to set up — no public endpoint, no TLS certificates, no reverse proxy. You write a cron job, it runs, done. It works behind firewalls and NAT. It's easy to debug (just look at the cron logs).

For small-scale monitoring (a handful of URLs, low frequency), polling is entirely reasonable. A startup checking their top 5 competitor pricing pages once a day doesn't need webhooks.

The Hidden Costs of Polling at Scale

The problems emerge when you scale:

Wasted requests: If you monitor 10,000 URLs with a 1-hour poll interval, you're making 240,000 requests per day. If 99% of pages don't change on any given day, only about 100 pages do, so roughly 100 of those requests find something new and the other 239,900 confirm that nothing happened. More than 99.9% of your API budget is wasted.

Latency: A 1-hour poll interval means you might not know about a change for up to an hour after it happened (30 minutes on average). For real-time use cases such as competitive intelligence, regulatory monitoring, and price tracking, this is often unacceptable.

Thundering herd: Naive schedulers kick off all checks at the same time (e.g., top of every hour). This creates request spikes that stress your infrastructure and theirs. You need jitter:

// Add random jitter to distribute load
function scheduleWithJitter(fn: () => Promise<void>, intervalMs: number) {
  const jitter = Math.random() * intervalMs * 0.2; // up to 20% jitter
  setTimeout(async () => {
    try {
      await fn();
    } catch (err) {
      console.error('Scheduled check failed:', err);
    } finally {
      scheduleWithJitter(fn, intervalMs); // reschedule even after a failure
    }
  }, intervalMs + jitter);
}

Cost at scale: At 240,000 requests/day at even a fraction of a cent each, you're spending real money to check pages that haven't changed. The economics are brutal at 100,000+ URLs.
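The arithmetic behind these costs can be sketched as a small model. This is a back-of-the-envelope estimate, not a billing formula: the function name and its inputs are illustrative, and it assumes each changed page changes once per day, so only one poll per changed page finds anything new.

```typescript
// Rough polling cost model. All inputs are illustrative assumptions.
interface PollingEstimate {
  requestsPerDay: number;
  usefulRequestsPerDay: number;
  wastedFraction: number;
  averageDetectionDelayMinutes: number;
  worstCaseDetectionDelayMinutes: number;
}

function estimatePolling(
  urlCount: number,
  pollIntervalMinutes: number,
  dailyChangeRate: number // fraction of URLs that change on a given day, e.g. 0.01
): PollingEstimate {
  const pollsPerUrlPerDay = (24 * 60) / pollIntervalMinutes;
  const requestsPerDay = urlCount * pollsPerUrlPerDay;
  // Assumes one change per changed page per day: one useful poll per changed page.
  const usefulRequestsPerDay = urlCount * dailyChangeRate;
  return {
    requestsPerDay,
    usefulRequestsPerDay,
    wastedFraction: 1 - usefulRequestsPerDay / requestsPerDay,
    // A change lands at a random point in the interval, so the mean wait is half of it.
    averageDetectionDelayMinutes: pollIntervalMinutes / 2,
    worstCaseDetectionDelayMinutes: pollIntervalMinutes,
  };
}

// 10,000 URLs, hourly polling, 1% of pages changing per day
const estimate = estimatePolling(10_000, 60, 0.01);
console.log(estimate);
```

Shortening the interval improves detection delay linearly but multiplies the request count by the same factor, which is the tradeoff that pushes large deployments toward webhooks.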

Webhooks: Event-Driven Change Detection

With webhooks, the change detection system sends an HTTP POST to your endpoint when it detects a change. You process the change as it happens rather than periodically checking for it.

                 ┌────────────────┐
                 │  KnowledgeSDK  │
                 │   monitoring   │
                 │ infrastructure │
                 └───────┬────────┘
                         │ Detects change
                         │ POST /webhooks/content-changes
                         ▼
                 ┌────────────────┐
                 │    Your App    │
                 │    (public     │
                 │   endpoint)    │
                 └────────────────┘

Webhook Advantages

Efficiency: You're notified only when something changes. No wasted requests.

Latency: Changes trigger near-real-time notifications. No polling delay.

Cost: Proportional to actual change frequency, not monitoring frequency.

Decoupling: Your application doesn't need to know about or manage a polling schedule. It just reacts to events.

Webhook Challenges

Webhooks have real operational requirements:

Public endpoint required: Your server must be reachable from the internet. This is a blocker for local development and systems behind firewalls. For development, use tools like ngrok or Cloudflare Tunnel.

You must handle delivery failures: Webhook delivery can fail. Networks go down. Your server restarts. The sending system must retry, and you must handle duplicate deliveries (idempotency).

Security verification: Anyone can POST to a public endpoint claiming to be from KnowledgeSDK. You must verify webhook signatures.

Endpoint reliability: If your webhook endpoint has downtime, you miss notifications. You need reliable infrastructure and failure recovery.

Building a Production Webhook Handler

Here's a complete production webhook handler with all the necessary patterns:

import express from 'express';
import crypto from 'crypto';

const app = express();

// Parse raw body for signature verification (must come before json parser)
app.use('/webhooks', express.raw({ type: 'application/json' }));

// Track processed event IDs for idempotency
const processedEvents = new Set<string>();

function verifyWebhookSignature(
  payload: Buffer,
  signature: string,
  secret: string
): boolean {
  const hmac = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');

  const expectedSig = `sha256=${hmac}`;

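  // Timing-safe comparison to prevent timing attacks.
  // timingSafeEqual throws if the buffers differ in length, so check that first.
  const received = Buffer.from(signature);
  const expected = Buffer.from(expectedSig);
  return (
    received.length === expected.length &&
    crypto.timingSafeEqual(received, expected)
  );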
}

app.post('/webhooks/content-changes', async (req, res) => {
  const signature = req.headers['x-knowledgesdk-signature'] as string;
  const webhookSecret = process.env.KNOWLEDGESDK_WEBHOOK_SECRET!;

  // 1. Verify signature
  if (!signature || !verifyWebhookSignature(req.body, signature, webhookSecret)) {
    console.warn('Invalid webhook signature');
    return res.status(401).json({ error: 'Invalid signature' });
  }

  // 2. Parse payload
  const event = JSON.parse(req.body.toString());

  // 3. Respond quickly (before processing)
  // The sending system expects a 2xx response quickly
  // Do not wait for processing to complete
  res.status(200).json({ received: true });

  // 4. Check idempotency
  if (processedEvents.has(event.id)) {
    console.log(`Duplicate event ${event.id}, skipping`);
    return;
  }
  processedEvents.add(event.id);

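  // 5. Process the event asynchronously
  processEvent(event).catch(err => {
    console.error(`Failed to process event ${event.id}:`, err);
    // Forget the ID so a later replay is not skipped as a duplicate.
    // The sender already received a 200 and will not retry on its own,
    // so keep failed events somewhere you can re-run them from.
    processedEvents.delete(event.id);
  });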
});

async function processEvent(event: any) {
  switch (event.type) {
    case 'content.changed':
      await handleContentChange(event.data);
      break;
    case 'content.added':
      await handleNewContent(event.data);
      break;
    case 'content.removed':
      await handleRemovedContent(event.data);
      break;
    default:
      console.log(`Unhandled event type: ${event.type}`);
  }
}

async function handleContentChange(data: any) {
  console.log(`Content changed at ${data.url}`);
  console.log(`Previous hash: ${data.previousHash}`);
  console.log(`New hash: ${data.currentHash}`);
  console.log(`Changed sections: ${JSON.stringify(data.diff)}`);

  // Update your database, re-index for search, send alerts, etc.
  await updateKnowledgeBase(data.url, data.markdown);
}

Persistent Idempotency

The in-memory processedEvents set from the example above doesn't survive restarts. In production, use Redis or your database:

import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

async function isEventProcessed(eventId: string): Promise<boolean> {
  const result = await redis.set(
    `webhook:processed:${eventId}`,
    '1',
    'NX',     // only set if not exists
    'EX',     // set expiry
    86400     // 24 hours (adjust based on retry window)
  );
  // Returns null if key already existed, 'OK' if newly set
  return result === null;
}

app.post('/webhooks/content-changes', async (req, res) => {
  // ... signature verification ...

  const event = JSON.parse(req.body.toString());
  res.status(200).json({ received: true });

  const alreadyProcessed = await isEventProcessed(event.id);
  if (alreadyProcessed) {
    console.log(`Duplicate event ${event.id}, skipping`);
    return;
  }

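  try {
    await processEvent(event);
  } catch (err) {
    console.error(`Failed to process event ${event.id}:`, err);
    // Clear the marker so a replay of this event is not treated as a duplicate
    await redis.del(`webhook:processed:${event.id}`);
  }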
});

Retry Handling

KnowledgeSDK webhooks follow standard retry behavior: if your endpoint returns a non-2xx response or times out, the delivery is retried with exponential backoff. Ensure your endpoint:

  1. Returns 2xx within 10 seconds (don't process synchronously)
  2. Is idempotent (retried events must not cause duplicate side effects)
  3. Returns 2xx even for events you don't handle (unknown event types should be acknowledged, not rejected)
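To size your deduplication window, it helps to see how quickly exponential backoff stretches out. The sketch below is a generic illustration only: the base delay, growth factor, and cap are assumptions, not KnowledgeSDK's documented schedule, and both function names are hypothetical.

```typescript
// Delay before each retry attempt under capped exponential backoff.
// baseMs, factor, and capMs are illustrative defaults, not a documented schedule.
function backoffSchedule(
  attempts: number,
  baseMs = 1000,
  factor = 2,
  capMs = 60 * 60 * 1000
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < attempts; attempt++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(factor, attempt)));
  }
  return delays;
}

// Total time between the first failed delivery and the last retry.
function totalRetryWindowMs(delays: number[]): number {
  return delays.reduce((sum, delay) => sum + delay, 0);
}

const delays = backoffSchedule(5);
console.log(delays, totalRetryWindowMs(delays));
```

Whatever the real schedule is, your deduplication keys should live at least as long as the sender's total retry window. Otherwise a late retry arrives after the key has expired and is processed a second time.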

Setting Up KnowledgeSDK Webhooks

Register a webhook endpoint using the KnowledgeSDK API:

import { KnowledgeSDK } from '@knowledgesdk/node';

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGE_API_KEY });

// Register webhook for content change events
const webhook = await client.webhooks.create({
  url: 'https://your-app.com/webhooks/content-changes',
  events: ['content.changed', 'content.added'],
});

console.log(`Webhook ID: ${webhook.id}`);
console.log(`Webhook Secret: ${webhook.secret}`);
// Store webhook.secret securely — you'll need it to verify signatures

To monitor specific URLs for changes, use the extract endpoint with monitoring enabled:

// Scrape and monitor a URL for changes
const result = await client.extract({
  url: 'https://competitor.com/pricing',
  monitor: true,
  webhookId: webhook.id,
});

// KnowledgeSDK will now notify your webhook when this URL's content changes

Hybrid Approach: Webhooks for Speed, Polling as Fallback

The most resilient architecture uses webhooks as the primary notification mechanism with polling as a fallback:

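import { KnowledgeSDK } from '@knowledgesdk/node';
import crypto from 'crypto';

class ChangeDetector {
  private client: KnowledgeSDK;

  constructor() {
    this.client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGE_API_KEY! });
  }

  // Primary path: webhook handler calls this
  async handleWebhookChange(url: string, newMarkdown: string) {
    await this.processChange(url, newMarkdown);
    // Record the new hash so the polling fallback does not re-report this change
    const hash = crypto.createHash('sha256').update(newMarkdown).digest('hex');
    await this.storePreviousHash(url, hash);
    // Update "last checked" timestamp
    await this.updateLastChecked(url, new Date());
  }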

  // Fallback: catch any changes that slipped through
  async pollFallback() {
    const urls = await this.getMonitoredUrls();
    const staleUrls = urls.filter(u =>
      Date.now() - u.lastChecked.getTime() > 6 * 60 * 60 * 1000 // > 6 hours
    );

    for (const { url } of staleUrls) {
      const result = await this.client.scrape({ url });
      const changed = await this.detectChange(url, result.markdown);
      if (changed) {
        await this.processChange(url, result.markdown);
      }
      await this.updateLastChecked(url, new Date());
    }
  }

  private async detectChange(url: string, newMarkdown: string): Promise<boolean> {
    const hash = crypto.createHash('sha256').update(newMarkdown).digest('hex');
    const previousHash = await this.getPreviousHash(url);
    if (hash !== previousHash) {
      await this.storePreviousHash(url, hash);
      return true;
    }
    return false;
  }

  private async processChange(url: string, markdown: string) {
    // Re-index, alert, update database, etc.
    console.log(`Processing change for ${url}`);
  }

  // Stub implementations — replace with your database calls
  private async getMonitoredUrls(): Promise<Array<{ url: string; lastChecked: Date }>> {
    return [];
  }
  private async updateLastChecked(url: string, date: Date): Promise<void> {}
  private async getPreviousHash(url: string): Promise<string | null> { return null; }
  private async storePreviousHash(url: string, hash: string): Promise<void> {}
}

Decision Matrix

| Factor               | Use Polling        | Use Webhooks              |
|----------------------|--------------------|---------------------------|
| Number of URLs       | < 100              | 100+                      |
| Change frequency     | High (>50% daily)  | Low (<10% daily)          |
| Latency requirements | Minutes acceptable | Near-real-time required   |
| Infrastructure       | No public endpoint | Public endpoint available |
| Team size            | Solo/small team    | Dedicated backend         |
| Budget sensitivity   | Low priority       | High priority             |

Local Development with Webhooks

Testing webhooks locally requires exposing your localhost. The easiest options in 2026:

# ngrok (most popular)
ngrok http 3000
# → https://abc123.ngrok.io

# Cloudflare Tunnel (free, no account needed for temporary tunnels)
cloudflared tunnel --url http://localhost:3000

# VS Code port forwarding (if using GitHub Codespaces or VS Code remote)
# Available in the Ports panel

Point your webhook URL to the tunnel URL during development.

Frequently Asked Questions

Q: How long does KnowledgeSDK retry failed webhook deliveries?

KnowledgeSDK retries failed deliveries for up to 72 hours using exponential backoff. If your endpoint is down for longer than that, you'll need to manually re-trigger processing for missed events using the polling fallback.

Q: Can I test my webhook handler without a real event?

Yes. You can manually trigger a test event from the KnowledgeSDK dashboard, or construct a test payload and sign it with your webhook secret. This is useful for integration tests.
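A minimal sketch of signing a test payload yourself, assuming the `sha256=<hex HMAC>` signature format that the handler in this guide checks. The secret, event ID, and payload fields below are made up for illustration.

```typescript
import { createHmac } from 'node:crypto';

// Produce a signature in the same "sha256=<hex>" shape the handler above expects.
function signPayload(rawBody: string, secret: string): string {
  const hmac = createHmac('sha256', secret).update(rawBody).digest('hex');
  return `sha256=${hmac}`;
}

// Hypothetical test values: use your own secret and a payload shaped like real events.
const testSecret = 'test-secret';
const testBody = JSON.stringify({
  id: 'evt_test_1',
  type: 'content.changed',
  data: { url: 'https://example.com/pricing' },
});
const testSignature = signPayload(testBody, testSecret);

console.log(testSignature);
```

Send `testBody` as the exact request body with `testSignature` in the `x-knowledgesdk-signature` header. Sign the same bytes you send: re-serializing the JSON after signing changes the bytes and invalidates the signature.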

Q: What's the difference between content hash and semantic change detection?

Hash-based detection (SHA-256 of full content) catches any character-level change. Semantic change detection identifies whether the meaning of the content changed — useful for ignoring timestamp updates, view counts, and other dynamic content that changes constantly without new information. KnowledgeSDK uses semantic diffing to reduce noise.
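If you run your own hash-based polling, you can cut some of this noise by normalizing content before hashing. The sketch below is a much cruder technique than semantic diffing and is not how KnowledgeSDK works internally: the two patterns (ISO timestamps and "N views" counters) are examples you would replace with whatever is noisy on the pages you monitor.

```typescript
import { createHash } from 'node:crypto';

// Replace known-volatile fragments with placeholders so they do not affect the hash.
// The patterns here are illustrative examples, not an exhaustive list.
function normalizeForHashing(markdown: string): string {
  return markdown
    .replace(/\b\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?)?/g, '<date>')
    .replace(/\b\d[\d,]*\s+views\b/gi, '<views>')
    .replace(/\s+/g, ' ')
    .trim();
}

function contentHash(markdown: string): string {
  return createHash('sha256').update(normalizeForHashing(markdown)).digest('hex');
}

const monday = 'Updated 2026-03-19T10:00:00Z. 1,204 views. Price: $49';
const tuesday = 'Updated 2026-03-20T11:30:00Z. 1,310 views. Price: $49';
const priceChange = 'Updated 2026-03-20T11:30:00Z. 1,310 views. Price: $59';

console.log(contentHash(monday) === contentHash(tuesday));
console.log(contentHash(monday) === contentHash(priceChange));
```

The first comparison treats the two snapshots as identical because only the timestamp and view count differ; the second detects the price change.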

Q: How do I handle webhooks in a serverless environment?

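Serverless functions (Lambda, Cloudflare Workers, Vercel Functions) are well suited to webhooks: they're short-lived and triggered by HTTP events. The critical point is that work started after the response is sent may never finish, because the runtime can freeze or terminate the function once it returns. Hand the event to a durable queue before acknowledging it, then process it in a separate consumer. In Lambda, push the event to SQS before returning (setting context.callbackWaitsForEmptyEventLoop = false lets a callback-style handler respond without waiting for open connections to close). In Vercel, use background processing.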
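The enqueue-then-acknowledge pattern can be sketched independently of any platform. `EventQueue` below is a hypothetical interface standing in for SQS or a similar durable queue, and `handleWebhook` is an illustrative name. Signature verification is omitted for brevity and would run before the enqueue.

```typescript
// Hypothetical abstraction over a durable queue (SQS, Cloudflare Queues, etc.).
interface EventQueue {
  enqueue(message: string): Promise<void>;
}

interface HttpResponse {
  statusCode: number;
  body: string;
}

// Acknowledge only after the event is safely queued.
// If the queue is unavailable, return a non-2xx so the sender retries later.
async function handleWebhook(rawBody: string, queue: EventQueue): Promise<HttpResponse> {
  try {
    await queue.enqueue(rawBody);
  } catch {
    return { statusCode: 503, body: JSON.stringify({ error: 'queue unavailable' }) };
  }
  return { statusCode: 200, body: JSON.stringify({ received: true }) };
}

// In-memory stand-in for local experiments; not durable.
const queued: string[] = [];
const memoryQueue: EventQueue = {
  enqueue: async (message) => {
    queued.push(message);
  },
};

handleWebhook('{"id":"evt_1"}', memoryQueue).then((res) => console.log(res.statusCode));
```

Because the 2xx is only sent once the event is queued, a crash between receipt and processing loses nothing: either the queue has the event or the sender retries.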

Q: Should I store raw webhook payloads?

Yes, always. Store the raw payload before processing so you can replay events if your processing logic has bugs. This is the webhook equivalent of event sourcing.
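A minimal sketch of that idea, using an in-memory log so it stays self-contained. `RawEventLog` and its methods are hypothetical names; in production the `events` array would be a database table or object storage.

```typescript
interface StoredEvent {
  receivedAt: string; // ISO timestamp of receipt
  signature: string;  // signature header as received, kept for auditing
  rawBody: string;    // exact payload text, unparsed
}

// Append-only log of raw webhook payloads. In-memory for illustration only.
class RawEventLog {
  private events: StoredEvent[] = [];

  // Call this after signature verification and before any processing.
  append(rawBody: string, signature: string): StoredEvent {
    const stored: StoredEvent = {
      receivedAt: new Date().toISOString(),
      signature,
      rawBody,
    };
    this.events.push(stored);
    return stored;
  }

  // Re-run a handler over every stored payload, e.g. after fixing a processing bug.
  replay(handler: (event: unknown) => void): number {
    for (const stored of this.events) {
      handler(JSON.parse(stored.rawBody));
    }
    return this.events.length;
  }
}

const log = new RawEventLog();
log.append('{"id":"evt_1","type":"content.changed"}', 'sha256=example');
log.append('{"id":"evt_2","type":"content.added"}', 'sha256=example');

const replayedIds: string[] = [];
const replayedCount = log.replay((event) => {
  replayedIds.push((event as { id: string }).id);
});
console.log(replayedCount, replayedIds);
```

Storing the unparsed body matters: it preserves fields your current code ignores, so a later version of your handler can use them on replay.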

Conclusion

Both polling and webhooks are valid tools for web change detection. Polling is simpler and works for small-scale use cases. Webhooks are more efficient and lower-latency for production systems monitoring many URLs.

The most important production patterns are: always verify webhook signatures, respond immediately and process asynchronously, and ensure idempotent processing through a persistent deduplication store.

KnowledgeSDK's webhook system handles change detection infrastructure — you just handle what to do when a change is detected. Get your API key at knowledgesdk.com/setup.
