Use case · March 19, 2026 · 11 min read

Website Change Detection with Webhooks: Build a Monitoring Agent in 50 Lines

Build a competitor pricing monitor with webhooks in 50 lines of code. Full tutorial: scrape baseline, subscribe to changes, receive structured diffs, trigger Slack alerts.


Knowing when a web page changes is surprisingly hard to do well. The naive approach — poll every URL on a cron job, hash the HTML, compare to yesterday's hash — is fragile, expensive, and slow. You're re-scraping pages that haven't changed (wasting money), missing changes that happen between runs (gaps in coverage), and dealing with false positives from dynamic content like timestamps and ad IDs.
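To see why the naive approach breaks down, here is roughly what the hash-and-compare loop looks like (a hypothetical sketch; assume a cron job fetches each page's HTML and feeds it in):

```javascript
import crypto from 'crypto';

// Naive change detection: hash the whole page and compare to the last run.
const lastHashes = new Map(); // url -> previous SHA-256 hex digest

function hasChanged(url, html) {
  const hash = crypto.createHash('sha256').update(html).digest('hex');
  const previous = lastHashes.get(url);
  lastHashes.set(url, hash);
  // Any byte that differs flips the hash, including timestamps and ad IDs,
  // so this reports a "change" even when nothing meaningful happened.
  return previous !== undefined && previous !== hash;
}
```

Every run re-fetches every page whether or not it changed, and a single rotating ad slot makes `hasChanged` return true forever.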

Webhooks flip this model. Instead of your system asking "did this change?", the scraping infrastructure tells you when it does. You subscribe once, then receive a structured notification — including what changed — the moment it happens.

This tutorial builds three real monitoring use cases using knowledgeSDK webhooks:

  1. Competitor pricing monitor — get notified when a competitor's pricing page updates
  2. Documentation sync — keep your AI agent's knowledge base current
  3. News monitoring — track specific topics across multiple sources

How knowledgeSDK Webhooks Work

When you subscribe to a URL, knowledgeSDK:

  1. Scrapes a baseline snapshot of the content
  2. Monitors the URL for changes on a configurable schedule (default: every 15 minutes)
  3. When content changes, scrapes the new version
  4. Sends a POST request to your callbackUrl with:
    • The URL that changed
    • A structured diff (added/removed/modified sections)
    • The full new markdown content
    • A timestamp

The key advantage over polling it yourself: knowledgeSDK detects changes using content-aware diffing, not just hash comparisons. You get semantic diffs (which sections changed), not just "something is different."


Use Case 1: Competitor Pricing Monitor

This is the most common use case. You want to know immediately when a competitor's pricing page changes so your sales team or pricing strategy can respond.

Node.js Implementation

import { KnowledgeSDK } from '@knowledgesdk/node';
import express from 'express';

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY });
const app = express();
app.use(express.json());

// Step 1: Scrape baseline content
const competitors = [
  { name: 'Competitor A', url: 'https://competitora.com/pricing' },
  { name: 'Competitor B', url: 'https://competitorb.com/pricing' },
  { name: 'Competitor C', url: 'https://competitorc.com/pricing' },
];

async function setupMonitoring() {
  console.log('Scraping baseline content...');

  for (const competitor of competitors) {
    // Step 2: Scrape and index the baseline
    const page = await client.scrape({ url: competitor.url });
    console.log(`Indexed: ${competitor.name} (${page.wordCount} words)`);

    // Step 3: Subscribe to changes
    await client.webhooks.subscribe({
      url: competitor.url,
      callbackUrl: `${process.env.PUBLIC_URL}/webhooks/pricing-change`,
      events: ['content.changed'],
      metadata: { competitorName: competitor.name },
    });

    console.log(`Monitoring: ${competitor.name}`);
  }
}

// Step 4: Handle webhook notifications
app.post('/webhooks/pricing-change', async (req, res) => {
  const { url, diff, newContent, changedAt, metadata } = req.body;

  console.log(`\nPricing change detected!`);
  console.log(`Competitor: ${metadata.competitorName}`);
  console.log(`URL: ${url}`);
  console.log(`Changed at: ${changedAt}`);
  console.log(`Sections added: ${diff.added.length}`);
  console.log(`Sections removed: ${diff.removed.length}`);
  console.log(`Sections modified: ${diff.modified.length}`);

  // Step 5: Send Slack notification
  await sendSlackAlert({
    competitor: metadata.competitorName,
    url,
    diff,
    changedAt,
  });

  res.sendStatus(200);
});

async function sendSlackAlert({ competitor, url, diff, changedAt }) {
  const changes = [];

  if (diff.added.length > 0) {
    changes.push(`Added ${diff.added.length} section(s)`);
  }
  if (diff.removed.length > 0) {
    changes.push(`Removed ${diff.removed.length} section(s)`);
  }
  if (diff.modified.length > 0) {
    changes.push(`Modified ${diff.modified.length} section(s)`);
  }

  const message = {
    text: `Competitor pricing change detected`,
    blocks: [
      {
        type: 'section',
        text: {
          type: 'mrkdwn',
          text: `*Pricing change at ${competitor}*\n${changes.join(', ')}\n<${url}|View pricing page>`,
        },
      },
      {
        type: 'section',
        text: {
          type: 'mrkdwn',
          text: `*Key changes:*\n${diff.modified
            .slice(0, 3)
            .map(m => `• ${m.section}: ${m.summary}`)
            .join('\n')}`,
        },
      },
    ],
  };

  await fetch(process.env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(message),
  });
}

// Start server and setup monitoring
app.listen(3000, async () => {
  console.log('Server running on port 3000');
  await setupMonitoring();
});

Python Implementation

import os
from flask import Flask, request, jsonify
from knowledgesdk import KnowledgeSDK
import httpx

app = Flask(__name__)
client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])

COMPETITORS = [
    {"name": "Competitor A", "url": "https://competitora.com/pricing"},
    {"name": "Competitor B", "url": "https://competitorb.com/pricing"},
    {"name": "Competitor C", "url": "https://competitorc.com/pricing"},
]

def setup_monitoring():
    print("Scraping baseline content...")

    for competitor in COMPETITORS:
        # Scrape and index baseline
        page = client.scrape(url=competitor["url"])
        print(f"Indexed: {competitor['name']} ({page.word_count} words)")

        # Subscribe to changes
        client.webhooks.subscribe(
            url=competitor["url"],
            callback_url=f"{os.environ['PUBLIC_URL']}/webhooks/pricing-change",
            events=["content.changed"],
            metadata={"competitor_name": competitor["name"]},
        )
        print(f"Monitoring: {competitor['name']}")

@app.post("/webhooks/pricing-change")
def handle_pricing_change():
    data = request.json
    url = data["url"]
    diff = data["diff"]
    changed_at = data["changedAt"]
    competitor_name = data.get("metadata", {}).get("competitor_name", "Unknown")

    print(f"\nPricing change detected!")
    print(f"Competitor: {competitor_name}")
    print(f"URL: {url}")
    print(f"Changed at: {changed_at}")

    send_slack_alert(
        competitor=competitor_name,
        url=url,
        diff=diff,
        changed_at=changed_at,
    )

    return jsonify({"ok": True})

def send_slack_alert(competitor: str, url: str, diff: dict, changed_at: str):
    changes = []
    if diff.get("added"):
        changes.append(f"Added {len(diff['added'])} section(s)")
    if diff.get("removed"):
        changes.append(f"Removed {len(diff['removed'])} section(s)")
    if diff.get("modified"):
        changes.append(f"Modified {len(diff['modified'])} section(s)")

    key_changes = "\n".join(
        f"• {m['section']}: {m['summary']}"
        for m in diff.get("modified", [])[:3]
    )

    payload = {
        "text": "Competitor pricing change detected",
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*Pricing change at {competitor}*\n{', '.join(changes)}\n<{url}|View pricing page>",
                },
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*Key changes:*\n{key_changes}",
                },
            },
        ],
    }

    httpx.post(os.environ["SLACK_WEBHOOK_URL"], json=payload)

if __name__ == "__main__":
    setup_monitoring()
    app.run(port=3000)

That's the core of the pricing monitor in under 50 meaningful lines. The webhook handler is 10 lines; the Slack notification is 20 lines; the setup is 10 lines.


Understanding the Diff Payload

When knowledgeSDK sends a webhook, the diff object has this structure:

{
  "url": "https://competitora.com/pricing",
  "changedAt": "2026-03-19T14:32:00Z",
  "metadata": { "competitorName": "Competitor A" },
  "diff": {
    "added": [
      {
        "section": "Enterprise Plan",
        "content": "## Enterprise Plan\n\nStarting at $999/month...",
        "position": "after:Pro Plan"
      }
    ],
    "removed": [
      {
        "section": "Annual discount note",
        "content": "Save 20% with annual billing"
      }
    ],
    "modified": [
      {
        "section": "Pro Plan",
        "summary": "Price changed from $49/month to $59/month",
        "before": "## Pro Plan\n\n$49/month...",
        "after": "## Pro Plan\n\n$59/month..."
      }
    ]
  },
  "newContent": "# Full updated markdown content..."
}

The semantic diff is what makes webhook-based monitoring significantly more useful than simple hash comparison. You don't just know that something changed — you know what changed and where.
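For display or for an LLM prompt, the structured diff flattens naturally into text. A minimal formatter, assuming the payload shape shown above:

```javascript
// Render a knowledgeSDK-style diff object as plain text, e.g. for a Slack
// message body or an LLM context window. Assumes the added/removed/modified
// structure from the example payload above.
function formatDiff(diff) {
  const lines = [];
  for (const a of diff.added) {
    lines.push(`+ Added "${a.section}"`);
  }
  for (const r of diff.removed) {
    lines.push(`- Removed "${r.section}"`);
  }
  for (const m of diff.modified) {
    lines.push(`~ Modified "${m.section}": ${m.summary}`);
  }
  return lines.join('\n');
}
```

Feeding this text to an LLM alongside the question "is this change material to our pricing strategy?" is a common next step.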


Use Case 2: Documentation Sync for AI Agents

If your AI agent answers questions about third-party APIs (Stripe, GitHub, Notion), you need to keep your knowledge base current when those docs update. Here's a documentation sync pipeline:

// Node.js: Documentation sync
const docSources = [
  'https://stripe.com/docs/api',
  'https://docs.github.com/en/rest',
  'https://developers.notion.com/reference',
];

// One-time setup
async function indexAndMonitorDocs() {
  for (const url of docSources) {
    // Index for immediate search
    await client.scrape({ url });

    // Subscribe to changes
    await client.webhooks.subscribe({
      url,
      callbackUrl: `${process.env.PUBLIC_URL}/webhooks/docs-updated`,
      events: ['content.changed'],
    });
  }
}

// Webhook handler — content is already re-indexed automatically
app.post('/webhooks/docs-updated', async (req, res) => {
  const { url, diff, changedAt } = req.body;

  // Content is already re-indexed by knowledgeSDK
  // You may want to:
  // 1. Invalidate any cached responses that used this content
  // 2. Notify your team
  // 3. Log the change for audit purposes

  await invalidateCache(url);

  console.log(`Docs updated: ${url} at ${changedAt}`);
  console.log(`${diff.modified.length} sections changed`);

  res.sendStatus(200);
});

# Python: Documentation sync
doc_sources = [
    "https://stripe.com/docs/api",
    "https://docs.github.com/en/rest",
    "https://developers.notion.com/reference",
]

def index_and_monitor_docs():
    for url in doc_sources:
        client.scrape(url=url)
        client.webhooks.subscribe(
            url=url,
            callback_url=f"{os.environ['PUBLIC_URL']}/webhooks/docs-updated",
            events=["content.changed"]
        )

@app.post("/webhooks/docs-updated")
def handle_docs_updated():
    data = request.json
    url = data["url"]
    diff = data["diff"]
    changed_at = data["changedAt"]

    # Content already re-indexed automatically
    # Invalidate cache, notify team, etc.
    invalidate_cache(url)
    print(f"Docs updated: {url} — {len(diff['modified'])} sections changed")

    return jsonify({"ok": True})

The key insight here: when you use knowledgeSDK webhooks for documentation sync, you don't need to trigger a re-indexing job. knowledgeSDK automatically re-scrapes and re-indexes the updated content before sending your webhook. Your search results are already up to date by the time you receive the notification.
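The `invalidateCache` call in the handlers above is left to you. A minimal in-memory sketch (hypothetical; a production setup would typically use Redis or a CDN purge instead of a Map):

```javascript
// Hypothetical response cache keyed by source URL: answers your agent
// generated from a page are dropped when that page changes, so the next
// request is answered from the freshly re-indexed content.
const responseCache = new Map(); // url -> cached answer derived from that page

async function invalidateCache(url) {
  responseCache.delete(url);
}
```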


Use Case 3: News and Topic Monitoring

Monitor specific topics across news sites. When new content appears matching your topic, process it for your AI pipeline.

// Monitor technology news sources for AI-related content
const newsSources = [
  { url: 'https://techcrunch.com/category/artificial-intelligence', topic: 'AI' },
  { url: 'https://venturebeat.com/category/ai', topic: 'AI' },
  { url: 'https://www.theverge.com/ai-artificial-intelligence', topic: 'AI' },
];

async function setupNewsMonitoring() {
  for (const source of newsSources) {
    await client.webhooks.subscribe({
      url: source.url,
      callbackUrl: `${process.env.PUBLIC_URL}/webhooks/news-update`,
      events: ['content.changed'],
      metadata: { topic: source.topic },
    });
  }
}

app.post('/webhooks/news-update', async (req, res) => {
  const { url, diff, metadata } = req.body;

  // Only process added content (new articles)
  if (diff.added.length === 0) {
    return res.sendStatus(200);
  }

  for (const addition of diff.added) {
    // Process new article content
    await processNewArticle({
      source: url,
      topic: metadata.topic,
      content: addition.content,
    });
  }

  res.sendStatus(200);
});

async function processNewArticle({ source, topic, content }) {
  // Extract article title and URL
  const titleMatch = content.match(/^## (.+)$/m);
  const urlMatch = content.match(/\[Read more\]\((.+)\)/);

  if (!titleMatch || !urlMatch) return;

  const articleTitle = titleMatch[1];
  const articleUrl = urlMatch[1];

  // Scrape full article and add to knowledge base
  await client.scrape({ url: articleUrl });

  // Optionally: summarize and send to Slack
  await notifyNewArticle({ title: articleTitle, url: articleUrl, topic });

  console.log(`New ${topic} article indexed: ${articleTitle}`);
}

Managing Webhook Subscriptions

List Active Subscriptions

const subscriptions = await client.webhooks.list();
console.log(`Active subscriptions: ${subscriptions.length}`);

for (const sub of subscriptions) {
  console.log(`${sub.url} — last checked: ${sub.lastChecked}`);
}

# Python
subscriptions = client.webhooks.list()
print(f"Active subscriptions: {len(subscriptions)}")

for sub in subscriptions:
    print(f"{sub.url} — last checked: {sub.last_checked}")

Update Subscription Settings

// Change check interval or callback URL
await client.webhooks.update(subscriptionId, {
  checkInterval: '5m', // Check every 5 minutes instead of default 15
  callbackUrl: 'https://your-new-app.com/webhooks/changes',
});

Unsubscribe

await client.webhooks.unsubscribe(subscriptionId);

// Or unsubscribe by URL
await client.webhooks.unsubscribeByUrl('https://competitora.com/pricing');

# Python
client.webhooks.unsubscribe(subscription_id)

# Or by URL
client.webhooks.unsubscribe_by_url("https://competitora.com/pricing")

Handling Webhook Reliability

Verify Webhook Signatures

Always verify that webhooks come from knowledgeSDK, not an attacker:

import crypto from 'crypto';

// Compute the HMAC over the raw request body: re-serializing the parsed
// JSON may not reproduce the exact bytes that were signed. Capture the raw
// body when configuring body parsing, e.g.:
//   app.use(express.json({ verify: (req, res, buf) => { req.rawBody = buf; } }));
function verifyWebhookSignature(rawBody, signature, secret) {
  const expectedSig = crypto
    .createHmac('sha256', secret)
    .update(rawBody)
    .digest('hex');

  const expected = Buffer.from(`sha256=${expectedSig}`);
  const received = Buffer.from(signature || '');

  // timingSafeEqual throws on length mismatch, so compare lengths first
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

app.post('/webhooks/pricing-change', (req, res) => {
  const signature = req.headers['x-knowledgesdk-signature'];

  if (!verifyWebhookSignature(req.rawBody, signature, process.env.WEBHOOK_SECRET)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  // Process the webhook...
});

# Python
import hmac
import hashlib

# Compute the HMAC over the raw request body: re-serializing the parsed
# JSON may not reproduce the exact bytes that were signed.
def verify_webhook_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, f"sha256={expected}")

@app.post("/webhooks/pricing-change")
def handle_pricing_change():
    signature = request.headers.get("X-KnowledgeSDK-Signature", "")
    if not verify_webhook_signature(request.get_data(), signature, os.environ["WEBHOOK_SECRET"]):
        return jsonify({"error": "Invalid signature"}), 401

    # Process the webhook...
Idempotent Handlers

Webhooks can be delivered more than once (at-least-once delivery). Make your handlers idempotent:

app.post('/webhooks/pricing-change', async (req, res) => {
  const { webhookId } = req.body;

  // Atomically mark this webhook as processed (db here stands in for a
  // Redis-style client; SET with NX fails if the key already exists, which
  // detects a duplicate delivery without a check-then-set race).
  const firstDelivery = await db.set(`processed_webhook:${webhookId}`, true, {
    nx: true,
    ex: 86400, // forget after 24 hours
  });
  if (!firstDelivery) {
    return res.sendStatus(200);
  }

  // Process...
  res.sendStatus(200);
});

Respond Quickly, Process Async

Webhook handlers must respond within 10 seconds or knowledgeSDK will retry. For long-running processing, respond immediately and process in the background:

app.post('/webhooks/pricing-change', async (req, res) => {
  // Respond immediately
  res.sendStatus(200);

  // Process in background
  processChangeAsync(req.body).catch(console.error);
});

async function processChangeAsync(data) {
  // This can take as long as needed
  await generateReport(data);
  await sendDetailedSlackMessage(data);
  await updateDatabase(data);
}

Comparing DIY Polling vs Webhooks

| Aspect | DIY Polling (cron + scrape) | knowledgeSDK Webhooks |
| --- | --- | --- |
| Setup time | 2-4 hours | 10 minutes |
| Detection latency | 15 min - 24 hours (depends on cron) | <15 minutes |
| Cost at 100 monitored URLs | ~100 scrapes/day = $0.30/day | Included in plan |
| False positives (timestamps, ads) | High — need custom filtering | Low — content-aware diff |
| What changed | You have to implement diffing | Structured diff in payload |
| Missed changes | Possible (change and revert between runs) | Rare (frequent polling) |
| Infrastructure to maintain | Cron job, storage, diff logic, retry handling | None |

FAQ

How frequently does knowledgeSDK check for changes? The default check interval is every 15 minutes. You can configure it to 5 minutes, 30 minutes, or hourly depending on how time-sensitive the changes are for your use case.

Can I monitor pages behind authentication? Pages behind a standard login form require session cookies, which knowledgeSDK doesn't support by default. For API endpoints that accept bearer tokens, you can pass the Authorization header; for login-protected web pages, you'll need Browserbase-style session management.

What counts as a "change"? Does knowledgeSDK ignore trivial differences like timestamps? Yes — knowledgeSDK uses content-aware diffing that ignores common dynamic content: timestamps, user counts, ad slots, and other frequently-changing noise. The webhook fires only on meaningful content changes (pricing, copy, structure, new sections).

Can I get the full diff as text to show to users or send to an LLM? Yes — the webhook payload includes both the structured diff (sections added/removed/modified) and the full new markdown content. You can format the diff however you need for display or LLM processing.

What happens if my webhook endpoint is down when a change fires? knowledgeSDK retries failed webhooks with exponential backoff: immediately, then 1 minute, 5 minutes, 30 minutes, and 2 hours. After 5 failed attempts, the webhook is paused and you'll receive an email notification.

Can I test my webhook handler locally? Yes — use a tunneling tool like ngrok to expose your local server, then use that ngrok URL as your callbackUrl. Alternatively, use the knowledgeSDK dashboard to send a test webhook to any URL.

How many URLs can I monitor simultaneously? The Starter plan ($29/mo) supports monitoring up to 100 URLs. The Pro plan ($99/mo) supports up to 1,000 URLs. For larger monitoring needs, contact the knowledgeSDK team.


Conclusion

Webhook-based change detection is a fundamentally better architecture than polling for any application where timeliness matters. The DIY approach works, but it requires building and maintaining infrastructure (cron jobs, hashing, diffing, retry logic) that doesn't differentiate your product.

With knowledgeSDK webhooks, you subscribe once and receive structured, semantically-aware diffs when content changes. The competitor pricing monitor built in this tutorial is roughly 50 lines of code. The polling equivalent would be 300+ lines with a separate scheduled job.

For related reading, see our guides on web scraping for RAG and building AI agents with web access.

Try knowledgeSDK free — get your API key at knowledgesdk.com/setup
