Website Change Detection with Webhooks: Build a Monitoring Agent in 50 Lines
Knowing when a web page changes is surprisingly hard to do well. The naive approach — poll every URL on a cron job, hash the HTML, compare to yesterday's hash — is fragile, expensive, and slow. You're re-scraping pages that haven't changed (wasting money), missing changes that happen between runs (gaps in coverage), and dealing with false positives from dynamic content like timestamps and ad IDs.
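To make the fragility concrete, here's a minimal sketch of the naive hash-based approach (the URLs and snapshots are made up for illustration). A single dynamic fragment — here, a rendered-at timestamp — flips the hash even though nothing meaningful changed:

```python
import hashlib

def page_fingerprint(html: str) -> str:
    # The naive change signal: hash the raw HTML
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

# Two snapshots of the "same" page, five minutes apart
snapshot_a = "<body>Pro Plan: $49/mo <!-- rendered 2026-03-19T14:00Z --></body>"
snapshot_b = "<body>Pro Plan: $49/mo <!-- rendered 2026-03-19T14:05Z --></body>"

# The hashes differ, so naive polling reports a change --
# a false positive caused entirely by the embedded timestamp
assert page_fingerprint(snapshot_a) != page_fingerprint(snapshot_b)
```

Filtering out this noise means writing and maintaining site-specific cleanup rules for every page you monitor.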
Webhooks flip this model. Instead of your system asking "did this change?", the scraping infrastructure tells you when it does. You subscribe once, then receive a structured notification — including what changed — the moment it happens.
This tutorial builds three real monitoring use cases using knowledgeSDK webhooks:
- Competitor pricing monitor — get notified when a competitor's pricing page updates
- Documentation sync — keep your AI agent's knowledge base current
- News monitoring — track specific topics across multiple sources
How knowledgeSDK Webhooks Work
When you subscribe to a URL, knowledgeSDK:
- Scrapes a baseline snapshot of the content
- Monitors the URL for changes on a configurable schedule (default: every 15 minutes)
- When content changes, scrapes the new version
- Sends a POST request to your callbackUrl with:
  - The URL that changed
  - A structured diff (added/removed/modified sections)
  - The full new markdown content
  - A timestamp
The key advantage over polling yourself: knowledgeSDK detects changes using content-aware diffing, not just hash comparisons. You get semantic diffs (which sections changed), not just "something is different."
Use Case 1: Competitor Pricing Monitor
This is the most common use case. You want to know immediately when a competitor's pricing page changes so your sales team or pricing strategy can respond.
Node.js Implementation
import { KnowledgeSDK } from '@knowledgesdk/node';
import express from 'express';
const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY });
const app = express();
app.use(express.json());
// Step 1: Scrape baseline content
const competitors = [
{ name: 'Competitor A', url: 'https://competitora.com/pricing' },
{ name: 'Competitor B', url: 'https://competitorb.com/pricing' },
{ name: 'Competitor C', url: 'https://competitorc.com/pricing' },
];
async function setupMonitoring() {
console.log('Scraping baseline content...');
for (const competitor of competitors) {
// Step 2: Scrape and index the baseline
const page = await client.scrape({ url: competitor.url });
console.log(`Indexed: ${competitor.name} (${page.wordCount} words)`);
// Step 3: Subscribe to changes
await client.webhooks.subscribe({
url: competitor.url,
callbackUrl: `${process.env.PUBLIC_URL}/webhooks/pricing-change`,
events: ['content.changed'],
metadata: { competitorName: competitor.name },
});
console.log(`Monitoring: ${competitor.name}`);
}
}
// Step 4: Handle webhook notifications
app.post('/webhooks/pricing-change', async (req, res) => {
const { url, diff, newContent, changedAt, metadata } = req.body;
console.log(`\nPricing change detected!`);
console.log(`Competitor: ${metadata.competitorName}`);
console.log(`URL: ${url}`);
console.log(`Changed at: ${changedAt}`);
console.log(`Sections added: ${diff.added.length}`);
console.log(`Sections removed: ${diff.removed.length}`);
console.log(`Sections modified: ${diff.modified.length}`);
// Step 5: Send Slack notification
await sendSlackAlert({
competitor: metadata.competitorName,
url,
diff,
changedAt,
});
res.sendStatus(200);
});
async function sendSlackAlert({ competitor, url, diff, changedAt }) {
const changes = [];
if (diff.added.length > 0) {
changes.push(`Added ${diff.added.length} section(s)`);
}
if (diff.removed.length > 0) {
changes.push(`Removed ${diff.removed.length} section(s)`);
}
if (diff.modified.length > 0) {
changes.push(`Modified ${diff.modified.length} section(s)`);
}
const message = {
text: `Competitor pricing change detected`,
blocks: [
{
type: 'section',
text: {
type: 'mrkdwn',
text: `*Pricing change at ${competitor}*\n${changes.join(', ')}\n<${url}|View pricing page>`,
},
},
{
type: 'section',
text: {
type: 'mrkdwn',
text: `*Key changes:*\n${diff.modified
.slice(0, 3)
.map(m => `• ${m.section}: ${m.summary}`)
.join('\n')}`,
},
},
],
};
await fetch(process.env.SLACK_WEBHOOK_URL, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(message),
});
}
// Start server and setup monitoring
app.listen(3000, async () => {
console.log('Server running on port 3000');
await setupMonitoring();
});
Python Implementation
import os
import json
from flask import Flask, request, jsonify
from knowledgesdk import KnowledgeSDK
import httpx
app = Flask(__name__)
client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])
COMPETITORS = [
{"name": "Competitor A", "url": "https://competitora.com/pricing"},
{"name": "Competitor B", "url": "https://competitorb.com/pricing"},
{"name": "Competitor C", "url": "https://competitorc.com/pricing"},
]
def setup_monitoring():
print("Scraping baseline content...")
for competitor in COMPETITORS:
# Scrape and index baseline
page = client.scrape(url=competitor["url"])
print(f"Indexed: {competitor['name']} ({page.word_count} words)")
# Subscribe to changes
client.webhooks.subscribe(
url=competitor["url"],
callback_url=f"{os.environ['PUBLIC_URL']}/webhooks/pricing-change",
events=["content.changed"],
metadata={"competitor_name": competitor["name"]},
)
print(f"Monitoring: {competitor['name']}")
@app.post("/webhooks/pricing-change")
def handle_pricing_change():
data = request.json
url = data["url"]
diff = data["diff"]
changed_at = data["changedAt"]
competitor_name = data.get("metadata", {}).get("competitor_name", "Unknown")
print(f"\nPricing change detected!")
print(f"Competitor: {competitor_name}")
print(f"URL: {url}")
print(f"Changed at: {changed_at}")
send_slack_alert(
competitor=competitor_name,
url=url,
diff=diff,
changed_at=changed_at,
)
return jsonify({"ok": True})
def send_slack_alert(competitor: str, url: str, diff: dict, changed_at: str):
changes = []
if diff.get("added"):
changes.append(f"Added {len(diff['added'])} section(s)")
if diff.get("removed"):
changes.append(f"Removed {len(diff['removed'])} section(s)")
if diff.get("modified"):
changes.append(f"Modified {len(diff['modified'])} section(s)")
key_changes = "\n".join(
f"• {m['section']}: {m['summary']}"
for m in diff.get("modified", [])[:3]
)
payload = {
"text": "Competitor pricing change detected",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": f"*Pricing change at {competitor}*\n{', '.join(changes)}\n<{url}|View pricing page>",
},
},
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": f"*Key changes:*\n{key_changes}",
},
},
],
}
httpx.post(os.environ["SLACK_WEBHOOK_URL"], json=payload)
if __name__ == "__main__":
setup_monitoring()
app.run(port=3000)
That's the core of the pricing monitor in under 50 meaningful lines. The webhook handler is 10 lines; the Slack notification is 20 lines; the setup is 10 lines.
Understanding the Diff Payload
When knowledgeSDK sends a webhook, the diff object has this structure:
{
"url": "https://competitora.com/pricing",
"changedAt": "2026-03-19T14:32:00Z",
"metadata": { "competitorName": "Competitor A" },
"diff": {
"added": [
{
"section": "Enterprise Plan",
"content": "## Enterprise Plan\n\nStarting at $999/month...",
"position": "after:Pro Plan"
}
],
"removed": [
{
"section": "Annual discount note",
"content": "Save 20% with annual billing"
}
],
"modified": [
{
"section": "Pro Plan",
"summary": "Price changed from $49/month to $59/month",
"before": "## Pro Plan\n\n$49/month...",
"after": "## Pro Plan\n\n$59/month..."
}
]
},
"newContent": "# Full updated markdown content..."
}
The semantic diff is what makes webhook-based monitoring significantly more useful than simple hash comparison. You don't just know that something changed — you know what changed and where.
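Because the diff arrives as plain JSON, turning it into a human-readable (or LLM-ready) summary takes only a few lines. This helper is illustrative, not part of the SDK:

```python
def summarize_diff(diff: dict) -> str:
    # Flatten the structured diff into one line per change
    lines = []
    for item in diff.get("added", []):
        lines.append(f"+ {item['section']}")
    for item in diff.get("removed", []):
        lines.append(f"- {item['section']}")
    for item in diff.get("modified", []):
        lines.append(f"~ {item['section']}: {item['summary']}")
    return "\n".join(lines)

# Using the example payload above:
example_diff = {
    "added": [{"section": "Enterprise Plan"}],
    "removed": [{"section": "Annual discount note"}],
    "modified": [{"section": "Pro Plan",
                  "summary": "Price changed from $49/month to $59/month"}],
}
print(summarize_diff(example_diff))
# prints:
# + Enterprise Plan
# - Annual discount note
# ~ Pro Plan: Price changed from $49/month to $59/month
```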
Use Case 2: Documentation Sync for AI Agents
If your AI agent answers questions about third-party APIs (Stripe, GitHub, Notion), you need to keep your knowledge base current when those docs update. Here's a documentation sync pipeline:
// Node.js: Documentation sync
const docSources = [
'https://stripe.com/docs/api',
'https://docs.github.com/en/rest',
'https://developers.notion.com/reference',
];
// One-time setup
async function indexAndMonitorDocs() {
for (const url of docSources) {
// Index for immediate search
await client.scrape({ url });
// Subscribe to changes
await client.webhooks.subscribe({
url,
callbackUrl: `${process.env.PUBLIC_URL}/webhooks/docs-updated`,
events: ['content.changed'],
});
}
}
// Webhook handler — content is already re-indexed automatically
app.post('/webhooks/docs-updated', async (req, res) => {
const { url, diff, changedAt } = req.body;
// Content is already re-indexed by knowledgeSDK
// You may want to:
// 1. Invalidate any cached responses that used this content
// 2. Notify your team
// 3. Log the change for audit purposes
await invalidateCache(url);
console.log(`Docs updated: ${url} at ${changedAt}`);
console.log(`${diff.modified.length} sections changed`);
res.sendStatus(200);
});
# Python: Documentation sync
doc_sources = [
"https://stripe.com/docs/api",
"https://docs.github.com/en/rest",
"https://developers.notion.com/reference",
]
def index_and_monitor_docs():
for url in doc_sources:
client.scrape(url=url)
client.webhooks.subscribe(
url=url,
callback_url=f"{os.environ['PUBLIC_URL']}/webhooks/docs-updated",
events=["content.changed"]
)
@app.post("/webhooks/docs-updated")
def handle_docs_updated():
data = request.json
url = data["url"]
diff = data["diff"]
changed_at = data["changedAt"]
# Content already re-indexed automatically
# Invalidate cache, notify team, etc.
invalidate_cache(url)
print(f"Docs updated: {url} — {len(diff['modified'])} sections changed")
return jsonify({"ok": True})
The key insight here: when you use knowledgeSDK webhooks for documentation sync, you don't need to trigger a re-indexing job. knowledgeSDK automatically re-scrapes and re-indexes the updated content before sending your webhook. Your search results are already up to date by the time you receive the notification.
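Both handlers above call an invalidate_cache helper without defining it. Here's a minimal in-memory sketch, assuming cached answers are keyed by the source URL they were derived from — in practice you'd back this with Redis or your own store:

```python
class ResponseCache:
    """Toy cache mapping (question, source_url) -> cached answer."""

    def __init__(self):
        self._entries = {}

    def put(self, question: str, source_url: str, answer: str):
        self._entries[(question, source_url)] = answer

    def get(self, question: str, source_url: str):
        return self._entries.get((question, source_url))

    def invalidate_url(self, source_url: str) -> int:
        # Drop every cached answer derived from this URL
        stale = [key for key in self._entries if key[1] == source_url]
        for key in stale:
            del self._entries[key]
        return len(stale)

cache = ResponseCache()
cache.put("How do refunds work?", "https://stripe.com/docs/api", "Use the Refunds API...")
cache.put("List repos?", "https://docs.github.com/en/rest", "GET /user/repos...")

def invalidate_cache(url: str) -> int:
    return cache.invalidate_url(url)
```

When the docs-updated webhook fires for the Stripe URL, only the answers derived from Stripe docs are evicted; everything else stays warm.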
Use Case 3: News and Topic Monitoring
Monitor specific topics across news sites. When new content appears matching your topic, process it for your AI pipeline.
// Monitor technology news sources for AI-related content
const newsSources = [
{ url: 'https://techcrunch.com/category/artificial-intelligence', topic: 'AI' },
{ url: 'https://venturebeat.com/category/ai', topic: 'AI' },
{ url: 'https://www.theverge.com/ai-artificial-intelligence', topic: 'AI' },
];
async function setupNewsMonitoring() {
for (const source of newsSources) {
await client.webhooks.subscribe({
url: source.url,
callbackUrl: `${process.env.PUBLIC_URL}/webhooks/news-update`,
events: ['content.changed'],
metadata: { topic: source.topic },
});
}
}
app.post('/webhooks/news-update', async (req, res) => {
const { url, diff, metadata } = req.body;
// Only process added content (new articles)
if (diff.added.length === 0) {
return res.sendStatus(200);
}
for (const addition of diff.added) {
// Process new article content
await processNewArticle({
source: url,
topic: metadata.topic,
content: addition.content,
});
}
res.sendStatus(200);
});
async function processNewArticle({ source, topic, content }) {
// Extract article title and URL
const titleMatch = content.match(/^## (.+)$/m);
const urlMatch = content.match(/\[Read more\]\((.+)\)/);
if (!titleMatch || !urlMatch) return;
const articleTitle = titleMatch[1];
const articleUrl = urlMatch[1];
// Scrape full article and add to knowledge base
await client.scrape({ url: articleUrl });
// Optionally: summarize and send to Slack
await notifyNewArticle({ title: articleTitle, url: articleUrl, topic });
console.log(`New ${topic} article indexed: ${articleTitle}`);
}
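The same extraction in Python, assuming (as the Node version does) that the listing page renders each article as a `## ` heading followed by a `[Read more](...)` link — adjust the patterns to the markdown your sources actually produce:

```python
import re

def extract_article(section_markdown: str):
    # Pull title and link from one added markdown section;
    # return None if the section doesn't look like an article card
    title = re.search(r"^## (.+)$", section_markdown, re.MULTILINE)
    link = re.search(r"\[Read more\]\((.+?)\)", section_markdown)
    if not title or not link:
        return None
    return {"title": title.group(1), "url": link.group(1)}

sample = "## OpenAI ships new model\n\nShort teaser...\n\n[Read more](https://techcrunch.com/2026/03/19/new-model)"
```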
Managing Webhook Subscriptions
List Active Subscriptions
const subscriptions = await client.webhooks.list();
console.log(`Active subscriptions: ${subscriptions.length}`);
for (const sub of subscriptions) {
console.log(`${sub.url} — last checked: ${sub.lastChecked}`);
}
# Python
subscriptions = client.webhooks.list()
print(f"Active subscriptions: {len(subscriptions)}")
for sub in subscriptions:
print(f"{sub.url} — last checked: {sub.last_checked}")
Update Subscription Settings
// Change check interval or callback URL
await client.webhooks.update(subscriptionId, {
checkInterval: '5m', // Check every 5 minutes instead of default 15
callbackUrl: 'https://your-new-app.com/webhooks/changes',
});
Unsubscribe
await client.webhooks.unsubscribe(subscriptionId);
// Or unsubscribe by URL
await client.webhooks.unsubscribeByUrl('https://competitora.com/pricing');
# Python
client.webhooks.unsubscribe(subscription_id)
# Or by URL
client.webhooks.unsubscribe_by_url("https://competitora.com/pricing")
Handling Webhook Reliability
Verify Webhook Signatures
Always verify that webhooks come from knowledgeSDK, not an attacker. Note that both examples below compute the HMAC over a re-serialized JSON body for brevity; in production, verify against the raw request bytes, since JSON key order and whitespace can differ between sender and receiver:
import crypto from 'crypto';
function verifyWebhookSignature(payload, signature, secret) {
const expectedSig = crypto
.createHmac('sha256', secret)
.update(JSON.stringify(payload))
.digest('hex');
return crypto.timingSafeEqual(
Buffer.from(signature),
Buffer.from(`sha256=${expectedSig}`)
);
}
app.post('/webhooks/pricing-change', (req, res) => {
const signature = req.headers['x-knowledgesdk-signature'];
if (!verifyWebhookSignature(req.body, signature, process.env.WEBHOOK_SECRET)) {
return res.status(401).json({ error: 'Invalid signature' });
}
// Process the webhook...
});
import hmac
import hashlib
import json
def verify_webhook_signature(payload: dict, signature: str, secret: str) -> bool:
expected = hmac.new(
secret.encode(),
json.dumps(payload, separators=(",", ":")).encode(),
hashlib.sha256
).hexdigest()
return hmac.compare_digest(signature, f"sha256={expected}")
@app.post("/webhooks/pricing-change")
def handle_pricing_change():
signature = request.headers.get("X-KnowledgeSDK-Signature", "")
if not verify_webhook_signature(request.json, signature, os.environ["WEBHOOK_SECRET"]):
return jsonify({"error": "Invalid signature"}), 401
# Process the webhook...
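You can sanity-check the signing scheme locally without waiting for a live webhook. This standalone round-trip mirrors the verification above (the payload and secrets are made up):

```python
import hmac
import hashlib
import json

def sign(payload: dict, secret: str) -> str:
    # Same scheme as the verifier: HMAC-SHA256 over compact JSON
    body = json.dumps(payload, separators=(",", ":")).encode()
    return "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

payload = {"url": "https://competitora.com/pricing", "webhookId": "wh_123"}
good_sig = sign(payload, "correct-secret")
forged_sig = sign(payload, "wrong-secret")

# A matching secret verifies; a wrong one does not
assert hmac.compare_digest(good_sig, sign(payload, "correct-secret"))
assert not hmac.compare_digest(forged_sig, good_sig)
```

hmac.compare_digest is a constant-time comparison, which prevents timing attacks against the signature check.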
Idempotent Handlers
Webhooks can be delivered more than once (at-least-once delivery). Make your handlers idempotent:
app.post('/webhooks/pricing-change', async (req, res) => {
const { webhookId } = req.body;
// Check if we've already processed this webhook
const alreadyProcessed = await db.exists(`processed_webhook:${webhookId}`);
if (alreadyProcessed) {
return res.sendStatus(200);
}
// Mark as processing
await db.set(`processed_webhook:${webhookId}`, true, { ex: 86400 });
// Process...
res.sendStatus(200);
});
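The dedupe step itself is storage-agnostic. Here's an in-memory sketch with a TTL (the db calls above assume a Redis-style store; this is the same idea without the dependency):

```python
import time

class WebhookDeduper:
    """Remember processed webhook IDs for a TTL window."""

    def __init__(self, ttl_seconds: int = 86400):
        self.ttl = ttl_seconds
        self._expiry = {}  # webhook_id -> expiry timestamp

    def already_processed(self, webhook_id: str, now=None) -> bool:
        now = time.time() if now is None else now
        # Evict expired IDs so the map doesn't grow without bound
        self._expiry = {k: v for k, v in self._expiry.items() if v > now}
        if webhook_id in self._expiry:
            return True  # duplicate delivery -- skip processing
        self._expiry[webhook_id] = now + self.ttl
        return False
```

In the handler: if deduper.already_processed(data["webhookId"]), return 200 immediately without reprocessing.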
Respond Quickly, Process Async
Webhook handlers must respond within 10 seconds or knowledgeSDK will retry. For long-running processing, respond immediately and process in the background:
app.post('/webhooks/pricing-change', async (req, res) => {
// Respond immediately
res.sendStatus(200);
// Process in background
processChangeAsync(req.body).catch(console.error);
});
async function processChangeAsync(data) {
// This can take as long as needed
await generateReport(data);
await sendDetailedSlackMessage(data);
await updateDatabase(data);
}
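The same pattern in Python, sketched with a worker thread and a queue. The handler enqueues and returns 200 immediately; a task queue like Celery or RQ is the production-grade version of this:

```python
import queue
import threading

work_queue = queue.Queue()
processed = []

def worker():
    # Drain the queue; None is the shutdown signal
    while True:
        payload = work_queue.get()
        if payload is None:
            break
        processed.append(f"handled change at {payload['url']}")
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# In the webhook handler: respond 200 first, then enqueue
work_queue.put({"url": "https://competitora.com/pricing"})
work_queue.join()  # demonstration only -- never block inside the handler
```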
Comparing DIY Polling vs Webhooks
| Aspect | DIY Polling (cron + scrape) | knowledgeSDK Webhooks |
|---|---|---|
| Setup time | 2-4 hours | 10 minutes |
| Detection latency | 15 min - 24 hours (depends on cron) | <15 minutes |
| Cost at 100 monitored URLs | ~100 scrapes/day = $0.30/day | Included in plan |
| False positives (timestamps, ads) | High — need custom filtering | Low — content-aware diff |
| What changed | You have to implement diffing | Structured diff in payload |
| Missed changes | Possible (change and revert between runs) | Rare (checks as often as every 5 minutes) |
| Infrastructure to maintain | Cron job, storage, diff logic, retry handling | None |
FAQ
How frequently does knowledgeSDK check for changes? The default check interval is every 15 minutes. You can configure it to 5 minutes, 30 minutes, or hourly depending on how time-sensitive the changes are for your use case.
Can I monitor pages behind authentication?
Pages behind a standard login form require session cookies, which knowledgeSDK doesn't support by default. For API endpoints that accept bearer tokens, you can pass the Authorization header. For login-protected web pages, Browserbase-style session management is needed.
What counts as a "change"? Does knowledgeSDK ignore trivial differences like timestamps? Yes — knowledgeSDK uses content-aware diffing that ignores common dynamic content: timestamps, user counts, ad slots, and other frequently-changing noise. The webhook fires only on meaningful content changes (pricing, copy, structure, new sections).
Can I get the full diff as text to show to users or send to an LLM? Yes — the webhook payload includes both the structured diff (sections added/removed/modified) and the full new markdown content. You can format the diff however you need for display or LLM processing.
What happens if my webhook endpoint is down when a change fires? knowledgeSDK retries failed webhooks with exponential backoff: immediately, then 1 minute, 5 minutes, 30 minutes, and 2 hours. After 5 failed attempts, the webhook is paused and you'll receive an email notification.
Can I test my webhook handler locally?
Yes — use a tunneling tool like ngrok to expose your local server, then use that ngrok URL as your callbackUrl. Alternatively, use the knowledgeSDK dashboard to send a test webhook to any URL.
How many URLs can I monitor simultaneously? The Starter plan ($29/mo) supports monitoring up to 100 URLs. The Pro plan ($99/mo) supports up to 1,000 URLs. For larger monitoring needs, contact the knowledgeSDK team.
Conclusion
Webhook-based change detection is a fundamentally better architecture than polling for any application where timeliness matters. The DIY approach works, but it requires building and maintaining infrastructure (cron jobs, hashing, diffing, retry logic) that doesn't differentiate your product.
With knowledgeSDK webhooks, you subscribe once and receive structured, semantically aware diffs when content changes. The competitor pricing monitor built in this tutorial is roughly 50 lines of code. The polling equivalent would be 300+ lines with a separate scheduled job.
For related reading, see our guides on web scraping for RAG and building AI agents with web access.
Try knowledgeSDK free — get your API key at knowledgesdk.com/setup