Most web monitoring pipelines are built on polling: a cron job runs every hour, fetches a URL, compares it to the last version, and triggers downstream logic if something changed. It works, but it is wasteful. For every actual change you detect, you might run 23 fetches that return identical content (hourly polling against a page that changes once a day).
The webhook model inverts this. Instead of your system constantly asking "did it change?", the monitoring system tells you when it does. Your pipeline runs only on actual changes. Here is how to build it.
Why Polling Is the Wrong Default
Consider monitoring 50 competitor pages at hourly intervals. That is 1,200 fetches per day. At a typical API cost of $0.001 per request, you are spending $1.20/day — $36/month — on network requests, most of which confirm that nothing changed.
For content that changes infrequently (documentation, pricing pages, feature announcements), the signal-to-noise ratio of polling is very low: most fetches carry no new information.
The larger problem for AI workflows: polling on a schedule means your LLM processes stale or redundant data. You either update the agent's knowledge base too often (expensive) or too rarely (stale retrieval results).
The Webhook Alternative
None of the major web data tools — Tavily, Exa, Bright Data, ZenRows, Diffbot — offers developer-facing webhooks for content change detection. There is no "register a URL and get notified when it changes" API; you typically have to build the diffing logic yourself on top of their scraping products.
KnowledgeSDK includes this as a first-class feature: POST /v1/webhooks registers a URL for monitoring. When the extracted content of that URL changes meaningfully, KnowledgeSDK sends a POST to your callback URL with the changed content and metadata.
The flow:
Register URL → KnowledgeSDK monitors → Content changes → POST to your callback URL → Your LLM workflow runs
Building the System
Step 1: Register URLs for Monitoring
import KnowledgeSDK from "@knowledgesdk/node";
const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY });
const pagesToMonitor = [
"https://competitorA.com/pricing",
"https://competitorA.com/changelog",
"https://competitorB.com/pricing",
"https://yourmarketleader.com/blog",
];
async function registerMonitoring(urls: string[]) {
const webhooks = await Promise.all(
urls.map((url) =>
client.webhooks.create({
url,
callbackUrl: `${process.env.YOUR_APP_URL}/webhooks/content-changed`,
events: ["content.changed"],
})
)
);
console.log(`Monitoring ${webhooks.length} URLs`);
return webhooks;
}
await registerMonitoring(pagesToMonitor);
from knowledgesdk import KnowledgeSDK
import os
client = KnowledgeSDK(api_key=os.environ["KNOWLEDGESDK_API_KEY"])
pages_to_monitor = [
"https://competitorA.com/pricing",
"https://competitorA.com/changelog",
"https://competitorB.com/pricing",
"https://yourmarketleader.com/blog",
]
webhooks = [
client.webhooks.create(
url=url,
callback_url=f"{os.environ['YOUR_APP_URL']}/webhooks/content-changed",
events=["content.changed"],
)
for url in pages_to_monitor
]
print(f"Monitoring {len(webhooks)} URLs")
Step 2: Handle the Webhook Payload
KnowledgeSDK sends a POST request to your callback URL when content changes. The payload includes the URL, the event type, and the new content.
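An illustrative payload, matching the fields the handler below destructures (the exact schema, including the timestamp field, is an assumption — check the KnowledgeSDK webhook docs for the authoritative shape):

```json
{
  "event": "content.changed",
  "url": "https://competitorA.com/pricing",
  "content": "Pro plan: $49/month ...",
  "previousContent": "Pro plan: $39/month ...",
  "timestamp": "2025-01-15T09:30:00Z"
}
```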
import express from "express";
import Anthropic from "@anthropic-ai/sdk";
import KnowledgeSDK from "@knowledgesdk/node";
const app = express();
app.use(express.json());
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGESDK_API_KEY });
app.post("/webhooks/content-changed", async (req, res) => {
// Acknowledge receipt immediately
res.status(200).json({ received: true });
const { url, event, content, previousContent } = req.body;
if (event !== "content.changed") return;
console.log(`Content changed: ${url}`);
// Run your LLM workflow only on actual changes
await processChange({ url, content, previousContent });
});
async function processChange({
url,
content,
previousContent,
}: {
url: string;
content: string;
previousContent: string;
}) {
// Step 1: Re-index in your knowledge base
await client.extract(url);
// Step 2: Use LLM to summarize what changed and why it matters
const response = await anthropic.messages.create({
model: "claude-opus-4-6",
max_tokens: 500,
messages: [
{
role: "user",
content: `A competitor page changed. Summarize the key differences and their business significance.
URL: ${url}
Previous content:
${previousContent.slice(0, 2000)}
New content:
${content.slice(0, 2000)}
What changed and why does it matter?`,
},
],
});
const summary = response.content[0].type === "text" ? response.content[0].text : "";
// Step 3: Store or send the intelligence digest
await sendAlert({
url,
summary,
timestamp: new Date().toISOString(),
});
}
async function sendAlert(data: { url: string; summary: string; timestamp: string }) {
// Send to Slack, email, or your internal system
console.log("Change detected:", data);
}
app.listen(3000, () => console.log("Webhook server running on port 3000"));
Step 3: Make It Production-Ready
For production use, add request validation and ensure your webhook handler is resilient:
import crypto from "crypto";
function verifyWebhookSignature(body: string, signature: string, secret: string): boolean {
const expected = crypto.createHmac("sha256", secret).update(body).digest("hex");
// timingSafeEqual throws if the buffers differ in length, so check that first
if (signature.length !== expected.length) return false;
return crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expected));
}
// Register this route before any global app.use(express.json()) — the JSON
// parser consumes the stream, and signature verification needs the raw body
app.post("/webhooks/content-changed", express.raw({ type: "application/json" }), async (req, res) => {
const signature = req.headers["x-knowledgesdk-signature"] as string;
if (!verifyWebhookSignature(req.body.toString(), signature, process.env.WEBHOOK_SECRET!)) {
return res.status(401).json({ error: "Invalid signature" });
}
res.status(200).json({ received: true });
const payload = JSON.parse(req.body.toString());
// Process asynchronously — do not block the webhook response
setImmediate(() => processChange(payload));
});
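If your handler runs in Python instead, the same verification can be sketched with the standard library alone (the header name and hex encoding are assumed to match the example above):

```python
import hashlib
import hmac

def verify_webhook_signature(body: bytes, signature: str, secret: str) -> bool:
    # Recompute the HMAC-SHA256 of the raw request body and compare in
    # constant time; compare_digest handles mismatched lengths safely
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)
```

Unlike Node's `crypto.timingSafeEqual`, `hmac.compare_digest` does not throw on inputs of different lengths, so no explicit length check is needed.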
Use Cases Where This Matters
Competitor pricing changes. When a competitor updates their pricing page, your AI agent gets new information within minutes — not at the next scheduled poll. Your competitive intelligence is always current.
Documentation updates. When a vendor updates their API documentation, your customer-facing agent that references those docs gets re-indexed automatically. No manual update cycle.
Job posting changes. When a company adds or removes job listings, your recruitment intelligence tool receives the update immediately and can generate a digest of hiring signal changes.
News and blog monitoring. Instead of polling 20 competitor blogs hourly, register them for webhook monitoring. Your summary agent runs only when new content actually appears.
The Cost Efficiency Argument
With polling at hourly intervals:
- 50 pages × 24 polls/day = 1,200 API calls/day
- At $0.001/call: $1.20/day = $36/month
- Most pages change once per day or less → most calls return unchanged data
With webhook monitoring:
- 0 polling calls
- LLM workflow runs only when content actually changes
- For a page that changes once per week: 4 LLM calls/month vs 720 polling calls/month
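The arithmetic above is easy to check with a quick script (the per-call price is the article's illustrative figure, not real vendor pricing):

```python
PAGES = 50
POLLS_PER_DAY = 24
COST_PER_CALL = 0.001  # illustrative, dollars per request

calls_per_day = PAGES * POLLS_PER_DAY          # 1,200 API calls/day
cost_per_day = calls_per_day * COST_PER_CALL   # $1.20/day
cost_per_month = cost_per_day * 30             # $36/month

# Webhook side: a page that changes once a week triggers ~4 runs a month,
# versus 24 polls/day * 30 days = 720 polls for that same page
polls_per_page_month = POLLS_PER_DAY * 30
```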
The savings compound as you scale the number of monitored pages.
Summary
Polling is a reasonable default when you are starting out and do not know how frequently content changes. Once you have data on change frequency, switching to a webhook model is almost always more efficient.
The key requirements for a webhook-driven web monitoring system:
- A monitoring API that detects content changes (KnowledgeSDK webhooks)
- A webhook handler that validates the payload and triggers downstream logic
- An LLM workflow that processes the changed content
- Re-indexing so your search corpus stays current
All four of these components can be built and operational in under an hour. To get started, install the SDK:
npm install @knowledgesdk/node
pip install knowledgesdk