Webhooks
Webhooks let you receive real-time HTTP notifications when events happen in your KnowledgeSDK account. Instead of polling for job status, register a webhook URL and KnowledgeSDK will POST event payloads to your server as they occur.
Events
KnowledgeSDK dispatches the following webhook events:
| Event | Description |
|---|---|
| EXTRACTION_STARTED | An extraction job has started processing. |
| PAGE_SCRAPED | A single page has been scraped during an extraction. Fired once per page. |
| PAGE_UPDATED | A previously scraped page has been re-scraped with updated content. |
| EXTRACTION_COMPLETED | An extraction job has finished successfully. Includes the full result. |
| EXTRACTION_FAILED | An extraction job has failed. Includes the error message. |
| TESTING_CONNECTION | A test event sent when you verify your webhook is working. |
Payload structure
Every webhook delivery is a POST request with a JSON body using this structure:
{
  "event": "EXTRACTION_COMPLETED",
  "payload": { ... },
  "timestamp": "2026-03-20T14:30:00.000Z",
  "version": 1
}

| Field | Type | Description |
|---|---|---|
| event | string | The event type (see table above). |
| payload | object | Event-specific data (see payloads below). |
| timestamp | string | ISO 8601 timestamp of when the event occurred. |
| version | number | Payload schema version. Currently always 1. |
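The field table above can be turned into a small validation step that runs before any event-specific handling. The following is a minimal sketch, not part of any KnowledgeSDK SDK: the helper name `parseWebhookEnvelope` and its return shape are hypothetical, and it assumes the body has already been parsed from JSON.

```javascript
// Event names copied from the events table in this document.
const KNOWN_EVENTS = new Set([
  "EXTRACTION_STARTED",
  "PAGE_SCRAPED",
  "PAGE_UPDATED",
  "EXTRACTION_COMPLETED",
  "EXTRACTION_FAILED",
  "TESTING_CONNECTION",
]);

// Hypothetical helper: checks the four documented envelope fields.
// Returns { ok: true, envelope } or { ok: false, reason }.
function parseWebhookEnvelope(body) {
  if (body === null || typeof body !== "object" || Array.isArray(body)) {
    return { ok: false, reason: "body is not a JSON object" };
  }
  const { event, payload, timestamp, version } = body;
  if (typeof event !== "string" || !KNOWN_EVENTS.has(event)) {
    return { ok: false, reason: "unknown event" };
  }
  if (payload === null || typeof payload !== "object") {
    return { ok: false, reason: "payload is not an object" };
  }
  if (typeof timestamp !== "string" || Number.isNaN(Date.parse(timestamp))) {
    return { ok: false, reason: "timestamp is not a parseable date string" };
  }
  if (version !== 1) {
    return { ok: false, reason: "unsupported version" };
  }
  // Convert the ISO 8601 string to a Date for convenience.
  return {
    ok: true,
    envelope: { event, payload, timestamp: new Date(timestamp), version },
  };
}
```

How to respond when a check fails is a design choice. Because a non-2xx response counts as a failed delivery, it may be preferable to log the problem and still acknowledge the request.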
Headers
KnowledgeSDK includes two custom headers with every webhook delivery:
| Header | Description |
|---|---|
| x-knowledgesdk-webhook-token | The verification token you set when creating the webhook. Use this to verify the request is authentic. |
| x-knowledgesdk-dedup-key | A unique message ID for deduplication. The same event may be delivered more than once; use this key to detect and ignore duplicates. |
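When checking the token header, a constant-time comparison avoids leaking information through response timing. The sketch below is one possible approach using Node.js's built-in `crypto` module. The helper name `isValidWebhookToken` is hypothetical, and hashing both values before comparing is an illustrative choice made only so that the buffers have equal length.

```javascript
const { createHash, timingSafeEqual } = require("node:crypto");

// Hypothetical helper: compares the received header value with the expected
// token in constant time. Returns false for missing or empty values.
function isValidWebhookToken(receivedToken, expectedToken) {
  if (typeof receivedToken !== "string" || typeof expectedToken !== "string") {
    return false;
  }
  if (expectedToken.length === 0) {
    return false; // Refuse to validate against an unset secret
  }
  // timingSafeEqual requires equal-length buffers, so compare SHA-256 digests.
  const receivedDigest = createHash("sha256").update(receivedToken).digest();
  const expectedDigest = createHash("sha256").update(expectedToken).digest();
  return timingSafeEqual(receivedDigest, expectedDigest);
}
```

In a request handler this might be called as `isValidWebhookToken(req.headers["x-knowledgesdk-webhook-token"], process.env.WEBHOOK_SECRET)`, where `WEBHOOK_SECRET` is whatever environment variable holds your token.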
Event payloads
EXTRACTION_STARTED
PAGE_SCRAPED
EXTRACTION_COMPLETED
EXTRACTION_FAILED
Setting up webhooks
Register a webhook URL using the API or an SDK. You choose which events to subscribe to and provide a verification token.
Create an HTTP endpoint on your server that accepts POST requests, verifies the token, and processes the event.
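// Express.js example
const express = require("express");

const app = express();
app.use(express.json()); // Parse JSON bodies so req.body is populated

// Dedup keys already processed (in-memory for illustration; use a persistent store in production)
const processedKeys = new Set();

app.post("/webhooks/knowledgesdk", (req, res) => {
  // Verify the webhook token
  const token = req.headers["x-knowledgesdk-webhook-token"];
  if (!token || token !== process.env.WEBHOOK_SECRET) {
    return res.status(401).send("Unauthorized");
  }

  // Deduplicate using the message ID
  const dedupKey = req.headers["x-knowledgesdk-dedup-key"];
  if (dedupKey && processedKeys.has(dedupKey)) {
    return res.status(200).send("OK"); // Duplicate delivery: acknowledge and skip
  }
  if (dedupKey) processedKeys.add(dedupKey);

  const { event, payload } = req.body;
  switch (event) {
    case "EXTRACTION_COMPLETED":
      console.log(`Extraction complete for ${payload.url}`);
      console.log(`${payload.result.knowledgeItems.length} items extracted`);
      break;
    case "EXTRACTION_FAILED":
      console.error(`Extraction failed: ${payload.error}`);
      break;
    default:
      break; // Ignore events this handler does not act on
  }

  // Return 2xx to acknowledge receipt
  res.status(200).send("OK");
});

app.listen(3000); // Example port
Send a test event to verify your endpoint is receiving and processing webhooks correctly.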
Your endpoint will receive a TESTING_CONNECTION event:
{
  "event": "TESTING_CONNECTION",
  "payload": {
    "message": "Test webhook from KnowledgeSDK"
  },
  "timestamp": "2026-03-20T14:30:00.000Z",
  "version": 1
}
Retry behavior
KnowledgeSDK retries failed webhook deliveries up to 5 times with exponential backoff. A delivery is considered failed if:
- Your endpoint does not return a 2xx status code
- Your endpoint does not respond within 6 seconds
- The connection cannot be established
If all 5 retries fail, the webhook is automatically paused with the reason TOO_MANY_ERRORS. You will need to fix the issue and re-activate the webhook. Check the status field when listing your webhooks.
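When exercising your endpoint with your own test harness, it can help to encode the three failure conditions listed above so that slow or non-2xx responses are caught before they cause retries. This is a sketch only: `classifyDelivery` and the shape of its `attempt` argument are hypothetical, and treating a response at exactly 6000 ms as on time is an assumption.

```javascript
const RESPONSE_TIMEOUT_MS = 6000; // 6-second limit described in this section

// Hypothetical helper mirroring the documented failure conditions.
// attempt: { connected: boolean, statusCode?: number, elapsedMs?: number }
function classifyDelivery(attempt) {
  if (!attempt.connected) {
    return "failed: connection could not be established";
  }
  if (!Number.isFinite(attempt.elapsedMs) || attempt.elapsedMs > RESPONSE_TIMEOUT_MS) {
    return "failed: no response within 6 seconds";
  }
  const status = attempt.statusCode;
  if (!Number.isInteger(status) || status < 200 || status > 299) {
    return "failed: non-2xx status code";
  }
  return "delivered";
}
```

A harness could record the status code and elapsed time of each test request and pass them to this function to flag responses that KnowledgeSDK would likely treat as failures.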
Managing webhooks
List all webhooks
curl https://api.knowledgesdk.com/v1/webhooks \
  -H "x-api-key: sk_ks_your_api_key"
Delete a webhook
curl -X DELETE https://api.knowledgesdk.com/v1/webhooks/{webhookId} \
  -H "x-api-key: sk_ks_your_api_key"
Best practices
Follow these practices to build reliable webhook integrations:
- Always verify the token. Check the x-knowledgesdk-webhook-token header against the token you set when creating the webhook.
- Implement idempotency. Use the x-knowledgesdk-dedup-key header to detect and ignore duplicate deliveries. Store processed message IDs for at least 24 hours.
- Respond quickly. Return a 2xx response within 6 seconds. If you need to do heavy processing, acknowledge the webhook immediately and process the payload asynchronously (e.g., push to a queue).
- Handle all subscribed events. Even if you only care about EXTRACTION_COMPLETED, implement graceful handling for other events you have subscribed to.
- Monitor webhook health. Periodically check your webhook status via the list endpoint. If a webhook is paused due to errors, investigate and fix the issue before re-creating it.
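The idempotency practice above can be sketched as a small store that remembers dedup keys for 24 hours. This is an illustrative in-memory version, not a KnowledgeSDK API: the class name `DedupStore` and its methods are hypothetical, and it assumes a single server process whose clock does not move backwards.

```javascript
const DEDUP_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours, per the guidance above

// Hypothetical in-memory store of processed dedup keys with expiry.
class DedupStore {
  constructor(ttlMs = DEDUP_TTL_MS, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // Injectable clock, which makes the store easy to test
    this.expiresAt = new Map(); // dedup key -> expiry time in ms
  }

  // Returns true the first time a key is seen within the TTL,
  // and false for a duplicate delivery.
  markIfNew(dedupKey) {
    const current = this.now();
    this.evictExpired(current);
    if (this.expiresAt.has(dedupKey)) {
      return false;
    }
    this.expiresAt.set(dedupKey, current + this.ttlMs);
    return true;
  }

  evictExpired(current) {
    // Map iterates in insertion order and every entry shares the same TTL,
    // so expired entries come first and the loop can stop at the first live one.
    for (const [key, expiry] of this.expiresAt) {
      if (expiry > current) break;
      this.expiresAt.delete(key);
    }
  }
}
```

A handler would call `store.markIfNew(req.headers["x-knowledgesdk-dedup-key"])` and skip processing when it returns `false`, while still returning a 2xx. With more than one server instance, a shared store with key expiry would be needed instead of process memory.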
Never expose your webhook verification token in client-side code. Store it as a server-side environment variable and validate it on every incoming request.