Webhooks

Receive real-time notifications for extraction events.

Webhooks let you receive real-time HTTP notifications when events happen in your KnowledgeSDK account. Instead of polling for job status, register a webhook URL and KnowledgeSDK will POST event payloads to your server as they occur.

Events

KnowledgeSDK dispatches the following webhook events:

| Event | Description |
| --- | --- |
| EXTRACTION_STARTED | An extraction job has started processing. |
| PAGE_SCRAPED | A single page has been scraped during an extraction. Fired once per page. |
| PAGE_UPDATED | A previously scraped page has been re-scraped with updated content. |
| EXTRACTION_COMPLETED | An extraction job has finished successfully. Includes the full result. |
| EXTRACTION_FAILED | An extraction job has failed. Includes the error message. |
| TESTING_CONNECTION | A test event sent when you verify your webhook is working. |

Payload structure

Every webhook delivery is a POST request with a JSON body using this structure:

```json
{
  "event": "EXTRACTION_COMPLETED",
  "payload": { ... },
  "timestamp": "2026-03-20T14:30:00.000Z",
  "version": 1
}
```

| Field | Type | Description |
| --- | --- | --- |
| event | string | The event type (see table above). |
| payload | object | Event-specific data (see payloads below). |
| timestamp | string | ISO 8601 timestamp of when the event occurred. |
| version | number | Payload schema version. Currently always 1. |
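
The envelope described above can be modelled as a type with a runtime check before a handler trusts the body. This is a minimal sketch: `WebhookEnvelope` and `isWebhookEnvelope` are hypothetical names, not part of any KnowledgeSDK package, and `payload` is left loosely typed because its contents depend on the event.

```typescript
// Event names copied from the events table in this document.
const WEBHOOK_EVENTS = [
  "EXTRACTION_STARTED",
  "PAGE_SCRAPED",
  "PAGE_UPDATED",
  "EXTRACTION_COMPLETED",
  "EXTRACTION_FAILED",
  "TESTING_CONNECTION",
] as const;

type WebhookEvent = (typeof WEBHOOK_EVENTS)[number];

// Envelope fields as listed in the field table (hypothetical type name).
interface WebhookEnvelope {
  event: WebhookEvent;
  payload: Record<string, unknown>; // event-specific, so kept loose here
  timestamp: string; // ISO 8601
  version: number; // currently always 1
}

// Hypothetical helper: true when `body` has the documented envelope shape.
function isWebhookEnvelope(body: unknown): body is WebhookEnvelope {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    typeof b.event === "string" &&
    (WEBHOOK_EVENTS as readonly string[]).includes(b.event) &&
    typeof b.payload === "object" &&
    b.payload !== null &&
    typeof b.timestamp === "string" &&
    !Number.isNaN(Date.parse(b.timestamp)) &&
    typeof b.version === "number"
  );
}
```

A handler could call `isWebhookEnvelope(req.body)` after token verification and respond with a 4xx status when it returns false.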

Headers

KnowledgeSDK includes two custom headers with every webhook delivery:

| Header | Description |
| --- | --- |
| x-knowledgesdk-webhook-token | The verification token you set when creating the webhook. Use this to verify the request is authentic. |
| x-knowledgesdk-dedup-key | A unique message ID for deduplication. The same event may be delivered more than once; use this key to detect and ignore duplicates. |
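
One way to check the token header is sketched below. It swaps a plain string comparison for a constant-time one using Node's `crypto.timingSafeEqual`, which is a common hardening step and not something this document requires. `singleHeader` and `tokenMatches` are hypothetical helper names.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Node exposes a header value as a string, a string array, or undefined.
type HeaderValue = string | string[] | undefined;

// Hypothetical helper: the header as one non-empty string, or null otherwise.
function singleHeader(value: HeaderValue): string | null {
  return typeof value === "string" && value.length > 0 ? value : null;
}

// Hypothetical helper: constant-time check of the received token.
// Both sides are hashed first so the buffers have equal length,
// which timingSafeEqual requires.
function tokenMatches(received: HeaderValue, expected: string | undefined): boolean {
  const token = singleHeader(received);
  if (token === null || !expected) return false; // missing header or unset secret
  const a = createHash("sha256").update(token).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```

In a handler this might be called as `tokenMatches(req.headers["x-knowledgesdk-webhook-token"], process.env.WEBHOOK_SECRET)`. Rejecting an unset secret avoids accepting requests when the environment variable is missing.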

Event payloads

EXTRACTION_STARTED

PAGE_SCRAPED

EXTRACTION_COMPLETED

EXTRACTION_FAILED

Setting up webhooks

1. Create a webhook endpoint

Register a webhook URL using the API or an SDK. You choose which events to subscribe to and provide a verification token.
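
The registration call itself is not shown in this document, so the following is only a sketch under stated assumptions. The base URL and the `x-api-key` header come from the list and delete examples in this document; the `POST` method and the body field names `url`, `events`, and `token` are assumptions and should be checked against the API reference. `buildCreateWebhookRequest` is a hypothetical helper.

```typescript
// ASSUMPTION: field names below are illustrative, not confirmed by this document.
interface CreateWebhookInput {
  url: string; // where deliveries should be sent
  events: string[]; // event names to subscribe to
  token: string; // verification token echoed back in the token header
}

// Hypothetical helper: builds the arguments for a fetch() call.
// ASSUMPTION: registration is a POST to the same /v1/webhooks path
// that the list and delete examples use.
function buildCreateWebhookRequest(apiKey: string, input: CreateWebhookInput) {
  return {
    url: "https://api.knowledgesdk.com/v1/webhooks",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "content-type": "application/json",
      },
      body: JSON.stringify(input),
    },
  };
}
```

If the assumptions hold, the result could be passed to `fetch(request.url, request.init)`. Building the request separately keeps the assumed parts in one place, where they are easy to correct.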

2. Implement your handler

Create an HTTP endpoint on your server that accepts POST requests, verifies the token, and processes the event.

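```typescript
// Express.js example
import express from "express";

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated

// Dedup keys already handled (in-memory, for illustration only)
const seenDedupKeys = new Set<string>();

app.post("/webhooks/knowledgesdk", (req, res) => {
  // Verify the webhook token; reject if the secret is unset or does not match
  const token = req.headers["x-knowledgesdk-webhook-token"];
  const secret = process.env.WEBHOOK_SECRET;
  if (!secret || token !== secret) {
    res.status(401).send("Unauthorized");
    return;
  }

  // Deduplicate using the message ID
  const dedupKey = req.headers["x-knowledgesdk-dedup-key"];
  if (typeof dedupKey === "string") {
    if (seenDedupKeys.has(dedupKey)) {
      res.status(200).send("OK"); // duplicate delivery: acknowledge and skip
      return;
    }
    seenDedupKeys.add(dedupKey);
  }

  const { event, payload } = req.body;

  switch (event) {
    case "EXTRACTION_COMPLETED":
      console.log(`Extraction complete for ${payload.url}`);
      console.log(`${payload.result.knowledgeItems.length} items extracted`);
      break;
    case "EXTRACTION_FAILED":
      console.error(`Extraction failed: ${payload.error}`);
      break;
    default:
      break; // other subscribed events need no action here
  }

  // Return 2xx to acknowledge receipt
  res.status(200).send("OK");
});
```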

3. Test the webhook

Send a test event to verify your endpoint is receiving and processing webhooks correctly.

Your endpoint will receive a TESTING_CONNECTION event:

```json
{
  "event": "TESTING_CONNECTION",
  "payload": {
    "message": "Test webhook from KnowledgeSDK"
  },
  "timestamp": "2026-03-20T14:30:00.000Z",
  "version": 1
}
```

Retry behavior

KnowledgeSDK retries failed webhook deliveries up to 5 times with exponential backoff. A delivery is considered failed if:

  • Your endpoint does not return a 2xx status code
  • Your endpoint does not respond within 6 seconds
  • The connection cannot be established

If all 5 retries fail, the webhook is automatically paused with the reason TOO_MANY_ERRORS. You will need to fix the issue and re-activate the webhook. Check the status field when listing your webhooks.
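
Because a delivery counts as failed when the endpoint takes longer than 6 seconds, one way to stay under that limit is to hand the event to a queue and respond right away. The sketch below uses a hypothetical in-memory `EventQueue`; a production setup would more likely use a durable queue so that events survive a restart.

```typescript
interface QueuedEvent {
  event: string;
  payload: unknown;
}

// Hypothetical in-memory queue that processes events one at a time.
class EventQueue {
  private items: QueuedEvent[] = [];
  private draining = false;
  private worker: (e: QueuedEvent) => Promise<void>;

  constructor(worker: (e: QueuedEvent) => Promise<void>) {
    this.worker = worker;
  }

  // Returns immediately so the HTTP handler can send its 2xx first.
  enqueue(e: QueuedEvent): void {
    this.items.push(e);
    if (!this.draining) {
      this.draining = true;
      // Defer the work to a later tick instead of running it inline.
      setTimeout(() => void this.drain(), 0);
    }
  }

  private async drain(): Promise<void> {
    while (this.items.length > 0) {
      const next = this.items.shift()!;
      try {
        await this.worker(next);
      } catch (err) {
        // A failing event is logged and skipped so it cannot block the rest.
        console.error("webhook processing failed", err);
      }
    }
    this.draining = false;
  }
}
```

A handler would call `queue.enqueue({ event, payload })` and then send `200` without waiting for the worker. The trade-off is that a processing error no longer triggers a retry from KnowledgeSDK, since the delivery was already acknowledged.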

Managing webhooks

List all webhooks

```bash
curl https://api.knowledgesdk.com/v1/webhooks \
  -H "x-api-key: sk_ks_your_api_key"
```

Delete a webhook

```bash
curl -X DELETE https://api.knowledgesdk.com/v1/webhooks/{webhookId} \
  -H "x-api-key: sk_ks_your_api_key"
```

Best practices

Follow these practices to build reliable webhook integrations:

  1. Always verify the token. Check the x-knowledgesdk-webhook-token header against the token you set when creating the webhook.

  2. Implement idempotency. Use the x-knowledgesdk-dedup-key header to detect and ignore duplicate deliveries. Store processed message IDs for at least 24 hours.

  3. Respond quickly. Return a 2xx response within 6 seconds. If you need to do heavy processing, acknowledge the webhook immediately and process the payload asynchronously (e.g., push to a queue).

  4. Handle all subscribed events. Even if you only care about EXTRACTION_COMPLETED, implement graceful handling for other events you have subscribed to.

  5. Monitor webhook health. Periodically check your webhook status via the list endpoint. If a webhook is paused due to errors, investigate and fix the issue before re-activating it.
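
The idempotency practice above (item 2) can be sketched as a small store that remembers each dedup key for 24 hours. `DedupStore` is a hypothetical in-memory class; with more than one server instance, a shared store with key expiry would be needed instead.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical in-memory store of processed dedup keys with a time-to-live.
class DedupStore {
  private seen = new Map<string, number>(); // dedup key -> expiry (ms since epoch)
  private ttlMs: number;

  constructor(ttlMs: number = DAY_MS) {
    this.ttlMs = ttlMs;
  }

  // Returns true the first time a key is seen within the TTL window,
  // and false for a duplicate delivery.
  markIfNew(key: string, now: number = Date.now()): boolean {
    this.evictExpired(now);
    if (this.seen.has(key)) return false;
    this.seen.set(key, now + this.ttlMs);
    return true;
  }

  // Linear scan on every call: fine for a sketch, costly at high volume.
  private evictExpired(now: number): void {
    for (const [key, expiresAt] of this.seen) {
      if (expiresAt <= now) this.seen.delete(key);
    }
  }
}
```

A handler would call `store.markIfNew(dedupKey)` and, when it returns false, acknowledge with a 2xx response without processing the event again.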

Never expose your webhook verification token in client-side code. Store it as a server-side environment variable and validate it on every incoming request.