Jobs
Poll async extraction jobs.
When you use the async extraction endpoint (POST /v1/extract/async), a job is created that runs in the background. Use the jobs endpoint to check the status of your extraction, track progress, and retrieve the result once it completes.
Get Job Status
/v1/jobs/{jobId}
Authentication: x-api-key
Retrieve the current status and result of an async extraction job. The jobId is returned when you start an async extraction.
Path Parameters
jobId (string, required): The job identifier returned by POST /v1/extract/async. Job IDs have the format job_*.
Response
id (string): The unique job identifier.
status (string): Current job status. One of: pending, processing, completed, failed.
progress (object): Progress information for the extraction. Contains pagesScraped (number of pages scraped so far), totalPages (total pages planned), and currentStep (description of the current operation).
result (object): The full extraction result, available when status is completed. Contains the same fields as the sync extract response: business, knowledgeItems, pagesScraped, urlsDiscovered, durationMs, startedAt, and finishedAt.
error (string): Error message if the job failed. Only present when status is failed.
startedAt (string): ISO 8601 timestamp of when the job started processing. null if the job is still pending.
finishedAt (string): ISO 8601 timestamp of when the job finished. null if the job has not completed yet.
durationMs (number): Total job duration in milliseconds. null if the job has not completed yet.
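Several of these fields are null until the job reaches a particular status, so client code has to read them defensively. The sketch below is one way to turn a decoded response into a one-line summary. The helper names progress_fraction and describe_job are hypothetical, not part of the API, and the code assumes the response has already been parsed from JSON into a Python dict.

```python
from typing import Any, Optional


def progress_fraction(job: dict[str, Any]) -> Optional[float]:
    """Return pagesScraped / totalPages, or None when progress is unavailable."""
    progress = job.get("progress")
    if not progress:  # progress is null while the job is pending
        return None
    total = progress.get("totalPages")
    if not total:  # avoid dividing by zero or by a missing value
        return None
    return progress.get("pagesScraped", 0) / total


def describe_job(job: dict[str, Any]) -> str:
    """Build a one-line, human-readable summary of a job response."""
    status = job["status"]
    if status == "completed":
        return f"{job['id']}: completed in {job['durationMs']} ms"
    if status == "failed":
        return f"{job['id']}: failed ({job['error']})"
    fraction = progress_fraction(job)
    if fraction is None:
        return f"{job['id']}: {status}"
    return f"{job['id']}: {status}, {fraction:.0%} of pages scraped"
```

Only the fields that are meaningful for the current status are read, so a pending job with progress set to null does not raise.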
Job Status Lifecycle
pending --> processing --> completed
                       --> failed
| Status | Description |
|---|---|
| pending | Job is queued and waiting to be picked up by a worker. |
| processing | Extraction is actively running. Check progress for details. |
| completed | Extraction finished successfully. The result field contains the data. |
| failed | Extraction encountered an error. Check the error field for details. |
We recommend polling every 5 seconds. For a better user experience, consider using the streaming endpoint (POST /v1/extract/stream) instead, which provides real-time progress updates without polling.
Example: Poll Until Complete
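A minimal polling sketch in Python, using only the standard library. It rests on several assumptions that this page does not state: the endpoint is called with GET, the API key is sent as an x-api-key request header, and the base URL is passed in by the caller. The function names fetch_job and poll_job, the 600-second overall timeout, and the 30-second request timeout are illustrative choices, not part of the API.

```python
import json
import time
import urllib.request
from typing import Any, Callable

# Statuses after which the job will not change again.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_job(base_url: str, api_key: str, job_id: str) -> dict[str, Any]:
    """Fetch one job snapshot. Assumes GET with an x-api-key header."""
    request = urllib.request.Request(
        f"{base_url}/v1/jobs/{job_id}",
        headers={"x-api-key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_job(
    fetch: Callable[[], dict[str, Any]],
    interval_s: float = 5.0,  # the recommended polling interval
    timeout_s: float = 600.0,  # assumed upper bound; tune for your workload
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch() every interval_s seconds until the job completes or fails."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job['id']} still {job['status']} after {timeout_s}s"
            )
        sleep(interval_s)
```

To use it, pass a zero-argument callable that wraps fetch_job with your own base URL, API key, and job ID, then branch on the returned status: read result when it is completed, or error when it is failed. Keeping the HTTP call behind the fetch parameter lets the loop be exercised with canned responses instead of live requests.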
Example Responses
Pending job:
{
  "id": "job_abc123def456",
  "status": "pending",
  "progress": null,
  "result": null,
  "error": null,
  "startedAt": null,
  "finishedAt": null,
  "durationMs": null
}
Processing job:
{
  "id": "job_abc123def456",
  "status": "processing",
  "progress": {
    "pagesScraped": 4,
    "totalPages": 10,
    "currentStep": "Scraping https://linear.app/features"
  },
  "result": null,
  "error": null,
  "startedAt": "2026-03-20T10:00:00.000Z",
  "finishedAt": null,
  "durationMs": null
}
Completed job:
{
  "id": "job_abc123def456",
  "status": "completed",
  "progress": {
    "pagesScraped": 10,
    "totalPages": 10,
    "currentStep": "Complete"
  },
  "result": {
    "business": {
      "name": "Linear",
      "domain": "linear.app",
      "category": "Project Management",
      "description": "Modern project management for software teams."
    },
    "knowledgeItems": [
      {
        "title": "Issue Tracking",
        "description": "Fast, keyboard-first issue tracking.",
        "content": "...",
        "category": "FEATURE",
        "source": "https://linear.app/features"
      }
    ],
    "pagesScraped": 10,
    "urlsDiscovered": 34,
    "durationMs": 72150,
    "startedAt": "2026-03-20T10:00:00.000Z",
    "finishedAt": "2026-03-20T10:01:12.150Z"
  },
  "error": null,
  "startedAt": "2026-03-20T10:00:00.000Z",
  "finishedAt": "2026-03-20T10:01:12.150Z",
  "durationMs": 72150
}
Failed job:
{
  "id": "job_abc123def456",
  "status": "failed",
  "progress": {
    "pagesScraped": 2,
    "totalPages": 10,
    "currentStep": "Failed"
  },
  "result": null,
  "error": "Failed to scrape URL: Connection timeout after 30s",
  "startedAt": "2026-03-20T10:00:00.000Z",
  "finishedAt": "2026-03-20T10:00:35.000Z",
  "durationMs": 35000
}