Jobs

Poll async extraction jobs.

Endpoints

When you use the async extraction endpoint (POST /v1/extract/async), a job is created that runs in the background. Use the jobs endpoint to check the status of your extraction, track progress, and retrieve the result once it completes.

Get Job Status

GET /v1/jobs/{jobId} (authentication: x-api-key)

Retrieve the current status and result of an async extraction job. The jobId is returned when you start an async extraction.

Path Parameters

jobId (string, required)

The job identifier returned by POST /v1/extract/async. Job IDs have the format job_*.
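A request to this endpoint might be built as in the sketch below. It uses only the Python standard library. The base URL `https://api.example.com` is a placeholder, the helper names `build_job_request` and `get_job` are hypothetical, and sending the key as an `x-api-key` request header is an assumption based on the endpoint's authentication label.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: placeholder host. Replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_job_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for /v1/jobs/{jobId}."""
    # Job IDs are documented as having the format job_*.
    if not job_id.startswith("job_"):
        raise ValueError(f"expected a job ID of the form job_*, got {job_id!r}")
    url = f"{BASE_URL}/v1/jobs/{quote(job_id, safe='')}"
    # Assumption: the API key travels in an x-api-key request header.
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def get_job(job_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_job_request(job_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `get_job("job_abc123def456", api_key)` would return the decoded job object described under Response.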

Response

id (string)

The unique job identifier.

status (string)

Current job status. One of: pending, processing, completed, failed.

progress (object)

Progress information for the extraction. Contains pagesScraped (number of pages scraped so far), totalPages (total pages planned), and currentStep (description of the current operation).

result (object)

The full extraction result, available when status is completed. Contains the same fields as the sync extract response: business, knowledgeItems, pagesScraped, urlsDiscovered, durationMs, startedAt, and finishedAt.

error (string)

Error message if the job failed. Only present when status is failed.

startedAt (string)

ISO 8601 timestamp of when the job started processing. null if the job is still pending.

finishedAt (string)

ISO 8601 timestamp of when the job finished. null if the job has not completed yet.

durationMs (number)

Total job duration in milliseconds. null if the job has not completed yet.
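The response fields above can be mapped onto typed objects on the client side. The following is a minimal sketch: the class names `Job` and `Progress` and the function `parse_job` are hypothetical, and the nullable fields are modelled as `Optional` based on the field descriptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Progress:
    pages_scraped: int
    total_pages: int
    current_step: str


@dataclass
class Job:
    id: str
    status: str  # pending, processing, completed, or failed
    progress: Optional[Progress]
    result: Optional[dict[str, Any]]  # populated when status is completed
    error: Optional[str]              # populated when status is failed
    started_at: Optional[str]         # ISO 8601, None while pending
    finished_at: Optional[str]        # ISO 8601, None until the job finishes
    duration_ms: Optional[int]


def parse_job(body: dict[str, Any]) -> Job:
    """Convert a decoded job response into a Job, tolerating null fields."""
    raw = body.get("progress")
    progress = None
    if raw is not None:
        progress = Progress(raw["pagesScraped"], raw["totalPages"], raw["currentStep"])
    return Job(
        id=body["id"],
        status=body["status"],
        progress=progress,
        result=body.get("result"),
        error=body.get("error"),
        started_at=body.get("startedAt"),
        finished_at=body.get("finishedAt"),
        duration_ms=body.get("durationMs"),
    )
```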

Job Status Lifecycle

pending  -->  processing  -->  completed  (terminal)
                          -->  failed     (terminal)

Status      Description
pending     Job is queued and waiting to be picked up by a worker.
processing  Extraction is actively running. Check progress for details.
completed   Extraction finished successfully. The result field contains the data.
failed      Extraction encountered an error. Check the error field for details.
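A client usually only needs to know whether a status is terminal, that is, whether polling can stop. A minimal sketch, with the hypothetical helper `is_terminal`, that also rejects status values this page does not list:

```python
# The four statuses documented for a job.
ALL_STATUSES = frozenset({"pending", "processing", "completed", "failed"})
# Statuses after which the job will not change again.
TERMINAL_STATUSES = frozenset({"completed", "failed"})


def is_terminal(status: str) -> bool:
    """Return True once a job has reached completed or failed."""
    if status not in ALL_STATUSES:
        # Fail loudly instead of polling forever on an unexpected value.
        raise ValueError(f"unknown job status: {status!r}")
    return status in TERMINAL_STATUSES
```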

We recommend polling every 5 seconds. For a better user experience, consider using the streaming endpoint (POST /v1/extract/stream) instead, which provides real-time progress updates without polling.

Example: Poll Until Complete
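The loop below is a minimal sketch of polling a job until it reaches a terminal status, using the recommended 5-second interval. The names `poll_job` and `JobFailedError` are hypothetical, the 10-minute overall timeout is an assumption rather than a documented limit, and `fetch` stands for any callable that performs GET /v1/jobs/{jobId} and returns the decoded JSON body.

```python
import time
from typing import Any, Callable


class JobFailedError(RuntimeError):
    """Raised when the job reports status "failed"."""


def poll_job(
    fetch: Callable[[str], dict[str, Any]],
    job_id: str,
    interval_s: float = 5.0,    # recommended polling interval
    timeout_s: float = 600.0,   # assumption: give up after 10 minutes
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job completes, then return its result object."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch(job_id)
        status = job["status"]
        if status == "completed":
            return job["result"]
        if status == "failed":
            raise JobFailedError(job.get("error") or "job failed without a message")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout_s}s")
        # pending or processing: wait before asking again.
        sleep(interval_s)
```

Passing `fetch` and `sleep` as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.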

Example Responses

Pending job:

{
  "id": "job_abc123def456",
  "status": "pending",
  "progress": null,
  "result": null,
  "error": null,
  "startedAt": null,
  "finishedAt": null,
  "durationMs": null
}

Processing job:

{
  "id": "job_abc123def456",
  "status": "processing",
  "progress": {
    "pagesScraped": 4,
    "totalPages": 10,
    "currentStep": "Scraping https://linear.app/features"
  },
  "result": null,
  "error": null,
  "startedAt": "2026-03-20T10:00:00.000Z",
  "finishedAt": null,
  "durationMs": null
}
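A response like the one above can be turned into a short progress line for display. This is a sketch only: `format_progress` is a hypothetical helper, and treating a zero totalPages as 0% is an assumption made to avoid dividing by zero.

```python
from typing import Any


def format_progress(job: dict[str, Any]) -> str:
    """Summarize a job response as a one-line progress string."""
    progress = job.get("progress")
    if not progress:
        # progress is null while the job is still pending.
        return f"{job['status']}: no progress yet"
    scraped = progress["pagesScraped"]
    total = progress["totalPages"]
    percent = 100 * scraped / total if total else 0.0
    return f"{job['status']}: {scraped}/{total} pages ({percent:.0f}%), {progress['currentStep']}"
```

For the processing response above, this yields `processing: 4/10 pages (40%), Scraping https://linear.app/features`.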

Completed job:

{
  "id": "job_abc123def456",
  "status": "completed",
  "progress": {
    "pagesScraped": 10,
    "totalPages": 10,
    "currentStep": "Complete"
  },
  "result": {
    "business": {
      "name": "Linear",
      "domain": "linear.app",
      "category": "Project Management",
      "description": "Modern project management for software teams."
    },
    "knowledgeItems": [
      {
        "title": "Issue Tracking",
        "description": "Fast, keyboard-first issue tracking.",
        "content": "...",
        "category": "FEATURE",
        "source": "https://linear.app/features"
      }
    ],
    "pagesScraped": 10,
    "urlsDiscovered": 34,
    "durationMs": 72150,
    "startedAt": "2026-03-20T10:00:00.000Z",
    "finishedAt": "2026-03-20T10:01:12.150Z"
  },
  "error": null,
  "startedAt": "2026-03-20T10:00:00.000Z",
  "finishedAt": "2026-03-20T10:01:12.150Z",
  "durationMs": 72150
}

Failed job:

{
  "id": "job_abc123def456",
  "status": "failed",
  "progress": {
    "pagesScraped": 2,
    "totalPages": 10,
    "currentStep": "Failed"
  },
  "result": null,
  "error": "Failed to scrape URL: Connection timeout after 30s",
  "startedAt": "2026-03-20T10:00:00.000Z",
  "finishedAt": "2026-03-20T10:00:35.000Z",
  "durationMs": 35000
}
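In the completed and failed examples above, durationMs equals the difference between finishedAt and startedAt. The sketch below recomputes that value from the two ISO 8601 timestamps. `elapsed_ms` is a hypothetical helper, and whether the relationship holds for every job is an assumption drawn only from these examples.

```python
from datetime import datetime, timedelta


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-03-20T10:00:00.000Z."""
    # Older Python versions do not accept the trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def elapsed_ms(started_at: str, finished_at: str) -> int:
    """Return the whole milliseconds between two job timestamps."""
    delta = parse_timestamp(finished_at) - parse_timestamp(started_at)
    # Integer division by a 1 ms timedelta avoids floating-point rounding.
    return delta // timedelta(milliseconds=1)
```

For the completed example, `elapsed_ms("2026-03-20T10:00:00.000Z", "2026-03-20T10:01:12.150Z")` returns 72150, which matches its durationMs.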