Background Job

An asynchronous task that runs independently of the main request-response cycle, allowing long-running operations such as web extraction to complete without blocking the caller.

What Is a Background Job?

A background job is a unit of work that executes outside the HTTP request-response cycle. Instead of making the client wait while the server processes a long-running task, the server accepts the request immediately, queues the work, and returns a response right away. A separate worker process picks up the job and runs it asynchronously.
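The accept, queue, and respond pattern can be sketched inside a single Python process. In this minimal sketch, the standard library's queue.Queue and one worker thread stand in for a real job queue and worker fleet (KnowledgeSDK uses Inngest workers instead). The names submit_job, slow_task, and jobs are hypothetical and chosen only for illustration.

```python
import queue
import threading
import time
import uuid

# Hypothetical in-memory job table and FIFO queue of pending work.
jobs: dict[str, dict] = {}
work_queue: "queue.Queue[str]" = queue.Queue()
lock = threading.Lock()

def slow_task(url: str) -> str:
    """Stand-in for a long-running operation such as page extraction."""
    time.sleep(0.1)
    return f"extracted content of {url}"

def submit_job(url: str) -> str:
    """Called by the request handler: record the job, enqueue it, return at once."""
    job_id = str(uuid.uuid4())
    with lock:
        jobs[job_id] = {"status": "pending", "url": url, "result": None}
    work_queue.put(job_id)
    return job_id  # the client receives this immediately

def worker() -> None:
    """Runs in a separate thread, taking jobs off the queue one at a time."""
    while True:
        job_id = work_queue.get()
        with lock:
            jobs[job_id]["status"] = "processing"
            url = jobs[job_id]["url"]
        try:
            result = slow_task(url)
            with lock:
                jobs[job_id].update(status="completed", result=result)
        except Exception as exc:
            with lock:
                jobs[job_id].update(status="failed", error=str(exc))
        finally:
            work_queue.task_done()  # lets work_queue.join() return

threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    jid = submit_job("https://example.com")
    print("submitted", jid)      # returned before the work has finished
    work_queue.join()            # wait only for this demo
    print(jobs[jid]["status"])   # completed
```

The request handler never calls slow_task directly; it only writes a record and enqueues an ID, which is why it can respond in milliseconds regardless of how long the work takes.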

Background jobs suit any operation that takes more than a few seconds: database migrations, email delivery, AI model inference, and web scraping all benefit from being offloaded to background workers.

The Problem They Solve

An HTTP request holds a connection open until the response arrives, and every hop along the way enforces a timeout. Browsers, load balancers, and API gateways typically close connections after 30–120 seconds. Web extraction (fetching a page, rendering JavaScript, chunking content, generating embeddings) can take 1–3 minutes for a complex site, which exceeds many of those limits.

Without background jobs, you face an unpleasant choice:

Block the HTTP connection and risk timeout errors for operations that would otherwise succeed.
Return partial results and miss content that takes longer to load.

Background jobs remove the dilemma: return a job reference immediately and let the work complete without time pressure.

KnowledgeSDK uses background jobs for all async extraction workflows.

How KnowledgeSDK Uses Background Jobs

When you call POST /v1/extract/async:

The API server validates your API key (knowledgesdk_live_*), records the job in the jobs table with status: "pending", and returns a jobId within milliseconds.
An Inngest worker picks up the job from the queue.
The worker performs the full extraction pipeline: scraping, AI processing, embedding generation, and indexing into your Typesense collection.
When done, the worker updates the job record to status: "completed" and POSTs the result to your callbackUrl.
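The submission step can be sketched with Python's standard library. Only the path /v1/extract/async and the names callbackUrl and jobId come from this page; the host, the header that carries the API key, and the "url" body field are assumptions, so check the actual API reference before relying on them. Building the request is separated from sending it so the request shape can be inspected without network access.

```python
import json
import urllib.request
from typing import Optional

# ASSUMPTIONS: host, API-key header name, and the "url" body field are
# placeholders; only the path and callbackUrl/jobId names come from this page.
BASE_URL = "https://api.example.com"          # hypothetical host
API_KEY = "knowledgesdk_live_YOUR_KEY"        # placeholder key

def build_extract_request(
    target_url: str, callback_url: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/extract/async request."""
    body = {"url": target_url}                # hypothetical field name
    if callback_url is not None:
        body["callbackUrl"] = callback_url    # where the worker POSTs the result
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract/async",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": API_KEY,             # hypothetical header name
        },
        method="POST",
    )

def submit_extraction(target_url: str, callback_url: Optional[str] = None) -> str:
    """Send the request and return the jobId; this call should return quickly."""
    req = build_extract_request(target_url, callback_url)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["jobId"]
```

Because the server only records the job before replying, a short client timeout (10 seconds here) is appropriate even though the extraction itself may take minutes.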

You can also poll GET /v1/jobs/{jobId} at any time to check the current status.

Job Lifecycle States

pending → processing → completed
                    ↘ failed

pending — queued, waiting for a worker to pick it up.
processing — a worker is actively running the job.
completed — work finished successfully; result is available.
failed — an error occurred; check the error field for details.
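The transitions in the diagram can be encoded as a small state machine, for example to validate status updates that a client or worker records locally. This is a hypothetical client-side sketch: JobStatus, TRANSITIONS, and advance are illustrative names, not part of any KnowledgeSDK library. The diagram shows no retry edge, so completed and failed are treated as terminal here.

```python
from enum import Enum

class JobStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

# Allowed transitions, read off the lifecycle diagram.
TRANSITIONS: dict[JobStatus, set[JobStatus]] = {
    JobStatus.PENDING: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),  # terminal
    JobStatus.FAILED: set(),     # terminal
}

def is_terminal(status: JobStatus) -> bool:
    """A status is terminal when no further transitions are allowed."""
    return not TRANSITIONS[status]

def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return the new status, or raise if the diagram does not allow the move."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

Because JobStatus mixes in str, a status string from an API response converts directly, e.g. JobStatus("processing").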

Polling vs. Webhooks

Once you have a jobId, you have two options for receiving the result:

Pattern	How	Best When
Polling	Repeatedly call `GET /v1/jobs/{jobId}`	Simple scripts, CLI tools
Webhook	Receive a POST to `callbackUrl`	Production web applications

Webhooks are more efficient, since your server makes no wasted status requests, but they require a publicly reachable endpoint. During local development, tools like ngrok or Cloudflare Tunnel can expose your localhost.
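For the polling pattern, a common refinement is exponential backoff, so that long jobs are not checked every second. The sketch below assumes GET /v1/jobs/{jobId} returns JSON with a status field holding one of the lifecycle values; the host and API-key header are hypothetical. The fetch and sleep functions are injectable so the loop can be exercised without a network, and the deadline defaults to 10 minutes to match the timeout guidance under Best Practices (it approximates elapsed time by summing the sleeps, ignoring request latency).

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.example.com"    # hypothetical host
API_KEY = "knowledgesdk_live_YOUR_KEY"  # placeholder key

def fetch_job(job_id: str) -> dict:
    """GET /v1/jobs/{jobId}; the header name is an assumption."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/jobs/{job_id}",
        headers={"x-api-key": API_KEY},  # hypothetical header name
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def poll_job(
    job_id: str,
    fetch: Callable[[str], dict] = fetch_job,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    deadline_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is completed or failed, doubling the wait each time."""
    delay = initial_delay
    waited = 0.0  # sum of sleep time only
    while True:
        job = fetch(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        if waited >= deadline_s:
            raise TimeoutError(f"job {job_id} still {job['status']} after ~{waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)  # cap the backoff
```

Capping the delay at max_delay keeps the worst-case lag between completion and detection bounded, while still cutting the number of requests compared with fixed-interval polling.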

Best Practices

Store job IDs persistently. If your application restarts, you need to recover job references to reconcile pending work.
Set reasonable timeouts. Treat a job that has been processing for more than 10 minutes as likely failed and alert accordingly.
Design for retries. Workers may re-attempt failed jobs. Ensure your webhook handler is idempotent using the jobId as a deduplication key.
Log job events. Recording state transitions (when a job moves from pending to processing to completed) provides invaluable debugging information.
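Three of these practices (persistent job records, a stale-job timeout, and an idempotent webhook handler keyed on jobId) can be combined in one small sketch. It assumes the callback payload carries jobId and status fields, which this page does not specify; the in-memory dict and set are stand-ins for a database table, where a unique constraint on jobId would enforce deduplication across restarts. All function names are hypothetical.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("jobs")

STALE_AFTER = timedelta(minutes=10)  # threshold from "Set reasonable timeouts"

# Stand-ins for persistent storage: jobId -> {"status", "updated_at"}.
job_store: dict[str, dict] = {}
processed_callbacks: set[str] = set()

def record_status(job_id: str, status: str, now: Optional[datetime] = None) -> None:
    """Persist a state transition and log it for later debugging."""
    now = now or datetime.now(timezone.utc)
    previous = job_store.get(job_id, {}).get("status")
    job_store[job_id] = {"status": status, "updated_at": now}
    log.info("job %s: %s -> %s", job_id, previous, status)

def handle_webhook(payload: dict) -> bool:
    """Process a callback at most once per jobId; return False for duplicates."""
    job_id = payload["jobId"]  # assumed payload field
    if job_id in processed_callbacks:
        log.info("duplicate callback for job %s ignored", job_id)
        return False
    record_status(job_id, payload["status"])
    # Act on the result here (store it, notify a user, etc.).
    # Mark as processed only after success, so a failed attempt can be retried.
    processed_callbacks.add(job_id)
    return True

def find_stale_jobs(now: Optional[datetime] = None) -> list[str]:
    """Return IDs of jobs stuck in processing longer than STALE_AFTER."""
    now = now or datetime.now(timezone.utc)
    return [
        jid for jid, job in job_store.items()
        if job["status"] == "processing" and now - job["updated_at"] > STALE_AFTER
    ]
```

Running find_stale_jobs periodically (from a scheduler, say) gives you the list of jobs to alert on, and the logged transitions show how far each one got.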

Related Terms

Async API

An API design pattern where long-running operations return a job ID immediately and deliver results via polling or webhook when complete.

Webhook

An HTTP callback that sends real-time event notifications from a server to a client-specified URL when something happens.

Idempotency

The property of an API operation where making the same request multiple times produces the same result as making it once.
