What Is a Background Job?
A background job is a unit of work that executes outside the HTTP request-response cycle. Instead of making the client wait while the server processes a long-running task, the server accepts the request immediately, queues the work, and returns a response right away. A separate worker process picks up the job and runs it asynchronously.
Background jobs are essential for any operation that takes more than a few seconds — database migrations, email delivery, AI model inference, and web scraping all benefit from being offloaded to background workers.
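The accept, queue, and return flow described above can be sketched in a few lines. This is a minimal in-process illustration, not how any particular service implements it: the `submit` and `worker` names, the in-memory `jobs` dictionary, and the uppercase "work" are all hypothetical stand-ins, and a real system would use a durable queue and a separate worker process rather than a thread.

```python
import queue
import threading
import uuid

jobs = {}                   # job_id -> job record (in-memory, illustration only)
work_queue = queue.Queue()  # stand-in for a durable job queue

def submit(payload):
    """Accept the request: record the job, queue it, and return immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}
    work_queue.put((job_id, payload))
    return job_id

def worker():
    """Run queued jobs one at a time, outside the request path."""
    while True:
        job_id, payload = work_queue.get()
        jobs[job_id]["status"] = "processing"
        try:
            jobs[job_id]["result"] = payload.upper()  # stand-in for slow work
            jobs[job_id]["status"] = "completed"
        except Exception as exc:
            jobs[job_id]["error"] = str(exc)
            jobs[job_id]["status"] = "failed"
        finally:
            work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

job_id = submit("hello")  # returns before the work has run
work_queue.join()         # demo only: block until the worker has finished
print(jobs[job_id])       # {'status': 'completed', 'result': 'HELLO'}
```

The caller only ever holds a job ID. Everything slow happens in `worker`, so the time `submit` takes does not depend on how long the job runs.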
The Problem They Solve
HTTP is a request-response protocol, and the infrastructure that carries it enforces finite timeouts. Browsers, load balancers, and API gateways typically time out connections after 30–120 seconds. Web extraction (fetching a page, rendering JavaScript, chunking content, generating embeddings) can take 1–3 minutes for a complex site, which is longer than many of those limits allow.
That leaves three options, and only the last one avoids an unpleasant trade-off:
- Block the HTTP connection and risk timeout errors for legitimate, successful operations.
- Return partial results and miss content that takes longer to load.
- Use background jobs and return a job reference immediately, letting the work complete without time pressure.
KnowledgeSDK uses background jobs for all async extraction workflows.
How KnowledgeSDK Uses Background Jobs
When you call POST /v1/extract/async:
- The API server validates your API key (knowledgesdk_live_*), records the job in the jobs table with status: "pending", and returns a jobId within milliseconds.
- An Inngest worker picks up the job from the queue.
- The worker performs the full extraction pipeline: scraping, AI processing, embedding generation, and indexing into your Typesense collection.
- When done, the worker updates the job record to status: "completed" and POSTs the result to your callbackUrl.
You can also poll GET /v1/jobs/{jobId} at any time to check the current status.
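A polling client might look like the sketch below. It assumes the status response decodes to a dictionary with a `status` field, which the text implies but does not spell out. The `poll_job` helper and its `fetch_job` parameter are hypothetical: `fetch_job` stands for whatever function in your code performs GET /v1/jobs/{jobId} and returns the decoded body, and it is replaced by a canned sequence here so the example runs offline.

```python
import time

def poll_job(fetch_job, job_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job reaches a terminal state or the deadline passes.

    fetch_job is a caller-supplied function (an assumption of this sketch)
    that performs GET /v1/jobs/{jobId} and returns the response as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)

# Stand-in for the real HTTP call: three canned responses in order.
responses = iter([
    {"status": "pending"},
    {"status": "processing"},
    {"status": "completed"},
])
job = poll_job(lambda job_id: next(responses), "job_123", interval=0.01)
print(job["status"])  # completed
```

Passing the HTTP call in as a function keeps the loop independent of any particular client library and makes the timeout behaviour easy to test.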
Job Lifecycle States
pending → processing → completed
                    ↘ failed
- pending: queued, waiting for a worker to pick it up.
- processing: a worker is actively running the job.
- completed: work finished successfully; result is available.
- failed: an error occurred; check the error field for details.
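If your application mirrors these states locally, a small transition table can reject updates that arrive out of order. The sketch below is one way to encode the lifecycle. It assumes that `failed` is reached only from `processing` and that terminal states never change, which is a reading of the diagram rather than documented behaviour, and the `ALLOWED` table and `transition` function are hypothetical names.

```python
# Allowed next states for each state. Assumption: failed follows processing only.
ALLOWED = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),     # terminal
}

def transition(current, new):
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new

state = "pending"
state = transition(state, "processing")
state = transition(state, "completed")
print(state)  # completed
```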
Polling vs. Webhooks
Once you have a jobId, you have two options for receiving the result:
| Pattern | How | Best When |
|---|---|---|
| Polling | Repeatedly call GET /v1/jobs/{jobId} | Simple scripts, CLI tools |
| Webhook | Receive a POST to callbackUrl | Production web applications |
Webhooks are more efficient because your application makes no wasted network calls, but they require a publicly reachable endpoint. During local development, tools like ngrok or Cloudflare Tunnel can expose a server running on localhost.
Best Practices
- Store job IDs persistently. If your application restarts, you need to recover job references to reconcile pending work.
- Set reasonable timeouts. Treat a job that has been processing for more than 10 minutes as likely failed and alert accordingly.
- Design for retries. Workers may re-attempt failed jobs. Ensure your webhook handler is idempotent using the jobId as a deduplication key.
- Log job events. Recording state transitions (when a job moves from pending to processing to completed) provides invaluable debugging information.
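Several of these practices fit in one short sketch. It assumes the webhook payload carries `jobId` and `status` fields, which this page does not specify. The `handle_webhook` and `is_stale` helpers are hypothetical, and the in-memory `processed` set stands in for the persistent store a real application would need so that deduplication survives restarts.

```python
import logging
from datetime import datetime, timedelta, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("jobs")

STALE_AFTER = timedelta(minutes=10)  # threshold suggested in the list above
processed = set()  # assumption: replace with a persistent store in production

def handle_webhook(payload):
    """Handle a callback once per jobId; repeat deliveries are ignored."""
    job_id = payload["jobId"]  # assumed payload field
    if job_id in processed:
        log.info("job %s: duplicate delivery ignored", job_id)
        return False
    processed.add(job_id)
    log.info("job %s: received status %s", job_id, payload["status"])
    return True

def is_stale(status, started_at, now=None):
    """True if a job has been processing for longer than STALE_AFTER."""
    now = now or datetime.now(timezone.utc)
    return status == "processing" and now - started_at > STALE_AFTER

first = handle_webhook({"jobId": "job_123", "status": "completed"})
second = handle_webhook({"jobId": "job_123", "status": "completed"})
print(first, second)  # True False

started = datetime.now(timezone.utc) - timedelta(minutes=11)
print(is_stale("processing", started))  # True
```

Returning a success response for duplicates, rather than an error, is usually the safer choice because it stops the sender from retrying a delivery you have already handled.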