Knowledge Item

A single indexed unit of knowledge — a document chunk with title, content, category, and embedding — stored in KnowledgeSDK's search index.

What Is a Knowledge Item?

A knowledge item is the fundamental unit of storage in KnowledgeSDK's search index. It represents a single piece of knowledge — a document, a page extract, a fact summary, or any chunk of content — that has been processed, embedded, and indexed for semantic retrieval.

Every knowledge item has a title, full-text content, a category label, and a 1536-dimensional vector embedding. When you call /v1/extract or /v1/scrape on a URL, KnowledgeSDK can automatically create knowledge items from the extracted content and add them to your search index, scoped to your API key.

Anatomy of a Knowledge Item

{
  "id": "ki_01j9xkz4m",
  "title": "KnowledgeSDK Pricing Plans",
  "content": "KnowledgeSDK offers three plans: Usage (pay-as-you-go), Starter ($29/mo), and Pro ($99/mo). Each plan includes...",
  "category": "Pricing",
  "sourceUrl": "https://knowledgesdk.com/pricing",
  "extractedAt": "2025-03-20T14:32:00Z",
  "embedding": [0.023, -0.187, ...] // 1536 dimensions
}

The embedding is generated automatically from the content using a text embedding model and is used to power semantic similarity search.
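
To make "semantic similarity" concrete, here is a minimal sketch of cosine similarity, a common way to compare embedding vectors. The source does not state which metric KnowledgeSDK uses, so treat the choice of cosine similarity and the helper name `cosine_similarity` as assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```

In a real index the two inputs would be a query embedding and an item's 1536-dimensional `embedding`; the function itself works for any matching length.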

How Knowledge Items Are Created

Via the Extract Endpoint

The most common way to create knowledge items is through the extraction pipeline:

curl -X POST https://api.knowledgesdk.com/v1/extract \
  -H "x-api-key: knowledgesdk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com/docs/getting-started" }'

The API scrapes the URL, extracts structured knowledge (title, summary, key facts, category), creates a knowledge item, and indexes it — all in one call.
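
The same request can be issued from application code. The sketch below uses only the Python standard library and mirrors the curl call above. The helper names `build_extract_request` and `extract` are hypothetical, and because the response schema is not described here, the response is returned as parsed JSON without interpretation.

```python
import json
import urllib.request

API_BASE = "https://api.knowledgesdk.com"


def build_extract_request(page_url, api_key):
    """Build (but do not send) the POST request for /v1/extract."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/extract",
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract(page_url, api_key, timeout=30):
    """Send the request and return the decoded JSON body (schema assumed to be JSON)."""
    request = build_extract_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload and headers easy to inspect or log before any network call is made.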

Via Async Extraction (for batch workloads)

To index large numbers of pages without blocking, submit each URL to the async endpoint:

curl -X POST https://api.knowledgesdk.com/v1/extract/async \
  -H "x-api-key: knowledgesdk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/docs/api-reference",
    "callbackUrl": "https://yourapp.com/webhooks/knowledge"
  }'

The call returns a jobId immediately. When extraction completes, your callbackUrl receives the knowledge item payload.
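
On the receiving side, a webhook handler needs to parse and sanity-check the callback body before using it. The sketch below assumes the body is a single JSON object shaped like the knowledge item in the anatomy section; the exact callback schema is not documented here, and both `parse_callback` and the `REQUIRED_FIELDS` list are hypothetical.

```python
import json

# Assumed minimum set of fields, taken from the example item's anatomy.
REQUIRED_FIELDS = ("id", "title", "content")


def parse_callback(body):
    """Decode a callback body (bytes) and check that it looks like a knowledge item."""
    item = json.loads(body.decode("utf-8"))
    if not isinstance(item, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in item]
    if missing:
        raise ValueError(f"callback payload is missing fields: {missing}")
    return item
```

The function is framework-agnostic: call it with the raw request body from whichever HTTP server handles your callbackUrl, and respond with an error status if it raises.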

Searching Knowledge Items

Once items are indexed, they are retrievable via semantic search:

curl -X POST https://api.knowledgesdk.com/v1/search \
  -H "x-api-key: knowledgesdk_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the API rate limits?",
    "limit": 5
  }'

KnowledgeSDK runs a hybrid search — combining vector similarity (semantic intent) with keyword matching — across all knowledge items in your index and returns the most relevant ones. These items can then be injected as context into your LLM prompt.
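
As an illustration of the general idea behind hybrid search, the sketch below blends a vector similarity score with a simple keyword-overlap score. This is not KnowledgeSDK's ranking function, which is not described here; the weighted sum, the `alpha` parameter, and the function names are all assumptions chosen to show one common way of combining the two signals.

```python
def keyword_score(query, text):
    """Fraction of distinct query terms that also appear in the text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return len(terms & words) / len(terms)


def hybrid_score(vector_sim, kw_score, alpha=0.7):
    """Weighted blend of the two signals; alpha is the weight on vector similarity."""
    return alpha * vector_sim + (1.0 - alpha) * kw_score
```

A higher `alpha` favors semantic matches, while a lower one favors exact term overlap, which tends to matter for identifiers and product names.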

Knowledge Items in a RAG Pipeline

Knowledge items are the retrieval layer of a RAG system:

Index phase: Pages, documents, and extracted facts are stored as knowledge items.
Query phase: A user query triggers a search that returns the top-K knowledge items.
Generation phase: The retrieved items' content fields are assembled into the LLM's context prompt.
Response: The LLM generates an answer grounded in the retrieved knowledge items.
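
The generation phase above can be sketched as a small prompt builder. It assumes each retrieved item is a dict with the `title` and `content` fields from the anatomy section; the function name `build_prompt`, the prompt wording, and the `max_chars` budget are hypothetical choices, not part of the API.

```python
def build_prompt(question, items, max_chars=4000):
    """Assemble retrieved items into a context block followed by the question."""
    blocks = []
    used = 0
    for index, item in enumerate(items, start=1):
        block = f"[{index}] {item['title']}\n{item['content']}"
        # Stop once the next item would exceed the character budget.
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering each block lets the model cite which item supports an answer, and the budget keeps the prompt within a predictable size when search returns long items.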

Knowledge Items vs. Raw Chunks

Unlike raw document chunks produced by naive text splitters, knowledge items in KnowledgeSDK are:

Semantically coherent: Extracted at the page or section level with meaningful titles and categories.
Pre-categorized: The category field enables scoped searches and filtered retrieval.
Source-attributed: The sourceUrl and extractedAt fields allow attribution and freshness checks.
Immediately searchable: No separate embedding or indexing step required — it is handled by the API.
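
The category and freshness properties in the list above can be used on the client side once items have been retrieved. The sketch below filters a list of item dicts by `category` and by the age of `extractedAt`. It is a local filter only: whether the search endpoint accepts a category filter is not shown here, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_extracted_at(value):
    """Parse an ISO 8601 timestamp such as '2025-03-20T14:32:00Z' into an aware datetime."""
    # Replace the trailing 'Z' so older Python versions of fromisoformat accept it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_fresh(item, max_age, now=None):
    """True if the item was extracted no longer than max_age ago."""
    now = now or datetime.now(timezone.utc)
    return now - parse_extracted_at(item["extractedAt"]) <= max_age


def filter_items(items, category, max_age, now=None):
    """Keep items in the given category that are still fresh."""
    return [
        item
        for item in items
        if item.get("category") == category and is_fresh(item, max_age, now)
    ]
```

For example, `filter_items(results, "Pricing", timedelta(days=7))` would keep only pricing items extracted within the last week, which is useful for content that changes often.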

Because each item carries a title, a category, and source metadata, a knowledge item is a more structured retrieval primitive than an unstructured text chunk, which can improve precision and recall in downstream RAG applications.