Knowledge Base

A structured or unstructured collection of information that an AI system can query to answer questions or complete tasks.

What Is a Knowledge Base?

A knowledge base is a collection of information, such as documents, web pages, FAQs, API references, support tickets, and product specs, that an AI system can search to retrieve relevant content when answering questions or completing tasks.

In retrieval-augmented generation (RAG) pipelines, a knowledge base is the external memory store that grounds LLM responses in factual, up-to-date, domain-specific content.

Knowledge Base vs Vector Database vs Document Store

These terms are related but distinct:

Term	Meaning
Knowledge base	The logical collection of information (the "what")
Vector database	A technical storage layer optimized for similarity search (the "how")
Document store	A storage layer for raw text/files (e.g., S3, MongoDB)

A knowledge base is often implemented using a combination of a document store (raw content) and a vector database (embeddings for search).
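Here is a toy in-memory Python sketch that makes the split concrete. A `documents` dict plays the document store and a `vectors` dict plays the vector database. The `embed` function is a deliberately crude stand-in for a real embedding model: it just counts lowercased terms. The `KnowledgeBase` class and its method names are hypothetical and do not come from any library.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a sparse vector of lowercased term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KnowledgeBase:
    def __init__(self) -> None:
        self.documents: Dict[str, str] = {}    # document store: raw content by id
        self.vectors: Dict[str, Counter] = {}  # vector index: embedding by id

    def add(self, doc_id: str, text: str) -> None:
        self.documents[doc_id] = text
        self.vectors[doc_id] = embed(text)

    def search(self, query: str, k: int = 3) -> List[Tuple[str, str]]:
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda d: cosine(q, self.vectors[d]), reverse=True)
        # The index only yields ids; the raw text comes back from the document store.
        return [(d, self.documents[d]) for d in ranked[:k]]


kb = KnowledgeBase()
kb.add("pricing", "The Pro plan includes API access")
kb.add("getting-started", "Install the CLI and create a project")
results = kb.search("which plan includes API access", k=1)
```

The search ranks ids by similarity in the index and then looks up the raw text in the document store. A production system would replace the toy `embed` with a learned embedding model and the linear scan with an approximate nearest-neighbor index.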

Types of Knowledge Bases

Unstructured

Free-form text: documentation pages, blog posts, support articles, PDFs, email threads. Most knowledge bases in RAG systems are unstructured.

Structured

Tables, databases, CSVs. Queried with SQL or structured APIs. Less common in RAG but increasingly supported (e.g., text-to-SQL pipelines).
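Unlike similarity search, a structured knowledge base is queried with exact SQL. The minimal sketch below uses Python's built-in `sqlite3` module. The `plans` table and its rows are invented for illustration. The SQL string is hard-coded at the point where a text-to-SQL pipeline would have an LLM generate it from the user's question.

```python
import sqlite3

# Hypothetical structured data: an in-memory table of plans.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plans (name TEXT, monthly_usd INTEGER, api_access INTEGER)")
conn.executemany(
    "INSERT INTO plans VALUES (?, ?, ?)",
    [("Free", 0, 0), ("Pro", 49, 1), ("Enterprise", 499, 1)],
)

# In a text-to-SQL pipeline, an LLM would produce this query from the question
# "Which plans include API access?"; here it is written by hand.
sql = "SELECT name FROM plans WHERE api_access = 1 ORDER BY monthly_usd"
rows = [name for (name,) in conn.execute(sql)]
print(rows)  # ['Pro', 'Enterprise']
```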

Hybrid

A combination — structured product data enriched with unstructured descriptions and FAQs.

What Goes Into a Good AI Knowledge Base

Coverage — all information a user might ask about should be present
Freshness — content should be updated when source documents change
Granularity — chunks should be appropriately sized (not entire pages, not single sentences)
Metadata — each item should carry source URL, category, and timestamp for filtering and attribution
Deduplication — duplicate or near-duplicate content wastes index space and dilutes retrieval
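The granularity, metadata, and deduplication guidelines can be sketched together in Python. The following assumptions are all hypothetical:

* Chunks are measured in whitespace-separated words rather than model tokens.
* The 200-word window with a 40-word overlap is only an illustrative default.
* The metadata field names (`source_url`, `category`, `indexed_at`) are not a KnowledgeSDK schema.

Deduplication here matches exactly on a SHA-256 hash of the lowercased chunk, so it only catches verbatim copies.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, List, Optional, Set


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of at most max_words words, overlapping by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # this window already reaches the end
            break
    return chunks


def build_items(
    text: str,
    source_url: str,
    category: str,
    seen: Optional[Set[str]] = None,
    max_words: int = 200,
    overlap: int = 40,
) -> List[Dict[str, str]]:
    """Chunk one document, attach metadata, and skip chunks already seen."""
    seen = set() if seen is None else seen  # share one set to dedupe across documents
    indexed_at = datetime.now(timezone.utc).isoformat()
    items = []
    for chunk in chunk_words(text, max_words, overlap):
        # chunk_words already collapses whitespace; lowercasing merges case-only variants.
        digest = hashlib.sha256(chunk.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        items.append({
            "text": chunk,
            "source_url": source_url,
            "category": category,
            "indexed_at": indexed_at,
            "hash": digest,
        })
    return items
```

To extend deduplication across documents, pass the same `seen` set to successive `build_items` calls. Catching near-duplicates, such as the same paragraph with a changed date, requires a similarity-based method like MinHash or embedding distance.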

Building a Knowledge Base with KnowledgeSDK

KnowledgeSDK reduces knowledge base construction to a single API call. Send a URL to POST /v1/extract, and KnowledgeSDK scrapes the page, extracts structured content, chunks it, embeds it, and stores it in your dedicated knowledge base:

# Index your documentation
curl -X POST https://api.knowledgesdk.com/v1/extract \
  -H "x-api-key: knowledgesdk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.yourapp.com/getting-started"}'

# Index a product page
curl -X POST https://api.knowledgesdk.com/v1/extract \
  -H "x-api-key: knowledgesdk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://yourapp.com/pricing"}'
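To index more than a handful of pages, you can script the same request. The minimal Python sketch below uses only the standard library. It sends the POST /v1/extract request shown above for each URL and collects failures rather than stopping at the first one.

The helper names `build_extract_request` and `index_urls` are hypothetical. The endpoint, the `x-api-key` header, and the `{"url": ...}` body all come from the curl examples. The `Content-Type: application/json` header is an assumption that the API expects a JSON body. The response format is not described here, so the sketch discards it.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable

API_BASE = "https://api.knowledgesdk.com/v1"


def build_extract_request(url: str, api_key: str) -> urllib.request.Request:
    """Build the POST /v1/extract request from the curl examples, with a JSON body."""
    return urllib.request.Request(
        f"{API_BASE}/extract",
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def index_urls(
    urls: Iterable[str],
    api_key: str,
    send: Callable = urllib.request.urlopen,  # injectable for testing or custom timeouts
) -> Dict[str, str]:
    """Submit each URL for extraction; return {url: error message} for the ones that failed."""
    failures: Dict[str, str] = {}
    for url in urls:
        try:
            with send(build_extract_request(url, api_key)) as response:
                response.read()  # response format is not described here, so it is discarded
        except Exception as exc:  # keep going on any per-URL error (HTTPError, URLError, timeouts)
            failures[url] = str(exc)
    return failures
```

Calling `index_urls(["https://docs.yourapp.com/getting-started", "https://yourapp.com/pricing"], api_key)` returns an empty dict when every request succeeds. You can retry any URLs that come back in the result.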

Once indexed, your knowledge base is immediately searchable:

curl -X POST https://api.knowledgesdk.com/v1/search \
  -H "x-api-key: knowledgesdk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "what plan includes API access?"}'

Knowledge Base Maintenance

A knowledge base is not a one-time build — it requires ongoing maintenance:

Re-index on content change — re-run POST /v1/extract when a source page is updated
Prune stale items — remove knowledge items whose source pages no longer exist
Monitor retrieval quality — log queries and retrieved chunks, identify gaps
Expand coverage — add new sources as your product grows
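Re-indexing and pruning both depend on knowing which sources changed. One common approach, sketched here in Python, stores a hash of each page's content at index time and compares it on a schedule.

`plan_maintenance`, `content_hash`, and the caller-supplied `fetch` function are hypothetical names. `fetch` returns the page text, or `None` if the page no longer exists. The sketch only decides what to do. It leaves the actual re-extraction and deletion to the caller, because the API calls for removing items are not covered here.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_maintenance(
    known_hashes: Dict[str, str],
    fetch: Callable[[str], Optional[str]],
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Compare each indexed source with its stored hash.

    Returns (urls_to_reindex, urls_to_prune, updated_hashes).
    """
    to_reindex: List[str] = []
    to_prune: List[str] = []
    updated: Dict[str, str] = {}
    for url, old_hash in known_hashes.items():
        text = fetch(url)
        if text is None:  # source is gone (e.g. HTTP 404): its items are stale
            to_prune.append(url)
            continue
        new_hash = content_hash(text)
        if new_hash != old_hash:  # content changed: re-run POST /v1/extract for this URL
            to_reindex.append(url)
        updated[url] = new_hash
    return to_reindex, to_prune, updated
```

Where possible, hash the extracted text rather than the raw HTML. Rendered pages often embed timestamps, session tokens, or rotating content that would flag every page as changed.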

Knowledge Base Architecture Patterns

Sources                     Knowledge Base              LLM
─────────────────────────   ─────────────────────────   ───────
Docs site  → extract →      Vector index (chunks)    → search → context → response
Support KB → extract →      BM25 index (terms)
Product pages → extract →   Metadata store (source, date)
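The diagram shows both a vector index and a BM25 index, but it does not say how their results are merged. Reciprocal rank fusion (RRF) is one widely used way to combine ranked lists. This Python sketch illustrates that technique and is not a description of KnowledgeSDK's internals. The chunk ids are made up, and `k = 60` is a commonly used default.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked id lists: each id scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical top-3 results from each index for the same query.
vector_hits = ["pricing#2", "plans#1", "faq#7"]
bm25_hits = ["plans#1", "changelog#3", "pricing#2"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['plans#1', 'pricing#2', 'changelog#3', 'faq#7']
```

RRF uses only ranks, so it avoids the problem that cosine similarities and BM25 scores live on different scales. In this example, `plans#1` and `pricing#2` each top one list. `plans#1` wins because it ranks higher in the other list. Chunks found by only one retriever (`changelog#3`, `faq#7`) fall below both.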

Each API key in KnowledgeSDK gets its own isolated knowledge base — your data is never mixed with other users' collections.