What Is a Knowledge Base?
A knowledge base is a collection of information (documents, web pages, FAQs, API references, support tickets, product specs) that an AI system can search to retrieve relevant content when answering questions or completing tasks.
In retrieval-augmented generation (RAG) pipelines, a knowledge base is the external memory store that grounds LLM responses in factual, up-to-date, domain-specific content.
Knowledge Base vs Vector Database vs Document Store
These terms are related but distinct:
| Term | Meaning |
|---|---|
| Knowledge base | The logical collection of information (the "what") |
| Vector database | A technical storage layer optimized for similarity search (the "how") |
| Document store | A storage layer for raw text/files (e.g., S3, MongoDB) |
A knowledge base is often implemented using a combination of a document store (raw content) and a vector database (embeddings for search).
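To make the split between a document store and a vector index concrete, here is a minimal in-memory sketch. It is not how KnowledgeSDK or any particular vector database works internally. The `ToyKnowledgeBase` class and the `embed` function are hypothetical, and `embed` is a hashed bag-of-words stand-in for a real embedding model, so it only rewards shared words, not shared meaning.

```python
import hashlib
import math

DIM = 256  # hypothetical embedding size for this toy example


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyKnowledgeBase:
    def __init__(self) -> None:
        # Document store: raw chunk text keyed by id (the role S3 or MongoDB would play)
        self.documents: dict[str, str] = {}
        # Vector index: one embedding per chunk id, used only for similarity search
        self.vectors: dict[str, list[float]] = {}

    def add(self, chunk_id: str, text: str) -> None:
        self.documents[chunk_id] = text
        self.vectors[chunk_id] = embed(text)

    def search(self, query: str, k: int = 3) -> list[tuple[str, str, float]]:
        q = embed(query)
        # Vectors are normalized, so the dot product equals cosine similarity
        scored = [
            (chunk_id, sum(a * b for a, b in zip(q, vec)))
            for chunk_id, vec in self.vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        # Look up the raw text in the document store for each top hit
        return [(cid, self.documents[cid], score) for cid, score in scored[:k]]
```

Keeping raw text in its own store means the index holds only ids and vectors; a production setup would replace `embed` with a real embedding model and the two dicts with a vector database and a document store.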
Types of Knowledge Bases
Unstructured
Free-form text: documentation pages, blog posts, support articles, PDFs, email threads. Most knowledge bases in RAG systems are unstructured.
Structured
Tables, databases, CSVs. Queried with SQL or structured APIs. Less common in RAG but increasingly supported (e.g., text-to-SQL pipelines).
Hybrid
A combination — structured product data enriched with unstructured descriptions and FAQs.
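A hybrid knowledge base typically pairs an exact filter on the structured fields with a text search over the unstructured ones. The sketch below uses Python's built-in `sqlite3` with a hypothetical `products` table; the substring match on `description` is a deliberately naive stand-in for the vector or keyword search a real RAG system would run.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    # Hypothetical product catalog: structured columns plus a free-text description
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "sku TEXT PRIMARY KEY, name TEXT, price_usd REAL, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [
            ("P-100", "Starter", 0.0, "Free tier for personal projects and evaluation."),
            ("P-200", "Pro", 49.0, "Includes API access and priority email support."),
            ("P-300", "Enterprise", 499.0, "API access, SSO, and a dedicated account manager."),
        ],
    )
    return conn


def hybrid_lookup(conn: sqlite3.Connection, max_price: float, phrase: str) -> list[tuple[str, str]]:
    # Structured step: an exact SQL filter on a typed column
    rows = conn.execute(
        "SELECT sku, name, description FROM products WHERE price_usd <= ? ORDER BY price_usd",
        (max_price,),
    ).fetchall()
    # Unstructured step: naive case-insensitive phrase match on the description text
    return [(sku, name) for sku, name, desc in rows if phrase.lower() in desc.lower()]
```

For example, `hybrid_lookup(build_catalog(), 100, "api access")` keeps only the Pro plan: Enterprise mentions API access but fails the price filter, and Starter passes the filter but not the text match.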
What Goes Into a Good AI Knowledge Base
- Coverage — all information a user might ask about should be present
- Freshness — content should be updated when source documents change
- Granularity — chunks should be appropriately sized (not entire pages, not single sentences)
- Metadata — each item should carry source URL, category, and timestamp for filtering and attribution
- Deduplication — duplicate or near-duplicate content wastes index space and dilutes retrieval
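The granularity, metadata, and deduplication points can be sketched together in a few lines. This is an illustration under stated assumptions, not KnowledgeSDK's chunker: chunk size is measured in whitespace-separated words, the 200-word window with a 40-word overlap is an arbitrary example, and deduplication catches only exact duplicates after lowercasing and whitespace normalization. Near-duplicate detection usually needs a technique such as MinHash or SimHash, which is not shown. `Chunk`, `chunk_words`, and `build_chunks` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Chunk:
    text: str
    source_url: str   # for attribution
    category: str     # for filtering
    indexed_at: str   # ISO-8601 timestamp, useful for freshness checks
    content_hash: str


def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of up to max_words words, each overlapping the previous one."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    chunks: list[str] = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def build_chunks(text: str, source_url: str, category: str, seen_hashes: set[str]) -> list[Chunk]:
    now = datetime.now(timezone.utc).isoformat()
    result: list[Chunk] = []
    for piece in chunk_words(text):
        # Normalize case and whitespace so trivially different copies hash the same
        normalized = " ".join(piece.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of a chunk already indexed: skip it
        seen_hashes.add(digest)
        result.append(Chunk(piece, source_url, category, now, digest))
    return result
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk, at the cost of storing some text twice.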
Building a Knowledge Base with KnowledgeSDK
KnowledgeSDK is designed to make knowledge base construction trivial. Call POST /v1/extract with any URL, and KnowledgeSDK scrapes the page, extracts structured content, chunks it, embeds it, and stores it in your dedicated knowledge base — all in one API call:
```bash
# Index your documentation
curl -X POST https://api.knowledgesdk.com/v1/extract \
  -H "x-api-key: knowledgesdk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.yourapp.com/getting-started"}'

# Index a product page
curl -X POST https://api.knowledgesdk.com/v1/extract \
  -H "x-api-key: knowledgesdk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://yourapp.com/pricing"}'
```
Once indexed, your knowledge base is immediately searchable:
```bash
curl -X POST https://api.knowledgesdk.com/v1/search \
  -H "x-api-key: knowledgesdk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "what plan includes API access?"}'
```
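The same two calls can be made from Python using only the standard library. This sketch is built from the endpoints and the `x-api-key` header shown in the curl examples. The `Content-Type: application/json` header is an assumption about what the API expects, responses are returned as parsed JSON without assuming any particular fields, and the helper names `build_request`, `extract`, and `search` are hypothetical, not part of an official KnowledgeSDK client.

```python
import json
import urllib.request

BASE_URL = "https://api.knowledgesdk.com/v1"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    # JSON body plus the API key header used in the curl examples
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def _post(path: str, payload: dict, api_key: str) -> object:
    # urlopen raises urllib.error.HTTPError on non-2xx responses
    with urllib.request.urlopen(build_request(path, payload, api_key), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract(url: str, api_key: str) -> object:
    """Index one page (POST /v1/extract)."""
    return _post("/extract", {"url": url}, api_key)


def search(query: str, api_key: str) -> object:
    """Query the knowledge base (POST /v1/search)."""
    return _post("/search", {"query": query}, api_key)
```

A typical session calls `extract("https://docs.yourapp.com/getting-started", api_key)` once per source page and then `search("what plan includes API access?", api_key)`, with `api_key` read from an environment variable or secret store rather than hard-coded.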
Knowledge Base Maintenance
A knowledge base is not a one-time build — it requires ongoing maintenance:
- Re-index on content change — re-run POST /v1/extract when a source page is updated
- Prune stale items — remove knowledge items whose source pages no longer exist
- Monitor retrieval quality — log queries and retrieved chunks, identify gaps
- Expand coverage — add new sources as your product grows
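Two of these maintenance tasks, change detection with pruning and gap detection from query logs, can be sketched as pure functions. Assumptions: you keep a content hash per indexed URL from the last run, you can fetch the current text of every source page that still exists, and your query log records the similarity scores of the chunks retrieved for each query. The function names and the 0.5 threshold are hypothetical; a sensible threshold depends entirely on your retriever's score scale.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(indexed: dict[str, str], current_pages: dict[str, str]) -> dict[str, list[str]]:
    """indexed: url -> hash stored at last index; current_pages: url -> latest text of live pages."""
    reindex = [
        url for url, text in current_pages.items()
        if indexed.get(url) != content_hash(text)  # new page, or its content changed
    ]
    prune = [url for url in indexed if url not in current_pages]  # source page is gone
    return {"reindex": sorted(reindex), "prune": sorted(prune)}


def find_coverage_gaps(query_log: list[tuple[str, list[float]]], min_score: float = 0.5) -> list[str]:
    """Return queries whose best retrieved chunk scored below min_score, or that retrieved nothing."""
    return [query for query, scores in query_log if not scores or max(scores) < min_score]
```

URLs in `reindex` would go back through POST /v1/extract, and recurring queries from `find_coverage_gaps` point at sources worth adding.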
Knowledge Base Architecture Patterns
```
Sources                   Knowledge Base                   LLM
───────────────────────── ─────────────────────────        ───────
Docs site     → extract → Vector index (chunks) → search → context → response
Support KB    → extract → BM25 index (terms)
Product pages → extract → Metadata store (source, date)
```
Each API key in KnowledgeSDK gets its own isolated knowledge base — your data is never mixed with other users' collections.
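The architecture places a vector index and a BM25 index side by side, so their results have to be merged before they reach the LLM. The source does not say how KnowledgeSDK combines them; one common and simple option is reciprocal rank fusion (RRF), sketched below as a swapped-in technique. It uses only each retriever's rank order, so cosine similarities and BM25 scores never need to be put on the same scale. The constant `k = 60` is the value commonly used with RRF, not a KnowledgeSDK setting, and the chunk ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of chunk ids (each best-first) into one fused ranking."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns 1 / (k + rank) from every list it appears in
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["pricing-1", "pricing-2", "docs-4"]
bm25_hits = ["pricing-2", "faq-7", "pricing-1"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Here `pricing-2` ends up first because it ranks near the top of both lists, while `faq-7` and `docs-4`, each found by only one retriever, fall below the chunks both retrievers agree on.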