knowledgesdk.com/glossary/knowledge-base
RAG & Retrieval · beginner

Also known as: knowledge store, information base

Knowledge Base

A structured or unstructured collection of information that an AI system can query to answer questions or complete tasks.

What Is a Knowledge Base?

A knowledge base is a collection of information — documents, web pages, FAQs, API references, support tickets, product specs — that an AI system can search and retrieve from to answer questions or complete tasks.

In the context of RAG pipelines, a knowledge base is the external memory store that grounds LLM responses in factual, up-to-date, domain-specific content.
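As a rough illustration of what "grounding" means in practice, the sketch below assembles a prompt from passages that a retrieval step has already returned. The function name `build_grounded_prompt` and the passage fields `source` and `text` are hypothetical, chosen for this example only; they are not part of any specific API.

```python
def build_grounded_prompt(question: str, passages: list[dict]) -> str:
    """Build a prompt that asks the model to answer only from retrieved passages.

    Each passage is assumed (for this sketch) to be a dict with
    'source' and 'text' keys.
    """
    context_lines = [
        f"[{i}] ({p['source']}) {p['text']}"
        for i, p in enumerate(passages, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Illustrative passage; the text is made up for the example.
passages = [
    {"source": "https://yourapp.com/pricing", "text": "The Pro plan includes API access."},
]
prompt = build_grounded_prompt("What plan includes API access?", passages)
```

Numbering the passages and carrying the source alongside each one makes it possible for the model's answer to cite where a statement came from.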

Knowledge Base vs Vector Database vs Document Store

These terms are related but distinct:

Term              Meaning
───────────────   ─────────────────────────────────────────────────────────────────────
Knowledge base    The logical collection of information (the "what")
Vector database   A technical storage layer optimized for similarity search (the "how")
Document store    A storage layer for raw text/files (e.g., S3, MongoDB)

A knowledge base is often implemented using a combination of a document store (raw content) and a vector database (embeddings for search).
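A minimal in-memory sketch of that combination follows. The `KnowledgeBase` class and its methods are hypothetical, and the "embedding" is a toy word-count vector standing in for a learned embedding model, so treat this as an illustration of the two-layer layout rather than a production design.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KnowledgeBase:
    def __init__(self):
        self.documents = {}  # document store: id -> raw text
        self.vectors = {}    # vector index: id -> embedding

    def add(self, doc_id: str, text: str) -> None:
        self.documents[doc_id] = text
        self.vectors[doc_id] = embed(text)

    def search(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        """Rank by similarity in the vector index, then return raw text from the document store."""
        q = embed(query)
        ranked = sorted(self.vectors, key=lambda d: cosine(q, self.vectors[d]), reverse=True)
        return [(d, self.documents[d]) for d in ranked[:k]]


kb = KnowledgeBase()
kb.add("pricing", "the pro plan includes api access and priority support")
kb.add("setup", "install the sdk and set your api key to get started")
top = kb.search("which plan includes api access", k=1)
```

The search runs against the vectors, but what comes back to the caller is the raw text, which is why the two layers are usually kept side by side.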

Types of Knowledge Bases

Unstructured

Free-form text: documentation pages, blog posts, support articles, PDFs, email threads. Most knowledge bases in RAG systems are unstructured.

Structured

Tables, databases, CSVs. Queried with SQL or structured APIs. Less common in RAG but increasingly supported (e.g., text-to-SQL pipelines).
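To make the structured case concrete, here is a small sketch using Python's built-in `sqlite3` module. The `plans` table and its rows are invented for illustration, and the SQL string is written by hand; in a text-to-SQL pipeline a model would be expected to generate a query like it from the user's question.

```python
import sqlite3

# In-memory database with made-up example data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plans (name TEXT, monthly_price_usd INTEGER, api_access INTEGER)"
)
conn.executemany(
    "INSERT INTO plans VALUES (?, ?, ?)",
    [("Free", 0, 0), ("Pro", 29, 1), ("Enterprise", 299, 1)],
)

# Question: "What plan includes API access?"
# Hand-written stand-in for the query a text-to-SQL step might produce.
sql = "SELECT name FROM plans WHERE api_access = 1 ORDER BY monthly_price_usd"
plans_with_api = [row[0] for row in conn.execute(sql)]
```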

Hybrid

A combination — structured product data enriched with unstructured descriptions and FAQs.

What Goes Into a Good AI Knowledge Base

  • Coverage — all information a user might ask about should be present
  • Freshness — content should be updated when source documents change
  • Granularity — chunks should be appropriately sized (not entire pages, not single sentences)
  • Metadata — each item should carry source URL, category, and timestamp for filtering and attribution
  • Deduplication — duplicate or near-duplicate content wastes index space and dilutes retrieval
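The granularity, metadata, and deduplication points above can be sketched together. The helper names, the 500-character default, and the item fields below are assumptions made for this example; the hash-based check only catches duplicates that are identical after case and whitespace normalization, not true near-duplicates.

```python
import hashlib
from datetime import datetime, timezone


def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Group paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def fingerprint(text: str) -> str:
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_items(text, source_url, category, seen=None, max_chars=500):
    """Turn page text into knowledge items, skipping chunks already seen."""
    seen = set() if seen is None else seen
    items = []
    for chunk in chunk_paragraphs(text, max_chars):
        fp = fingerprint(chunk)
        if fp in seen:
            continue  # duplicate content: do not index it twice
        seen.add(fp)
        items.append({
            "content": chunk,
            "source_url": source_url,
            "category": category,
            "indexed_at": datetime.now(timezone.utc).isoformat(),
        })
    return items
```

Passing the same `seen` set across pages lets the check work across the whole knowledge base rather than within one page.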

Building a Knowledge Base with KnowledgeSDK

KnowledgeSDK reduces knowledge base construction to a single request. Call POST /v1/extract with a URL, and KnowledgeSDK scrapes the page, extracts structured content, chunks it, embeds it, and stores it in your dedicated knowledge base:

# Index your documentation
# (curl -d sends form-encoded data by default, so the JSON content type is set explicitly)
curl -X POST https://api.knowledgesdk.com/v1/extract \
  -H "x-api-key: knowledgesdk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.yourapp.com/getting-started"}'

# Index a product page
curl -X POST https://api.knowledgesdk.com/v1/extract \
  -H "x-api-key: knowledgesdk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://yourapp.com/pricing"}'

Once indexed, your knowledge base is immediately searchable:

curl -X POST https://api.knowledgesdk.com/v1/search \
  -H "x-api-key: knowledgesdk_live_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "what plan includes API access?"}'
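The same two calls can be issued from Python with only the standard library. This is a sketch under assumptions: the endpoints, header name, and request bodies come from the curl examples, while the helper names `build_request` and `call` are hypothetical and the response is assumed to be JSON, since its shape is not described here.

```python
import json
import urllib.request

API_BASE = "https://api.knowledgesdk.com/v1"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request for an endpoint such as /extract or /search."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: dict, api_key: str, timeout: float = 30.0):
    """Send the request and decode the body (assumed, not confirmed, to be JSON)."""
    req = build_request(path, payload, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs real network calls, so it is left commented out):
# call("/extract", {"url": "https://yourapp.com/pricing"}, api_key)
# call("/search", {"query": "what plan includes API access?"}, api_key)
```

Separating request construction from sending keeps the request easy to inspect or log before anything goes over the network.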

Knowledge Base Maintenance

A knowledge base is not a one-time build — it requires ongoing maintenance:

  • Re-index on content change — re-run POST /v1/extract when a source page is updated
  • Prune stale items — remove knowledge items whose source pages no longer exist
  • Monitor retrieval quality — log queries and retrieved chunks, identify gaps
  • Expand coverage — add new sources as your product grows
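One way to decide what to re-index and what to prune is to compare a stored content hash per source URL with the page as it looks now. The sketch below assumes you keep that hash yourself at indexing time; the function `plan_maintenance` and its input shapes are hypothetical, and how a stale item is actually removed depends on the API and is not shown.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable hash of page text, stored when the page is indexed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_maintenance(indexed: dict, current: dict) -> dict:
    """Compare the last-indexed state with the current state of each source.

    indexed: url -> content hash recorded at the last index run
    current: url -> page text fetched now, or None if the page no longer exists
    """
    to_reindex, to_prune = [], []
    for url, old_hash in indexed.items():
        text = current.get(url)
        if text is None:
            to_prune.append(url)      # source page is gone
        elif content_hash(text) != old_hash:
            to_reindex.append(url)    # source page changed
    to_add = [u for u, t in current.items() if u not in indexed and t is not None]
    return {
        "reindex": sorted(to_reindex),
        "prune": sorted(to_prune),
        "add": sorted(to_add),
    }


indexed = {
    "https://yourapp.com/pricing": content_hash("Pro: $29"),
    "https://yourapp.com/old-page": content_hash("Legacy"),
}
current = {
    "https://yourapp.com/pricing": "Pro: $39",   # changed
    "https://yourapp.com/old-page": None,        # removed
    "https://yourapp.com/changelog": "v2 notes", # new source
}
plan = plan_maintenance(indexed, current)
```

URLs in the `reindex` and `add` lists are the ones to send to POST /v1/extract again.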

Knowledge Base Architecture Patterns

Sources                     Knowledge Base              LLM
─────────────────────────   ─────────────────────────   ───────
Docs site  → extract →      Vector index (chunks)    → search → context → response
Support KB → extract →      BM25 index (terms)
Product pages → extract →   Metadata store (source, date)
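The diagram shows a vector index and a BM25 index side by side but does not say how their results are combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of how KnowledgeSDK merges results; the document ids and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one.

    Each document scores 1 / (k + rank) per list it appears in;
    ties are broken alphabetically by id for a deterministic order.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["pricing", "faq", "setup"]      # ranked by embedding similarity
bm25_hits = ["pricing", "changelog", "faq"]    # ranked by term matching
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Because RRF uses only ranks, it avoids having to put cosine similarities and BM25 scores on a common scale.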

Each API key in KnowledgeSDK gets its own isolated knowledge base — your data is never mixed with other users' collections.

Related Terms

RAG & Retrieval · beginner
Retrieval-Augmented Generation
A technique that grounds LLM responses by retrieving relevant documents from an external knowledge base before generation.

RAG & Retrieval · beginner
Vector Database
A specialized database that stores high-dimensional embedding vectors and enables fast similarity search.

Knowledge & Memory · beginner
Knowledge Item
A single indexed unit of knowledge (a document chunk with title, content, category, and embedding) stored in KnowledgeSDK's search index.

RAG & Retrieval · beginner
Indexing
The process of transforming raw content into a searchable structure (embeddings, inverted indexes, or graph nodes) that enables fast retrieval.