Knowledge & Memory (beginner)

Also known as: doc store, document database

Document Store

A database that stores semi-structured or unstructured documents (JSON, markdown, text) and supports retrieval by ID or metadata filters.

What Is a Document Store?

A document store (or document database) is a type of NoSQL database that stores data as self-describing documents — typically JSON, BSON, or XML — rather than as rows in fixed-schema tables. Each document can have a different structure, making document stores ideal for heterogeneous data like articles, product descriptions, knowledge items, or configuration records.

In AI and RAG pipelines, document stores play a dual role: as the primary storage backend for raw content before it is embedded, and as a retrieval layer that supports filtering by metadata (author, date, category, source URL) to narrow down candidates before or after vector search.

How Document Stores Work

Each document is stored in a collection under a unique ID (its key), and the value is the document body, usually a JSON object or a text blob. Retrieval works in several ways:

  • By ID: Fetch a specific document directly — GET /documents/{id}.
  • By query: Filter documents using field-based predicates, for example in MongoDB-style syntax: { category: "legal", date: { $gte: "2024-01-01" } }.
  • By full-text search: Many document stores include full-text indexing (inverted index) for keyword matching.
  • By vector similarity (hybrid stores): Modern document databases like MongoDB Atlas and Elasticsearch support storing embeddings alongside documents and running ANN queries.
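To make the first three retrieval modes concrete, here is a minimal in-memory sketch in Python. It is not how MongoDB or Elasticsearch are implemented: the class and method names are hypothetical, the query matcher understands only exact matches plus a MongoDB-style `$gte`, and full-text search is a bare inverted index with AND semantics and no ranking.

```python
import re
from collections import defaultdict
from typing import Any, Dict, List, Optional, Set


def tokenize(text: str) -> List[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InMemoryDocumentStore:
    """Hypothetical toy store for illustration, not a real library API."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}              # id -> document body
        self.inverted: Dict[str, Set[str]] = defaultdict(set)  # term -> ids

    def put(self, doc: Dict[str, Any]) -> None:
        # Insert or replace; documents do not need to share a schema.
        self.delete(doc["id"])  # remove stale index entries first
        self.docs[doc["id"]] = doc
        for term in tokenize(doc.get("content", "")):
            self.inverted[term].add(doc["id"])

    def delete(self, doc_id: str) -> None:
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for term in tokenize(old.get("content", "")):
                self.inverted[term].discard(doc_id)

    def get(self, doc_id: str) -> Optional[Dict[str, Any]]:
        # By ID: a direct key lookup.
        return self.docs.get(doc_id)

    def find(self, query: Dict[str, Any]) -> List[Dict[str, Any]]:
        # By query: exact-match fields plus the {"$gte": value} operator only.
        def matches(doc: Dict[str, Any]) -> bool:
            for field, cond in query.items():
                value = doc.get(field)
                if isinstance(cond, dict) and "$gte" in cond:
                    if value is None or value < cond["$gte"]:
                        return False
                elif value != cond:
                    return False
            return True

        return [d for d in self.docs.values() if matches(d)]

    def search(self, text: str) -> List[str]:
        # By full-text search: IDs of documents containing every query term.
        terms = tokenize(text)
        if not terms:
            return []
        hits = set.intersection(*(self.inverted.get(t, set()) for t in terms))
        return sorted(hits)


store = InMemoryDocumentStore()
store.put({"id": "a1", "category": "legal", "date": "2024-03-01",
           "content": "Contract termination clauses"})
store.put({"id": "a2", "category": "legal", "date": "2023-11-15",
           "content": "Termination of employment"})
store.put({"id": "b1", "category": "finance", "tags": ["q1"],
           "content": "Quarterly revenue report"})

print(store.get("b1")["category"])  # finance
print([d["id"] for d in store.find({"category": "legal", "date": {"$gte": "2024-01-01"}})])  # ['a1']
print(store.search("termination"))  # ['a1', 'a2']
```

ISO 8601 date strings sort correctly as plain strings, which is why the `$gte` check works here without parsing dates. Document `b1` has no `date` field and an extra `tags` field, and the store accepts it unchanged.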

Popular Document Stores

  • MongoDB: The most widely used general-purpose document store. Supports rich queries, aggregation pipelines, and Atlas Vector Search.
  • Elasticsearch / OpenSearch: Optimized for full-text search and widely used for log analytics and knowledge retrieval. Both now also support vector search.
  • Firestore: Google's managed document store, popular in mobile and web apps.
  • CouchDB: Open-source, designed for offline-first and replication scenarios.
  • DynamoDB: AWS's managed key-value and document store, optimized for high throughput at scale.

Document Stores vs. Vector Databases

Feature        | Document Store             | Vector Database
---------------|----------------------------|------------------------------------------------------
Primary index  | Metadata / full-text       | Embedding vectors
Query type     | Exact / range / keyword    | Approximate nearest neighbor
Filtering      | Rich metadata filters      | Supported, often less expressive (varies by system)
Best for       | Structured retrieval, CRUD | Semantic similarity search
Examples       | MongoDB, Elasticsearch     | Pinecone, Weaviate, Qdrant

In production RAG systems, document stores and vector databases are often used together: the vector database finds semantically similar candidates, and the document store holds and serves the full content of those candidates.
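The division of labor can be sketched in a few lines of Python. The embeddings are made-up 3-dimensional vectors, `vector_index`, `document_store`, and `retrieve` are hypothetical names, and exact brute-force cosine similarity stands in for the approximate nearest neighbor (ANN) index a real vector database would use. The optional `category` argument shows metadata filtering applied after the vector search.

```python
import math
from typing import Dict, List, Optional

# Hypothetical precomputed embeddings, keyed by document ID.
vector_index: Dict[str, List[float]] = {
    "doc-1": [0.9, 0.1, 0.0],
    "doc-2": [0.8, 0.2, 0.1],
    "doc-3": [0.0, 0.1, 0.9],
}

# Document store: full content and metadata under the same IDs.
document_store: Dict[str, dict] = {
    "doc-1": {"id": "doc-1", "category": "legal", "content": "Termination clauses in supply contracts."},
    "doc-2": {"id": "doc-2", "category": "finance", "content": "Penalties for early contract termination."},
    "doc-3": {"id": "doc-3", "category": "legal", "content": "Trademark registration steps."},
}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec: List[float], k: int) -> List[str]:
    # Exact k-nearest-neighbor search; a vector database would use an ANN index.
    ranked = sorted(vector_index, key=lambda i: cosine(query_vec, vector_index[i]), reverse=True)
    return ranked[:k]


def retrieve(query_vec: List[float], k: int = 2, category: Optional[str] = None) -> List[dict]:
    ids = vector_search(query_vec, k)                                # semantic candidates
    docs = [document_store[i] for i in ids if i in document_store]  # fetch full content
    if category is not None:                                         # metadata post-filter
        docs = [d for d in docs if d["category"] == category]
    return docs


query = [1.0, 0.0, 0.0]
print([d["id"] for d in retrieve(query)])                    # ['doc-1', 'doc-2']
print([d["id"] for d in retrieve(query, category="legal")])  # ['doc-1']
```

Filtering after the vector search can leave fewer than `k` results, as the second call shows. Filtering before or during the vector search avoids that, at the cost of making the vector index aware of the metadata; which approach a given system uses varies.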

Document Stores in Knowledge Pipelines

A typical knowledge pipeline might use a document store as follows:

  1. Raw content is scraped and parsed into structured documents with fields: id, title, content, source_url, category, extracted_at.
  2. Documents are stored in MongoDB or a similar system.
  3. Documents are also embedded and indexed in a vector store.
  4. At query time, vector search finds candidate document IDs, which are then fetched from the document store to get full content.
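Steps 1 and 2, plus the decision of when step 3 needs to run again, might look like the Python sketch below. The `make_document` and `upsert` helpers and the choice to derive the ID from a SHA-256 hash of `source_url` are assumptions for illustration, not part of any particular product. The idea is that a stable ID lets a re-scraped page overwrite its old record instead of creating a duplicate, and that only changed content needs re-embedding.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict


def make_document(source_url: str, title: str, content: str, category: str) -> Dict[str, str]:
    # Step 1: a structured document with the fields listed in step 1.
    # Hypothetical ID scheme: a stable hash of the source URL.
    doc_id = hashlib.sha256(source_url.encode("utf-8")).hexdigest()[:16]
    return {
        "id": doc_id,
        "title": title,
        "content": content,
        "source_url": source_url,
        "category": category,
        "extracted_at": datetime.now(timezone.utc).isoformat(),
    }


def upsert(store: Dict[str, Dict[str, str]], doc: Dict[str, str]) -> bool:
    # Step 2: insert or replace by ID. Returns True when the content is new or
    # changed, i.e. when the document should be (re-)embedded in step 3.
    old = store.get(doc["id"])
    store[doc["id"]] = doc
    return old is None or old["content"] != doc["content"]


store: Dict[str, Dict[str, str]] = {}
first = make_document("https://example.com/pricing", "Pricing", "Plans start at $10.", "product")
print(upsert(store, first))   # True: new document
again = make_document("https://example.com/pricing", "Pricing", "Plans start at $10.", "product")
print(upsert(store, again))   # False: same ID, unchanged content
edited = make_document("https://example.com/pricing", "Pricing", "Plans start at $12.", "product")
print(upsert(store, edited))  # True: content changed
print(len(store))             # 1
```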

KnowledgeSDK abstracts this pattern — the /v1/search endpoint handles semantic retrieval, while the underlying knowledge item store persists the structured document data, so you do not need to manage the storage layer separately.

When to Choose a Document Store

  • Your data is heterogeneous in structure.
  • You need filtering by metadata fields alongside text search.
  • You want the simplicity of storing and retrieving records without a strict schema.
  • You are building a knowledge base that will be updated frequently with new documents.

Related Terms

  • Knowledge Base (RAG & Retrieval, beginner): A structured or unstructured collection of information that an AI system can query to answer questions or complete tasks.
  • Vector Database (RAG & Retrieval, beginner): A specialized database that stores high-dimensional embedding vectors and enables fast similarity search.
  • Indexing (RAG & Retrieval, beginner): The process of transforming raw content into a searchable structure (embeddings, inverted indexes, or graph nodes) that enables fast retrieval.
