What Is Dense Retrieval?
Dense retrieval is a family of information retrieval techniques where both queries and documents are encoded as dense, continuous vectors (embeddings) by a neural network. Retrieval is performed by finding the document vectors nearest to the query vector in the embedding space.
The term "dense" contrasts with "sparse" retrieval (like BM25), where documents are represented as high-dimensional but mostly-zero term-frequency vectors.
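A toy contrast between the two representations (the vocabulary, document, and dense values below are made up purely for illustration):

```python
# Illustrative only: a sparse term-frequency vector is mostly zeros,
# while a dense embedding has a meaningful value in every dimension.
vocab = ["refund", "window", "billing", "support", "days",
         "request", "contact", "issues", "policy", "shipping"]
doc = "you have 30 days to request a refund"

sparse_vec = [doc.split().count(term) for term in vocab]  # mostly zeros
dense_vec = [0.23, -0.71, 0.44, 0.08, -0.19]              # made-up floats

nonzero_sparse = sum(1 for x in sparse_vec if x != 0)  # only matching terms
nonzero_dense = sum(1 for x in dense_vec if x != 0)    # every dimension
```

In real systems the sparse vector spans the full corpus vocabulary (tens of thousands of dimensions), making it even sparser than this toy case.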
Origins: Dense Passage Retrieval (DPR)
Dense retrieval was popularized by the Dense Passage Retrieval (DPR) paper from Facebook AI (2020). DPR used two separate BERT encoders — one for queries, one for passages — trained on question-answer pairs so that relevant passages would land close to their questions in vector space.
This approach dramatically outperformed BM25 on open-domain question answering benchmarks.
How Dense Retrieval Works
```
Query: "what is the refund window?"
        ↓ Query Encoder (bi-encoder)
Query Vector: [0.23, -0.71, 0.44, ...]   (768 dims)

Indexed Passages:
"You have 30 days to request a refund."  → [0.25, -0.69, 0.41, ...]
"Contact support for billing issues."    → [-0.12, 0.33, -0.55, ...]

        ↓ ANN Search
Top result: "You have 30 days to request a refund."
```
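The flow above can be sketched as a brute-force nearest-neighbor search; the 3-dimensional vectors are the illustrative prefixes from the diagram, not real embeddings, and a production system would use an ANN index rather than scanning every passage:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query_vec = [0.23, -0.71, 0.44]
passages = {
    "You have 30 days to request a refund.": [0.25, -0.69, 0.41],
    "Contact support for billing issues.": [-0.12, 0.33, -0.55],
}

# Brute-force scan: score every passage, keep the closest one.
best = max(passages, key=lambda p: cosine(query_vec, passages[p]))
```

With millions of passages this linear scan is replaced by an approximate index (HNSW, IVF) that trades a little recall for sub-linear query time.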
Bi-Encoder Architecture
Dense retrieval uses a bi-encoder (also called a dual encoder):
- The query encoder and document encoder share weights or are trained jointly
- Encoding is done independently, so document vectors can be pre-computed offline
- At query time, only the query needs to be encoded (fast)
- Similarity is computed as the dot product or cosine similarity between the two vectors
This offline pre-computation is what makes dense retrieval practical at scale.
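A minimal sketch of that serving pattern, with a hypothetical character-frequency `encode` standing in for a real neural encoder (the vectors are L2-normalized so the dot product equals cosine similarity):

```python
import math

def encode(text):
    # Hypothetical stand-in encoder: character-frequency features,
    # L2-normalized so dot product == cosine similarity.
    vec = [text.lower().count(c) for c in "abcdefghij"]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Offline: encode the corpus once and keep the vectors around.
corpus = ["You have 30 days to request a refund.",
          "Contact support for billing issues."]
index = [(doc, encode(doc)) for doc in corpus]

# Online: only the query is encoded at request time.
def search(query):
    q = encode(query)
    return max(index, key=lambda item: sum(a * b for a, b in zip(q, item[1])))[0]
```

The expensive step (encoding documents) happens once at index time; each query costs one encoder forward pass plus a vector search.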
Training Dense Retrievers
A dense retriever needs training (or fine-tuning) to be effective. Common training signals:
- Positive pairs — (question, relevant passage) from QA datasets
- Hard negatives — passages that look relevant but are not (improves discriminability)
- In-batch negatives — other questions' passages in the same training batch serve as negatives (computationally efficient)
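The in-batch-negative objective above is typically a softmax cross-entropy over the batch's similarity scores; a minimal sketch (plain Python rather than a real training framework) where `passages[i]` is the positive for `queries[i]` and all other passages in the batch act as negatives:

```python
import math

def in_batch_loss(queries, passages):
    """Softmax cross-entropy where passages[i] is the positive for
    queries[i]; every other passage in the batch is a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        # Dot-product scores of this query against every passage in the batch.
        scores = [sum(a * b for a, b in zip(q, p)) for p in passages]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total += -math.log(exps[i] / sum(exps))
    return total / len(queries)

# Matched query/passage pairs yield a lower loss than mismatched ones.
aligned = in_batch_loss([[1.0, 0.0], [0.0, 1.0]],
                        [[1.0, 0.0], [0.0, 1.0]])
shuffled = in_batch_loss([[1.0, 0.0], [0.0, 1.0]],
                         [[0.0, 1.0], [1.0, 0.0]])
```

Training minimizes this loss, pulling each question toward its relevant passage and pushing it away from the rest of the batch.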
Pre-trained models like sentence-transformers/all-mpnet-base-v2 or OpenAI's embedding models can be used without task-specific fine-tuning for general-purpose retrieval.
Dense vs Sparse Retrieval
| | Dense | Sparse |
|---|---|---|
| Representation | Continuous float vector | Sparse term-frequency vector |
| Index type | HNSW / IVF | Inverted index |
| Handles synonyms | Yes | No |
| Handles exact terms | Poorly | Perfectly |
| Requires training data | Yes (for best results) | No |
| Query latency | ~5–20ms (ANN) | ~1–5ms |
Dense Retrieval Limitations
- Exact-term recall suffers even though vocabulary mismatch is mitigated: product codes, version numbers, and proper nouns may not retrieve well
- Requires a good embedding model — quality degrades significantly with weak encoders
- Changed documents must be re-encoded and re-indexed; there are no dynamically updated term statistics as in BM25
For these reasons, dense retrieval is almost always combined with sparse retrieval in production (see hybrid search).
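One common way to merge the two result lists is reciprocal rank fusion (RRF); this sketch assumes RRF, which the section itself does not name, and uses only each document's rank in each list:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: score each doc by sum of 1/(k + rank)
    across the input rankings (e.g. one dense list, one BM25 list)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranked = ["d2", "d1", "d3"]  # hypothetical dense-retrieval order
bm25_ranked = ["d2", "d3", "d1"]   # hypothetical BM25 order
fused = rrf([dense_ranked, bm25_ranked])
```

Because RRF uses ranks rather than raw scores, it sidesteps the problem that dense similarities and BM25 scores live on incomparable scales.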
Dense Retrieval with KnowledgeSDK
KnowledgeSDK uses dense retrieval as one component of its hybrid search pipeline. When you call POST /v1/extract, each chunk is encoded and stored in a Typesense vector field. The POST /v1/search endpoint automatically performs dense retrieval using the query embedding alongside BM25 scoring.