What Is Dense Retrieval?
Dense retrieval is a family of information retrieval techniques where both queries and documents are encoded as dense, continuous vectors (embeddings) by a neural network. Retrieval is performed by finding the document vectors nearest to the query vector in the embedding space.
The term "dense" contrasts with "sparse" retrieval (like BM25), where documents are represented as high-dimensional but mostly-zero term-frequency vectors.
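A toy contrast between the two representations (the vocabulary, document, and dense values below are made up purely for illustration):

```python
# Illustrative only: a sparse term-frequency vector is mostly zeros,
# while a dense embedding has a meaningful value in every dimension.
vocab = ["refund", "window", "billing", "support", "days",
         "request", "contact", "issues", "policy", "shipping"]
doc = "you have 30 days to request a refund"

sparse_vec = [doc.split().count(term) for term in vocab]  # mostly zeros
dense_vec = [0.23, -0.71, 0.44, 0.08, -0.19]              # made-up floats

nonzero_sparse = sum(1 for x in sparse_vec if x != 0)  # only matching terms
nonzero_dense = sum(1 for x in dense_vec if x != 0)    # every dimension
```

In real systems the sparse vector spans the full corpus vocabulary (tens of thousands of dimensions), making it even sparser than this toy case.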
Origins: Dense Passage Retrieval (DPR)
Dense retrieval was popularized by the Dense Passage Retrieval (DPR) paper from Facebook AI (2020). DPR used two separate BERT encoders — one for queries, one for passages — trained on question-answer pairs so that relevant passages would land close to their questions in vector space.
This approach dramatically outperformed BM25 on open-domain question answering benchmarks.
How Dense Retrieval Works
```
Query: "what is the refund window?"
        ↓ Query Encoder (bi-encoder)
Query Vector: [0.23, -0.71, 0.44, ...]   (768 dims)

Indexed Passages:
"You have 30 days to request a refund."  → [0.25, -0.69, 0.41, ...]
"Contact support for billing issues."    → [-0.12, 0.33, -0.55, ...]

        ↓ ANN Search
Top result: "You have 30 days to request a refund."
```
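The flow above can be sketched as a brute-force nearest-neighbor search; the 3-dimensional vectors are the illustrative prefixes from the diagram, not real embeddings, and a production system would use an ANN index rather than scanning every passage:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query_vec = [0.23, -0.71, 0.44]
passages = {
    "You have 30 days to request a refund.": [0.25, -0.69, 0.41],
    "Contact support for billing issues.": [-0.12, 0.33, -0.55],
}

# Brute-force scan: score every passage, keep the closest one.
best = max(passages, key=lambda p: cosine(query_vec, passages[p]))
```

With millions of passages this linear scan is replaced by an approximate index (HNSW, IVF) that trades a little recall for sub-linear query time.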
Bi-Encoder Architecture
Dense retrieval uses a bi-encoder (also called a dual encoder):
- The query encoder and document encoder share weights or are trained jointly
- Encoding is done independently, so document vectors can be pre-computed offline
- At query time, only the query needs to be encoded (fast)
- Similarity is computed as the dot product or cosine similarity between the two vectors
This offline pre-computation is what makes dense retrieval practical at scale.
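A minimal sketch of that serving pattern, with a hypothetical character-frequency `encode` standing in for a real neural encoder (the vectors are L2-normalized so the dot product equals cosine similarity):

```python
import math

def encode(text):
    # Hypothetical stand-in encoder: character-frequency features,
    # L2-normalized so dot product == cosine similarity.
    vec = [text.lower().count(c) for c in "abcdefghij"]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Offline: encode the corpus once and keep the vectors around.
corpus = ["You have 30 days to request a refund.",
          "Contact support for billing issues."]
index = [(doc, encode(doc)) for doc in corpus]

# Online: only the query is encoded at request time.
def search(query):
    q = encode(query)
    return max(index, key=lambda item: sum(a * b for a, b in zip(q, item[1])))[0]
```

The expensive step (encoding documents) happens once at index time; each query costs one encoder forward pass plus a vector search.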
Training Dense Retrievers
A dense retriever needs training (or fine-tuning) to be effective. Common training signals:
- Positive pairs — (question, relevant passage) from QA datasets
- Hard negatives — passages that look relevant but are not (improves discriminability)
- In-batch negatives — other questions' passages in the same training batch serve as negatives (computationally efficient)
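The in-batch-negative objective above is typically a softmax cross-entropy over the batch's similarity scores; a minimal sketch (plain Python rather than a real training framework) where `passages[i]` is the positive for `queries[i]` and all other passages in the batch act as negatives:

```python
import math

def in_batch_loss(queries, passages):
    """Softmax cross-entropy where passages[i] is the positive for
    queries[i]; every other passage in the batch is a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        # Dot-product scores of this query against every passage in the batch.
        scores = [sum(a * b for a, b in zip(q, p)) for p in passages]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total += -math.log(exps[i] / sum(exps))
    return total / len(queries)

# Matched query/passage pairs yield a lower loss than mismatched ones.
aligned = in_batch_loss([[1.0, 0.0], [0.0, 1.0]],
                        [[1.0, 0.0], [0.0, 1.0]])
shuffled = in_batch_loss([[1.0, 0.0], [0.0, 1.0]],
                         [[0.0, 1.0], [1.0, 0.0]])
```

Training minimizes this loss, pulling each question toward its relevant passage and pushing it away from the rest of the batch.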
Pre-trained models like sentence-transformers/all-mpnet-base-v2 or OpenAI's embedding models can be used without task-specific fine-tuning for general-purpose retrieval.
Dense vs Sparse Retrieval
| | Dense | Sparse |
|---|---|---|
| Representation | Continuous float vector | Sparse term-frequency vector |
| Index type | HNSW / IVF | Inverted index |
| Handles synonyms | Yes | No |
| Handles exact terms | Poorly | Perfectly |
| Requires training data | Yes (for best results) | No |
| Query latency | ~5–20ms (ANN) | ~1–5ms |
Dense Retrieval Limitations
- Exact-term recall suffers even though vocabulary mismatch is mitigated: product codes, version numbers, and proper nouns may not retrieve well
- Requires a good embedding model — quality degrades significantly with weak encoders
- Changed documents must be re-encoded and re-indexed; there are no dynamically updated term statistics as in BM25
For these reasons, dense retrieval is almost always combined with sparse retrieval in production (see hybrid search).
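One common way to merge the two result lists is reciprocal rank fusion (RRF); this sketch assumes RRF, which the section itself does not name, and uses only each document's rank in each list:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: score each doc by sum of 1/(k + rank)
    across the input rankings (e.g. one dense list, one BM25 list)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranked = ["d2", "d1", "d3"]  # hypothetical dense-retrieval order
bm25_ranked = ["d2", "d3", "d1"]   # hypothetical BM25 order
fused = rrf([dense_ranked, bm25_ranked])
```

Because RRF uses ranks rather than raw scores, it sidesteps the problem that dense similarities and BM25 scores live on incomparable scales.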
Dense Retrieval with KnowledgeSDK
KnowledgeSDK uses dense retrieval as one component of its hybrid search pipeline. When you call POST /v1/extract, each chunk is encoded and stored in a Typesense vector field. The POST /v1/search endpoint automatically performs dense retrieval using the query embedding alongside BM25 scoring.