What Is Recall?
Recall is one of the two fundamental metrics in information retrieval, alongside precision. It measures the completeness of a retrieval system: out of all the documents in the corpus that are actually relevant to a query, what fraction did the system successfully return?
Formula:
Recall = True Positives / (True Positives + False Negatives)
= Retrieved Relevant / All Relevant
A recall of 1.0 means the system returned every relevant document. A recall of 0.5 means it missed half of them.
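The formula above can be sketched in a few lines of Python. The document IDs are hypothetical, chosen only to illustrate the arithmetic:

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of all relevant documents that the system returned."""
    if not relevant:
        return 0.0
    # True positives are the relevant documents that were actually retrieved.
    return len(retrieved & relevant) / len(relevant)

relevant = {"d1", "d2", "d3", "d4"}   # ground-truth relevant documents
retrieved = {"d1", "d3", "d9"}        # what the system returned

print(recall(retrieved, relevant))    # 2 of 4 relevant found -> 0.5
```

Note that the irrelevant document `d9` does not affect recall at all; it only hurts precision.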
Recall vs. Precision: The Trade-off
Recall and precision are inversely related in most retrieval systems. To maximize recall, you retrieve more documents — but this typically reduces precision because you also retrieve more irrelevant ones. To maximize precision, you retrieve fewer, higher-confidence documents — but this risks missing relevant ones.
| Scenario | Precision | Recall |
|---|---|---|
| Return only the single best document | High | Low |
| Return all documents in the corpus | Low | High |
| Balanced retrieval (top-K) | Moderate | Moderate |
The right balance depends on the application. A medical diagnosis assistant should favor high recall (do not miss relevant symptoms or contraindications). A customer-facing Q&A bot should favor high precision (do not surface irrelevant or confusing answers).
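The trade-off in the table can be made concrete by sweeping K over a single ranked list. The ranking and relevance labels below are hypothetical:

```python
def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    return len(set(retrieved) & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant documents that were retrieved."""
    return len(set(retrieved) & relevant) / len(relevant)

# Ranked output of a hypothetical retriever; d1, d2, d3 are truly relevant.
ranking = ["d1", "d7", "d2", "d8", "d9", "d3"]
relevant = {"d1", "d2", "d3"}

for k in (1, 3, 6):
    top_k = ranking[:k]
    print(f"k={k}: precision={precision(top_k, relevant):.2f}, "
          f"recall={recall(top_k, relevant):.2f}")
```

At k=1 precision is perfect but recall is 0.33; at k=6 recall reaches 1.0 while precision falls to 0.50, mirroring the table's first two rows.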
Recall in RAG Pipelines
In Retrieval-Augmented Generation, recall is especially important at the retrieval stage. If the retrieval step fails to return a document containing the answer, the LLM cannot produce a correct response — no matter how capable it is. This is known as a retrieval miss.
Common causes of low recall in RAG:
- Vocabulary mismatch: The query uses different terms than the document ("cardiac arrest" vs. "heart attack"). Hybrid search (keyword + semantic) mitigates this.
- Insufficient chunk overlap: If documents are chunked too aggressively, relevant context may span two chunks, and neither chunk alone matches the query well enough to be retrieved.
- Low `k` value: Setting the retrieval limit too low means some relevant documents fall outside the returned set.
- Poor embedding quality: If the embedding model does not capture the query's semantic intent, relevant documents receive low similarity scores.
How to Measure Recall
To measure recall, you need a ground truth dataset: a set of queries paired with the list of known relevant documents. Recall is then computed by checking how many of those relevant documents appear in the system's top-K results.
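The measurement procedure described above amounts to computing recall@K per query and averaging over the ground truth set. A minimal sketch, with hypothetical queries, document IDs, and system outputs:

```python
# Hypothetical ground truth: query -> set of known relevant documents.
ground_truth = {
    "q1": {"d1", "d2"},
    "q2": {"d3", "d4", "d5"},
}

# Hypothetical system output: query -> ranked list of retrieved documents.
results = {
    "q1": ["d1", "d9", "d2"],
    "q2": ["d3", "d8", "d7"],
}

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents appearing in the top-K results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

k = 3
scores = [recall_at_k(results[q], rel, k) for q, rel in ground_truth.items()]
print(f"mean recall@{k} = {sum(scores) / len(scores):.2f}")
```

Averaging per-query scores (rather than pooling all documents) keeps queries with many relevant documents from dominating the metric.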
Common evaluation frameworks for RAG recall include:
- RAGAS: Provides a `context_recall` metric that checks whether retrieved chunks contain the information needed to answer the question.
- BEIR Benchmark: A standard benchmark for evaluating retrieval recall across diverse domains.
- Custom evaluation sets: Hand-labeled query-document pairs for your specific domain.
Improving Recall
Practical strategies to increase retrieval recall:
- Use hybrid search combining dense vector search with sparse keyword (BM25) search.
- Increase the retrieval limit (`k`) and rely on re-ranking to maintain precision.
- Use query expansion to generate alternative phrasings of the query before retrieval.
- Choose embedding models trained on domain-similar data.
- Tune chunk size and overlap to avoid splitting relevant context across boundaries.
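The hybrid search strategy above needs a way to merge keyword and vector result lists. One common choice is Reciprocal Rank Fusion (RRF); the sketch below assumes the two ranked lists already exist (the IDs are hypothetical), and is not tied to any particular search library:

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], c: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked lists into one.

    Each document scores 1 / (c + rank) per list it appears in, so
    documents ranked well by either retriever rise toward the top,
    which tends to improve recall over either list alone.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d2", "d5", "d1"]     # sparse keyword (BM25) results
vector_hits = ["d1", "d2", "d3"]   # dense semantic results

# d2 and d1 surface first because both retrievers rank them highly.
print(rrf_fuse([bm25_hits, vector_hits]))
```

The constant `c` (60 is the conventional default) dampens the influence of top ranks so a single retriever's first hit cannot dominate the fused list.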