What Is Recall?
Recall is one of the two fundamental metrics in information retrieval, alongside precision. It measures the completeness of a retrieval system: out of all the documents in the corpus that are actually relevant to a query, what fraction did the system successfully return?
Formula:
Recall = True Positives / (True Positives + False Negatives)
= Retrieved Relevant / All Relevant
A recall of 1.0 means the system returned every relevant document. A recall of 0.5 means it missed half of them.
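The formula above can be sketched in a few lines of Python. The document IDs are hypothetical, chosen only to illustrate the arithmetic:

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of all relevant documents that the system returned."""
    if not relevant:
        return 0.0
    # True positives are the relevant documents that were actually retrieved.
    return len(retrieved & relevant) / len(relevant)

relevant = {"d1", "d2", "d3", "d4"}   # ground-truth relevant documents
retrieved = {"d1", "d3", "d9"}        # what the system returned

print(recall(retrieved, relevant))    # 2 of 4 relevant found -> 0.5
```

Note that the irrelevant document `d9` does not affect recall at all; it only hurts precision.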
Recall vs. Precision: The Trade-off
Recall and precision are inversely related in most retrieval systems. To maximize recall, you retrieve more documents — but this typically reduces precision because you also retrieve more irrelevant ones. To maximize precision, you retrieve fewer, higher-confidence documents — but this risks missing relevant ones.
| Scenario | Precision | Recall |
|---|---|---|
| Return only the single best document | High | Low |
| Return all documents in the corpus | Low | High |
| Balanced retrieval (top-K) | Moderate | Moderate |
The right balance depends on the application. A medical diagnosis assistant should favor high recall (do not miss relevant symptoms or contraindications). A customer-facing Q&A bot should favor high precision (do not surface irrelevant or confusing answers).
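The trade-off in the table can be made concrete by sweeping K over a single ranked list. The ranking and relevance labels below are hypothetical:

```python
def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    return len(set(retrieved) & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant documents that were retrieved."""
    return len(set(retrieved) & relevant) / len(relevant)

# Ranked output of a hypothetical retriever; d1, d2, d3 are truly relevant.
ranking = ["d1", "d7", "d2", "d8", "d9", "d3"]
relevant = {"d1", "d2", "d3"}

for k in (1, 3, 6):
    top_k = ranking[:k]
    print(f"k={k}: precision={precision(top_k, relevant):.2f}, "
          f"recall={recall(top_k, relevant):.2f}")
```

At k=1 precision is perfect but recall is 0.33; at k=6 recall reaches 1.0 while precision falls to 0.50, mirroring the table's first two rows.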
Recall in RAG Pipelines
In Retrieval-Augmented Generation, recall is especially important at the retrieval stage. If the retrieval step fails to return a document containing the answer, the LLM cannot produce a correct response — no matter how capable it is. This is known as a retrieval miss.
Common causes of low recall in RAG:
- Vocabulary mismatch: The query uses different terms than the document ("cardiac arrest" vs. "heart attack"). Hybrid search (keyword + semantic) mitigates this.
- Insufficient chunk overlap: If documents are chunked too aggressively, relevant context may span two chunks, and neither chunk alone matches the query well enough to be retrieved.
- Low `k` value: Setting the retrieval limit too low means some relevant documents fall outside the returned set.
- Poor embedding quality: If the embedding model does not capture the query's semantic intent, relevant documents receive low similarity scores.
How to Measure Recall
To measure recall, you need a ground truth dataset: a set of queries paired with the list of known relevant documents. Recall is then computed by checking how many of those relevant documents appear in the system's top-K results.
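The measurement procedure described above amounts to computing recall@K per query and averaging over the ground truth set. A minimal sketch, with hypothetical queries, document IDs, and system outputs:

```python
# Hypothetical ground truth: query -> set of known relevant documents.
ground_truth = {
    "q1": {"d1", "d2"},
    "q2": {"d3", "d4", "d5"},
}

# Hypothetical system output: query -> ranked list of retrieved documents.
results = {
    "q1": ["d1", "d9", "d2"],
    "q2": ["d3", "d8", "d7"],
}

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents appearing in the top-K results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

k = 3
scores = [recall_at_k(results[q], rel, k) for q, rel in ground_truth.items()]
print(f"mean recall@{k} = {sum(scores) / len(scores):.2f}")
```

Averaging per-query scores (rather than pooling all documents) keeps queries with many relevant documents from dominating the metric.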
Common evaluation frameworks for RAG recall include:
- RAGAS: Provides a `context_recall` metric that checks whether retrieved chunks contain the information needed to answer the question.
- BEIR Benchmark: A standard benchmark for evaluating retrieval recall across diverse domains.
- Custom evaluation sets: Hand-labeled query-document pairs for your specific domain.
Improving Recall
Practical strategies to increase retrieval recall:
- Use hybrid search combining dense vector search with sparse keyword (BM25) search.
- Increase the retrieval limit (`k`) and rely on re-ranking to maintain precision.
- Use query expansion to generate alternative phrasings of the query before retrieval.
- Choose embedding models trained on domain-similar data.
- Tune chunk size and overlap to avoid splitting relevant context across boundaries.
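The hybrid search strategy above needs a way to merge keyword and vector result lists. One common choice is Reciprocal Rank Fusion (RRF); the sketch below assumes the two ranked lists already exist (the IDs are hypothetical), and is not tied to any particular search library:

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], c: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked lists into one.

    Each document scores 1 / (c + rank) per list it appears in, so
    documents ranked well by either retriever rise toward the top,
    which tends to improve recall over either list alone.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d2", "d5", "d1"]     # sparse keyword (BM25) results
vector_hits = ["d1", "d2", "d3"]   # dense semantic results

# d2 and d1 surface first because both retrievers rank them highly.
print(rrf_fuse([bm25_hits, vector_hits]))
```

The constant `c` (60 is the conventional default) dampens the influence of top ranks so a single retriever's first hit cannot dominate the fused list.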