What Is Precision?
Precision is one of the two fundamental metrics in information retrieval. It measures the accuracy of a retrieval system: out of all the documents the system returned, what fraction were actually relevant to the query?
Formula:
Precision = True Positives / (True Positives + False Positives)
= Retrieved Relevant / Total Retrieved
A precision of 1.0 means every returned document was relevant. A precision of 0.3 means 70% of the returned documents were noise.
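The formula above can be sketched as a small helper. This is a minimal illustration, not from any particular library; the function and argument names are ours:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant:
    TP / (TP + FP) = retrieved-and-relevant / total retrieved."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0  # no results returned; define precision as 0
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)

# 3 of the 4 returned documents are relevant -> precision 0.75
print(precision(["d1", "d2", "d3", "d9"], {"d1", "d2", "d3", "d7"}))
```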
Precision vs. Recall: The Trade-off
Precision and recall exist in tension. Returning fewer, more confident results typically improves precision but reduces recall. Returning many results to ensure completeness typically improves recall but reduces precision.
| Scenario | Precision | Recall |
|---|---|---|
| Return only the top 1 result | High | Low |
| Return the top 100 results | Low | High |
| Return top-5 with re-ranking | Moderate-High | Moderate |
Precision@K
Because most retrieval systems return a ranked list, Precision@K is the most commonly reported variant. It measures precision among only the top K results:
Precision@5 = Relevant documents in top 5 / 5
For a customer-facing search or Q&A system, Precision@3 or Precision@5 is usually the most meaningful metric — users rarely look past the first few results, and irrelevant early results destroy the user experience.
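Precision@K follows directly from the definition, restricted to the first K ranks. A minimal sketch (names are illustrative); note that the conventional denominator is K itself, so a system that returns fewer than K results is penalized:

```python
def precision_at_k(ranked, relevant, k):
    """Precision among only the top-k results of a ranked list."""
    if k <= 0:
        return 0.0
    top_k = ranked[:k]
    relevant = set(relevant)
    return sum(1 for doc in top_k if doc in relevant) / k

# 2 of the top 5 are relevant -> Precision@5 = 0.4
print(precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "f"}, 5))
```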
Precision in RAG Pipelines
In RAG pipelines, low precision creates a specific failure mode: the LLM receives too much irrelevant context. This can cause:
- Context dilution: The relevant information is buried among noise, and the model pays less attention to it.
- Hallucination: Irrelevant context confuses the model, leading it to generate answers that mix relevant and irrelevant content.
- Increased cost: More tokens in the context window mean higher inference cost.

- Latency: Larger prompts take longer to process.
How to Measure Precision
To measure precision, you need a ground truth evaluation set: queries paired with lists of known relevant documents. Precision is computed by checking what fraction of the system's returned documents appear in the relevant set.
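The evaluation loop described above is straightforward to sketch. This assumes a simple in-memory representation of the evaluation set (the dict shapes here are our own convention, not a standard format):

```python
def mean_precision(results_by_query, truth_by_query):
    """Average per-query precision over a ground-truth evaluation set.

    results_by_query: query -> ranked list of returned doc ids
    truth_by_query:   query -> set of known-relevant doc ids
    """
    scores = []
    for query, returned in results_by_query.items():
        relevant = truth_by_query[query]
        if returned:
            scores.append(sum(doc in relevant for doc in returned) / len(returned))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)

results = {"q1": ["a", "b"], "q2": ["c"]}
truth = {"q1": {"a"}, "q2": {"c"}}
print(mean_precision(results, truth))  # (0.5 + 1.0) / 2 = 0.75
```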
Evaluation tools and metrics for RAG precision:
- RAGAS context_precision: Measures whether the retrieved chunks are actually useful for answering the question (uses an LLM judge).
- Mean Average Precision (MAP): Averages the precision computed at each rank where a relevant document appears, then averages over queries, giving a more complete picture of ranking quality.
- NDCG (Normalized Discounted Cumulative Gain): A rank-aware metric that penalizes relevant documents appearing lower in the results list.
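Of these, average precision (the per-query component of MAP) is easy to compute by hand: walk the ranked list, take precision@rank at each rank holding a relevant document, and average over the number of relevant documents. A minimal sketch with illustrative names:

```python
def average_precision(ranked, relevant):
    """Average of precision@rank over the ranks of relevant documents."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision@rank at this hit
    return total / len(relevant)

# Hits at ranks 1 and 3: (1/1 + 2/3) / 2 = 0.8333...
print(average_precision(["a", "x", "b"], {"a", "b"}))
```

MAP is then simply this value averaged across all queries in the evaluation set.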
Improving Precision
Strategies to increase retrieval precision without sacrificing too much recall:
- Re-ranking: Use a cross-encoder or LLM-based reranker to re-score retrieved candidates and surface the most relevant ones at the top.
- Metadata filtering: Pre-filter the corpus by category, date, or source before running vector search, reducing the candidate pool to relevant documents.
- Query clarification: Resolve ambiguous queries before retrieval by asking a clarifying question or inferring intent from context.
- Smaller chunks: More granular chunks tend to be more topically focused, reducing the chance that a retrieved chunk contains mixed signals.
- Similarity threshold: Discard retrieved documents below a minimum similarity score rather than always returning a fixed number of results.
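The last strategy, similarity thresholding, is the simplest to implement: filter scored candidates instead of blindly taking a fixed top-k. A sketch, where the 0.75 cutoff is an arbitrary example value you would tune on your own evaluation set:

```python
def filter_by_threshold(scored_docs, min_score=0.75):
    """Keep only candidates whose similarity score clears the cutoff.

    scored_docs: list of (doc_id, similarity) pairs, higher is better.
    """
    return [(doc, score) for doc, score in scored_docs if score >= min_score]

candidates = [("d1", 0.91), ("d2", 0.64), ("d3", 0.78)]
print(filter_by_threshold(candidates))  # drops d2
```

The trade-off is the same one the table above describes: a higher cutoff raises precision but risks returning nothing for hard queries, so thresholds are usually combined with a cap (top-k) rather than used alone.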
KnowledgeSDK's /v1/search endpoint supports metadata-based filtering, allowing you to scope searches to specific categories or topics and improve precision for domain-specific queries.
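As a rough illustration of what a metadata-scoped search request might look like, consider the JSON body below. The field names (`query`, `top_k`, `filters`) are assumptions for illustration only; consult the KnowledgeSDK API reference for the actual request schema:

```json
{
  "query": "How do I rotate an API key?",
  "top_k": 5,
  "filters": {
    "category": "security",
    "source": "docs"
  }
}
```

Scoping the search this way shrinks the candidate pool before vector similarity is computed, which directly improves precision for domain-specific queries.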