What Is Precision?
Precision is one of the two fundamental metrics in information retrieval. It measures the accuracy of a retrieval system: out of all the documents the system returned, what fraction were actually relevant to the query?
Formula:
Precision = True Positives / (True Positives + False Positives)
= Retrieved Relevant / Total Retrieved
A precision of 1.0 means every returned document was relevant. A precision of 0.3 means 70% of the returned documents were noise.
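The formula above can be sketched as a small helper. This is a minimal illustration, not from any particular library; the function and argument names are ours:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant:
    TP / (TP + FP) = retrieved-and-relevant / total retrieved."""
    retrieved = list(retrieved)
    if not retrieved:
        return 0.0  # no results returned; define precision as 0
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)

# 3 of the 4 returned documents are relevant -> precision 0.75
print(precision(["d1", "d2", "d3", "d9"], {"d1", "d2", "d3", "d7"}))
```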
Precision vs. Recall: The Trade-off
Precision and recall exist in tension. Returning fewer, more confident results typically improves precision but reduces recall. Returning many results to ensure completeness typically improves recall but reduces precision.
| Scenario | Precision | Recall |
|---|---|---|
| Return only the top 1 result | High | Low |
| Return the top 100 results | Low | High |
| Return top-5 with re-ranking | Moderate-High | Moderate |
Precision@K
Because most retrieval systems return a ranked list, Precision@K is the most commonly reported variant. It measures precision among only the top K results:
Precision@5 = Relevant documents in top 5 / 5
For a customer-facing search or Q&A system, Precision@3 or Precision@5 is usually the most meaningful metric — users rarely look past the first few results, and irrelevant early results destroy the user experience.
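Precision@K follows directly from the definition, restricted to the first K ranks. A minimal sketch (names are illustrative); note that the conventional denominator is K itself, so a system that returns fewer than K results is penalized:

```python
def precision_at_k(ranked, relevant, k):
    """Precision among only the top-k results of a ranked list."""
    if k <= 0:
        return 0.0
    top_k = ranked[:k]
    relevant = set(relevant)
    return sum(1 for doc in top_k if doc in relevant) / k

# 2 of the top 5 are relevant -> Precision@5 = 0.4
print(precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "f"}, 5))
```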
Precision in RAG Pipelines
In RAG pipelines, low precision creates a specific failure mode: the LLM receives too much irrelevant context. This can cause:
- Context dilution: The relevant information is buried among noise, and the model pays less attention to it.
- Hallucination: Irrelevant context confuses the model, leading it to generate answers that mix relevant and irrelevant content.
- Increased cost: More tokens in the context window mean higher inference cost.

- Latency: Larger prompts take longer to process.
How to Measure Precision
To measure precision, you need a ground truth evaluation set: queries paired with lists of known relevant documents. Precision is computed by checking what fraction of the system's returned documents appear in the relevant set.
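The evaluation loop described above is straightforward to sketch. This assumes a simple in-memory representation of the evaluation set (the dict shapes here are our own convention, not a standard format):

```python
def mean_precision(results_by_query, truth_by_query):
    """Average per-query precision over a ground-truth evaluation set.

    results_by_query: query -> ranked list of returned doc ids
    truth_by_query:   query -> set of known-relevant doc ids
    """
    scores = []
    for query, returned in results_by_query.items():
        relevant = truth_by_query[query]
        if returned:
            scores.append(sum(doc in relevant for doc in returned) / len(returned))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)

results = {"q1": ["a", "b"], "q2": ["c"]}
truth = {"q1": {"a"}, "q2": {"c"}}
print(mean_precision(results, truth))  # (0.5 + 1.0) / 2 = 0.75
```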
Evaluation tools and metrics for RAG precision:
- RAGAS context_precision: Measures whether the retrieved chunks are actually useful for answering the question (uses an LLM judge).
- Mean Average Precision (MAP): Averages the precision computed at each rank where a relevant document appears, then averages over queries, giving a more complete picture of ranking quality.
- NDCG (Normalized Discounted Cumulative Gain): A rank-aware metric that penalizes relevant documents appearing lower in the results list.
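Of these, average precision (the per-query component of MAP) is easy to compute by hand: walk the ranked list, take precision@rank at each rank holding a relevant document, and average over the number of relevant documents. A minimal sketch with illustrative names:

```python
def average_precision(ranked, relevant):
    """Average of precision@rank over the ranks of relevant documents."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision@rank at this hit
    return total / len(relevant)

# Hits at ranks 1 and 3: (1/1 + 2/3) / 2 = 0.8333...
print(average_precision(["a", "x", "b"], {"a", "b"}))
```

MAP is then simply this value averaged across all queries in the evaluation set.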
Improving Precision
Strategies to increase retrieval precision without sacrificing too much recall:
- Re-ranking: Use a cross-encoder or LLM-based reranker to re-score retrieved candidates and surface the most relevant ones at the top.
- Metadata filtering: Pre-filter the corpus by category, date, or source before running vector search, reducing the candidate pool to relevant documents.
- Query clarification: Resolve ambiguous queries before retrieval by asking a clarifying question or inferring intent from context.
- Smaller chunks: More granular chunks tend to be more topically focused, reducing the chance that a retrieved chunk contains mixed signals.
- Similarity threshold: Discard retrieved documents below a minimum similarity score rather than always returning a fixed number of results.
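The last strategy, similarity thresholding, is the simplest to implement: filter scored candidates instead of blindly taking a fixed top-k. A sketch, where the 0.75 cutoff is an arbitrary example value you would tune on your own evaluation set:

```python
def filter_by_threshold(scored_docs, min_score=0.75):
    """Keep only candidates whose similarity score clears the cutoff.

    scored_docs: list of (doc_id, similarity) pairs, higher is better.
    """
    return [(doc, score) for doc, score in scored_docs if score >= min_score]

candidates = [("d1", 0.91), ("d2", 0.64), ("d3", 0.78)]
print(filter_by_threshold(candidates))  # drops d2
```

The trade-off is the same one the table above describes: a higher cutoff raises precision but risks returning nothing for hard queries, so thresholds are usually combined with a cap (top-k) rather than used alone.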
KnowledgeSDK's /v1/search endpoint supports metadata-based filtering, allowing you to scope searches to specific categories or topics and improve precision for domain-specific queries.
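As a rough illustration of what a metadata-scoped search request might look like, consider the JSON body below. The field names (`query`, `top_k`, `filters`) are assumptions for illustration only; consult the KnowledgeSDK API reference for the actual request schema:

```json
{
  "query": "How do I rotate an API key?",
  "top_k": 5,
  "filters": {
    "category": "security",
    "source": "docs"
  }
}
```

Scoping the search this way shrinks the candidate pool before vector similarity is computed, which directly improves precision for domain-specific queries.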