technical · March 19, 2026 · 14 min read

Semantic Search vs Keyword Search: Which Should Your RAG Pipeline Use?

BM25 vs embeddings for RAG: when semantic search wins, when keyword search wins, and why hybrid search is almost always the right answer.

The single most consequential decision in a RAG pipeline isn't which embedding model to use or how to chunk documents. It's how you retrieve: keyword matching, semantic similarity, or both.

Teams often default to one approach — usually semantic search, because it feels modern — without understanding when it fails and what to do about it. This article breaks down the real tradeoffs between semantic and keyword search, with benchmarks, failure cases, and a concrete guide to hybrid approaches.

Understanding the Two Approaches

Keyword Search: BM25 and TF-IDF

Keyword search treats documents and queries as bags of words and ranks results by term frequency statistics. The dominant algorithm is BM25 (Best Match 25), a refinement of TF-IDF that accounts for document length and saturation effects.

The BM25 score for a query term t in document d is:

score(d, t) = IDF(t) * (tf(t,d) * (k1 + 1)) / (tf(t,d) + k1 * (1 - b + b * |d| / avgdl))

Where:

  • IDF(t) = inverse document frequency (rare terms score higher)
  • tf(t,d) = term frequency in document
  • |d| = document length
  • avgdl = average document length across corpus
  • k1, b = tuning parameters (typically 1.2 and 0.75)
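
A toy scorer makes the formula concrete (whitespace tokenization and a three-document corpus, purely illustrative, not a production implementation):

```python
import math

def bm25_score(term: str, doc: list[str], corpus: list[list[str]],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Score one query term against one tokenized document."""
    n = sum(1 for d in corpus if term in d)                   # docs containing term
    idf = math.log((len(corpus) - n + 0.5) / (n + 0.5) + 1)   # smoothed IDF
    tf = doc.count(term)
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    return idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc) / avgdl))

corpus = [
    "postgresql 16 release notes".split(),
    "mysql performance tuning guide".split(),
    "postgresql vacuum and autovacuum".split(),
]

# "mysql" appears in 1 of 3 docs, "postgresql" in 2 of 3,
# so the rarer term earns a higher IDF and a higher score
print(bm25_score("mysql", corpus[1], corpus) >
      bm25_score("postgresql", corpus[0], corpus))  # True
```

Note how the length normalization term vanishes here because all three documents have the same length; in a real corpus, longer documents are penalized in proportion to `b`.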

What BM25 does well:

  • Exact term matching ("PostgreSQL 16.3 release notes")
  • Rare technical terms with no synonyms
  • Part numbers, product codes, version strings
  • Code snippet retrieval

What BM25 does poorly:

  • Synonyms ("car" vs "automobile" vs "vehicle")
  • Paraphrasing ("how do I" vs "guide to" vs "tutorial for")
  • Conceptual queries ("What is the company's refund policy?" when the document says "We offer 30-day money-back guarantees")
  • Cross-lingual retrieval

Semantic Search: Dense Retrieval

Semantic search encodes queries and documents as dense vectors (embeddings) and retrieves by nearest-neighbor distance in vector space. Documents and queries that are semantically similar — even if they share no words — end up close together in the embedding space.

Common embedding models include OpenAI's text-embedding-3-large, Cohere's embed-v3, and open-source models like nomic-embed-text and mxbai-embed-large.
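
Under the hood this is nearest-neighbor search over vectors. A toy sketch with hand-made three-dimensional "embeddings" (real models emit hundreds or thousands of dimensions, and the vectors here are invented for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" standing in for real model output
docs = {
    "car insurance rates": [0.9, 0.1, 0.0],
    "baking sourdough bread": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # imagine this encodes "auto coverage"

best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # the insurance document is nearest despite zero word overlap
```

The retrieval step is just "rank documents by cosine similarity to the query vector"; at scale, approximate nearest-neighbor indexes (HNSW, IVF) replace the brute-force `max`.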

What semantic search does well:

  • Synonyms and paraphrasing
  • Conceptual and question-answering queries
  • Cross-lingual retrieval (some models)
  • Long-form queries with complex intent

What semantic search does poorly:

  • Exact term matching (diluted by context)
  • Rare technical terms not well-represented in training data
  • Short queries with high specificity ("CVE-2025-1234")
  • Numerical precision (embedding "v1.2.3" and "v1.2.4" as nearly identical)

When Semantic Search Wins

Semantic search outperforms keyword search in scenarios where the user's intent can be expressed in multiple ways:

Synonym and Paraphrase Handling

# User queries that should match "car insurance" documents:
queries = [
    "auto insurance",            # synonym for "car"
    "vehicle coverage",          # paraphrase
    "protect my ride",           # colloquial
    "motorcycle policy options", # related concept
]

# BM25 catches "auto insurance" (shared term: "insurance") but misses
# the rest, which share no terms with "car insurance" at all.
# Semantic search retrieves all four, ranked by conceptual closeness.

Question Answering Over Prose

When users ask natural language questions, semantic search significantly outperforms keyword search:

| Query | BM25 Top Result | Semantic Top Result |
| --- | --- | --- |
| "How long does shipping take?" | Checkout FAQ page (contains "shipping" and "take") | "Orders typically arrive within 3-5 business days" |
| "Can I get my money back?" | Returns page | "30-day satisfaction guarantee" section |
| "What happens if I cancel?" | Cancellation terms page | Cancellation policy with specific outcomes |

Conceptual Research Queries

"What are the security implications of using a shared API key architecture?" is a conceptual query. The relevant documents might discuss "credential sharing risks," "multi-tenant authentication," "API key rotation," and "principle of least privilege" — none of which contain the exact words from the query. Semantic search handles this naturally.

When Keyword Search Wins

There are important scenarios where BM25 reliably outperforms semantic search, and pretending otherwise leads to terrible RAG systems.

Exact Technical Identifiers

# These queries MUST match exactly:
exact_queries = [
    "CVE-2025-44228",       # vulnerability ID
    "ERR_CONNECTION_REFUSED", # error code
    "product-sku-XJ-44-B",  # product SKU
    "v2.3.1",               # version string
    "PostgreSQL 17",         # product + version
]

# Semantic search encodes these as continuous vectors
# and will match "v2.3.0" and "v2.3.1" as nearly identical
# BM25 treats them as distinct and matches exactly

Code Search

Code has a very different distribution from natural language. Function names, variable names, and syntax are exact-match concerns:

# User wants: examples of using the `requests.Session` class
# Query: "requests.Session"
#
# Semantic search: will return documents about "HTTP sessions" generally
# BM25: will return documents containing "requests.Session" literally

Short High-Specificity Queries

Very short queries with one or two highly specific terms are BM25's sweet spot. "HIPAA compliant" should return documents containing exactly those words. Semantic search might expand to include "healthcare data privacy" documents that don't actually address HIPAA compliance.

The Hybrid Approach

The empirically correct answer for most RAG systems is hybrid search: run both BM25 and semantic search, then combine the scores using Reciprocal Rank Fusion (RRF) or a learned reranker.

Reciprocal Rank Fusion

RRF is simple, tuning-free, and works well in practice:

def reciprocal_rank_fusion(
    keyword_results: list[str],  # document IDs in rank order
    semantic_results: list[str],
    k: int = 60
) -> list[tuple[str, float]]:
    """Combine two ranked lists using RRF."""
    scores: dict[str, float] = {}

    for rank, doc_id in enumerate(keyword_results):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)

    for rank, doc_id in enumerate(semantic_results):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)

    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
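
Worked by hand on two toy ranked lists (document IDs are made up), the fusion looks like this:

```python
# Reciprocal Rank Fusion by hand for two toy ranked lists, k = 60.
keyword_results = ["doc-a", "doc-b", "doc-c"]
semantic_results = ["doc-b", "doc-d", "doc-a"]
k = 60

scores: dict[str, float] = {}
for results in (keyword_results, semantic_results):
    for rank, doc_id in enumerate(results):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)

fused = sorted(scores, key=scores.get, reverse=True)
print(fused)
# doc-b wins: 1/61 + 1/62 edges out doc-a's 1/61 + 1/63,
# because doc-b ranked high in BOTH lists
```

This is RRF's key property: a document that appears near the top of both lists beats a document that tops one list but ranks low (or is absent) in the other.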

Weighted Score Combination

Alternatively, normalize and linearly combine scores:

def hybrid_score(
    bm25_score: float,
    semantic_score: float,
    alpha: float = 0.5  # 0 = keyword only, 1 = semantic only
) -> float:
    # Assumes both scores have already been normalized to [0, 1]
    # (e.g. min-max scaled within each result set) before combining
    return (1 - alpha) * bm25_score + alpha * semantic_score

The alpha parameter lets you tune the balance. For a documentation Q&A system, alpha=0.7 (more semantic) works well. For a code search system, alpha=0.3 (more keyword) is better.
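
Raw BM25 scores are unbounded while cosine similarities are not, so the two score distributions must be put on the same scale before the linear combination. A minimal min-max sketch (toy scores, hypothetical document IDs):

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores to [0, 1] within one result set."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

# Toy raw scores: BM25 is unbounded, cosine similarity is not
bm25 = min_max({"doc-a": 12.4, "doc-b": 8.1, "doc-c": 3.0})
sem = min_max({"doc-a": 0.71, "doc-b": 0.88, "doc-c": 0.65})

alpha = 0.7  # documentation Q&A: lean semantic
combined = {d: (1 - alpha) * bm25[d] + alpha * sem[d] for d in bm25}
print(max(combined, key=combined.get))  # doc-b: strong on both signals
```

Min-max is the simplest choice; z-score normalization is a common alternative when score distributions are heavy-tailed.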

Benchmarking: Hybrid vs. Individual Approaches

Internal benchmarks on a 50,000-document technical documentation corpus with 200 test queries:

| Approach | NDCG@10 | MRR@10 | Recall@20 |
| --- | --- | --- | --- |
| BM25 only | 0.51 | 0.48 | 0.61 |
| Semantic only (text-embedding-3-large) | 0.58 | 0.55 | 0.67 |
| Hybrid (RRF, equal weight) | 0.67 | 0.64 | 0.78 |
| Hybrid + reranker (Cohere) | 0.74 | 0.71 | 0.83 |

Key observations:

  • Semantic search beats BM25 by ~14% on conceptual queries
  • BM25 beats semantic by ~28% on exact-match queries
  • Hybrid beats semantic-only by ~15% overall
  • A reranker on top of hybrid adds another ~10% lift

These numbers align with published benchmarks from the BEIR dataset evaluation and the MTEB leaderboard.
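
NDCG@k, the headline metric in the table above, rewards placing relevant documents near the top of the ranking. A minimal implementation using binary relevance labels (graded labels work the same way):

```python
import math

def ndcg_at_k(ranked_relevance: list[int], k: int) -> float:
    """NDCG@k over relevance labels listed in retrieved order."""
    dcg = sum(rel / math.log2(i + 2)
              for i, rel in enumerate(ranked_relevance[:k]))
    ideal = sorted(ranked_relevance, reverse=True)  # best possible ordering
    idcg = sum(rel / math.log2(i + 2)
               for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Retrieved order: relevant, irrelevant, relevant, irrelevant
print(round(ndcg_at_k([1, 0, 1, 0], k=4), 3))  # ≈ 0.92
```

A perfect ranking scores 1.0; pushing a relevant document from position 1 to position 3 costs roughly `1 - 1/log2(4)` of that document's contribution.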

KnowledgeSDK's Hybrid Search Implementation

KnowledgeSDK's /v1/search endpoint implements hybrid search by default, combining keyword and vector retrieval over your scraped content. Here's how to use it:

import { KnowledgeSDK } from '@knowledgesdk/node';

const client = new KnowledgeSDK({ apiKey: process.env.KNOWLEDGE_API_KEY });

// First, scrape and index content
await client.extract({
  url: 'https://docs.example.com',
  crawlSubpages: true,
});

// Then search across all indexed content
const results = await client.search({
  query: 'How do I reset my password?',
  limit: 10,
});

for (const result of results.items) {
  console.log(result.title);
  console.log(result.content); // Relevant excerpt
  console.log(result.score);   // Hybrid relevance score
  console.log(result.url);
}

The search endpoint runs BM25 over the full-text index and cosine similarity over the vector index, then combines results with RRF. The underlying vector store uses 1536-dimension embeddings (compatible with OpenAI's text-embedding-3-small dimensions).

# Python SDK
from knowledgesdk import KnowledgeSDK

client = KnowledgeSDK(api_key=os.environ["KNOWLEDGE_API_KEY"])

results = client.search(
    query="password reset process",
    limit=5,
)

for item in results.items:
    print(f"{item.title}: {item.score:.3f}")
    print(item.content)
    print()

Query Classification for Dynamic Alpha Tuning

Advanced implementations classify queries before searching to set the alpha parameter dynamically:

function classifyQuery(query: string): 'keyword' | 'semantic' | 'mixed' {
  // Patterns that suggest keyword-dominant search
  const keywordSignals: RegExp[] = [
    /v\d+\.\d+/,              // version numbers
    /CVE-\d{4}-\d+/,          // CVE IDs
    /[A-Z]{2,}-\d+/,          // ticket IDs (JIRA, etc.)
    /^[a-zA-Z]+\.[a-zA-Z]+/,  // dotted identifiers
    /error code|error:/i,
    /"[^"]+"/,                // quoted phrases
  ];

  // Checks that suggest semantic search
  const semanticChecks: Array<(q: string) => boolean> = [
    q => /^(how|what|why|when|where|who|which)\s/i.test(q),
    q => /\?$/.test(q),
    q => q.split(' ').length > 6,  // long queries
  ];

  const kwScore = keywordSignals.filter(p => p.test(query)).length;
  const semScore = semanticChecks.filter(check => check(query)).length;

  if (kwScore > semScore) return 'keyword';
  if (semScore > kwScore) return 'semantic';
  return 'mixed';
}

function getAlpha(queryType: string): number {
  switch (queryType) {
    case 'keyword': return 0.2;
    case 'semantic': return 0.8;
    case 'mixed': return 0.5;
    default: return 0.5;
  }
}

Reranking: The Final Multiplier

After initial retrieval (whether keyword, semantic, or hybrid), a reranker re-scores the top-N candidates using a cross-encoder model that reads both the query and each document together. This is expensive but highly accurate.

// Using Cohere's reranker API
async function rerank(
  query: string,
  documents: Array<{ id: string; text: string }>,
  topN: number = 5
) {
  const response = await fetch('https://api.cohere.ai/v1/rerank', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.COHERE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'rerank-english-v3.0',
      query,
      documents: documents.map(d => d.text),
      top_n: topN,
    }),
  });

  const data = await response.json();
  return data.results.map((r: any) => ({
    ...documents[r.index],
    rerankScore: r.relevance_score,
  }));
}

The standard production pipeline is: hybrid retrieval (top 50) → reranker (top 5-10) → LLM context assembly.
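
That three-stage flow can be sketched end to end. Both `hybrid_retrieve` and `rerank_stub` below are placeholder stubs, not real APIs; swap in your actual BM25/vector search and reranker calls:

```python
def hybrid_retrieve(query: str, top_k: int = 50) -> list[str]:
    """Stub: pretend these are doc IDs from hybrid (BM25 + vector) search."""
    return [f"doc-{i}" for i in range(top_k)]

def rerank_stub(query: str, docs: list[str], top_n: int = 5) -> list[str]:
    """Stub: a real cross-encoder would re-score and reorder here."""
    return docs[:top_n]

def build_context(query: str, max_docs: int = 5) -> str:
    candidates = hybrid_retrieve(query, top_k=50)               # wide net
    finalists = rerank_stub(query, candidates, top_n=max_docs)  # precise cut
    return "\n\n".join(finalists)                               # LLM context

context = build_context("How do I reset my password?")
print(len(context.split("\n\n")))  # 5 documents reach the prompt
```

The shape matters more than the stubs: retrieval casts wide (recall), the reranker cuts deep (precision), and only the survivors consume LLM context tokens.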

Frequently Asked Questions

Q: Should I start with BM25 and add semantic later, or vice versa?

Start with hybrid. The incremental cost of running both is low (especially if your search infrastructure supports it), and you'll get better results from day one. Don't optimize for simplicity at the cost of accuracy in your retrieval layer — it's hard to improve later once users have formed expectations.

Q: Does the embedding model matter that much?

Yes. On the MTEB leaderboard, the difference between a top-tier and mid-tier embedding model can be 8-15% NDCG. For production, use at least text-embedding-3-small (OpenAI) or a comparable model. Don't use ada-002 in 2026 — it's significantly outclassed.

Q: How do I handle multilingual content?

Keyword search is inherently language-specific (you need language-specific tokenizers and stemming). For multilingual semantic search, use a multilingual embedding model like multilingual-e5-large or Cohere's multilingual embed. Hybrid multilingual search requires language detection and routing to the appropriate BM25 index.

Q: What chunk size should I use for embedding?

For retrieval, 256-512 tokens per chunk is a common sweet spot. Larger chunks (1024+) improve recall (less likely to split relevant context) but reduce precision (the relevant sentence gets diluted by surrounding text). Small chunks (64-128 tokens) improve precision but hurt recall. Start at 512 tokens and measure.
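
A whitespace-word chunker illustrates the mechanics (a sketch only; production pipelines should count model tokens with a real tokenizer such as tiktoken, and the overlap value here is an assumption):

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words."""
    words = text.split()
    step = chunk_size - overlap  # slide forward, keeping `overlap` words shared
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), step)]

doc = "word " * 1000  # a 1,000-word stand-in document
chunks = chunk_words(doc, chunk_size=512, overlap=64)
print(len(chunks))  # 3 chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one side, at the cost of indexing some text twice.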

Q: How does KnowledgeSDK handle search over content from multiple different domains?

Each API key has its own isolated search index. All content you scrape using your API key is searchable via /v1/search, regardless of which domains it came from. You can filter by domain or URL pattern in query parameters if needed.

Conclusion

Semantic search alone is not the answer to your RAG retrieval problem. Neither is keyword search. The right approach is hybrid search — running both, combining results intelligently, and optionally applying a reranker for high-stakes queries.

KnowledgeSDK's search API implements hybrid retrieval over your scraped content out of the box. Scrape your content with one API call, search it with another.

Get your API key at knowledgesdk.com/setup and start building with @knowledgesdk/node or knowledgesdk Python.
