# What Is Query Expansion?
Query expansion is a retrieval technique that modifies or augments a user's original query before sending it to the search index, with the goal of increasing recall — finding relevant documents that the original query would have missed.
Short, ambiguous, or poorly worded queries are the most common cause of retrieval failure in RAG systems. Query expansion attacks this problem directly.
## Why Queries Fail
User queries are often:
- Too short — "refund?" gives the retriever little to work with
- Ambiguous — "integration" could mean third-party tools, system design, or team collaboration
- Vocabulary-mismatched — the user says "cancel" but the docs say "unsubscribe"
- Implicit — the user assumes context the retriever does not have
## Query Expansion Techniques

### 1. Synonym Expansion
Add synonyms and related terms to the query:
"cancel subscription" → "cancel OR unsubscribe OR terminate OR stop plan"
Expansion terms can come from a thesaurus, word embeddings (nearest neighbors of each query term), or an LLM.
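As a minimal sketch of the thesaurus approach, assuming a small hand-built synonym map (the entries here are illustrative, not a real thesaurus):

```python
# Hypothetical hand-built thesaurus mapping query terms to expansion terms.
SYNONYMS = {
    "cancel": ["unsubscribe", "terminate", "stop"],
    "refund": ["reimbursement", "money back"],
}

def expand_query(query: str) -> str:
    """Append known synonyms to each term, joined with OR for keyword search."""
    terms = []
    for word in query.split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word.lower(), []))
    return " OR ".join(terms)
```

For example, `expand_query("cancel subscription")` yields `"cancel OR unsubscribe OR terminate OR stop OR subscription"`. A real system would use a larger lexical resource or embedding-based neighbors rather than a hard-coded map.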
### 2. Multi-Query Retrieval
Use an LLM to generate N paraphrases of the original query, retrieve independently for each, and merge results:
prompt = """Generate 3 different versions of this question for document retrieval:
Question: {question}
Return a JSON array of 3 strings."""
paraphrases = llm.generate(prompt.format(question=user_query))
all_results = [vector_db.search(q, top_k=10) for q in paraphrases]
merged = deduplicate_and_merge(all_results)
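The `deduplicate_and_merge` helper is left undefined above. One minimal sketch, assuming each hit is a dict with `id` and `score` fields (an assumed shape, not a fixed API), keeps the best score per document:

```python
def deduplicate_and_merge(result_lists, top_k=10):
    """Merge ranked lists, keeping the highest score seen for each doc id."""
    best = {}
    for results in result_lists:
        for doc in results:
            doc_id = doc["id"]  # assumed field; adapt to your store's schema
            if doc_id not in best or doc["score"] > best[doc_id]["score"]:
                best[doc_id] = doc
    # Re-rank the merged pool by score and truncate.
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)[:top_k]
```

Max-score merging is the simplest policy; reciprocal rank fusion is a common alternative when scores from different queries are not directly comparable.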
### 3. HyDE (Hypothetical Document Embeddings)
Instead of embedding the query directly, ask an LLM to generate a hypothetical answer, then embed that answer as the search query:
```python
hypothetical_answer = llm.generate(
    f"Write a detailed answer to: {user_query}"
)
query_vector = embed(hypothetical_answer)  # embed the answer, not the question
results = vector_db.search(query_vector, top_k=10)
```
HyDE works because document chunks look more like other answers than they look like questions — the hypothetical answer lives in the same embedding space as indexed content.
### 4. Step-Back Prompting
Reformulate the query at a higher level of abstraction before retrieval:
User: "Why does my React hook fire twice?"
Step-back: "How does React's useEffect lifecycle work?"
This retrieves foundational content that answers the specific question indirectly.
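A minimal sketch of this flow, assuming `llm` and `vector_db` objects with the same `generate` and `search` interfaces used in the earlier snippets (the prompt wording here is illustrative):

```python
STEP_BACK_PROMPT = (
    "Rewrite this specific question as a more general question about the "
    "underlying concept.\n"
    "Question: {question}\n"
    "General question:"
)

def step_back_retrieve(question, llm, vector_db, top_k=10):
    """Retrieve for both the abstracted query and the original question."""
    general = llm.generate(STEP_BACK_PROMPT.format(question=question)).strip()
    # Foundational content first, then hits for the specific question.
    return vector_db.search(general, top_k=top_k) + vector_db.search(question, top_k=top_k)
```

Retrieving for both queries is a common choice: the step-back query surfaces background material while the original query keeps specific matches in the pool.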
### 5. Sub-Question Decomposition
Decompose a complex multi-part query into simpler sub-questions, retrieve for each, and synthesize:
Complex: "Compare the pricing and integration complexity of Stripe and Paddle"
Sub-questions:
- "What is Stripe's pricing?"
- "What is Paddle's pricing?"
- "How complex is Stripe integration?"
- "How complex is Paddle integration?"
## Trade-offs
| Technique | Latency Impact | Recall Improvement | Cost |
|---|---|---|---|
| Synonym expansion | Minimal | Low–Medium | Free |
| Multi-query | High (1 LLM call + N retrievals) | High | LLM tokens |
| HyDE | Medium (1 LLM call + embed) | High | LLM tokens |
| Step-back | Medium (1 LLM call) | Medium | LLM tokens |
| Sub-question decomposition | High (1 LLM call + N retrievals) | High for multi-part queries | LLM tokens |
## Query Expansion with KnowledgeSDK

You can implement query expansion on top of `POST /v1/search` by expanding the query client-side before calling the API:
```typescript
// Ask the LLM for paraphrases and parse its JSON response.
const paraphrases: string[] = JSON.parse(
  await llm.generate(
    `Rephrase: "${userQuery}" in 3 ways. Return a JSON array of strings.`
  )
);
const results = await Promise.all(
  paraphrases.map(q =>
    fetch("https://api.knowledgesdk.com/v1/search", {
      method: "POST",
      headers: {
        "x-api-key": "knowledgesdk_live_...",
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ query: q, limit: 10 })
    }).then(r => r.json())
  )
);
// Assumes each response body is an array of hits; flatten and deduplicate.
const merged = deduplicateByContent(results.flat());
```
This pattern significantly improves recall for conversational AI applications built on top of KnowledgeSDK.