# What Is Query Expansion?
Query expansion is a retrieval technique that modifies or augments a user's original query before sending it to the search index, with the goal of increasing recall — finding relevant documents that the original query would have missed.
Short, ambiguous, or poorly worded queries are the most common cause of retrieval failure in RAG systems. Query expansion attacks this problem directly.
## Why Queries Fail
User queries are often:
- Too short — "refund?" gives the retriever little to work with
- Ambiguous — "integration" could mean third-party tools, system design, or team collaboration
- Vocabulary-mismatched — the user says "cancel" but the docs say "unsubscribe"
- Implicit — the user assumes context the retriever does not have
## Query Expansion Techniques

### 1. Synonym Expansion
Add synonyms and related terms to the query:
"cancel subscription" → "cancel OR unsubscribe OR terminate OR stop plan"
Expansion terms can come from a thesaurus, word embeddings (nearest neighbors of each query term), or an LLM.
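As a minimal sketch of the thesaurus approach, assuming a small hand-built synonym map (the entries here are illustrative, not a real thesaurus):

```python
# Hypothetical hand-built thesaurus mapping query terms to expansion terms.
SYNONYMS = {
    "cancel": ["unsubscribe", "terminate", "stop"],
    "refund": ["reimbursement", "money back"],
}

def expand_query(query: str) -> str:
    """Append known synonyms to each term, joined with OR for keyword search."""
    terms = []
    for word in query.split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word.lower(), []))
    return " OR ".join(terms)
```

For example, `expand_query("cancel subscription")` yields `"cancel OR unsubscribe OR terminate OR stop OR subscription"`. A real system would use a larger lexical resource or embedding-based neighbors rather than a hard-coded map.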
### 2. Multi-Query Retrieval
Use an LLM to generate N paraphrases of the original query, retrieve independently for each, and merge results:
prompt = """Generate 3 different versions of this question for document retrieval:
Question: {question}
Return a JSON array of 3 strings."""
paraphrases = llm.generate(prompt.format(question=user_query))
all_results = [vector_db.search(q, top_k=10) for q in paraphrases]
merged = deduplicate_and_merge(all_results)
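The `deduplicate_and_merge` helper is left undefined above. One minimal sketch, assuming each hit is a dict with `id` and `score` fields (an assumed shape, not a fixed API), keeps the best score per document:

```python
def deduplicate_and_merge(result_lists, top_k=10):
    """Merge ranked lists, keeping the highest score seen for each doc id."""
    best = {}
    for results in result_lists:
        for doc in results:
            doc_id = doc["id"]  # assumed field; adapt to your store's schema
            if doc_id not in best or doc["score"] > best[doc_id]["score"]:
                best[doc_id] = doc
    # Re-rank the merged pool by score and truncate.
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)[:top_k]
```

Max-score merging is the simplest policy; reciprocal rank fusion is a common alternative when scores from different queries are not directly comparable.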
### 3. HyDE (Hypothetical Document Embeddings)
Instead of embedding the query directly, ask an LLM to generate a hypothetical answer, then embed that answer as the search query:
```python
hypothetical_answer = llm.generate(
    f"Write a detailed answer to: {user_query}"
)
query_vector = embed(hypothetical_answer)  # embed the answer, not the question
results = vector_db.search(query_vector, top_k=10)
```
HyDE works because document chunks look more like other answers than they look like questions — the hypothetical answer lives in the same embedding space as indexed content.
### 4. Step-Back Prompting
Reformulate the query at a higher level of abstraction before retrieval:
User: "Why does my React hook fire twice?"
Step-back: "How does React's useEffect lifecycle work?"
This retrieves foundational content that answers the specific question indirectly.
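A minimal sketch of this flow, assuming `llm` and `vector_db` objects with the same `generate` and `search` interfaces used in the earlier snippets (the prompt wording here is illustrative):

```python
STEP_BACK_PROMPT = (
    "Rewrite this specific question as a more general question about the "
    "underlying concept.\n"
    "Question: {question}\n"
    "General question:"
)

def step_back_retrieve(question, llm, vector_db, top_k=10):
    """Retrieve for both the abstracted query and the original question."""
    general = llm.generate(STEP_BACK_PROMPT.format(question=question)).strip()
    # Foundational content first, then hits for the specific question.
    return vector_db.search(general, top_k=top_k) + vector_db.search(question, top_k=top_k)
```

Retrieving for both queries is a common choice: the step-back query surfaces background material while the original query keeps specific matches in the pool.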
### 5. Sub-Question Decomposition
Decompose a complex multi-part query into simpler sub-questions, retrieve for each, and synthesize:
Complex: "Compare the pricing and integration complexity of Stripe and Paddle"
Sub-questions:
- "What is Stripe's pricing?"
- "What is Paddle's pricing?"
- "How complex is Stripe integration?"
- "How complex is Paddle integration?"
## Trade-offs
| Technique | Latency Impact | Recall Improvement | Cost |
|---|---|---|---|
| Synonym expansion | Minimal | Low–Medium | Free |
| Multi-query | High (1 LLM call + N retrievals) | High | LLM tokens |
| HyDE | Medium (1 LLM call + embed) | High | LLM tokens |
| Step-back | Medium (1 LLM call) | Medium | LLM tokens |
| Sub-question decomposition | High (1 LLM call + N retrievals) | High for multi-part queries | LLM tokens |
## Query Expansion with KnowledgeSDK

You can implement query expansion on top of `POST /v1/search` by expanding the query client-side before calling the API:
```typescript
// Ask the LLM for paraphrases and parse its JSON response.
const paraphrases: string[] = JSON.parse(
  await llm.generate(
    `Rephrase: "${userQuery}" in 3 ways. Return a JSON array of strings.`
  )
);
const results = await Promise.all(
  paraphrases.map(q =>
    fetch("https://api.knowledgesdk.com/v1/search", {
      method: "POST",
      headers: {
        "x-api-key": "knowledgesdk_live_...",
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ query: q, limit: 10 })
    }).then(r => r.json())
  )
);
// Assumes each response body is an array of hits; flatten and deduplicate.
const merged = deduplicateByContent(results.flat());
```
This pattern significantly improves recall for conversational AI applications built on top of KnowledgeSDK.