RAG & Retrieval · intermediate

Also known as: query augmentation

Query Expansion

A retrieval technique that augments the original query with synonyms, related terms, or generated sub-questions to improve recall.

What Is Query Expansion?

Query expansion is a retrieval technique that modifies or augments a user's original query before sending it to the search index, with the goal of increasing recall — finding relevant documents that the original query would have missed.

Short, ambiguous, or poorly worded queries are the most common cause of retrieval failure in RAG systems. Query expansion attacks this problem directly.

Why Queries Fail

User queries are often:

  • Too short — "refund?" gives the retriever little to work with
  • Ambiguous — "integration" could mean third-party tools, system design, or team collaboration
  • Vocabulary-mismatched — the user says "cancel" but the docs say "unsubscribe"
  • Implicit — the user assumes context the retriever does not have

Query Expansion Techniques

1. Synonym Expansion

Add synonyms and related terms to the query:

"cancel subscription" → "cancel OR unsubscribe OR terminate OR stop plan"

Can be done with a thesaurus, word embeddings, or an LLM.
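As a minimal sketch, synonym expansion with a hand-rolled synonym table might look like this (the `SYNONYMS` dictionary and the `OR`-group query format are illustrative assumptions, not part of any real API):

```python
# Minimal synonym-expansion sketch. The synonym table is a hand-rolled
# illustration; a production system would source related terms from a
# thesaurus, word embeddings, or an LLM.
SYNONYMS = {
    "cancel": ["unsubscribe", "terminate", "stop"],
    "refund": ["reimbursement", "money back"],
}

def expand_query(query: str) -> str:
    """Expand each known term into an OR-group of synonyms."""
    parts = []
    for term in query.split():
        alts = SYNONYMS.get(term.lower())
        if alts:
            parts.append("(" + " OR ".join([term] + alts) + ")")
        else:
            parts.append(term)
    return " ".join(parts)

print(expand_query("cancel subscription"))
# -> (cancel OR unsubscribe OR terminate OR stop) subscription
```

Dictionary-based expansion is cheap and deterministic, but it only covers the terms you anticipate; embedding- or LLM-based expansion generalizes better at added cost.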

2. Multi-Query Retrieval

Use an LLM to generate N paraphrases of the original query, retrieve independently for each, and merge results:

prompt = """Generate 3 different versions of this question for document retrieval:
Question: {question}
Return a JSON array of 3 strings."""

paraphrases = llm.generate(prompt.format(question=user_query))
all_results = [vector_db.search(q, top_k=10) for q in paraphrases]
merged = deduplicate_and_merge(all_results)
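The snippet above calls `deduplicate_and_merge` without defining it. One common way to merge per-query result lists is reciprocal rank fusion (RRF). A sketch, assuming each result is a dict with an `"id"` field (an assumption about the result shape, not a fixed API):

```python
from collections import defaultdict

def deduplicate_and_merge(result_lists, k: int = 60):
    """Merge ranked result lists with reciprocal rank fusion (RRF).

    Each result is assumed to be a dict with an "id" key. A document
    retrieved by several paraphrases accumulates a higher fused score,
    so broadly relevant documents rise to the top.
    """
    scores = defaultdict(float)
    docs = {}
    for results in result_lists:
        for rank, doc in enumerate(results):
            scores[doc["id"]] += 1.0 / (k + rank + 1)
            docs[doc["id"]] = doc
    ranked_ids = sorted(scores, key=scores.get, reverse=True)
    return [docs[doc_id] for doc_id in ranked_ids]

merged = deduplicate_and_merge([
    [{"id": "a"}, {"id": "b"}],
    [{"id": "b"}, {"id": "c"}],
])
print([d["id"] for d in merged])  # "b" appears in both lists, so it ranks first
```

The constant `k` damps the influence of top ranks; 60 is a conventional default, not a tuned value.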

3. HyDE (Hypothetical Document Embeddings)

Instead of embedding the query directly, ask an LLM to generate a hypothetical answer, then embed that answer as the search query:

hypothetical_answer = llm.generate(
    f"Write a detailed answer to: {user_query}"
)
query_vector = embed(hypothetical_answer)  # embed the answer, not the question
results = vector_db.search(query_vector, top_k=10)

HyDE works because document chunks look more like other answers than they look like questions — the hypothetical answer lives in the same embedding space as indexed content.

4. Step-Back Prompting

Reformulate the query at a higher level of abstraction before retrieval:

User: "Why does my React hook fire twice?"
Step-back: "How does React's useEffect lifecycle work?"

This retrieves foundational content that answers the specific question indirectly.
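A sketch of step-back retrieval, assuming a generic `llm.generate(prompt) -> str` client and a `vector_db.search(query, top_k)` API (both hypothetical names used throughout this page):

```python
def step_back_search(llm, vector_db, user_query: str, top_k: int = 10):
    """Retrieve with a more abstract 'step-back' version of the query.

    Searches with both formulations so documents matching the specific
    question are not lost when the abstraction is too broad.
    """
    step_back_query = llm.generate(
        "Rewrite this question as a more general, foundational question "
        f"about the underlying concept:\n{user_query}"
    )
    return (
        vector_db.search(step_back_query, top_k=top_k)
        + vector_db.search(user_query, top_k=top_k)
    )
```

Searching with both the original and the step-back query is a design choice: the abstract query pulls in foundational material while the original keeps specific matches in the candidate set.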

5. Sub-Question Decomposition

Decompose a complex multi-part query into simpler sub-questions, retrieve for each, and synthesize:

Complex: "Compare the pricing and integration complexity of Stripe and Paddle"
Sub-questions:
  - "What is Stripe's pricing?"
  - "What is Paddle's pricing?"
  - "How complex is Stripe integration?"
  - "How complex is Paddle integration?"

Trade-offs

| Technique | Latency impact | Recall improvement | Cost |
| --- | --- | --- | --- |
| Synonym expansion | Minimal | Low–Medium | Free |
| Multi-query | High (N × retrieval) | High | LLM tokens |
| HyDE | Medium (1 LLM call + embed) | High | LLM tokens |
| Step-back | Medium | Medium | LLM tokens |

Query Expansion with KnowledgeSDK

You can implement query expansion on top of POST /v1/search by expanding your query before the API call:

// Ask the LLM for a JSON array of paraphrases, then parse it before fanning out
const paraphrases = JSON.parse(
  await llm.generate(`Rephrase "${userQuery}" in 3 ways. Return a JSON array of strings.`)
);
const results = await Promise.all(
  paraphrases.map(q =>
    fetch("https://api.knowledgesdk.com/v1/search", {
      method: "POST",
      headers: {
        "x-api-key": "knowledgesdk_live_...",
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ query: q, limit: 10 })
    }).then(r => r.json())
  )
);
const merged = deduplicateByContent(results.flat());

This pattern can significantly improve recall for conversational AI applications built on top of KnowledgeSDK.

Related Terms

RAG & Retrieval · beginner
Semantic Search
A search approach that finds results based on meaning and intent rather than exact keyword matching.
RAG & Retrieval · intermediate
Re-ranking
A post-retrieval step that re-scores and reorders retrieved documents using a more powerful cross-encoder model to improve relevance.
RAG & Retrieval · intermediate
Retrieval Pipeline
The end-to-end sequence of steps — query processing, search, re-ranking, and context assembly — that retrieves relevant documents for an LLM.