Chapter 2.6

Query Rewriting & Expansion

Users type imperfect queries. Rewriting and expansion bridge the gap between query and corpus.

The Expansion Trade-off

This graph visualizes the classic trade-off in query rewriting: as you expand queries more aggressively (moving left to right), you find more results (Recall increases), but the relevance of those results typically drops (Precision decreases).

[Figure: Precision vs. Recall by Expansion Level]

Expansion is a balancing act. The crossover point is where you maximize finding relevant items without flooding the user with junk.

Key Insight

The sweet spot is usually "Synonyms + High Confidence Semantic". Going broader often hurts user experience more than it helps.
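To make the trade-off concrete, here is a minimal sketch that computes precision and recall for result sets at each expansion level. The document IDs and result sets are invented toy data, chosen only to illustrate the pattern described above; they are not measurements.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved list against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: IDs of documents that are truly relevant to the query.
RELEVANT = {1, 2, 3, 4, 5, 6, 7, 8}

# Hypothetical result sets returned at each expansion level.
RESULTS_BY_LEVEL = {
    "none": [1, 2, 3],
    "light": [1, 2, 3, 4, 5, 9],
    "moderate": [1, 2, 3, 4, 5, 6, 9, 10, 11, 12],
    "heavy": list(range(1, 21)),
}

for level, results in RESULTS_BY_LEVEL.items():
    p, r = precision_recall(results, RELEVANT)
    print(f"{level:<9} precision={p:.2f} recall={r:.2f}")
```

In this toy data, recall climbs from 0.38 to 1.00 as expansion widens, while precision falls from 1.00 to 0.40.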

Rewriting Logic

Here is how query rewriting looks in code. This example shows template-based structure extraction, simple synonym expansion, and LLM-based rewriting.

rewriter.py

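import json
from typing import Dict


class QueryRewriter:
    def __init__(self, template_matcher, synonym_engine, llm_client):
        # Collaborators are injected; their interfaces are assumed, not shown.
        self.template_matcher = template_matcher
        self.synonym_engine = synonym_engine
        self.llm_client = llm_client

    def is_complex(self, query: str) -> bool:
        # Placeholder heuristic (an assumption): long queries count as complex.
        return len(query.split()) > 6

    def rewrite(self, query: str) -> Dict:
        # 1. Structure extraction (regex/templates)
        # "nike shoes under $100" -> brand:nike, price<100
        structured = self.template_matcher.extract(query)
        if structured:
            return structured

        # 2. Synonym expansion
        # "sneakers" -> "sneakers OR athletic shoes"
        expanded_query = self.synonym_engine.expand(query)

        # 3. LLM rewriting (slow, use only for complex queries)
        if self.is_complex(query):
            # The prompt asks for JSON so the response can be parsed below.
            llm_response = self.llm_client.complete(
                prompt=f"Convert this user query to Elasticsearch filter JSON: {query}"
            )
            try:
                return json.loads(llm_response)
            except json.JSONDecodeError:
                pass  # Malformed LLM output: fall back to the expanded query

        return {"match": expanded_query}


# Usage (the three collaborators are assumed to be constructed elsewhere)
rewriter = QueryRewriter(template_matcher, synonym_engine, llm_client)

# Example output:
# {'bool': {'must': [{'term': {'brand': 'nike'}}, {'range': {'price': {'lt': 100}}}]}}
print(rewriter.rewrite("nike shoes under $100"))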
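The `synonym_engine` used above is not defined in the example. One possible shape for it is sketched below; the `SynonymEngine` class, its constructor, and the synonym table are all hypothetical. Multi-word alternatives are quoted and each group is parenthesized so the OR does not leak into neighboring terms.

```python
class SynonymEngine:
    """Hypothetical stand-in for the synonym_engine collaborator."""

    def __init__(self, synonyms):
        # Map each lowercase term to a list of equivalent terms or phrases.
        self.synonyms = {term.lower(): alts for term, alts in synonyms.items()}

    @staticmethod
    def _quote(phrase: str) -> str:
        # Quote multi-word phrases so they are matched as a unit.
        return f'"{phrase}"' if " " in phrase else phrase

    def expand(self, query: str) -> str:
        parts = []
        for token in query.split():
            alternatives = self.synonyms.get(token.lower(), [])
            if alternatives:
                group = " OR ".join([token] + [self._quote(a) for a in alternatives])
                parts.append(f"({group})")
            else:
                parts.append(token)
        return " ".join(parts)


engine = SynonymEngine({"sneakers": ["athletic shoes"]})
print(engine.expand("red sneakers"))  # red (sneakers OR "athletic shoes")
```

This sketch only handles single-token lookups; phrase-level synonyms would need a tokenizer that recognizes multi-word terms first.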

The Expansion Spectrum

No Expansion: High precision, but many zero-result queries.

Light: Synonyms only. Safe expansion.

Moderate: Synonyms plus related terms. Balanced.

Heavy: High recall, with some noise.

Query Rewriting

Structured Query Conversion

Input:

"blue nike running shoes size 10 under $100"

Output:

{
  "text_query": "running shoes",
  "filters": {
    "color": "blue",
    "brand": "Nike",
    "size": "10",
    "price": {"max": 100}
  }
}
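A conversion like the one above could be sketched with a couple of regex templates plus small vocabularies. The `COLORS` and `BRANDS` tables, the two patterns, and the `to_structured` function name are assumptions for illustration; a production system would draw its vocabularies from the catalog.

```python
import re

# Hypothetical vocabularies; a real system would load these from the catalog.
COLORS = {"blue", "red", "black", "white"}
BRANDS = {"nike": "Nike", "adidas": "Adidas"}

# Templates: "under $Y" -> price.max, "size N" -> size
PRICE_RE = re.compile(r"\bunder \$?(\d+(?:\.\d+)?)", re.IGNORECASE)
SIZE_RE = re.compile(r"\bsize (\d+(?:\.\d+)?)\b", re.IGNORECASE)


def to_structured(query: str) -> dict:
    filters = {}
    text = query

    match = PRICE_RE.search(text)
    if match:
        value = float(match.group(1))
        filters["price"] = {"max": int(value) if value.is_integer() else value}
        text = text[:match.start()] + text[match.end():]

    match = SIZE_RE.search(text)
    if match:
        filters["size"] = match.group(1)
        text = text[:match.start()] + text[match.end():]

    # Entity -> filter: pull known colors and brands out of the free text.
    remaining = []
    for token in text.split():
        key = token.lower()
        if key in COLORS and "color" not in filters:
            filters["color"] = key
        elif key in BRANDS and "brand" not in filters:
            filters["brand"] = BRANDS[key]
        else:
            remaining.append(token)

    return {"text_query": " ".join(remaining), "filters": filters}


print(to_structured("blue nike running shoes size 10 under $100"))
```

Whatever is left after the templates and vocabularies have fired becomes the free-text part of the query.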

Entity → Filter

Extract entities, convert to filters

"Nike" → brand:Nike

Template Matching

Map common query patterns to structured filters

"X under $Y" → price_max:Y

LLM Rewriting

For natural language queries

"comfy shoes for standing" → cushioned, supportive

Industry Case Studies

Amazon

Multi-layer approach:

1. Dictionary (60%, 98% precision)
2. Templates (20%, 95% precision)
3. ML Model (15%, 88% precision)
4. LLM fallback (5%, cached)
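A layered cascade of this kind can be sketched as a list of functions tried in order, cheapest first, with the expensive final layer cached. This is an illustration of the general pattern only, not Amazon's implementation; every layer body, the dictionary contents, and the `LLM_CALLS` counter are hypothetical placeholders.

```python
import re
from functools import lru_cache
from typing import Optional, Tuple

# Hypothetical exact-match dictionary of known rewrites.
DICTIONARY = {"tv": {"text_query": "television"}}
LLM_CALLS = []  # Records uncached LLM calls so the cache effect is visible.


def dictionary_layer(query: str) -> Optional[dict]:
    return DICTIONARY.get(query.strip().lower())


def template_layer(query: str) -> Optional[dict]:
    match = re.fullmatch(r"(.+?) under \$?(\d+)", query.strip(), re.IGNORECASE)
    if match:
        price = {"max": int(match.group(2))}
        return {"text_query": match.group(1), "filters": {"price": price}}
    return None


def ml_layer(query: str) -> Optional[dict]:
    # Placeholder for a trained rewrite model; it always abstains in this sketch.
    return None


@lru_cache(maxsize=10_000)
def llm_layer(query: str) -> dict:
    # Placeholder for a real LLM call; repeated queries are served from the cache.
    LLM_CALLS.append(query)
    return {"text_query": query}


LAYERS = [
    ("dictionary", dictionary_layer),
    ("template", template_layer),
    ("ml", ml_layer),
    ("llm", llm_layer),
]


def rewrite(query: str) -> Tuple[str, dict]:
    """Try each layer in order and return (layer_name, rewrite) for the first hit."""
    for name, layer in LAYERS:
        result = layer(query)
        if result is not None:
            return name, result
    return "none", {"text_query": query}


print(rewrite("TV"))
print(rewrite("shoes under $50"))
rewrite("comfy shoes for standing")
rewrite("comfy shoes for standing")
print(len(LLM_CALLS))  # The second identical query never reaches the LLM.
```

Ordering the layers from cheapest and most precise to slowest keeps the expensive fallback off the hot path for most queries.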

Google

Key learnings:

• Over-expansion hurts more than under-expansion

• Show original results first

• Mark expanded results clearly
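The last two points can be sketched as a merge step that keeps hits for the original query on top and tags anything found only through expansion. The `merge_results` function and the `source` field are hypothetical names; this is a generic illustration, not a description of any particular search engine's code.

```python
def merge_results(original, expanded):
    """Place original-query hits first; append expansion-only hits, flagged."""
    seen = set()
    merged = []
    for source, hits in (("original", original), ("expanded", expanded)):
        for doc_id in hits:
            if doc_id in seen:
                continue  # Already listed under the original query.
            seen.add(doc_id)
            merged.append({"id": doc_id, "source": source})
    return merged


for hit in merge_results(["a", "b"], ["b", "c"]):
    print(hit)
```

The UI can then use the `source` flag to label expanded results, for example with an "including results for..." note.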

Spotify

Mood expansion:

• "sad" → audio features
• Personalized by history
• Genre diversification
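As a rough illustration of mapping a mood word to audio features, the sketch below turns "sad" into numeric ranges and filters tracks against them. The feature names, thresholds, and track data are assumptions made up for this example and do not describe Spotify's actual system.

```python
# Hypothetical mood -> audio-feature ranges, each value on a 0.0-1.0 scale.
MOOD_FEATURES = {
    "sad": {"valence": (0.0, 0.3), "energy": (0.0, 0.5)},
    "happy": {"valence": (0.7, 1.0), "energy": (0.5, 1.0)},
}


def expand_mood(query: str):
    """Return feature ranges for a known mood word, or None if unrecognized."""
    return MOOD_FEATURES.get(query.strip().lower())


def matches(track: dict, ranges: dict) -> bool:
    return all(low <= track[name] <= high for name, (low, high) in ranges.items())


# Hypothetical track data.
tracks = [
    {"title": "Track A", "valence": 0.2, "energy": 0.3},
    {"title": "Track B", "valence": 0.9, "energy": 0.8},
]

ranges = expand_mood("sad")
print([t["title"] for t in tracks if matches(t, ranges)])  # ['Track A']
```

Personalization and genre diversification would then rerank or resample the matching tracks; they are out of scope for this sketch.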

Adaptive Expansion

Adjust expansion based on result count:

>100 results (Many): Don't expand, to preserve precision. Level: No Expansion.

20-100 results (Good): Light expansion (synonyms only). Level: Light.

0 results (Zero): Heavy expansion plus relaxed constraints. Level: Fallback.
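The tiers above can be expressed as a small decision function. The thresholds come from the text; the 1-19 range is not specified there, so treating it as "moderate" is an assumption of this sketch, as are the function name and level labels.

```python
def choose_expansion(result_count: int) -> str:
    """Pick an expansion level from the number of results for the raw query."""
    if result_count > 100:
        return "none"      # Plenty of results: preserve precision.
    if result_count >= 20:
        return "light"     # Synonyms only.
    if result_count == 0:
        return "fallback"  # Heavy expansion and relaxed constraints.
    # 1-19 results: not covered by the tiers above; "moderate" is an assumption.
    return "moderate"


for count in (250, 60, 5, 0):
    print(count, choose_expansion(count))
```

In practice this means running the unexpanded query first, then re-querying only when the count falls into a tier that calls for expansion.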

Key Takeaways

Trade-off

Expansion increases recall (finding more) but hurts precision (relevance).

Structured Rewriting

Convert unstructured text to structured filters (e.g., "under $100" -> price < 100).

Constraints

Never expand named entities (such as brands) or negated terms. Doing so destroys user trust.
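One way to enforce this constraint is to skip protected tokens during expansion. In the sketch below, the synonym table, brand list, and negation words are hypothetical, and a negation is assumed to apply only to the single token that follows it.

```python
# Hypothetical vocabularies for illustration.
SYNONYMS = {"shoes": ["sneakers", "trainers"], "cheap": ["budget"], "nike": ["swoosh"]}
BRANDS = {"nike", "adidas"}
NEGATIONS = {"not", "no", "without"}


def safe_expand(query: str):
    """Return one list of alternatives per token, leaving protected tokens alone."""
    expanded = []
    negated = False  # True when the previous token was a negation word.
    for token in query.lower().split():
        if token in NEGATIONS:
            expanded.append([token])
            negated = True
            continue
        if negated or token in BRANDS:
            expanded.append([token])  # Protected: keep exactly as typed.
        else:
            expanded.append([token] + SYNONYMS.get(token, []))
        negated = False
    return expanded


print(safe_expand("nike shoes not cheap"))
```

Here "nike" and the negated "cheap" stay untouched even though the table lists synonyms for both, while "shoes" is expanded normally.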

Adaptive Strategy

Expand aggressively only when result counts are low, especially when a query returns zero results.
