Chapter 2.5
The Query Understanding Pipeline
Query understanding is not one step; it's a pipeline of transformations that must complete in under 50ms.
System Architecture
At a high level, a query flows from the user's device through the understanding layers before reaching the retrieval engine.
Implementation
A simplified Python implementation showing how the pipeline components fit together. Notice the parallel execution for independent tasks using asyncio.
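A minimal sketch of such a pipeline. The stage functions (`normalize`, `correct_spelling`, `extract_entities`, `classify_intent`) are illustrative stubs, not a real implementation; the point is the control flow, in particular running the independent stages concurrently with `asyncio.gather`.

```python
import asyncio

# Stubbed stage functions: real systems would call trained models or services.
async def normalize(q: str) -> str:
    return " ".join(q.casefold().split())

async def correct_spelling(q: str) -> str:
    return q  # placeholder for a SymSpell-style corrector

async def extract_entities(q: str) -> dict:
    return {"tokens": q.split()}

async def classify_intent(q: str) -> str:
    return "transactional" if "buy" in q.split() else "informational"

async def understand(query: str) -> dict:
    # Sequential stages: each depends on the previous one's output.
    q = await normalize(query)
    q = await correct_spelling(q)
    # NER and intent classification are independent: run them concurrently.
    entities, intent = await asyncio.gather(
        extract_entities(q), classify_intent(q)
    )
    return {"query": q, "entities": entities, "intent": intent}

result = asyncio.run(understand("Buy  Red Shoes"))
```

Because `extract_entities` and `classify_intent` never read each other's output, gathering them means the slower of the two, not their sum, lands on the latency budget.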
Latency Breakdown
Every millisecond counts. The budget below breaks down a typical query; note how spell checking and NER consume the bulk of the time.
⚡ Latency Budget: <50ms total
Spell correction and NER are the most expensive. Consider caching for head queries.
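One way to cache head queries is a simple in-process LRU. The sketch below uses `functools.lru_cache`; `understand_cached` is a hypothetical stand-in for the full pipeline, returning only the normalized form.

```python
from functools import lru_cache

@lru_cache(maxsize=10_000)
def understand_cached(query: str) -> tuple:
    # Expensive stages (spell check, NER) run only on cache misses; head
    # queries are a small fraction of distinct strings but a large fraction
    # of traffic, so even a modest cache absorbs most of the load.
    normalized = " ".join(query.casefold().split())
    return (normalized,)  # stand-in for the full understanding result

understand_cached("red shoes")  # miss: full pipeline runs
understand_cached("red shoes")  # hit: served from memory
```

In production the cache key should be the raw query string (before normalization), since that is what arrives at the service.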
Stage Deep Dive
Normalization
- Lowercase (case-fold for Unicode)
- Remove extra whitespace
- Unicode normalization (é → e)
- Handle special characters
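These steps can be sketched with Python's standard `unicodedata` module; NFKD-decompose-then-strip-combining-marks is one common way to get é → e.

```python
import unicodedata

def normalize(query: str) -> str:
    # Case-fold: handles Unicode beyond simple ASCII lowercasing.
    q = query.casefold()
    # Strip diacritics via NFKD decomposition: "é" -> "e" + combining accent.
    q = unicodedata.normalize("NFKD", q)
    q = "".join(ch for ch in q if not unicodedata.combining(ch))
    # Collapse runs of whitespace.
    return " ".join(q.split())

normalize("  Café   Crème ")  # "cafe creme"
```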
Spell Correction
- SymSpell (edit distance 2)
- Phonetic matching
- Query log corrections
- Protect known brands
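A toy edit-distance-1 corrector illustrating the brand-protection idea. The dictionary and brand list are made up; a production system would use SymSpell-style precomputed deletes rather than generating every edit per query.

```python
# Illustrative lexicons only.
DICTIONARY = {"shoes", "shirt", "red", "running"}
PROTECTED_BRANDS = {"nike", "adidas"}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word: str) -> set:
    # All strings one edit away: delete, replace, insert, transpose.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    replaces = {l + c + r[1:] for l, r in splits if r for c in ALPHABET}
    inserts = {l + c + r for l, r in splits for c in ALPHABET}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    return deletes | replaces | inserts | transposes

def correct(word: str) -> str:
    if word in DICTIONARY or word in PROTECTED_BRANDS:
        return word  # never "correct" a known brand
    candidates = edits1(word) & DICTIONARY
    return min(candidates) if candidates else word

correct("shose")  # "shoes"
correct("nike")   # "nike" (protected, left untouched)
```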
Entity Extraction (NER)
- Brand detection
- Category/product type
- Attributes (size, color)
- Price constraints
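A rule-based sketch of brand, attribute, and price extraction. The lexicons and regex here are illustrative; production NER would use a trained sequence model, with rules as a fallback.

```python
import re

# Illustrative lexicons only.
BRANDS = {"nike", "adidas"}
COLORS = {"red", "blue", "black"}

def extract_entities(query: str) -> dict:
    tokens = query.lower().split()
    entities = {"brand": None, "color": None, "max_price": None}
    for tok in tokens:
        if tok in BRANDS:
            entities["brand"] = tok
        elif tok in COLORS:
            entities["color"] = tok
    # Price constraints like "under $50" become a structured filter.
    m = re.search(r"under \$?(\d+)", query.lower())
    if m:
        entities["max_price"] = int(m.group(1))
    return entities

extract_entities("Nike red shoes under $50")
# {'brand': 'nike', 'color': 'red', 'max_price': 50}
```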
Intent Classification
- Navigational (go to page)
- Informational (learn)
- Transactional (buy)
- Local (near me)
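A keyword-cue sketch of these four classes, with informational as the default. The cue sets are illustrative; a real classifier would be a trained model scoring all four intents.

```python
# Illustrative cue words per intent class.
INTENT_CUES = {
    "transactional": {"buy", "order", "price", "cheap"},
    "local": {"near", "nearby", "open"},
    "navigational": {"login", "homepage", "official"},
}

def classify_intent(query: str) -> str:
    tokens = set(query.lower().split())
    for intent, cues in INTENT_CUES.items():
        if tokens & cues:
            return intent
    return "informational"  # default: the user wants to learn something

classify_intent("buy running shoes")     # "transactional"
classify_intent("coffee shops near me")  # "local"
```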
Production Metrics
| Metric | Target | Alert If |
|---|---|---|
| P50 Latency | 35ms | >50ms |
| P99 Latency | 65ms | >100ms |
| Cache Hit Rate | 70% | <50% |
| NER F1 Score | >0.92 | <0.85 |
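The latency targets above can be checked directly against raw samples. A sketch using the standard library's `statistics.quantiles`; the threshold names are illustrative.

```python
import statistics

# Hypothetical alert thresholds, mirroring the table above.
THRESHOLDS = {"p50_ms": 50, "p99_ms": 100}

def check_latency(samples_ms):
    # quantiles(n=100) returns 99 cut points: index 49 is P50, index 98 is P99.
    cuts = statistics.quantiles(samples_ms, n=100)
    observed = {"p50_ms": cuts[49], "p99_ms": cuts[98]}
    alerts = [name for name, limit in THRESHOLDS.items()
              if observed[name] > limit]
    return {**observed, "alerts": alerts}
```

In practice these percentiles would come from a streaming sketch (e.g. t-digest) rather than buffered samples, but the alerting logic is the same.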
Key Takeaways
Five-Stage Pipeline
Query understanding is a deterministic pipeline: Clean -> Fix -> Extract -> Classify -> Enrich.
Latency Budget
You have ~50ms total. P50 should be 35ms.
Bottlenecks
Spell correction and NER are the most expensive steps.
Optimization
Parallelize NER + Intent. Cache aggressively (head queries).