Systems Atlas

Chapter 2.5

The Query Understanding Pipeline

Query understanding is not one step; it is a pipeline of transformations that must complete in under 50 ms.


System Architecture

This high-level architecture shows how a query flows from the user's device through the understanding layers before reaching the retrieval engine.

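[Architecture diagram: Client App → Gateway → Query Service. Inside the Query Service the query passes through five stages: 1. Clean (Normalize) → 2. Fix (Spell Check) → 3. Extract (NER) → 4. Classify (Intent) → 5. Enrich (Rewrite). The output is a {json} payload; the diagram also shows a Redis Cache and a Vector DB.]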

Implementation

A simplified Python implementation showing how the pipeline components fit together. Notice the parallel execution for independent tasks using asyncio.

pipeline.py
import asyncio
from typing import Any, Dict

# TextNormalizer, SymSpellChecker, BERTEntityExtractor, FastTextIntentClassifier
# and SynonymExpander are assumed to be defined elsewhere in the project.


class QueryPipeline:
    def __init__(self):
        self.normalizer = TextNormalizer()
        self.spell_checker = SymSpellChecker()
        self.ner_model = BERTEntityExtractor()
        self.intent_model = FastTextIntentClassifier()
        self.expander = SynonymExpander()

    async def process(self, raw_query: str) -> Dict[str, Any]:
        # 1. Normalization (CPU bound, fast)
        normalized = self.normalizer.normalize(raw_query)

        # 2. Spell check (memory bound, dictionary lookup)
        corrected = self.spell_checker.correct(normalized)

        # 3. Parallel execution: NER and intent are independent of each other
        entities_future = self.ner_model.extract_async(corrected)
        intent_future = self.intent_model.predict_async(corrected)
        entities, intent = await asyncio.gather(entities_future, intent_future)

        # 4. Expansion, only for low-confidence or broad intents
        expanded_terms = []
        if intent.confidence < 0.9 or intent.label == "broad":
            expanded_terms = self.expander.expand(corrected)

        return {
            "original": raw_query,
            "corrected": corrected,
            "entities": entities,
            "intent": intent,
            "expansion": expanded_terms,
        }
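
To see the parallel step in isolation, here is a minimal runnable sketch. The StubNER and StubIntent classes, the Intent dataclass, and the 50 ms sleeps are hypothetical stand-ins for the real models; only the use of asyncio.gather mirrors the pipeline above.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class Intent:
    label: str
    confidence: float


class StubNER:
    """Hypothetical stand-in for the NER model."""

    async def extract_async(self, text: str) -> list:
        await asyncio.sleep(0.05)  # simulate model inference time
        return [("brand", "nike")] if "nike" in text else []


class StubIntent:
    """Hypothetical stand-in for the intent classifier."""

    async def predict_async(self, text: str) -> Intent:
        await asyncio.sleep(0.05)  # simulate model inference time
        return Intent(label="transactional", confidence=0.95)


async def understand(text: str):
    start = time.perf_counter()
    # Both coroutines are scheduled together, so their waits overlap.
    entities, intent = await asyncio.gather(
        StubNER().extract_async(text),
        StubIntent().predict_async(text),
    )
    elapsed = time.perf_counter() - start
    return entities, intent, elapsed


if __name__ == "__main__":
    entities, intent, elapsed = asyncio.run(understand("nike shoes"))
    print(entities, intent.label, f"{elapsed * 1000:.0f} ms")
```

Because the two stubs wait concurrently, the elapsed time should be close to the slower of the two calls rather than their sum. This only helps when the model calls actually yield to the event loop (for example, a network call to a model server); CPU-bound inference running in the same thread would still execute sequentially.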

Latency Breakdown

Every millisecond counts. This chart breaks down the latency budget for a typical query. Note how spell checking and NER consume the bulk of the time budget.

⚡ Latency Budget: <50ms total
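
One way to produce a per-stage breakdown like this is to time each stage explicitly. The sketch below is an assumption, not the instrumentation used for the chart: StageTimer and its method names are hypothetical, and the 50 ms constant is the budget stated in the text.

```python
import time
from contextlib import contextmanager

BUDGET_MS = 50.0  # total budget stated in the text


class StageTimer:
    """Hypothetical helper that records wall-clock time per pipeline stage."""

    def __init__(self):
        self.timings_ms = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises.
            self.timings_ms[name] = (time.perf_counter() - start) * 1000.0

    def total_ms(self) -> float:
        return sum(self.timings_ms.values())

    def over_budget(self, budget_ms: float = BUDGET_MS) -> bool:
        return self.total_ms() > budget_ms


timer = StageTimer()
with timer.stage("normalize"):
    normalized = "Running SHOES".lower()
with timer.stage("spell_check"):
    corrected = normalized  # placeholder for a real spell-check call
print(timer.timings_ms, timer.over_budget())
```

Summing stage times overstates the total when stages run in parallel, so for the NER and intent steps it is more accurate to time the combined gather call as a single stage.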

Spell correction and NER are the most expensive. Consider caching for head queries.
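
Caching head queries can be sketched as a small LRU cache with a time-to-live, keyed on the normalized query. This is an in-process stand-in used for illustration; the architecture above shows Redis in this role. HeadQueryCache, understand_cached, and the default sizes are hypothetical.

```python
import time
from collections import OrderedDict


class HeadQueryCache:
    """Tiny LRU cache with TTL (illustrative stand-in for a shared cache)."""

    def __init__(self, max_size: int = 10000, ttl_seconds: float = 300.0):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._store = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # entry is stale
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict least recently used


def understand_cached(cache, normalized_query, compute):
    """Return the cached result, or compute and store it on a miss."""
    cached = cache.get(normalized_query)
    if cached is not None:
        return cached
    result = compute(normalized_query)
    cache.set(normalized_query, result)
    return result
```

Keying on the normalized query rather than the raw string means "Running SHOES" and "running shoes" share one entry, which raises the hit rate for popular queries.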

Stage Deep Dive

Normalization

  • Lowercase (case-fold for Unicode)
  • Remove extra whitespace
  • Unicode normalization and accent folding (é → e)
  • Handle special characters
"Running SHOES" → "running shoes"

Spell Correction

  • SymSpell (edit distance 2)
  • Phonetic matching
  • Query log corrections
  • Protect known brands
"iphoen" → "iPhone" (query log)

Entity Extraction (NER)

  • Brand detection
  • Category/product type
  • Attributes (size, color)
  • Price constraints
Nike | size 10 | under $100
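
For a query such as "Nike size 10 under $100", the entity types listed above can be illustrated with simple rules. This is a rule-based sketch, not the BERT extractor from the pipeline code; the brand and color lists, the regular expressions, and the output keys are all hypothetical.

```python
import re

# Hypothetical lookup lists for illustration only.
KNOWN_BRANDS = {"nike", "adidas", "puma"}
KNOWN_COLORS = {"red", "black", "white", "blue"}

SIZE_RE = re.compile(r"\bsize\s+(\d+(?:\.\d+)?)\b")
PRICE_RE = re.compile(r"\bunder\s+\$?(\d+(?:\.\d+)?)")


def extract_entities(query: str) -> dict:
    text = query.lower()
    tokens = text.split()
    entities = {}

    brand = next((t for t in tokens if t in KNOWN_BRANDS), None)
    if brand:
        entities["brand"] = brand

    color = next((t for t in tokens if t in KNOWN_COLORS), None)
    if color:
        entities["color"] = color

    size = SIZE_RE.search(text)
    if size:
        entities["size"] = size.group(1)

    price = PRICE_RE.search(text)
    if price:
        entities["max_price"] = float(price.group(1))

    return entities


print(extract_entities("Nike size 10 under $100"))
```

Rules like these are cheap and predictable, which makes them a useful baseline or fallback, but they miss anything outside the lists and patterns. That gap is the usual reason to use a learned NER model.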

Intent Classification

  • Navigational (go to page)
  • Informational (learn)
  • Transactional (buy)
  • Local (near me)
"how to" → Informational
"buy X" → Transactional

Production Metrics

Metric            Target    Alert If
P50 Latency       35ms      >50ms
P99 Latency       65ms      >100ms
Cache Hit Rate    70%       <50%
NER F1 Score      >0.92     <0.85
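
The latency rows of this table can be checked against a window of recorded request times. The sketch below uses a nearest-rank percentile and the alert thresholds from the table; the function names and the sample data are hypothetical.

```python
import math

# Alert thresholds in milliseconds, taken from the table above.
ALERT_THRESHOLDS_MS = {"p50": 50.0, "p99": 100.0}


def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_alerts(samples_ms) -> list:
    """Return the names of the percentiles that breach their threshold."""
    observed = {
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
    }
    return [name for name, limit in ALERT_THRESHOLDS_MS.items()
            if observed[name] > limit]


# Hypothetical window: most requests are fast, two are slow outliers.
window = [30.0] * 98 + [120.0, 130.0]
print(latency_alerts(window))
```

In this window the median is comfortably inside its threshold while the tail is not, which is why P50 and P99 are tracked separately.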

Key Takeaways

01

5-Stage Pipeline

Query understanding is a deterministic pipeline: Clean → Fix → Extract → Classify → Enrich.

02

Latency Budget

You have ~50ms total. P50 should be 35ms.

03

Bottlenecks

Spell correction and NER are the most expensive steps.

04

Optimization

Parallelize NER + Intent. Cache aggressively (head queries).