Chapter 2.5
The Query Understanding Pipeline
Query understanding is not one step; it's a pipeline of transformations that must complete in under 50ms.
System Architecture
At a high level, a query flows from the user's device through the understanding layers before reaching the retrieval engine.
Implementation
A simplified Python implementation showing how the pipeline components fit together. Notice the parallel execution for independent tasks using asyncio.
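A minimal sketch of such a pipeline. The stage functions (`normalize`, `correct_spelling`, `extract_entities`, `classify_intent`) are illustrative stubs, not a real implementation; the point is the control flow, in particular running the independent stages concurrently with `asyncio.gather`.

```python
import asyncio

# Stubbed stage functions: real systems would call trained models or services.
async def normalize(q: str) -> str:
    return " ".join(q.casefold().split())

async def correct_spelling(q: str) -> str:
    return q  # placeholder for a SymSpell-style corrector

async def extract_entities(q: str) -> dict:
    return {"tokens": q.split()}

async def classify_intent(q: str) -> str:
    return "transactional" if "buy" in q.split() else "informational"

async def understand(query: str) -> dict:
    # Sequential stages: each depends on the previous one's output.
    q = await normalize(query)
    q = await correct_spelling(q)
    # NER and intent classification are independent: run them concurrently.
    entities, intent = await asyncio.gather(
        extract_entities(q), classify_intent(q)
    )
    return {"query": q, "entities": entities, "intent": intent}

result = asyncio.run(understand("Buy  Red Shoes"))
```

Because `extract_entities` and `classify_intent` never read each other's output, gathering them means the slower of the two, not their sum, lands on the latency budget.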
Latency Breakdown
Every millisecond counts. The budget below breaks down a typical query; note how spell checking and NER consume the bulk of the time.
⚡ Latency Budget: <50ms total
Spell correction and NER are the most expensive. Consider caching for head queries.
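One way to cache head queries is a simple in-process LRU. The sketch below uses `functools.lru_cache`; `understand_cached` is a hypothetical stand-in for the full pipeline, returning only the normalized form.

```python
from functools import lru_cache

@lru_cache(maxsize=10_000)
def understand_cached(query: str) -> tuple:
    # Expensive stages (spell check, NER) run only on cache misses; head
    # queries are a small fraction of distinct strings but a large fraction
    # of traffic, so even a modest cache absorbs most of the load.
    normalized = " ".join(query.casefold().split())
    return (normalized,)  # stand-in for the full understanding result

understand_cached("red shoes")  # miss: full pipeline runs
understand_cached("red shoes")  # hit: served from memory
```

In production the cache key should be the raw query string (before normalization), since that is what arrives at the service.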
Stage Deep Dive
Normalization
- Lowercase (case-fold for Unicode)
- Remove extra whitespace
- Unicode normalization (é → e)
- Handle special characters
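These steps can be sketched with Python's standard `unicodedata` module; NFKD-decompose-then-strip-combining-marks is one common way to get é → e.

```python
import unicodedata

def normalize(query: str) -> str:
    # Case-fold: handles Unicode beyond simple ASCII lowercasing.
    q = query.casefold()
    # Strip diacritics via NFKD decomposition: "é" -> "e" + combining accent.
    q = unicodedata.normalize("NFKD", q)
    q = "".join(ch for ch in q if not unicodedata.combining(ch))
    # Collapse runs of whitespace.
    return " ".join(q.split())

normalize("  Café   Crème ")  # "cafe creme"
```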
Spell Correction
- SymSpell (edit distance 2)
- Phonetic matching
- Query log corrections
- Protect known brands
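A toy edit-distance-1 corrector illustrating the brand-protection idea. The dictionary and brand list are made up; a production system would use SymSpell-style precomputed deletes rather than generating every edit per query.

```python
# Illustrative lexicons only.
DICTIONARY = {"shoes", "shirt", "red", "running"}
PROTECTED_BRANDS = {"nike", "adidas"}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word: str) -> set:
    # All strings one edit away: delete, replace, insert, transpose.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    replaces = {l + c + r[1:] for l, r in splits if r for c in ALPHABET}
    inserts = {l + c + r for l, r in splits for c in ALPHABET}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    return deletes | replaces | inserts | transposes

def correct(word: str) -> str:
    if word in DICTIONARY or word in PROTECTED_BRANDS:
        return word  # never "correct" a known brand
    candidates = edits1(word) & DICTIONARY
    return min(candidates) if candidates else word

correct("shose")  # "shoes"
correct("nike")   # "nike" (protected, left untouched)
```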
Entity Extraction (NER)
- Brand detection
- Category/product type
- Attributes (size, color)
- Price constraints
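A rule-based sketch of brand, attribute, and price extraction. The lexicons and regex here are illustrative; production NER would use a trained sequence model, with rules as a fallback.

```python
import re

# Illustrative lexicons only.
BRANDS = {"nike", "adidas"}
COLORS = {"red", "blue", "black"}

def extract_entities(query: str) -> dict:
    tokens = query.lower().split()
    entities = {"brand": None, "color": None, "max_price": None}
    for tok in tokens:
        if tok in BRANDS:
            entities["brand"] = tok
        elif tok in COLORS:
            entities["color"] = tok
    # Price constraints like "under $50" become a structured filter.
    m = re.search(r"under \$?(\d+)", query.lower())
    if m:
        entities["max_price"] = int(m.group(1))
    return entities

extract_entities("Nike red shoes under $50")
# {'brand': 'nike', 'color': 'red', 'max_price': 50}
```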
Intent Classification
- Navigational (go to page)
- Informational (learn)
- Transactional (buy)
- Local (near me)
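A keyword-cue sketch of these four classes, with informational as the default. The cue sets are illustrative; a real classifier would be a trained model scoring all four intents.

```python
# Illustrative cue words per intent class.
INTENT_CUES = {
    "transactional": {"buy", "order", "price", "cheap"},
    "local": {"near", "nearby", "open"},
    "navigational": {"login", "homepage", "official"},
}

def classify_intent(query: str) -> str:
    tokens = set(query.lower().split())
    for intent, cues in INTENT_CUES.items():
        if tokens & cues:
            return intent
    return "informational"  # default: the user wants to learn something

classify_intent("buy running shoes")     # "transactional"
classify_intent("coffee shops near me")  # "local"
```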
Production Metrics
| Metric | Target | Alert If |
|---|---|---|
| P50 Latency | 35ms | >50ms |
| P99 Latency | 65ms | >100ms |
| Cache Hit Rate | 70% | <50% |
| NER F1 Score | >0.92 | <0.85 |
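The latency targets above can be checked directly against raw samples. A sketch using the standard library's `statistics.quantiles`; the threshold names are illustrative.

```python
import statistics

# Hypothetical alert thresholds, mirroring the table above.
THRESHOLDS = {"p50_ms": 50, "p99_ms": 100}

def check_latency(samples_ms):
    # quantiles(n=100) returns 99 cut points: index 49 is P50, index 98 is P99.
    cuts = statistics.quantiles(samples_ms, n=100)
    observed = {"p50_ms": cuts[49], "p99_ms": cuts[98]}
    alerts = [name for name, limit in THRESHOLDS.items()
              if observed[name] > limit]
    return {**observed, "alerts": alerts}
```

In practice these percentiles would come from a streaming sketch (e.g. t-digest) rather than buffered samples, but the alerting logic is the same.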
Key Takeaways
Five-Stage Pipeline
Query understanding is a deterministic pipeline: Clean -> Fix -> Extract -> Classify -> Enrich.
Latency Budget
You have ~50ms total. P50 should be 35ms.
Bottlenecks
Spell correction and NER are the most expensive steps.
Optimization
Parallelize NER + Intent. Cache aggressively (head queries).