Chapter 2.3
Intent vs Tokens
Tokens are what the user typed. Intent is what they meant. The two often diverge, and understanding this gap is fundamental to building good search.
The Core Problem
Traditional search engines work by matching tokens (words) in the query against tokens in documents. This sounds reasonable until you realize that users express intent through words, but intent and words are not the same thing.
When a user types "cheap laptop", they don't want documents containing the word "cheap"; they want laptops under a certain price. When they type "laptop without touchscreen", matching "touchscreen" actually gives them the opposite of what they want.
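To make the failure concrete, here is a minimal sketch of naive token matching. The two product descriptions and the `token_match` helper are made up for illustration; real engines use proper analyzers and scoring, but the failure mode is the same.

```python
from __future__ import annotations


def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace; real analyzers do much more."""
    return set(text.lower().split())


def token_match(query: str, docs: dict[str, str]) -> list[str]:
    """Rank documents by how many query tokens they contain."""
    q = tokenize(query)
    scored = [(len(q & tokenize(body)), doc_id) for doc_id, body in docs.items()]
    # Highest overlap first; drop documents with no overlap at all.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


# Hypothetical catalog entries.
docs = {
    "a": "14 inch laptop with touchscreen display",
    "b": "15 inch laptop with matte display",
}

results = token_match("laptop without touchscreen", docs)
print(results)  # document "a" (the touchscreen laptop) ranks first
```

The touchscreen laptop wins because it shares two tokens with the query, while the laptop the user actually wants shares only one.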
Intent Categories (Google Framework)
Google famously categorizes queries into "Do, Know, Go": "Do" is transactional, "Know" is informational, and "Go" is navigational. Here is how we handle each one differently.
Navigational Intent
The user wants to go to a specific website or page. They are using search as a bookmark bar.
Examples
- "facebook login"
- "youtube" (the homepage)
- "hbo max"
- "united airlines support"
System Action
- Show single official link at #1
- Show sitelinks (sub-pages)
- Don't show ads if brand owner
Informational Intent
The user wants to learn something. These queries are commonly estimated to make up around 80% of web searches, but they monetize poorly.
Examples
- "how to tie a tie"
- "how to upload video to youtube"
- "capital of france"
System Action
- Show Direct Answer / Snippet
- Show "People Also Ask"
- Rank authoritative content (Wikipedia)
Transactional Intent
The user wants to buy or perform an action. This is where the money is (Ads, E-commerce).
Examples
- "buy iphone 15"
- "cheap flights to nyc"
- "download spotify"
System Action
- Show Shopping Grid
- Show Filters (Price, Brand)
- Show Reviews and Ratings
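The three categories above can be routed with a small rule-based classifier. This is only a sketch: the cue lists and the `classify_intent` function are hypothetical and chosen to cover the example queries in this chapter, whereas production systems typically learn intent from click logs or a trained model.

```python
from __future__ import annotations

# Hypothetical cue lists, chosen only to cover this chapter's examples.
KNOWN_SITES = {"facebook", "youtube", "hbo max", "united airlines"}
TRANSACTIONAL_CUES = {"buy", "cheap", "download", "order", "price"}
INFORMATIONAL_OPENERS = {"how", "what", "why", "who", "capital"}

# System actions, summarized from the three sections above.
ACTIONS = {
    "navigational": "official link + sitelinks",
    "informational": "direct answer + People Also Ask",
    "transactional": "shopping grid + filters + reviews",
}


def classify_intent(query: str) -> str:
    q = query.lower().strip()
    tokens = q.split()
    # Question-style openers are checked first, so that
    # "how to upload video to youtube" stays informational
    # even though it names a known site.
    if tokens and tokens[0] in INFORMATIONAL_OPENERS:
        return "informational"
    if any(token in TRANSACTIONAL_CUES for token in tokens):
        return "transactional"
    if any(q.startswith(site) for site in KNOWN_SITES):
        return "navigational"
    # Default to the most common category.
    return "informational"


for example in ["facebook login", "how to tie a tie", "buy iphone 15"]:
    intent = classify_intent(example)
    print(f"{example!r}: {intent} -> {ACTIONS[intent]}")
```

The order of the checks is itself a design choice: it encodes which signal wins when a query carries cues for more than one intent.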
Quantifying the Gap
This breakdown shows the gap between token matching and intent understanding.
Success Rate: Token vs Intent Matching
Negation Failure
Token matching fails 90% of the time on negation because it either treats "without" as just another word to match or discards it as noise.
Synonym Gap
Token matching misses 75% of relevant results when users use synonyms (e.g., "sofa" instead of "couch") that aren't in the product text.
Deep Dive: Tokens vs Entities
Solutions for Common Failures
The "Cheap" Problem
Remove the word "cheap" from the query text and apply a price filter instead.
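A minimal sketch of that rewrite follows. The `PRICE_MODIFIERS` table, the dollar ceilings, and the `max_price` filter name are assumptions for illustration; a real system would more likely derive the ceiling per category, for example from a price percentile.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical mapping from price modifiers to a price ceiling in dollars.
PRICE_MODIFIERS = {"cheap": 500.0, "budget": 500.0, "affordable": 800.0}


@dataclass
class ParsedQuery:
    text: str  # what is left to match against documents
    filters: dict[str, float] = field(default_factory=dict)


def rewrite_price_modifiers(query: str) -> ParsedQuery:
    """Strip price modifiers from the query and turn them into a filter."""
    kept: list[str] = []
    filters: dict[str, float] = {}
    for token in query.lower().split():
        if token in PRICE_MODIFIERS:
            ceiling = PRICE_MODIFIERS[token]
            # If several modifiers appear, keep the strictest ceiling.
            filters["max_price"] = min(ceiling, filters.get("max_price", ceiling))
        else:
            kept.append(token)
    return ParsedQuery(text=" ".join(kept), filters=filters)


parsed = rewrite_price_modifiers("cheap laptop")
print(parsed.text, parsed.filters)
```

The remaining text ("laptop") goes to the retrieval stage, while the filter is applied to the structured price field rather than to the product description.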
The Negation Problem
Convert "without X" into a hard exclusion on X, instead of matching "X" as a token.
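One way to sketch this is to split the query into required and excluded terms before matching. The marker list and helper names below are hypothetical, and the parser only negates the single word that follows a marker.

```python
from __future__ import annotations

# Hypothetical list of negation markers.
NEGATION_MARKERS = {"without", "no", "not"}


def split_negation(query: str) -> tuple[list[str], list[str]]:
    """Return (required_terms, excluded_terms).

    Only the single term right after a negation marker is excluded.
    """
    required: list[str] = []
    excluded: list[str] = []
    negate_next = False
    for token in query.lower().split():
        if token in NEGATION_MARKERS:
            negate_next = True
        elif negate_next:
            excluded.append(token)
            negate_next = False
        else:
            required.append(token)
    return required, excluded


def search(query: str, docs: dict[str, str]) -> list[str]:
    """Keep documents with every required term and no excluded term."""
    required, excluded = split_negation(query)
    hits = []
    for doc_id, body in docs.items():
        tokens = set(body.lower().split())
        if all(t in tokens for t in required) and not any(t in tokens for t in excluded):
            hits.append(doc_id)
    return hits


# Hypothetical catalog entries.
docs = {
    "a": "14 inch laptop with touchscreen display",
    "b": "15 inch laptop with matte display",
}
hits = search("laptop without touchscreen", docs)
print(hits)
```

Excluding on free text is fragile: a description that says "no touchscreen" would be dropped by mistake. Where a structured attribute exists, applying the exclusion to that field is usually the safer choice.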
Precision vs Recall Strategy
A token-only search has high precision but low recall (it misses synonyms). A semantic-only search has high recall but low precision (it drifts off topic). The common production answer is hybrid retrieval, which combines the two.
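The trade-off is easier to see with numbers. The sketch below computes precision (the share of retrieved documents that are relevant) and recall (the share of relevant documents that were retrieved) for two result sets. The document ids and result sets are invented for illustration and are not measurements.

```python
from __future__ import annotations


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments for the query "couch":
# d1 and d2 say "couch", d3 and d4 say "sofa", d5 to d8 are off topic.
relevant = {"d1", "d2", "d3", "d4"}
token_only = {"d1", "d2"}
semantic_only = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

token_pr = precision_recall(token_only, relevant)
semantic_pr = precision_recall(semantic_only, relevant)
print(token_pr)     # (1.0, 0.5)
print(semantic_pr)  # (0.5, 1.0)
```

In this toy case, token-only retrieval returns nothing wrong but finds only half of the relevant items, while semantic-only retrieval finds everything relevant but half of what it returns is off topic.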
Bridging the Gap
There are three main approaches to bridging the token-intent gap, each with trade-offs:
1. Synonym Expansion
Add known synonyms to the query. Simple and interpretable, but requires manual curation and can reduce precision if synonyms are too broad.
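As a rough sketch, synonym expansion can turn each query token into a group of acceptable alternatives. The `SYNONYMS` table below is a hypothetical hand-curated example, not a recommended list.

```python
from __future__ import annotations

# Hypothetical hand-curated synonym table.
SYNONYMS = {
    "couch": {"sofa"},
    "sofa": {"couch"},
    # A broader term: expanding this way trades precision for recall.
    "sneakers": {"shoes"},
}


def expand_query(query: str) -> list[set[str]]:
    """Turn each query token into a group of acceptable alternatives."""
    return [{token} | SYNONYMS.get(token, set()) for token in query.lower().split()]


def matches(groups: list[set[str]], body: str) -> bool:
    """A document matches when it contains a term from every group."""
    tokens = set(body.lower().split())
    return all(group & tokens for group in groups)


groups = expand_query("leather sofa")
print(groups)
print(matches(groups, "brown leather couch"))
```

The "sneakers" entry shows the precision risk: once "shoes" is accepted as an alternative, dress shoes and boots also match a query for sneakers.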
2. Semantic Search
Use embeddings to find semantically similar content. Automatic and handles unseen synonyms, but can over-generalize and is less interpretable.
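The sketch below ranks documents by cosine similarity between a query vector and document vectors. The three-dimensional vectors are hand-made stand-ins for the output of a real embedding model, so every number here is an assumption chosen for illustration.

```python
from __future__ import annotations

import math

# Hand-made vectors standing in for a real embedding model.
# Dimensions are loosely (seating, footwear, electronics).
DOC_VECTORS = {
    "grey fabric couch": [0.9, 0.1, 0.0],
    "leather armchair": [0.7, 0.1, 0.2],
    "running sneakers": [0.0, 1.0, 0.1],
}
QUERY_VECTORS = {"sofa": [1.0, 0.0, 0.0]}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query vector."""
    q_vec = QUERY_VECTORS[query]
    ranked = sorted(DOC_VECTORS, key=lambda d: cosine(q_vec, DOC_VECTORS[d]), reverse=True)
    return ranked[:top_k]


top = semantic_search("sofa")
print(top)
```

The couch ranks first even though it shares no token with "sofa", which is the benefit. The armchair also scores highly, which is the over-generalization risk: it is nearby in vector space but not what the user asked for.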
3. Hybrid Approach
Combine both: use tokens for high-confidence matches and semantics for recall. This is the approach most production systems use. A typical pipeline runs in three stages:
   1. Token match (BM25) for precision
   2. Semantic rerank for relevance
   3. Fall back to semantic search when token matching finds too few results
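The hybrid flow can be sketched as follows. This is a simplified illustration under stated assumptions: a plain token-overlap test stands in for BM25, and `toy_semantic_score` with its `CONCEPTS` table is a hypothetical stand-in for an embedding model.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical concept table standing in for an embedding model.
CONCEPTS = {"sofa": "seating", "couch": "seating", "sneakers": "footwear"}


def toy_semantic_score(query: str, body: str) -> float:
    """Jaccard overlap after mapping words to concepts."""
    q = {CONCEPTS.get(t, t) for t in query.lower().split()}
    d = {CONCEPTS.get(t, t) for t in body.lower().split()}
    return len(q & d) / len(q | d)


def token_candidates(query: str, docs: dict[str, str]) -> list[str]:
    """Token stage: keep documents sharing a token with the query.

    A plain overlap test stands in for BM25 here.
    """
    q = set(query.lower().split())
    return [doc_id for doc_id, body in docs.items() if q & set(body.lower().split())]


def hybrid_search(
    query: str,
    docs: dict[str, str],
    semantic_score: Callable[[str, str], float],
    min_candidates: int = 1,
) -> list[str]:
    candidates = token_candidates(query, docs)
    if len(candidates) < min_candidates:
        # Fallback: too few token matches, so score every document.
        candidates = list(docs)
    # Rerank by semantic score and drop documents that score zero.
    scored = [(semantic_score(query, docs[d]), d) for d in candidates]
    ranked = sorted((pair for pair in scored if pair[0] > 0), key=lambda pair: -pair[0])
    return [doc_id for _, doc_id in ranked]


# Hypothetical catalog entries.
docs = {"d1": "grey sofa", "d2": "sofa cover", "d3": "running sneakers"}
print(hybrid_search("grey sofa", docs, toy_semantic_score))
print(hybrid_search("couch", docs, toy_semantic_score))
```

In this sketch, "grey sofa" is answered from the token candidates and reranked, while "couch" has no token match at all and is answered by the semantic fallback. The `min_candidates` threshold is the tuning knob that decides when the fallback fires.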
Key Takeaways
Tokens vs Intent
The literal words typed are just a hint. You must infer the underlying goal.
Synonyms
Table stakes. You must handle synonyms such as "couch" vs "sofa" and closely related terms such as "sneakers" vs "shoes".
Hard Problems
Negation ("without") and modifiers ("cheap", "best") require logic, not just matching.
Hybrid Wins
Use tokens (exact match) for precision and semantics (vectors) for recall.