Chapter 1.4
Types of Search Systems
Not all search is the same. Different domains have different requirements.
E-commerce Search
Amazon, Flipkart, Shopify, Etsy
Characteristics
- Structured data (price, brand, size)
- Business-logic heavy (inventory, margins)
- Faceted navigation
- Personalization
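Faceted navigation comes down to counting how many matching results carry each attribute value, so the UI can show "Acme (2)" or "50-100 (2)". Here is a minimal sketch, assuming results arrive as plain dicts. The field names and the price bucket edges are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical matched results; field names are illustrative, not a real schema.
products = [
    {"title": "Trail Runner", "brand": "Acme", "price": 89.0},
    {"title": "Road Racer", "brand": "Zoom", "price": 120.0},
    {"title": "Trail Runner GTX", "brand": "Acme", "price": 110.0},
    {"title": "Court Classic", "brand": "Zoom", "price": 65.0},
]

def facet_counts(results, field):
    """Count how many matching results carry each value of `field`."""
    return Counter(r[field] for r in results if field in r)

def price_bucket(price, edges=(50, 100, 150)):
    """Map a price to a coarse range label such as '100-150' (hypothetical edges)."""
    lower = 0
    for edge in edges:
        if price < edge:
            return f"{lower}-{edge}"
        lower = edge
    return f"{lower}+"

brand_facets = facet_counts(products, "brand")
price_facets = Counter(price_bucket(p["price"]) for p in products)
print(brand_facets)  # Counter({'Acme': 2, 'Zoom': 2})
print(price_facets)  # Counter({'50-100': 2, '100-150': 2})
```

In a production engine these counts are computed as aggregations over the full match set, not over the first page of results; otherwise the facet numbers would not reflect what the user can actually filter to.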
Real Problems
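- Variant explosion: one shoe model can appear as 100 size/color variants that crowd out other products
- "Cheap" problem: ranking favors popularity even when the query signals price intent
- Seller spam: 47 identical listings for the same product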
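One common mitigation for variant explosion is to collapse results by a parent product ID and show only the best-scoring variant of each product. The sketch below assumes every hit carries a hypothetical `parent_id` field linking its variants; the same grouping idea applies to collapsing duplicate seller listings once they share a canonical product ID.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    sku: str        # variant-level ID (e.g. one size/color combination)
    parent_id: str  # hypothetical field linking all variants of one product
    score: float    # relevance score from the ranker

def collapse_variants(hits):
    """Keep only the highest-scoring variant per parent product, ordered by score."""
    best = {}
    for hit in hits:
        current = best.get(hit.parent_id)
        if current is None or hit.score > current.score:
            best[hit.parent_id] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)

hits = [
    Hit("shoe-9-red", "shoe", 0.91),
    Hit("shoe-10-red", "shoe", 0.90),
    Hit("sock-m-blue", "sock", 0.85),
    Hit("shoe-9-blue", "shoe", 0.88),
]
collapsed = collapse_variants(hits)
print([h.sku for h in collapsed])  # ['shoe-9-red', 'sock-m-blue']
```

Without collapsing, the top three slots here would all be the same shoe; after collapsing, the sock gets the second slot.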
Document / Enterprise Search
Notion, Confluence, SharePoint, Google Drive
Characteristics
- Unstructured text (docs, PDFs, wikis)
- Access control critical
- Recency matters
- Entity extraction
Real Problems
- Permission explosion: 10M docs indexed, but a given user may see only 1K
- Version confusion: 5 versions of the same doc
- No click signal: how do you measure success without clicks?
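Permission and version problems interact: the results a user sees should be the newest version of each document that the user is allowed to read. The sketch below assumes a hypothetical document model where each version carries a stable `doc_id`, a `version` number, and a set of groups allowed to read it. It applies the permission check first, so if the newest version is restricted the user falls back to the newest readable one; that ordering is a design assumption, not a universal rule.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str              # stable ID shared by all versions (hypothetical)
    version: int
    allowed_groups: frozenset  # groups permitted to read this version
    title: str

def visible_latest(docs, user_groups):
    """Drop docs the user cannot read, then keep the newest readable version per doc_id."""
    latest = {}
    for doc in docs:
        if not (doc.allowed_groups & user_groups):
            continue  # no shared group: this version is invisible to the user
        prev = latest.get(doc.doc_id)
        if prev is None or doc.version > prev.version:
            latest[doc.doc_id] = doc
    return list(latest.values())

docs = [
    Doc("roadmap", 1, frozenset({"eng"}), "Roadmap v1"),
    Doc("roadmap", 3, frozenset({"eng"}), "Roadmap v3"),
    Doc("roadmap", 2, frozenset({"eng"}), "Roadmap v2"),
    Doc("salaries", 1, frozenset({"hr"}), "Salary bands"),
]
result = visible_latest(docs, {"eng", "all-staff"})
print([d.title for d in result])  # ['Roadmap v3']
```

At the 10M-document scale this check cannot be a post-filter over a fetched top-k: if a user can read only 1K of 10M docs, a post-filtered page of results is often empty. Real systems push the permission constraint into the index query itself.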
Web Search
Google, Bing, DuckDuckGo
Characteristics
- Massive scale (billions of pages)
- Crawling required
- Link analysis (PageRank)
- Spam detection
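PageRank treats links as votes: a page's score is a damped sum of the scores of pages linking to it, each split evenly across that page's outlinks. A minimal power-iteration sketch on a toy four-page graph follows. The damping factor 0.85 is the value from the original PageRank paper; the iteration count is an arbitrary choice for this small example, and dangling pages (no outlinks) are handled by spreading their rank evenly, which is one common convention.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [outlinked pages]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c
```

Page `c` wins because three pages link to it, while `d`, which nobody links to, keeps only the baseline jump share. At web scale the same computation runs as a distributed sparse matrix-vector product.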
Real Problems
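- Freshness vs authority: a fresh news story vs an authoritative Wikipedia page
- Local intent: "pizza near me" depends on where the user is
- Zero-click: featured snippets answer the query on the results page, so no click is recorded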
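The freshness vs authority tension can be made concrete with a blended score whose weights depend on query intent. The formula below is a hypothetical illustration, not how any particular engine ranks: authority is assumed to be a precomputed score in [0, 1], freshness decays exponentially with a 24-hour half-life, and `authority_weight` is assumed to come from a separate intent classifier.

```python
def blended_score(authority, age_hours, half_life_hours, authority_weight):
    """Hypothetical blend of authority and exponentially decaying freshness."""
    freshness = 0.5 ** (age_hours / half_life_hours)  # 1.0 when new, 0.5 after one half-life
    return authority_weight * authority + (1 - authority_weight) * freshness

def winner(authority_weight, half_life_hours=24.0):
    """Return which of two toy documents ranks first for a given intent weight."""
    docs = {
        "news": (0.4, 2.0),         # low authority, published 2 hours ago
        "wiki": (0.9, 24.0 * 365),  # high authority, last changed a year ago
    }
    scores = {
        name: blended_score(auth, age, half_life_hours, authority_weight)
        for name, (auth, age) in docs.items()
    }
    return max(scores, key=scores.get)

print(winner(0.3))  # news  (news-seeking query: freshness dominates)
print(winner(0.8))  # wiki  (reference query: authority dominates)
```

The same two documents swap places purely because of the intent weight, which is why detecting news intent matters as much as the scoring formula.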
Code Search
GitHub, Sourcegraph, Grep.app
Characteristics
- Syntax-aware
- Regex/pattern support
- Exact match important
- Cross-repository
Real Problems
- Naming conventions: getUser vs get_user should match the same query
- Symbol vs text: a function definition vs a mention in a comment
- Monorepo scale: billions of lines of code
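The naming-convention problem is largely a tokenization problem: if `getUser`, `get_user`, and `get-user` all produce the same tokens, a query for any one of them matches the others. A minimal sketch of an identifier tokenizer follows, assuming a simple split on non-alphanumerics followed by camelCase splitting. It keeps acronyms such as `HTTP` together, but it is not the tokenizer any specific code search product uses.

```python
import re

# camelCase pieces: an acronym before a capitalized word, a (capitalized) word,
# a trailing all-caps run, or a run of digits.
_CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")

def identifier_tokens(identifier):
    """Normalize an identifier into lowercase word tokens."""
    tokens = []
    # First split snake_case, kebab-case, dotted names, etc.
    for part in re.split(r"[^A-Za-z0-9]+", identifier):
        tokens.extend(m.group(0).lower() for m in _CAMEL.finditer(part))
    return tokens

print(identifier_tokens("getUser"))          # ['get', 'user']
print(identifier_tokens("get_user"))         # ['get', 'user']
print(identifier_tokens("HTTPServerError"))  # ['http', 'server', 'error']
```

A code index would typically store these normalized tokens alongside the original identifier, so exact-match queries (which this domain cares about) still work.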
Log / Observability Search
Splunk, ELK, Datadog, Loki
Characteristics
- Time-series data
- High volume (millions of events per second)
- Aggregations
- Retention policies
Real Problems
- Ingestion during incidents: volume can spike 10x exactly when search is needed most
- Hot/cold storage: cost optimization
- Full-text indexing cost: indexing 1PB of logs is expensive
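Hot/cold storage and retention policies usually reduce to routing each log event by age. The sketch below uses hypothetical thresholds (7 days hot, 90 days total retention) purely for illustration; they are not the defaults of Splunk, Elasticsearch, Datadog, or Loki.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy thresholds, chosen only for this example.
HOT_DAYS = 7         # recent data: fast, fully indexed storage
RETENTION_DAYS = 90  # older data is deleted

def storage_tier(event_time, now):
    """Return 'hot', 'cold', or 'delete' for an event based on its age."""
    age = now - event_time
    if age < timedelta(days=HOT_DAYS):
        return "hot"
    if age < timedelta(days=RETENTION_DAYS):
        return "cold"
    return "delete"

now = datetime(2024, 1, 31, tzinfo=timezone.utc)
for event_time in (now - timedelta(hours=1), now - timedelta(days=30), now - timedelta(days=365)):
    print(storage_tier(event_time, now))  # hot, then cold, then delete
```

The cost argument follows from this split: most queries during an incident hit the last few hours, so only the hot tier needs expensive full-text indexing, while cold data can sit on cheaper storage and be scanned more slowly when needed.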
Media Search
YouTube, Spotify, Netflix, Pinterest
Characteristics
- Multimodal (text, image, audio, video)
- Content understanding (ML)
- Personalization heavy
- Engagement focus
Real Problems
- Content understanding: what is actually in the video?
- Audio search: "that song that goes..."
- Creator SEO abuse: clickbait titles
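Content understanding typically means mapping media into embedding vectors with an ML model and then retrieving by vector similarity rather than by title text, which also blunts clickbait titles. The sketch below shows only the retrieval half: the three-dimensional vectors are hand-made stand-ins for model output (real embeddings have hundreds of dimensions), and it uses exact cosine similarity over a tiny dict instead of an approximate nearest-neighbor index.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical embeddings standing in for an ML model's output.
video_index = {
    "cat_compilation": [0.9, 0.1, 0.0],
    "guitar_lesson": [0.0, 0.2, 0.95],
    "kitten_rescue": [0.8, 0.3, 0.1],
}

def search(query_vec, index, k=2):
    """Return the k items whose embeddings are most similar to the query."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

query = [1.0, 0.2, 0.0]  # pretend embedding of the text query "cute cats"
print(search(query, video_index))  # ['cat_compilation', 'kitten_rescue']
```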
Comparison Matrix
| Type | Personalization | Latency SLA | Key Challenge | Scale Example |
|---|---|---|---|---|
| E-commerce | Medium | P99 < 100ms | Business logic | 500M products (Amazon) |
| Document | Low | P99 < 500ms | Permissions | 2B docs (Google Drive) |
| Web | High | P99 < 200ms | Spam | 130T pages (Google) |
| Code | Low | P99 < 500ms | Tokenization | 200M repos (GitHub) |
| Log | None | P99 < 1s | Volume | 1PB/day (Netflix) |
| Media | Very High | P99 < 200ms | Content understanding | 800M videos (YouTube) |
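The latency column is stated as P99 budgets: 99% of queries must finish under the threshold. A minimal sketch of checking a batch of latency samples against such a budget follows, using the nearest-rank percentile definition. Monitoring systems often use interpolated or histogram-based percentiles instead, which can give slightly different values.

```python
import math

def percentile_nearest_rank(samples, p):
    """Smallest sample such that at least p% of samples are at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def meets_sla(latencies_ms, p99_budget_ms):
    """True if the P99 latency is strictly below the budget."""
    return percentile_nearest_rank(latencies_ms, 99) < p99_budget_ms

fast = [40] * 99 + [150]      # 1 slow query in 100
slow = [40] * 98 + [150] * 2  # 2 slow queries in 100
print(meets_sla(fast, 100), meets_sla(slow, 100))  # True False
```

With 100 samples, a single 150ms outlier still leaves P99 at 40ms, but a second one pushes P99 to 150ms and breaks a 100ms budget like the e-commerce row's. Tail latency, not the average, is what these SLAs constrain.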
Key Takeaways
1. E-commerce: Focus on structured attributes (price, brand) and business logic (inventory, margins).
2. Document Search: Permissions and recency are the hardest problems. Recall is more important than precision.
3. Web Search: Scale (billions of pages) and spam detection are the unique challenges.
4. Media Search: Content understanding (ML on video and audio) is key to retrieval.