Systems Atlas

Chapter 1.4

Types of Search Systems

Not all search is the same. Different domains have different requirements.


E-commerce Search

Amazon, Flipkart, Shopify, Etsy

Characteristics

  • Structured data (price, brand, size)
  • Business logic heavy (inventory, margins)
  • Faceted navigation
  • Personalization

Real Problems

  • Variant explosion: Same shoe, 100 variants
  • "Cheap" problem: Popularity-based ranking overrides the price intent in the query
  • Seller spam: 47 identical listings
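
One common way to tame variant explosion is to collapse hits that belong to the same product before showing them. The sketch below is only an illustration: the `Hit` type, the `parent_id` field, and the scores are hypothetical, and it assumes the index already stores a shared parent identifier on every variant.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    sku: str        # variant-level identifier
    parent_id: str  # hypothetical field shared by all variants of one product
    score: float    # relevance score assigned by the ranker


def collapse_variants(hits: list[Hit]) -> list[Hit]:
    """Keep only the best-scoring variant per parent product, best first."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.parent_id)
        if current is None or hit.score > current.score:
            best[hit.parent_id] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


hits = [
    Hit("shoe-red-9", "shoe-1", 0.91),
    Hit("shoe-red-10", "shoe-1", 0.95),
    Hit("shoe-blue-9", "shoe-1", 0.80),
    Hit("boot-black-9", "boot-7", 0.88),
]
collapsed = collapse_variants(hits)
print([h.sku for h in collapsed])
```

The same grouping idea could be applied to seller spam by collapsing on a listing fingerprint instead of a parent product, though choosing that fingerprint is a separate problem.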

Document / Enterprise Search

Notion, Confluence, SharePoint, Google Drive

Characteristics

  • Unstructured text (docs, PDFs, wikis)
  • Access control critical
  • Recency matters
  • Entity extraction

Real Problems

  • Permission explosion: 10M docs, user sees 1K
  • Version confusion: 5 versions of same doc
  • No click signal: How to measure success?
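
To make the permission problem concrete, here is a minimal sketch of post-filtering ranked results against an access-control list. The group-based ACL model, the function name, and the sample documents are assumptions for illustration; real systems differ in how they represent permissions.

```python
def visible_docs(ranked_doc_ids, doc_acl, user_groups, limit=10):
    """Post-filter ranked results, keeping docs that share a group with the user."""
    results = []
    for doc_id in ranked_doc_ids:
        allowed = doc_acl.get(doc_id, frozenset())  # a missing ACL means deny
        if allowed & user_groups:
            results.append(doc_id)
            if len(results) == limit:
                break
    return results


doc_acl = {
    "roadmap": frozenset({"eng", "product"}),
    "salaries": frozenset({"hr"}),
    "handbook": frozenset({"everyone"}),
}
ranked = ["salaries", "roadmap", "handbook", "orphan"]
user_groups = frozenset({"eng", "everyone"})
print(visible_docs(ranked, doc_acl, user_groups))
```

Post-filtering is simple but wasteful when a user can see only a small fraction of the corpus, since most ranked candidates are discarded. Pushing the group filter into the index query itself is a frequent alternative.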

Web Search

Google, Bing, DuckDuckGo

Characteristics

  • Massive scale (billions of pages)
  • Crawling required
  • Link analysis (PageRank)
  • Spam detection

Real Problems

  • Freshness vs authority: News vs Wikipedia
  • Local intent: "pizza near me"
  • Zero-click: Featured snippets
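
The freshness-versus-authority tension can be sketched as a weighted blend of an authority score and an exponentially decaying freshness score. Everything here is an assumption for illustration (the formula, the 24-hour half-life, the weights, and the sample scores); it is not how any particular search engine ranks.

```python
def blended_score(authority, age_hours, freshness_weight, half_life_hours=24.0):
    """Blend authority (0..1) with a freshness term that halves every half-life."""
    freshness = 0.5 ** (age_hours / half_life_hours)
    return (1 - freshness_weight) * authority + freshness_weight * freshness


# Hypothetical documents: a two-hour-old news story and a year-old reference page.
news = {"authority": 0.4, "age_hours": 2}
reference = {"authority": 0.9, "age_hours": 24 * 365}

for weight in (0.7, 0.1):  # news-like query vs. evergreen query
    news_score = blended_score(news["authority"], news["age_hours"], weight)
    ref_score = blended_score(reference["authority"], reference["age_hours"], weight)
    winner = "news" if news_score > ref_score else "reference"
    print(f"freshness_weight={weight}: {winner} ranks first")
```

With a high freshness weight the recent story wins, and with a low one the authoritative page wins, so the hard part becomes inferring from the query which weight applies.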

Code Search

GitHub, Sourcegraph, Grep.app

Characteristics

  • Syntax-aware
  • Regex/pattern support
  • Exact match important
  • Cross-repository

Real Problems

  • Naming conventions: getUser vs get_user
  • Symbol vs text: Function vs comment
  • Monorepo scale: Billions of lines
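
The naming-convention problem is largely a tokenization problem: if `getUser` and `get_user` are split into the same sub-tokens, a query in one style can match code written in the other. The following is a minimal sketch using a hand-written regular expression; the function name and the exact splitting rules are assumptions, and production tokenizers handle many more cases.

```python
import re

# Order matters: an acronym followed by a capitalized word, then an ordinary
# (optionally capitalized) word, then a trailing acronym, then digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(name):
    """Split camelCase, PascalCase, snake_case, and UPPER_CASE into lowercase tokens."""
    return [match.group(0).lower() for match in _WORD.finditer(name)]


for identifier in ("getUser", "get_user", "GET_USER", "HTTPServer"):
    print(identifier, "->", split_identifier(identifier))
```

Indexing both the original identifier and its sub-tokens is one way to keep exact match working while still allowing cross-convention matches.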

Log / Observability Search

Splunk, ELK, Datadog, Loki

Characteristics

  • Time-series data
  • High volume (millions/second)
  • Aggregations
  • Retention policies

Real Problems

  • Ingestion during incidents: 10x volume
  • Hot/cold storage: Cost optimization
  • Full-text cost: Indexing 1PB of logs for full-text search is expensive
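
Hot/cold storage usually means recent logs live on fast, expensive storage and older logs on slower, cheaper storage, so a query's time range determines which tiers it must touch. The sketch below assumes a hypothetical 7-day hot window and two tier names; actual boundaries and tier counts vary by system.

```python
from datetime import datetime, timedelta, timezone


def tiers_for_range(start, end, now, hot_days=7):
    """Return the storage tiers a query over [start, end) has to read."""
    hot_boundary = now - timedelta(days=hot_days)
    tiers = []
    if end > hot_boundary:
        tiers.append("hot")   # some of the range is newer than the boundary
    if start < hot_boundary:
        tiers.append("cold")  # some of the range is older than the boundary
    return tiers


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
last_hour = tiers_for_range(now - timedelta(hours=1), now, now)
early_january = tiers_for_range(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 10, tzinfo=timezone.utc),
    now,
)
spanning = tiers_for_range(now - timedelta(days=11), now - timedelta(days=1), now)
print(last_hour, early_january, spanning)
```

Queries that stay inside the hot window avoid the slow tier entirely, which is why incident debugging over the last hour can remain fast even when total retention is long.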

Media Search

YouTube, Spotify, Netflix, Pinterest

Characteristics

  • Multimodal (text, image, audio, video)
  • Content understanding (ML)
  • Personalization heavy
  • Engagement focus

Real Problems

  • Content understanding: What's IN the video?
  • Audio search: "That song that goes..."
  • Creator SEO abuse: Clickbait titles

Comparison Matrix

Type       | Personalization | Latency SLA | Key Challenge         | Scale Example
E-commerce | Medium          | P99 < 100ms | Business logic        | 500M products (Amazon)
Document   | Low             | P99 < 500ms | Permissions           | 2B docs (Google Drive)
Web        | High            | P99 < 200ms | Spam                  | 130T pages (Google)
Code       | Low             | P99 < 500ms | Tokenization          | 200M repos (GitHub)
Log        | None            | P99 < 1s    | Volume                | 1PB/day (Netflix)
Media      | Very High       | P99 < 200ms | Content understanding | 800M videos (YouTube)
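
The latency targets in the matrix are stated as P99, meaning 99% of requests must finish within the limit. As a rough illustration of why that is stricter than an average, the sketch below computes a nearest-rank percentile over made-up latency samples; the function name and the sample data are assumptions.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical measurements: 98 fast requests and 2 slow ones, in milliseconds.
latencies_ms = [20] * 98 + [450] * 2
mean_ms = sum(latencies_ms) / len(latencies_ms)
p99_ms = percentile(latencies_ms, 99)
print(f"mean={mean_ms:.1f}ms p99={p99_ms}ms meets 100ms SLA: {p99_ms < 100}")
```

Here the mean looks comfortably low while the P99 misses a 100ms target, because just two slow requests out of a hundred are enough to push the 99th percentile into the tail.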

Key Takeaways

01

E-commerce

Exploit structured fields (price, brand) and build business logic (inventory, margins) into ranking.

02

Document Search

Permissions and recency are the hardest problems. Recall is more important than precision.

03

Web Search

Scale (billions of pages) and spam detection are the unique challenges.

04

Media Search

Content understanding (ML on video/audio) is key for retrieval.