Chapter 0.1

Who This Guide Is For

Engineers, ML practitioners, product managers, and founders who want to go beyond "using a search API" to understanding how search systems actually work.

Primary Audiences

Software Engineers (Backend/Platform)

Profile: 2-5 years of experience building APIs and services. Knows databases, REST, microservices. Has "used" Elasticsearch/Algolia but doesn't understand the internals.

📍 Common situation:

"I followed a tutorial to set up Elasticsearch. It worked for 10K products. Now we have 10M and everything is slow. I don't know where to start."

What you'll learn:

Move from "consumer of search API" to "builder of search infrastructure"
Understand trade-offs: When does ranking matter more than retrieval, and when is retrieval the bottleneck?
Debug production issues: Why is P99 latency spiking?
Schema design, sharding strategies, and reindexing without downtime

ML Engineers / Data Scientists

Profile: Strong in embeddings, LLMs, recommendation systems. Weak in systems engineering (distributed systems, caching, latency).

📍 Common situation:

"My BERT reranker has 0.85 NDCG offline. But when we deployed it, CTR didn't change. The team says it's 'too slow' but I don't understand what that means for search."

What you'll learn:

How to take a model from a Jupyter notebook to production search
The full pipeline: Retrieval → Ranking → Serving
Feature stores, model serving latency, where ML fits in
Why retrieval is the bottleneck (you can't rank what you don't retrieve)
Training embeddings on click data and correcting for position bias

Product Managers (Technical)

Profile: Owns the search experience for an e-commerce or SaaS product. Reports to leadership on search KPIs.

📍 Common situation:

"I asked the team to 'add synonyms' and they said it would take 3 sprints. Why? Also, why can't we just use ChatGPT for search?"

What you'll learn:

Vocabulary to communicate with engineering: recall, precision, P99 latency
Why some improvements are 2-week projects and others are 6-month investments
Framework for prioritizing: relevance vs latency vs personalization
How to read search dashboards and identify opportunities
When to push back on "it's too hard" vs when to trust the team

Founders / CTOs

Profile: Building a product where search is core (marketplace, knowledge base, etc.). Need to make build-vs-buy decisions.

📍 Common situation:

"We started with Algolia but it's costing $10K/month. Should we migrate to Elasticsearch? Also, our engineer says we need a 'vector database' now. What even is that?"

What you'll learn:

When to use Algolia vs Elasticsearch vs Typesense vs building something custom
What's the minimum viable search stack for a startup?
How search affects retention and revenue (with numbers)
How to hire for search roles and what to look for
Red flags: over-engineering vs under-investing

Prerequisites

✓ You should know

• Basic programming (Python, JavaScript, or similar)
• What an API is and how HTTP works
• What a database is (SQL or NoSQL)
• Basic data structures (arrays, hash maps)

○ Nice to have

• Experience with Elasticsearch, Solr, or Algolia
• Basic understanding of distributed systems
• Familiarity with ML concepts (embeddings)
• Production experience with high-traffic systems

Who This Is NOT For

✗ Complete beginners: You need basic programming skills first. Try freeCodeCamp or similar.
✗ Academic IR researchers: This is practical, not theoretical. We skip the math proofs.
✗ Copy-paste coders: You won't find "paste this YAML" tutorials here. We focus on understanding.
✗ People looking for quick fixes: Search is complex. This guide respects that complexity.

Time Investment

• 2-3 hrs to understand the fundamentals (Ch 0-3)
• 10-15 hrs to complete the core curriculum (Ch 0-14)
• 30+ hrs for deep mastery with exercises (all chapters)
