Chapter 0.1
Who This Guide Is For
Engineers, ML practitioners, product managers, and founders who want to go beyond "using a search API" to understanding how search systems actually work.
Primary Audiences
Software Engineers (Backend/Platform)
Profile: 2-5 years of experience building APIs and services. Knows databases, REST, microservices. Has "used" Elasticsearch/Algolia but doesn't understand the internals.
📍 Common situation:
"I followed a tutorial to set up Elasticsearch. It worked for 10K products. Now we have 10M and everything is slow. I don't know where to start."
What you'll learn:
- Move from "consumer of search API" to "builder of search infrastructure"
- Understand trade-offs: When does ranking matter more than retrieval?
- Debug production issues: Why is P99 latency spiking?
- Schema design, sharding strategies, and reindexing without downtime
ML Engineers / Data Scientists
Profile: Strong in embeddings, LLMs, recommendation systems. Weak in systems engineering (distributed systems, caching, latency).
📍 Common situation:
"My BERT reranker has 0.85 NDCG offline. But when we deployed it, CTR didn't change. The team says it's 'too slow' but I don't understand what that means for search."
What you'll learn:
- How to take a model from Jupyter notebook to production search
- The full pipeline: Retrieval → Ranking → Serving
- Feature stores, model-serving latency, and where ML fits in the pipeline
- Why retrieval is the bottleneck (can't rank what you don't retrieve)
- Training embeddings on click data, dealing with position bias
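For readers who have not met the metric quoted in the situation above, here is a minimal sketch of NDCG@k. The function names and the graded-relevance lists are illustrative assumptions, not something taken from later chapters. Note that this common simplification computes the ideal ordering from the retrieved list only, so a relevant document that was never retrieved does not lower the score. That is one way to see why you can't rank what you don't retrieve.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: the item at 1-based rank r
    # contributes rel / log2(r + 1).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(relevances, k):
    # relevances: graded relevance labels in the order the system ranked them.
    # Assumption: the ideal ordering is built from the same retrieved list.
    top = relevances[:k]
    ideal = sorted(relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(top) / ideal_dcg if ideal_dcg > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1], k=3)   # already in ideal order
swapped = ndcg_at_k([1, 3], k=2)      # best item ranked second
print(perfect, round(swapped, 3))
```

A list that is already in ideal order scores 1.0, and any ordering that pushes a better item below a worse one scores lower.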
Product Managers (Technical)
Profile: Owns the search experience for an e-commerce or SaaS product. Reports to leadership on search KPIs.
📍 Common situation:
"I asked the team to 'add synonyms' and they said it would take 3 sprints. Why? Also, why can't we just use ChatGPT for search?"
What you'll learn:
- Vocabulary to communicate with engineering: recall, precision, P99
- Why some improvements are 2-week projects and others are 6-month investments
- Framework for prioritizing: relevance vs latency vs personalization
- How to read search dashboards and identify opportunities
- When to push back on "it's too hard" vs when to trust the team
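As a quick preview of the vocabulary in the list above, the following sketch computes precision, recall, and P99 latency on made-up data. The document IDs, the latency samples, and the helper names are assumptions for illustration only. The percentile uses the nearest-rank method, and monitoring tools may use a different interpolation.

```python
import math

def precision_recall(retrieved, relevant):
    # Precision: share of returned results that are relevant.
    # Recall: share of all relevant items that were returned.
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

def percentile_nearest_rank(samples, pct):
    # Nearest-rank percentile: the smallest sample such that at least
    # pct percent of all samples are less than or equal to it.
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical query: 4 results returned, 3 relevant documents exist.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])

# Hypothetical latencies in milliseconds: mostly fast, with two slow outliers.
latencies_ms = [20] * 98 + [900, 1500]
p50 = percentile_nearest_rank(latencies_ms, 50)
p99 = percentile_nearest_rank(latencies_ms, 99)
print(precision, recall, p50, p99)
```

In this made-up sample the median looks healthy while P99 exposes the slow tail, which is why engineering teams tend to report both.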
Founders / CTOs
Profile: Building a product where search is core (marketplace, knowledge base, etc.). Need to make build-vs-buy decisions.
📍 Common situation:
"We started with Algolia but it's costing $10K/month. Should we migrate to Elasticsearch? Also, our engineer says we need a 'vector database' now. What even is that?"
What you'll learn:
- When to use Algolia, Elasticsearch, or Typesense, and when to build custom
- What's the minimum viable search stack for a startup?
- How search affects retention and revenue (with numbers)
- How to hire for search roles and what to look for in candidates
- Red flags: over-engineering vs under-investing
Prerequisites
✓ You should know
- Basic programming (Python, JavaScript, or similar)
- What an API is and how HTTP works
- What a database is (SQL or NoSQL)
- Basic data structures (arrays, hash maps)
○ Nice to have
- Experience with Elasticsearch, Solr, or Algolia
- Basic understanding of distributed systems
- Familiarity with ML concepts (embeddings)
- Production experience with high-traffic systems
Who This Is NOT For
- ✗ Complete beginners: You need basic programming skills first. Try freeCodeCamp or similar.
- ✗ Academic IR researchers: This is practical, not theoretical. We skip the math proofs.
- ✗ Copy-paste coders: You won't find "paste this YAML" tutorials here. We focus on understanding.
- ✗ People looking for quick fixes: Search is complex. This guide respects that complexity.
Time Investment
- 2-3 hrs: To understand the fundamentals (Ch 0-3)
- 10-15 hrs: To complete the core curriculum (Ch 0-14)
- 30+ hrs: For deep mastery with exercises (All chapters)