Chapter 5
Retrieval
The wide net. Retrieval is the stage where we select a few thousand promising candidates from billions of documents. It prioritizes Recall (finding everything relevant) over Precision (ensuring that what it returns is relevant and well ranked). This chapter covers Inverted Indices, WAND, and Vector Search.
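To make the trade-off concrete, here is a minimal Python sketch that measures the recall and precision of a candidate set against a known set of relevant documents. The function name `recall_precision` and the document IDs are hypothetical, chosen only for illustration. A wide net of ten candidates that contains all three relevant documents has perfect recall but low precision; a narrow net that returns a single relevant document has the opposite profile.

```python
def recall_precision(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (recall, precision) for a list of retrieved document IDs.

    recall    = relevant documents found / all relevant documents
    precision = relevant documents found / all documents retrieved
    """
    hits = sum(1 for doc in retrieved if doc in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical example: d1, d2, d3 are the only relevant documents.
relevant = {"d1", "d2", "d3"}
wide = [f"d{i}" for i in range(1, 11)]  # 10 candidates, all 3 relevant ones included
narrow = ["d1"]                         # 1 candidate: relevant, but incomplete

print(recall_precision(wide, relevant))    # (1.0, 0.3)
print(recall_precision(narrow, relevant))  # (0.3333333333333333, 1.0)
```

This is why the first stage casts a wide net: a later ranking stage can reorder the ten wide candidates, but it cannot bring back a relevant document that retrieval never returned.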
In This Chapter
5.1 Recall vs. Precision
Why the "Unrecoverable Error" dictates retrieval architecture.
5.2 Boolean Retrieval
AND, OR, NOT and bitset operations.
5.3 TF-IDF & BM25
The math of keyword-based relevance scoring.
5.6 WAND Algorithm
How to find the top 10k results without scoring all 1B docs.
5.8 HNSW (Vector)
Approximate Nearest Neighbor search in high dimensions.
5.9 Hybrid Retrieval
Merging Keyword and Vector scores (Reciprocal Rank Fusion vs. linear combination).