Early Termination & WAND
How search engines find the top 10 results without scoring every document. The WAND algorithm is the secret weapon that makes sub-second search possible at scale.
When a query matches millions of documents, scoring them all is wasteful—users only see the top 10. This chapter reveals the algorithms that make search engines fast by safely skipping documents that provably cannot rank in the top-k results.
2-10x
Block-Max WAND vs exhaustive scoring on multi-term queries.
90%+
Typical skip rate for top-10 queries on large indices.
128 docs
Standard block size in Lucene for cache-optimal skipping.
1. The Problem: Why Score Everything?
Consider searching for "machine learning tutorial" in an index with 100 million documents. Boolean retrieval might find 2 million matching documents—but users only see the top 10. Scoring all 2 million with BM25 wastes 99.9995% of effort.
The traditional approach to ranked retrieval is called exhaustive scoring: for every document that matches the query, compute its full BM25 score, maintain a min-heap of the top k results, and return that heap when done. This approach is conceptually simple and guarantees correct results—but it scales terribly. A query matching 5 million documents requires 5 million score computations, even if you only want the top 10. Each BM25 computation involves fetching term frequencies, applying length normalization, and accumulating partial scores across terms. At scale, this becomes the dominant cost of query execution.
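The exhaustive baseline can be sketched in a few lines. Here `score` is a stand-in for a full BM25 computation, and the toy scoring function in the usage example is hypothetical, chosen only to show the heap mechanics:

```python
import heapq

def exhaustive_top_k(matching_doc_ids, score, k=10):
    """Score every matching document and keep the k best in a min-heap.

    score(doc_id) stands in for a full BM25 computation; the heap root
    is always the current k-th best score, so each push is O(log k).
    """
    heap = []  # min-heap of (score, doc_id); smallest score at the root
    for doc_id in matching_doc_ids:
        s = score(doc_id)          # full BM25: TF fetch, length norm, ...
        if len(heap) < k:
            heapq.heappush(heap, (s, doc_id))
        elif s > heap[0][0]:       # beats the current k-th best
            heapq.heapreplace(heap, (s, doc_id))
    return sorted(heap, reverse=True)  # best first

# Toy usage: 1000 "matches", scores are just doc_id % 7.
top = exhaustive_top_k(range(1000), lambda d: d % 7, k=3)
```

Note that the loop body runs once per matching document regardless of k, which is exactly the cost that early termination removes.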
The insight that unlocks early termination is recognizing that score distributions are highly skewed. Most documents have mediocre scores that will never compete with the top results. If we could somehow predict which documents have no chance of ranking highly, we could skip them entirely—saving enormous amounts of computation. This is exactly what algorithms like WAND, MaxScore, and Block-Max WAND do: they use upper bound scores to prove that certain documents cannot possibly make the top-k, allowing safe skipping without any risk of missing relevant results.
If we can prove a document cannot score higher than the current 10th-best result, we can skip it entirely. This is the foundation of early termination algorithms. The challenge is: how do we prove a document's score is too low without actually computing the score? The answer lies in upper bound estimation.
2. Upper Bound Scores: The Foundation
The key to safe skipping is computing upper bounds: the maximum possible score a term can contribute to ANY document. If the sum of upper bounds for a document's terms is less than the current threshold, that document can be safely skipped.
Recall how BM25 works: the score for a document is the sum of per-term contributions, where each term's contribution depends on its term frequency (TF) in that document and its inverse document frequency (IDF) across the corpus. The IDF component is constant for a term across all documents, but the TF component varies. The key insight is that we can precompute the maximum possible TF contribution for each term by finding which document has the highest TF for that term. This "max TF" combined with the IDF gives us an upper bound that no document can exceed.
For example, if the term "machine" appears at most 150 times in any single document (in a research paper about machine learning), then no document can score higher than BM25_TF(150) × IDF("machine") for this term. We precompute and store this upper bound for every term at index time. At query time, we sum the upper bounds for all query terms to get a ceiling on any document's possible score. If this ceiling is below the current k-th best score, we can skip the document without any further computation.
For each term t in the index, precompute UB(t) = the maximum BM25 contribution of t over all documents in its posting list. This is computed at index time and stored with the posting list.
For the query "fast algorithm", sum the two terms' upper bounds to get a ceiling on any matching document's score.
If the ceiling falls below the current threshold of 8.5, such a document cannot make the top-k.
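A minimal sketch of the skip test, reusing the "fast algorithm" query and the 8.5 threshold from the example above; the per-term upper bound values are hypothetical:

```python
# Hypothetical per-term upper bounds, precomputed at index time as
# max over all docs of BM25_TF(tf) * IDF(term).
upper_bound = {"fast": 3.1, "algorithm": 4.8}

def can_skip(query_terms, threshold):
    """A document matching only these terms is safely skippable if even
    the sum of the terms' upper bounds cannot reach the threshold."""
    ceiling = sum(upper_bound[t] for t in query_terms)
    return ceiling < threshold

# Ceiling for "fast algorithm" is 3.1 + 4.8 = 7.9; with a current
# threshold of 8.5, no document matching only these terms can qualify.
can_skip(["fast", "algorithm"], 8.5)   # True: skip without scoring
```

The comparison costs two lookups and an addition, versus a full BM25 evaluation per document for the exhaustive approach.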
3. The WAND Algorithm
WAND (Weak AND) is the foundational algorithm for top-k retrieval with early termination. It cleverly uses upper bounds and a pivot mechanism to skip documents that cannot possibly rank in the top-k. The name "Weak AND" comes from its behavior: unlike strict AND (which requires all query terms to match), WAND is "weak" because it considers documents matching any subset of terms, but it's smart about which candidates to evaluate.
The algorithm maintains iterators over the posting lists of all query terms, always sorted by their current document ID. In each iteration, it accumulates upper bounds from left to right (smallest docID to largest) until the cumulative bound meets or exceeds the current score threshold. The document at this point is called the pivot. Documents before the pivot cannot possibly score high enough because even their maximum possible scores don't reach the threshold.
Once a pivot is found, WAND checks if all iterators up to and including the pivot point to the same document ID. If they do, that document is a "full candidate" and gets fully scored with BM25. If not, the iterators that lag behind are advanced to the pivot document ID, effectively skipping all documents in between. This skip-and-advance pattern is what makes WAND dramatically faster than exhaustive scoring.
WAND Core Concept
Sort posting list iterators by their current docID (ascending). This ordering is maintained throughout execution and is key to the algorithm's efficiency.
Find smallest docID where cumulative UB ≥ threshold.
If all iterators align at pivot → score it. Otherwise → advance lagging iterators.
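The pivot-selection step above can be sketched as follows; the iterator representation (a dict with the current docID and the term's upper bound) and the sample values are simplifications for illustration:

```python
def find_pivot(iterators, threshold):
    """WAND pivot selection (sketch). Each iterator carries its term's
    current docID and precomputed upper bound; the list must already be
    sorted by current docID ascending."""
    acc = 0.0
    for i, it in enumerate(iterators):
        acc += it["ub"]
        if acc >= threshold:
            return i, it["doc"]   # pivot index and pivot docID
    return None                   # no remaining doc can reach the threshold

# Iterators sorted by current docID; current threshold is 6.0.
iters = [{"doc": 10, "ub": 2.0},
         {"doc": 40, "ub": 3.0},
         {"doc": 90, "ub": 4.0}]
find_pivot(iters, 6.0)   # (2, 90): docs before 90 can't reach 6.0
```

Here docs 10 and 40 are skipped without scoring: even if both terms hit at maximum strength, 2.0 + 3.0 = 5.0 falls short of 6.0, so the first two iterators are advanced to docID 90.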
4. Block-Max WAND: Tighter Bounds
Basic WAND uses global upper bounds—the max score across the entire posting list. This is often too loose because one outlier document with TF=1000 dominates the bound, while most documents have TF=1-5. Block-Max WAND solves this by storing per-block max scores.
The problem with global upper bounds becomes clear with a concrete example: imagine the term "python" appears 500 times in a tutorial document (which discusses Python extensively), but only 1-10 times in 99% of other matching documents. The global upper bound is based on that outlier tutorial, so it's unrealistically high for most documents. This means WAND's pivot calculation is too pessimistic—it assumes every document could potentially score as high as that tutorial, reducing skip opportunities.
Block-Max WAND (BMW) subdivides each posting list into blocks of typically 128 documents and precomputes the maximum score within each block. Now when WAND processes a block, it uses that block's local maximum instead of the global maximum. If a block's max score is below the threshold, the entire block (all 128 documents) can be skipped without any per-document evaluation. This is a massive win: instead of skipping individual documents, BMW skips thousands of documents at once by jumping over entire blocks.
Block-Max Index Structure
Key insight: Block 3 can be entirely skipped if threshold > 1.8 + other terms' block maxes. With global UB (5.2), we'd have to check every document in Block 3.
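The block-skipping test can be sketched for a single term's posting list, reusing the block-max values mentioned above (1.8 for Block 3, 5.2 as the global max); real Block-Max WAND combines block maxes across all query terms, so this is a simplified single-term illustration with hypothetical block layouts:

```python
# Hypothetical block-max metadata for one posting list: each block
# covers up to 128 docIDs and stores the max BM25 score inside it.
blocks = [
    {"id": 1, "first_doc": 0,   "last_doc": 127, "max_score": 5.2},
    {"id": 2, "first_doc": 128, "last_doc": 255, "max_score": 3.4},
    {"id": 3, "first_doc": 256, "last_doc": 383, "max_score": 1.8},
    {"id": 4, "first_doc": 384, "last_doc": 511, "max_score": 4.1},
]

def live_blocks(blocks, threshold):
    """Keep only blocks whose local max could beat the threshold; dead
    blocks are skipped 128 docs at a time, with no per-doc work."""
    return [b for b in blocks if b["max_score"] >= threshold]

# With threshold 3.0, Block 3 (max 1.8) is skipped wholesale, even
# though the global upper bound (5.2) alone could never justify it.
live = live_blocks(blocks, 3.0)   # blocks 1, 2, and 4 survive
```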
| Aspect | Basic WAND | Block-Max WAND |
|---|---|---|
| Upper Bound Granularity | Global (entire posting list) | Per-block (128 docs) |
| Skip Unit | Individual documents | Entire blocks |
| Index Overhead | 1 float per term | 1 float per block per term |
| Speedup (typical) | 2-3x vs exhaustive | 5-10x vs exhaustive |
5. MaxScore: The Alternative Approach
MaxScore takes a different approach: instead of pivot-based skipping, it partitions query terms into essential and non-essential based on their upper bounds. Non-essential terms are only evaluated when a candidate is found via essential terms.
The core idea is elegant: sort query terms by their upper bounds and compute a running cumulative sum from lowest to highest. Terms whose cumulative upper bound is still below the threshold are "non-essential"— even if a document matched ALL of them and scored perfectly on each, it still couldn't reach the threshold. Only the remaining terms are "essential" and must be iterated. For a 10-term query, only 2-3 terms might be essential, reducing iteration overhead by 70-80%.
When MaxScore finds a candidate document via essential terms, it then checks the non-essential terms to compute the full score. The brilliance is that this two-phase approach avoids maintaining sorted iterators across all terms (as WAND does). For queries with many terms or when k is large (causing the threshold to stay low), MaxScore often outperforms WAND because its per-document overhead is lower. Modern Lucene versions use Block-Max MaxScore as the default for disjunctive queries.
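The essential/non-essential split can be sketched as a cumulative-sum walk over terms sorted by upper bound; the query terms and bound values below are hypothetical:

```python
def partition_terms(term_ubs, threshold):
    """MaxScore partition (sketch): walk terms in ascending upper-bound
    order; terms whose cumulative UB stays below the threshold are
    non-essential (a doc matching only them cannot reach the top-k)."""
    ordered = sorted(term_ubs.items(), key=lambda kv: kv[1])
    non_essential, cum = [], 0.0
    for term, ub in ordered:
        if cum + ub < threshold:
            non_essential.append(term)
            cum += ub
        else:
            break  # this and all higher-bound terms are essential
    essential = [t for t, _ in ordered if t not in non_essential]
    return essential, non_essential

# Hypothetical upper bounds for a 5-term query, threshold 6.0:
ubs = {"the": 0.2, "fast": 1.1, "efficient": 1.6, "search": 2.5, "wand": 4.0}
partition_terms(ubs, 6.0)   # only "wand" is essential
```

With these numbers, a document matching all four low-bound terms tops out at 5.4 < 6.0, so only the "wand" posting list needs to be iterated; the other four lists are consulted lazily when a candidate surfaces.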
MaxScore Term Partitioning
Non-essential: "fast", "efficient", "algorithm" (only score if candidate found)
WAND tends to perform better with:
- Few query terms (2-5): with fewer terms, the pivot calculation is cheap and the sorted-iterator approach is more efficient than partitioning.
- Small result sets (k = 10-20): the threshold rises quickly when k is small, enabling aggressive skipping early in query processing.
- AND-heavy queries: documents must match all terms anyway, so WAND's pivot mechanism aligns naturally with the intersection requirement.
MaxScore tends to perform better with:
- Many query terms (5+): term partitioning becomes advantageous—most terms end up non-essential, reducing iteration overhead.
- Large result sets (k = 100+): the threshold stays lower longer, making WAND's pivot less effective.
- OR-heavy disjunctive queries: documents matching any subset of terms qualify, and MaxScore excels at iterating only essential terms.
Lucene 8 introduced Block-Max WAND. Lucene 9+ uses Block-Max MaxScore for top-level disjunctions, finding it has lower per-hit overhead for many-term queries.
6. Skip Pointers: Fast Advancement
WAND and MaxScore rely heavily on advance_to(target_doc) operations. Skip pointers are the data structure that makes these operations fast—allowing binary-search-like jumps within a posting list instead of linear scans.
A posting list without skip pointers is just a sequence of document IDs. To find document 1,000,000 you'd need to iterate through 999,999 entries first. Skip pointers solve this by storing "checkpoints" at regular intervals: every √N documents (where N is the posting list length), we record the document ID and its position. Now advancing to document 1,000,000 requires jumping through ~1,000 skip points then a short linear scan, instead of a million iterations.
Modern Lucene uses a two-level skip structure: coarse skips for large jumps and fine skips for precision. When WAND needs to advance all lagging iterators to the pivot document, each iterator uses its skip pointers to jump ahead efficiently. Combined with Block-Max indexing, skip pointers enable Lucene to advance through millions of documents in microseconds—a crucial capability for sub-100ms query latency on large corpora.
Example: advancing to the first docID ≥ 60:
1. Follow skip pointers until the next checkpoint would overshoot 60
2. Go back to pos 6 (value 55)
3. Linear scan: 55 → 67 ≥ 60 → Found!
For a posting list of length P, the optimal skip interval is √P. This balances two costs:
- Too large an interval: long linear scans between skip points
- Too small an interval: overhead of checking many skip points
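The checkpoint-then-scan pattern can be sketched as follows; the flat list-of-tuples skip structure is a simplification of Lucene's multi-level skip lists, and the posting values are synthetic:

```python
import math

def advance_to(postings, skips, target):
    """Advance within a sorted posting list using skip pointers.

    skips is a list of (docid, position) checkpoints taken every
    ~sqrt(len(postings)) entries. Jump to the last checkpoint still
    below the target, then finish with a short linear scan."""
    pos = 0
    for doc, p in skips:
        if doc < target:
            pos = p          # safe jump: checkpoint precedes the target
        else:
            break
    while pos < len(postings) and postings[pos] < target:
        pos += 1             # short linear scan from the checkpoint
    return postings[pos] if pos < len(postings) else None

postings = list(range(0, 2000, 7))          # docIDs 0, 7, 14, ...
step = max(1, math.isqrt(len(postings)))    # sqrt(P) skip interval
skips = [(postings[i], i) for i in range(0, len(postings), step)]
advance_to(postings, skips, 1000)           # first docID >= 1000
```

Without the skip list, reaching docID 1000 means scanning ~143 entries; with √P checkpoints it costs a handful of checkpoint comparisons plus a scan of at most √P entries.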
7. terminate_after: Simple Early Stop
Elasticsearch provides a simpler early termination mechanism: terminate_after. It stops collecting documents after a fixed count per shard—useful when approximate results are acceptable.
Unlike WAND and MaxScore (which guarantee returning the exact top-k highest-scoring documents), terminate_after makes no quality guarantees. It simply stops after evaluating a fixed number of documents per shard, returning whatever top-k results were found so far. The documents aren't evaluated in score order, so you might miss high-scoring documents that happen to be stored later in the index.
Despite this limitation, terminate_after shines in specific scenarios: exploratory searches where any reasonable results are acceptable, debugging queries to test if any documents match, or as a safety valve against runaway queries that match too many documents. It's essentially a "good enough" optimization trading precision for guaranteed latency bounds.
- Per-shard limit: The limit applies independently to each shard. With 5 shards and terminate_after=1000, you may process up to 5000 documents total.
- Incomplete results: You might miss relevant documents that happen to be stored after the cutoff point in each shard.
- Sorting conflicts: If you're sorting by a field (not relevance), the best documents might be beyond the termination point and never evaluated.
- Response flag: The response includes terminated_early: true, so your application knows results may be incomplete.
- Existence checks: When you just need to know "does any document match this query?" rather than finding the best matches.
- Sampling queries: When you need a random sample of matching documents for analytics or testing purposes.
- Filter-only queries: When there's no relevance ranking involved—all matching documents are equally valid.
- Latency-critical paths: When returning "good enough" results quickly is more important than finding the absolute best matches.
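A minimal request sketch using the Elasticsearch search API; the index name and query are hypothetical placeholders:

```json
GET /articles/_search
{
  "terminate_after": 1000,
  "query": {
    "match": { "title": "machine learning tutorial" }
  }
}
```

Each shard stops after evaluating 1,000 matching documents, and the response carries "terminated_early": true whenever the cutoff was actually hit.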
8. Performance Benchmarks
Early termination effectiveness depends on query characteristics, index size, and score distribution. Here are typical benchmarks from Lucene and Elasticsearch deployments.
The key factor is score distribution skew: when scores vary widely (some documents clearly better than others), early termination works extremely well because the threshold rises quickly. The top-10 threshold might reach 90% of the maximum possible score after just evaluating 1% of documents, allowing the remaining 99% to be skipped. Conversely, when scores are flat (many documents scoring similarly), skipping opportunities are limited.
Query length also matters significantly. Longer queries (more OR terms) paradoxically perform better with early termination because each additional term creates more opportunities to accumulate high scores on a few documents while most documents only match one or two terms. A 5-term OR query might skip 90%+ of candidate documents, while a 2-term OR query achieves only a 70% skip rate.
| Query Type | Docs Evaluated | Skip Rate | Speedup |
|---|---|---|---|
| 2-term OR | ~30% of matches | 70% | 2-3x |
| 3-term OR | ~20% of matches | 80% | 3-5x |
| 5-term OR | ~10% of matches | 90% | 5-10x |
| Phrase query | ~60% of matches | 40% | 1-2x |
| Single term | ~50% of matches | 50% | 1.5-2x |
Maximum benefit when: (1) many terms in query, (2) small k, (3) skewed score distribution (few high-scoring docs), and (4) large posting lists with good block-max variation.
Key Takeaways
Upper Bound Scores
Precompute the maximum possible score each term can contribute to ANY document. If the cumulative upper bound for a document is less than the current threshold, you can safely skip it without missing any top-k results.
Block-Max WAND
Instead of using global upper bounds (which can be skewed by rare outlier documents), store per-block (128 docs) maximum scores. Skipping entire blocks yields a further 2-3x over basic WAND, for a typical 5-10x speedup over exhaustive scoring.
MaxScore Alternative
Partition query terms into essential and non-essential based on cumulative upper bounds. Only iterate through essential terms, lazily evaluating non-essential terms when candidates are found. Preferred for high-k or many-term OR queries.
Skip to Victory
Skip pointers within posting lists enable binary-search-like advancement instead of linear scans. Combined with block-max indexes, Lucene can jump over millions of documents that mathematically cannot qualify for top-k.