Systems Atlas
Chapter 5.6: Retrieval Optimization

Early Termination & WAND

How search engines find the top 10 results without scoring every document. The WAND algorithm is the secret weapon that makes sub-second search possible at scale.

When a query matches millions of documents, scoring them all is wasteful—users only see the top 10. This chapter reveals the algorithms that make search engines fast by safely skipping documents that provably cannot rank in the top-k results.

Speedup Factor

2-10x

Block-Max WAND vs exhaustive scoring on multi-term queries.

Documents Skipped

90%+

Typical skip rate for top-10 queries on large indices.

Block Size

128 docs

Standard block size in Lucene for cache-optimal skipping.

Prerequisites: This chapter builds on Execution Plans (query pipeline), BM25 (scoring), and Boolean Retrieval (posting lists).

1. The Problem: Why Score Everything?

Consider searching for "machine learning tutorial" in an index with 100 million documents. Boolean retrieval might find 2 million matching documents—but users only see the top 10. Scoring all 2 million with BM25 wastes 99.9995% of effort.

The traditional approach to ranked retrieval is called exhaustive scoring: for every document that matches the query, compute its full BM25 score, maintain a min-heap of the top k results, and return that heap when done. This approach is conceptually simple and guarantees correct results—but it scales terribly. A query matching 5 million documents requires 5 million score computations, even if you only want the top 10. Each BM25 computation involves fetching term frequencies, applying length normalization, and accumulating partial scores across terms. At scale, this becomes the dominant cost of query execution.

The insight that unlocks early termination is recognizing that score distributions are highly skewed. Most documents have mediocre scores that will never compete with the top results. If we could somehow predict which documents have no chance of ranking highly, we could skip them entirely—saving enormous amounts of computation. This is exactly what algorithms like WAND, MaxScore, and Block-Max WAND do: they use upper bound scores to prove that certain documents cannot possibly make the top-k, allowing safe skipping without any risk of missing relevant results.

# The Exhaustive Approach (Slow)
Query:       "machine learning tutorial"
Matches:     2,000,000 documents
BM25 scores: 2,000,000 computations
Returned:    10 documents
Wasted:      1,999,990 score calculations

[Score distribution figure: a narrow "Top 10" band, a thin "Could rank" band, and a long "No chance" tail; roughly 85% of documents have zero chance of making top-k]
The Key Insight

If we can prove a document cannot score higher than the current 10th-best result, we can skip it entirely. This is the foundation of early termination algorithms. The challenge is: how do we prove a document's score is too low without actually computing the score? The answer lies in upper bound estimation.
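The skip test itself is a single comparison against a precomputed ceiling. A minimal sketch with hypothetical per-term bounds (the threshold would come from a running top-k heap):

```python
def can_skip(query_terms, upper_bounds, threshold):
    # A document's score can never exceed the sum of its terms'
    # precomputed upper bounds, so skipping when the ceiling is
    # below the threshold is always safe.
    ceiling = sum(upper_bounds[t] for t in query_terms)
    return ceiling < threshold

# Hypothetical per-term bounds; threshold = current 10th-best score
bounds = {"machine": 5.2, "learning": 4.8}
print(can_skip(["machine", "learning"], bounds, 10.5))  # → True (10.0 < 10.5)
print(can_skip(["machine", "learning"], bounds, 8.0))   # → False (must score)
```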


2. Upper Bound Scores: The Foundation

The key to safe skipping is computing upper bounds: the maximum possible score a term can contribute to ANY document. If the sum of upper bounds for a document's terms is less than the current threshold, that document can be safely skipped.

Recall how BM25 works: the score for a document is the sum of per-term contributions, where each term's contribution depends on its term frequency (TF) in that document and its inverse document frequency (IDF) across the corpus. The IDF component is constant for a term across all documents, but the TF component varies. The key insight is that we can precompute the maximum possible TF contribution for each term by finding which document has the highest TF for that term. This "max TF" combined with the IDF gives us an upper bound that no document can exceed.

For example, if the term "machine" appears at most 150 times in any single document (in a research paper about machine learning), then no document can score higher than BM25_TF(150) × IDF("machine") for this term. We precompute and store this upper bound for every term at index time. At query time, we sum the upper bounds for all query terms to get a ceiling on any document's possible score. If this ceiling is below the current k-th best score, we can skip the document without any further computation.

Computing Upper Bounds

For each term t in the index:

UB(t) = max_tf_score(t) × IDF(t)
# max_tf_score = BM25 TF component
# using the document with highest TF

This is precomputed at index time and stored with the posting list.

Using Upper Bounds

For query "fast algorithm":

UB("fast") = 3.2
UB("algorithm") = 4.8
Max possible score = 3.2 + 4.8 = 8.0

If current threshold is 8.5, this document cannot make top-k.

upper_bounds.py
import math

def compute_upper_bound(term, index, k1=1.2):
    # Find the document with the highest TF for this term
    max_tf = max(index.tf(term, doc) for doc in index.posting_list(term))
    # IDF is the same for every document
    idf = math.log(index.N / index.df(term))
    # BM25's TF component saturates toward k1 + 1 as TF grows
    # (length normalization omitted here for brevity)
    max_tf_component = (max_tf * (k1 + 1)) / (max_tf + k1)
    return max_tf_component * idf

3. The WAND Algorithm

WAND (Weak AND) is the foundational algorithm for top-k retrieval with early termination. It cleverly uses upper bounds and a pivot mechanism to skip documents that cannot possibly rank in the top-k. The name "Weak AND" comes from its behavior: unlike strict AND (which requires all query terms to match), WAND is "weak" because it considers documents matching any subset of terms, but it's smart about which candidates to evaluate.

The algorithm maintains iterators over the posting lists of all query terms, always sorted by their current document ID. In each iteration, it accumulates upper bounds from left to right (smallest docID to largest) until the cumulative bound meets or exceeds the current score threshold. The document at this point is called the pivot. Documents before the pivot cannot possibly score high enough because even their maximum possible scores don't reach the threshold.

Once a pivot is found, WAND checks if all iterators up to and including the pivot point to the same document ID. If they do, that document is a "full candidate" and gets fully scored with BM25. If not, the iterators that lag behind are advanced to the pivot document ID, effectively skipping all documents in between. This skip-and-advance pattern is what makes WAND dramatically faster than exhaustive scoring.

WAND Core Concept

1. Sort Iterators

Sort posting list iterators by their current docID (ascending). This ordering is maintained throughout execution and is key to the algorithm's efficiency.

2. Find Pivot

Find smallest docID where cumulative UB ≥ threshold.

3. Skip or Score

If all iterators align at pivot → score it. Otherwise → advance lagging iterators.

# WAND Pivot Selection Example
Query: "machine learning" | Threshold θ = 7.5

Posting Lists (sorted by docID):
  "machine"   [3, 7, 12, 25, 31...]   UB = 5.2
  "learning"  [5, 12, 18, 25, 40...]  UB = 4.8

Pivot Calculation:
  Just "machine":           5.2 < 7.5
  "machine" + "learning":   5.2 + 4.8 = 10.0 ≥ 7.5
  Pivot = doc 5 (first doc where the UB sum reaches the threshold)

Action:
  "machine" points to doc 3, but the pivot is doc 5 → advance the
  "machine" iterator to doc 5 (doc 3 is skipped without scoring!)
wand_algorithm.py
def wand(query_terms, k):
    iterators = [PostingIterator(t) for t in query_terms]
    threshold = 0
    top_k = MinHeap(capacity=k)
    while not all_exhausted(iterators):
        # Sort by current docID (ascending)
        iterators.sort(key=lambda it: it.doc_id())
        # Find pivot: smallest doc where cumulative UB ≥ threshold
        cumulative = 0
        pivot_idx = -1
        for i, it in enumerate(iterators):
            cumulative += upper_bounds[it.term]
            if cumulative >= threshold:
                pivot_idx = i
                break
        if pivot_idx == -1:
            break  # No more candidates
        pivot_doc = iterators[pivot_idx].doc_id()
        # Check if all terms up to the pivot point to the same doc
        if all(it.doc_id() == pivot_doc for it in iterators[:pivot_idx + 1]):
            # Full evaluation
            score = compute_bm25(pivot_doc, query_terms)
            if score > threshold:
                top_k.push((score, pivot_doc))
                if len(top_k) == k:
                    threshold = top_k.min_score()
            advance_all(iterators[:pivot_idx + 1])  # move past pivot_doc
        else:
            # Advance lagging iterators to the pivot
            for it in iterators[:pivot_idx]:
                it.advance_to(pivot_doc)
    return top_k.to_sorted_list()

4. Block-Max WAND: Tighter Bounds

Basic WAND uses global upper bounds—the max score across the entire posting list. This is often too loose because one outlier document with TF=1000 dominates the bound, while most documents have TF=1-5. Block-Max WAND solves this by storing per-block max scores.

The problem with global upper bounds becomes clear with a concrete example: imagine the term "python" appears 500 times in a tutorial document (which discusses Python extensively), but only 1-10 times in 99% of other matching documents. The global upper bound is based on that outlier tutorial, so it's unrealistically high for most documents. This means WAND's pivot calculation is too pessimistic—it assumes every document could potentially score as high as that tutorial, reducing skip opportunities.

Block-Max WAND (BMW) subdivides each posting list into blocks of typically 128 documents and precomputes the maximum score within each block. Now when WAND processes a block, it uses that block's local maximum instead of the global maximum. If a block's max score is below the threshold, the entire block (all 128 documents) can be skipped without any per-document evaluation. This is a massive win: instead of skipping individual documents, BMW skips thousands of documents at once by jumping over entire blocks.

Block-Max Index Structure

Posting List for "machine" (divided into 128-doc blocks):
  Block 0: [docs 0-127]    maxScore = 3.2
  Block 1: [docs 128-255]  maxScore = 2.1
  Block 2: [docs 256-383]  maxScore = 5.2  ← global max (outlier)
  Block 3: [docs 384-511]  maxScore = 1.8

Key insight: Block 3 can be entirely skipped if threshold > 1.8 + the other terms' block maxes. With the global UB (5.2), we'd have to check every document in Block 3.

| Aspect                  | Basic WAND                   | Block-Max WAND             |
|-------------------------|------------------------------|----------------------------|
| Upper bound granularity | Global (entire posting list) | Per-block (128 docs)       |
| Skip unit               | Individual documents         | Entire blocks              |
| Index overhead          | 1 float per term             | 1 float per block per term |
| Speedup (typical)       | 2-3x vs exhaustive           | 5-10x vs exhaustive        |
# Block-Max Skipping in Action
Threshold θ = 6.0 | Query: "machine learning"

  Block 0: maxScore 3.2 + 4.1 = 7.3   → ✓ Evaluate
  Block 1: maxScore 2.1 + 3.2 = 5.3   → ✗ Skip
  Block 2: maxScore 5.2 + 4.8 = 10.0  → ✓ Evaluate
  Block 3: maxScore 1.8 + 2.9 = 4.7   → ✗ Skip

50% of blocks skipped entirely → major I/O savings
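The per-block decision is just a sum-and-compare over block maxima. A minimal sketch using the numbers from the example above, assuming for simplicity that both terms' blocks are aligned (real posting lists have independent block boundaries):

```python
def blocks_to_evaluate(block_maxes, threshold):
    """Given per-term lists of aligned block max scores, return the
    indices of blocks whose combined ceiling can reach the threshold."""
    survivors = []
    for i, maxes in enumerate(zip(*block_maxes)):
        if sum(maxes) >= threshold:
            survivors.append(i)  # must evaluate per-document
        # else: the whole block (all 128 docs) is skipped at once
    return survivors

# Block maxima for "machine" and "learning" from the figure above
machine = [3.2, 2.1, 5.2, 1.8]
learning = [4.1, 3.2, 4.8, 2.9]
print(blocks_to_evaluate([machine, learning], threshold=6.0))  # → [0, 2]
```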

5. MaxScore: The Alternative Approach

MaxScore takes a different approach: instead of pivot-based skipping, it partitions query terms into essential and non-essential based on their upper bounds. Non-essential terms are only evaluated when a candidate is found via essential terms.

The core idea is elegant: sort query terms by their upper bounds and compute a running cumulative sum from lowest to highest. Terms whose cumulative upper bound is still below the threshold are "non-essential": even if a document matched ALL of them and scored perfectly on each, it still couldn't reach the threshold. Only the remaining terms are "essential" and must be iterated. For a 10-term query, only 2-3 terms might be essential, reducing iteration overhead by 70-80%.

When MaxScore finds a candidate document via essential terms, it then checks the non-essential terms to compute the full score. The brilliance is that this two-phase approach avoids maintaining sorted iterators across all terms (as WAND does). For queries with many terms or when k is large (causing the threshold to stay low), MaxScore often outperforms WAND because its per-document overhead is lower. Modern Lucene versions use Block-Max MaxScore as the default for disjunctive queries.

MaxScore Term Partitioning

Query: "fast efficient algorithm search" | θ = 12.0

  "fast"       UB = 2.1
  "efficient"  UB = 1.8
  "algorithm"  UB = 4.5
  "search"     UB = 5.2

Partitioning (sorted by UB, cumulative sum from low to high):
  efficient (1.8) → 1.8  < 12.0
  fast      (2.1) → 3.9  < 12.0
  algorithm (4.5) → 8.4  < 12.0
  search    (5.2) → 13.6 ≥ 12.0 ✓

Essential: "search" (must iterate)
Non-essential: "fast", "efficient", "algorithm" (only scored when a candidate is found)
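The partitioning walk above is a single pass over terms sorted by upper bound. A minimal sketch (a hypothetical helper, not Lucene's implementation):

```python
def partition_terms(upper_bounds, threshold):
    """Split terms into (non_essential, essential) by cumulative UB.
    Terms whose running bound stays below the threshold can be scored
    lazily; the rest must be iterated."""
    terms = sorted(upper_bounds, key=upper_bounds.get)  # ascending UB
    non_essential = []
    cumulative = 0.0
    for t in terms:
        cumulative += upper_bounds[t]
        if cumulative < threshold:
            non_essential.append(t)
        else:
            # This term and all higher-UB terms are essential
            return non_essential, terms[len(non_essential):]
    return non_essential, []  # threshold unreachable: nothing essential

ubs = {"fast": 2.1, "efficient": 1.8, "algorithm": 4.5, "search": 5.2}
print(partition_terms(ubs, threshold=12.0))
# → (['efficient', 'fast', 'algorithm'], ['search'])
```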
Use WAND When:
  • Few query terms (2-5): With fewer terms, the pivot calculation is cheaper and the sorted iterator approach is more efficient than partitioning.
  • Small result set (10-20): The threshold rises quickly when k is small, enabling aggressive skipping early in query processing.
  • AND-heavy queries: Documents must match all terms anyway, so WAND's pivot mechanism aligns naturally with the intersection requirement.
Use MaxScore When:
  • Many query terms (5+): Term partitioning becomes advantageous—most terms end up non-essential, reducing iteration overhead.
  • Large result set (100+): With more results needed, the threshold stays lower longer, making WAND's pivot less effective.
  • OR-heavy disjunctive queries: Documents matching any subset of terms qualify, and MaxScore excels at iterating only essential terms.
Lucene Evolution

Lucene 8 introduced Block-Max WAND. Lucene 9+ uses Block-Max MaxScore for top-level disjunctions, finding it has lower per-hit overhead for many-term queries.


6. Skip Pointers: Fast Advancement

WAND and MaxScore rely heavily on advance_to(target_doc) operations. Skip pointers are the data structure that makes these operations fast—allowing binary-search-like jumps within a posting list instead of linear scans.

A posting list without skip pointers is just a sequence of document IDs. To find document 1,000,000 you'd need to iterate through 999,999 entries first. Skip pointers solve this by storing "checkpoints" at regular intervals: every √N documents (where N is the posting list length), we record the document ID and its position. Now advancing to document 1,000,000 requires jumping through ~1,000 skip points then a short linear scan, instead of a million iterations.

Modern Lucene uses a two-level skip structure: coarse skips for large jumps and fine skips for precision. When WAND needs to advance all lagging iterators to the pivot document, each iterator uses its skip pointers to jump ahead efficiently. Combined with Block-Max indexing, skip pointers enable Lucene to advance through millions of documents in microseconds—a crucial capability for sub-100ms query latency on large corpora.

# Skip Pointers in a Posting List
Posting List: [3, 7, 12, 25, 31, 48, 55, 67, 82, 91, 105, 120, ...]
Skip Interval: √12 ≈ 3 (every 3rd element)

Skip List Structure:
  3 (pos 0) →→→ 25 (pos 3) →→→ 55 (pos 6) →→→ 91 (pos 9)

advance_to(60):
  1. Walk the skips: 3 → 25 → 55 → 91 > 60
  2. Fall back to pos 6 (value 55)
  3. Linear scan: 55 → 67 ≥ 60 → found!

Cost: 4 skip checks + 2 linear steps = 6 ops (vs 8 for a pure linear scan)
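The advance_to walk above can be sketched directly. This is a toy single-level skip list, not Lucene's two-level structure; all names are illustrative:

```python
import math

def build_skips(postings):
    # One skip entry every ~sqrt(P) positions: (doc_id, position)
    step = max(1, round(math.sqrt(len(postings))))
    return [(postings[i], i) for i in range(0, len(postings), step)]

def advance_to(postings, skips, target):
    """Return the index of the first posting >= target:
    walk the skips first, then scan linearly within one segment."""
    start = 0
    for doc_id, pos in skips:
        if doc_id >= target:
            break
        start = pos  # last skip entry still below the target
    for i in range(start, len(postings)):
        if postings[i] >= target:
            return i
    return None  # target is beyond the last posting

postings = [3, 7, 12, 25, 31, 48, 55, 67, 82, 91, 105, 120]
skips = build_skips(postings)
print(advance_to(postings, skips, 60))  # → 7 (doc 67)
```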
Optimal Skip Interval: √P

For a posting list of length P, the optimal skip interval is √P. This balances:

Too few skips:

Long linear scans between skip points

Too many skips:

Overhead of checking many skip points
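The √P claim follows from a simple cost model: roughly P/interval skip-entry checks plus up to interval linear steps, minimized where the two terms balance. A toy check (the cost model is a rough approximation, not a measurement):

```python
import math

def expected_ops(p, interval):
    # ~p/interval skip checks, then up to `interval` linear steps
    return p / interval + interval

P = 10_000
best = min(range(1, P + 1), key=lambda i: expected_ops(P, i))
print(best, round(math.sqrt(P)))  # → 100 100
```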


7. terminate_after: Simple Early Stop

Elasticsearch provides a simpler early termination mechanism: terminate_after. It stops collecting documents after a fixed count per shard—useful when approximate results are acceptable.

Unlike WAND and MaxScore (which guarantee returning the exact top-k highest-scoring documents), terminate_after makes no quality guarantees. It simply stops after evaluating a fixed number of documents per shard, returning whatever top-k results were found so far. The documents aren't evaluated in score order, so you might miss high-scoring documents that happen to be stored later in the index.

Despite this limitation, terminate_after shines in specific scenarios: exploratory searches where any reasonable results are acceptable, debugging queries to test if any documents match, or as a safety valve against runaway queries that match too many documents. It's essentially a "good enough" optimization trading precision for guaranteed latency bounds.

elasticsearch_query.json
{
  "query": {
    "match": { "content": "machine learning" }
  },
  "terminate_after": 1000,
  "size": 10
}
Important Caveats
  • Per-shard limit: The limit applies independently to each shard. With 5 shards and terminate_after=1000, you may process up to 5000 documents total.
  • Incomplete results: You might miss relevant documents that happen to be stored after the cutoff point in each shard.
  • Sorting conflicts: If you're sorting by a field (not relevance), the best documents might be beyond the termination point and never evaluated.
  • Response flag: The response includes terminated_early: true so your application knows results may be incomplete.
Good Use Cases
  • Existence checks: When you just need to know "does any document match this query?" rather than finding the best matches.
  • Sampling queries: When you need a random sample of matching documents for analytics or testing purposes.
  • Filter-only queries: When there's no relevance ranking involved—all matching documents are equally valid.
  • Latency-critical paths: When returning "good enough" results quickly is more important than finding the absolute best matches.
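The per-shard cutoff semantics can be modeled in a few lines of plain Python. This illustrates the behavior, not Elasticsearch internals; all names are hypothetical:

```python
import heapq

def search_with_terminate_after(shard_docs, score, terminate_after, k):
    """Illustrative model of per-shard terminate_after: docs are
    visited in stored order, not score order, so high scorers past
    the cutoff are simply never seen."""
    top_k = []
    evaluated = 0
    for doc in shard_docs:
        if evaluated >= terminate_after:
            return sorted(top_k, reverse=True)[:k], True  # terminated_early
        heapq.heappush(top_k, (score(doc), doc))
        if len(top_k) > k:
            heapq.heappop(top_k)  # evict the current minimum
        evaluated += 1
    return sorted(top_k, reverse=True)[:k], False

# Hypothetical shard: doc 50 scores highest but sits past the cutoff
docs = list(range(60))
hits, early = search_with_terminate_after(docs, lambda d: d % 51, 40, 3)
print(early)          # → True: doc 50 (score 50) was never evaluated
print(hits[0][0])     # → 39: best score among the first 40 docs
```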

8. Performance Benchmarks

Early termination effectiveness depends on query characteristics, index size, and score distribution. Here are typical benchmarks from Lucene and Elasticsearch deployments.

The key factor is score distribution skew: when scores vary widely (some documents clearly better than others), early termination works extremely well because the threshold rises quickly. The top-10 threshold might reach 90% of the maximum possible score after just evaluating 1% of documents, allowing the remaining 99% to be skipped. Conversely, when scores are flat (many documents scoring similarly), skipping opportunities are limited.

Query length also matters significantly. Longer queries (more OR terms) paradoxically perform better with early termination because each additional term creates more opportunities to accumulate high scores on a few documents while most documents only match one or two terms. A 5-term OR query might skip 90%+ of candidate documents, while a 2-term OR query achieves only 70% skip rate.

| Query Type   | Docs Evaluated  | Skip Rate | Speedup |
|--------------|-----------------|-----------|---------|
| 2-term OR    | ~30% of matches | 70%       | 2-3x    |
| 3-term OR    | ~20% of matches | 80%       | 3-5x    |
| 5-term OR    | ~10% of matches | 90%       | 5-10x   |
| Phrase query | ~60% of matches | 40%       | 1-2x    |
| Single term  | ~50% of matches | 50%       | 1.5-2x  |
When Early Termination Shines

Maximum benefit when: (1) many terms in query, (2) small k, (3) skewed score distribution (few high-scoring docs), and (4) large posting lists with good block-max variation.

Key Takeaways

01

Upper Bound Scores

Precompute the maximum possible score each term can contribute to ANY document. If the cumulative upper bound for a document is less than the current threshold, you can safely skip it without missing any top-k results.

02

Block-Max WAND

Instead of using global upper bounds (which can be skewed by rare outlier documents), store per-block (128 docs) maximum scores. This enables 2-10x speedup over basic WAND by allowing entire blocks to be skipped.

03

MaxScore Alternative

Partition query terms into essential and non-essential based on cumulative upper bounds. Only iterate through essential terms, lazily evaluating non-essential terms when candidates are found. Preferred for high-k or many-term OR queries.

04

Skip to Victory

Skip pointers within posting lists enable binary-search-like advancement instead of linear scans. Combined with block-max indexes, Lucene can jump over millions of documents that mathematically cannot qualify for top-k.