Chapter 3.3: Indexing & Infrastructure
BKD Trees & DocValues
The inverted index excels at text, but struggles with numbers. Range queries, sorting, and aggregations require specialized data structures: BKD Trees and DocValues.
Where BKD & DocValues Fit in the Search Pipeline
In a modern search engine, different data structures serve different stages of the pipeline:
BKD answers "Which docs match this range?". DocValues answer "What value does this doc have?". They are complementary.
Why Inverted Index Fails for Range Queries
The inverted index is perfect for exact keyword matches, but it fails miserably at range queries. Imagine trying to find products priced between $100 and $200. In an inverted index, numbers are just terms. You would have to look up every single possible price point: "100.00", "100.01", "100.02"... up to "200.00". That creates thousands of independent dictionary lookups for a simple query. We need a structure that understands order.
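To make the blow-up concrete, here is a tiny Python sketch (cent-granularity prices are an illustrative assumption) counting the distinct terms a $100-$200 range query would have to look up one by one:

```python
# Hypothetical illustration: in an inverted index, every distinct price is
# its own term, so a range query must look up each term independently.
terms_in_range = [f"{cents / 100:.2f}" for cents in range(100_00, 200_00 + 1)]

print(len(terms_in_range))  # 10001 independent dictionary lookups
print(terms_in_range[:3])   # ['100.00', '100.01', '100.02']
```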
Deep Dive: Inside the BKD Tree
1. What it Actually Stores
A BKD tree (Block KD Tree) is a space-partitioning index. Internally:
- Stores points grouped into fixed-size blocks (leaves).
- Each block has a bounding box (min/max per dimension).
- Internal nodes store only these bounding boxes to guide navigation.
2. How it is Built
Unlike databases that update rows, BKD trees are immutable per segment:
- All points for a field are collected in an in-memory buffer.
- The space is recursively split (like a k-d tree) until leaves are small.
- Leaf blocks are flushed to disk sequentially.
3. How Range Queries Execute
For a query like price BETWEEN 1000 AND 2000:
- Start at the root node.
- Check node's bounding box against query range.
  - If node.max < 1000 → SKIP SUBTREE (pruned).
  - If node.min > 2000 → SKIP SUBTREE (pruned).
  - Otherwise → descend into children.
- At leaf nodes, scan actual values and collect matching docIDs.
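The build and query steps above can be sketched together in Python. This is a minimal 1-D model with hypothetical (value, docID) points and a toy leaf size; real BKD trees handle multiple dimensions and pack hundreds of points per on-disk leaf:

```python
LEAF_SIZE = 4  # toy value; real leaves hold hundreds of points

def build(points):
    """Recursively split sorted (value, docID) points into leaf blocks,
    recording a min/max bounding box at every node."""
    points = sorted(points)
    if len(points) <= LEAF_SIZE:
        return {"min": points[0][0], "max": points[-1][0], "points": points}
    mid = len(points) // 2
    left, right = build(points[:mid]), build(points[mid:])
    return {"min": left["min"], "max": right["max"], "left": left, "right": right}

def range_query(node, lo, hi, out):
    if node["max"] < lo or node["min"] > hi:
        return                      # bounding box disjoint from range: prune subtree
    if "points" in node:            # leaf: scan actual values
        out.extend(doc for value, doc in node["points"] if lo <= value <= hi)
    else:
        range_query(node["left"], lo, hi, out)
        range_query(node["right"], lo, hi, out)

prices = [(p, doc) for doc, p in enumerate([50, 1500, 250, 1999, 3000, 1200, 80, 2500])]
tree = build(prices)
hits = []
range_query(tree, 1000, 2000, hits)
print(sorted(hits))  # [1, 3, 5]: docs priced 1500, 1999, 1200
```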
How BKD Trees Work
A BKD Tree (Block K-Dimensional Tree) solves this by organizing numbers into a hierarchy of blocks. It recursively cuts the number line in half. To find "Price > 200", we don't scan all numbers; we check the top-level block. If it covers $0-$200, we skip it entirely. If it covers $200-$500, we descend into it. This allows us to discard massive chunks of data instantly, achieving logarithmic O(log N) performance.
For a query like price > 200, the engine skips the entire left branch ($0-$200).
Performance: The Numbers
Dataset: 100 million products with a price field
| Query | Inverted Index | BKD Tree |
|---|---|---|
| price = 100 | 5ms | 2ms |
| price > 100 | 5000ms (!) | 10ms |
| price BETWEEN 100 AND 200 | 500ms | 5ms |
| price BETWEEN 100 AND 100000 | Timeout | 15ms |
Mental Model: Performance Complexity
Why BKD is faster for ranges:
- Inverted Index: O(distinct terms in the range) ≈ O(N) for wide ranges.
- DocValues Scan: O(N) linear scan of all docs.
- BKD Tree Range: O(log N + matches) ≈ Tree Pruning.
DocValues: Columnar Storage for Sorting
Search engines have a split personality. For searching (filtering), they use the Inverted Index ("Which docs have term X?"). But for sorting and aggregating, they need the reverse: "What is the value of field Y for Doc Z?". Opening every JSON document to extract a price for sorting is too slow. DocValues solves this by storing field values in tight, column-oriented arrays on disk, optimized for sequential access.
- Without DocValues (slow): fetch each stored document and parse it just to extract one field.
- With DocValues (fast): read the value directly from a columnar array by DocID.
Deep Dive: Inside DocValues
DocValues are not just arrays. Internally, they are:
- Block-compressed: Stored in 16KB blocks compressed with LZ4.
- Delta-encoded: Stores the difference between consecutive values (smaller integers = fewer bits).
- Sequential: Laid out by DocID order for CPU-cache friendliness.
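The delta trick can be sketched in Python. This is a simplified model; real DocValues additionally bit-pack and block-compress the deltas:

```python
def delta_encode(values):
    """Store the first value, then differences between consecutive values."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

prices = [100000, 100003, 100007, 100010]  # neighboring values are close together
encoded = delta_encode(prices)
print(encoded)  # [100000, 3, 4, 3]: small deltas need far fewer bits to store
assert delta_decode(encoded) == prices
```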
How BKD and DocValues Work Together
A typical query like "Sort electronics by price" uses both:
- Inverted Index / BKD (used first): find the candidate DocIDs. "Find all docs where category='electronics'".
- DocValues (used second): load values for the matching DocIDs. "For each match, load 'price' to sort".
If a text filter reduces results to a tiny set, the engine might skip BKD and just verify the numeric condition using DocValues (linear scan of small set).
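A toy in-memory model of the two phases (the data and dictionary shapes are illustrative assumptions, not engine internals):

```python
# Phase 1: the inverted index answers "which docs match category='electronics'?"
inverted_index = {"electronics": [1, 4, 9], "books": [2, 3]}

# Phase 2: the DocValues column answers "what is doc X's price?" (docID -> value)
price_column = {1: 499, 2: 15, 3: 25, 4: 1299, 9: 299}

candidates = inverted_index["electronics"]                  # filter
ranked = sorted(candidates, key=lambda doc: price_column[doc])  # sort via column
print(ranked)  # [9, 1, 4]: cheapest electronics first
```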
Facets & Aggregations: Analytics Engine
A facet is not just a UI feature. It is an aggregation query (GROUP BY) over the index.
How Faceting Executes
1. Filter: "Find laptops" → [Doc 1, Doc 4, Doc 9]
2. Lookup: Doc 1 → "Apple", Doc 4 → "Dell", Doc 9 → "Apple"
3. Count: Apple: 2, Dell: 1
Categorical (Brand/Color)
Uses Ordinals. Strings are mapped to integers (0="Apple", 1="Dell") so counts can be tallied in plain integer arrays.
Numeric (Price/Date)
Uses BKD + DocValues. BKD quickly identifies bucket ranges, or DocValues scans values to bin them.
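The ordinal technique behind categorical facets, as a small Python sketch (the docs and brands are hypothetical; the point is that strings are resolved to integers once, and counting touches only an array):

```python
# Ordinals: each unique brand string gets a small integer ID.
ordinals = {"Apple": 0, "Dell": 1}
brand_by_doc = {1: 0, 4: 1, 9: 0}   # DocValues column: docID -> ordinal

matching_docs = [1, 4, 9]           # result of "find laptops"
counts = [0] * len(ordinals)
for doc in matching_docs:
    counts[brand_by_doc[doc]] += 1  # array increment, no string comparisons

facets = {brand: counts[o] for brand, o in ordinals.items()}
print(facets)  # {'Apple': 2, 'Dell': 1}
```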
Why Faceting is Expensive
Faceting complexity is O(N) where N = number of query matches.
Calculating facets for "all products" (millions of docs) triggers a massive scan of DocValues.
Global Ordinals: The Memory Trap
Aggregating on text (keyword) fields comes with a hidden cost. Strings are heavy and slow to compare. To make aggregations fast, Elasticsearch replaces each unique string with a small integer ID (an "ordinal") at query time. This mapping table must be built and held in **Java Heap Memory**. If you have a high-cardinality field (like `user_id` with 100M unique values), this table can consume gigabytes of heap, potentially causing OutOfMemory crashes.
Heap Memory Warning
High-cardinality string aggregations can crash your cluster. Use composite aggregation for pagination or avoid aggregating on unique fields.
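The composite aggregation mentioned above pages through buckets instead of materializing them all at once. A sketch of the request shape (the field and aggregation names are illustrative):

```json
{
  "size": 0,
  "aggs": {
    "users": {
      "composite": {
        "size": 1000,
        "sources": [
          { "user": { "terms": { "field": "user_id" } } }
        ]
      }
    }
  }
}
```

Each response includes an `after_key`; passing it back as the next request's `after` parameter fetches the next page of buckets, keeping per-request memory bounded.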
Elasticsearch Field Type Reference
Choosing the right field type is critical for performance. `integer` and `long` are standard, but specialized types like `keyword` and `scaled_float` offer significant advantages for specific use cases.
{
"mappings": {
"properties": {
"title": { "type": "text", "analyzer": "english" },
"category": { "type": "keyword" }, // Exact match, aggs
"price": { "type": "scaled_float", "scaling_factor": 100 },
"quantity": { "type": "integer" },
"location": { "type": "geo_point" } // 2D BKD tree
}
}
}

| Type | Bytes | Use For |
|---|---|---|
| byte | 1 | Small enums (0-127) |
| integer | 4 | Counts, quantities |
| long | 8 | Timestamps, big IDs |
| float | 4 | Prices (has rounding errors!) |
| scaled_float | 8 (stored as long) | Money (no rounding errors!) |
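How `scaled_float` sidesteps rounding, sketched in Python: with `scaling_factor: 100`, a price is multiplied, rounded, and indexed as an exact integer.

```python
def encode(price, factor=100):
    """Model of scaled_float: store the price as an exact integer of cents."""
    return round(price * factor)

def decode(raw, factor=100):
    return raw / factor

raw = encode(19.99)
print(raw)          # 1999: an exact integer, safe to index, compare, and sum
print(decode(raw))  # 19.99
```

Integer cents add up exactly, whereas raw floats accumulate drift (0.1 + 0.2 != 0.3 in binary floating point).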
Disabling DocValues (Space Optimization)
DocValues consume significant storage. If you never sort or aggregate on a field, you can disable them to save 30-50% storage.
{
"mappings": {
"properties": {
"description": {
"type": "text",
"doc_values": false // Can't sort/aggregate
}
}
}
}

When to Disable
- • Text fields you'll NEVER sort on
- • Fields never used in aggregations
- • Saves ~30-50% storage per field
When NOT to Disable
- • Any field used in sorting
- • Fields used in aggregations
- • Fields used in script scoring
Geo Queries (BKD in 2D)
Geographic data isn't special: it's indexed with the same BKD tree, just in two dimensions.
Multi-Dimensional BKD:
- Geo: Dim 1 = Lat, Dim 2 = Lon.
- Time+Price: Dim 1 = Timestamp, Dim 2 = Price.
// Mapping
{ "location": { "type": "geo_point" } }
// Document
{
"name": "Starbucks",
"location": { "lat": 37.7749, "lon": -122.4194 }
}

Geographic space is split recursively, like a quadtree.
Distance Query: "Within 5km"
Bounding Box Query
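The distance query above takes this shape in Elasticsearch's query DSL (the coordinates reuse the Starbucks example):

```json
{
  "query": {
    "geo_distance": {
      "distance": "5km",
      "location": { "lat": 37.7749, "lon": -122.4194 }
    }
  }
}
```

A `geo_bounding_box` query looks similar but supplies `top_left` and `bottom_right` corners; it is cheaper because it maps directly onto the BKD tree's bounding-box checks with no distance math per point.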
Storage & Memory Benchmarks
Numeric indexing is extremely efficient. Compared to text indices, BKD trees are tiny (often 5% of original data size) and fast. Here is a breakdown of typical performance numbers for a 100M document dataset.
| Operation | Time |
|---|---|
| BKD range query (price 100-200) | 5ms |
| DocValues sort (by price) | 50ms |
| Aggregation (avg price) | 100ms |
| Geo distance (within 5km) | 30ms |
| Geo bounding box | 10ms |
What BKD and DocValues Are Not Used For
Deletes, Updates, and Segments
BKD trees and DocValues are immutable per segment. This design choice has massive implications:
Deletes are Soft
Deleted documents are just marked in a "bitset" file. They are physically removed during segment merging.
Updates are Expensive
An update is actually Delete Old + Index New. This is why heavy update workloads churn CPU/Disk.
Merges are Heavy
Merging segments requires rebuilding the BKD tree and rewriting DocValues columns from scratch.
When Not to Use BKD or DocValues
- Do not use BKD for text fields: It handles numbers/geo. Text belongs in the Inverted Index.
- Do not rely on DocValues for filtering: Scanning billions of values is O(N). Always filter with BKD/Inverted Index first.
- Disable DocValues for fields that are never sorted, aggregated, or scripted to save disk space.
Unifying Mental Model
A modern search engine is a multi-index system. It doesn't put everything in one bucket. Instead, it maintains four distinct data structures in parallel for every document:
Key Takeaways
BKD Trees
Enable O(log N) range queries on numeric, date, and geo fields.
DocValues
Store data column-wise to enable fast sorting and aggregations.
Precision
Use 'scaled_float' for money to avoid floating-point precision issues.
Global Ordinals
High-cardinality aggregations on text fields consume massive heap memory.
Space Optimization
Disable DocValues on fields you never sort or aggregate to save 30-50% storage.