Chapter 3.5: Indexing & Infrastructure
Segments & Immutability
The golden rule of Lucene: once written, never modified. This design enables lock-free reads, perfect caching, and crash recovery.
What is a Segment?
A segment is an immutable, self-contained piece of the index. Unlike traditional databases, where an index is a single mutable structure, Lucene splits its index into many segments that are created over time and periodically merged together. Every search query must check all segments and merge their results.
Lucene Segment
- Immutable: Never modified after creation
- Self-contained: Has its own term dictionary, postings, stored fields
- Searchable independently: Each segment can answer queries
- Created on refresh: New segment every 1 second (default)
Traditional DB Index
- Mutable: Updated in-place (B-tree pages)
- Single structure: One index file with locks
- Page-level writes: Complex crash recovery
- Immediate visibility: But at the cost of locking
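The fan-out-and-merge behavior described above can be sketched in a few lines. This is a hypothetical in-memory model (segments as dicts mapping terms to `(doc_id, score)` postings), not Lucene's actual data structures:

```python
import heapq

# Each immutable segment answers the query independently; the searcher
# then merges the per-segment hits into one global top-k result.
def search_segment(segment, term):
    """Return (score, doc_id) hits from one segment's postings."""
    return [(score, doc_id) for doc_id, score in segment.get(term, [])]

def search_index(segments, term, top_k=3):
    """Fan the query out to every segment, then merge partial results."""
    all_hits = []
    for seg in segments:
        all_hits.extend(search_segment(seg, term))
    # Highest-scoring documents across all segments win.
    return heapq.nlargest(top_k, all_hits)

segments = [
    {"lucene": [(0, 1.2), (3, 0.4)]},   # segment _0
    {"lucene": [(7, 2.1)]},             # segment _1 (newer docs)
]
print(search_index(segments, "lucene"))  # → [(2.1, 7), (1.2, 0), (0.4, 3)]
```

Note that query cost grows with segment count: every segment is consulted even if it holds no matches, which is exactly why merging matters later in this chapter.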
Index Directory Structure
```
/data/indices/my_index/
├── segments_5          ← Commit point (which segments are live)
├── _0.si               ← Segment 0 info (metadata)
├── _0.cfs              ← Compound file (all data)
├── _0.cfe              ← Compound entries
├── _1.si               ← Segment 1 info
├── _1_Lucene90_0.dvd   ← Doc values
├── _1_Lucene90_0.dvm   ← Doc values metadata
├── _1.fdx              ← Stored fields index
├── _1.fdt              ← Stored fields data
├── _1.liv              ← Live docs bitmap (deleted = 0)
└── write.lock          ← Prevents concurrent writers
```
Index Approaches Compared
Different storage systems handle indexing differently. Lucene's segment-based approach is optimized for read-heavy, write-once workloads typical in search. Here's how it compares to other approaches.
| System | Index Type | Mutability | Best For |
|---|---|---|---|
| Lucene/ES | Inverted index + segments | Immutable | Full-text search, analytics |
| RocksDB | LSM-Tree + SST files | Immutable (SSTs) | Key-value, high write throughput |
| PostgreSQL | B-tree pages | Mutable (in-place) | OLTP, transactions |
| Cassandra SAI | Per-SSTable index | Immutable | Secondary indexes at scale |
Why Immutability?
Most databases update data "in-place", overwriting the old record with the new one. While intuitive, this approach is full of dangers: concurrency issues require complex locking, and a system crash during a write can leave the database corrupted. Lucene takes a radically different approach: segments are immutable. Once a file is written to disk, it is never changed. "Updating" a document means writing a new version to a new segment and marking the old one as deleted.
In-place updates (traditional databases):
- Requires locking during writes
- Risk of corruption on crash
- Cache invalidation complexity

Immutable segments (Lucene):
- No locks needed for reads
- Perfect cache utilization
- Crash recovery via translog
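The update-as-append behavior can be sketched with two hypothetical classes (`Segment`, `Index`; not Lucene's real API). An "update" never touches an existing segment's documents; it flips a bit in the old segment's live-docs bitmap and writes the new version to a fresh segment:

```python
# Minimal sketch: segments are write-once; only the live-docs bitmap
# (the .liv file) is ever rewritten after creation.
class Segment:
    def __init__(self, docs):
        self.docs = tuple(docs)          # immutable after creation
        self.live = [True] * len(docs)   # the one mutable part: .liv bitmap

class Index:
    def __init__(self):
        self.segments = []

    def add_segment(self, docs):
        self.segments.append(Segment(docs))

    def update(self, doc_id, new_doc):
        # Mark every old copy dead, then append the new version
        # to a brand-new segment.
        for seg in self.segments:
            for i, d in enumerate(seg.docs):
                if d["id"] == doc_id:
                    seg.live[i] = False
        self.add_segment([new_doc])

    def live_docs(self):
        return [d for s in self.segments
                for i, d in enumerate(s.docs) if s.live[i]]

idx = Index()
idx.add_segment([{"id": 1, "title": "v1"}])
idx.update(1, {"id": 1, "title": "v2"})
print(idx.live_docs())  # → [{'id': 1, 'title': 'v2'}]
```

The old `v1` bytes are still on disk in segment 0; only a merge (covered below) physically reclaims them.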
The Segment Lifecycle
How does a raw JSON document become a searchable immutable segment? It's a journey through memory and disk. First, documents land in an In-Memory Buffer (RAM). Every second (by default), a process called Refresh turns this buffer into a new small segment on disk. At this moment and only at this moment the document becomes visible to search. This "refresh" mechanism is why Elasticsearch is called "Near Real-Time" (NRT).
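That buffer-then-refresh lifecycle can be modeled in a few lines (hypothetical `NRTIndex` class, purely illustrative): documents sit in the RAM buffer, invisible to search, until a refresh turns the buffer into a segment.

```python
# Sketch of near-real-time (NRT) visibility: indexing accepts the doc
# immediately, but search only sees segments, never the RAM buffer.
class NRTIndex:
    def __init__(self):
        self.buffer = []      # in-memory indexing buffer
        self.segments = []    # immutable on-disk segments

    def index(self, doc):
        self.buffer.append(doc)   # accepted, but not yet searchable

    def refresh(self):
        if self.buffer:
            self.segments.append(tuple(self.buffer))  # new immutable segment
            self.buffer = []

    def search(self):
        # Only segments are searched; buffered docs are invisible.
        return [d for seg in self.segments for d in seg]

idx = NRTIndex()
idx.index("doc-1")
print(idx.search())   # → []  (indexed, but no refresh yet)
idx.refresh()
print(idx.search())   # → ['doc-1']
```

The gap between `index()` returning and the next `refresh()` is exactly the "near" in Near Real-Time.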
Segment Files & Formats
Each segment consists of multiple files, each storing a specific type of data. Understanding these files helps debug issues and optimize storage. The compound file format (.cfs) combines these into one file for efficiency on older filesystems.
| Extension | Name | Contents |
|---|---|---|
| segments_N | Commit Point | Lists all live segments in the index |
| .si | Segment Info | Metadata: doc count, codec, diagnostics |
| .cfs / .cfe | Compound File | Bundled segment data (reduces file handles) |
| .tim / .tip | Term Dictionary | All unique terms + pointers to postings |
| .doc / .pos | Postings | Doc IDs + positions for each term |
| .fdt / .fdx | Stored Fields | Original document content (for _source) |
| .liv | Live Docs | Bitmap of non-deleted documents |
| .dvd / .dvm | Doc Values | Columnar data for sorting/aggregations |
The Tombstone Tax (Deletes)
Because segments are immutable, a delete just marks the document as "dead" in the .liv file. The data remains on disk until a merge reclaims it. This creates hidden costs that grow with your delete rate.
Deleted Docs Still Consume Resources
| Delete Ratio | Storage Overhead | Query Slowdown | Action Needed |
|---|---|---|---|
| 0-10% | Minimal | ~5% | Normal operation |
| 10-30% | Noticeable | 10-30% | Consider expunge_deletes |
| >30% | Severe | 50%+ | Force merge required |
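The thresholds in the table translate directly into a small helper (an illustrative function, not an Elasticsearch API):

```python
def deleted_ratio(deleted_docs, total_docs):
    """Fraction of docs that are tombstoned but still on disk."""
    return deleted_docs / total_docs

def recommended_action(ratio):
    # Thresholds taken from the table above.
    if ratio <= 0.10:
        return "normal operation"
    if ratio <= 0.30:
        return "consider expunge_deletes"
    return "force merge required"

print(recommended_action(deleted_ratio(5, 100)))    # → normal operation
print(recommended_action(deleted_ratio(20, 100)))   # → consider expunge_deletes
print(recommended_action(deleted_ratio(40, 100)))   # → force merge required
```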
The Segment Explosion Problem
If we create a new segment every second, after an hour we'll have 3,600 segments. After a day, 86,400. Since every search query has to check every segment, performance degrades linearly with the segment count. To solve this, Lucene constantly runs background merges, picking small segments and combining them into larger ones (much like merging tiles in the game 2048).
| Segment Count | Query Latency | Memory Overhead | File Descriptors |
|---|---|---|---|
| 1-5 | 10ms (baseline) | Low | ~50 |
| 10-50 | 15ms (+50%) | Moderate | ~500 |
| 100-500 | 40ms (+300%) | High | ~5,000 |
| 1,000+ | 100ms+ (degraded) | Critical | Risk of exhaustion |
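The constant background merging described above can be sketched as a toy loop. Segment sizes are in arbitrary units, and the merge factor and segment cap are illustrative, not Lucene's actual TieredMergePolicy heuristics:

```python
# Toy background merge: repeatedly combine the smallest segments into
# one larger segment until the total count is bounded (2048-style).
def background_merge(sizes, max_segments=10, merge_factor=5):
    sizes = sorted(sizes)
    while len(sizes) > max_segments:
        batch, sizes = sizes[:merge_factor], sizes[merge_factor:]
        # One sequential rewrite replaces merge_factor small files.
        sizes.append(sum(batch))
        sizes.sort()
    return sizes

# 3,600 one-second refreshes of ~1 unit each collapse to a handful
# of segments; the data is all still there, just rewritten.
merged = background_merge([1] * 3600)
print(len(merged))   # → 8
print(sum(merged))   # → 3600
```

Note the trade: each merge pass rewrites the same bytes again, which is the write amplification tallied at the end of this chapter.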
Symptoms of Too Many Segments
- Query slowdown: Must check thousands of segments
- File handle exhaustion: Linux default ~65,000
- Memory pressure: Each segment has heap metadata
- Merge storms: Catch-up merging consumes all I/O
Solutions
- Raise refresh_interval so fewer, larger segments are created
- Let background merges keep pace; avoid saturating I/O
- Force-merge read-only indices down to a few large segments
Shards & Segments
In a distributed Elasticsearch cluster, each shard is an independent Lucene index with its own segments. A query fans out to all shards, and each shard searches its own segments. This creates a multiplication effect: total_segments = shards × segments_per_shard.
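A worked instance of that multiplication effect, with illustrative numbers:

```python
# total_segments = shards x segments_per_shard (from the formula above).
def total_segments(shards, segments_per_shard):
    return shards * segments_per_shard

# 20 shards that have each drifted to 50 small segments means a single
# query touches 1,000 segments across the cluster:
print(total_segments(20, 50))   # → 1000
```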
Impact on Query Performance
- More shards = more network overhead (coordinator gathers all results)
- Segment count per shard adds to per-node latency
- Hot spots occur when one shard has many more segments than others
- Force-merge across shards helps maintain uniform performance
Practical Tuning Tips
Segment management requires balancing search latency, indexing throughput, and resource usage. Here are production-tested guidelines for different workloads.
Refresh Interval Tuning
- 1s (default): Real-time search, many small segments
- 30s: Good for logs/metrics, fewer segments
- 60s+: Near-batch workloads
- -1: Disabled; for bulk indexing only
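The refresh tuning above is applied through the index settings API (`PUT /my_index/_settings`). A minimal sketch of the request bodies, with `my_index` as a placeholder name:

```python
import json

# index.refresh_interval is the setting being changed; these dicts are
# the JSON bodies sent with PUT /my_index/_settings.
def refresh_setting(interval):
    return {"index": {"refresh_interval": interval}}

before_bulk = refresh_setting("-1")   # disable refresh for the bulk load
after_bulk = refresh_setting("1s")    # restore near-real-time search

print(json.dumps(before_bulk))  # → {"index": {"refresh_interval": "-1"}}
print(json.dumps(after_bulk))   # → {"index": {"refresh_interval": "1s"}}
```

Remember to restore the interval (and typically force-merge) once the bulk load finishes; leaving refresh disabled means new documents never become searchable.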
Merge Policy Types
- TieredMergePolicy: Default, balances size tiers
- LogByteSizeMergePolicy: Older, less adaptive
- max_merged_segment: Caps merged segment size at 5 GB (default)
- segments_per_tier: 10 (default), lower = more merging
When to Force Merge
✓ Good Use Cases:
- After bulk indexing is complete
- Read-only/archived indices
- Before taking a snapshot

✗ Avoid:
- On actively written indices
- During peak query times
- On indices with ILM rollover
Real-World Use Cases
Understanding segment behavior in production helps you anticipate issues before they become problems. Here are common scenarios and their segment patterns.
🔥 High-Throughput Ingestion (10K docs/sec)
At the default 1s refresh, heavy ingestion produces a flood of tiny segments and near-constant merging.
Fix: Set refresh_interval: "30s" during ingestion, force-merge after.
📊 Logs Index (Time-Series)
Once a time-based index rolls over, it stops receiving writes and can be compacted safely.
Key: Force-merge on rollover to warm tier.
Code & API Snippets
Here are the essential Elasticsearch APIs and settings for monitoring and managing segments.
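A minimal sketch of the segment-related endpoints discussed in this chapter, expressed as `(method, path)` pairs. `my_index` is a placeholder index name; verify exact query parameters against your Elasticsearch version:

```python
# Segment monitoring and management endpoints referenced in this chapter.
SEGMENT_APIS = {
    "list segments":        ("GET",  "/_cat/segments/my_index?v"),
    "segment stats":        ("GET",  "/my_index/_stats/segments"),
    "manual refresh":       ("POST", "/my_index/_refresh"),
    "reclaim tombstones":   ("POST", "/my_index/_forcemerge?only_expunge_deletes=true"),
    "merge to one segment": ("POST", "/my_index/_forcemerge?max_num_segments=1"),
}

for name, (method, path) in SEGMENT_APIS.items():
    print(f"{method:4} {path}   # {name}")
```

`_cat/segments` shows per-segment size, doc counts, and deleted-doc counts, which is the quickest way to spot the tombstone tax in practice.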
Write Amplification: The Hidden Cost
Immutability has a hidden cost: write amplification. Because segments are never modified in place, the same data gets rewritten multiple times as it moves through the system: first to the translog, then to a segment, then through multiple merge phases.
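A back-of-envelope model of that amplification. The merge-pass count here is an assumption for illustration; real numbers depend on merge policy and document lifetime:

```python
# Each document's bytes are written once to the translog, once at the
# initial flush, and once more per merge generation they pass through.
def write_amplification(merge_passes):
    translog = 1
    initial_flush = 1
    return translog + initial_flush + merge_passes

# A doc surviving 3-5 merge generations is rewritten 5-7x in total:
print(write_amplification(3))   # → 5
print(write_amplification(5))   # → 7
```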
Key Takeaways
Segments are Immutable
Once written, never modified. This enables lock-free reads, perfect caching, and robust crash recovery.
Refresh = Visibility
Documents become searchable only after a refresh writes them to a segment (every 1s by default). Frequent refreshes are costly for indexing throughput.
The Merge Tax
Merging reclaims space from deleted docs but consumes I/O. Never force-merge actively written indices.
Write Amplification
Due to immutable segments and merging, a single document write results in 5-7x physical disk I/O.