Chapter 4.7: Data Foundation
Deletes, Partial Updates & Reindexing
In Search, "Mutability" is a leaky abstraction. Why deleting data might actually increase your disk usage, and why updates burn CPU.
The Lie: "Delete"
When you send a DELETE request, the search engine lies to you. It says "200 OK", but nothing was removed. Because search segments are highly compressed and optimized for read speed, they are Immutable (Write-Once). Modifying a file on disk to remove data is impossible without corrupting the index.
Instead, Lucene maintains a parallel structure called the Bitset (Live Docs). A delete simply flips a bit from 1 to 0. The document is still on disk, still in memory, and still processed by your search queries; it is merely "tombstoned" and filtered out at the very last step.
Lifecycle of a Deleted Doc
The Performance Tax
- Latency: Your query matches 1,000,000 documents. The engine calculates scores for ALL of them. Only at the very end does it check the Bitset to hide the 500,000 deleted ones. You pay CPU for ghosts.
- Heap: The Bitset must be loaded into JVM Heap for fast access. Heavy deletes = heavy Heap pressure.
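The ordering above is the whole problem, and it can be sketched in a few lines of Python. This is a toy model, not Lucene's actual code: scores are computed for every match first, and the live-docs bitset is only consulted afterwards.

```python
# Toy model of live-docs filtering (illustration only).
# Real Lucene keeps one bitset per segment; this just shows the ordering.

def matches(doc, query):
    return query in doc

def score(doc, query):
    return doc.count(query)  # stand-in for a real relevance score

def search(docs, live_bits, query):
    # Step 1: score EVERY matching document -- including deleted ones.
    scored = [(doc_id, score(doc, query))
              for doc_id, doc in docs.items() if matches(doc, query)]
    # Step 2: only now drop tombstoned docs via the bitset.
    return [(d, s) for d, s in scored if live_bits[d]]

docs = {0: "lucene search", 1: "lucene delete", 2: "lucene merge"}
live_bits = [True, False, True]   # doc 1 was "deleted": its bit flipped to 0

results = search(docs, live_bits, "lucene")
print(results)  # doc 1 was scored like everything else, then hidden last
```

Note that doc 1 still costs a full scoring pass before the bitset hides it: that is the CPU you pay for ghosts.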
The Solution: Force Merge?
You can manually trigger `_forcemerge` to clean up, but be careful: it works like a "Garbage Collection" for disk and is extremely I/O intensive.
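A minimal sketch of building that request (the index name `my-index` and host are placeholders; `only_expunge_deletes` asks Lucene to rewrite only the segments that contain tombstones instead of merging the whole index):

```python
# Sketch: build a force-merge request to reclaim space from deleted docs.
# Run this against a test cluster first -- a force merge can saturate
# disk I/O on a live node.
from urllib.parse import urlencode

def forcemerge_url(host, index, only_expunge_deletes=True):
    # only_expunge_deletes=true limits the rewrite to segments with deletes.
    params = urlencode(
        {"only_expunge_deletes": str(only_expunge_deletes).lower()})
    return f"{host}/{index}/_forcemerge?{params}"

url = forcemerge_url("http://localhost:9200", "my-index")
print(url)
# POST this URL with any HTTP client, e.g.:
#   import urllib.request
#   urllib.request.urlopen(urllib.request.Request(url, method="POST"))
```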
Partial Updates
"I just want to update the view count. Why is my CPU hitting 100%?"
Because specific fields cannot be modified in place. Lucene stores documents in Compressed Blocks (LZ4/Deflate). You cannot just "seek and overwrite" a few bytes. To change even a single counter, the engine must decompress the whole block, reconstruct the JSON, apply the change, and re-index the result as a new document. This turns a tiny update into a heavy Read-Modify-Write cycle.
Retrieve `_source` JSON from disk
Parse JSON + Apply Diff in Memory
Soft-delete old doc ID
Write NEW doc to buffer
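The four steps above can be modelled in plain Python. This is a toy, not the engine's code, but it captures the key point: the "update" never touches the old document's bytes.

```python
import json

# Toy model of the read-modify-write cycle behind a partial update.
# The segment is append-only; live_bits is the tombstone bitset.
segment = []      # append-only store of (compressed-ish) _source blobs
live_bits = []    # one bit per stored doc

def index_doc(doc):
    segment.append(json.dumps(doc))   # stand-in for the compressed block
    live_bits.append(True)
    return len(segment) - 1           # internal doc id

def partial_update(doc_id, diff):
    source = json.loads(segment[doc_id])  # 1. retrieve _source from "disk"
    source.update(diff)                   # 2. apply the diff in memory
    live_bits[doc_id] = False             # 3. soft-delete the old doc id
    return index_doc(source)              # 4. write a brand-NEW doc

old_id = index_doc({"title": "Post", "views": 1})
new_id = partial_update(old_id, {"views": 2})

print(new_id, live_bits)  # the old doc is still stored, just tombstoned
```

Bumping one counter produced a whole new document and left the old one behind as a tombstone, exactly the Read-Modify-Write tax described above.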
If you disable `_source` to save disk space, you CANNOT use the Update API. You must provide the full document from your application side every time.
Reindexing at Scale
Changing a data type (e.g., `string` → `date`) requires a full reindex because the Inverted Index is built once. You cannot "ALTER TABLE" on an inverted index. You must rebuild it from scratch. You have two choices: The way that causes downtime, or the way that doesn't.
The Downtime Way (delete and rebuild in place):
- 🚨 DOWNTIME STARTS (search returns 404)
- Index exists but is empty.
- Script running for 4 hours... users see 0 results.
The Zero-Downtime Way (reindex into a new index, then swap):
- ✅ Live traffic still matches v1 (old data). Zero impact on users.
- ✅ After the swap, users instantly see new data. No 404s.
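The zero-downtime path hinges on an atomic alias swap: clients always query an alias (say `products`), never a concrete index, and one `_aliases` call moves the alias from the old index to the new one. Names here are placeholders; because both actions execute atomically, no request ever sees a missing index.

```python
import json

# The atomic alias swap sent via POST /_aliases once reindexing into
# products_v2 has finished.  Index and alias names are placeholders.
swap = {
    "actions": [
        {"remove": {"index": "products_v1", "alias": "products"}},
        {"add":    {"index": "products_v2", "alias": "products"}},
    ]
}
print(json.dumps(swap, indent=2))
```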
Production Tuning Guide
- `wait_for_completion=false`: Never hold a connection open for long jobs. Fire asynchronously and poll the Task API (`GET /_tasks/<task_id>`) to check progress.
- `slices`: Parallelizes the reindex by splitting the work into sub-slices (usually equal to the shard count). Speeds up large jobs significantly.
- `requests_per_second`: Essential. Throttles the write rate so the reindex job doesn't consume all I/O and CPU, starving live search traffic.
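Putting the three knobs together, a `_reindex` call might be assembled like this (index names and the throttle value are illustrative placeholders; tune against your own cluster):

```python
import json

# A _reindex request combining the three tuning knobs above.
endpoint = ("POST /_reindex"
            "?wait_for_completion=false"   # return a task id immediately
            "&slices=auto"                 # one sub-slice per shard
            "&requests_per_second=500")    # throttle to protect live search
body = {
    "source": {"index": "products_v1"},
    "dest":   {"index": "products_v2"},
}
print(endpoint)
print(json.dumps(body, indent=2))
# Then poll progress with: GET /_tasks/<task_id>
```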
The Hidden Trap: Nested Objects
Why "Nested" is mostly a trap
Developers love `type: nested` because it preserves object relationships (e.g. `comments.author` linked to `comments.text`). But Lucene doesn't actually support "nested" objects. It pulls a sleight of hand.
```json
{
  "id": 1,
  "title": "Blog Post",
  "comments": [
    { "user": "Alice", "text": "Nice!" },
    { "user": "Bob", "text": "Cool!" }
  ]
}
```
- Hidden "Shadow Documents" are created for every single list item.
The Amplification Factor
To update 1 comment in a post with 50,000 comments:
Cost = Reindex Parent + Reindex ALL 50,000 Children
Result: Massive CPU burn and eventual cluster instability.
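A quick back-of-the-envelope for that amplification (assuming, as described above, that the entire parent-plus-children block is rewritten whenever any one comment changes):

```python
# Nested docs are stored as one contiguous block: the parent plus one
# hidden Lucene doc per array element.  Editing ANY comment rewrites
# the whole block, not just the changed child.
def docs_rewritten(num_comments):
    parent = 1
    return parent + num_comments

print(docs_rewritten(50_000))  # one tiny edit -> 50,001 docs re-indexed
```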
Key Takeaways
Deletes are Forever (Almost)
Deleted docs persist as 'tombstones' until a Segment Merge event. They consume heap (bitsets) and slow down search.
Partial Updates = Full Rewrites
Updating 1 byte requires retrieving the full JSON, parsing it, modifying it, and indexing a whole new document.
The _source tax
You cannot do partial updates if you disable the `_source` field to save disk. You must have the original JSON.
Reindex with Care
Use `slices` for parallel speed, but throttle `requests_per_second` to avoid taking down your primary cluster.