Systems Atlas

Chapter 4.6

Freshness & Updates

The race against the refresh interval. Why your search engine is always living in the past (by at least 1 second), and why forcing it to be "real-time" might kill your cluster.


Expectation vs. Reality

Engineers coming from a relational database background (Postgres, MySQL) are used to ACID consistency. When you `COMMIT`, the data is there. Immediately. For everyone.

Search engines are different. They prioritize Read Throughput over Write Latency. To achieve millisecond search speeds across billions of documents, they cheat. They buffer writes in memory and only flush to disk periodically. This creates a fundamental disconnect between what users expect and what the system actually does.

User Model:
  • T+0s: Click "Save"
  • T+0.1s: Search → FOUND! (Expectation)

System Model:
  • T+0s: Write to DB
  • T+0.5s: CDC Event → Kafka
  • T+1.0s: In-Memory Buffer (Not Searchable)
  • T+2.0s: Refresh → Segment (Now Searchable!)

The Near-Real-Time (NRT) Architecture

Why is there a delay? Because writing to disk is slow. Lucene cheats by writing to a memory buffer first. This buffer is durable (via Translog) but not searchable until a "Refresh" operation turns it into a Segment.

Indexing Buffer (JVM Heap)  →  REFRESH (every 1s)  →  Lucene Segment

  • Indexing Buffer: holds Doc A and Doc B. Fast to write, but not yet searchable.
  • Lucene Segment: the same Doc A and Doc B, now searchable.

The Refresh Interval Trade-off

By default, `refresh_interval` is "1s". This is the heartbeat of your search engine.

  • Lower (e.g., 100ms): Near-instant results, but creates 10x more segments. This spikes CPU usage and triggers massive "merge storms" as the system tries to combine them.
  • Higher (e.g., 30s): Very efficient. Low CPU, fewer segments. But users won't see their updates for half a minute.

There is no free lunch.
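A quick back-of-envelope calculation makes the trade-off concrete. The numbers below are illustrative only (real merge policies collapse small segments in the background), but the ratio is what matters: cutting the interval from 1s to 100ms means 10x the segment churn.

```python
# Back-of-envelope: how many new segments does each refresh interval
# create per minute, assuming a steady write stream produces a new
# segment on every refresh? (Illustrative; background merges will
# collapse many of these.)

def segments_per_minute(refresh_interval_s: float) -> float:
    """New segments created per minute at the given refresh interval."""
    return 60.0 / refresh_interval_s

for interval in (0.1, 1.0, 30.0):
    print(f"refresh_interval={interval:>5}s -> "
          f"{segments_per_minute(interval):>6.1f} segments/min")
```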

The Lost Update Problem

In a distributed system, relying on "last write wins" is dangerous. If two users update a document at the same time, the slower request might overwrite the newer data.

The Race Condition

  • 1. Admin reads Product A (Stock: 10)
  • 2. Customer buys item (Stock → 9) [Writes to DB]
  • 3. Admin saves stats update (Stock: 10) [Writes to DB]
  • Result: Stock reset to 10. Customer purchase lost.

Solution: Optimistic Concurrency Control (OCC).

Instead of locking the database (slow), we use versioning. Every document has a _seq_no (sequence number) and _primary_term.

When you write, you must pass the version you read. If the version on the server is higher than what you passed, the server rejects your write (409 Conflict), forcing you to re-read and retry.

// Safe Update with OCC
PUT /products/_doc/123?if_seq_no=34&if_primary_term=1
{
  "stock": 9
}
// If current seq_no is 35, returns 409 Conflict
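The client side of this protocol is a read-mutate-write loop that retries on conflict. Here is a minimal in-memory sketch: `VersionedStore` and `ConflictError` are stand-ins for the real client and its 409 response, not actual library APIs.

```python
# Minimal sketch of an optimistic-concurrency retry loop.
# `VersionedStore` stands in for the search engine: every write bumps
# seq_no, and a write carrying a stale seq_no is rejected (the 409 case).

class ConflictError(Exception):
    pass

class VersionedStore:
    def __init__(self, doc):
        self.doc, self.seq_no = dict(doc), 0

    def get(self):
        return dict(self.doc), self.seq_no

    def put(self, doc, if_seq_no):
        if if_seq_no != self.seq_no:          # stale version -> 409 Conflict
            raise ConflictError
        self.doc, self.seq_no = dict(doc), self.seq_no + 1

def update_with_retry(store, mutate, max_retries=3):
    """Read, mutate, write; on conflict, re-read and try again."""
    for _ in range(max_retries):
        doc, seq_no = store.get()
        try:
            store.put(mutate(doc), if_seq_no=seq_no)
            return store.get()[0]
        except ConflictError:
            continue                          # someone wrote first: retry
    raise RuntimeError("gave up after repeated conflicts")

store = VersionedStore({"stock": 10})
store.put({"stock": 9}, if_seq_no=0)          # customer purchase lands first
# The admin's update now re-reads fresh data instead of clobbering it:
result = update_with_retry(store, lambda d: {**d, "views": d.get("views", 0) + 1})
print(result)                                 # {'stock': 9, 'views': 1}
```

Because the retry loop re-reads before every attempt, the customer's `stock: 9` survives the admin's later write, which is exactly the lost update the race above would have caused.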

Architecture Patterns for High Velocity

A common mistake is trying to tune a single index to handle everything. You want the deep textual relevance of a search engine, but the real-time updates of a database. If you force a massive index (e.g., 50GB) to refresh every 1 second just so price updates are live, you will kill your cluster with I/O overhead.

Instead, separate your data by its rate of change (velocity).

1. The Sidecar Index

  • Main Index:
    Title, Description, Images.
    Refresh: 30s
  • Sidecar:
    Price, Stock, Availability.
    Refresh: 1s
Query combines both at runtime (application join).
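The runtime join can be as simple as a dictionary overlay. A sketch, with index contents and field names invented for illustration:

```python
# Sketch of an application-side join between the slow "main" index
# (content, 30s refresh) and the fast "sidecar" index (price/stock,
# 1s refresh). All data and field names are illustrative.

main_hits = [                      # from the content index (may be stale)
    {"id": "123", "title": "Trail Runner"},
    {"id": "456", "title": "Road Racer"},
]
sidecar = {                        # from the sidecar index (fresh)
    "123": {"price": 89.0, "stock": 4},
    "456": {"price": 120.0, "stock": 0},
}

def join(hits, live):
    """Overlay fresh sidecar fields onto the stale content hits."""
    return [{**hit, **live.get(hit["id"], {})} for hit in hits]

results = join(main_hits, sidecar)
print(results[0])   # {'id': '123', 'title': 'Trail Runner', 'price': 89.0, 'stock': 4}
```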

2. Real-Time API Fallback

Search provides the IDs, but the UI fetches the source of truth for display.

1. Search returns: ["id_123", "id_456"]
2. UI calls: GET /api/products?ids=123,456
3. Render with latest Price from Redis/SQL
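One subtlety of this pattern: the batched fetch returns rows in arbitrary order, so the application must re-apply the search engine's ranking. A sketch, where `fetch_by_ids` stands in for your DB or Redis call:

```python
# Sketch of the fallback read path: the search engine supplies ranked
# IDs; one batched call to the source of truth supplies the fields
# users actually see. `fetch_by_ids` is a stand-in for a real DB call.

def hydrate(search_ids, fetch_by_ids):
    """Fetch fresh rows for the IDs, preserving the search ranking."""
    rows = fetch_by_ids(search_ids)           # one batched round trip
    by_id = {row["id"]: row for row in rows}
    # keep search order; silently drop IDs deleted since indexing
    return [by_id[i] for i in search_ids if i in by_id]

# Fake primary store holding fresher prices than the index has:
primary = {
    "123": {"id": "123", "price": 79.0},      # price dropped after indexing
    "456": {"id": "456", "price": 120.0},
}

page = hydrate(["456", "123"],
               lambda ids: [primary[i] for i in ids if i in primary])
print(page)   # fresh prices, still in search-ranked order
```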

War Story: The 30-Minute Earthquake

The Setup: A major news portal cached search results for 5 minutes (TTL) to save costs.

The Incident: A massive earthquake hit. Millions searched "earthquake".

The Failure: The first user searched at T+0s (0 results). This empty result was cached. For the next 5 minutes, 10 million users saw "No results found" while the front-page story was... the earthquake.

LESSON LEARNED:
Never cache "Zero Results" for high-velocity terms.

Or better: Use event-driven invalidation (purge cache on 'publish' event) instead of time-based TTL.
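Both lessons fit in a few lines of cache logic. A minimal sketch (the cache class and its matching rule are illustrative, not a real library):

```python
# Sketch of a search cache that applies both lessons from the war story:
# never cache empty result sets, and purge on 'publish' events rather
# than waiting for a TTL. The matching rule here is deliberately crude.

class SearchCache:
    def __init__(self):
        self._store = {}

    def get(self, query):
        return self._store.get(query)

    def put(self, query, results):
        if not results:                 # lesson 1: never cache "zero results"
            return
        self._store[query] = results

    def on_publish(self, doc_terms):
        # lesson 2: event-driven invalidation; purge any cached query
        # that mentions a term from the newly published document
        stale = [q for q in self._store if any(t in q for t in doc_terms)]
        for q in stale:
            del self._store[q]

cache = SearchCache()
cache.put("earthquake", [])             # empty result set: NOT cached
cache.put("weather", ["doc_1"])
cache.on_publish({"weather"})           # new article published
print(cache.get("earthquake"), cache.get("weather"))   # None None
```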

Case Study: The Flash Sale

The ultimate stress test for search freshness is a "Flash Sale" or "Product Drop". Imagine 10,000 users competing for 100 units of a limited sneaker.

The Limit of NRT

At T+0s, inventory drops to 0 in the database. The search index still thinks stock is 100 for the next 1 second (Refresh Interval).

In that 1 second, 500 more users click "Add to Cart" because Search said "In Stock". All 500 requests hit the database, fail, and show error messages. Result: poor UX, DB overload, and customer rage.

The Architecture Fix

  1. Search (Discovery Only): Use Search ONLY to find the product ID. Do not trust its `stock` field for critical decisions.
  2. UI (Real-time Overlay): On the Product Page, fire a direct GET /api/inventory/:id to the primary DB.
  3. Graceful Degradation: If DB load is too high, assume "In Stock" but validate at Checkout (the ultimate source of truth).
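The three steps above can be sketched end to end. Everything here is a stand-in (the in-memory `db`, the function names, the response shapes); the point is where each decision lives:

```python
# Sketch of the flash-sale flow: search is discovery-only, the product
# page overlays live inventory, and checkout is the final validator.
# `db` and all field names are illustrative stand-ins.

db = {"sneaker_1": {"stock": 0}}              # primary DB: already sold out

def product_page(product_id, db_overloaded=False):
    """Real-time overlay; degrade gracefully if the DB is under load."""
    if db_overloaded:
        # optimistic fallback: claim "In Stock", let checkout decide
        return {"id": product_id, "in_stock": True, "provisional": True}
    live = db[product_id]["stock"]
    return {"id": product_id, "in_stock": live > 0, "provisional": False}

def checkout(product_id, qty):
    """Ultimate source of truth: check and decrement stock here only."""
    item = db[product_id]
    if item["stock"] < qty:
        return {"ok": False, "reason": "out_of_stock"}
    item["stock"] -= qty
    return {"ok": True}

print(product_page("sneaker_1"))              # live DB says: not in stock
print(checkout("sneaker_1", 1))               # {'ok': False, 'reason': 'out_of_stock'}
```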

Advanced: Measuring & Scaling

1. Measuring the True Visibility Lag

You configured a 1s refresh, but under heavy indexing load (e.g., bulk backfill), Elasticsearch might intentionally skip refreshes to save CPU. You cannot trust the configuration. You must measure the reality using a "Canary" loop.

Step 1: Write Canary
POST /index/_doc/canary_123
{ "timestamp": "2024-01-01T12:00:00.000Z" }
Step 2: Poll Until Found
Loop every 100ms...
GET /index/_doc/canary_123
Step 3: Calculate Metric
Lag = TimeFound - TimeWritten

  • If Lag > 5s, trigger a PagerDuty alert.
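The canary loop is a few lines of code. Below is a self-contained sketch against a fake index (`FakeIndex` simulates visibility after a refresh interval; in production you would point the same loop at your real cluster):

```python
# Canary loop sketch: write a timestamped doc, poll until it becomes
# searchable, report the lag. `FakeIndex` is a stand-in that makes a
# doc visible only after its simulated refresh interval elapses.

import time

class FakeIndex:
    """Docs become searchable refresh_interval seconds after the write."""
    def __init__(self, refresh_interval=0.2):
        self.refresh_interval, self._docs = refresh_interval, {}

    def index(self, doc_id):
        self._docs[doc_id] = time.monotonic()

    def get(self, doc_id):
        written = self._docs.get(doc_id)
        if written is None or time.monotonic() - written < self.refresh_interval:
            return None                       # not yet visible to search
        return {"_id": doc_id}

def measure_lag(index, doc_id="canary_123", poll_every=0.05, timeout=5.0):
    """Return seconds from write until the doc is first searchable."""
    start = time.monotonic()
    index.index(doc_id)
    while time.monotonic() - start < timeout:
        if index.get(doc_id) is not None:
            return time.monotonic() - start   # visibility lag in seconds
        time.sleep(poll_every)
    raise TimeoutError("canary never became searchable: page someone")

lag = measure_lag(FakeIndex(refresh_interval=0.2))
print(f"visibility lag: {lag:.2f}s")
```

Run periodically, the measured lag (not the configured interval) is what you graph and alert on.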

2. The Lambda Architecture (Hybrid)

For massive social feeds (Twitter, LinkedIn), even a 1-second lag is unacceptable: users expect to see their own post instantly. To make fresh writes visible immediately without killing the search cluster, we use a hybrid read path.

  • Speed Layer (Redis / Memcached): Holds only the last 60 seconds of data. Fast, ephemeral, but expensive RAM.
  • Batch Layer (Search Index): Holds everything older than 60 seconds. Efficient, scalable, cheaper disk.

Application Logic:
return merge(speed_layer, batch_layer).dedupe()
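The merge must deduplicate, because a recent post can appear in both layers while the indexing pipeline catches up; the speed layer's copy should win. A minimal sketch with invented data:

```python
# Sketch of the hybrid read path: merge the speed layer (last 60s,
# authoritative for recent writes) with the batch layer, deduplicating
# by id with the speed layer winning. All data is illustrative.

speed_layer = [                    # Redis: freshest copies
    {"id": "p3", "text": "just posted!", "ts": 100},
]
batch_layer = [                    # search index: older, may lag
    {"id": "p1", "text": "old post", "ts": 10},
    {"id": "p3", "text": "stale copy", "ts": 99},
]

def merged_feed(speed, batch):
    seen, out = set(), []
    for doc in speed + batch:      # speed layer first, so it wins dedupe
        if doc["id"] not in seen:
            seen.add(doc["id"])
            out.append(doc)
    return sorted(out, key=lambda d: d["ts"], reverse=True)

feed = merged_feed(speed_layer, batch_layer)
print([d["id"] for d in feed])     # ['p3', 'p1'], with p3 the fresh copy
```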

Key Takeaways

  1. Freshness is Expensive: Real-time search means high CPU. Every refresh creates a new segment file.
  2. The 1-Second Gap: Data moves from Buffer → Translog → Segment. It is durable before it is searchable.
  3. Concurrency Matters: Distributed writes race. Use Optimistic Concurrency Control (versioning) to prevent data loss.
  4. Architecture Patterns: Split fast-moving data (stock/price) into separate 'Sidecar' indices from slow content.