Systems Atlas
Chapter 6.5: Vector & Semantic Search

HNSW Deep Dive

Hierarchical Navigable Small World is the dominant graph-based ANN algorithm. It achieves O(log N) search by building a multi-layer graph where upper layers are express highways and lower layers are local streets. Used in Lucene/ES, Qdrant, Weaviate, and pgvector.

Search Latency

<1ms

Sub-millisecond latency for millions of vectors with 98-99% recall@10.

Layer Distribution

~63%

Of nodes only exist at Layer 0. ~1% reach Layer 4+. Geometric distribution.

Build Time (100M)

~1.5 hrs

100M vectors, M=16, ef_construction=200 on 48-core Xeon.

Prerequisite: This chapter builds on Chapter 6.4 (Vector Indexing Basics). Understand why brute force fails and how IVF/PQ work before diving into graph-based approaches.

1. From NSW to HNSW: The Skip List Insight

Before HNSW, Navigable Small World (NSW) graphs built a flat graph where each node connects to its approximate nearest neighbors. Search works by greedy traversal: start from any node, always move to the neighbor closest to the query. The "small world" property ensures short paths exist between any two nodes — but search complexity is O(N^(1/2)), not logarithmic.

HNSW's breakthrough was layering the graph like a probability skip list: Layer 0 contains all nodes with dense local connections. Higher layers contain exponentially fewer nodes with longer-range connections. Search starts at the top for coarse navigation and descends to the bottom for precision — achieving O(log N) search complexity.
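The greedy traversal at the heart of both NSW and HNSW can be sketched on a toy adjacency-list graph (names and the Euclidean metric here are illustrative; a real implementation tracks layers and uses the index's configured distance function):

```python
import math

def dist(a, b):
    # Euclidean distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(query, start, neighbors, vectors):
    """Move to whichever neighbor is closest to the query;
    stop when no neighbor improves on the current node."""
    current = start
    current_dist = dist(query, vectors[current])
    improved = True
    while improved:
        improved = False
        for n in neighbors[current]:
            d = dist(query, vectors[n])
            if d < current_dist:
                current, current_dist = n, d
                improved = True
    return current

# Toy graph: four nodes on a line, chained left to right
vectors = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(greedy_search((2.9, 0), "A", neighbors, vectors))  # walks A → B → C → D
```

Because the loop only ever moves to a strictly closer node, it terminates at a local minimum — which is why a flat NSW graph can get stuck, and why HNSW restarts the descent from a good entry point on each layer.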

# HNSW multi-layer structure (skip list analogy)
Layer 3:  A ─────────────────── D                 (express highway)
Layer 2:  A ─────── C ─── D ─── F                 (regional roads)
Layer 1:  A ─ B ─ C ─ D ─ E ─ F ─ G               (local streets)
Layer 0:  A ─ B ─ C ─ D ─ E ─ F ─ G ─ H ─ I ─ J   (all nodes)

2. Construction Algorithm

Each new vector is assigned a maximum layer using a geometric distribution: floor(-ln(random()) × m_L) where m_L = 1/ln(M). Most nodes stay at Layer 0, and each higher layer holds exponentially fewer: for m_L = 1 the split is ~63% at Layer 0, ~23% at Layer 1, ~9% at Layer 2, and ~1% at Layer 4+; the default m_L = 1/ln(M) thins the upper layers even faster, each layer keeping roughly 1/M of the nodes below it.
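The layer-assignment formula is easy to sanity-check with a quick simulation (a sketch; note the ~63%/23%/9% split corresponds to m_L = 1, while the default m_L = 1/ln(M) thins the upper layers even faster):

```python
import math
import random

def assign_layer(m_L):
    # floor(-ln(u) * m_L), u ~ Uniform(0, 1): geometric layer assignment
    return math.floor(-math.log(random.random()) * m_L)

random.seed(0)
n = 200_000

# m_L = 1: P(layer = 0) = 1 - 1/e ~ 0.63, P(layer = 1) ~ 0.23
samples = [assign_layer(1.0) for _ in range(n)]
frac0_mL1 = samples.count(0) / n
frac1_mL1 = samples.count(1) / n

# Default m_L = 1/ln(M): each layer keeps ~1/M of the nodes below it,
# so for M = 16 about 15/16 ~ 94% of nodes never leave Layer 0
samples = [assign_layer(1 / math.log(16)) for _ in range(n)]
frac0_default = samples.count(0) / n

print(frac0_mL1, frac1_mL1, frac0_default)
```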

hnsw_insert.py
import math
import random

def insert(new_node, graph, M, ef_construction):
    m_L = 1 / math.log(M)  # layer normalization factor
    # 1. Assign random layer (geometric distribution)
    new_layer = math.floor(-math.log(random.random()) * m_L)
    # 2. Greedy descent through the layers above new_layer (ef=1)
    current = graph.entry_point
    for layer in range(graph.max_layer, new_layer, -1):
        current = greedy_search(new_node, current, layer, ef=1)
    # 3. On each layer from new_layer down to 0:
    for layer in range(min(new_layer, graph.max_layer), -1, -1):
        # Beam search with ef_construction width
        candidates = beam_search(new_node, current, layer, ef=ef_construction)
        # Select M diverse neighbors (heuristic)
        neighbors = select_neighbors_heuristic(new_node, candidates, M)
        # Add bidirectional edges + prune if a node exceeds its cap
        for neighbor in neighbors:
            graph.add_edge(new_node, neighbor, layer)
            graph.add_edge(neighbor, new_node, layer)

Neighbor Selection Heuristic

Simple nearest-neighbor selection creates clusters — nodes all connect to the same popular neighbors. HNSW uses a diversity heuristic: a candidate is added only if it's closer to the query than to any existing selected neighbor. This ensures neighbors aren't all clustered in the same direction, promoting diverse paths through the graph.
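A minimal sketch of this diversity filter, operating on raw vectors for illustration (a real implementation works on node ids with cached distances):

```python
import math

def dist(a, b):
    # Euclidean distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_neighbors_heuristic(query, candidates, M):
    """Keep a candidate only if it is closer to the query than to every
    already-selected neighbor -- this spreads edges across directions
    instead of bunching them around one popular hub."""
    selected = []
    for c in sorted(candidates, key=lambda c: dist(query, c)):
        if len(selected) >= M:
            break
        if all(dist(query, c) < dist(c, s) for s in selected):
            selected.append(c)
    return selected

# (1.1, 0.1) is rejected: it is closer to the already-selected (1, 0)
# than to the query, so the second slot goes to the orthogonal (0, 1)
print(select_neighbors_heuristic((0, 0), [(1, 0), (1.1, 0.1), (0, 1)], 2))
```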


3. Search Algorithm

Search starts at the entry point and does a greedy search from the top layer down to Layer 1 (ef=1 — just find the single nearest neighbor). At Layer 0, switch to beam search with ef_search width to explore more candidates. Return the top K from the beam search results.
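The Layer 0 beam can be sketched as a best-first search with a bounded result heap (a sketch with illustrative names; production implementations add prefetching, SIMD distance kernels, and visited-set pooling):

```python
import heapq
import math

def dist(a, b):
    # Euclidean distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def beam_search(query, entry, neighbors, vectors, ef):
    """Best-first search that keeps the ef closest nodes seen so far."""
    visited = {entry}
    d0 = dist(query, vectors[entry])
    frontier = [(d0, entry)]   # min-heap: closest unexpanded node first
    best = [(-d0, entry)]      # max-heap (negated): worst of the ef best on top
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > -best[0][0] and len(best) >= ef:
            break  # closest frontier node is worse than the worst result
        for n in neighbors[node]:
            if n in visited:
                continue
            visited.add(n)
            dn = dist(query, vectors[n])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, n))
                heapq.heappush(best, (-dn, n))
                if len(best) > ef:
                    heapq.heappop(best)  # evict current worst result
    return sorted((-d, n) for d, n in best)

vectors = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(beam_search((2.1, 0), "A", neighbors, vectors, ef=2))
```

With ef=1 this degenerates to the greedy descent used on the upper layers; widening ef trades latency for recall, which is why ef_search is the primary query-time knob.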

# Search trace example: finding neighbors of vector Q
Layer 3: Entry point A (dist=0.8)
→ A's neighbor D (dist=0.4) ← closer! Move to D
→ No improvement... descend with D
Layer 2: Start at D (dist=0.4)
→ D's neighbor C (dist=0.35) ← closer!
Descend with C
Layer 1: Start at C (dist=0.35)
→ Beam search with ef=64
→ Found: C(0.35), E(0.12), G(0.08), H(0.15), ...
Layer 0: Start at G (dist=0.08)
→ Beam search with ef=200
Results: [I(0.05), J(0.06), G(0.08), K(0.09), ...]
Operation                  | Complexity | Notes
Construction (per element) | O(log N)   | Search + edge creation per layer
Construction (total)       | O(N log N) | Insert N elements
Search                     | O(log N)   | Layers = log N, each O(1) avg hops
Memory (vectors)           | O(N × d)   | Store all vectors
Memory (graph)             | O(N × M)   | M edges per node, ~4 bytes each

4. Key Parameters & Tuning

Parameter       | Range                | Effect                                    | Memory
M               | 8-64 (default 16)    | Max connections per node per layer        | +8 bytes/node per M increase
ef_construction | 64-512 (default 200) | Build-time beam width                     | No memory impact
ef_search       | 32-512               | Query-time beam width (primary knob)      | No memory impact
M_max           | 2×M                  | Max connections at Layer 0 (denser base)  | Affects Layer 0 density
High Recall (>99%)

M=32-48, ef_construction=400, ef_search=200-500

Balanced (95-98%)

M=16, ef_construction=200, ef_search=64-128

Speed-Optimized (~90%)

M=8-12, ef_construction=100, ef_search=32-64

Memory Formula

# Memory calculation
Total = N × (d × sizeof(float) + M × 2 × sizeof(int))
Example: 10M vectors, 768-dim, M=16
Vectors: 10M × 768 × 4 bytes = 30.7 GB
Graph: 10M × 16 × 2 × 4 bytes = 1.28 GB
Total: ~32 GB
Example: 1B vectors, 768-dim, M=16
Vectors: 1B × 768 × 4 bytes = 3,072 GB (3 TB!)
Graph: 1B × 16 × 2 × 4 bytes = 128 GB
Total: ~3.2 TB ← Needs disk-backed approach
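The arithmetic above can be wrapped in a small helper (a sketch assuming float32 vectors and 4-byte neighbor ids, as in the examples; real indexes add per-node bookkeeping on top):

```python
def hnsw_memory_gb(n, dim, M, float_bytes=4, id_bytes=4):
    """Approximate RAM for an HNSW index: raw vectors plus roughly
    2*M neighbor ids per node (Layer 0 alone uses M_max = 2*M)."""
    vectors_gb = n * dim * float_bytes / 1e9
    graph_gb = n * M * 2 * id_bytes / 1e9
    return vectors_gb, graph_gb

vec_gb, graph_gb = hnsw_memory_gb(10_000_000, 768, 16)
print(f"vectors: {vec_gb:.1f} GB, graph: {graph_gb:.2f} GB")
# → vectors: 30.7 GB, graph: 1.28 GB
```

Rerunning with n = 1_000_000_000 reproduces the ~3.2 TB figure that motivates disk-backed designs like DiskANN.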

5. HNSW vs DiskANN

DiskANN (Microsoft, 2019) stores the Vamana graph on SSD instead of RAM, keeping only PQ codes in memory. This trades latency (5-20ms vs <1ms) for 15-50x lower memory cost at billion scale.

Aspect        | HNSW                   | DiskANN
Storage       | RAM (all layers)       | SSD (graph + vectors), RAM (PQ codes only)
Memory/vector | ~4 KB (768d + graph)   | ~200 bytes (PQ codes + metadata)
Query latency | <1ms                   | 5-20ms (SSD I/O bound)
Recall@10     | 98-99%                 | 95-98%
Updates       | Online (insert/delete) | Batch (rebuild for changes)

Vectors | HNSW (RAM)  | DiskANN (RAM + SSD)     | Cost Ratio
10M     | ~40 GB RAM  | ~2 GB RAM + 40 GB SSD   | 8x
100M    | ~400 GB RAM | ~20 GB RAM + 400 GB SSD | 10x
1B      | ~3.2 TB RAM | ~64 GB RAM + 3.2 TB SSD | 15-50x

6. HNSW in Lucene / Elasticsearch

Lucene 9+ uses HNSW for kNN search. Unlike standalone HNSW (single graph), Lucene maintains per-segment HNSW graphs. When segments merge, the graph is rebuilt from scratch — this is the biggest operational challenge with vector search in Elasticsearch.

# Lucene segment architecture with HNSW
Segment 1: HNSW graph (500K vectors)
Segment 2: HNSW graph (300K vectors)
Segment 3: HNSW graph (200K vectors)
Query → search all 3 graphs independently → merge results → return top-K

After segment merge:
Merged Segment: HNSW graph (1M vectors) ── REBUILT from scratch
⚠️ The Merge Problem
  • Graph rebuild: Hierarchical structure from individual segments can't be combined — rebuilt from scratch = O(N log N)
  • Memory spikes: 9M vectors with M=16 can allocate over 2.5 GB of heap during merge
  • Blocking: Long merges block new segment creation → indexing latency spikes
ES Setting      | Default | Production Recommendation
m               | 16      | 16-32 (higher for better recall)
ef_construction | 100     | 200-400 (worth the build cost)
num_candidates  | varies  | 100-200 for top-10 queries
similarity      | cosine  | cosine for text, l2 for images
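These settings live on the `dense_vector` field mapping and the kNN query body. A sketch of both as Python dicts (the layout follows the ES 8.x `dense_vector` mapping and top-level `knn` search; treat it as a template to verify against your ES version — the field name `embedding` and parameter values are illustrative):

```python
# Index mapping with tuned HNSW parameters
mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine",  # cosine for text, l2 for images
                "index_options": {
                    "type": "hnsw",
                    "m": 32,                 # default 16
                    "ef_construction": 200,  # default 100
                },
            }
        }
    }
}

# kNN search body: num_candidates is the ef_search-style knob
query = {
    "knn": {
        "field": "embedding",
        "query_vector": [0.1] * 768,  # placeholder query embedding
        "k": 10,
        "num_candidates": 100,
    }
}
```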

Key Takeaways

01

Skip List Analogy

HNSW layers work like a skip list: Layer 3 has express highways (few nodes), Layer 0 has local streets (all nodes). Search starts at the top for coarse navigation and zooms in at the bottom for precision.

02

O(log N) Search Complexity

Each layer thins the graph by roughly a factor of M, so search crosses about log_M(N) layers. Greedy search descends from the top, then beam search runs at Layer 0 with ef_search width. 1B vectors with M=16 needs only ~7-8 layers, each crossed in O(1) average hops.

03

Three Key Parameters

M (max connections, default 16): affects recall and memory. ef_construction (build beam width, default 200): affects graph quality. ef_search (query beam width): the primary accuracy/speed knob at query time.

04

Memory Formula: 32 GB for 10M Vectors

Total = N × (d × 4 + M × 2 × 4) bytes. 10M × 768-dim × M=16 = 30.7 GB vectors + 1.28 GB graph = ~32 GB. At 1B: ~3.2 TB. This is why DiskANN exists.

05

Lucene Segment Merges Are the Operational Challenge

Lucene maintains per-segment HNSW graphs. When segments merge, the graph is rebuilt from scratch — O(N log N). Memory spikes during merge can hit 2.5 GB. Use larger initial segments and TieredMergePolicy.