Systems Atlas
Chapter 6.5: Vector & Semantic Search

HNSW Deep Dive

Hierarchical Navigable Small World is the dominant graph-based ANN algorithm. It achieves O(log N) search by building a multi-layer graph where upper layers are express highways and lower layers are local streets. Used in Lucene/ES, Qdrant, Weaviate, and pgvector.

Search Latency

<1ms

Sub-millisecond latency for millions of vectors with 98-99% recall@10.

Layer Distribution

~63%

Of nodes only exist at Layer 0. ~1% reach Layer 4+. Geometric distribution.

Build Time (100M)

~1.5 hrs

100M vectors, M=16, ef_construction=200 on 48-core Xeon.

Prerequisite: This chapter builds on Chapter 6.4 (Vector Indexing Basics). Understand why brute force fails and how IVF/PQ work before diving into graph-based approaches.

1. From NSW to HNSW: The Skip List Insight

Before HNSW, Navigable Small World (NSW) graphs built a flat graph where each node connects to its approximate nearest neighbors. Search works by greedy traversal: start from any node, always move to the neighbor closest to the query. The "small world" property ensures short paths exist between any two nodes — but search complexity is O(N^(1/2)), not logarithmic.

HNSW's breakthrough was layering the graph like a probability skip list: Layer 0 contains all nodes with dense local connections. Higher layers contain exponentially fewer nodes with longer-range connections. Search starts at the top for coarse navigation and descends to the bottom for precision — achieving O(log N) search complexity.
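The greedy traversal at the heart of both NSW and HNSW can be sketched on a toy adjacency-list graph (names and the Euclidean metric here are illustrative; a real implementation tracks layers and uses the index's configured distance function):

```python
import math

def dist(a, b):
    # Euclidean distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(query, start, neighbors, vectors):
    """Move to whichever neighbor is closest to the query;
    stop when no neighbor improves on the current node."""
    current = start
    current_dist = dist(query, vectors[current])
    improved = True
    while improved:
        improved = False
        for n in neighbors[current]:
            d = dist(query, vectors[n])
            if d < current_dist:
                current, current_dist = n, d
                improved = True
    return current

# Toy graph: four nodes on a line, chained left to right
vectors = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(greedy_search((2.9, 0), "A", neighbors, vectors))  # walks A → B → C → D
```

Because the loop only ever moves to a strictly closer node, it terminates at a local minimum — which is why a flat NSW graph can get stuck, and why HNSW restarts the descent from a good entry point on each layer.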

# HNSW multi-layer structure (skip list analogy)
Layer 3:  A ─────────────────── D                 (express highway)
Layer 2:  A ─────── C ─── D ─── F                 (regional roads)
Layer 1:  A ─ B ─ C ─ D ─ E ─ F ─ G               (local streets)
Layer 0:  A ─ B ─ C ─ D ─ E ─ F ─ G ─ H ─ I ─ J   (all nodes)

2. Construction Algorithm

Each new vector is assigned a maximum layer using a geometric distribution: floor(-ln(random()) × m_L) where m_L = 1/ln(M). Most nodes stay at Layer 0, and each higher layer holds exponentially fewer: for m_L = 1 the split is ~63% at Layer 0, ~23% at Layer 1, ~9% at Layer 2, and ~1% at Layer 4+; the default m_L = 1/ln(M) thins the upper layers even faster, each layer keeping roughly 1/M of the nodes below it.
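The layer-assignment formula is easy to sanity-check with a quick simulation (a sketch; note the ~63%/23%/9% split corresponds to m_L = 1, while the default m_L = 1/ln(M) thins the upper layers even faster):

```python
import math
import random

def assign_layer(m_L):
    # floor(-ln(u) * m_L), u ~ Uniform(0, 1): geometric layer assignment
    return math.floor(-math.log(random.random()) * m_L)

random.seed(0)
n = 200_000

# m_L = 1: P(layer = 0) = 1 - 1/e ~ 0.63, P(layer = 1) ~ 0.23
samples = [assign_layer(1.0) for _ in range(n)]
frac0_mL1 = samples.count(0) / n
frac1_mL1 = samples.count(1) / n

# Default m_L = 1/ln(M): each layer keeps ~1/M of the nodes below it,
# so for M = 16 about 15/16 ~ 94% of nodes never leave Layer 0
samples = [assign_layer(1 / math.log(16)) for _ in range(n)]
frac0_default = samples.count(0) / n

print(frac0_mL1, frac1_mL1, frac0_default)
```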

hnsw_insert.py
import math
import random

def insert(new_node, graph, M, ef_construction):
    m_L = 1 / math.log(M)  # layer normalization factor
    # 1. Assign random layer (geometric distribution)
    new_layer = math.floor(-math.log(random.random()) * m_L)
    # 2. Greedy descent through the layers above new_layer (ef=1)
    current = graph.entry_point
    for layer in range(graph.max_layer, new_layer, -1):
        current = greedy_search(new_node, current, layer, ef=1)
    # 3. On each layer from new_layer down to 0:
    for layer in range(min(new_layer, graph.max_layer), -1, -1):
        # Beam search with ef_construction width
        candidates = beam_search(new_node, current, layer, ef=ef_construction)
        # Select M diverse neighbors (heuristic)
        neighbors = select_neighbors_heuristic(new_node, candidates, M)
        # Add bidirectional edges + prune if a node exceeds its cap
        for neighbor in neighbors:
            graph.add_edge(new_node, neighbor, layer)
            graph.add_edge(neighbor, new_node, layer)

Neighbor Selection Heuristic

Simple nearest-neighbor selection creates clusters — nodes all connect to the same popular neighbors. HNSW uses a diversity heuristic: a candidate is added only if it's closer to the query than to any existing selected neighbor. This ensures neighbors aren't all clustered in the same direction, promoting diverse paths through the graph.
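A minimal sketch of this diversity filter, operating on raw vectors for illustration (a real implementation works on node ids with cached distances):

```python
import math

def dist(a, b):
    # Euclidean distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_neighbors_heuristic(query, candidates, M):
    """Keep a candidate only if it is closer to the query than to every
    already-selected neighbor -- this spreads edges across directions
    instead of bunching them around one popular hub."""
    selected = []
    for c in sorted(candidates, key=lambda c: dist(query, c)):
        if len(selected) >= M:
            break
        if all(dist(query, c) < dist(c, s) for s in selected):
            selected.append(c)
    return selected

# (1.1, 0.1) is rejected: it is closer to the already-selected (1, 0)
# than to the query, so the second slot goes to the orthogonal (0, 1)
print(select_neighbors_heuristic((0, 0), [(1, 0), (1.1, 0.1), (0, 1)], 2))
```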


3. Search Algorithm

Search starts at the entry point and does a greedy search from the top layer down to Layer 1 (ef=1 — just find the single nearest neighbor). At Layer 0, switch to beam search with ef_search width to explore more candidates. Return the top K from the beam search results.
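The Layer 0 beam can be sketched as a best-first search with a bounded result heap (a sketch with illustrative names; production implementations add prefetching, SIMD distance kernels, and visited-set pooling):

```python
import heapq
import math

def dist(a, b):
    # Euclidean distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def beam_search(query, entry, neighbors, vectors, ef):
    """Best-first search that keeps the ef closest nodes seen so far."""
    visited = {entry}
    d0 = dist(query, vectors[entry])
    frontier = [(d0, entry)]   # min-heap: closest unexpanded node first
    best = [(-d0, entry)]      # max-heap (negated): worst of the ef best on top
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > -best[0][0] and len(best) >= ef:
            break  # closest frontier node is worse than the worst result
        for n in neighbors[node]:
            if n in visited:
                continue
            visited.add(n)
            dn = dist(query, vectors[n])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, n))
                heapq.heappush(best, (-dn, n))
                if len(best) > ef:
                    heapq.heappop(best)  # evict current worst result
    return sorted((-d, n) for d, n in best)

vectors = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(beam_search((2.1, 0), "A", neighbors, vectors, ef=2))
```

With ef=1 this degenerates to the greedy descent used on the upper layers; widening ef trades latency for recall, which is why ef_search is the primary query-time knob.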

# Search trace example: finding neighbors of vector Q
Layer 3: Entry point A (dist=0.8)
→ A's neighbor D (dist=0.4) ← closer! Move to D
→ No improvement... descend with D
Layer 2: Start at D (dist=0.4)
→ D's neighbor C (dist=0.35) ← closer!
Descend with C
Layer 1: Start at C (dist=0.35)
→ Beam search with ef=64
→ Found: C(0.35), E(0.12), G(0.08), H(0.15), ...
Layer 0: Start at G (dist=0.08)
→ Beam search with ef=200
Results: [I(0.05), J(0.06), G(0.08), K(0.09), ...]
Operation                  | Complexity | Notes
Construction (per element) | O(log N)   | Search + edge creation per layer
Construction (total)       | O(N log N) | Insert N elements
Search                     | O(log N)   | Layers = log N, each O(1) avg hops
Memory (vectors)           | O(N × d)   | Store all vectors
Memory (graph)             | O(N × M)   | M edges per node, ~4 bytes each

4. Key Parameters & Tuning

Parameter       | Range                | Effect                                    | Memory
M               | 8-64 (default 16)    | Max connections per node per layer        | +8 bytes/node per M increase
ef_construction | 64-512 (default 200) | Build-time beam width                     | No memory impact
ef_search       | 32-512               | Query-time beam width (primary knob)      | No memory impact
M_max           | 2×M                  | Max connections at Layer 0 (denser base)  | Affects Layer 0 density
High Recall (>99%)

M=32-48, ef_construction=400, ef_search=200-500

Balanced (95-98%)

M=16, ef_construction=200, ef_search=64-128

Speed-Optimized (~90%)

M=8-12, ef_construction=100, ef_search=32-64

Memory Formula

# Memory calculation
Total = N × (d × sizeof(float) + M × 2 × sizeof(int))
Example: 10M vectors, 768-dim, M=16
Vectors: 10M × 768 × 4 bytes = 30.7 GB
Graph: 10M × 16 × 2 × 4 bytes = 1.28 GB
Total: ~32 GB
Example: 1B vectors, 768-dim, M=16
Vectors: 1B × 768 × 4 bytes = 3,072 GB (3 TB!)
Graph: 1B × 16 × 2 × 4 bytes = 128 GB
Total: ~3.2 TB ← Needs disk-backed approach
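The arithmetic above can be wrapped in a small helper (a sketch assuming float32 vectors and 4-byte neighbor ids, as in the examples; real indexes add per-node bookkeeping on top):

```python
def hnsw_memory_gb(n, dim, M, float_bytes=4, id_bytes=4):
    """Approximate RAM for an HNSW index: raw vectors plus roughly
    2*M neighbor ids per node (Layer 0 alone uses M_max = 2*M)."""
    vectors_gb = n * dim * float_bytes / 1e9
    graph_gb = n * M * 2 * id_bytes / 1e9
    return vectors_gb, graph_gb

vec_gb, graph_gb = hnsw_memory_gb(10_000_000, 768, 16)
print(f"vectors: {vec_gb:.1f} GB, graph: {graph_gb:.2f} GB")
# → vectors: 30.7 GB, graph: 1.28 GB
```

Rerunning with n = 1_000_000_000 reproduces the ~3.2 TB figure that motivates disk-backed designs like DiskANN.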

5. HNSW vs DiskANN

DiskANN (Microsoft, 2019) stores the Vamana graph on SSD instead of RAM, keeping only PQ codes in memory. This trades latency (5-20ms vs <1ms) for 15-50x lower memory cost at billion scale.

Aspect        | HNSW                   | DiskANN
Storage       | RAM (all layers)       | SSD (graph + vectors), RAM (PQ codes only)
Memory/vector | ~4 KB (768d + graph)   | ~200 bytes (PQ codes + metadata)
Query latency | <1ms                   | 5-20ms (SSD I/O bound)
Recall@10     | 98-99%                 | 95-98%
Updates       | Online (insert/delete) | Batch (rebuild for changes)

Vectors | HNSW (RAM)  | DiskANN (RAM + SSD)     | Cost Ratio
10M     | ~40 GB RAM  | ~2 GB RAM + 40 GB SSD   | 8x
100M    | ~400 GB RAM | ~20 GB RAM + 400 GB SSD | 10x
1B      | ~3.2 TB RAM | ~64 GB RAM + 3.2 TB SSD | 15-50x

6. HNSW in Lucene / Elasticsearch

Lucene 9+ uses HNSW for kNN search. Unlike standalone HNSW (single graph), Lucene maintains per-segment HNSW graphs. When segments merge, the graph is rebuilt from scratch — this is the biggest operational challenge with vector search in Elasticsearch.

# Lucene segment architecture with HNSW
Segment 1: HNSW graph (500K vectors)
Segment 2: HNSW graph (300K vectors)
Segment 3: HNSW graph (200K vectors)
Query → search all 3 graphs independently → merge results → return top-K

After segment merge:
Merged Segment: HNSW graph (1M vectors) ── REBUILT from scratch
⚠️ The Merge Problem
  • Graph rebuild: Hierarchical structure from individual segments can't be combined — rebuilt from scratch = O(N log N)
  • Memory spikes: 9M vectors with M=16 can allocate over 2.5 GB of heap during merge
  • Blocking: Long merges block new segment creation → indexing latency spikes
ES Setting      | Default | Production Recommendation
m               | 16      | 16-32 (higher for better recall)
ef_construction | 100     | 200-400 (worth the build cost)
num_candidates  | varies  | 100-200 for top-10 queries
similarity      | cosine  | cosine for text, l2 for images
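These settings live on the `dense_vector` field mapping and the kNN query body. A sketch of both as Python dicts (the layout follows the ES 8.x `dense_vector` mapping and top-level `knn` search; treat it as a template to verify against your ES version — the field name `embedding` and parameter values are illustrative):

```python
# Index mapping with tuned HNSW parameters
mapping = {
    "mappings": {
        "properties": {
            "embedding": {
                "type": "dense_vector",
                "dims": 768,
                "index": True,
                "similarity": "cosine",  # cosine for text, l2 for images
                "index_options": {
                    "type": "hnsw",
                    "m": 32,                 # default 16
                    "ef_construction": 200,  # default 100
                },
            }
        }
    }
}

# kNN search body: num_candidates is the ef_search-style knob
query = {
    "knn": {
        "field": "embedding",
        "query_vector": [0.1] * 768,  # placeholder query embedding
        "k": 10,
        "num_candidates": 100,
    }
}
```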

Key Takeaways

01

Skip List Analogy

HNSW layers work like a skip list: Layer 3 has express highways (few nodes), Layer 0 has local streets (all nodes). Search starts at the top for coarse navigation and zooms in at the bottom for precision.

02

O(log N) Search Complexity

Each layer thins the graph by roughly a factor of M, so search crosses about log_M(N) layers. Greedy search descends from the top, then beam search runs at Layer 0 with ef_search width. 1B vectors with M=16 needs only ~7-8 layers, each crossed in O(1) average hops.

03

Three Key Parameters

M (max connections, default 16): affects recall and memory. ef_construction (build beam width, default 200): affects graph quality. ef_search (query beam width): the primary accuracy/speed knob at query time.

04

Memory Formula: 32 GB for 10M Vectors

Total = N × (d × 4 + M × 2 × 4) bytes. 10M × 768-dim × M=16 = 30.7 GB vectors + 1.28 GB graph = ~32 GB. At 1B: ~3.2 TB. This is why DiskANN exists.

05

Lucene Segment Merges Are the Operational Challenge

Lucene maintains per-segment HNSW graphs. When segments merge, the graph is rebuilt from scratch — O(N log N). Memory spikes during merge can hit 2.5 GB. Use larger initial segments and TieredMergePolicy.