Systems Atlas
Chapter 6.4: Vector & Semantic Search

Vector Indexing Basics

Brute-force search is O(N×d) per query — impossibly slow at scale. This chapter covers the fundamental indexing strategies: IVF (Inverted File), PQ (Product Quantization), and IVF-PQ — including Voronoi boundaries, ADC vs SDC distance computation, and FAISS configuration.

Brute Force at 1B: ~13 min
768B distance operations at 1B ops/sec per query. Completely unusable.

IVF Speedup: ~100x
Search 10 of 1024 clusters = skip 99% of the data with ~1% recall loss.

PQ Compression: 32x
768-dim float32 (3072 B) → 96-byte PQ code. 1B vectors: 3 TB → 96 GB.

1. The Brute-Force Baseline

The simplest approach: compare the query against every vector in the database. This gives exact results but costs O(N × d) per query. At scale, this quickly becomes impractical.

brute_force.py
def cosine_distance(a, b):
    """1 - cosine similarity (helper the chapter's pseudocode assumes)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)

def brute_force_search(query_vec, all_vectors, k=10):
    """Compare query against EVERY vector. O(N × d) per query."""
    distances = []
    for i, vec in enumerate(all_vectors):
        dist = cosine_distance(query_vec, vec)
        distances.append((dist, i))
    distances.sort()
    return distances[:k]
| N (docs) | d (dims) | Distance ops | Time | Verdict |
|----------|----------|--------------|------|---------|
| 10K | 768 | 7.68M | ~8 ms | ✅ Fine |
| 100K | 768 | 76.8M | ~77 ms | ⚠️ Borderline |
| 1M | 768 | 768M | ~768 ms | ❌ Too slow |
| 100M | 768 | 76.8B | ~77 s | ❌ Unusable |
| 1B | 768 | 768B | ~768 s | ❌ Absurd |
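The table's timings follow from a single back-of-the-envelope formula. A minimal sketch (the helper name and the 1B-ops/sec throughput figure are assumptions taken from the table's own arithmetic):

```python
# Hypothetical helper: estimate brute-force query cost, assuming roughly
# 1e9 distance operations per second on a single core (as in the table).
def brute_force_cost(n_docs: int, dims: int, ops_per_sec: float = 1e9):
    """Return (total distance ops, estimated seconds per query)."""
    ops = n_docs * dims  # one op per dimension per document
    return ops, ops / ops_per_sec

ops, secs = brute_force_cost(1_000_000_000, 768)
print(f"{ops:.3g} ops, ~{secs:.0f} s per query")  # 7.68e+11 ops, ~768 s per query
```

Plugging in the other rows of the table reproduces the same ms-to-minutes progression.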
Three Strategies to Beat Brute Force

1. Partition the space → only search relevant regions (IVF). 2. Compress the vectors → smaller, faster distance computation (PQ). 3. Build a graph → navigate to neighbors efficiently (HNSW, Chapter 6.5).


2. IVF (Inverted File Index)

Partition the vector space with k-means. Each cluster centroid defines a Voronoi cell: the region of space closer to that centroid than to any other. Every vector belongs to exactly one cell. At query time, search only the closest nprobe clusters out of nlist total.

# Voronoi cells in 2D (conceptual): four cells C1, C2, C3, C4 around their centroids
ivf_search.py
def ivf_search(query_vec, centroids, inverted_lists, n_probe=10, k=10):
    """Speedup ≈ nlist / nprobe (e.g., 1024/10 = 100x)."""
    # Find nearest centroids (fast: compare query to nlist centroids only).
    # find_k_nearest / cosine_distance: exact-distance helpers assumed here.
    nearest = find_k_nearest(query_vec, centroids, k=n_probe)
    # Search only the selected clusters (skip ~99% of the data)
    candidates = []
    for cluster_id in nearest:
        for doc_id, vec in inverted_lists[cluster_id]:
            dist = cosine_distance(query_vec, vec)
            candidates.append((dist, doc_id))
    candidates.sort()
    return candidates[:k]

The Boundary Problem

The fundamental weakness of IVF: a query near a cluster boundary may have true nearest neighbors in an adjacent cell that isn't probed. With nprobe=1, you search only the nearest cluster and miss neighbors just across the boundary. Mitigations: a higher nprobe (diminishing returns above ~5% of clusters), multi-probe variants that also visit cells adjacent to the best one, or over-clustering (more clusters → smaller cells → fewer vectors lost per boundary miss).
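The boundary miss is easy to reproduce in two dimensions. A minimal NumPy sketch with hypothetical data (not FAISS): two Voronoi cells, and a query whose true nearest neighbor sits just across the boundary in the cell that nprobe=1 would skip.

```python
import numpy as np

# Toy 2D illustration of the IVF boundary problem (hypothetical data).
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])   # two Voronoi cells, boundary at x = 5
vectors = np.array([
    [1.0, 0.0],   # belongs to cell 0
    [6.0, 0.0],   # belongs to cell 1, but sits close to the boundary
])
assign = np.argmin(np.linalg.norm(vectors[:, None] - centroids, axis=2), axis=1)

query = np.array([4.5, 0.0])                      # just inside cell 0
cell_order = np.argsort(np.linalg.norm(centroids - query, axis=1))
# nprobe = 1 searches only cell_order[0] = cell 0 -> finds vector 0 (distance 3.5),
# but the true nearest neighbor is vector 1 in cell 1 (distance 1.5). Missed.
true_nn = int(np.argmin(np.linalg.norm(vectors - query, axis=1)))
```

With nprobe=2 both cells are probed and the true neighbor is recovered, which is exactly the recall/latency dial the section describes.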

| Dataset Size | Recommended nlist | Rationale |
|--------------|-------------------|-----------|
| <1M | 4 × √N (≈ 4K) | Rule of thumb for small datasets |
| 1M - 10M | 16 × √N (≈ 16K-50K) | Finer partitioning |
| 10M - 1B | 65,536 - 262,144 | Typical production values |
| >1B | 1M+ | Very fine partitioning |
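The table's rules of thumb can be collapsed into one helper. A sketch (the function name and the exact cutoffs within each band are illustrative choices, not a FAISS API):

```python
import math

# Hypothetical helper encoding the nlist rules of thumb from the table above.
def recommended_nlist(n_vectors: int) -> int:
    if n_vectors < 1_000_000:
        return int(4 * math.sqrt(n_vectors))    # small: 4 * sqrt(N)
    if n_vectors < 10_000_000:
        return int(16 * math.sqrt(n_vectors))   # medium: 16 * sqrt(N)
    if n_vectors < 1_000_000_000:
        return 65_536                           # a typical production value in this range
    return 1_048_576                            # billion-plus: very fine partitioning

recommended_nlist(500_000)   # -> 2828
```

Remember that every extra cluster also adds one centroid comparison per query, so nlist trades coarse-pass cost against fine-pass cost.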

3. PQ (Product Quantization)

Compress vectors by splitting them into sub-vectors and quantizing each independently. A 768-dim vector (3072 bytes) is split into 96 sub-vectors of 8 dims each. For each 8-dim sub-space, learn 256 centroids via k-means. Replace each sub-vector with its nearest centroid ID (1 byte). Result: 3072 bytes → 96 bytes = 32x compression.

# PQ Compression Pipeline
1. Split: the original float32 vector (768 dims, 3072 bytes) is divided into M=96 sub-vectors of 8 dims each, e.g. sub-vec 1 = [0.21, -0.04, 0.67, ...] through sub-vec 96 = [0.15, 0.88, -0.32, ...].
2. Quantize: for each sub-space, learn 256 centroids via k-means, then replace each sub-vector with its nearest centroid's 1-byte ID (e.g. sub-vec 1 → ID 42, ..., sub-vec 96 → ID 199).
3. Code: the final PQ code is 96 bytes total, e.g. [42, 187, 5, 220, ..., 73, 18, 199].
| Scale | float32 | PQ (M=96) | Savings |
|-------|---------|-----------|---------|
| 1M vectors | 3 GB | 96 MB | 32x |
| 100M vectors | 300 GB | 9.6 GB | 32x |
| 1B vectors | 3 TB | 96 GB | 32x |
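The encoding step is a few lines of NumPy. A minimal sketch in which the codebooks are random for illustration (in practice each sub-space's 256 centroids come from k-means on training data, as described above):

```python
import numpy as np

# PQ encoding sketch. Codebooks are random here purely for illustration;
# real codebooks are learned with k-means per sub-space.
rng = np.random.default_rng(0)
d, M = 768, 96
d_sub = d // M                                  # 8 dims per sub-vector
codebooks = rng.normal(size=(M, 256, d_sub))    # 256 centroids per sub-space

def pq_encode(vec: np.ndarray) -> np.ndarray:
    """3072-byte float32 vector -> 96-byte code of centroid IDs."""
    subs = vec.reshape(M, d_sub)                # split into M sub-vectors
    # for each sub-space, pick the nearest of its 256 centroids
    dists = np.linalg.norm(codebooks - subs[:, None, :], axis=2)  # (M, 256)
    return dists.argmin(axis=1).astype(np.uint8)  # one byte per sub-vector

code = pq_encode(rng.normal(size=d).astype(np.float32))
print(code.nbytes)   # 96 bytes, vs 3072 for the original float32 vector
```

Decoding (for a reconstruction) is the reverse lookup, `codebooks[np.arange(M), code]`, which is why PQ is lossy: you get back the centroid, not the original sub-vector.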

ADC vs SDC Distance Computation

ADC (Asymmetric) — Preferred

Query stays uncompressed. Pre-compute the distance from each query sub-vector to all 256 centroids (M lookup tables). Then: per-vector distance = 96 table lookups instead of 768 multiplications. ~8x faster, with higher accuracy than SDC.

Per-query pre-comp: O(M × 256 × d_sub) — negligible
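The table-lookup trick is concrete in a few lines. A sketch with the chapter's shapes (M=96 sub-spaces, 256 centroids, 8 dims each); the codebooks are random stand-ins for learned ones:

```python
import numpy as np

# ADC sketch: one-time-per-query lookup tables, then each PQ code is scored
# with M table lookups instead of a full d-dimensional distance computation.
M, K, d_sub = 96, 256, 8
rng = np.random.default_rng(1)
codebooks = rng.normal(size=(M, K, d_sub))      # stand-in for learned codebooks

def adc_tables(query: np.ndarray) -> np.ndarray:
    """Squared distance from each query sub-vector to all 256 centroids."""
    subs = query.reshape(M, 1, d_sub)
    return ((codebooks - subs) ** 2).sum(axis=2)    # shape (M, 256)

def adc_distance(tables: np.ndarray, code: np.ndarray) -> float:
    """Per-vector distance = M table lookups, summed. No multiplications."""
    return float(tables[np.arange(M), code].sum())

query = rng.normal(size=M * d_sub)
tables = adc_tables(query)                      # O(M * 256 * d_sub), once per query
code = rng.integers(0, K, size=M).astype(np.uint8)
dist = adc_distance(tables, code)
```

Because squared L2 distance decomposes as a sum over sub-spaces, the lookup sum equals the exact distance between the full query and the reconstructed (decoded) vector.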

SDC (Symmetric) — Faster but Less Accurate

Both query AND database vectors are quantized. Distances come from pre-calculated centroid-to-centroid tables (built once at index time). Faster for streaming but loses accuracy because the query is also compressed.

Pre-comp: M × 256 × 256 distances — one-time

OPQ: Optimized Product Quantization

Standard PQ splits vectors into contiguous blocks, but correlations between dimensions may span block boundaries. OPQ learns an optimal rotation of the vector space before quantization, reducing cross-dimension correlations. Result: 5-15% recall improvement at the cost of ~2x longer training time.


4. IVF-PQ: The Billion-Scale Combination

Combine IVF's cluster pruning with PQ's compression. The key insight: PQ is applied to residual vectors (vector - centroid), not the original vectors. Residuals are smaller in magnitude and more uniformly distributed, so PQ works better on them.

IVF-PQ Pipeline

Training Phase

1. k-means → nlist centroids (Voronoi partition)

2. Compute residual = vector - nearest_centroid

3. Train PQ codebooks on residuals

4. Store PQ-encoded residuals in inverted lists

Query Phase

1. Find nearest centroids (nprobe)

2. Compute query_residual = query - centroid

3. ADC search within cluster using residual PQ codes

4. Return top-k across all probed clusters

faiss_ivfpq.py
import faiss
nlist = 4096 # Number of Voronoi cells
M = 96 # Number of sub-vectors for PQ
nbits = 8 # Bits per sub-vector (256 centroids per sub-space)
d = 768 # Vector dimension
# Create quantizer + IVF-PQ index
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits)
index.train(training_vectors) # Learns centroids + PQ codebooks
index.add(all_vectors) # Assigns + PQ-encodes residuals
index.nprobe = 64 # Search 64 of 4096 clusters
distances, indices = index.search(query_vectors, k=10)
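Why quantize residuals rather than raw vectors? A NumPy sketch with hypothetical clustered data (not FAISS internals) shows the effect: within a cluster, (vector - centroid) has a much smaller norm than the vector itself, so the same PQ codebook budget covers a much tighter range of values.

```python
import numpy as np

# Hypothetical clustered data: points scattered with unit variance around a
# centroid that sits far from the origin (as real cluster members do).
rng = np.random.default_rng(2)
centroid = rng.normal(size=768) * 10.0
vectors = centroid + rng.normal(size=(1000, 768))

residuals = vectors - centroid
print(np.linalg.norm(vectors, axis=1).mean())     # large: dominated by the centroid
print(np.linalg.norm(residuals, axis=1).mean())   # small: roughly sqrt(768) ~ 28
```

The residual distribution is also roughly centered and similar across clusters, which is why a single set of PQ codebooks trained on residuals works well for all inverted lists.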

5. FAISS Index Decision Matrix

| Index | Memory/Vec (768d) | Recall@10 | Best For |
|-------|-------------------|-----------|----------|
| Flat | 3,072 B | 100% | Ground truth, <100K vecs |
| IVF-Flat | 3,072 B | 95-99% | Medium datasets, high accuracy |
| IVF-PQ | ~96 B | 85-95% | Billion-scale, memory-constrained |
| IVF-PQ+refine | ~96 B + disk | 95-98% | Best accuracy/memory tradeoff |
| HNSW (Ch 6.5) | ~3,500 B | 98-99% | High recall, RAM available |
practical_config.py
# Small (<100K): Flat — don't overthink it
index = faiss.IndexFlatL2(768)
# Medium (100K-10M): IVF-Flat for accuracy
quantizer = faiss.IndexFlatL2(768)
index = faiss.IndexIVFFlat(quantizer, 768, n_clusters)
# Large (10M-1B): IVF-PQ for memory efficiency
index = faiss.IndexIVFPQ(quantizer, 768, 65536, 96, 8)
# Billion-scale: IVF-PQ with OPQ preprocessing
index = faiss.index_factory(768, "OPQ96_768,IVF262144,PQ96")

Key Takeaways

01

Brute Force Is Exact but Unscalable

O(N×d) per query: fine below ~100K vectors, borderline around 100K, and far too slow above 1M. At 1B vectors with 768 dims, a single query takes ~768 seconds.

02

IVF: Partition → Search Subsets

K-means splits the vector space into Voronoi cells. At query time, only search nprobe cells out of nlist total → ~100x speedup. The boundary problem (nearest neighbor in adjacent cell) is mitigated by increasing nprobe.

03

PQ: 32x Compression via Subspace Quantization

Split 768-dim vector into 96 sub-vectors of 8 dims each. Learn 256 centroids per sub-space. Replace each sub-vector with a 1-byte centroid ID. Result: 3072 bytes → 96 bytes.

04

ADC Is Always Preferred over SDC

Asymmetric distance computation keeps the query at full float32 precision and only quantizes database vectors. The accuracy gain over SDC (both quantized) is significant and the per-query table computation is negligible.

05

IVF-PQ: The Billion-Scale Default

Combine IVF cluster pruning + PQ compression. Key: PQ is applied to residual vectors (vector - centroid), not originals. Residuals are smaller and more uniform → PQ works better. FAISS makes this production-ready.