Vector Indexing Basics
Brute-force search is O(N×d) per query — impossibly slow at scale. This chapter covers the fundamental indexing strategies: IVF (Inverted File), PQ (Product Quantization), and IVF-PQ — including Voronoi boundaries, ADC vs SDC distance computation, and FAISS configuration.
~13 min
768 billion distance operations per query → ~768 s even at 1B ops/sec. Completely unusable.
~100x
Search 10 of 1024 clusters = skip 99% of data with ~1% recall loss.
32x
768-dim float32 (3072 B) → 96-byte PQ code. 1B vectors: 3 TB → 96 GB.
1. The Brute-Force Baseline
The simplest approach: compare the query against every vector in the database. This gives exact results but costs O(N × d) per query. At scale, this quickly becomes impractical.
| N (docs) | d (dims) | Distance ops | Time | Verdict |
|---|---|---|---|---|
| 10K | 768 | 7.68M | ~8ms | ✅ Fine |
| 100K | 768 | 76.8M | ~77ms | ⚠️ Borderline |
| 1M | 768 | 768M | ~768ms | ❌ Too slow |
| 100M | 768 | 76.8B | ~77s | ❌ Unusable |
| 1B | 768 | 768B | ~768s | ❌ Absurd |
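The baseline is trivial to write down, which is exactly why it is the reference point. A toy-scale NumPy sketch (sizes are illustrative):

```python
import numpy as np

# Toy-scale brute-force baseline: every query pays N * d multiply-adds.
rng = np.random.default_rng(0)
N, d, k = 10_000, 768, 10
db = rng.standard_normal((N, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

# Exact search: the O(N x d) cost from the table above.
dists = np.linalg.norm(db - query, axis=1)   # N Euclidean distances
top_k = np.argsort(dists)[:k]                # ids of the k nearest vectors
```

At 10K vectors this runs in milliseconds; the table shows why the same loop cannot survive three more orders of magnitude.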
1. Partition the space → only search relevant regions (IVF).
2. Compress the vectors → smaller, faster distance computation (PQ).
3. Build a graph → navigate to neighbors efficiently (HNSW, Chapter 6.5).
2. IVF (Inverted File Index)
Partition the vector space into clusters using k-means. Each cluster defines a Voronoi cell: the region of space closer to that centroid than to any other. Every vector belongs to exactly one cell. At query time, search only the nprobe clusters closest to the query, out of nlist total.
The Boundary Problem
The fundamental weakness of IVF: a query near a cluster boundary may have true nearest neighbors in an adjacent cell that isn't probed. With nprobe=1, you search only the nearest cluster and miss neighbors just across the boundary. Mitigations: a higher nprobe (diminishing returns above ~5% of clusters), multi-probe strategies that also visit near-boundary cells, or over-clustering (more clusters → smaller cells → fewer boundary issues).
| Dataset Size | Recommended nlist | Rationale |
|---|---|---|
| <1M | 4 × √N (≈ 4K) | Rule of thumb for small datasets |
| 1M - 10M | 16 × √N (≈ 16K-50K) | Finer partitioning |
| 10M - 1B | 65,536 - 262,144 | Typical production values |
| >1B | 1M+ | Very fine partitioning |
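The whole IVF mechanism fits in a few lines. A minimal NumPy sketch at toy scale (parameter names `nlist`/`nprobe` match FAISS; the k-means here is a bare-bones Lloyd loop, not production training):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, nlist, nprobe = 2_000, 32, 16, 4
db = rng.standard_normal((N, d)).astype(np.float32)

# Train: a few Lloyd iterations of k-means yield the nlist centroids.
centroids = db[rng.choice(N, nlist, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((db[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(nlist):
        if (assign == c).any():
            centroids[c] = db[assign == c].mean(axis=0)

# Build: inverted lists map each Voronoi cell to the vector ids inside it.
inv_lists = {c: np.where(assign == c)[0] for c in range(nlist)}

# Query: probe only the nprobe nearest cells, brute-force inside them.
query = rng.standard_normal(d).astype(np.float32)
cell_d = ((centroids - query) ** 2).sum(-1)
probed = np.argsort(cell_d)[:nprobe]
cand = np.concatenate([inv_lists[c] for c in probed])
dists = ((db[cand] - query) ** 2).sum(-1)
top10 = cand[np.argsort(dists)[:10]]
```

With nprobe=4 of nlist=16 cells, only ~25% of vectors are scanned here; at production ratios (10 of 1024) the scanned fraction drops to ~1%.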
3. PQ (Product Quantization)
Compress vectors by splitting them into sub-vectors and quantizing each independently. A 768-dim vector (3072 bytes) is split into 96 sub-vectors of 8 dims each. For each 8-dim sub-space, learn 256 centroids via k-means. Replace each sub-vector with its nearest centroid ID (1 byte). Result: 3072 bytes → 96 bytes = 32x compression.
| Scale | float32 | PQ (M=96) | Savings |
|---|---|---|---|
| 1M vectors | 3 GB | 96 MB | 32x |
| 100M vectors | 300 GB | 9.6 GB | 32x |
| 1B vectors | 3 TB | 96 GB | 32x |
ADC vs SDC Distance Computation
ADC (Asymmetric) — Preferred
Query stays uncompressed. Pre-compute the distance from each query sub-vector to all 256 centroids (M lookup tables). Then: per-vector distance = 96 table lookups instead of 768 multiplications. ~8x faster, with higher accuracy than SDC.
Per-query pre-comp: O(M × 256 × d_sub) — negligible
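The ADC path above can be sketched in a few lines (toy codebooks and random PQ codes stand in for a trained index):

```python
import numpy as np

# ADC sketch: query stays float32, database vectors are PQ codes.
rng = np.random.default_rng(0)
d, M = 64, 8
d_sub = d // M
codebooks = rng.standard_normal((M, 256, d_sub)).astype(np.float32)
codes = rng.integers(0, 256, size=(1_000, M), dtype=np.uint8)  # 1000 PQ codes

query = rng.standard_normal(d).astype(np.float32)

# Per-query pre-comp: M tables of query-sub-vector-to-centroid distances.
q_subs = query.reshape(M, 1, d_sub)
tables = ((codebooks - q_subs) ** 2).sum(-1)         # (M, 256)

# Per-vector distance is then just M table lookups, summed.
approx_d = tables[np.arange(M), codes].sum(axis=1)   # (1000,)
```

Note the asymmetry: the tables are rebuilt per query (cheap), while the codes never need decompressing.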
SDC (Symmetric) — Faster but Less Accurate
Both query AND database vectors are quantized. Distances come from pre-calculated centroid-to-centroid tables (built once at index time). Faster for streaming but loses accuracy because the query is also compressed.
Pre-comp: M × 256 × 256 distances — one-time
OPQ: Optimized Product Quantization
Standard PQ splits vectors into contiguous blocks, but correlations between dimensions may span block boundaries. OPQ learns an optimal rotation of the vector space before quantization, reducing cross-dimension correlations. Result: 5-15% recall improvement at the cost of ~2x longer training time.
4. IVF-PQ: The Billion-Scale Combination
Combine IVF's cluster pruning with PQ's compression. The key insight: PQ is applied to residual vectors (vector - centroid), not the original vectors. Residuals are smaller in magnitude and more uniformly distributed, so PQ works better on them.
IVF-PQ Pipeline
Build:
1. k-means → nlist centroids (Voronoi partition)
2. Compute residual = vector - nearest_centroid
3. Train PQ codebooks on residuals
4. Store PQ-encoded residuals in inverted lists

Query:
1. Find nearest centroids (nprobe)
2. Compute query_residual = query - centroid
3. ADC search within cluster using residual PQ codes
4. Return top-k across all probed clusters
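The residual step is the part that is easy to get wrong, so here it is in isolation (toy sizes; centroid training is elided and random data points stand in for k-means centroids):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, nlist = 1_000, 32, 8
db = rng.standard_normal((N, d)).astype(np.float32)
centroids = db[rng.choice(N, nlist, replace=False)]

# Index side: PQ encodes (vector - nearest_centroid), not the raw vector.
assign = np.argmin(((db[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
residuals = db - centroids[assign]      # PQ codebooks are trained on these

# Query side mirrors it: the query residual is compared (via ADC) against
# the residual codes stored in each probed cell.
query = rng.standard_normal(d).astype(np.float32)
c = int(np.argmin(((centroids - query) ** 2).sum(-1)))
query_residual = query - centroids[c]
```

Because the query residual is relative to a specific centroid, step 2 of the query pipeline must be repeated per probed cell: each cell's codes were encoded against its own centroid.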
5. FAISS Index Decision Matrix
| Index | Memory/Vec (768d) | Recall@10 | Best For |
|---|---|---|---|
| Flat | 3,072 B | 100% | Ground truth, <100K vecs |
| IVF-Flat | 3,072 B | 95-99% | Medium datasets, high accuracy |
| IVF-PQ | ~96 B | 85-95% | Billion-scale, memory-constrained |
| IVF-PQ+refine | ~96 B + disk | 95-98% | Best accuracy/memory tradeoff |
| HNSW (Ch 6.5) | ~3,500 B | 98-99% | High recall, RAM available |
Key Takeaways
Brute Force Is Exact but Unscalable
O(N×d) per query: fine for <100K vectors, borderline at 100K, and completely unusable above 1M. At 1B vectors with 768 dims, a single query takes ~768 seconds.
IVF: Partition → Search Subsets
K-means splits the vector space into Voronoi cells. At query time, only search nprobe cells out of nlist total → ~100x speedup. The boundary problem (nearest neighbor in adjacent cell) is mitigated by increasing nprobe.
PQ: 32x Compression via Subspace Quantization
Split 768-dim vector into 96 sub-vectors of 8 dims each. Learn 256 centroids per sub-space. Replace each sub-vector with a 1-byte centroid ID. Result: 3072 bytes → 96 bytes.
ADC Is Always Preferred over SDC
Asymmetric distance computation keeps the query at full float32 precision and only quantizes database vectors. The accuracy gain over SDC (both quantized) is significant and the per-query table computation is negligible.
IVF-PQ: The Billion-Scale Default
Combine IVF cluster pruning + PQ compression. Key: PQ is applied to residual vectors (vector - centroid), not originals. Residuals are smaller and more uniform → PQ works better. FAISS makes this production-ready.