Systems Atlas
Chapter 6.11: Vector & Semantic Search

Vector Database Comparison

The landscape includes purpose-built databases (Pinecone, Qdrant, Milvus, Weaviate), extensions of existing databases (pgvector), and search engines with vector capabilities (Elasticsearch). Each makes fundamentally different tradeoffs in architecture, performance, and operational complexity.

The Decision Spectrum: Integration ←→ Specialization

  • pgvector: simplest ops
  • Elasticsearch: familiar
  • Weaviate / Qdrant: balanced
  • Pinecone / Milvus: fastest at scale

pgvector (PostgreSQL Extension)

If you already run PostgreSQL, pgvector is the path of least resistance. One CREATE EXTENSION vector and you have a working vector column type with HNSW indexing, cosine/L2/inner product distance operators, and the ability to combine vector similarity with SQL WHERE clauses, JOINs, and aggregations in a single query. No new infrastructure, no new operational runbook, no new backup strategy. Your vectors live alongside your relational data with ACID guarantees.

The limitation is performance at scale. pgvector runs inside the PostgreSQL process, sharing memory with your OLTP workload. HNSW index builds are single-threaded and lock the table. At >5M vectors, purpose-built databases outperform it by 2-5x on QPS. pgvector also lacks advanced features like product quantization, tiered storage, or GPU-accelerated search. But for <5M vectors, none of that matters — and the SQL integration is unbeatable.

pgvector_example.sql
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name TEXT,
  category TEXT,
  price DECIMAL,
  embedding VECTOR(768)
);

-- Create HNSW index
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);

-- Vector search with SQL filters — all in one query!
-- (query_vec stands in for a bound parameter holding the query embedding)
SELECT name, price, 1 - (embedding <=> query_vec) AS similarity
FROM products
WHERE category = 'electronics' AND price < 500
ORDER BY embedding <=> query_vec
LIMIT 10;
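The <=> operator returns cosine distance, which is why the SELECT subtracts it from 1 to report similarity. A minimal pure-Python sketch of that arithmetic (vectors shortened to 3 dimensions for illustration; real embeddings would be 768-dimensional as in the table above):

```python
import math

def cosine_distance(a, b):
    """Cosine distance as computed by pgvector's <=> operator: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query_vec = [1.0, 0.0, 0.0]
embedding = [0.6, 0.8, 0.0]

distance = cosine_distance(embedding, query_vec)  # what <=> returns
similarity = 1.0 - distance                       # what the SQL SELECT computes
print(round(similarity, 3))  # 0.6
```

Note that ORDER BY ascending distance and ORDER BY descending similarity produce the same ranking, so the query sorts directly on the operator and only converts to similarity for display.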
Strengths
  • Zero additional infrastructure — one CREATE EXTENSION
  • ACID transactional consistency with relational data
  • Rich SQL filtering, joins, aggregations
Limitations
  • Performance ceiling at >5M vectors
  • Resource contention with OLTP workload
  • No PQ, tiered storage, or GPU search

Elasticsearch (with kNN)

Elasticsearch added dense vector fields and HNSW-based kNN search in version 8.0. The killer feature is native hybrid search: a single query can combine BM25 text scoring with kNN vector similarity using Reciprocal Rank Fusion (RRF) — no external orchestration needed. If you already run Elasticsearch for text search, adding vector capabilities to existing indices is straightforward.
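RRF merges the two ranked lists without trying to compare their incompatible scores: each document earns 1/(k + rank) from every list it appears in, with k = 60 as the commonly used constant. A minimal sketch of the fusion step (document IDs are hypothetical):

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_of_d)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]   # keyword (BM25) ranking
knn_hits  = ["doc_c", "doc_a", "doc_d"]   # vector (kNN) ranking
print(rrf_fuse([bm25_hits, knn_hits]))    # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists (doc_a, doc_c) float to the top, while documents found by only one retriever still survive with a lower score — the property that makes RRF robust without any score normalization.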

The architectural limitation is that Elasticsearch wasn't designed for vectors. Each Lucene segment contains its own HNSW graph, and segment merges trigger full graph rebuilds (causing CPU spikes). Vectors are stored on-heap, competing with the JVM's garbage collector and text search caches. At the same scale, purpose-built vector databases achieve 2-5x higher QPS. Still, for teams already invested in the Elastic ecosystem, the operational simplicity of co-locating text and vectors is often worth the performance trade-off.

Strengths
  • Native hybrid search: BM25 + kNN with built-in RRF fusion
  • Mature ecosystem: Kibana, Elastic Cloud, decade of production use
  • Co-located text + vectors in same index
Limitations
  • Segment merges rebuild HNSW graph (CPU spikes)
  • On-heap vectors compete with JVM/text search
  • Purpose-built DBs achieve 2-5x better QPS

Purpose-Built Vector Databases

These databases are engineered from the ground up for vector workloads: custom memory layouts for SIMD-optimized distance computation, purpose-built index structures, and APIs designed around the embed-index-query workflow. They achieve the highest QPS and lowest latency at scale, but come with varying levels of operational complexity. The four major options occupy different points on the managed-vs-self-hosted and simplicity-vs-capability spectrum.
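The payoff of those contiguous memory layouts can be glimpsed even from Python: storing all vectors in one dense matrix lets a single matrix-vector product score every candidate in one pass, which is exactly the locality that SIMD distance kernels exploit. A sketch with NumPy (sizes are arbitrary; vectors are unit-normalized once at index time so the dot product equals cosine similarity):

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 128)).astype(np.float32)  # contiguous row-major store
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)        # normalize once, at "index" time

query = rng.standard_normal(128).astype(np.float32)
query /= np.linalg.norm(query)

# One contiguous pass over memory: cosine similarity against all 10k vectors.
similarities = vectors @ query
top10 = np.argsort(-similarities)[:10]   # indices of the 10 nearest vectors
```

This is brute-force (exact) search; the purpose-built databases layer ANN index structures such as HNSW on top so that only a small fraction of the matrix is ever touched per query.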

Pinecone

Fully managed, serverless. Zero operations — API endpoint, query it. Serverless pricing (pay per query/GB).

Best for: Teams without infra expertise, startups, <10M vectors

Watch out: Vendor lock-in, limited control, cost at high QPS

Qdrant

Open-source, Rust. Payload indexes for filtered search during HNSW traversal. Docker → K8s → Cloud.

Best for: Self-hosted perf, filtered search, up to ~500M vectors

Watch out: Younger ecosystem, no native BM25

Weaviate

Open-source, Go. Native hybrid search (BM25 + vector). Modular vectorization (auto-embed via OpenAI/Cohere). Multi-tenancy.

Best for: All-in-one hybrid search, SaaS multi-tenant, auto-embedding

Watch out: Memory intensive (~50% more overhead), slower QPS

Milvus

Open-source, distributed. Widest index variety: HNSW, IVF-PQ, DiskANN, GPU indexes. Managed via Zilliz Cloud.

Best for: 100M+ vectors, GPU-accelerated search, index variety

Watch out: Requires etcd + MinIO + Pulsar, high ops complexity


Head-to-Head Comparison

The table below compares all six options across the dimensions that matter most in practice: implementation language (affects performance characteristics), self-hosting support, maximum practical scale, hybrid search capability, quantization options, operational complexity, latency at 1M vectors, and approximate monthly cost. Use this as a starting point, then drill into the specific factors that matter for your use case.

Feature          pgvector   ES          Pinecone    Qdrant      Weaviate    Milvus
Language         C          Java        n/a (closed) Rust       Go          Go+C++
Self-hosted      ✅         ✅          ❌          ✅          ✅          ✅
Max scale        ~5M        ~50M        ~100M+      ~500M       ~100M       10B+
Hybrid search    ✅ FTS     ✅ Native   ✅ Sparse   ❌          ✅ Native   ✅
Quantization     ❌         Scalar      Auto        SQ/PQ/BQ    PQ/BQ       PQ/SQ8/GPU
Ops complexity   Minimal    Medium      Zero        Low-Med     Medium      High
Latency (1M)     ~5ms       ~3ms        ~2ms        ~1ms        ~3ms        ~1ms
Cost (1M/mo)     ~$50       ~$300       ~$70        ~$100       ~$150       ~$400+

Decision Framework

The most common mistake is over-engineering: choosing Milvus for 500K vectors, or Pinecone when you already have PostgreSQL. Walk through these questions in order — stop at the first "yes" that matches your situation. The decision tree below encodes the logic that experienced teams follow.

Q1: Already use PostgreSQL with <5M vectors?
→ YES → pgvector. Stop here.
Q2: Need hybrid (keyword + vector) search?
→ YES + have ES → Stay with ES kNN
→ YES + no ES → Weaviate
Q3: Dedicated infrastructure team?
→ NO → Pinecone or Qdrant Cloud
Q4: How many vectors?
→ <100M → Qdrant
→ 100M-1B → Qdrant or Milvus
→ >1B → Milvus

Key Takeaways

01

pgvector: Simplest for <5M Vectors

Zero additional infrastructure — CREATE EXTENSION away. ACID transactions, SQL filtering, joins. But: performance ceiling at >5M vectors, resource contention with OLTP, limited ANN sophistication.

02

Elasticsearch: Best for Hybrid Search

Native BM25 + HNSW kNN with built-in RRF fusion. Mature ecosystem (Kibana, Elastic Cloud). But: segment merge rebuilds HNSW graph, on-heap vectors compete with JVM, not purpose-built for vectors.

03

Qdrant: Best Performance Per Dollar (Self-Hosted)

Rust implementation delivers excellent QPS. Payload indexes enable efficient filtered search during HNSW traversal. Flexible deployment: Docker → K8s → Cloud. Scales to ~500M vectors.

04

Milvus: Purpose-Built for Billion Scale

Distributed architecture handles 10B+ vectors. Widest index variety: HNSW, IVF-PQ, DiskANN, GPU indexes. But: requires etcd + MinIO + Pulsar — significant operational burden.

05

Don't Over-Engineer

The most common mistake is choosing a specialized vector database for a use case pgvector or Elasticsearch handles perfectly. If you have <5M vectors and already run PostgreSQL, pgvector is the answer.