Vector search
Vector search is retrieval by similarity in a high-dimensional embedding space, typically using cosine similarity or dot product. It returns semantically related documents even when no keywords match, which makes it the retrieval layer beneath most RAG pipelines and modern semantic search products.
Long definition
Classic keyword search ranks documents by how their lexical terms match a query, weighted by frequency and rarity (TF-IDF, BM25). Vector search does something different: it represents both the query and every document as a vector (an embedding) in the same semantic space, then returns the documents whose vectors sit closest to the query vector.
The closeness measure is almost always one of the following (all three are sketched in code after the list):
- Cosine similarity — angle between vectors, ignoring magnitude. Default for most embedding models.
- Dot product — cosine times magnitudes. Equivalent to cosine on normalized vectors and faster to compute.
- Euclidean (L2) distance — straight-line distance. Used in some legacy vector setups; less common for normalized embeddings.
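For concreteness, here is a minimal numpy sketch of the three measures; the vectors are toy values standing in for real embeddings. The final assertion demonstrates the equivalence claim above: on unit-normalized vectors, cosine and dot product agree.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Angle-based: magnitude is divided out.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dot(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine scaled by both magnitudes; cheapest to compute.
    return float(a @ b)

def l2(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line distance; lower means closer.
    return float(np.linalg.norm(a - b))

q = np.array([0.3, 0.9, 0.1])
d = np.array([0.2, 0.8, 0.3])

# On unit-normalized vectors, cosine and dot product are identical,
# and L2 distance becomes a monotonic function of them.
qn, dn = q / np.linalg.norm(q), d / np.linalg.norm(d)
assert abs(cosine(qn, dn) - dot(qn, dn)) < 1e-12
```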
The advantage is semantic. Query "low-impact running shoes for flat feet" and a document titled "stability trainers for fallen arches" will rank highly even though the lexical overlap is minimal. The disadvantage is the inverse: exact-match queries (product codes, error messages, proper nouns the model has never seen) underperform. Production systems almost always run hybrid retrieval — BM25 plus vector — and merge the two with a re-ranker.
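One common way to merge the two ranked lists before re-ranking is reciprocal rank fusion (RRF). The sketch below assumes you already have a BM25 ranking and a vector ranking as lists of document IDs; the IDs and the k=60 constant are illustrative, not from the text, and a cross-encoder re-ranker would typically rescore the fused list afterward.

```python
from collections import defaultdict

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc7", "doc2", "doc9"]  # lexical ranking (hypothetical IDs)
vector_hits = ["doc2", "doc4", "doc7"]  # semantic ranking
print(rrf_merge([bm25_hits, vector_hits]))  # doc2 and doc7 rise to the top
```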
The tooling landscape splits four ways:
- Embedded in the database: the pgvector extension for Postgres, plus native vector types in MongoDB, Elasticsearch, Redis, and Cassandra. Cheapest path when you already run the DB (see the sketch after this list).
- Managed pure vector services: Pinecone, Weaviate Cloud, Qdrant Cloud, Vespa Cloud. Optimized for scale; integrate with most RAG frameworks out of the box.
- Self-hosted vector engines: Qdrant, Weaviate, Milvus, Vespa, FAISS as a library. Full control, ops cost.
- Hybrid search platforms: Typesense, Meilisearch, OpenSearch — keyword-first systems that added vector search.
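To illustrate the embedded-in-the-database path, here is a hedged pgvector sketch using psycopg 3. The connection string, table name, and the 3-dimension toy embedding are assumptions for brevity; `<=>` is pgvector's cosine-distance operator.

```python
import psycopg  # psycopg 3; connection string and schema are hypothetical

conn = psycopg.connect("dbname=app user=app")

# One-time setup: enable the extension and store embeddings next to the rows.
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.execute("""
    CREATE TABLE IF NOT EXISTS docs (
        id bigserial PRIMARY KEY,
        body text,
        embedding vector(3)  -- real models use hundreds to thousands of dims
    )
""")

# Query: order by cosine distance to the query embedding (lower is closer).
query_embedding = "[0.3, 0.9, 0.1]"  # would come from your embedding model
rows = conn.execute(
    "SELECT id, body FROM docs ORDER BY embedding <=> %s::vector LIMIT 5",
    (query_embedding,),
).fetchall()
```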
For RAG, vector search is the retrieval step before the generation step. The LLM never sees your full corpus; it sees the top-k documents the vector search returned. Quality of retrieval caps quality of the answer — a hallucinating model often started with a bad top-k.
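To make the top-k point concrete, here is a brute-force retrieval sketch over random stand-in embeddings. Real systems replace the matrix multiply with an approximate-nearest-neighbor index, but the contract is the same: only the returned indices ever reach the prompt.

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int) -> np.ndarray:
    """Brute-force retrieval: dot-product scores, then the k best indices."""
    scores = doc_matrix @ query_vec           # (n_docs,) similarity scores
    idx = np.argpartition(-scores, k)[:k]     # unordered top-k in O(n)
    return idx[np.argsort(-scores[idx])]      # sorted best-first

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 384))       # synthetic stand-in embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
query = corpus[42] + 0.1 * rng.normal(size=384)
query /= np.linalg.norm(query)

context_ids = top_k(query, corpus, k=5)       # only these reach the LLM
```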
Practical scaling note: under 1M documents, almost any tool works. Above 100M, indexing strategy (HNSW vs IVF, quantization, sharding) starts to matter, and the choice between Pinecone-class managed services and self-hosted Qdrant or Vespa becomes load-bearing.
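A sketch of the two index families mentioned above, using FAISS as the library stand-in. The parameter values (M=32, nlist=256, nprobe=8) and the synthetic data are illustrative starting points, not recommendations from the text; the key structural difference is that IVF requires a training pass and HNSW does not.

```python
import numpy as np
import faiss  # assumes faiss-cpu is installed

d, n = 384, 20_000
xb = np.random.default_rng(0).normal(size=(n, d)).astype("float32")
faiss.normalize_L2(xb)  # normalized, so inner product == cosine

# HNSW: graph index, no training step, more memory, strong recall/latency.
hnsw = faiss.IndexHNSWFlat(d, 32)  # 32 = graph connectivity (M)
hnsw.add(xb)

# IVF: clusters vectors into nlist buckets, must be trained first;
# cheaper memory, recall depends on how many buckets you probe.
quantizer = faiss.IndexFlatIP(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 256, faiss.METRIC_INNER_PRODUCT)
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 8  # buckets searched per query: the recall/speed dial

distances, ids = ivf.search(xb[:1], 10)  # top-10 for one query vector
```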
Common misconceptions
- "Vector search replaces keyword search." It doesn't. Hybrid retrieval (BM25 + vector + re-ranker) consistently outperforms pure vector search on real query distributions, especially for rare terms, codes, and proper nouns.
- "All vector databases are equivalent." They aren't. pgvector at 1M docs and Pinecone at 1M docs are similar; pgvector at 100M docs without good HNSW tuning is not. Pick by scale and ops profile.
- "Vector search is fuzzy keyword search." Different mechanism. Fuzzy keyword tolerates typos in lexical match. Vector search retrieves on meaning regardless of typing — "Maria's recipe for paella" can return a document titled "traditional Valencian rice dish" with no lexical overlap.
- "You only need vector search for RAG." RAG is the headline use case but not the only one. Internal site search, product recommendations, duplicate detection, content clustering, and content-gap analysis all run on the same primitive.