GEO & AI Search · Glossary · Updated Apr 2026

Vector search

Definition

Vector search is retrieval by similarity in a high-dimensional embedding space, typically using cosine similarity or dot product. It returns semantically related documents even when no keywords match — the retrieval layer beneath most RAG pipelines and modern semantic search products.

Long definition

Classic keyword search ranks documents by how their lexical terms match a query, weighted by frequency and rarity (TF-IDF, BM25). Vector search does something different: it represents both the query and every document as a vector (an embedding) in the same semantic space, then returns the documents whose vectors sit closest to the query vector.

The closeness measure is almost always one of:

  • Cosine similarity — angle between vectors, ignoring magnitude. Default for most embedding models.
  • Dot product — cosine times magnitudes. Equivalent to cosine on normalized vectors and faster to compute.
  • Euclidean (L2) distance — straight-line distance. Used in some legacy vector setups; less common for normalized embeddings.
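The three measures above can be sketched in a few lines of numpy. The vectors here are toy stand-ins for real embeddings; the final assertion illustrates the equivalence claimed for the dot product on normalized vectors:

```python
import numpy as np

# Toy 3-d vectors standing in for real embeddings (illustrative values).
a = np.array([0.3, 0.8, 0.5])
b = np.array([0.6, 0.4, 0.7])

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # angle only
dot = a @ b                                               # cosine times magnitudes
l2 = np.linalg.norm(a - b)                                # straight-line distance

# After L2-normalizing both vectors, dot product and cosine coincide.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
assert np.isclose(a_n @ b_n, cosine)
```

This is why many vector stores normalize embeddings at ingest time and index on plain dot product: same ranking as cosine, one multiply-add cheaper per dimension.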

The advantage is semantic. Query "low-impact running shoes for flat feet" and a document titled "stability trainers for fallen arches" will rank highly even though the lexical overlap is minimal. The disadvantage is the inverse: exact-match queries (product codes, error messages, proper nouns the model has never seen) underperform. Production systems almost always run hybrid retrieval — BM25 plus vector — and merge the two with a re-ranker.
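One simple way to merge the BM25 list and the vector list is reciprocal rank fusion (RRF), often used alongside or instead of a heavier cross-encoder re-ranker. A minimal sketch, with the document IDs and the `k=60` smoothing constant as illustrative choices:

```python
from collections import defaultdict

def rrf_merge(ranked_lists, k=60):
    """Reciprocal rank fusion: score each doc by the sum of 1/(k + rank)
    over every ranking it appears in, then sort by fused score."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-3 hits from each retriever.
bm25_hits = ["doc_7", "doc_2", "doc_9"]
vector_hits = ["doc_2", "doc_4", "doc_7"]
merged = rrf_merge([bm25_hits, vector_hits])  # docs in both lists rise to the top
```

Documents that rank well in both lists (here `doc_2` and `doc_7`) float above documents that only one retriever found, which is exactly the behavior hybrid retrieval is after.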

The tooling landscape splits four ways:

  • Embedded in the database: pgvector extension for Postgres, native vector types in MongoDB, Elasticsearch, Redis, and Cassandra. Cheapest path when you already run the DB.
  • Managed pure vector services: Pinecone, Weaviate Cloud, Qdrant Cloud, Vespa Cloud. Optimized for scale; integrate with most RAG frameworks out of the box.
  • Self-hosted vector engines: Qdrant, Weaviate, Milvus, Vespa, FAISS as a library. Full control, ops cost.
  • Hybrid search platforms: Typesense, Meilisearch, OpenSearch — keyword-first systems that added vector search.

For RAG, vector search is the retrieval step before the generation step. The LLM never sees your full corpus; it sees the top-k documents the vector search returned. Quality of retrieval caps quality of the answer — a hallucinating model often started with a bad top-k.
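The retrieval step reduces to "find the k nearest document vectors to the query vector." A brute-force sketch with random stand-in embeddings (real systems swap the exhaustive scan for an ANN index such as HNSW or IVF):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=3):
    """Brute-force cosine retrieval: fine for small corpora;
    production systems use an approximate index instead."""
    doc_norms = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = doc_norms @ q              # cosine similarity to every document
    return np.argsort(-sims)[:k]      # indices of the k closest documents

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 32))              # stand-in corpus embeddings
query = docs[42] + 0.01 * rng.normal(size=32)  # query near document 42
hits = top_k(query, docs, k=3)                 # hits[0] == 42
```

Only the rows of `docs` selected by `hits` would be stuffed into the LLM prompt, which is the sense in which retrieval quality caps answer quality.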

Practical scaling note: under 1M documents, almost any tool works. Above 100M, indexing strategy (HNSW vs IVF, quantization, sharding) starts to matter, and the choice between Pinecone-class managed services and self-hosted Qdrant or Vespa becomes load-bearing.
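Of the scaling levers above, quantization is the easiest to show concretely. A sketch of symmetric int8 scalar quantization, which trades a bounded reconstruction error for a 4x memory saving over float32 (the corpus shape and scale scheme here are illustrative; production engines like FAISS or Qdrant implement this, and product quantization, internally):

```python
import numpy as np

rng = np.random.default_rng(1)
vecs = rng.normal(size=(1000, 128)).astype(np.float32)  # stand-in embeddings

# Symmetric scalar quantization: map [-max_abs, max_abs] onto int8's range.
scale = np.abs(vecs).max() / 127.0
q = np.round(vecs / scale).astype(np.int8)   # 1 byte/dim instead of 4
deq = q.astype(np.float32) * scale           # approximate reconstruction

err = np.abs(deq - vecs).max()               # bounded by scale / 2
```

At 100M documents and 1,536 dimensions, that one change is the difference between roughly 600 GB and 150 GB of raw vector storage, which is why it starts mattering exactly where the note above says it does.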

Common misconceptions

  • "Vector search replaces keyword search." It doesn't. Hybrid retrieval (BM25 + vector + re-ranker) consistently outperforms pure vector search on real query distributions, especially for rare terms, codes, and proper nouns.
  • "All vector databases are equivalent." They aren't. pgvector at 1M docs and Pinecone at 1M docs are similar; pgvector at 100M docs without good HNSW tuning is not. Pick by scale and ops profile.
  • "Vector search is fuzzy keyword search." Different mechanism. Fuzzy keyword search tolerates typos in a lexical match. Vector search retrieves on meaning regardless of wording — "Maria's recipe for paella" can return a document titled "traditional Valencian rice dish" with no lexical overlap.
  • "You only need vector search for RAG." RAG is the headline use case but not the only one. Internal site search, product recommendations, duplicate detection, content clustering, and content-gap analysis all run on the same primitive.