Vector Search Isn't a Database

Thesis

Vector search is a retrieval technique. It isn't a database. Treating it as one is the source of most production RAG failures.

Three years into the RAG era, the pattern is clear: teams stand up a vector store, chunk their corpus, embed it, query it by cosine similarity, and ship. Then production happens. A user asks "what was our Q3 2025 revenue?" and the system confidently returns the Q3 2024 number — because in embedding space, 2024 and 2025 are neighbors. A support query mentions "ISO 27001" and the system returns three documents about "security best practices" instead. The one document that actually names the standard gets buried because it doesn't have the richest semantic context.

These aren't edge cases. They're what happens when a retrieval technique gets load-bearing responsibility it was never designed for. The fix isn't a better embedding model. It's architectural: treat embeddings as the recall layer, structure as the correctness layer, and reranking as the precision layer — and never let the vector index act as your system of record.

Prerequisites

You've shipped a RAG or retrieval-augmented system in production
You know what a dense embedding is and how cosine similarity works
You understand the difference between recall (finding relevant items) and precision (the retrieved items being correct)
Ideally, you've felt the specific pain of a vector search returning semantically close but factually wrong results

What vector search is vs what it isn't

Vector search projects text into a high-dimensional space — typically 384 to 4096 floats — where cosine or dot-product similarity approximates semantic proximity. Query the index with an embedded question, get back the k nearest chunks, feed them to an LLM as context. The technique is elegant and genuinely useful.
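As a rough illustration of the query step described above, here is a minimal brute-force sketch. The three-dimensional vectors and chunk names are invented for the example; a real index would hold model-generated embeddings and would normally use an approximate nearest-neighbor structure rather than a full scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Brute-force nearest neighbors: score every chunk, keep the k best."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in index.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]

# Toy "embeddings" (made up); real models emit hundreds or thousands of floats.
index = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.8, 0.2, 0.1],
    "chunk-c": [0.0, 0.1, 0.9],
}
nearest = top_k([1.0, 0.0, 0.0], index, k=2)
```

Note that nothing in this loop knows whether a chunk is true, current, or authorized for the caller. It only knows which vectors point in a similar direction.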

What vector search isn't:

A database. It has no transactional guarantees, no isolation levels, no consistency model. Update semantics are ad-hoc. Deletion is a soft concept.
A source of correctness. Similarity is not truth. Two documents can be cosine-close and factually contradictory.
A substitute for structure. Metadata, relationships, identifiers, and time — the information your business actually runs on — don't survive the embedding well.

The marketing around "vector databases" has conflated these. Pinecone, Weaviate, Chroma, Qdrant, pgvector — all useful tools — are retrieval indexes with persistence. Calling them databases blurs the thing they do well (nearest-neighbor lookup) with things they don't do at all (ACID semantics, structured queries, relational integrity).

The distinction is not pedantic. Nearly every production failure mode traces back to teams treating the retrieval index as the authoritative system.

The genuine pros

Semantic recall at scale. For a request like "find me documents about a topic" across millions of unstructured items, no vector-free approach comes close. Bi-encoder embeddings bootstrap this in an afternoon.

Robustness to surface variation. Typos, paraphrases, multilingual queries, synonyms — embeddings handle these gracefully where keyword approaches break.

Scale. Production systems today serve billion-vector indexes with millisecond p99 latency. The infrastructure has caught up to the ambition.

Low starting effort. Chunk, embed, store, query — a working retrieval layer in a day. That's real value and shouldn't be dismissed by purists.

The genuine cons

Embeddings are lossy by design. A 1,500-word document compressed to a single 1536-dimensional vector loses information. Fine for matching concepts, catastrophic for specifics.

Precision on specifics collapses. An embedding model treats "Q3 2024" and "Q3 2025" as near-identical. It cannot distinguish employee ID E-1047 from E-1074 by meaning, because there is no meaning there to distinguish. The same failure hits product SKUs, version numbers, monetary amounts, ISO standards, legal citations, drug names. The cases where specificity matters most are the cases where pure vector search fails worst.
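One way to guard against this failure is to treat identifiers as exact-match constraints instead of leaving them to similarity. The sketch below is an assumption about how that might look: the regex patterns, the `exact_match_filter` helper, and the sample revenue figures are all invented for illustration.

```python
import re

# Hypothetical identifier patterns; a real system would maintain its own list.
ID_PATTERNS = [
    r"\bE-\d{4}\b",       # employee IDs such as E-1047
    r"\bQ[1-4] \d{4}\b",  # fiscal quarters such as Q3 2025
    r"\bISO \d{4,5}\b",   # standards such as ISO 27001
]

def extract_identifiers(text):
    """Collect every identifier-like token found in the text."""
    found = set()
    for pattern in ID_PATTERNS:
        found.update(re.findall(pattern, text))
    return found

def exact_match_filter(query, candidates):
    """Keep only candidates containing every identifier named in the query."""
    required = extract_identifiers(query)
    if not required:
        return candidates  # nothing specific to enforce
    return [c for c in candidates if required <= extract_identifiers(c)]

# Made-up candidate chunks that a vector search might rank side by side.
candidates = [
    "Q3 2024 revenue was 4.1M.",
    "Q3 2025 revenue was 5.3M.",
]
kept = exact_match_filter("what was our Q3 2025 revenue?", candidates)
```

The filter is deliberately strict: a chunk that is semantically close but names the wrong quarter is dropped, which is the behavior similarity alone cannot provide.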

Drift is silent. The same query can return different scores and rankings month over month as the embedding model is updated or the corpus shifts. Without a structured ground-truth system, you can't tell whether retrieval got better or worse.

Structured questions have no vector answer. Take "show me all invoices over ₹50,000 from Vendor X in Q3." That isn't a nearest-neighbor problem; it's a filter problem. Dress it as a vector query and you'll get documents about Vendor X, not the correct list.
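To make the contrast concrete, here is a small sketch of the same question answered as a filter, using an in-memory SQLite table. The table name, columns, and rows are hypothetical and exist only to show the shape of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id TEXT, vendor TEXT, amount_inr INTEGER, quarter TEXT)"
)
# Made-up rows for illustration.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [
        ("INV-1", "Vendor X", 72000, "Q3"),
        ("INV-2", "Vendor X", 31000, "Q3"),
        ("INV-3", "Vendor Y", 90000, "Q3"),
        ("INV-4", "Vendor X", 65000, "Q2"),
    ],
)

# Every constraint in the question maps to a predicate, so the answer is exact.
rows = conn.execute(
    "SELECT id FROM invoices "
    "WHERE vendor = ? AND amount_inr > ? AND quarter = ? ORDER BY id",
    ("Vendor X", 50000, "Q3"),
).fetchall()
matching_ids = [row[0] for row in rows]
```

The result is a complete and correct list by construction, which is a guarantee a top-k similarity lookup cannot offer.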

Multi-hop reasoning breaks. Questions that chain relationships, such as "who is the manager of the person who approved this filing?", need a graph or a relational schema. Vectors don't model relationships; they model vibes.
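A two-hop question like this one becomes trivial once the relationships are stored explicitly. The sketch below uses plain dictionaries as a stand-in for a graph or relational store; the filing id and the names are invented.

```python
# Hypothetical relational facts, stored as plain lookups.
approved_by = {"filing-88": "priya"}
manager_of = {"priya": "marcus", "marcus": "elena"}

def manager_of_approver(filing_id):
    """Hop 1: filing -> approver. Hop 2: approver -> manager."""
    approver = approved_by.get(filing_id)
    if approver is None:
        return None  # unknown filing: say so instead of guessing
    return manager_of.get(approver)

answer = manager_of_approver("filing-88")
missing = manager_of_approver("filing-99")
```

Each hop is an exact lookup, and an unknown filing yields `None` rather than the nearest plausible-looking person.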

Evaluation is genuinely hard. Without structured ground truth, "good recall" often means "the first draft that didn't obviously embarrass us." Teams ship vector-only RAG systems and measure satisfaction via the absence of complaints.
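A small labeled query set is often enough to replace "absence of complaints" with a number, and re-running it on a schedule is one way to make silent drift visible. The following is a minimal sketch of recall@k and precision@k; the document ids and relevance labels are invented for the example.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top k results that are relevant."""
    top = retrieved[:k]
    if not top:
        return 0.0
    hits = len(set(top) & set(relevant))
    return hits / len(top)

# One labeled query: the ranking the system returned, and the known-good answers.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

recall = recall_at_k(retrieved, relevant, k=3)
precision = precision_at_k(retrieved, relevant, k=3)
```

Tracking both numbers per release, against the same labeled set, gives a baseline to compare against when the embedding model or the corpus changes.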

Our position

Embeddings for recall. Structure for correctness. Reranking for precision. None of the three alone is enough.

The hybrid retrieval cascade that works in production:

Structured filters first. Apply the query's bounded constraints — tenant, date range, entity, document type — at the metadata layer before the vector search runs. This is where correctness enters the system.
Parallel sparse + dense retrieval over the filtered set. BM25 catches exact-keyword matches (identifiers, standards, names) that embeddings blur. Vector search catches semantic matches that keywords miss.
Reciprocal Rank Fusion to combine the two result lists. RRF handles score incompatibility (BM25 and cosine scores live on different scales) without needing hand-tuned weights, because it uses only each document's rank in each list.
Cross-encoder reranker on the top ~100 candidates. BGE-Reranker, Cohere Rerank 3.5, or equivalent. This is the precision layer — a cross-encoder actually reads the query-document pair instead of comparing compressed vectors.
Short, ordered final context to the LLM. Five to ten chunks, best first. More than that and you hit lost-in-the-middle effects where the model ignores the buried relevant item.
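The fusion step in the cascade above can be sketched in a few lines. This follows the formula from the Cormack et al. paper, where each document scores the sum of 1 / (k + rank) across lists, with k = 60 as the constant used there. The document ids and the two input rankings are invented, and the alphabetical tie-break is a choice made for this example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several rankings using ranks only, never raw scores."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical outputs of the sparse and dense retrievers for one query.
bm25_hits = ["doc-iso", "doc-audit", "doc-sec"]
dense_hits = ["doc-sec", "doc-iso", "doc-policy"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Documents that appear near the top of both lists rise, while a document found by only one retriever still survives into the candidate set handed to the reranker.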

The crucial design call: the vector index is a derived, rebuildable view over your system of record. Not the system of record itself. If you can't regenerate the embeddings by replaying your canonical data, the architecture is inverted. When compliance or audit comes asking "why did the system return this?", you need to point to the structured source, not a similarity score.
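One possible shape for a rebuildable index is sketched below. Everything here is an assumption made for illustration: `toy_embed` stands in for a real embedding model, and the record layout, `rebuild_index`, and `stale_ids` are hypothetical names. The point is that every index entry carries a source id and a content hash, so it can be traced back to and checked against the canonical data.

```python
import hashlib

def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def toy_embed(text):
    """Stand-in for a real embedding model: a deterministic toy vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]

def rebuild_index(records, embed=toy_embed):
    """Regenerate the whole index by replaying the canonical records."""
    return {
        r["id"]: {
            "vector": embed(r["text"]),
            "source_id": r["id"],  # provenance for audit questions
            "content_hash": content_hash(r["text"]),
        }
        for r in records
    }

def stale_ids(index, records):
    """Ids that are changed, missing from the index, or orphaned in it."""
    current = {r["id"]: content_hash(r["text"]) for r in records}
    changed = [
        i for i, h in current.items()
        if i not in index or index[i]["content_hash"] != h
    ]
    orphaned = [i for i in index if i not in current]
    return sorted(changed + orphaned)

# Made-up canonical records.
records = [
    {"id": "doc-1", "text": "Policy v1"},
    {"id": "doc-2", "text": "Handbook"},
]
index = rebuild_index(records)

# The system of record changes; the index is now out of date for doc-1.
records[0]["text"] = "Policy v2"
needs_reembedding = stale_ids(index, records)
```

Because the index is a pure function of the records, dropping it and rebuilding is always an option, and deleted source documents show up as orphans instead of lingering silently.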

The reframe worth holding onto

A vector index is a cache with semantics. Like any cache, it needs invalidation, audit, and a canonical source. If the vector store is your only source, you don't have a system — you have a benchmark.

The stakes sharpen in compliance-sensitive or multi-tenant contexts. A vector-only retrieval layer over regulatory documents can return something relevant but cannot guarantee it returned the correct citation. In agent systems, where the control plane is what keeps reach from becoming harm, retrieval inherits the same discipline. Any tool call that pulls from a vector index is pulling from a probabilistic view. Treat its output the way you should treat any untrusted external response, and surround it with structured claims you can actually verify.
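As a hedged sketch of what "verify" could mean in practice, the check below accepts a retrieved chunk only if it points at a canonical record, belongs to the caller's tenant, and quotes text that actually exists in that record. The record layout, the `verify_chunk` helper, and the sample regulation text are all invented for illustration.

```python
# Hypothetical system of record, keyed by source id. The text is made up.
canonical = {
    "reg-12": {
        "tenant": "acme",
        "text": "Section 4.2 requires annual access reviews.",
    },
}

def verify_chunk(chunk, canonical, tenant):
    """Treat a retrieved chunk as untrusted until the source confirms it."""
    record = canonical.get(chunk.get("source_id"))
    if record is None:
        return False  # no structured source to point to
    if record["tenant"] != tenant:
        return False  # belongs to another tenant
    return chunk["text"] in record["text"]  # quoted text must exist verbatim

good = {"source_id": "reg-12", "text": "requires annual access reviews"}
altered = {"source_id": "reg-12", "text": "requires monthly access reviews"}
unknown = {"source_id": "reg-99", "text": "requires annual access reviews"}

results = [
    verify_chunk(good, canonical, tenant="acme"),
    verify_chunk(altered, canonical, tenant="acme"),
    verify_chunk(unknown, canonical, tenant="acme"),
    verify_chunk(good, canonical, tenant="globex"),
]
```

A verbatim substring check is a blunt instrument, but it shows the direction: the decision to trust a chunk rests on the structured record, not on how similar the chunk looked to the query.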

References / further reading

Anthropic, Contextual Retrieval — the chunk-isolation problem and the prepended-context fix. The reference practitioner approach for making embeddings preserve document-level meaning.
Sarmah et al., HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction (arXiv:2408.04948). KG + vector combination with measurable accuracy gains on financial documents.
Redis, RAG at Scale: How to Build Production AI Systems in 2026 (Jan 2026). Dual-pipeline architecture, hybrid retrieval patterns, and production latency numbers.
LlamaIndex 2026 benchmark — structured vs pure-vector precision comparison across metadata-rich corpora.
Cormack, Clarke, Buettcher (2009), Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods. The original RRF paper; still the cleanest combiner for sparse + dense.
Cross-encoder rerankers — BGE-Reranker, Cohere Rerank 3.5. Reference implementations of the precision layer.
