RAG vs Traditional Search: What Changed, What Did Not, and Why BM25 Is Not Dead
Traditional search returns documents. RAG returns answers. That gap sounds simple but it changes everything about how you build, evaluate, and maintain an information retrieval system. This guide breaks down how keyword search, semantic search, and RAG differ, where each wins, and why the best production systems in 2026 combine all three.
In 2016, when a user searched your company's internal wiki for "password reset steps," they got a list of ten documents sorted by keyword relevance. They clicked the first result, skimmed three paragraphs, found the steps, and did the thing.
In 2026, that same query goes into a RAG system. The system retrieves the relevant sections from your documentation. The LLM reads them. It returns: "To reset your password, go to Account Settings, click Security, then select Reset Password. You will receive an email within two minutes."
Same underlying information. Different interface. Different outcome.
The retrieval mechanics have more in common than most people think. What changed is what happens after retrieval — and that change has enormous implications for when you use which system.
What Traditional Search Actually Does
Traditional search is a retrieval and ranking problem. You have a corpus of documents. A user submits a query. The search engine scores every document for relevance to that query and returns a ranked list.
BM25 — Best Match 25 — is the algorithm that has dominated that ranking problem for two decades. It is the default ranking function in Elasticsearch, Apache Solr, and OpenSearch. It evaluates relevance using three signals: how often the query terms appear in the document (term frequency), how rare those terms are across the entire corpus (inverse document frequency), and document length as a normalization factor.
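Those three signals combine into a single per-document score. A minimal sketch of the classic BM25 formula on a toy corpus (hypothetical documents; production engines like Elasticsearch add refinements on top of this core):

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query with the classic BM25 formula."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)                      # term frequency in this document
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # rarity across the corpus
        # Length normalization: b penalizes documents longer than average
        denom = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * (tf * (k1 + 1)) / denom
    return score

corpus = [
    "reset your password in account settings".split(),
    "configure email notification preferences".split(),
    "password policy requires twelve characters".split(),
]
query = "password reset".split()
scores = [bm25_score(query, doc, corpus) for doc in corpus]
# The first document matches both query terms and scores highest;
# the second matches neither and scores zero.
```

Note how purely mechanical this is: the score is arithmetic over token counts, which is exactly why it is fast, interpretable, and blind to meaning.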
BM25 sees text as a bag of tokens. It has no understanding of language, meaning, or intent. It matches words, not concepts. A document that says "myocardial infarction treatment protocol" will rank low for the query "heart attack emergency procedures" even though those phrases describe the same clinical event. The words do not overlap.
Elasticsearch and its Lucene-based relatives power a large share of the world's search infrastructure on the strength of this model. Fast, interpretable, scalable to billions of documents, cheap to run. For use cases where users know the right words and exact matching is what they need — log search, compliance document lookup, product SKU search — BM25 is still hard to beat.
What BM25 cannot do is understand that "money back guarantee" and "refund policy" describe the same thing, or that a question about "what to do if my payment fails" should retrieve a document titled "billing error resolution." For that, you need semantic understanding.
What Semantic Search Adds
Semantic search converts queries and documents into dense vector embeddings — numerical representations that capture meaning rather than surface form. Documents about similar concepts cluster together in vector space regardless of the specific words used. A query about "heart attack emergency procedures" retrieves documents about "myocardial infarction treatment protocol" because those phrases land near each other in the embedding space.
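Nearness in embedding space is typically measured with cosine similarity. A minimal sketch with toy 4-dimensional vectors (real models produce hundreds to thousands of dimensions, and the vectors below are invented for illustration, not output from any actual embedding model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction in embedding space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: the two clinically synonymous phrases are assumed to
# land near each other; the unrelated phrase points elsewhere.
emb = {
    "heart attack emergency procedures": np.array([0.9, 0.1, 0.8, 0.2]),
    "myocardial infarction treatment protocol": np.array([0.85, 0.15, 0.75, 0.25]),
    "quarterly revenue report": np.array([0.1, 0.9, 0.05, 0.95]),
}
query = emb["heart attack emergency procedures"]
sims = {
    phrase: cosine_similarity(query, vec)
    for phrase, vec in emb.items()
    if phrase != "heart attack emergency procedures"
}
# The medical phrase scores far higher than the unrelated one,
# despite zero word overlap with the query.
```

The retrieval step is then a nearest-neighbor lookup: embed the query, find the document vectors with the highest similarity.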
This is powerful. It is also not a replacement for keyword search. Semantic search has a mirror-image weakness to BM25. Where BM25 misses synonyms and paraphrases, semantic search misses exact terms. A product code like AX-7200-PRO, an error string like ECONNREFUSED, or a statute number like § 1983 may embed poorly relative to their meaning in context. The embedding model tokenizes these identifiers into subword pieces that do not cluster near the exact string a user searches for.
This is the core tension that drove the development of hybrid retrieval — and it is the reason BM25 is more alive than ever in 2026.
The Three Retrieval Models Side by Side
| Dimension | Keyword Search (BM25) | Semantic Search (Dense) | Hybrid (BM25 + Dense) |
|---|---|---|---|
| Matching mechanism | Exact token overlap | Vector similarity in embedding space | Both, merged via RRF |
| Handles synonyms | No | Yes | Yes |
| Handles exact identifiers | Yes | Inconsistent | Yes |
| Latency | 10 to 50ms | 100 to 500ms (unoptimized) | 100 to 600ms |
| Index build cost | Low — pure arithmetic | High — embedding API call per chunk | Moderate — both indexes |
| Interpretability | High — term frequency scores | Low — vector distances | Moderate |
| Recall (typical production) | 62% top-10 | 71% top-10 | 87% top-10 |
| Best for | Exact queries, identifiers, logs | Conceptual questions, paraphrases | Most production RAG systems |
One 2025 hybrid search implementation benchmark reported BM25 alone surfacing 62% of user-relevant documents in the top 10 results, semantic search alone 71%, and hybrid BM25 plus semantic plus reranking 87%. That 16-point recall gap between pure semantic and hybrid is not marginal. At 100,000 queries per day, it is the difference between 71,000 and 87,000 users getting a useful answer.
What RAG Adds on Top of Retrieval
RAG does not replace the retrieval layer. It extends it.
Traditional search returns a list of documents ranked by relevance. The user reads those results. RAG takes the top-ranked results, passes them to a language model as context, and asks the model to synthesize a direct answer from what it retrieved. The user receives an answer, not a list of documents.
Traditional Search Flow
+---------------------------+
| User Query |
| | |
| v |
| BM25 / Vector Index |
| | |
| v |
| Ranked Document List | <-- user reads this and extracts answer
+---------------------------+
RAG Flow
+---------------------------+
| User Query |
| | |
| v |
| BM25 + Vector Index |
| (hybrid retrieval) |
| | |
| v |
| Reranker |
| | |
| v |
| Top-k Chunks (context) |
| | |
| v |
| LLM Generation |
| | |
| v |
| Synthesized Answer | <-- user reads this
| + Source Citations |
+---------------------------+

The retrieval step inside RAG is identical in structure to traditional search. What changes is the terminal step. Instead of presenting documents for the user to read, RAG feeds those documents to a model that reads them on the user's behalf and produces a direct, synthesized, cited answer.
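The RAG flow above can be sketched in a few lines. Here `retrieve` is a hypothetical stand-in for a real hybrid retriever and the prompt assembly shows how retrieved chunks become grounded context; the knowledge-base contents and file names are invented for illustration:

```python
def retrieve(query: str, top_k: int = 3) -> list[dict]:
    """Stand-in retriever returning chunks with source metadata."""
    knowledge_base = [
        {"source": "account-settings.md",
         "text": "Go to Account Settings, click Security, select Reset Password."},
        {"source": "email-faq.md",
         "text": "Reset emails arrive within two minutes."},
    ]
    return knowledge_base[:top_k]

def build_prompt(query: str, chunks: list[dict]) -> str:
    """Assemble retrieved chunks into a grounded prompt with source tags."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return (
        "Answer using only the context below. Cite sources in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

chunks = retrieve("how do I reset my password")
prompt = build_prompt("how do I reset my password", chunks)
# `prompt` would now go to the LLM; the answer comes back with citations.
```

The generation call itself is the only genuinely new piece relative to traditional search — everything before it is retrieval and ranking.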
This has consequences in both directions.
RAG wins when: The user needs an answer, not a document. When the relevant information is spread across multiple sections of multiple documents and the user would need to read and cross-reference all of them to construct the answer themselves. When the query is conversational and contextual, building on previous turns. When traceability and citation of sources matter for compliance.
Traditional search wins when: The user needs the source document, not a summary. When the corpus has very high query volume and sub-second response is required. When exact document retrieval is the goal — a compliance audit that requires the specific PDF, not a model's interpretation of it. When the cost per query of running an LLM at scale is unjustifiable for the value it adds.
How BM25 Became a First-Class RAG Component
The early assumption in RAG system design was that semantic retrieval would replace keyword search. Dense embeddings capture meaning better than term frequency. Why keep a legacy system around?
Practice proved that assumption wrong within roughly the first six months of production deployments.
BM25 keyword matching is fast and precise for exact-match queries. Semantic search via dense embeddings captures meaning. Neither is wrong. They fail on different query types. Every production RAG system that needs to serve a broad range of user queries needs both.
Reciprocal Rank Fusion (RRF) is the standard way to merge them. Both BM25 and dense retrieval produce independent ranked lists. RRF scores each document based on its position in each list. A document that ranks 2nd in the dense list and 8th in the BM25 list gets a higher combined score than one that ranks 1st in the dense list but is absent from BM25.
```python
from rank_bm25 import BM25Okapi
from qdrant_client import QdrantClient
import numpy as np


def reciprocal_rank_fusion(
    dense_results: list[tuple[str, float]],  # (doc_id, score)
    bm25_results: list[tuple[str, float]],   # (doc_id, score)
    k: int = 60,  # RRF constant -- 60 is the standard value
) -> list[tuple[str, float]]:
    """
    Merge two ranked lists using Reciprocal Rank Fusion.

    k=60 is the standard constant that prevents top-ranked items from
    dominating. Returns the merged list sorted by combined RRF score,
    descending.
    """
    rrf_scores: dict[str, float] = {}

    # Score from dense retrieval ranks
    for rank, (doc_id, _) in enumerate(dense_results):
        rrf_scores[doc_id] = rrf_scores.get(doc_id, 0.0) + 1 / (k + rank + 1)

    # Score from BM25 retrieval ranks
    for rank, (doc_id, _) in enumerate(bm25_results):
        rrf_scores[doc_id] = rrf_scores.get(doc_id, 0.0) + 1 / (k + rank + 1)

    # Sort by combined score descending
    return sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)


def hybrid_retrieve(
    query: str,
    corpus: list[str],
    doc_ids: list[str],
    embedding_fn,
    vector_db_client: QdrantClient,
    top_k: int = 20,
) -> list[dict]:
    """Full hybrid retrieval: BM25 + dense vector, merged with RRF."""
    # BM25 retrieval over a whitespace-tokenized corpus
    tokenized_corpus = [doc.split() for doc in corpus]
    bm25 = BM25Okapi(tokenized_corpus)
    bm25_scores = bm25.get_scores(query.split())

    # Top-k BM25 results as (doc_id, score) tuples
    bm25_top_indices = np.argsort(bm25_scores)[::-1][:top_k]
    bm25_results = [(doc_ids[i], bm25_scores[i]) for i in bm25_top_indices]

    # Dense vector retrieval
    query_vector = embedding_fn(query)
    dense_hits = vector_db_client.search(
        collection_name="knowledge_base",
        query_vector=query_vector,
        limit=top_k,
        with_payload=True,
    )
    dense_results = [(hit.id, hit.score) for hit in dense_hits]

    # Merge with RRF and return the top-k fused results
    merged = reciprocal_rank_fusion(dense_results, bm25_results)
    return [{"doc_id": doc_id, "rrf_score": score} for doc_id, score in merged[:top_k]]
```

The RRF constant k=60 is not arbitrary. Lower values give more weight to top-ranked documents and punish lower-ranked ones more aggressively. Higher values smooth out the differences. At k=60, a document ranked 1st contributes a score of 1/61 ≈ 0.016. A document ranked 10th contributes 1/70 ≈ 0.014. The scores stay close enough that a document ranked well in both lists reliably beats one ranked first in only one. That is the behavior you want for fusion.
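You can see the effect of k directly by computing the score ratio between rank 1 and rank 10 at a few values, using the same 1/(k + rank) form as the fusion function (rank counted from 1):

```python
# How the RRF constant k shapes the gap between rank 1 and rank 10.
ratios = {}
for k in (1, 10, 60, 200):
    top, tenth = 1 / (k + 1), 1 / (k + 10)
    ratios[k] = top / tenth  # equals (k + 10) / (k + 1)
    print(f"k={k:>3}: rank1={top:.4f} rank10={tenth:.4f} ratio={ratios[k]:.2f}")
```

At k=1 the top result is worth 5.5x the tenth; at k=60 the ratio is about 1.15, and by k=200 rank position barely matters. k=60 sits in the range where position matters but agreement between the two lists matters more.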
Where Traditional Search Still Wins Outright
BM25 keyword matching is fast and precise: 10 to 50 milliseconds even on 10GB document collections. A basic RAG pipeline adds the embedding computation plus the LLM generation step, pushing total end-to-end latency to 1 to 3 seconds for a straightforward query. With reranking, 1.5 to 3.5 seconds. With agentic multi-hop retrieval, 5 to 15 seconds.
For use cases where that latency gap matters, and where the generation step adds no value, traditional search is the right architecture.
| Use Case | Traditional Search | RAG | Winner |
|---|---|---|---|
| E-commerce product catalog search | Fast, exact match on SKU and attributes | LLM generation adds latency, no value | Traditional Search |
| Log analysis and monitoring | Sub-millisecond exact match, aggregation | Generation not needed | Traditional Search |
| Compliance document retrieval | User needs the exact PDF, not a summary | Citation to specific document matters but generation adds risk | Traditional Search |
| Internal knowledge base Q&A | Relevant but spread across 5 docs | Synthesis saves user 20 minutes of reading | RAG |
| Customer support chatbot | User wants an answer, not a list | Direct answer with source reduces support load | RAG |
| Legal case research | Exact case citation retrieval | Multi-document synthesis for complex questions | Both |
| Code documentation search | Exact function name or error matching | Conversational explanation with context | Hybrid |
| Medical literature review | Exact term matching for drug names | Cross-document synthesis for clinical questions | Both |
The Migration Path: Elasticsearch to RAG
Most organizations already have an Elasticsearch or Solr deployment that handles their internal search. The question is not "should we replace it with RAG?" It is "how do we layer RAG generation on top of what we already have?"
Elasticsearch added native vector search support in version 8.x and supports hybrid retrieval combining BM25 with dense vectors through its own RRF implementation. Solr, built on Apache Lucene — the same foundation as Elasticsearch — has powered enterprise search since 2006 and has also added vector search capabilities.
Teams running existing Elasticsearch infrastructure can add RAG without replacing their search layer. Elasticsearch handles retrieval. An LLM handles generation from those results. The practical architecture:
Existing Elasticsearch Index
|
v
Hybrid Query (BM25 + KNN vector) <-- add embedding column + vector index
|
v
Top-k Results (with source metadata)
|
v
LLM Generation Layer (new) <-- this is what RAG adds
|
v
Synthesized Answer + Citations

The main changes required: add an embedding step to your ingestion pipeline so each document gets a dense vector stored alongside its text. Enable the KNN search feature in your Elasticsearch index. Add an LLM generation step after retrieval. That is the core migration.
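The hybrid query at the center of that architecture can be sketched as an Elasticsearch 8.x request body combining a BM25 `match` clause with a `knn` clause, fused server-side with Elasticsearch's native RRF. The index and field names (`knowledge_base`, `text`, `embedding`) and the vector dimension are assumptions for illustration:

```python
def build_hybrid_request(query_text: str, query_vector: list[float], top_k: int = 10) -> dict:
    """Build an Elasticsearch 8.x hybrid search request body (BM25 + kNN + RRF)."""
    return {
        "query": {"match": {"text": query_text}},  # BM25 side
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": top_k,
            "num_candidates": top_k * 10,  # wider candidate pool for the kNN search
        },
        "rank": {"rrf": {}},  # server-side reciprocal rank fusion
        "size": top_k,
        "_source": ["text", "source"],
    }

body = build_hybrid_request("password reset steps", [0.1] * 768)
# This body would be sent via something like:
#   es.search(index="knowledge_base", body=body)
```

Note that `rank: {rrf: {}}` requires Elasticsearch 8.8 or later; on earlier 8.x versions you would fuse the two result lists client-side, as in the RRF function shown earlier.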
Organizations that already run Elasticsearch for structured operational search can keep it for those workloads and add a RAG layer over the unstructured documentation subset — the product manuals, policy documents, and knowledge base articles that benefit from conversational answer generation. This avoids the operational complexity of replacing a working system while capturing the value of RAG for the use cases that justify it.
How Evaluation Differs
Traditional search evaluation uses information retrieval metrics: precision (fraction of retrieved documents that are relevant), recall (fraction of all relevant documents that were retrieved), MRR (Mean Reciprocal Rank — how high does the first relevant result appear), and NDCG (Normalized Discounted Cumulative Gain — a weighted measure of ranked list quality).
RAG evaluation requires a different framework because the output is a generated answer, not a ranked list. RAGAS measures faithfulness (whether generated claims are grounded in retrieved context), answer relevancy, context precision, and context recall. Traditional IR metrics apply to the retrieval component of a RAG pipeline. RAGAS metrics apply to the end-to-end pipeline.
| Metric Type | What It Measures | Used For |
|---|---|---|
| Precision@k | Fraction of top-k results that are relevant | Retrieval layer only |
| Recall@k | Fraction of all relevant docs in top-k | Retrieval layer only |
| MRR | Average rank of first relevant result | Retrieval layer only |
| NDCG | Ranked list quality weighted by position | Retrieval layer only |
| RAGAS Faithfulness | Answer grounded in retrieved context | Full RAG pipeline |
| RAGAS Answer Relevancy | Answer addresses the question | Full RAG pipeline |
| RAGAS Context Precision | Retrieved chunks are relevant | Retrieval layer, RAG context |
| RAGAS Context Recall | Right chunks were retrieved | Retrieval layer, RAG context |
A complete evaluation framework for a hybrid RAG system runs both sets of metrics. Retrieval metrics tell you whether the right documents are surfacing at the right ranks. RAGAS metrics tell you whether the end-to-end system is producing correct, grounded answers. Low context precision in RAGAS combined with good precision@10 in IR evaluation points to a reranking problem — the right documents are in the top 10 but are not being placed at the top of the context where the LLM can attend to them.
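The retrieval-layer metrics in the table are simple to implement directly. A minimal sketch, evaluated on a toy ranked list where "d2" and "d5" are the relevant documents:

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result (MRR averages this over queries)."""
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1 / i
    return 0.0

ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d2", "d5"}
# precision@3 = 1/3, recall@3 = 1/2, reciprocal rank = 1/2
```

Running these against the retrieval component and RAGAS against the full pipeline is what lets you localize a failure to retrieval, reranking, or generation.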
What the Numbers Say in 2026
Absolute figures vary across 2025 production hybrid search benchmarks, but the direction is consistent and significant: in one benchmark set, recall increases from approximately 0.72 with BM25 alone to 0.91 with hybrid retrieval, and precision improves from 0.68 to 0.87.
Enterprise intent to adopt hybrid retrieval more than tripled, from 10.3% to 33.3%, in a single quarter in late 2025, driven by teams hitting the limits of pure semantic search in production. The long-context-as-dominant-architecture position — the idea that million-token context windows would make retrieval unnecessary — collapsed from 15.5% to 3.5% in that same period.
Traditional search is not going away. It is becoming a component inside RAG systems rather than the terminal step. BM25 is not dead — it is the part of hybrid retrieval that catches what semantic search misses.
The Practical Decision
The decision between traditional search and RAG is not binary for most organizations. It is a question of what you add on top of what you already have, and where the generation layer earns its cost.
If users need documents, use traditional search. If users need answers synthesized from documents, use RAG. If your corpus mixes structured operational data with unstructured knowledge, use traditional search for the former and RAG for the latter.
And whichever retrieval layer you build, use hybrid search. The 16-point recall gap between pure semantic and hybrid is too large to ignore in any system where retrieval quality is the product.
For how the hybrid retrieval layer fits into the full RAG architecture — including chunking, embedding model selection, and reranking — read RAG Architecture Explained. For how the vector database handles the dense retrieval side of hybrid search, see Vector Database in RAG. For what happens when retrieval produces the wrong results despite hybrid search, see Why RAG Fails.
The final article in this series covers the embedding models that power both the dense retrieval side of hybrid search and the semantic understanding that makes RAG meaningfully better than pure keyword search: How Embeddings Work in RAG.
If you are still deciding whether RAG is the right architecture for your problem at all, start with What Is RAG in AI and RAG vs Fine-Tuning.
Krunal Kanojiya
Technical Content Writer
Technical Content Writer and former software developer from India. I write in-depth articles on blockchain, AI/ML, data engineering, web development, and developer careers. Currently at Lucent Innovation, previously at Cromtek Solution and freelance.