Tech · 15 min read · 2,892 words

RAG vs Traditional Search: What Changed, What Did Not, and Why BM25 Is Not Dead

Traditional search returns documents. RAG returns answers. That gap sounds simple but it changes everything about how you build, evaluate, and maintain an information retrieval system. This guide breaks down how keyword search, semantic search, and RAG differ, where each wins, and why the best production systems in 2026 combine all three.

Krunal Kanojiya · May 08, 2026

#rag #traditional-search #bm25 #keyword-search #semantic-search #elasticsearch #hybrid-search #retrieval-augmented-generation #vector-search #llm

In 2016, when a user searched your company's internal wiki for "password reset steps," they got a list of ten documents sorted by keyword relevance. They clicked the first result, skimmed three paragraphs, found the steps, and did the thing.

In 2026, that same query goes into a RAG system. The system retrieves the relevant sections from your documentation. The LLM reads them. It returns: "To reset your password, go to Account Settings, click Security, then select Reset Password. You will receive an email within two minutes."

Same underlying information. Different interface. Different outcome.

The retrieval mechanics have more in common than most people think. What changed is what happens after retrieval — and that change has enormous implications for when you use which system.

What Traditional Search Actually Does

Traditional search is a retrieval and ranking problem. You have a corpus of documents. A user submits a query. The search engine scores every document for relevance to that query and returns a ranked list.

BM25 — Best Match 25 — is the algorithm that has dominated that ranking problem for two decades. It is the default ranking function in Elasticsearch, Apache Solr, and OpenSearch. It evaluates relevance using three signals: how often the query terms appear in the document (term frequency), how rare those terms are across the entire corpus (inverse document frequency), and document length as a normalization factor.
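
For reference, this is the standard scoring function (the Lucene-style variant, with the typical defaults k1 ≈ 1.2 and b ≈ 0.75):

latex
\text{score}(D, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot
  \frac{f(t, D)\,(k_1 + 1)}{f(t, D) + k_1\left(1 - b + b\,\frac{|D|}{\text{avgdl}}\right)}
\qquad
\mathrm{IDF}(t) = \ln\!\left(\frac{N - n(t) + 0.5}{n(t) + 0.5} + 1\right)

Here f(t, D) is how often term t appears in document D, |D| is the document length, avgdl is the average document length, N is the number of documents in the corpus, and n(t) is the number of documents containing t.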

BM25 sees text as a bag of tokens. It has no understanding of language, meaning, or intent. It matches words, not concepts. A document that says "cardiac arrest treatment protocol" will rank low for the query "heart failure emergency procedures" even though those phrases describe the same clinical situation. The words do not overlap.
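
A minimal sketch of that behavior using the rank_bm25 package (the same library used in the hybrid retrieval code later in this article); the two-document corpus is invented for illustration:

python
from rank_bm25 import BM25Okapi

corpus = [
    "cardiac arrest treatment protocol",          # same concept, different words
    "heart failure emergency procedures guide",   # shares tokens with the query
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

query = "heart failure emergency procedures"
print(bm25.get_scores(query.split()))
# The first document scores 0.0: no shared tokens, so BM25 cannot see the match.
# The second scores well above zero: four of its tokens match the query exactly.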

Elasticsearch has powered over 80% of the world's search infrastructure on the strength of this model. Fast, interpretable, scalable to billions of documents, cheap to run. For use cases where users know the right words and exact matching is what they need — log search, compliance document lookup, product SKU search — BM25 is still hard to beat.

What BM25 cannot do is understand that "money back guarantee" and "refund policy" describe the same thing, or that a question about "what to do if my payment fails" should retrieve a document titled "billing error resolution." For that, you need semantic understanding.

What Semantic Search Adds

Semantic search converts queries and documents into dense vector embeddings — numerical representations that capture meaning rather than surface form. Documents about similar concepts cluster together in vector space regardless of the specific words used. A query about "heart failure emergency procedures" retrieves documents about "cardiac arrest treatment protocol" because those phrases land near each other in the embedding space.

Vector search can match "cardiac arrest" to a document about "heart failure" even though none of the words overlap, because the embedding model has learned that these concepts live close together in semantic space.
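
A quick way to see this, assuming the sentence-transformers package and the small general-purpose all-MiniLM-L6-v2 model (any embedding model behaves similarly):

python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

embeddings = model.encode([
    "heart failure emergency procedures",   # the query
    "cardiac arrest treatment protocol",    # related concept, zero token overlap
    "quarterly revenue report template",    # unrelated document
])

# Cosine similarity between the query and each document
print(util.cos_sim(embeddings[0], embeddings[1]))  # relatively high: related concepts
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated content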

This is powerful. It is also not a replacement for keyword search. Semantic search has a mirror-image weakness to BM25. Where BM25 misses synonyms and paraphrases, semantic search misses exact terms. A product code like AX-7200-PRO, an error string like ECONNREFUSED, or a statute number like § 1983 may embed poorly relative to their meaning in context. The embedding model tokenizes these identifiers into subword pieces that do not cluster near the exact string a user searches for.

This is the core tension that drove the development of hybrid retrieval — and it is the reason BM25 is more alive than ever in 2026.

The Three Retrieval Models Side by Side

| Dimension | Keyword Search (BM25) | Semantic Search (Dense) | Hybrid (BM25 + Dense) |
|---|---|---|---|
| Matching mechanism | Exact token overlap | Vector similarity in embedding space | Both, merged via RRF |
| Handles synonyms | No | Yes | Yes |
| Handles exact identifiers | Yes | Inconsistent | Yes |
| Latency | 10 to 50 ms | 100 to 500 ms (unoptimized) | 100 to 600 ms |
| Index build cost | Low — pure arithmetic | High — embedding API call per chunk | Moderate — both indexes |
| Interpretability | High — term frequency scores | Low — vector distances | Moderate |
| Recall (typical production) | 62% top-10 | 71% top-10 | 87% top-10 |
| Best for | Exact queries, identifiers, logs | Conceptual questions, paraphrases | Most production RAG systems |

A 2025 hybrid search implementation benchmark showed: BM25 alone at 62% of user-relevant documents in top 10 results, semantic search alone at 71%, and hybrid BM25 plus semantic plus reranking at 87%. That 16-point recall gap between pure semantic and hybrid is not marginal. At 100,000 queries per day, it is the difference between 71,000 and 87,000 users getting a useful answer.

What RAG Adds on Top of Retrieval

RAG does not replace the retrieval layer. It extends it.

Traditional search returns a list of documents ranked by relevance. The user reads those results. RAG takes the top-ranked results, passes them to a language model as context, and asks the model to synthesize a direct answer from what it retrieved. The user receives an answer, not a list of documents.

plaintext
Traditional Search Flow
+---------------------------+
|  User Query               |
|          |                |
|          v                |
|  BM25 / Vector Index      |
|          |                |
|          v                |
|  Ranked Document List     |  <-- user reads this and extracts answer
+---------------------------+

RAG Flow
+---------------------------+
|  User Query               |
|          |                |
|          v                |
|  BM25 + Vector Index      |
|  (hybrid retrieval)       |
|          |                |
|          v                |
|  Reranker                 |
|          |                |
|          v                |
|  Top-k Chunks (context)   |
|          |                |
|          v                |
|  LLM Generation           |
|          |                |
|          v                |
|  Synthesized Answer       |  <-- user reads this
|  + Source Citations       |
+---------------------------+

The retrieval step inside RAG is identical in structure to traditional search. What changes is the terminal step. Instead of presenting documents for the user to read, RAG feeds those documents to a model that reads them on the user's behalf and produces a direct, synthesized, cited answer.
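
A minimal sketch of that terminal step, assuming the openai Python client and a hypothetical chunk schema with source and text fields; any chat-completion API works the same way:

python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_answer(query: str, chunks: list[dict]) -> str:
    """Synthesize a cited answer from retrieved chunks ({"source": ..., "text": ...})."""
    context = "\n\n".join(
        f"[{i + 1}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any instruction-following model works here
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the question using only the numbered context passages. "
                    "Cite passages as [1], [2], and so on. If the context does not "
                    "contain the answer, say so."
                ),
            },
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content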

This has consequences in both directions.

RAG wins when: The user needs an answer, not a document. When the relevant information is spread across multiple sections of multiple documents and the user would need to read and cross-reference all of them to construct the answer themselves. When the query is conversational and contextual, building on previous turns. When traceability and citation of sources matter for compliance.

Traditional search wins when: The user needs the source document, not a summary. When the corpus has very high query volume and sub-second response is required. When exact document retrieval is the goal — a compliance audit that requires the specific PDF, not a model's interpretation of it. When the cost per query of running an LLM at scale is unjustifiable for the value it adds.

How BM25 Became a First-Class RAG Component

The early assumption in RAG system design was that semantic retrieval would replace keyword search. Dense embeddings capture meaning better than term frequency. Why keep a legacy system around?

Practice proved that assumption wrong inside of about six months of production deployments.

BM25 keyword matching is fast and precise for exact-match queries. Semantic search via dense embeddings captures meaning. Neither is wrong. They fail on different query types. Every production RAG system that needs to serve a broad range of user queries needs both.

Reciprocal Rank Fusion (RRF) is the standard way to merge them. Both BM25 and dense retrieval produce independent ranked lists. RRF scores each document based on its position in each list. A document that ranks 2nd in dense and 8th in BM25 gets a higher combined score than one that ranks 1st in only dense and absent from BM25.

python
from rank_bm25 import BM25Okapi
from qdrant_client import QdrantClient
import numpy as np

def reciprocal_rank_fusion(
    dense_results: list[tuple[str, float]],   # (doc_id, score)
    bm25_results: list[tuple[str, float]],    # (doc_id, score)
    k: int = 60   # RRF constant — 60 is standard
) -> list[tuple[str, float]]:
    """
    Merge two ranked lists using Reciprocal Rank Fusion.
    k=60 is the standard constant that prevents top-ranked items from dominating.
    Returns merged list sorted by combined RRF score descending.
    """
    rrf_scores: dict[str, float] = {}

    # Score from dense retrieval ranks
    for rank, (doc_id, _) in enumerate(dense_results):
        rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank + 1)

    # Score from BM25 retrieval ranks
    for rank, (doc_id, _) in enumerate(bm25_results):
        rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank + 1)

    # Sort by combined score descending
    merged = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
    return merged


def hybrid_retrieve(
    query: str,
    corpus: list[str],
    doc_ids: list[str],
    embedding_fn,
    vector_db_client: QdrantClient,
    top_k: int = 20
) -> list[dict]:
    """
    Full hybrid retrieval: BM25 + dense vector, merged with RRF.
    """
    # BM25 retrieval
    tokenized_corpus = [doc.split() for doc in corpus]
    bm25 = BM25Okapi(tokenized_corpus)
    bm25_scores = bm25.get_scores(query.split())
    # Get top-k BM25 results as (doc_id, score) tuples
    bm25_top_indices = np.argsort(bm25_scores)[::-1][:top_k]
    bm25_results = [(doc_ids[i], bm25_scores[i]) for i in bm25_top_indices]

    # Dense vector retrieval (assumes Qdrant point IDs match the doc_ids used
    # for BM25, so RRF can merge the two ranked lists by a shared key)
    query_vector = embedding_fn(query)
    dense_hits = vector_db_client.search(
        collection_name="knowledge_base",
        query_vector=query_vector,
        limit=top_k,
        with_payload=True
    )
    dense_results = [(hit.id, hit.score) for hit in dense_hits]

    # Merge with RRF
    merged = reciprocal_rank_fusion(dense_results, bm25_results)

    return [
        {"doc_id": doc_id, "rrf_score": score}
        for doc_id, score in merged[:top_k]
    ]
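
Hypothetical usage, assuming a populated Qdrant collection named knowledge_base, chunk texts and IDs from your ingestion pipeline, and an embed() function that matches the model used at index time:

python
client = QdrantClient(host="localhost", port=6333)

hits = hybrid_retrieve(
    query="what to do if my payment fails",
    corpus=chunk_texts,       # list[str], the same chunks stored in Qdrant
    doc_ids=chunk_ids,        # list[str], aligned with corpus
    embedding_fn=embed,       # hypothetical: returns the query vector as list[float]
    vector_db_client=client,
)
print(hits[:3])  # top chunks by combined RRF score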

The RRF constant k=60 is not arbitrary. Lower values give more weight to top-ranked documents and punish lower-ranked ones more aggressively. Higher values smooth out the differences. At k=60, a document ranked 1st contributes a score of 1/61 ≈ 0.016. A document ranked 10th contributes 1/70 ≈ 0.014. The scores stay close enough that a document ranked well in both lists reliably beats one ranked first in only one. That is the behavior you want for fusion.

Where Traditional Search Still Wins Outright

BM25 keyword matching is fast and precise: 10 to 50 milliseconds even on 10GB document collections. A basic RAG pipeline adds the embedding computation plus the LLM generation step, pushing total end-to-end latency to 1 to 3 seconds for a straightforward query. With reranking, 1.5 to 3.5 seconds. With agentic multi-hop retrieval, 5 to 15 seconds.

For use cases where that latency gap matters, and where the generation step adds no value, traditional search is the right architecture.

| Use Case | Traditional Search | RAG | Winner |
|---|---|---|---|
| E-commerce product catalog search | Fast, exact match on SKU and attributes | LLM generation adds latency, no value | Traditional Search |
| Log analysis and monitoring | Sub-millisecond exact match, aggregation | Generation not needed | Traditional Search |
| Compliance document retrieval | User needs the exact PDF, not a summary | Citation to specific document matters but generation adds risk | Traditional Search |
| Internal knowledge base Q&A | Relevant but spread across 5 docs | Synthesis saves the user 20 minutes of reading | RAG |
| Customer support chatbot | User wants an answer, not a list | Direct answer with source reduces support load | RAG |
| Legal case research | Exact case citation retrieval | Multi-document synthesis for complex questions | Both |
| Code documentation search | Exact function name or error matching | Conversational explanation with context | Hybrid |
| Medical literature review | Exact term matching for drug names | Cross-document synthesis for clinical questions | Both |

The Migration Path: Elasticsearch to RAG

Most organizations already have an Elasticsearch or Solr deployment that handles their internal search. The question is not "should we replace it with RAG?" It is "how do we layer RAG generation on top of what we already have?"

Elasticsearch added native vector search support in version 8.x and supports hybrid retrieval combining BM25 with dense vectors through its own RRF implementation. Solr, built on Apache Lucene — the same foundation as Elasticsearch — has powered enterprise search since 2006 and has also added vector search capabilities.

Teams running existing Elasticsearch infrastructure can add RAG without replacing their search layer. Elasticsearch handles retrieval. An LLM handles generation from those results. The practical architecture:

plaintext
Existing Elasticsearch Index
          |
          v
Hybrid Query (BM25 + KNN vector)  <-- add embedding column + vector index
          |
          v
Top-k Results (with source metadata)
          |
          v
LLM Generation Layer (new)        <-- this is what RAG adds
          |
          v
Synthesized Answer + Citations

The main changes required: add an embedding step to your ingestion pipeline so each document gets a dense vector stored alongside its text. Enable the KNN search feature in your Elasticsearch index. Add an LLM generation step after retrieval. That is the core migration.
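
A sketch of what that hybrid query might look like from Python, assuming the elasticsearch 8.x client, an index named kb with a dense_vector field called embedding, and a hypothetical embed() helper; exact parameters vary by Elasticsearch version:

python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query = "what to do if my payment fails"
query_vector = embed(query)  # hypothetical: same embedding model used at ingestion

response = es.search(
    index="kb",
    query={"match": {"text": query}},   # BM25 side of the hybrid query
    knn={                               # dense vector side (KNN search)
        "field": "embedding",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,
    },
    size=10,
)

# Top hits, with source metadata, become the context passed to the new LLM layer
contexts = [hit["_source"]["text"] for hit in response["hits"]["hits"]]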

Organizations that already run Elasticsearch for structured operational search can keep it for those workloads and add a RAG layer over the unstructured documentation subset — the product manuals, policy documents, and knowledge base articles that benefit from conversational answer generation. This avoids the operational complexity of replacing a working system while capturing the value of RAG for the use cases that justify it.

How Evaluation Differs

Traditional search evaluation uses information retrieval metrics: precision (fraction of retrieved documents that are relevant), recall (fraction of all relevant documents that were retrieved), MRR (Mean Reciprocal Rank — how high does the first relevant result appear), and NDCG (Normalized Discounted Cumulative Gain — a weighted measure of ranked list quality).

RAG evaluation requires a different framework because the output is a generated answer, not a ranked list. RAGAS measures faithfulness (whether generated claims are grounded in retrieved context), answer relevancy, context precision, and context recall. Traditional IR metrics apply to the retrieval component of a RAG pipeline. RAGAS metrics apply to the end-to-end pipeline.

| Metric Type | What It Measures | Used For |
|---|---|---|
| Precision@k | Fraction of top-k results that are relevant | Retrieval layer only |
| Recall@k | Fraction of all relevant docs in top-k | Retrieval layer only |
| MRR | Mean reciprocal rank of the first relevant result | Retrieval layer only |
| NDCG | Ranked list quality weighted by position | Retrieval layer only |
| RAGAS Faithfulness | Answer grounded in retrieved context | Full RAG pipeline |
| RAGAS Answer Relevancy | Answer addresses the question | Full RAG pipeline |
| RAGAS Context Precision | Retrieved chunks are relevant | Retrieval layer, RAG context |
| RAGAS Context Recall | Right chunks were retrieved | Retrieval layer, RAG context |

A complete evaluation framework for a hybrid RAG system runs both sets of metrics. Retrieval metrics tell you whether the right documents are surfacing at the right ranks. RAGAS metrics tell you whether the end-to-end system is producing correct, grounded answers. Low context precision in RAGAS combined with good precision@10 in IR evaluation points to a reranking problem — the right documents are in the top 10 but are not being placed at the top of the context where the LLM can attend to them.
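
A minimal sketch of the RAGAS side, assuming the ragas package (0.1-style API; column names have shifted between versions) and a judge LLM configured via an OpenAI API key. The evaluation sample here is invented:

python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

# A single hand-built sample; in practice this comes from a labeled test set.
eval_data = {
    "question": ["How do I reset my password?"],
    "answer": ["Go to Account Settings, click Security, then select Reset Password."],
    "contexts": [[
        "To reset your password, open Account Settings and choose Security.",
        "The Reset Password option sends a confirmation email within two minutes.",
    ]],
    "ground_truth": ["Reset the password from Account Settings under Security."],
}

result = evaluate(
    Dataset.from_dict(eval_data),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # per-metric scores for the end-to-end pipeline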

What the Numbers Say in 2026

From 2025 production hybrid search benchmarks, the recall improvement from adding BM25 to pure dense retrieval is consistent and significant: recall increases from approximately 0.72 with dense retrieval alone to 0.91 with hybrid retrieval. Precision improves from 0.68 to 0.87 across the same benchmarks.

Enterprise intent to adopt hybrid retrieval tripled from 10.3% to 33.3% in a single quarter in late 2025, driven by teams hitting the limits of pure semantic search in production. The long-context-as-dominant-architecture position — the idea that million-token context windows would make retrieval unnecessary — collapsed from 15.5% to 3.5% in that same period.

Traditional search is not going away. It is becoming a component inside RAG systems rather than the terminal step. BM25 is not dead — it is the part of hybrid retrieval that catches what semantic search misses.

The Practical Decision

The decision between traditional search and RAG is not binary for most organizations. It is a question of what you add on top of what you already have, and where the generation layer earns its cost.

If users need documents, use traditional search. If users need answers synthesized from documents, use RAG. If your corpus mixes structured operational data with unstructured knowledge, use traditional search for the former and RAG for the latter.

And whichever retrieval layer you build, use hybrid search. The 16-point recall gap between pure semantic and hybrid is too large to ignore in any system where retrieval quality is the product.

For how the hybrid retrieval layer fits into the full RAG architecture — including chunking, embedding model selection, and reranking — read RAG Architecture Explained. For how the vector database handles the dense retrieval side of hybrid search, see Vector Database in RAG. For what happens when retrieval produces the wrong results despite hybrid search, see Why RAG Fails.

The final article in this series covers the embedding models that power both the dense retrieval side of hybrid search and the semantic understanding that makes RAG meaningfully better than pure keyword search: How Embeddings Work in RAG.

If you are still deciding whether RAG is the right architecture for your problem at all, start with What Is RAG in AI and RAG vs Fine-Tuning.
