
Vector Database vs Elasticsearch: Which One Should You Use?

A research-backed comparison of dedicated vector databases and Elasticsearch for AI search workloads. Learn how their architectures differ, what the benchmarks say about latency and indexing speed, where each system wins, and a practical decision framework for your stack in 2026.

Krunal Kanojiya

#vector-database #elasticsearch #semantic-search #ELSER #hybrid-search #RAG #Qdrant #Pinecone #Milvus #Weaviate #ANN

Intercom built Fin, their AI customer support agent, on top of Elasticsearch rather than a dedicated vector database. At 20 million embeddings and projecting to 100 million, they evaluated Pinecone, Milvus, Qdrant, Weaviate, and Elasticsearch. Their conclusion: for an AI agent where LLM latency is measured in seconds, even if search latency dropped to zero, they would only save approximately 200ms per request. Elasticsearch was the most cost-effective, most operationally familiar option for their specific workload. They stayed with it.

That is not a story about Elasticsearch winning. It is a story about the right tool for a specific workload profile.

The question "vector database vs Elasticsearch" does not have a universal answer. It has a decision framework. This article builds that framework from first principles: how the architectures differ, what benchmarks actually show, where each system wins and loses, and how to map those trade-offs to your own use case.

This is the sixth article in the Vector Database Fundamentals series. It applies the comparison pattern from vector database vs traditional database to the specific case of Elasticsearch, using the retrieval concepts from dense vs sparse vectors and semantic search. The pillar article covers the full landscape of vector database options.

What Elasticsearch Was Built to Do

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It has been in production since 2010. Its core competence is inverted-index-based full-text search: indexing JSON documents, tokenizing text fields, and scoring results using BM25.

Companies like eBay, Stack Overflow, Wikipedia, and NASA run Elasticsearch for log analytics, site search, SIEM, and application search. Its horizontal scalability, mature REST API, rich query DSL, and extensive tooling ecosystem are the result of 15 years of production refinement.

json
// A standard Elasticsearch query
{
  "query": {
    "bool": {
      "must": {
        "match": { "body": "connection timeout error" }
      },
      "filter": [
        { "term":  { "status": "open" } },
        { "range": { "created_at": { "gte": "now-30d" } } }
      ]
    }
  }
}

This query is what Elasticsearch was designed for. BM25 scoring on the body field, filtered by structured metadata. The inverted index handles it in single-digit milliseconds regardless of document count.
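The BM25 formula behind that scoring is compact enough to sketch. The following is an illustrative single-term scorer with the common k1 and b defaults, not Lucene's exact implementation:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document using the BM25 formula."""
    # Inverse document frequency: rare terms score higher
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Saturating term frequency, normalized by document length
    norm_tf = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf

# The same term count scores higher in a short document than a long one,
# because length normalization penalizes padding.
short = bm25_term_score(tf=3, doc_len=50,  avg_doc_len=100, n_docs=10_000, doc_freq=200)
long_ = bm25_term_score(tf=3, doc_len=400, avg_doc_len=100, n_docs=10_000, doc_freq=200)
assert short > long_ > 0
```

Term-frequency saturation (the k1 term) and length normalization (the b term) are why BM25 remains hard to beat for exact-match relevance.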

Starting with version 7.3, Elasticsearch added support for dense vector fields. Version 8.0 introduced the knn search option using HNSW as the underlying algorithm. The current implementation supports approximate nearest neighbor search, metadata filtering, and hybrid search combining BM25 and kNN results using Reciprocal Rank Fusion.

python
from elasticsearch import Elasticsearch
import openai

es = Elasticsearch("http://localhost:9200")
oai = openai.OpenAI(api_key="your-key")

# Create an index with a dense vector field
es.indices.create(
    index="knowledge-base",
    body={
        "mappings": {
            "properties": {
                "content":   {"type": "text"},
                "category":  {"type": "keyword"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": 1536,
                    "index": True,
                    "similarity": "cosine",
                }
            }
        }
    }
)

# Ingest a document with its embedding
doc = "Connection timeout errors occur on slow or unreliable networks."
vec = oai.embeddings.create(input=doc, model="text-embedding-3-small").data[0].embedding

es.index(
    index="knowledge-base",
    document={
        "content":   doc,
        "category":  "troubleshooting",
        "embedding": vec,
    }
)

# Hybrid search: BM25 + kNN with RRF fusion
query_text = "why does my app crash on slow connections"
query_vec  = oai.embeddings.create(
    input=query_text, model="text-embedding-3-small"
).data[0].embedding

response = es.search(
    index="knowledge-base",
    body={
        "retriever": {
            "rrf": {
                "retrievers": [
                    {
                        "standard": {
                            "query": {"match": {"content": query_text}}
                        }
                    },
                    {
                        "knn": {
                            "field": "k",
                            "query_vector": query_vec,
                            "num_candidates": 50,
                        }
                    }
                ]
            }
        },
        "size": 5,
    }
)

for hit in response["hits"]["hits"]:
    print(f"Score {hit['_score']:.4f}: {hit['_source']['content'][:70]}")

The retriever abstraction with rrf is Elasticsearch's modern syntax for hybrid search. It runs BM25 and kNN in parallel and fuses the ranked lists using Reciprocal Rank Fusion, the same algorithm described in the dense vs sparse vectors article.
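RRF itself is a few lines of arithmetic: each document's fused score is the sum of 1/(k + rank) across the ranked lists it appears in. A minimal sketch, with k=60 as a commonly used default rank constant:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["doc_a", "doc_b", "doc_c"]   # keyword ranking
knn_top  = ["doc_b", "doc_d", "doc_a"]   # vector ranking

fused = rrf_fuse([bm25_top, knn_top])
print(fused)   # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only ranks, not raw scores, it needs no score normalization between BM25 and cosine similarity, which operate on incompatible scales.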

ELSER: Elasticsearch's Sparse Neural Model

Beyond dense vector support, Elastic shipped ELSER (Elastic Learned Sparse EncodeR), a first-party sparse neural retrieval model inspired by SPLADE. ELSER maps text to sparse vectors over the vocabulary space where learned weights replace raw term frequency counts. A query about "car" activates dimensions for "vehicle" and "automobile" automatically through term expansion.

ELSER occupies the middle ground between pure BM25 (fast but vocabulary-limited) and dense embeddings (semantic but opaque). It uses Lucene's inverted index for retrieval, which means it benefits from the same mature indexing infrastructure that Elasticsearch has refined for 15 years, while adding learned semantic generalization on top.
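The mechanics can be illustrated with a toy example. The expansion weights below are invented for illustration, not actual ELSER output; the point is that scoring reduces to a dot product over shared vocabulary dimensions:

```python
# Hypothetical learned expansions (illustrative weights, not real model output).
# The query "car" activates related vocabulary dimensions with learned weights.
query_expansion = {"car": 2.1, "vehicle": 0.9, "automobile": 0.7, "drive": 0.4}
doc_expansion   = {"vehicle": 1.3, "truck": 1.1, "engine": 0.8}

def sparse_dot(query, doc):
    """Relevance is the dot product over dimensions both vectors activate."""
    return sum(weight * doc[term] for term, weight in query.items() if term in doc)

score = sparse_dot(query_expansion, doc_expansion)
# A query for "car" matches a document that never contains the word "car",
# because the expansion activated the "vehicle" dimension in both.
```

Because each dimension is a vocabulary term, the match is also inspectable: you can see exactly which expanded terms contributed to the score, which dense embeddings cannot offer.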

According to Pureinsights' 2026 hybrid search analysis, by the time the market consolidated, Elasticsearch had native dense vector support, ELSER for sparse neural retrieval, and dual fusion methods — RRF and weighted linear combination — for blending keyword and vector results. It had also been named a Leader in the 2025 Forrester Wave for Cognitive Search Platforms.

The Architectural Difference That Explains the Performance Gap

Elasticsearch is a Java application running on the JVM. Its core data structure is the Lucene inverted index, which maps tokens to posting lists. Vector search was added to this architecture, not designed into it from the start.

Purpose-built vector databases like Qdrant (implemented in Rust), Milvus (implemented in C++), and Weaviate (implemented in Go) were designed with high-dimensional vector storage and ANN search as the primary workload. Their memory management, index structures, and query planners are optimized specifically for float arrays of thousands of dimensions.

According to Zilliz's analysis, the core vector indexing engine in Milvus is implemented in C++, offering more efficient memory management than a Java-based system. This alone reduces memory footprint by saving gigabytes compared to the JVM-based approach. Even with just 1 million vectors, Elasticsearch takes 200 milliseconds to return results compared to 6ms on Milvus, a difference of over 30 times.

The indexing speed gap is even larger. According to Qdrant's benchmark data, Elasticsearch is 10 times slower for indexing when storing 10 million vectors of 96 dimensions: 32 minutes for Qdrant versus 5.5 hours for Elasticsearch. For large corpus ingestion or frequent reindexing, this matters significantly.

Benchmark Comparison: What the Numbers Actually Show

Several independent benchmarks are available, and they generally agree on the following picture.

For pure vector search latency, Elasticsearch is not the fastest option. Purpose-built systems consistently outperform it on pure ANN workloads. A 14-case independent benchmark by Muhammad Imran Zaman, published on Hugging Face, showed that even pgvector beat Elasticsearch on every local category for a 10,000-row dataset with a properly tuned HNSW index.

For hybrid workloads combining full-text search and vector similarity, Elasticsearch performs much more competitively. Its BM25 implementation is among the most mature in the industry, and its ELSER model closes vocabulary mismatch gaps that pure dense retrieval misses. Teams that need both capabilities from one system often find Elasticsearch "good enough" rather than "blazing fast."

For indexing throughput and update latency, Elasticsearch lags behind purpose-built systems. Qdrant processes mutations with lower latency and higher concurrent throughput based on multiple independent benchmarks.

For recall at high precision thresholds, Elasticsearch's HNSW implementation is competitive. The recall quality at the same precision threshold is roughly comparable to other HNSW implementations. The performance gap is primarily about speed and throughput, not about the accuracy of the results returned.
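Recall at k is straightforward to measure against a brute-force baseline, which is how these benchmark comparisons are typically computed:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the ANN index returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

exact  = [7, 2, 9, 14, 5]   # ground truth from exhaustive (brute-force) search
approx = [7, 2, 9, 11, 5]   # HNSW result: one true neighbor missed

print(recall_at_k(approx, exact))   # 0.8
```

Running this against your own data matters more than published numbers: recall depends on the embedding distribution and HNSW parameters (ef, num_candidates), not just the engine.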

The Intercom case is an instructive data point here. Their benchmark showed Elasticsearch latency between 100ms and 200ms for 20 million embeddings at 768 dimensions. According to their published research, for an AI agent the bottleneck is generally LLM latency measured in seconds. Even if search latency dropped to zero, they would only save approximately 200ms. That tradeoff analysis is what led them to keep Elasticsearch.

plaintext
Use case: AI customer support agent (Intercom Fin)
Dataset:  20M embeddings, 768D
Evaluated: Pinecone, Milvus, Qdrant, Weaviate, Elasticsearch

Elasticsearch search latency:   100 to 200ms
LLM generation latency:         2,000 to 5,000ms

Latency savings from switching: ~200ms out of 4,000ms total
Operational cost of switching:  New system, new expertise, new infra
Decision:                        Stay on Elasticsearch

This is the right analysis. Optimize for the bottleneck. If LLM latency is your bottleneck, saving 150ms on search has minimal user impact.
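The arithmetic is worth making explicit. Using the round numbers above (200ms search, roughly 4,000ms of LLM generation), even a hypothetical zero-latency search engine recovers under five percent of end-to-end latency:

```python
search_ms = 200      # worst-case vector search latency from the benchmark
llm_ms    = 4_000    # typical LLM generation latency for the agent
total_ms  = search_ms + llm_ms

# Best-case fraction of end-to-end latency a faster search engine could recover
best_case_savings = search_ms / total_ms
print(f"{best_case_savings:.1%}")   # 4.8%
```

No user perceives a 4.8% latency improvement. That is the entire case for optimizing the bottleneck instead of the benchmark.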

Where Elasticsearch Wins

Existing Elasticsearch Infrastructure

If your organization already runs Elasticsearch for log analytics, application search, or SIEM, adding vector fields to existing indexes adds AI-powered search without a new system, new operational expertise, or new infrastructure costs.

The operational cost of running one system versus two is real. Every additional system adds monitoring, upgrade management, backup procedures, failure modes, and on-call complexity. For teams that already know Elasticsearch, the operational overhead argument strongly favors staying there until the performance limits are actually hit.

Hybrid Search With Full-Text Strength

Elasticsearch's BM25 implementation is one of the most production-hardened in the industry. For applications that require both semantic search and exact string matching, especially with complex query DSL features like field boosting, proximity scoring, and faceted aggregation, Elasticsearch provides both capabilities in a single system with a unified query language.

python
# Elasticsearch: a complex hybrid query combining boosting, filters,
# RRF fusion, and faceted aggregations that purpose-built vector
# databases cannot currently express in a single request
response = es.search(
    index="products",
    body={
        "retriever": {
            "rrf": {
                "retrievers": [
                    {
                        "standard": {
                            "query": {
                                "bool": {
                                    "should": [
                                        {"match": {"name": {"query": "wireless headphones", "boost": 2}}},
                                        {"match": {"description": "wireless headphones"}},
                                    ],
                                    "filter": [
                                        {"term":  {"in_stock": True}},
                                        {"range": {"price": {"lte": 300}}},
                                    ]
                                }
                            }
                        }
                    },
                    {
                        "knn": {
                            "field": "embedding",
                            "query_vector": query_vec,
                            "num_candidates": 100,
                            "filter": [
                                {"term":  {"in_stock": True}},
                                {"range": {"price": {"lte": 300}}},
                            ]
                        }
                    }
                ],
                "rank_window_size": 50,
            }
        },
        "aggs": {
            "brands": {
                "terms": {"field": "brand"}
            },
            "price_ranges": {
                "range": {
                    "field": "price",
                    "ranges": [
                        {"to": 100}, {"from": 100, "to": 200}, {"from": 200}
                    ]
                }
            }
        },
        "size": 10,
    }
)

The aggregation block at the bottom produces facets for filtering by brand and price range. Purpose-built vector databases have metadata filtering but not the rich aggregation pipeline that powers faceted search interfaces.

Existing Query DSL Knowledge and Tooling

Teams with years of Elasticsearch query DSL expertise face a real learning curve with a purpose-built vector database API. Kibana dashboards, existing alerting rules, and existing ingest pipelines all continue to work when you add vector fields. The Kibana vector search UI can visualize semantic search results without any additional tooling.

Where Purpose-Built Vector Databases Win

Pure Vector Search Performance

When vector search is the primary or sole workload, purpose-built systems consistently outperform Elasticsearch at scale. Qdrant's Rust-based storage engine, Milvus's GPU-accelerated indexing, and Pinecone's managed infrastructure all deliver lower latency at higher throughput than Elasticsearch for workloads involving hundreds of millions of vectors.

python
from qdrant_client import QdrantClient, models

client = QdrantClient(host="localhost", port=6333)

# Qdrant: single hybrid query fusing sparse and dense results with RRF
# (DBSF fusion is also available). Competes directly with Elasticsearch's
# RRF hybrid, with lower latency at scale due to the Rust storage engine.
# query_vec is the dense query embedding computed earlier.
results = client.query_points(
    collection_name="knowledge-base",
    prefetch=[
        models.Prefetch(
            query=models.SparseVector(
                indices=[101, 503, 2041],
                values=[0.82, 1.41, 0.54]
            ),
            using="sparse",
            limit=50,
        ),
        models.Prefetch(
            query=query_vec,   # dense embedding
            using="dense",
            limit=50,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
    with_payload=True,
)

Qdrant's hybrid search uses the same RRF fusion as Elasticsearch but runs on a storage engine purpose-built for vector workloads. For datasets above 50 million vectors with tight latency requirements, the performance difference becomes operationally significant.

Simpler Operational Model for Vector-Primary Workloads

Elasticsearch clusters require careful tuning of JVM heap sizes, shard counts, replica configurations, and refresh intervals. The configuration surface area is large because the system was designed to handle many different types of workloads and has accumulated that complexity over 15 years.

For a team whose only database workload is vector search and retrieval, a dedicated vector database offers a simpler operational model. Pinecone's fully managed serverless offering requires zero infrastructure management. Qdrant's Docker Compose single-node deployment is operational in minutes with sensible defaults.

GPU-Accelerated Indexing

Milvus supports GPU-accelerated index building for datasets of hundreds of millions of vectors. This capability does not exist in Elasticsearch. For organizations ingesting billions of document chunks at real-time rates, GPU indexing is a meaningful capability.

Advanced Quantization and Memory Efficiency

Qdrant, Weaviate, and Pinecone all support product quantization and scalar quantization for compressing vector storage. Binary quantization can reduce storage by 32 times with acceptable recall degradation. These features are more mature in dedicated systems than in Elasticsearch's vector implementation.
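Binary quantization is simple to demonstrate: keep only the sign bit of each dimension, then compare vectors by Hamming distance. A NumPy sketch; the 32x figure falls out of replacing each 32-bit float with one bit:

```python
import numpy as np

def binary_quantize(vec):
    """Keep only the sign bit of each dimension (1 bit vs 32 for float32)."""
    return np.packbits(vec > 0)

def hamming_distance(a, b):
    """Count differing bits between two packed binary vectors."""
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(0)
v1 = rng.standard_normal(1536).astype(np.float32)   # 6,144 bytes as float32
v2 = v1 + rng.standard_normal(1536) * 0.1           # a nearby vector

q1, q2 = binary_quantize(v1), binary_quantize(v2)   # 192 bytes each: 32x smaller
print(q1.nbytes, hamming_distance(q1, q2))
```

Small perturbations flip few sign bits, so near neighbors stay near in Hamming space. Production systems typically use binary distance for a fast first pass, then rescore the candidates with full-precision vectors to recover recall.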

According to the Elastic community forum discussion on vector database pros and cons, Elasticsearch does not support CUDA or GPU acceleration and has performance limitations on ANN benchmarks compared to purpose-built systems.

The Comparison Table

plaintext
Dimension              | Elasticsearch            | Dedicated Vector DB
                       |                          | (Qdrant / Weaviate / Milvus)
-----------------------+--------------------------+---------------------------
Primary workload       | Full-text search,        | Vector similarity search
                       | log analytics, SIEM      | and ANN retrieval
Vector search added    | Yes (kNN, v8.0+)         | Core capability from day 1
Architecture           | JVM, Lucene inverted     | C++ / Rust / Go, purpose-
                       | index + HNSW addon       | built HNSW storage engine
BM25 full-text         | Mature, 15+ years        | Basic or via integration
Hybrid search          | Yes (RRF, weighted)      | Yes (RRF, DBSF, weighted)
ELSER sparse neural    | Yes (first-party model)  | Via SPLADE or BM25 index
Pure vector latency    | 100 to 200ms at 2M vecs  | 6 to 20ms at same scale
Indexing speed         | Slow (5.5hr per 10M)     | Fast (32min per 10M)
GPU acceleration       | No                       | Yes (Milvus)
Quantization           | Limited                  | Mature (PQ, SQ, binary)
Aggregations / facets  | Rich (Lucene aggregation)| Limited metadata filters
Query DSL              | Mature, expressive DSL   | Proprietary API / gRPC
Operational complexity | High (cluster tuning)    | Lower for vector workloads
Managed cloud option   | Elastic Cloud            | Pinecone, Weaviate Cloud,
                       |                          | Zilliz Cloud
Ecosystem maturity     | 15 years                 | 2021 to present
Open source            | Yes (with paid features) | Yes (Qdrant, Weaviate, Milvus)
Best for               | Existing ES users,       | Greenfield AI apps, RAG,
                       | hybrid text + vector,    | billion-scale vector search,
                       | aggregation + facets     | latency-critical use cases

A Practical Decision Framework

The choice follows from a short sequence of questions.

Do you already run Elasticsearch in production? If yes, start there. Add the dense_vector field type and ELSER to your existing indexes. Run a hybrid search proof-of-concept on your actual data. Measure latency on your actual query patterns. If latency and throughput are acceptable, you are done. Adding a new system has real operational cost that a marginally better benchmark does not justify.

Is full-text search a core requirement alongside vector search? If your users search for exact product names, legal case numbers, and error codes alongside natural language queries, Elasticsearch's BM25 implementation remains one of the strongest in the field. The combination of ELSER, BM25, and dense kNN in a single system is a genuine competitive strength.

Do you need rich aggregations or faceted search? Elasticsearch's aggregation pipeline has no equivalent in purpose-built vector databases. If your application requires faceted filtering, date histograms, field value analytics, or Kibana-style dashboards, Elasticsearch is the right choice regardless of vector search performance.

Is this a greenfield AI application where vector search dominates? If vector retrieval for a RAG pipeline or recommendation engine is the primary workload, start with a purpose-built database. The simpler operational model, lower latency, and purpose-built API are worth more than Elasticsearch compatibility when you are not building on existing Elasticsearch infrastructure.

Do you have very large datasets or strict latency requirements? Above 50 to 100 million vectors, or with p95 latency requirements below 50ms, benchmark Qdrant or Milvus against your workload before committing to Elasticsearch. The architectural differences produce real performance gaps at scale.

python
# Decision logic in code form
def choose_system(
    already_runs_elasticsearch: bool,
    needs_full_text_search: bool,
    needs_aggregations: bool,
    dataset_size_millions: float,
    p95_latency_requirement_ms: float,
    primary_workload_is_vectors: bool,
) -> str:

    if already_runs_elasticsearch:
        # Start with ES unless proven insufficient
        if dataset_size_millions > 100 or p95_latency_requirement_ms < 50:
            return "Benchmark Qdrant or Milvus vs your Elasticsearch deployment"
        return "Elasticsearch with kNN + ELSER"

    if needs_aggregations:
        return "Elasticsearch (no equivalent in purpose-built vector DBs)"

    if needs_full_text_search and not primary_workload_is_vectors:
        return "Elasticsearch (BM25 + kNN hybrid)"

    if dataset_size_millions < 10:
        return "pgvector (if on Postgres) or Chroma (for prototyping)"

    if dataset_size_millions < 100:
        return "Qdrant or Weaviate (open source, good defaults)"

    return "Qdrant, Milvus, or Pinecone (benchmark for your workload)"

The Convergence Trend

The boundary between Elasticsearch and purpose-built vector databases is narrowing from both directions.

Elasticsearch added HNSW, ELSER, hybrid search with RRF, and third-party sparse model support from Hugging Face. The vector feature set in Elasticsearch 8.x is genuinely capable, not a checkbox feature.

Purpose-built vector databases added BM25 support, richer metadata filtering, and aggregation capabilities. Milvus released native full-text search using Sparse-BM25. Weaviate added BM25F. Qdrant extended its filtering API significantly.

According to Pureinsights, vectors have become a feature, not a moat. MongoDB Atlas added vector search. Redis added vector search. The expectation that "you need a dedicated vector database for AI" has given way to "every serious database now supports vectors."

What remains is not a categorical distinction but a performance tradeoff. For workloads where pure vector search performance is the primary concern at very large scale, purpose-built systems still hold a structural advantage from their C++/Rust foundation versus Elasticsearch's JVM heritage.

Connecting This to the Broader Series

The semantic search article covers the full query pipeline: chunking, embedding, ANN retrieval, and reranking. Elasticsearch can serve as the ANN retrieval layer in that pipeline, with the same embedding models and rerankers available regardless of which storage backend you choose.

The dense vs sparse vectors article covers BM25, SPLADE, and hybrid fusion in depth. Elasticsearch's ELSER is a concrete implementation of the SPLADE principles described there. The RRF fusion algorithm is the same whether it runs in Elasticsearch or in Qdrant.

The why traditional indexes fail for vector search article explains why the JVM overhead and Lucene inverted index architecture create a performance ceiling for Elasticsearch on pure vector workloads, covering the same HNSW mechanics from first principles.

Summary

Elasticsearch is a capable vector search platform for teams already operating it. Its BM25 implementation is production-hardened, ELSER provides neural sparse retrieval without requiring a separate embedding API, and hybrid search with RRF works out of the box since version 8.9.

It is not the fastest option for pure vector search. At 1 million vectors, benchmarks show a 30x latency gap versus Milvus. At 10 million vectors, indexing takes 10 times longer than Qdrant. Those gaps matter for latency-sensitive applications and large-scale real-time ingestion.

The practical decision: if you already run Elasticsearch and your vector workload fits within its performance envelope, stay there. If you are building a greenfield AI application where vector retrieval is the primary workload, choose a purpose-built database and size it for your actual scale requirements.

The full landscape of vector database options is covered in the vector database fundamentals pillar article. The next article in this series covers why traditional indexes fail for vector search, explaining the mathematical and architectural reasons that underlie the performance gaps described here.


Sources and Further Reading

  1. Elastic. ELSER: Elastic Learned Sparse EncodeR. elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html
  2. Elastic. Semantic Reranking. elastic.co/docs/solutions/search/ranking/semantic-reranking
  3. Zilliz. Elasticsearch Was Great, But Vector Databases Are the Future. medium.com/@zilliz_learn/elasticsearch-was-great-but-vector-databases-are-the-future
  4. Qdrant. Vector Search Benchmarks. qdrant.tech/benchmarks
  5. Intercom / Fin AI. Do You Really Need a Vector Search Database? fin.ai/research/do-you-really-need-a-vector-search-database
  6. Pureinsights. From Vector Hype to Hybrid Reality: Is Elasticsearch Still the Right Bet? pureinsights.com/blog/2026/from-vector-hype-to-hybrid-reality-is-elasticsearch-still-the-right-bet
  7. Hugging Face / Muhammad Imran Zaman. pgvector vs Elasticsearch vs Qdrant vs Pinecone vs Weaviate: A 14-Case Benchmark. huggingface.co/blog/ImranzamanML/pgvector-vs-elasticsearch-vs-qdrant-vs-pinecone-vs
  8. Elastic Community. Pros and Cons of Using Elastic as a Vector Database. discuss.elastic.co/t/pros-and-cons-of-using-elastic-as-a-vector-database/338733
  9. Pureinsights. Comparing Vector Search Solutions 2024. pureinsights.com/blog/2024/comparing-vector-search-solutions-2024
  10. AI Multiple. Top Vector Database for RAG: Qdrant vs Weaviate vs Pinecone. aimultiple.com/vector-database-for-rag
  11. Firecrawl Blog. Best Vector Databases in 2026: A Complete Comparison Guide. firecrawl.dev/blog/best-vector-databases
  12. Capella Solutions. Elasticsearch vs Vector Databases: Decoding the Best Data Management Solution. capellasolutions.com/blog/elasticsearch-vs-vector-databases
  13. Milvus. Official Documentation. milvus.io/docs
  14. Weaviate. Hybrid Search Explained. weaviate.io/blog/hybrid-search-explained

Krunal Kanojiya

Technical Content Writer

Technical Content Writer and former software developer from India. I write in-depth articles on blockchain, AI/ML, data engineering, web development, and developer careers. Currently at Lucent Innovation, previously at Cromtek Solution and freelance.
