Dense vs Sparse Vectors Explained With Examples
A research-backed guide to dense and sparse vectors in machine learning. Learn how each representation works, when to use BM25 versus embeddings, how SPLADE bridges both worlds, and why hybrid search combining dense and sparse retrieval consistently outperforms either method alone.
Run a semantic search for "ERR_CONN_RESET_4XX retry semantics" using only dense embeddings. The retriever returns semantically adjacent documents about networking delays and connection errors. The actual answer, buried in section 3.2, gets ranked eleventh. Now run the same query through a BM25 index. The string "ERR_CONN_RESET_4XX" has an extremely high IDF score because it appears in almost no document other than section 3.2. BM25 returns that section as result number one.
That failure mode reveals the structural weakness of dense vectors. They were trained to generalize across language, and exact string matching is precisely what they sacrifice. Sparse vectors were built for exact-term retrieval, so semantic generalization is the capability they never had.
Understanding both representations, when each one works, and how to combine them is now table stakes for anyone building production retrieval systems. This article covers the mechanics of each type, the algorithms that produce them, and the hybrid search architecture that brings them together.
This is the third article in the Vector Database Fundamentals series. It builds directly on what a vector is and how embeddings are generated, and connects forward to semantic search mechanics and the vector database infrastructure that stores and indexes both types.
What Makes a Vector Dense or Sparse?
The classification comes down to one property: what fraction of the dimensions have non-zero values.
A dense vector has a value in nearly every dimension. If the vector has 768 dimensions, almost all 768 positions contain a floating-point number that is not zero. Every dimension contributes to the representation.
A sparse vector has values in only a small number of dimensions. If the vector has 50,000 dimensions corresponding to a vocabulary of 50,000 words, a typical document might activate only 200 to 500 of those dimensions. The remaining 49,500-plus values are zero.
import numpy as np
# Dense vector (768-dim embedding — most values non-zero)
dense = np.array([0.412, -0.231, 0.887, 0.051, -0.330, 0.712, ...])
# nearly all 768 dimensions contain a meaningful float
# Sparse vector (10,000-dim vocabulary — mostly zeros)
sparse = np.zeros(10000)
sparse[243] = 0.82 # "machine" appears 3 times
sparse[1047] = 1.41 # "learning" appears 5 times
sparse[8833] = 0.54 # "vector" appears 2 times
# the remaining 9,997 positions stay zero
nonzero_count = np.count_nonzero(sparse)
print(f"Non-zero dimensions: {nonzero_count} out of {len(sparse)}")
# Non-zero dimensions: 3 out of 10000

According to Weaviate's hybrid search documentation, sparse vectors have mostly zero values with only a few non-zero values, while dense vectors mostly contain non-zero values. Dense embeddings are generated from machine learning models, and sparse embeddings are generated from algorithms like BM25 and SPLADE.
Dense Vectors: Semantic Meaning as Geometry
Dense vectors are the output of trained embedding models. When you call OpenAI's embedding API or run a Sentence Transformers model, you get a dense vector for each input.
The core property of a dense vector is that its geometry encodes semantic meaning. Content with similar meanings produces vectors that point in similar directions in the vector space. The individual dimensions do not have human-readable interpretations. Meaning is distributed across all of them simultaneously.
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
"How do I return a product?",
"What is your refund policy?",
"I want to cancel my order.",
"What is the distance from Earth to Mars?",
]
embeddings = model.encode(sentences)
def cosine_sim(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Compare semantically related sentences
print(cosine_sim(embeddings[0], embeddings[1])) # ~0.85 — very similar
print(cosine_sim(embeddings[0], embeddings[2])) # ~0.72 — related
print(cosine_sim(embeddings[0], embeddings[3])) # ~0.12 — unrelated

"Return a product" and "refund policy" share no keywords, yet their dense vectors are close because both belong to the semantic neighborhood of customer returns and commerce support. The embedding model learned this relationship from patterns in training data.
As Milvus' quick reference on dense and sparse embeddings explains, dense embeddings excel at capturing nuanced relationships and contextual meaning, making them ideal for tasks like semantic search or recommendation systems.
Characteristics of Dense Vectors
Dense vectors have fixed dimensionality. All documents indexed with a given model produce vectors of identical length. You cannot compare a 768-dimensional BERT embedding to a 1536-dimensional OpenAI embedding. They live in incompatible spaces.
Dense vectors are not interpretable. You cannot look at dimension 412 and say "this dimension represents sports content." Information is distributed and entangled across all dimensions by design.
Dense vectors are computationally expensive to index for exact nearest-neighbor search. At millions of documents, brute-force comparison is not viable, which is why vector databases use approximate nearest neighbor algorithms like HNSW. This is covered in the vector database fundamentals article.
Sparse Vectors: Term Presence as Weight
Sparse vectors encode which terms are present in a document and how important they are. Each dimension maps to one term in a fixed vocabulary. The value at that position reflects the weight of that term in the document.
The classic method for producing sparse vectors is TF-IDF, and its more refined descendant BM25.
TF-IDF
TF-IDF (Term Frequency Inverse Document Frequency) assigns a weight to each term in a document by multiplying two factors: how often the term appears in that document (term frequency) and how rare the term is across all documents (inverse document frequency).
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
corpus = [
"machine learning is a subset of artificial intelligence",
"deep learning uses neural networks",
"artificial intelligence includes machine learning and deep learning",
"neural networks are inspired by the human brain",
]
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(corpus)
# Each document is now a sparse vector
doc0_vector = tfidf_matrix[0].toarray()[0]
vocab = vectorizer.get_feature_names_out()
# Print only non-zero terms for document 0
nonzero_indices = np.nonzero(doc0_vector)[0]
for idx in nonzero_indices:
print(f" '{vocab[idx]}': {doc0_vector[idx]:.4f}")
# Output (only words present in doc 0):
# 'artificial': 0.3853
# 'intelligence': 0.3853
# 'is': 0.5087
# 'learning': 0.2810
# 'machine': 0.3853
# 'of': 0.5087
# 'subset': 0.5087

The word "machine" gets a moderate score because it appears in document 0 but also in document 2. The word "subset" gets a high score because it appears only in document 0, making it more distinctive.
BM25: The Industry Standard for Keyword Search
BM25 (Best Matching 25) improves on TF-IDF with two additions. First, it applies term frequency saturation: a word appearing 10 times in a document contributes more weight than one appearing once, but the weight does not grow linearly. After a point, additional occurrences contribute diminishing returns. Second, it applies document length normalization: a term appearing twice in a short document is more meaningful than the same term appearing twice in a very long document.
According to Weaviate's documentation, BM25 builds on TF-IDF by taking the Binary Independence Model from the IDF calculation and adding a normalization penalty that weighs a document's length relative to the average length of all documents in the database.
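Both effects are visible in the term-weight formula itself. The sketch below uses the common defaults k1 = 1.5 and b = 0.75; real implementations differ in IDF smoothing details, so treat it as an illustration rather than a reference implementation.

```python
def bm25_term_weight(tf, doc_len, avg_doc_len, idf, k1=1.5, b=0.75):
    # Document-length normalization: longer-than-average docs are penalized
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Saturating term frequency: weight approaches idf * (k1 + 1) asymptotically
    return idf * (tf * (k1 + 1)) / (tf + norm)

# Each additional occurrence contributes less than the previous one
for tf in (1, 2, 5, 10, 50):
    w = bm25_term_weight(tf, doc_len=100, avg_doc_len=100, idf=2.0)
    print(f"tf={tf:2d} -> weight {w:.3f}")
# tf= 1 -> weight 2.000
# tf= 2 -> weight 2.857
# tf= 5 -> weight 3.846
# tf=10 -> weight 4.348
# tf=50 -> weight 4.854
```

With doc_len equal to avg_doc_len, the weight climbs from 2.0 at tf = 1 toward an asymptote of idf * (k1 + 1) = 5.0 but never reaches it. That is the saturation curve in code.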
from rank_bm25 import BM25Okapi
corpus = [
"how do I return a product",
"what is your refund policy",
"I want to cancel my subscription",
"how to contact customer support",
"machine learning model training tutorial",
]
# Tokenize
tokenized_corpus = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)
# Query
query = "product return"
tokenized_query = query.split()
scores = bm25.get_scores(tokenized_query)
for doc, score in zip(corpus, scores):
print(f"Score {score:.4f}: {doc}")
# Output (scores depend on corpus statistics):
# Score 2.0926: how do I return a product ← highest
# Score 0.0000: what is your refund policy
# Score 0.0000: I want to cancel my subscription
# Score 0.0000: how to contact customer support
# Score 0.0000: machine learning model training tutorial

BM25 correctly ranks "return a product" highest for the query "product return." However, "refund policy" scores zero because it shares no keywords with the query, even though it is semantically related. This is the fundamental limitation of sparse retrieval.
BM25 is the default ranking algorithm in Elasticsearch, Solr, Lucene, and OpenSearch. It has been the backbone of keyword search for decades.
SPLADE: A Neural Sparse Model
SPLADE (Sparse Lexical and Expansion model) is the most important development in sparse retrieval in recent years. It uses a transformer model to produce sparse vectors that have the same structure as BM25 output (one dimension per vocabulary term, most values zero) but with two critical differences: the weights are learned rather than computed by formula, and the model performs query and document expansion.
BM25 query "car":
Activates dimensions: {car: 2.1}
SPLADE query "car":
Activates dimensions: {car: 1.8, vehicle: 1.4, automobile: 1.1, automotive: 0.7, driver: 0.4}

SPLADE learns that "car," "vehicle," "automobile," and "automotive" appear in similar contexts and therefore expands the sparse vector to include all of them. According to Chroma's sparse vector documentation, SPLADE combines the precision of keyword search with the contextual awareness of neural models.
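Under the hood, SPLADE derives these weights from a masked-language-model head over the vocabulary. Here is a minimal numpy sketch of just the pooling step, with random logits standing in for a real transformer; a trained model produces far fewer active dimensions because of its sparsity regularization, so only the shape of the computation is meaningful here.

```python
import numpy as np

# Random logits stand in for a masked-LM head over a BERT-sized vocabulary
rng = np.random.default_rng(42)
vocab_size, num_tokens = 30_522, 8
logits = rng.normal(loc=-6.0, scale=2.0, size=(num_tokens, vocab_size))

# SPLADE pooling: log(1 + ReLU(logit)), max-pooled over the token axis.
# ReLU zeroes most of the vocabulary; the log dampens dominant terms.
weights = np.log1p(np.maximum(logits, 0.0)).max(axis=0)

print(f"Non-zero dims: {np.count_nonzero(weights)} of {vocab_size}")
```

The result is a vocabulary-sized vector that is mostly zero, exactly the structure BM25 produces, but with learned rather than formula-derived weights.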
According to Elasticsearch's sparse embedding documentation, sparse vectors rely on term-based representations, making them more effective for zero-shot retrieval — where the model handles queries it has not explicitly been trained on. Unlike dense vector models that often need domain-specific training, sparse vectors generalize better to new domains out of the box.
Elastic's own implementation of this idea is ELSER (Elastic Learned Sparse EncodeR), which uses the same SPLADE principles tuned for English retrieval.
The practical tradeoff: SPLADE produces better retrieval quality than BM25 on most benchmarks, but requires running a transformer at inference time, adding roughly 100 to 300ms of latency depending on hardware. For exact-match heavy use cases involving product codes, error identifiers, or document IDs, BM25 often wins because those tokens get maximal IDF scores with zero inference overhead.
The Structural Failure Modes
Understanding where each representation breaks down is as important as understanding where it succeeds.
Where Dense Vectors Fail
Dense vectors fail on exact identifier matching. When a user queries "PROD-SKU-7842X", the embedding model maps this to a neighborhood of similar-looking product codes. It might return "PROD-SKU-7842Y" with high confidence. That is a wrong answer delivered with high confidence.
Dense vectors also fail on low-frequency proper nouns. If a new product name or person's name appears rarely or never in the training corpus, the embedding model has no good representation for it. The embedding lands in an arbitrary neighborhood that may bear no relationship to the actual meaning.
Dense vectors fail on technical jargon that differs from natural language usage. An error code like "ERR_CONN_RESET_4XX" is not in any training corpus. The dense model cannot distinguish it from similar-looking strings.
Where Sparse Vectors Fail
Sparse vectors fail on vocabulary mismatch. "How do I get a refund?" and "What is your return policy?" share no keywords, so BM25 returns zero similarity between them. A user asking in different words than the document uses will get no results.
Sparse vectors fail on synonyms by default (though SPLADE partially fixes this). "Car," "vehicle," and "automobile" are completely different tokens to a BM25 index, which captures none of the relationships between them.
Sparse vectors fail on semantic queries that rely on contextual interpretation. "What should I eat when I am feeling anxious?" requires understanding that "anxious" relates to mental states, that certain foods affect mood, and that the user is asking for a recommendation rather than a factual definition. BM25 has none of this.
The failures are structural and complementary. Dense misses exact strings. Sparse misses semantic relationships. This is the premise of hybrid search.
Hybrid Search: Combining Both Representations
Hybrid search runs a sparse retriever and a dense vector retriever in parallel, then merges their ranked results using a fusion algorithm before passing the top chunks to an LLM or presenting results to a user.
User query: "how to handle HTTP timeout errors in Python"
Sparse (BM25) pipeline:
Query → tokenize → BM25 score → ranked list A
Strong on: "HTTP", "timeout", "Python" — exact matches
Dense (embedding) pipeline:
Query → embedding model → ANN search → ranked list B
Strong on: "connection error handling", "retry logic",
"request exceptions" — semantic matches
Fusion (RRF):
Combine ranked list A + ranked list B
→ Final merged ranked list
Top K chunks → LLM context → grounded response

According to research cited in Supermemory's hybrid search guide, dense-only retrieval hits 78 percent recall at 10, sparse-only BM25 lands at 65 percent, and hybrid search reaches 91 percent recall at 10. That gap between 78 percent and 91 percent is the difference between a production-ready RAG system and one that hallucinates on edge cases.
Reciprocal Rank Fusion
The standard fusion algorithm is Reciprocal Rank Fusion (RRF). Rather than trying to normalize and combine raw scores from two different scoring systems (which creates subtle bugs when score distributions differ), RRF uses only rank positions. Each document gets a score of 1 / (k + rank) from each retriever, where rank is the document's 1-based position and k defaults to 60. Those rank scores are summed across retrievers, and the merged list is sorted by the combined score.
def reciprocal_rank_fusion(sparse_results, dense_results, k=60):
"""
Merge two ranked result lists using RRF.
Args:
sparse_results: list of doc IDs ordered by sparse score (best first)
dense_results: list of doc IDs ordered by dense score (best first)
k: constant to prevent high weighting of top-1 results
Returns:
merged: list of (doc_id, rrf_score) sorted by combined score
"""
scores = {}
for rank, doc_id in enumerate(sparse_results):
scores[doc_id] = scores.get(doc_id, 0) + 1.0 / (k + rank + 1)
for rank, doc_id in enumerate(dense_results):
scores[doc_id] = scores.get(doc_id, 0) + 1.0 / (k + rank + 1)
merged = sorted(scores.items(), key=lambda x: x[1], reverse=True)
return merged
sparse_results = ["doc_A", "doc_C", "doc_E", "doc_B"]
dense_results = ["doc_B", "doc_A", "doc_D", "doc_C"]
merged = reciprocal_rank_fusion(sparse_results, dense_results)
print("Merged ranking:")
for doc_id, score in merged:
print(f" {doc_id}: {score:.5f}")
# Merged ranking:
# doc_A: 0.03252 ← ranked 1st in sparse, 2nd in dense
# doc_B: 0.03202 ← ranked 4th in sparse, 1st in dense
# doc_C: 0.03175 ← ranked 2nd in sparse, 4th in dense
# doc_E: 0.01587 ← ranked 3rd in sparse only
# doc_D: 0.01587 ← ranked 3rd in dense only

RRF is immune to the score normalization bugs that plague linear interpolation fusion. If one BM25 document has an outlier score because a query term appears 200 times in it, that does not collapse all other BM25 scores toward zero. RRF only cares about the rank, not the magnitude of the score.
According to Prem AI's hybrid search guide, RRF at k=60 is the zero-config default. If you have 50 or more labeled query-document pairs, you can tune a weighted combination instead. Adding a cross-encoder reranker after fusion delivers the single biggest precision improvement.
A Full Hybrid Retriever in Python
This example combines BM25 and dense embeddings with RRF using Qdrant as the vector database:
from qdrant_client import QdrantClient, models
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
import numpy as np
# Documents to index
documents = [
"How do I return a product to the online store?",
"What is your refund policy for digital purchases?",
"I need to cancel my monthly subscription.",
"ERR_CONN_RESET_4XX occurs when the server closes the connection.",
"How to handle HTTP connection timeout errors in Python.",
"Network socket errors: causes and remedies.",
]
# Embedding model
embed_model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embed_model.encode(documents)
# Qdrant client (in-memory for demo)
client = QdrantClient(":memory:")
client.create_collection(
collection_name="docs",
vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
)
# Upsert all documents
client.upsert(
collection_name="docs",
points=[
models.PointStruct(id=i, vector=emb.tolist(), payload={"text": doc})
for i, (doc, emb) in enumerate(zip(documents, embeddings))
],
)
# BM25 index (separate sparse index)
tokenized_corpus = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_corpus)
def hybrid_search(query: str, top_k: int = 3) -> list:
# Sparse: BM25 ranking
sparse_scores = bm25.get_scores(query.lower().split())
sparse_ranked = np.argsort(sparse_scores)[::-1].tolist()
# Dense: semantic search
query_emb = embed_model.encode([query])[0]
dense_results = client.search(
collection_name="docs",
query_vector=query_emb.tolist(),
limit=len(documents),
)
dense_ranked = [hit.id for hit in dense_results]
# Fusion via RRF
k = 60
scores = {}
for rank, doc_id in enumerate(sparse_ranked):
scores[doc_id] = scores.get(doc_id, 0) + 1.0 / (k + rank + 1)
for rank, doc_id in enumerate(dense_ranked):
scores[doc_id] = scores.get(doc_id, 0) + 1.0 / (k + rank + 1)
top_ids = sorted(scores, key=scores.__getitem__, reverse=True)[:top_k]
return [documents[i] for i in top_ids]
# Test 1: Exact identifier query — sparse should save this
results = hybrid_search("ERR_CONN_RESET_4XX")
print("Query: ERR_CONN_RESET_4XX")
for r in results:
print(f" {r}")
# Test 2: Semantic query — dense should find this despite vocabulary mismatch
results = hybrid_search("how to get my money back from a purchase")
print("\nQuery: how to get my money back from a purchase")
for r in results:
    print(f" {r}")

The first query succeeds because BM25 gives "ERR_CONN_RESET_4XX" an extremely high IDF score. The second query succeeds because dense embeddings connect "money back" to "refund" and "return" even with no keyword overlap. Neither retriever alone would handle both queries well.
Comparing Dense and Sparse Vectors Side by Side
Property | Dense Vectors | Sparse Vectors
------------------+------------------------+--------------------------
Dimensionality | Low (128 to 3072) | High (vocab size, 10K-50K)
Non-zero values | Almost all | Very few
Produced by | Neural embedding model | BM25, TF-IDF, SPLADE
Captures | Semantic meaning | Lexical term presence
Strengths | Synonym matching, | Exact string matching,
| paraphrase retrieval, | rare identifiers,
| contextual meaning | new domain terms
Weaknesses | Exact identifiers, | Vocabulary mismatch,
| rare proper nouns | synonyms, paraphrases
Interpretability | Low | High
Index type | HNSW, IVF | Inverted index
Memory per vector | Higher per vector | Lower (store only non-zeros)
Recall@10 alone | ~78% | ~65%
Recall@10 hybrid | 91% (combined) | 91% (combined)

When to Use Dense, Sparse, or Hybrid
The right choice depends on your query distribution.
Use dense-only retrieval when your users write natural language queries, when vocabulary mismatch is common (your documents and users describe things differently), and when you do not have exact identifier lookups. Recommendation systems, general document Q&A, and conversational search usually fall here.
Use sparse-only retrieval when your users search for exact strings: product SKUs, error codes, legal case numbers, person names, and technical identifiers. When the query and the document are expected to use exactly the same terminology, BM25 is faster, simpler to operate, and often more precise.
Use hybrid search for most production RAG applications. Your users will send both types of queries, often in the same session. According to research cited by Infinity on hybrid retrieval, an IBM research paper compared BM25, dense vectors, BM25 plus dense, dense plus sparse, and BM25 plus dense plus sparse. The study concluded that using three-way retrieval is the optimal option for RAG.
Hybrid Search Support in Vector Databases
Every major vector database now supports hybrid search, though the implementations differ.
Weaviate exposes a single hybrid() query method accepting an alpha parameter from 0 (pure sparse) to 1 (pure dense). Internally it runs BM25 and vector search in parallel, fuses via RRF or relative score fusion, and returns a single ranked list.
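As a rough sketch of how such an alpha blend can work, the snippet below interpolates min-max normalized score lists. This is illustrative only, not Weaviate's exact internals; the function name and example scores are made up for the demo.

```python
import numpy as np

def alpha_fusion(sparse_scores, dense_scores, alpha=0.5):
    # alpha = 0 -> pure sparse ranking, alpha = 1 -> pure dense ranking
    def minmax(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return (1 - alpha) * minmax(sparse_scores) + alpha * minmax(dense_scores)

bm25_scores = [9.5, 0.2, 4.1]     # doc 0 wins on keywords
cosine_sims = [0.31, 0.88, 0.52]  # doc 1 wins semantically
print(alpha_fusion(bm25_scores, cosine_sims, alpha=0.2))  # favors doc 0
print(alpha_fusion(bm25_scores, cosine_sims, alpha=0.8))  # favors doc 1
```

Sliding alpha moves the winner from the keyword-strong document to the semantically strong one, which is exactly the knob the hybrid() query exposes.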
Qdrant supports DBSF (Distribution-Based Score Fusion) as an alternative to RRF. DBSF normalizes scores relative to their distributions before combining, which gives better results when one retriever has a much higher score variance than the other.
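A sketch of the DBSF idea follows, assuming mean plus or minus three standard deviations as each list's effective score range; this is a simplified illustration of distribution-based normalization, not Qdrant's exact implementation.

```python
import numpy as np

def dbsf(sparse_scores, dense_scores):
    # Map each score list into a comparable range derived from its own
    # distribution (mean +/- 3 std), then sum the normalized scores
    def norm(x):
        x = np.asarray(x, dtype=float)
        lo, hi = x.mean() - 3 * x.std(), x.mean() + 3 * x.std()
        return (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)
    return norm(sparse_scores) + norm(dense_scores)

# An outlier BM25 score (200.0) is absorbed into the distribution estimate
# instead of defining the entire normalization range
print(dbsf([200.0, 3.1, 2.7], [0.42, 0.39, 0.81]))
```

Because the range comes from the distribution rather than the observed min and max, a single extreme score shifts the normalization less violently than it would under plain min-max scaling.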
Elasticsearch added native RRF support in version 8.9. It supports both BM25 and ELSER (its SPLADE-inspired sparse neural retriever) alongside dense kNN search.
Milvus supports multi-vector search, allowing simultaneous retrieval across dense and sparse vector fields in a single query with configurable fusion weights.
OpenSearch supports neural sparse search through its neural sparse query type, combining sparse neural retrieval with dense semantic search via a hybrid query wrapper.
Storage and Memory Considerations
Sparse vectors are memory-efficient to store because only non-zero values need to be saved. Libraries like scipy.sparse use compressed sparse row (CSR) format to store only the index and value of each non-zero element. A 50,000-dimensional sparse vector with 500 non-zero values requires roughly 50 times less storage than a naive full-length float32 array, since each stored non-zero costs an index plus a value.
Dense vectors require storage proportional to their dimensionality regardless of content. A 1536-dimensional float32 vector takes 6,144 bytes per document. At one million documents, that is roughly 6 GB for the raw vectors alone, before any index overhead.
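A quick sketch with numpy and scipy.sparse makes both storage costs concrete (the sizes are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Sparse: 50,000-dim vector with 500 non-zeros, stored in CSR format
vec = np.zeros(50_000, dtype=np.float32)
rng = np.random.default_rng(0)
idx = rng.choice(50_000, size=500, replace=False)
vec[idx] = rng.random(500).astype(np.float32)

csr = csr_matrix(vec)
csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
print(f"CSR: {csr_bytes} bytes vs naive dense array: {vec.nbytes} bytes")

# Dense: a 1536-dim float32 embedding costs the same for every document
print(f"1536-dim float32 embedding: {1536 * 4} bytes per document")
```

The naive array costs 200,000 bytes while CSR stores only the 500 values plus their indices, which is where the order-of-magnitude savings comes from.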
Hybrid architectures therefore maintain two separate index structures: a dense vector index (HNSW or IVF) and an inverted index (the standard data structure for sparse keyword search). As noted by the GoPenAI hybrid search article, the dual index adds approximately 1.4 times the storage footprint of dense-only retrieval, with about 6ms additional query latency. For most production applications, those are negligible costs relative to the recall improvement.
The Practical Recommendation
For teams building RAG pipelines, the evidence points clearly toward hybrid. BM25 is the default sparse retriever because it has zero inference overhead, works perfectly for exact string queries, and runs inside every major search infrastructure already. If your corpus has heavy vocabulary mismatch between how users ask questions and how documents are written, replacing BM25 with SPLADE produces measurable recall gains at the cost of added inference latency.
Use RRF as the default fusion algorithm. It is immune to score normalization edge cases and requires no tuning to produce reasonable results. If you have labeled evaluation data for your specific query distribution, train a weighted linear combination for marginal gains beyond RRF.
The semantic search article covers how the query pipeline works end to end once you have both indexes in place. The vector database comparison article digs into the tradeoffs between purpose-built vector databases and Elasticsearch for hybrid workloads.
Why Traditional Indexes Cannot Index Dense Vectors
Dense vectors cannot use the same inverted index that stores sparse vectors. An inverted index works by sorting and grouping exact values. For text, it maps every unique term to the list of documents containing it. For numbers, it supports range queries. Neither operation makes sense for a 1536-dimensional float vector.
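A minimal sketch of that exact-value grouping shows why it works for terms and not for floats; the documents here are made up for the demo.

```python
from collections import defaultdict

docs = {
    0: "machine learning tutorial",
    1: "deep learning with neural networks",
    2: "machine translation systems",
}

# Inverted index: each unique term maps to the set of documents containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

print(sorted(index["learning"]))  # [0, 1]
print(sorted(index["machine"]))   # [0, 2]
```

The lookup works because a term either is or is not in a document. A 1536-dimensional float vector has no such exact value to group on, which is why dense retrieval needs geometric index structures instead.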
Dense vectors require ANN (Approximate Nearest Neighbor) index structures specifically designed for high-dimensional geometry. HNSW organizes vectors into a layered graph where each node connects to its nearest neighbors. IVF clusters vectors into groups and searches only the most relevant clusters at query time.
The why traditional indexes fail for vector search article covers this in full, including the curse of dimensionality and why the standard B-tree cannot be adapted to work on high-dimensional float vectors.
Summary
Dense vectors and sparse vectors are complementary representations that fail in opposite directions. Dense vectors encode semantic meaning and handle vocabulary mismatch. Sparse vectors encode lexical term presence and handle exact identifier matching. Neither is universally better.
Hybrid search combines both by running BM25 or SPLADE sparse retrieval and dense ANN search in parallel, then fusing results with Reciprocal Rank Fusion. Research consistently shows hybrid retrieval outperforms either method alone by a meaningful margin.
The full data flow: text input gets embedded into a dense vector via an embedding model, gets tokenized and scored into a sparse representation via BM25, both representations get stored in a vector database that maintains parallel indexes, and at query time both indexes are searched and fused before the top chunks reach the LLM.
Sources and Further Reading
- Weaviate. Hybrid Search Explained. weaviate.io/blog/hybrid-search-explained
- Elastic. Sparse Embeddings: Dense vs. Sparse Vector and Usage With ML Models. elastic.co/search-labs/blog/sparse-vector-embedding
- Milvus. What Are Dense and Sparse Embeddings? milvus.io/ai-quick-reference/what-are-dense-and-sparse-embeddings
- Chroma. Sparse Vector Support. trychroma.com/project/sparse-vector-search
- OpenSearch. Neural Sparse Search Documentation. docs.opensearch.org/latest/vector-search/ai-search/neural-sparse-search
- Supermemory. Hybrid Search Guide: Vectors and Full Text (April 2026). blog.supermemory.ai/hybrid-search-guide
- Prem AI. Hybrid Search for RAG: BM25, SPLADE, and Vector Search Combined. blog.premai.io/hybrid-search-for-rag-bm25-splade-and-vector-search-combined
- GoPenAI. Hybrid Search in RAG: Dense plus Sparse, RRF, and When to Use Which. blog.gopenai.com/hybrid-search-in-rag
- Infiniflow. Dense Vector plus Sparse Vector plus Full Text Search plus Tensor Reranker. infiniflow.org/blog/best-hybrid-search-solution
- Zilliz. Sparse and Dense Embeddings. zilliz.com/learn/sparse-and-dense-embeddings
- Thakur et al. BEIR: A Heterogeneous Benchmark for Zero-Shot Evaluation of IR Models. arxiv.org/abs/2104.08663
- DEV Community. Dense vs Sparse Retrieval: Mastering FAISS, BM25, and Hybrid Search. dev.to/qvfagundes/dense-vs-sparse-retrieval-mastering-faiss-bm25-and-hybrid-search-4kb1
Krunal Kanojiya
Technical Content Writer
Technical Content Writer and former software developer from India. I write in-depth articles on blockchain, AI/ML, data engineering, web development, and developer careers. Currently at Lucent Innovation, previously at Cromtek Solution and freelance.