Why Traditional Indexes Fail for Vector Search
A research-backed deep dive into why B-tree and hash indexes cannot handle high-dimensional vector search. Learn about the curse of dimensionality, why traditional structures break down above 10 to 15 dimensions, and how HNSW and IVF were designed from first principles to solve the problem.
Imagine trying to sort a library of one million books into a single alphabetical shelf so you can find related books quickly. The alphabetical order works perfectly for finding a specific title. It completely fails when your question is "find me books that are conceptually similar to this one," because conceptual similarity has no mapping to alphabetical order.
B-tree indexes face exactly this problem with high-dimensional vectors. The index that powers every SQL database, every relational lookup, every primary key query in every application you have ever built was designed for one type of question: does this value match, or is it in this range? Vector search asks an entirely different type of question: which stored items are closest to this query in a space with hundreds or thousands of dimensions?
Those two questions require different data structures at a fundamental level. This article explains why, builds intuition for the curse of dimensionality from first principles, and then covers how HNSW and IVF solve the problem that traditional indexes cannot.
This is the seventh article in the Vector Database Fundamentals series. It provides the mathematical foundation behind the architectural differences covered in vector database vs traditional database and vector database vs Elasticsearch, and explains why semantic search requires specialized infrastructure rather than adding a vector field to an existing SQL table.
How B-Tree Indexes Work
Before explaining why B-trees fail for vectors, it helps to understand precisely why they succeed for everything else.
A B-tree index maintains all indexed values in sorted order inside a balanced tree. Each internal node contains keys that divide the data range. Leaf nodes contain the actual values with pointers to the corresponding rows.
B-tree on column: age
              [30]
             /    \
         [20]      [40]
        /    \    /    \
    [15]  [25]  [35]  [45]

To find all rows where age = 28, the database starts at the root, determines 28 is less than 30, traverses left to the node containing 20, determines 28 is greater than 20, traverses right, and arrives at the leaf containing 25. The record with age 28 does not exist. The entire search took three comparisons regardless of how many millions of rows are in the table.
For range queries like age BETWEEN 25 AND 35, the B-tree traverses to the left boundary and then reads contiguously to the right boundary. The tree's sorted structure makes this efficient.
The critical property: this works because age is one-dimensional and has a total order. Every age value has a definitive place in the sorted sequence. The tree leverages that order to skip the vast majority of comparisons.
import bisect
# Simulate a simple sorted B-tree leaf for integers
sorted_ages = [15, 20, 22, 25, 28, 30, 33, 35, 40, 45]
# Binary search: O(log n) to find insertion point
target = 28
idx = bisect.bisect_left(sorted_ages, target)
print(f"Found {sorted_ages[idx]} at index {idx}")
# Found 28 at index 4 — 4 comparisons for 10 elements
# For n=1,000,000 elements, binary search takes at most ~20 comparisons
import math
print(f"Max comparisons for 1M elements: {math.ceil(math.log2(1_000_000))}")
# Max comparisons for 1M elements: 20

Twenty comparisons to find a value among one million entries. This efficiency is why B-trees have powered databases for 50 years.
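The range query from earlier works the same way in this simulation: two binary searches locate the boundaries, and everything between them is read contiguously, just as a B-tree scans leaf pages between the two boundary traversals.

```python
import bisect

sorted_ages = [15, 20, 22, 25, 28, 30, 33, 35, 40, 45]

# Range scan for age BETWEEN 25 AND 35:
# find the left boundary, then read contiguously to the right boundary
lo = bisect.bisect_left(sorted_ages, 25)
hi = bisect.bisect_right(sorted_ages, 35)
print(sorted_ages[lo:hi])  # [25, 28, 30, 33, 35]
```

Two O(log n) lookups plus a sequential read: this is the access pattern that a one-dimensional total order makes possible, and that vectors will take away.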
Why B-Trees Cannot Sort High-Dimensional Vectors
A vector like [0.41, -1.22, 0.03, 2.18, 0.77, ..., -0.55] with 1536 dimensions cannot be placed in a meaningful sorted order.
To see why, consider two vectors in two dimensions:
import numpy as np
v1 = np.array([0.8, 0.2])
v2 = np.array([0.7, 0.9])
v3 = np.array([0.9, 0.1])
# Cosine similarity: how semantically similar are these?
def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"v1 vs v2: {cosine_sim(v1, v2):.4f}")
print(f"v1 vs v3: {cosine_sim(v1, v3):.4f}")

Now try to sort v1, v2, and v3 in an order that reflects their geometric closeness. There is no consistent one-dimensional ordering that preserves the distances in two-dimensional space. Projecting geometry onto a line always discards information about some dimension.
In 1536 dimensions, the problem is incomparably worse. Any linear ordering of 1536-dimensional vectors destroys almost all the geometric structure that makes similarity meaningful. Two vectors that are nearly identical (cosine similarity of 0.99) may be placed at completely different positions in any linear sorted sequence, because there are 1536 independent components each contributing to the sort key.
According to Springer's High Dimensional Indexing reference, the well-known curse of dimensionality leads to a worsening of index selectivity with increasing dimensionality, an effect which already starts at dimensions of 10 to 15, depending on the size of the database and the data distribution. During query processing, large parts of conventional hierarchical indexes such as R-trees need to be randomly accessed, which is up to 20 times more expensive than sequential reading operations.
The Curse of Dimensionality: The Mathematical Intuition
The curse of dimensionality is not a vague observation. It is a precise mathematical phenomenon that has been studied since the 1960s. Two aspects of it are directly relevant to vector search.
Aspect 1: Volume Grows Exponentially
Consider a d-dimensional unit hypercube (each side has length 1). Now consider a smaller hypercube with side length 0.9 centered in the same space. In one dimension, the small segment contains 90 percent of the large segment's length. In two dimensions, 0.9 squared gives 81 percent. In ten dimensions, 0.9 to the power of 10 gives about 35 percent. In 100 dimensions, 0.9 to the power of 100 gives about 0.003 percent.
import numpy as np
dimensions = [1, 2, 5, 10, 50, 100, 512, 1536]
ratio = 0.9
print("Fraction of volume inside the inner cube (side 0.9):")
for d in dimensions:
    fraction = ratio ** d
    print(f"  {d:5d} dimensions: {fraction * 100:.8f}%")
# Output:
# Fraction of volume inside the inner cube (side 0.9):
# 1 dimensions: 90.00000000%
# 2 dimensions: 81.00000000%
# 5 dimensions: 59.04900000%
# 10 dimensions: 34.86784401%
# 50 dimensions: 0.51537752%
# 100 dimensions: 0.00265614%
# 512 dimensions: 0.00000000%
# 1536 dimensions: 0.00000000%

At 1536 dimensions, the vast majority of the volume of a hypercube is concentrated in its corners and thin shells near the surface. Points sampled uniformly in that space are almost always near the boundary, never near the center.
Aspect 2: Distances Become Indistinguishable
The second aspect is more directly relevant to nearest-neighbor search. In low-dimensional space, nearby points are clearly closer than far points. The ratio between the nearest neighbor distance and the farthest neighbor distance is large. In high-dimensional space, that ratio collapses toward 1.
import numpy as np
def distance_concentration(dims, n_points=1000, n_trials=100):
    """
    Measure how distinguishable nearest vs farthest neighbors are.
    Returns average ratio: max_dist / min_dist (closer to 1 = harder to search)
    """
    ratios = []
    for _ in range(n_trials):
        points = np.random.randn(n_points, dims)
        query = np.random.randn(dims)
        distances = np.linalg.norm(points - query, axis=1)
        ratio = distances.max() / (distances.min() + 1e-10)
        ratios.append(ratio)
    return np.mean(ratios)
print("Distance ratio (max/min) — higher is better for search:")
for d in [2, 10, 50, 100, 300, 768, 1536]:
    ratio = distance_concentration(d)
    print(f"  {d:5d} dimensions: {ratio:.2f}")
# Output (approximate, varies by random seed):
# Distance ratio (max/min) — higher is better for search:
# 2 dimensions: 312.45
# 10 dimensions: 42.18
# 50 dimensions: 12.83
# 100 dimensions: 9.21
# 300 dimensions: 5.74
# 768 dimensions: 4.01
# 1536 dimensions: 3.22

At two dimensions, the farthest point is over 300 times farther than the nearest. At 1536 dimensions, the ratio drops to around 3. From the perspective of any tree-based partitioning strategy, almost every node in the tree becomes relevant to almost every query, because all distances are nearly equal. The tree provides no effective pruning.
According to Zilliz's analysis of the curse of dimensionality, in high dimensions, data becomes sparse and distance metrics lose discriminative power, making traditional tree-based structures such as k-d trees ineffective. These methods rely on partitioning space, which fails when dimensions are too high because partitions become too numerous or overlapping.
Why This Breaks Tree Indexes Specifically
A k-d tree partitions the vector space by splitting along one dimension at a time. The intuition is that splitting along the longest dimension separates nearby points from far ones. In two or three dimensions, this works well. In 100 dimensions, every split cuts the space in half along one of 100 dimensions while leaving the other 99 dimensions unchanged. The resulting partitions become so numerous and so similar in content that queries must examine almost every partition.
According to the arXiv paper on billion-scale vector search, tree-based indexing, hash-based indexing, and other traditional index types cannot either scale to large-scale datasets or provide sufficient accuracy for high-dimensional vector data.
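This pruning collapse can be made concrete with a simplified numpy model (not a full k-d tree): a search can skip the far side of a median split only when the query is farther from that splitting plane than from its current best neighbor. Counting how often that condition holds shows the prunable fraction vanish as dimensions grow.

```python
import numpy as np

rng = np.random.default_rng(0)

def prunable_fraction(d, n=2000):
    """Fraction of per-dimension median splits a k-d-tree-style search could prune."""
    pts = rng.standard_normal((n, d))
    q = rng.standard_normal(d)
    nn_dist = np.linalg.norm(pts - q, axis=1).min()  # true nearest-neighbor distance
    medians = np.median(pts, axis=0)
    # the far side of a split on dim i is skippable only if |q_i - median_i| > nn_dist
    return float(np.mean(np.abs(q - medians) > nn_dist))

for d in [2, 10, 50, 100, 768]:
    print(f"d={d:4d}: prunable splits = {prunable_fraction(d) * 100:5.1f}%")
```

In low dimensions the nearest neighbor is extremely close, so almost every split can prune. In high dimensions the nearest-neighbor distance grows like the square root of d while the distance to any single splitting plane stays O(1), so essentially no split can prune and the search degenerates toward a full scan.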
What Brute Force Looks Like and Why It Does Not Scale
Before covering the solutions, it is worth quantifying what happens if you simply skip indexing and compare every query against every stored vector.
import numpy as np
import time
def brute_force_search(corpus: np.ndarray, query: np.ndarray, top_k: int = 10):
    """
    Exact nearest neighbor search: compare query against every vector.
    Time complexity: O(n * d) where n = corpus size, d = dimensions.
    """
    similarities = np.dot(corpus, query) / (
        np.linalg.norm(corpus, axis=1) * np.linalg.norm(query)
    )
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return top_indices, similarities[top_indices]
# Simulate different corpus sizes at 1536 dimensions
dimensions = 1536
for n_vectors in [1_000, 10_000, 100_000, 1_000_000]:
    corpus = np.random.randn(n_vectors, dimensions).astype(np.float32)
    query = np.random.randn(dimensions).astype(np.float32)
    start = time.perf_counter()
    indices, scores = brute_force_search(corpus, query)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"  {n_vectors:>10,} vectors: {elapsed_ms:.1f}ms")
# Approximate output (on a typical CPU):
# 1,000 vectors: 0.3ms
# 10,000 vectors: 3.1ms
# 100,000 vectors: 30.8ms
# 1,000,000 vectors: 308.4ms

At one million vectors, a single brute-force query takes over 300ms. At ten million vectors, that becomes over three seconds. For a production search system serving thousands of concurrent users, this is completely infeasible. The O(n) time complexity means the latency grows linearly with corpus size, with no upper bound.
According to the PyImageSearch ANN guide, a good ANN index achieves 0.9 to 0.99 recall at top-k results while being orders of magnitude faster than brute force.
The Solution: Approximate Nearest Neighbor Algorithms
The key insight behind ANN algorithms is to accept a small, bounded accuracy loss in exchange for a massive reduction in the number of comparisons. Instead of checking all n stored vectors, the algorithm checks a small subset of the most promising candidates and returns results that are correct with high probability.
According to research cited in Towards Robustness: A Critique of Current Vector Database Assessments, ANN search aims to maximize query recall by retrieving as many correct results as possible within millisecond-level latency. Numerous vector indexes have been developed to support efficient ANN, including HNSW, IVF, DiskANN, and ScaNN.
Two approaches dominate production deployments: HNSW for high-recall workloads and IVF for memory-constrained or extremely large-scale deployments.
HNSW: Graph-Based Navigation
HNSW (Hierarchical Navigable Small World) was introduced by Yu A. Malkov and D. A. Yashunin in a 2016 paper. It organizes vectors into a multi-layer proximity graph. The algorithm draws on two concepts from different fields: skip lists from computer science and small-world networks from social science.
How HNSW Is Built
During index construction, each vector is inserted one at a time. Each vector gets assigned to one or more layers based on an exponentially decaying probability function. Vectors in higher layers are fewer but connect with longer-range edges. Every vector is present in layer 0, the base layer.
HNSW structure with 3 layers:
Layer 2 (sparse, long-range): A ----------- F
Layer 1 (medium): A -- B -- C D -- E -- F
Layer 0 (dense, all vectors): A-B-C-X-Y-D-E-F-G-H-I-J
Each node connects to M nearest neighbors in its layer.
Default M=16 means each node has up to 16 connections.

When a new vector is inserted, the algorithm finds its nearest neighbors in each layer using the existing graph structure, then creates edges to those neighbors. The graph grows incrementally with no need to rebuild from scratch.
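The exponentially decaying layer assignment can be sketched directly from the formula in the Malkov and Yashunin paper: a vector's top layer is floor(-ln(U) * mL) with U uniform in (0, 1) and mL = 1/ln(M), so each successive layer holds roughly 1/M of the vectors in the layer below it.

```python
import math
import random
from collections import Counter

random.seed(42)
M = 16
mL = 1 / math.log(M)  # level normalization factor from the HNSW paper

def assign_top_layer():
    # floor(-ln(U) * mL) gives P(layer >= l) = M^(-l): exponential decay
    u = 1.0 - random.random()  # in (0, 1], avoids log(0)
    return math.floor(-math.log(u) * mL)

counts = Counter(assign_top_layer() for _ in range(100_000))
for layer in sorted(counts):
    print(f"layer {layer}: {counts[layer]:6d} vectors")
```

With M=16, roughly 94 percent of vectors live only in layer 0, about 6 percent reach layer 1, and a few hundred per hundred thousand reach layer 2, which is exactly the sparse-at-the-top shape the diagram above shows.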
import numpy as np
import faiss
dimension = 128
n_vectors = 100_000
M = 16 # number of connections per node
ef_construction = 64 # candidates examined during build
# Create HNSW index
index = faiss.IndexHNSWFlat(dimension, M)
index.hnsw.efConstruction = ef_construction
# Build index from vectors
vectors = np.random.randn(n_vectors, dimension).astype(np.float32)
faiss.normalize_L2(vectors)
index.add(vectors)
print(f"Index contains {index.ntotal} vectors")
# Search
query = np.random.randn(1, dimension).astype(np.float32)
faiss.normalize_L2(query)
index.hnsw.efSearch = 50 # candidates explored during search
distances, indices = index.search(query, k=10)
print(f"Top 5 nearest neighbors: {indices[0][:5]}")
print(f"Distances: {distances[0][:5]}")

How HNSW Searches
At query time, the algorithm enters at the top layer through a fixed entry point. It greedily moves to the neighbor closest to the query vector. When no neighbor is closer than the current node, it descends to the next layer and repeats. At the bottom layer (layer 0), it performs a precise local search among the dense neighbors of the current position.
Query vector q:
Layer 2: Enter at A. A connects to F. F is closer to q. Move to F.
No neighbor of F is closer than F. Descend.
Layer 1: Enter at F. F connects to D, E. E is closer. Move to E.
E connects to D, F. D is closer. Move to D.
No neighbor of D is closer. Descend.
Layer 0: Search all neighbors of D in dense local region.
Return top-K nearest found.

This traversal is analogous to asking for directions. Instead of checking every address in a city, you start by finding the right neighborhood (top layers), then the right street (middle layers), then the right house (bottom layer). According to Pinecone's HNSW guide, HNSW is among the top-performing indexes for vector similarity search, producing state-of-the-art performance with super fast search speeds and excellent recall.
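The greedy walk on a single layer can be sketched in pure numpy. This is a toy k-NN graph standing in for one HNSW layer, with a single walker instead of the ef_search-sized candidate beam real HNSW maintains:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, M = 500, 16, 8
points = rng.standard_normal((n, d)).astype(np.float32)

# toy proximity graph: connect each point to its M nearest neighbors
all_dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
neighbors = np.argsort(all_dists, axis=1)[:, 1:M + 1]  # skip self at column 0

def greedy_search(query, entry=0):
    """Hop to whichever neighbor is closest to the query until no neighbor improves."""
    current = entry
    best = float(np.linalg.norm(points[current] - query))
    while True:
        nb = neighbors[current]
        nb_dists = np.linalg.norm(points[nb] - query, axis=1)
        j = int(nb_dists.argmin())
        if nb_dists[j] >= best:  # local minimum: stop (HNSW would descend a layer here)
            return current, best
        current, best = int(nb[j]), float(nb_dists[j])

query = rng.standard_normal(d).astype(np.float32)
node, dist = greedy_search(query)
print(f"greedy walk stopped at node {node}, distance {dist:.3f}")
```

A single walker can get stuck in a local minimum, which is precisely why HNSW keeps a beam of ef_search candidates at layer 0: widening the beam trades a few more distance computations for a much lower chance of missing the true nearest neighbor.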
The key parameters that control HNSW performance:
# HNSW tuning parameters and their effects
# M (connections per node):
# Higher M → better recall, more memory, slower builds
# Default 16 is good for most cases
# Range: 5 to 48
# ef_construction (candidates during build):
# Higher ef_construction → better graph quality, slower build time
# Default 64, higher for better recall at cost of build time
# ef_search (candidates during query):
# Higher ef_search → better recall, slower queries
# Set at query time, does not require rebuilding index
# Tune this to hit your target recall threshold
# Memory usage: approximately (M * 4 bytes * n_vectors) + raw vector storage
# For 1M vectors at M=16, dimension=1536:
M = 16
n = 1_000_000
d = 1536
graph_overhead_bytes = M * 4 * n # ~64 MB for graph edges
vector_storage_bytes = d * 4 * n # ~6 GB for float32 vectors
total_memory_gb = (graph_overhead_bytes + vector_storage_bytes) / 1e9
print(f"Estimated HNSW index size: {total_memory_gb:.2f} GB for 1M vectors at d=1536")
# Estimated HNSW index size: 6.21 GB for 1M vectors at d=1536

According to Proptimise AI's production indexing guide, HNSW memory usage is approximately 1.5 to 2 times the raw vector data size. For datasets up to hundreds of millions of vectors where memory is available, HNSW is the default recommendation.
IVF: Cluster-Based Partitioning
IVF (Inverted File Index) takes a completely different approach. Rather than building a graph, it partitions the vector space into clusters using k-means, then only searches within the most relevant clusters at query time.
How IVF Is Built
During index construction, k-means clustering is run on the entire vector corpus. This produces nlist cluster centroids. Each vector is then assigned to its nearest centroid and stored in that centroid's inverted list.
import numpy as np
import faiss
dimension = 128
n_vectors = 100_000
nlist = 100 # number of clusters (typical: sqrt(n_vectors))
# IVF requires a training phase to learn cluster centroids
quantizer = faiss.IndexFlatL2(dimension) # exact search for centroid lookup
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)
vectors = np.random.randn(n_vectors, dimension).astype(np.float32)
faiss.normalize_L2(vectors)
# Train the index (learns cluster centroids)
index.train(vectors)
print(f"Index trained: {index.is_trained}")
# Add vectors (assigns each to its nearest centroid)
index.add(vectors)
print(f"Index contains {index.ntotal} vectors in {nlist} clusters")
# Search with nprobe clusters
index.nprobe = 10 # search top 10 clusters (10% of nlist=100)
distances, indices = index.search(
    np.random.randn(1, dimension).astype(np.float32),
    k=10
)

How IVF Searches
At query time, the algorithm computes the distance from the query to each cluster centroid (there are only nlist of them, typically a few hundred to a few thousand). It selects the top nprobe closest centroids and searches the vectors in only those clusters.
IVF search with nlist=1000 clusters, nprobe=10:
1. Compare query against 1000 centroids (fast: only 1000 comparisons)
2. Select top 10 closest centroids
3. Search vectors in only those 10 clusters
4. Each cluster contains ~1,000 vectors on average (1,000,000 / 1000)
5. Total comparisons: ~1,000 (centroids) + ~10,000 (within clusters) = ~11,000
vs brute force: 1,000,000 comparisons
Speedup: ~90x with nprobe=10

According to Milvus's IVF vs HNSW guide, IVF can first perform a coarse-grained filter at the centroid level to quickly narrow down the candidate set, then conduct fine-grained distance calculations within the selected clusters. This maintains stable and predictable performance even when a large portion of data is filtered out.
The tradeoff with IVF: if a true nearest neighbor happens to be assigned to a cluster whose centroid is not among the top nprobe closest centroids, it will be missed. Increasing nprobe improves recall at the cost of more comparisons. A typical production setting is nprobe = sqrt(nlist).
HNSW vs IVF: The Practical Tradeoffs
Property            | HNSW                       | IVF
--------------------+----------------------------+---------------------------------
Build approach      | Incremental graph          | Batch k-means + assignment
Training required   | No                         | Yes (k-means phase)
Build time          | Moderate, parallel         | Fast (after k-means)
Memory per vector   | 1.5 to 2x raw storage      | ~1.0x raw storage
Query recall        | Higher (at same speed)     | Lower (trades recall for speed)
Update support      | Yes (incremental inserts)  | Requires partial rebuild
Filtered search     | Degrades with selectivity  | Handles well via cluster pruning
Ideal scale         | Up to hundreds of millions | Billions (with IVF-PQ)
Default in          | Qdrant, Weaviate, pgvector | FAISS, Milvus with PQ

HNSW should be the default choice for most applications up to hundreds of millions of vectors where high recall is important. IVF becomes compelling when facing extreme scale beyond billions of vectors, severe memory constraints, or when combined with Product Quantization.
Product Quantization: Solving the Memory Problem
Even with an efficient ANN algorithm, storing billions of 1536-dimensional float32 vectors requires substantial memory. Each vector at float32 precision occupies 1536 * 4 = 6,144 bytes, roughly 6 kilobytes. At one billion vectors, that is 6 terabytes of raw vector storage.
Product Quantization (PQ) compresses vectors by a factor of 32 to 64 times while retaining enough geometric structure to support accurate ANN search.
# How Product Quantization compresses a vector
# Original vector: 1536 float32 values = 6,144 bytes
import numpy as np

original = np.random.randn(1536).astype(np.float32)
# PQ splits the vector into m subvectors, each with d/m dimensions
m = 96 # number of subvectors
k = 256 # number of centroids per subvector (fits in 1 byte)
# Each subvector (1536/96 = 16 dimensions) is assigned to its nearest
# centroid from a codebook of 256 centroids.
# Storage per subvector: log2(256) = 8 bits = 1 byte
compressed_bytes = m * 1 # m subvectors, each stored as 1 byte (centroid ID)
original_bytes = 1536 * 4
compression_ratio = original_bytes / compressed_bytes
print(f"Original size: {original_bytes:,} bytes ({original_bytes / 1024:.1f} KB)")
print(f"Compressed size: {compressed_bytes} bytes")
print(f"Compression: {compression_ratio:.0f}x")
# Output:
# Original size: 6,144 bytes (6.0 KB)
# Compressed size: 96 bytes
# Compression: 64x

According to Proptimise AI's indexing guide, a 6KB vector becomes roughly 192 bytes with PQ compression. At 100 million vectors, that reduces storage from 600GB to 19GB. The tradeoff is a moderate reduction in recall, typically 5 to 15 percent at the same search budget.
IVF-PQ combines both approaches: IVF clusters reduce the search space, PQ compression reduces memory. This combination makes billion-scale vector search feasible on commodity hardware.
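The arithmetic above can be turned into a working toy encoder. This sketch uses deliberately small illustrative sizes (d=64, m=8, k=16, all assumptions, not the article's 1536/96/256) and a few hand-rolled Lloyd's k-means iterations per subspace:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 64, 8, 16            # toy sizes: 8 subvectors of 8 dims, 16 centroids each
sub = d // m
train = rng.standard_normal((2000, d)).astype(np.float32)

# learn one codebook per subspace with a few Lloyd's k-means iterations
codebooks = np.empty((m, k, sub), dtype=np.float32)
for j in range(m):
    X = train[:, j * sub:(j + 1) * sub]
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(10):
        assign = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for c in range(k):
            if np.any(assign == c):
                C[c] = X[assign == c].mean(0)
    codebooks[j] = C

def pq_encode(v):
    """Replace each subvector with the id of its nearest codebook centroid."""
    return np.array(
        [((codebooks[j] - v[j * sub:(j + 1) * sub]) ** 2).sum(-1).argmin()
         for j in range(m)],
        dtype=np.uint8,
    )

def pq_decode(code):
    """Reconstruct an approximation by concatenating the chosen centroids."""
    return np.concatenate([codebooks[j][code[j]] for j in range(m)])

v = train[0]
code = pq_encode(v)            # 8 bytes instead of 64 * 4 = 256 bytes
approx = pq_decode(code)
print(f"compressed {d * 4} bytes to {code.nbytes}, "
      f"relative error {np.linalg.norm(v - approx) / np.linalg.norm(v):.2f}")
```

The decoded vector is only an approximation, which is exactly the recall tradeoff described above: distances computed against the compressed codes are slightly wrong, so some true neighbors fall out of the top-k.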
The Full Picture: Why Each Layer Exists
The progression from brute force to HNSW to IVF to IVF-PQ is a progression of increasingly aggressive tradeoffs between accuracy, speed, and memory.
Approach         | Accuracy | Speed    | Memory | Scale
-----------------+----------+----------+--------+-------------------
Brute force      | Exact    | O(n*d)   | Low    | Up to ~100K
KD-tree / R-tree | Degrades | Breaks   | Low    | Below 15 dims only
HNSW             | ~99%     | O(log n) | 1.5-2x | Up to ~500M
IVF-Flat         | ~95%     | O(√n)    | ~1.0x  | Up to ~100M
IVF-PQ           | ~90%     | O(√n)    | 1/32x  | Billions

The choice of which layer to use is determined by dataset size, available memory, and the minimum acceptable recall for your application. According to Proptimise AI's production guide, for 10M to 100M vectors, HNSW is the right choice if memory allows. Beyond 100M vectors, IVF-PQ or distributed HNSW is the appropriate path.
Why This Matters for the Stack You Build
Every choice in building a vector database backed AI application connects to these index properties.
Chunking strategy affects the number of vectors you store. Smaller chunks mean more vectors mean larger indexes. A document corpus of 100,000 PDFs chunked at 500 tokens produces roughly 5 to 10 million vectors. At 1536 dimensions, HNSW fits comfortably in 60 to 100 GB of RAM on a modern server.
Embedding model choice determines vector dimensionality. A 384-dimension model (like all-MiniLM-L6-v2) requires roughly four times less memory per vector than a 1536-dimension model. If memory is your constraint, a smaller embedding model can meaningfully expand your capacity before you need to switch to IVF-PQ compression.
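A back-of-envelope sketch of the arithmetic in the two paragraphs above (raw float32 storage plus HNSW edge lists only; real deployments add runtime overhead on top, which is where the 1.5 to 2x figure comes from):

```python
def hnsw_memory_gb(n_vectors: int, dim: int, m: int = 16) -> float:
    """Rough lower bound: float32 vector payload plus ~4 bytes per graph edge."""
    vectors = n_vectors * dim * 4   # float32 payload
    graph = n_vectors * m * 4       # neighbor ids in layer 0
    return (vectors + graph) / 1e9

# ~7.5M chunks from 100K PDFs, at two embedding dimensionalities
for dim in (1536, 384):
    print(f"7.5M vectors at d={dim}: ~{hnsw_memory_gb(7_500_000, dim):.1f} GB")
# 7.5M vectors at d=1536: ~46.6 GB
# 7.5M vectors at d=384: ~12.0 GB
```

The roughly 4x gap between the two dimensionalities is almost entirely the vector payload; the graph overhead is the same in both cases, which is why shrinking the embedding model is usually the cheapest first lever when memory gets tight.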
Recall requirements determine which ANN algorithm and parameters to use. A RAG system where missing one document occasionally is acceptable can tolerate lower recall (say 95 percent) and run faster with more aggressive IVF pruning. A medical records search system may need 99 percent recall and must use HNSW with high ef_search.
The semantic search article covers how the full query pipeline uses these indexes: embedding the query, calling ANN search, applying metadata filters, and reranking results. The embeddings article covers how the vectors themselves are produced and how model choice affects dimensionality and quality. The dense vs sparse vectors article covers how sparse indexes (BM25's inverted lists) complement dense ANN indexes in hybrid search architectures.
Measuring ANN Index Quality
Before shipping a vector search system to production, measuring its actual recall against exact search on your real data is essential.
import numpy as np
import faiss
def measure_recall(
    corpus: np.ndarray,
    queries: np.ndarray,
    k: int = 10,
    nlist: int = 100,
    nprobe: int = 10,
) -> float:
    """
    Measure recall of IVF index against exact search ground truth.
    """
    d = corpus.shape[1]
    # Ground truth: exact search
    flat_index = faiss.IndexFlatIP(d)
    flat_index.add(corpus)
    _, gt_indices = flat_index.search(queries, k)
    # ANN: IVF index
    quantizer = faiss.IndexFlatIP(d)
    ivf_index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
    ivf_index.train(corpus)
    ivf_index.add(corpus)
    ivf_index.nprobe = nprobe
    _, ann_indices = ivf_index.search(queries, k)
    # Recall@k: fraction of true neighbors found by ANN
    recall = 0.0
    for gt_row, ann_row in zip(gt_indices, ann_indices):
        recall += len(set(gt_row) & set(ann_row)) / k
    return recall / len(queries)
dimension = 128
n_corpus = 100_000
n_queries = 500
corpus = np.random.randn(n_corpus, dimension).astype(np.float32)
queries = np.random.randn(n_queries, dimension).astype(np.float32)
faiss.normalize_L2(corpus)
faiss.normalize_L2(queries)
for nprobe in [1, 5, 10, 20, 50]:
    recall = measure_recall(corpus, queries, nlist=100, nprobe=nprobe)
    print(f"nprobe={nprobe:3d}: recall@10 = {recall:.4f}")
# Output (approximate):
# nprobe= 1: recall@10 = 0.5812
# nprobe= 5: recall@10 = 0.8734
# nprobe= 10: recall@10 = 0.9420
# nprobe= 20: recall@10 = 0.9751
# nprobe= 50: recall@10 = 0.9943

The recall curve flattens as nprobe increases. The knee of that curve (where each additional nprobe buys progressively less recall) is the right operating point for most production systems. For the example above, nprobe=20 gives 97.5 percent recall at roughly twice the computational cost of nprobe=10.
Summary
Traditional B-tree and hash indexes fail for vector search for one root reason: they were designed for exact matching and range queries on one-dimensional or low-dimensional data. Sorting vectors into a linear order destroys the geometric closeness that makes similarity search work.
The curse of dimensionality makes this worse as dimensions increase. Distances between random points become nearly equal, making tree traversal useless because every node in the tree becomes relevant to every query.
ANN algorithms solve this by accepting a controlled accuracy tradeoff. HNSW builds a layered proximity graph that enables O(log n) navigation to the approximate nearest neighbors in high-dimensional space. IVF clusters vectors into groups and only searches the most relevant clusters. IVF-PQ adds compression that enables billion-scale search in manageable memory.
These index structures are what separate a vector database from a general-purpose database with a vector column. The purpose-built storage engines in Qdrant, Milvus, and Weaviate are designed around these algorithms from the ground up, which is why they outperform general-purpose systems like Elasticsearch at pure vector search workloads, as covered in vector database vs Elasticsearch.
Sources and Further Reading
- Pinecone. Hierarchical Navigable Small Worlds (HNSW). pinecone.io/learn/series/faiss/hnsw
- Milvus Blog. How to Choose Between IVF and HNSW for ANN Vector Search. milvus.io/blog/understanding-ivf-vector-index
- Milvus AI Reference. How Does Indexing Work in a Vector DB? milvus.io/ai-quick-reference/how-does-indexing-work-in-a-vector-db-ivf-hnsw-pq-etc
- Redis. How HNSW Algorithms Improve Search. redis.io/blog/how-hnsw-algorithms-can-improve-search
- TiDB / PingCAP. ANN Search Explained: IVF vs HNSW vs PQ. pingcap.com/article/approximate-nearest-neighbor-ann-search-explained-ivf-vs-hnsw-vs-pq
- PyImageSearch. Vector Search with FAISS: ANN Explained. pyimagesearch.com/2026/02/16/vector-search-with-faiss-approximate-nearest-neighbor-ann-explained
- Proptimise AI. Vector Search Indexing: HNSW, ANN, and What Actually Matters for Production. proptimiseai.com/blog/vector-search-indexing-hnsw-ann-production
- Zilliz. How the Curse of Dimensionality Influences Indexing Techniques. zilliz.com/ai-faq/how-does-the-concept-of-the-curse-of-dimensionality-influence-the-design-of-indexing-techniques-for-vector-search
- Springer. High Dimensional Indexing. link.springer.com/rwe/10.1007/978-0-387-39940-9_804
- Analytics Vidhya. Advanced Vector Indexing Techniques for High-Dimensional Data. analyticsvidhya.com/blog/2024/09/vector-indexing-techniques
- Medium / Mehrshad Asadi. Understanding Modern Vector Search: From HNSW to IVF-PQ. medium.com/@mehrcodeland/understanding-modern-vector-search-from-hnsw-to-ivf-pq
- Gabor Koos. The Database Zoo: Vector Databases and High-Dimensional Search. blog.gaborkoos.com/posts/2025-11-25-The-Database-Zoo-Vector-Databases-and-High-Dimensional-Search
- Lakshman et al. Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval. arxiv.org/abs/2512.12458
- arXiv. Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search. arxiv.org/html/2405.03267v1
- Malkov and Yashunin. Efficient and Robust ANN Search Using Hierarchical Navigable Small World Graphs. arxiv.org/abs/1603.09320
Krunal Kanojiya
Technical Content Writer
Technical Content Writer and former software developer from India. I write in-depth articles on blockchain, AI/ML, data engineering, web development, and developer careers. Currently at Lucent Innovation, previously at Cromtek Solution and freelance.