Why Traditional Indexes Fail for Vector Search
A research-backed deep dive into why B-tree and hash indexes cannot handle high-dimensional vector search. Learn about the curse of dimensionality, why traditional structures break down above 10 to 15 dimensions, and how HNSW and IVF were designed from first principles to solve the problem.
Imagine trying to sort a library of one million books into a single alphabetical shelf so you can find related books quickly. The alphabetical order works perfectly for finding a specific title. It completely fails when your question is "find me books that are conceptually similar to this one," because conceptual similarity has no mapping to alphabetical order.
B-tree indexes face exactly this problem with high-dimensional vectors. The index that powers every SQL database, every relational lookup, every primary key query in every application you have ever built was designed for one type of question: does this value match, or is it in this range? Vector search asks an entirely different type of question: which stored items are closest to this query in a space with hundreds or thousands of dimensions?
Those two questions require different data structures at a fundamental level. This article explains why, builds intuition for the curse of dimensionality from first principles, and then covers how HNSW and IVF solve the problem that traditional indexes cannot.
This is the seventh article in the Vector Database Fundamentals series. It provides the mathematical foundation behind the architectural differences covered in vector database vs traditional database and vector database vs Elasticsearch, and explains why semantic search requires specialized infrastructure rather than adding a vector field to an existing SQL table.
How B-Tree Indexes Work
Before explaining why B-trees fail for vectors, it helps to understand precisely why they succeed for everything else.
A B-tree index maintains all indexed values in sorted order inside a balanced tree. Each internal node contains keys that divide the data range. Leaf nodes contain the actual values with pointers to the corresponding rows.
B-tree on column: age
              [30]
             /    \
         [20]      [40]
        /    \    /    \
    [15]  [25]  [35]  [45]

To find all rows where age = 28, the database starts at the root, determines 28 is less than 30, traverses left to the node containing 20, determines 28 is greater than 20, traverses right, and arrives at the leaf containing 25. The record with age 28 does not exist. The entire search took three comparisons regardless of how many millions of rows are in the table.
For range queries like age BETWEEN 25 AND 35, the B-tree traverses to the left boundary and then reads contiguously to the right boundary. The tree's sorted structure makes this efficient.
The critical property: this works because age is one-dimensional and has a total order. Every age value has a definitive place in the sorted sequence. The tree leverages that order to skip the vast majority of comparisons.
import bisect
# Simulate a simple sorted B-tree leaf for integers
sorted_ages = [15, 20, 22, 25, 28, 30, 33, 35, 40, 45]
# Binary search: O(log n) to find insertion point
target = 28
idx = bisect.bisect_left(sorted_ages, target)
print(f"Found {sorted_ages[idx]} at index {idx}")
# Found 28 at index 4 — 4 comparisons for 10 elements
# For n=1,000,000 elements, binary search takes at most ~20 comparisons
import math
print(f"Max comparisons for 1M elements: {math.ceil(math.log2(1_000_000))}")
# Max comparisons for 1M elements: 20

Twenty comparisons to find a value among one million entries. This efficiency is why B-trees have powered databases for 50 years.
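The range query from earlier works the same way in this simulation: two binary searches locate the boundaries, and everything between them is read contiguously, just as a B-tree scans leaf pages between the two boundary traversals.

```python
import bisect

sorted_ages = [15, 20, 22, 25, 28, 30, 33, 35, 40, 45]

# Range scan for age BETWEEN 25 AND 35:
# find the left boundary, then read contiguously to the right boundary
lo = bisect.bisect_left(sorted_ages, 25)
hi = bisect.bisect_right(sorted_ages, 35)
print(sorted_ages[lo:hi])  # [25, 28, 30, 33, 35]
```

Two O(log n) lookups plus a sequential read: this is the access pattern that a one-dimensional total order makes possible, and that vectors will take away.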
Why B-Trees Cannot Sort High-Dimensional Vectors
A vector like [0.41, -1.22, 0.03, 2.18, 0.77, ..., -0.55] with 1536 dimensions cannot be placed in a meaningful sorted order.
To see why, consider two vectors in two dimensions:
import numpy as np
v1 = np.array([0.8, 0.2])
v2 = np.array([0.7, 0.9])
v3 = np.array([0.9, 0.1])
# Cosine similarity: how semantically similar are these?
def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"v1 vs v2: {cosine_sim(v1, v2):.4f}")
print(f"v1 vs v3: {cosine_sim(v1, v3):.4f}")

Now try to sort v1, v2, and v3 in an order that reflects their geometric closeness. There is no consistent one-dimensional ordering that preserves the distances in two-dimensional space. Projecting geometry onto a line always discards information about some dimension.
In 1536 dimensions, the problem is incomparably worse. Any linear ordering of 1536-dimensional vectors destroys almost all the geometric structure that makes similarity meaningful. Two vectors that are nearly identical (cosine similarity of 0.99) may be placed at completely different positions in any linear sorted sequence, because there are 1536 independent components each contributing to the sort key.
According to Springer's High Dimensional Indexing reference, the well-known curse of dimensionality leads to a worsening of index selectivity with increasing dimensionality, an effect which already starts at dimensions of 10 to 15, depending on the size of the database and the data distribution. During query processing, large parts of conventional hierarchical indexes such as R-trees need to be randomly accessed, which is up to 20 times more expensive than sequential reading operations.
The Curse of Dimensionality: The Mathematical Intuition
The curse of dimensionality is not a vague observation. It is a precise mathematical phenomenon that has been studied since the 1960s. Two aspects of it are directly relevant to vector search.
Aspect 1: Volume Grows Exponentially
Consider a d-dimensional unit hypercube (each side has length 1). Now consider a smaller hypercube with side length 0.9 centered in the same space. In one dimension, the small segment contains 90 percent of the large segment's length. In two dimensions, 0.9 squared gives 81 percent. In ten dimensions, 0.9 to the power of 10 gives about 35 percent. In 100 dimensions, 0.9 to the power of 100 gives about 0.003 percent.
import numpy as np
dimensions = [1, 2, 5, 10, 50, 100, 512, 1536]
ratio = 0.9
print("Fraction of volume inside the inner cube (side 0.9):")
for d in dimensions:
    fraction = ratio ** d
    print(f"  {d:5d} dimensions: {fraction * 100:.8f}%")
# Output:
# Fraction of volume inside the inner cube (side 0.9):
# 1 dimensions: 90.00000000%
# 2 dimensions: 81.00000000%
# 5 dimensions: 59.04900000%
# 10 dimensions: 34.86784401%
# 50 dimensions: 0.51537752%
# 100 dimensions: 0.00265614%
# 512 dimensions: 0.00000000%
# 1536 dimensions: 0.00000000%

At 1536 dimensions, the vast majority of the volume of a hypercube is concentrated in its corners and thin shells near the surface. Points sampled uniformly in that space are almost always near the boundary, never near the center.
Aspect 2: Distances Become Indistinguishable
The second aspect is more directly relevant to nearest-neighbor search. In low-dimensional space, nearby points are clearly closer than far points. The ratio between the nearest neighbor distance and the farthest neighbor distance is large. In high-dimensional space, that ratio collapses toward 1.
import numpy as np
def distance_concentration(dims, n_points=1000, n_trials=100):
    """
    Measure how distinguishable nearest vs farthest neighbors are.
    Returns average ratio: max_dist / min_dist (closer to 1 = harder to search)
    """
    ratios = []
    for _ in range(n_trials):
        points = np.random.randn(n_points, dims)
        query = np.random.randn(dims)
        distances = np.linalg.norm(points - query, axis=1)
        ratio = distances.max() / (distances.min() + 1e-10)
        ratios.append(ratio)
    return np.mean(ratios)
print("Distance ratio (max/min) — higher is better for search:")
for d in [2, 10, 50, 100, 300, 768, 1536]:
    ratio = distance_concentration(d)
    print(f"  {d:5d} dimensions: {ratio:.2f}")
# Output (approximate, varies by random seed):
# Distance ratio (max/min) — higher is better for search:
# 2 dimensions: 312.45
# 10 dimensions: 42.18
# 50 dimensions: 12.83
# 100 dimensions: 9.21
# 300 dimensions: 5.74
# 768 dimensions: 4.01
# 1536 dimensions: 3.22

At two dimensions, the farthest point is over 300 times farther than the nearest. At 1536 dimensions, the ratio drops to around 3. From the perspective of any tree-based partitioning strategy, almost every node in the tree becomes relevant to almost every query, because all distances are nearly equal. The tree provides no effective pruning.
According to Zilliz's analysis of the curse of dimensionality, in high dimensions, data becomes sparse and distance metrics lose discriminative power, making traditional tree-based structures such as k-d trees ineffective. These methods rely on partitioning space, which fails when dimensions are too high because partitions become too numerous or overlapping.
Why This Breaks Tree Indexes Specifically
A k-d tree partitions the vector space by splitting along one dimension at a time. The intuition is that splitting along the longest dimension separates nearby points from far ones. In two or three dimensions, this works well. In 100 dimensions, every split cuts the space in half along one of 100 dimensions while leaving the other 99 dimensions unchanged. The resulting partitions become so numerous and so similar in content that queries must examine almost every partition.
According to the arXiv paper on billion-scale vector search, tree-based indexing, hash-based indexing, and other traditional index types cannot either scale to large-scale datasets or provide sufficient accuracy for high-dimensional vector data.
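This pruning collapse can be made concrete with a simplified numpy model (not a full k-d tree): a search can skip the far side of a median split only when the query is farther from that splitting plane than from its current best neighbor. Counting how often that condition holds shows the prunable fraction vanish as dimensions grow.

```python
import numpy as np

rng = np.random.default_rng(0)

def prunable_fraction(d, n=2000):
    """Fraction of per-dimension median splits a k-d-tree-style search could prune."""
    pts = rng.standard_normal((n, d))
    q = rng.standard_normal(d)
    nn_dist = np.linalg.norm(pts - q, axis=1).min()  # true nearest-neighbor distance
    medians = np.median(pts, axis=0)
    # the far side of a split on dim i is skippable only if |q_i - median_i| > nn_dist
    return float(np.mean(np.abs(q - medians) > nn_dist))

for d in [2, 10, 50, 100, 768]:
    print(f"d={d:4d}: prunable splits = {prunable_fraction(d) * 100:5.1f}%")
```

In low dimensions the nearest neighbor is extremely close, so almost every split can prune. In high dimensions the nearest-neighbor distance grows like the square root of d while the distance to any single splitting plane stays O(1), so essentially no split can prune and the search degenerates toward a full scan.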
What Brute Force Looks Like and Why It Does Not Scale
Before covering the solutions, it is worth quantifying what happens if you simply skip indexing and compare every query against every stored vector.
import numpy as np
import time
def brute_force_search(corpus: np.ndarray, query: np.ndarray, top_k: int = 10):
    """
    Exact nearest neighbor search: compare query against every vector.
    Time complexity: O(n * d) where n = corpus size, d = dimensions.
    """
    similarities = np.dot(corpus, query) / (
        np.linalg.norm(corpus, axis=1) * np.linalg.norm(query)
    )
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return top_indices, similarities[top_indices]
# Simulate different corpus sizes at 1536 dimensions
dimensions = 1536
for n_vectors in [1_000, 10_000, 100_000, 1_000_000]:
    corpus = np.random.randn(n_vectors, dimensions).astype(np.float32)
    query = np.random.randn(dimensions).astype(np.float32)
    start = time.perf_counter()
    indices, scores = brute_force_search(corpus, query)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"  {n_vectors:>10,} vectors: {elapsed_ms:.1f}ms")
# Approximate output (on a typical CPU):
# 1,000 vectors: 0.3ms
# 10,000 vectors: 3.1ms
# 100,000 vectors: 30.8ms
# 1,000,000 vectors: 308.4ms

At one million vectors, a single brute-force query takes over 300ms. At ten million vectors, that becomes over three seconds. For a production search system serving thousands of concurrent users, this is completely infeasible. The O(n) time complexity means the latency grows linearly with corpus size, with no upper bound.
According to the PyImageSearch ANN guide, a good ANN index achieves 0.9 to 0.99 recall at top-k results while being orders of magnitude faster than brute force.
The Solution: Approximate Nearest Neighbor Algorithms
The key insight behind ANN algorithms is to accept a small, bounded accuracy loss in exchange for a massive reduction in the number of comparisons. Instead of checking all n stored vectors, the algorithm checks a small subset of the most promising candidates and returns results that are correct with high probability.
According to research cited in Towards Robustness: A Critique of Current Vector Database Assessments, ANN search aims to maximize query recall by retrieving as many correct results as possible within millisecond-level latency. Numerous vector indexes have been developed to support efficient ANN, including HNSW, IVF, DiskANN, and ScaNN.
Two approaches dominate production deployments: HNSW for high-recall workloads and IVF for memory-constrained or extremely large-scale deployments.
HNSW: Graph-Based Navigation
HNSW (Hierarchical Navigable Small World) was introduced by Yu A. Malkov and D. A. Yashunin in a 2016 paper. It organizes vectors into a multi-layer proximity graph. The algorithm draws on two concepts from different fields: skip lists from computer science and small-world networks from social science.
How HNSW Is Built
During index construction, each vector is inserted one at a time. Each vector gets assigned to one or more layers based on an exponentially decaying probability function. Vectors in higher layers are fewer but connect with longer-range edges. Every vector is present in layer 0, the base layer.
HNSW structure with 3 layers:
Layer 2 (sparse, long-range): A ----------- F
Layer 1 (medium): A -- B -- C D -- E -- F
Layer 0 (dense, all vectors): A-B-C-X-Y-D-E-F-G-H-I-J
Each node connects to M nearest neighbors in its layer.
Default M=16 means each node has up to 16 connections.

When a new vector is inserted, the algorithm finds its nearest neighbors in each layer using the existing graph structure, then creates edges to those neighbors. The graph grows incrementally with no need to rebuild from scratch.
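The exponentially decaying layer assignment can be sketched directly from the formula in the Malkov and Yashunin paper: a vector's top layer is floor(-ln(U) * mL) with U uniform in (0, 1) and mL = 1/ln(M), so each successive layer holds roughly 1/M of the vectors in the layer below it.

```python
import math
import random
from collections import Counter

random.seed(42)
M = 16
mL = 1 / math.log(M)  # level normalization factor from the HNSW paper

def assign_top_layer():
    # floor(-ln(U) * mL) gives P(layer >= l) = M^(-l): exponential decay
    u = 1.0 - random.random()  # in (0, 1], avoids log(0)
    return math.floor(-math.log(u) * mL)

counts = Counter(assign_top_layer() for _ in range(100_000))
for layer in sorted(counts):
    print(f"layer {layer}: {counts[layer]:6d} vectors")
```

With M=16, roughly 94 percent of vectors live only in layer 0, about 6 percent reach layer 1, and a few hundred per hundred thousand reach layer 2, which is exactly the sparse-at-the-top shape the diagram above shows.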
import numpy as np
import faiss
dimension = 128
n_vectors = 100_000
M = 16 # number of connections per node
ef_construction = 64 # candidates examined during build
# Create HNSW index
index = faiss.IndexHNSWFlat(dimension, M)
index.hnsw.efConstruction = ef_construction
# Build index from vectors
vectors = np.random.randn(n_vectors, dimension).astype(np.float32)
faiss.normalize_L2(vectors)
index.add(vectors)
print(f"Index contains {index.ntotal} vectors")
# Search
query = np.random.randn(1, dimension).astype(np.float32)
faiss.normalize_L2(query)
index.hnsw.efSearch = 50 # candidates explored during search
distances, indices = index.search(query, k=10)
print(f"Top 5 nearest neighbors: {indices[0][:5]}")
print(f"Distances: {distances[0][:5]}")

How HNSW Searches
At query time, the algorithm enters at the top layer through a fixed entry point. It greedily moves to the neighbor closest to the query vector. When no neighbor is closer than the current node, it descends to the next layer and repeats. At the bottom layer (layer 0), it performs a precise local search among the dense neighbors of the current position.
Query vector q:
Layer 2: Enter at A. A connects to F. F is closer to q. Move to F.
No neighbor of F is closer than F. Descend.
Layer 1: Enter at F. F connects to D, E. E is closer. Move to E.
E connects to D, F. D is closer. Move to D.
No neighbor of D is closer. Descend.
Layer 0: Search all neighbors of D in dense local region.
Return top-K nearest found.

This traversal is analogous to asking for directions. Instead of checking every address in a city, you start by finding the right neighborhood (top layers), then the right street (middle layers), then the right house (bottom layer). According to Pinecone's HNSW guide, HNSW is among the top-performing indexes for vector similarity search, producing state-of-the-art performance with super fast search speeds and excellent recall.
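The greedy walk on a single layer can be sketched in pure numpy. This is a toy k-NN graph standing in for one HNSW layer, with a single walker instead of the ef_search-sized candidate beam real HNSW maintains:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, M = 500, 16, 8
points = rng.standard_normal((n, d)).astype(np.float32)

# toy proximity graph: connect each point to its M nearest neighbors
all_dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
neighbors = np.argsort(all_dists, axis=1)[:, 1:M + 1]  # skip self at column 0

def greedy_search(query, entry=0):
    """Hop to whichever neighbor is closest to the query until no neighbor improves."""
    current = entry
    best = float(np.linalg.norm(points[current] - query))
    while True:
        nb = neighbors[current]
        nb_dists = np.linalg.norm(points[nb] - query, axis=1)
        j = int(nb_dists.argmin())
        if nb_dists[j] >= best:  # local minimum: stop (HNSW would descend a layer here)
            return current, best
        current, best = int(nb[j]), float(nb_dists[j])

query = rng.standard_normal(d).astype(np.float32)
node, dist = greedy_search(query)
print(f"greedy walk stopped at node {node}, distance {dist:.3f}")
```

A single walker can get stuck in a local minimum, which is precisely why HNSW keeps a beam of ef_search candidates at layer 0: widening the beam trades a few more distance computations for a much lower chance of missing the true nearest neighbor.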
The key parameters that control HNSW performance:
# HNSW tuning parameters and their effects
# M (connections per node):
# Higher M → better recall, more memory, slower builds
# Default 16 is good for most cases
# Range: 5 to 48
# ef_construction (candidates during build):
# Higher ef_construction → better graph quality, slower build time
# Default 64, higher for better recall at cost of build time
# ef_search (candidates during query):
# Higher ef_search → better recall, slower queries
# Set at query time, does not require rebuilding index
# Tune this to hit your target recall threshold
# Memory usage: approximately (M * 4 bytes * n_vectors) + raw vector storage
# For 1M vectors at M=16, dimension=1536:
M = 16
n = 1_000_000
d = 1536
graph_overhead_bytes = M * 4 * n # ~64 MB for graph edges
vector_storage_bytes = d * 4 * n # ~6 GB for float32 vectors
total_memory_gb = (graph_overhead_bytes + vector_storage_bytes) / 1e9
print(f"Estimated HNSW index size: {total_memory_gb:.2f} GB for 1M vectors at d=1536")
# Estimated HNSW index size: 6.21 GB for 1M vectors at d=1536

According to Proptimise AI's production indexing guide, HNSW memory usage is approximately 1.5 to 2 times the raw vector data size. For datasets up to hundreds of millions of vectors where memory is available, HNSW is the default recommendation.
IVF: Cluster-Based Partitioning
IVF (Inverted File Index) takes a completely different approach. Rather than building a graph, it partitions the vector space into clusters using k-means, then only searches within the most relevant clusters at query time.
How IVF Is Built
During index construction, k-means clustering is run on the entire vector corpus. This produces nlist cluster centroids. Each vector is then assigned to its nearest centroid and stored in that centroid's inverted list.
import numpy as np
import faiss
dimension = 128
n_vectors = 100_000
nlist = 100 # number of clusters (typical: sqrt(n_vectors))
# IVF requires a training phase to learn cluster centroids
quantizer = faiss.IndexFlatL2(dimension) # exact search for centroid lookup
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)
vectors = np.random.randn(n_vectors, dimension).astype(np.float32)
faiss.normalize_L2(vectors)
# Train the index (learns cluster centroids)
index.train(vectors)
print(f"Index trained: {index.is_trained}")
# Add vectors (assigns each to its nearest centroid)
index.add(vectors)
print(f"Index contains {index.ntotal} vectors in {nlist} clusters")
# Search with nprobe clusters
index.nprobe = 10 # search top 10 clusters (10% of nlist=100)
distances, indices = index.search(
    np.random.randn(1, dimension).astype(np.float32),
    k=10
)

How IVF Searches
At query time, the algorithm computes the distance from the query to each cluster centroid (there are only nlist of them, typically a few hundred to a few thousand). It selects the top nprobe closest centroids and searches the vectors in only those clusters.
IVF search with nlist=1000 clusters, nprobe=10:
1. Compare query against 1000 centroids (fast: only 1000 comparisons)
2. Select top 10 closest centroids
3. Search vectors in only those 10 clusters
4. Each cluster contains ~1,000 vectors on average (1,000,000 / 1000)
5. Total comparisons: ~1,000 (centroids) + ~10,000 (within clusters) = ~11,000
vs brute force: 1,000,000 comparisons
Speedup: ~90x with nprobe=10

According to Milvus's IVF vs HNSW guide, IVF can first perform a coarse-grained filter at the centroid level to quickly narrow down the candidate set, then conduct fine-grained distance calculations within the selected clusters. This maintains stable and predictable performance even when a large portion of data is filtered out.
The tradeoff with IVF: if a true nearest neighbor happens to be assigned to a cluster whose centroid is not among the top nprobe closest centroids, it will be missed. Increasing nprobe improves recall at the cost of more comparisons. A typical production setting is nprobe = sqrt(nlist).
HNSW vs IVF: The Practical Tradeoffs
Property            | HNSW                       | IVF
--------------------+----------------------------+---------------------------------
Build approach      | Incremental graph          | Batch k-means + assignment
Training required   | No                         | Yes (k-means phase)
Build time          | Moderate, parallel         | Fast (after k-means)
Memory per vector   | 1.5 to 2x raw storage      | ~1.0x raw storage
Query recall        | Higher (at same speed)     | Lower (trades recall for speed)
Update support      | Yes (incremental inserts)  | Requires partial rebuild
Filtered search     | Degrades with selectivity  | Handles well via cluster pruning
Ideal scale         | Up to hundreds of millions | Billions (with IVF-PQ)
Default in          | Qdrant, Weaviate, pgvector | FAISS, Milvus with PQ

HNSW should be the default choice for most applications up to hundreds of millions of vectors where high recall is important. IVF becomes compelling when facing extreme scale beyond billions of vectors, severe memory constraints, or when combined with Product Quantization.
Product Quantization: Solving the Memory Problem
Even with an efficient ANN algorithm, storing billions of 1536-dimensional float32 vectors requires substantial memory. Each vector at float32 precision occupies 1536 * 4 = 6,144 bytes, roughly 6 kilobytes. At one billion vectors, that is 6 terabytes of raw vector storage.
Product Quantization (PQ) compresses vectors by a factor of 32 to 64 times while retaining enough geometric structure to support accurate ANN search.
# How Product Quantization compresses a vector
# Original vector: 1536 float32 values = 6,144 bytes
import numpy as np

original = np.random.randn(1536).astype(np.float32)
# PQ splits the vector into m subvectors, each with d/m dimensions
m = 96 # number of subvectors
k = 256 # number of centroids per subvector (fits in 1 byte)
# Each subvector (1536/96 = 16 dimensions) is assigned to its nearest
# centroid from a codebook of 256 centroids.
# Storage per subvector: log2(256) = 8 bits = 1 byte
compressed_bytes = m * 1 # m subvectors, each stored as 1 byte (centroid ID)
original_bytes = 1536 * 4
compression_ratio = original_bytes / compressed_bytes
print(f"Original size: {original_bytes:,} bytes ({original_bytes / 1024:.1f} KB)")
print(f"Compressed size: {compressed_bytes} bytes")
print(f"Compression: {compression_ratio:.0f}x")
# Output:
# Original size: 6,144 bytes (6.0 KB)
# Compressed size: 96 bytes
# Compression: 64x

According to Proptimise AI's indexing guide, a 6KB vector becomes roughly 192 bytes with PQ compression. At 100 million vectors, that reduces storage from 600GB to 19GB. The tradeoff is a moderate reduction in recall, typically 5 to 15 percent at the same search budget.
IVF-PQ combines both approaches: IVF clusters reduce the search space, PQ compression reduces memory. This combination makes billion-scale vector search feasible on commodity hardware.
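The arithmetic above can be turned into a working toy encoder. This sketch uses deliberately small illustrative sizes (d=64, m=8, k=16, all assumptions, not the article's 1536/96/256) and a few hand-rolled Lloyd's k-means iterations per subspace:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 64, 8, 16            # toy sizes: 8 subvectors of 8 dims, 16 centroids each
sub = d // m
train = rng.standard_normal((2000, d)).astype(np.float32)

# learn one codebook per subspace with a few Lloyd's k-means iterations
codebooks = np.empty((m, k, sub), dtype=np.float32)
for j in range(m):
    X = train[:, j * sub:(j + 1) * sub]
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(10):
        assign = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        for c in range(k):
            if np.any(assign == c):
                C[c] = X[assign == c].mean(0)
    codebooks[j] = C

def pq_encode(v):
    """Replace each subvector with the id of its nearest codebook centroid."""
    return np.array(
        [((codebooks[j] - v[j * sub:(j + 1) * sub]) ** 2).sum(-1).argmin()
         for j in range(m)],
        dtype=np.uint8,
    )

def pq_decode(code):
    """Reconstruct an approximation by concatenating the chosen centroids."""
    return np.concatenate([codebooks[j][code[j]] for j in range(m)])

v = train[0]
code = pq_encode(v)            # 8 bytes instead of 64 * 4 = 256 bytes
approx = pq_decode(code)
print(f"compressed {d * 4} bytes to {code.nbytes}, "
      f"relative error {np.linalg.norm(v - approx) / np.linalg.norm(v):.2f}")
```

The decoded vector is only an approximation, which is exactly the recall tradeoff described above: distances computed against the compressed codes are slightly wrong, so some true neighbors fall out of the top-k.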
The Full Picture: Why Each Layer Exists
The progression from brute force to HNSW to IVF to IVF-PQ is a progression of increasingly aggressive tradeoffs between accuracy, speed, and memory.
Approach         | Accuracy | Speed    | Memory | Scale
-----------------+----------+----------+--------+-------------------
Brute force      | Exact    | O(n*d)   | Low    | Up to ~100K
KD-tree / R-tree | Degrades | Breaks   | Low    | Below 15 dims only
HNSW             | ~99%     | O(log n) | 1.5-2x | Up to ~500M
IVF-Flat         | ~95%     | O(√n)    | ~1.0x  | Up to ~100M
IVF-PQ           | ~90%     | O(√n)    | 1/32x  | Billions

The choice of which layer to use is determined by dataset size, available memory, and the minimum acceptable recall for your application. According to Proptimise AI's production guide, for 10M to 100M vectors, HNSW is the right choice if memory allows. Beyond 100M vectors, IVF-PQ or distributed HNSW is the appropriate path.
Why This Matters for the Stack You Build
Every choice in building a vector database backed AI application connects to these index properties.
Chunking strategy affects the number of vectors you store. Smaller chunks mean more vectors mean larger indexes. A document corpus of 100,000 PDFs chunked at 500 tokens produces roughly 5 to 10 million vectors. At 1536 dimensions, HNSW fits comfortably in 60 to 100 GB of RAM on a modern server.
Embedding model choice determines vector dimensionality. A 384-dimension model (like all-MiniLM-L6-v2) requires roughly four times less memory per vector than a 1536-dimension model. If memory is your constraint, a smaller embedding model can meaningfully expand your capacity before you need to switch to IVF-PQ compression.
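A back-of-envelope sketch of the arithmetic in the two paragraphs above (raw float32 storage plus HNSW edge lists only; real deployments add runtime overhead on top, which is where the 1.5 to 2x figure comes from):

```python
def hnsw_memory_gb(n_vectors: int, dim: int, m: int = 16) -> float:
    """Rough lower bound: float32 vector payload plus ~4 bytes per graph edge."""
    vectors = n_vectors * dim * 4   # float32 payload
    graph = n_vectors * m * 4       # neighbor ids in layer 0
    return (vectors + graph) / 1e9

# ~7.5M chunks from 100K PDFs, at two embedding dimensionalities
for dim in (1536, 384):
    print(f"7.5M vectors at d={dim}: ~{hnsw_memory_gb(7_500_000, dim):.1f} GB")
# 7.5M vectors at d=1536: ~46.6 GB
# 7.5M vectors at d=384: ~12.0 GB
```

The roughly 4x gap between the two dimensionalities is almost entirely the vector payload; the graph overhead is the same in both cases, which is why shrinking the embedding model is usually the cheapest first lever when memory gets tight.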
Recall requirements determine which ANN algorithm and parameters to use. A RAG system where missing one document occasionally is acceptable can tolerate lower recall (say 95 percent) and run faster with more aggressive IVF pruning. A medical records search system may need 99 percent recall and must use HNSW with high ef_search.
The semantic search article covers how the full query pipeline uses these indexes: embedding the query, calling ANN search, applying metadata filters, and reranking results. The embeddings article covers how the vectors themselves are produced and how model choice affects dimensionality and quality. The dense vs sparse vectors article covers how sparse indexes (BM25's inverted lists) complement dense ANN indexes in hybrid search architectures.
Measuring ANN Index Quality
Before shipping a vector search system to production, measuring its actual recall against exact search on your real data is essential.
import numpy as np
import faiss
def measure_recall(
    corpus: np.ndarray,
    queries: np.ndarray,
    k: int = 10,
    nlist: int = 100,
    nprobe: int = 10,
) -> float:
    """
    Measure recall of IVF index against exact search ground truth.
    """
    d = corpus.shape[1]
    # Ground truth: exact search
    flat_index = faiss.IndexFlatIP(d)
    flat_index.add(corpus)
    _, gt_indices = flat_index.search(queries, k)
    # ANN: IVF index
    quantizer = faiss.IndexFlatIP(d)
    ivf_index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
    ivf_index.train(corpus)
    ivf_index.add(corpus)
    ivf_index.nprobe = nprobe
    _, ann_indices = ivf_index.search(queries, k)
    # Recall@k: fraction of true neighbors found by ANN
    recall = 0.0
    for gt_row, ann_row in zip(gt_indices, ann_indices):
        recall += len(set(gt_row) & set(ann_row)) / k
    return recall / len(queries)
dimension = 128
n_corpus = 100_000
n_queries = 500
corpus = np.random.randn(n_corpus, dimension).astype(np.float32)
queries = np.random.randn(n_queries, dimension).astype(np.float32)
faiss.normalize_L2(corpus)
faiss.normalize_L2(queries)
for nprobe in [1, 5, 10, 20, 50]:
    recall = measure_recall(corpus, queries, nlist=100, nprobe=nprobe)
    print(f"nprobe={nprobe:3d}: recall@10 = {recall:.4f}")
# Output (approximate):
# nprobe= 1: recall@10 = 0.5812
# nprobe= 5: recall@10 = 0.8734
# nprobe= 10: recall@10 = 0.9420
# nprobe= 20: recall@10 = 0.9751
# nprobe= 50: recall@10 = 0.9943

The recall curve flattens as nprobe increases. The knee of that curve (where each additional nprobe buys progressively less recall) is the right operating point for most production systems. For the example above, nprobe=20 gives 97.5 percent recall at roughly twice the computational cost of nprobe=10.
Summary
Traditional B-tree and hash indexes fail for vector search for one root reason: they were designed for exact matching and range queries on one-dimensional or low-dimensional data. Sorting vectors into a linear order destroys the geometric closeness that makes similarity search work.
The curse of dimensionality makes this worse as dimensions increase. Distances between random points become nearly equal, making tree traversal useless because every node in the tree becomes relevant to every query.
ANN algorithms solve this by accepting a controlled accuracy tradeoff. HNSW builds a layered proximity graph that enables O(log n) navigation to the approximate nearest neighbors in high-dimensional space. IVF clusters vectors into groups and only searches the most relevant clusters. IVF-PQ adds compression that enables billion-scale search in manageable memory.
These index structures are what separate a vector database from a general-purpose database with a vector column. The purpose-built storage engines in Qdrant, Milvus, and Weaviate are designed around these algorithms from the ground up, which is why they outperform general-purpose systems like Elasticsearch at pure vector search workloads, as covered in vector database vs Elasticsearch.
Sources and Further Reading
- Pinecone. Hierarchical Navigable Small Worlds (HNSW). pinecone.io/learn/series/faiss/hnsw
- Milvus Blog. How to Choose Between IVF and HNSW for ANN Vector Search. milvus.io/blog/understanding-ivf-vector-index
- Milvus AI Reference. How Does Indexing Work in a Vector DB? milvus.io/ai-quick-reference/how-does-indexing-work-in-a-vector-db-ivf-hnsw-pq-etc
- Redis. How HNSW Algorithms Improve Search. redis.io/blog/how-hnsw-algorithms-can-improve-search
- TiDB / PingCAP. ANN Search Explained: IVF vs HNSW vs PQ. pingcap.com/article/approximate-nearest-neighbor-ann-search-explained-ivf-vs-hnsw-vs-pq
- PyImageSearch. Vector Search with FAISS: ANN Explained. pyimagesearch.com/2026/02/16/vector-search-with-faiss-approximate-nearest-neighbor-ann-explained
- Proptimise AI. Vector Search Indexing: HNSW, ANN, and What Actually Matters for Production. proptimiseai.com/blog/vector-search-indexing-hnsw-ann-production
- Zilliz. How the Curse of Dimensionality Influences Indexing Techniques. zilliz.com/ai-faq/how-does-the-concept-of-the-curse-of-dimensionality-influence-the-design-of-indexing-techniques-for-vector-search
- Springer. High Dimensional Indexing. link.springer.com/rwe/10.1007/978-0-387-39940-9_804
- Analytics Vidhya. Advanced Vector Indexing Techniques for High-Dimensional Data. analyticsvidhya.com/blog/2024/09/vector-indexing-techniques
- Medium / Mehrshad Asadi. Understanding Modern Vector Search: From HNSW to IVF-PQ. medium.com/@mehrcodeland/understanding-modern-vector-search-from-hnsw-to-ivf-pq
- Gabor Koos. The Database Zoo: Vector Databases and High-Dimensional Search. blog.gaborkoos.com/posts/2025-11-25-The-Database-Zoo-Vector-Databases-and-High-Dimensional-Search
- Lakshman et al. Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval. arxiv.org/abs/2512.12458
- arXiv. Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search. arxiv.org/html/2405.03267v1
- Malkov and Yashunin. Efficient and Robust ANN Search Using Hierarchical Navigable Small World Graphs. arxiv.org/abs/1603.09320
Krunal Kanojiya
Technical Content Writer
Technical Content Writer and former software developer from India. I write in-depth articles on blockchain, AI/ML, data engineering, web development, and developer careers. Currently at Lucent Innovation, previously at Cromtek Solution and freelance.