Which vector database is best in 2026?

It depends on your use case. Pinecone is best for zero-ops managed search with fast setup. Qdrant is best for filtering-heavy workloads, hybrid search, and cost-conscious teams. Weaviate is best for built-in hybrid search and multi-modal data. Milvus is best for billion-scale deployments that need maximum scalability and GPU acceleration.

Is Qdrant better than Pinecone?

For most production workloads, yes. Qdrant delivers lower p50 latency (4ms vs 20ms on Pinecone Serverless), better filtered search recall, and significantly lower cost at scale. The exception is operational simplicity: Pinecone requires zero infrastructure management, which matters for small teams without DevOps capacity.

Which vector database is the cheapest?

Self-hosted Qdrant is the cheapest option. A $20 to $40 per month VPS handles up to 10 million vectors with good latency and zero per-query fees. Self-hosted Milvus and Weaviate are also cost-effective at scale, but require Kubernetes expertise. For managed cloud, Qdrant Cloud and Pinecone Serverless are roughly equivalent at 10M vectors, around $65 to $70 per month.

Which vector database is best for RAG?

Qdrant and Weaviate both work very well for RAG pipelines. Qdrant gives you the best filtering and lowest latency for most team sizes. Weaviate wins if your RAG pipeline needs native hybrid search with BM25 plus dense vectors in a single query. Pinecone is the easiest to set up for RAG prototyping. Milvus is the right choice if your RAG dataset exceeds 100 million vectors.

Is Weaviate or Milvus better?

They solve different problems. Weaviate is better for hybrid search, built-in vectorization, and multi-modal data. Milvus is better for billion-scale deployments, GPU-accelerated indexing, and teams that need the most index type flexibility. If your dataset is under 50 million vectors, Weaviate is operationally simpler. Above that scale, Milvus has a more mature distributed architecture.

Can I self-host all four of these vector databases?

Yes, but with different levels of complexity. Qdrant is the easiest to self-host as a single binary or Docker container. Weaviate requires Docker Compose or Kubernetes for production. Milvus requires a full Kubernetes deployment with etcd, MinIO or S3, and a message queue. Pinecone cannot be self-hosted at all as it is a proprietary closed-source service.

Pinecone vs Weaviate vs Milvus vs Qdrant (2026): Full Comparison

I have helped a number of teams pick a vector database in the last year. The conversation always starts the same way: someone sends me a Slack message with four logos and asks which one to use. Pinecone, Weaviate, Milvus, Qdrant.

The honest answer is that all four are good. The useful answer is that each one is built for a different set of constraints. Getting this decision wrong does not break your application on day one, but it shows up six months later when your bill is three times your compute budget, or your filtered search recall starts degrading under load.

This comparison covers what actually matters in production: architecture, performance, filtering quality, hybrid search, cost at scale, and developer experience. By the end you will know exactly which one fits your situation.

The Short Version

If you want to skip ahead, here is the summary.

Use Pinecone if you want a fully managed service with zero infrastructure to operate and your dataset is under 10 million vectors.

Use Qdrant if you need the best filtering, native hybrid search, low latency, or lower cost at scale. It is the right default for most teams building RAG pipelines in 2026.

Use Weaviate if your application needs built-in vectorization, multi-modal search, or the most mature BM25 plus dense vector hybrid search implementation available.

Use Milvus if your dataset exceeds 100 million vectors, you need GPU-accelerated indexing, or you have Kubernetes expertise and want maximum control over indexing strategy.

For everything else, keep reading.

What Each One Is

What is Pinecone

Pinecone is a fully managed, closed-source vector database. You cannot run it on your own servers. You get an API key and use their cloud service.

It launched in 2021 as one of the first purpose-built vector databases. In 2026, Pinecone has committed fully to serverless as the default. Pod-based indexes are now considered legacy. The serverless model charges you per read unit, write unit, and storage, with no idle compute cost.

Recent additions include Pinecone Inference (hosted embedding models so you do not need a separate embedding call), Dedicated Read Nodes for predictable latency at high QPS, full-text search in public preview, and BYOC (Bring Your Own Cloud) on AWS, GCP, and Azure for teams that need data residency.

Plans in 2026 run from a free Starter tier to Builder ($20 per month flat), Standard (usage-based, $50 per month minimum), and Enterprise.

What is Weaviate

Weaviate is an open-source vector database written in Go, released under Apache 2.0. You can self-host it or use Weaviate Cloud as a managed service.

Its core strength is hybrid search. Weaviate combines dense vector search with BM25 keyword matching in a single query, processed simultaneously rather than as two separate passes that you merge yourself.

In April 2026, Weaviate shipped v1.37 with a native MCP Server, making it the first vector database where LLMs and AI agents can query and write directly through a standardized protocol without custom integration code. The update also added Diversity Search using Maximal Marginal Relevance, which reduces redundancy in retrieval results.
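To make the Maximal Marginal Relevance idea concrete, here is a minimal sketch of the generic algorithm in plain Python. It is not Weaviate's implementation; the function names, the toy vectors, and the `lambda_` weighting parameter are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr(query, candidates, k, lambda_=0.5):
    """Pick k candidate indices, trading relevance against redundancy.

    lambda_=1.0 ranks purely by relevance; lower values penalise
    candidates that resemble ones already selected.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance - (1 - lambda_) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): documents 0 and 1 are near-duplicates.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.11], [0.7, -0.7]]

print(mmr(query, docs, k=2, lambda_=1.0))  # [0, 1]: both near-duplicates
print(mmr(query, docs, k=2, lambda_=0.5))  # [0, 2]: the duplicate is skipped
```

With pure relevance ranking the two near-duplicates fill both slots. Once redundancy is penalised, the second slot goes to the less similar document.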

Weaviate Cloud plans start with a free 14-day sandbox (no credit card required), then Flex at $45 per month minimum and Premium at $400 per month minimum. Pricing is based on vector dimensions, storage, and backups.

What is Milvus

Milvus is an open-source vector database written in Go and C++, released under Apache 2.0 and hosted by the LF AI and Data Foundation. The managed version is Zilliz Cloud.

Milvus is built for scale that none of the other three can match. Its distributed architecture separates compute from storage, allowing independent scaling of query nodes, data nodes, and index nodes. It handles 10 billion or more vectors in production.

Milvus 2.6, released in mid-2025 and actively patched through 2.6.18, introduced several major improvements: RaBitQ 1-bit quantization that compresses indexes to 1/32 of their original size while maintaining 95 percent recall, tiered hot and cold storage that automatically moves data based on access patterns, Woodpecker (a new WAL system that replaces the dependency on Kafka or Pulsar), and BM25 full-text search benchmarked at 400 percent higher throughput than Elasticsearch on equivalent hardware. Milvus 3.0 is targeted for late 2026 with External Collection, Snapshot, and Storage V3.
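The 1/32 figure follows from storing one bit per dimension instead of a 32-bit float. The sketch below shows plain sign-bit quantization with Hamming distance to illustrate that ratio. RaBitQ itself is more involved and is not reproduced here; the helper names and sample vectors are hypothetical.

```python
def binarize(vector):
    """Pack the sign of each dimension into one bit of a Python int."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


dims = 1536
float32_bytes = dims * 4   # 6144 bytes per vector
one_bit_bytes = dims // 8  # 192 bytes per vector
print(float32_bytes // one_bit_bytes)  # 32

a = [0.9, -0.2, 0.4, -0.7]
b = [0.8, -0.1, 0.5, -0.6]  # points the same way as a
c = [-0.9, 0.3, -0.4, 0.7]  # points the opposite way
print(hamming(binarize(a), binarize(b)))  # 0
print(hamming(binarize(a), binarize(c)))  # 4
```

Similar vectors end up with a small Hamming distance, which is what lets a 1-bit index shortlist candidates cheaply before any rescoring against the original vectors.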

What is Qdrant

Qdrant is an open-source vector database written in Rust, released under Apache 2.0. The managed version is Qdrant Cloud.

Qdrant has surpassed 30,000 GitHub stars and 250 million downloads. It powers production workloads at Canva, Tripadvisor, HubSpot, Bosch, and Deutsche Telekom. The Rust implementation gives it very low memory overhead and consistently the lowest p50 latency of any purpose-built vector database in independent benchmarks: 4ms at 1 million vectors.

In April 2026, Qdrant Cloud shipped three enterprise capabilities: GPU-accelerated HNSW indexing (4x faster index builds on AWS, now available on GPU-enabled clusters), Multi-AZ clusters replicating across three availability zones with 99.95 percent uptime SLAs and no failover delay, and audit logging for all API operations in structured JSON. Qdrant's free tier includes 1GB of storage, enough for approximately 500,000 vectors at 1536 dimensions.

Architecture and Deployment

How each database handles deployment defines most of the trade-offs that follow.

Pinecone is a black box. You do not see or configure the underlying index architecture. The serverless model separates storage from compute. You pay for what you query. No idle charges, no infrastructure decisions.

Weaviate uses HNSW indexing with a segment-based architecture. It supports a flat index for small collections where memory efficiency matters more than speed, automatic compression, and dynamic indexing. The modular architecture lets you swap embedding models, vectorizers, and rerankers without rebuilding your application.

Milvus is the most architecturally complex of the four. It separates storage and compute at the infrastructure level, running multiple specialized node types on Kubernetes: query nodes, data nodes, index nodes, and a coordinator layer. This is what allows it to scale to billions of vectors and independently add query capacity for read-heavy loads. Milvus 2.6 replaced external Kafka or Pulsar dependencies with Woodpecker, its own WAL system built on object storage.

Qdrant uses segment-based HNSW storage. The optimizer merges and reindexes segments in the background. You can tune construction parameters, quantization strategy, and the disk versus memory trade-off per collection. Self-hosting Qdrant is a single binary or Docker container. No Kubernetes required for most workloads.

Deployment complexity from simplest to hardest: Pinecone (no ops), Qdrant (single binary), Weaviate (Docker Compose), Milvus (Kubernetes).

Performance and Latency

Benchmarks from Salt Technologies AI's Vector Database Performance Benchmark 2026 covering 1 million vectors at 1536 dimensions:

Database	p50 Latency	p99 Latency	Notes
Qdrant (self-hosted)	4ms	8 to 12ms	Lowest latency of all purpose-built vector DBs
Milvus (with GPU)	6ms	12 to 18ms	GPU-accelerated; CPU-only is slower
Pinecone Serverless	20 to 30ms	40 to 80ms (cold)	Warm queries 10 to 15ms
Weaviate Cloud	50 to 70ms	100 to 150ms	Managed; self-hosted is faster

A few things to understand from these numbers.

Qdrant's 4ms p50 is consistent across warm and cold queries because it runs on dedicated infrastructure you control. Pinecone's 20 to 30ms p50 is for the managed serverless product. The cold query spike matters for applications with bursty or overnight-quiet traffic patterns.

Milvus with GPU acceleration reaches 6ms p50, comparable to self-hosted Qdrant, but you are managing a Kubernetes cluster to get there. Without GPU, Milvus p50 is closer to 15 to 25ms depending on configuration.

Weaviate's managed latency is the highest of the four. Self-hosted Weaviate with HNSW and binary quantization can reach 20 to 40ms p99, significantly better than the managed cloud numbers, but you take on the infrastructure burden.
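If the gap between p50 and p99 in the table is unclear, the following sketch may help. It computes nearest-rank percentiles over a made-up latency trace in which a small share of queries hit a cold start. The trace values are hypothetical and are not taken from the benchmark.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical trace: 95 warm queries at 12 ms, 5 cold-start queries at 70 ms.
latencies_ms = [12] * 95 + [70] * 5

print(percentile(latencies_ms, 50))  # 12
print(percentile(latencies_ms, 99))  # 70
```

A handful of cold queries leaves the median untouched but sets the tail, which is why bursty traffic shows up in p99 long before it shows up in p50.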

Best for raw performance: Qdrant (self-hosted) or Milvus with GPU. For zero-ops performance, Pinecone Serverless warm queries are competitive at moderate scale.

Filtering

Filtering is the most important practical difference between these four databases for most production applications. Almost every real RAG or search query includes filters: user ID, date range, category, tenant ID, or some combination.

When a database receives a filtered vector query, there are two broad approaches. Post-filtering retrieves a candidate set from the index first and then applies the filter to the results. Pre-filtering or filtered traversal applies the filter during the index traversal itself, visiting only nodes that match.

Post-filtering breaks down when your filter is highly selective. If only 1 percent of your vectors match the filter, the traversal wastes 99 percent of its work retrieving vectors that get discarded. Recall drops significantly under selective filters.
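The effect is easy to reproduce with a brute-force toy, sketched below. It uses random similarity scores and a filter that matches about 1 percent of items. This is not how an HNSW index traverses a graph; it only illustrates why filtering after retrieval returns too few results. The corpus, field names, and helper functions are all hypothetical.

```python
import random

random.seed(7)

# Hypothetical corpus: 10,000 items, each with a similarity score to the
# query and a tenant ID. Only tenant 0 (about 1% of items) passes the filter.
corpus = [
    {"id": i, "score": random.random(), "tenant": random.randrange(100)}
    for i in range(10_000)
]


def matches(item):
    """The metadata filter: keep only items that belong to tenant 0."""
    return item["tenant"] == 0


def post_filter(items, k):
    """Take the top-k by similarity first, then drop non-matching items."""
    top = sorted(items, key=lambda it: it["score"], reverse=True)[:k]
    return [it for it in top if matches(it)]


def pre_filter(items, k):
    """Restrict to matching items first, then take the top-k by similarity."""
    allowed = [it for it in items if matches(it)]
    return sorted(allowed, key=lambda it: it["score"], reverse=True)[:k]


k = 10
print(len(post_filter(corpus, k)))  # typically 0 or 1 of the 10 requested
print(len(pre_filter(corpus, k)))   # 10, as long as 10 matches exist
```

Real systems soften post-filtering by over-fetching candidates, but the more selective the filter, the larger the over-fetch has to be.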

Qdrant uses filtered HNSW traversal. The payload filter is applied during graph traversal, not after. Recall stays high even when filters reduce the dataset to a tiny fraction of the total. This is the most technically correct approach for selective filtering and the reason Qdrant is the top recommendation for filtering-heavy workloads.

python

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

client = QdrantClient(url="http://localhost:6333")

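# query_vector is assumed to be the query embedding (a list of floats) computed elsewhere.
results = client.query_points(
    collection_name="documents",
    query=query_vector,
    # Every condition under "must" has to match for a point to be returned.
    query_filter=Filter(
        must=[
            FieldCondition(key="user_id", match=MatchValue(value=42)),
            FieldCondition(key="category", match=MatchValue(value="technical")),
            FieldCondition(
                key="created_at",
                # Assumed to be stored as a Unix timestamp in seconds.
                range=Range(gte=1700000000)
            ),
        ]
    ),
    limit=10,
)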

Weaviate also applies filters during index traversal and performs well on selective filters. Its filtering uses the inverted index on payload fields alongside the HNSW graph, which gives it strong recall under filtering conditions comparable to Qdrant.

Milvus supports filtering through a scalar index built alongside the vector index. Performance under selective filters is strong at scale due to the distributed architecture. At very high QPS with complex filters, Milvus's distributed query execution handles the load better than single-node options.

Pinecone handles metadata filtering well for common equality and range filters. Under highly selective filters on large datasets, recall can degrade. Pinecone's metadata filter syntax is also less expressive than SQL and does not support joins, subqueries, or arbitrary expressions.

Best for filtering: Qdrant and Weaviate for correctness under selective filters. Milvus for filtering at billion-scale. Pinecone for simple filters with moderate selectivity.

Hybrid Search

Hybrid search combines dense vectors with sparse vectors to handle both semantic queries and exact keyword matches. It consistently outperforms pure semantic search for technical documentation, product catalogs, and anything with specific identifiers like product codes, proper nouns, or model numbers.

Weaviate has the most mature hybrid search implementation of the four. BM25 plus dense vector search is processed in a single unified query, not as two separate calls you merge in application code. The BM25 keyword index is maintained automatically alongside the vector index.

python

import weaviate
from weaviate.classes.query import HybridFusion

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="YOUR_CLUSTER_URL",
    auth_credentials=weaviate.auth.AuthApiKey("YOUR_API_KEY"),
)

collection = client.collections.get("Documents")

results = collection.query.hybrid(
    query="vector database filtering performance",
    vector=query_embedding,
    fusion_type=HybridFusion.RELATIVE_SCORE,
    limit=10,
)

client.close()

Qdrant supports native sparse vectors alongside dense vectors in named vector collections. You store both a dense embedding and a sparse vector (BM25 or SPLADE weights) for each document and query them together using Reciprocal Rank Fusion.

python

from qdrant_client import QdrantClient
from qdrant_client.models import SparseVector, Prefetch, FusionQuery, Fusion

client = QdrantClient(url="http://localhost:6333")

results = client.query_points(
    collection_name="documents",
    prefetch=[
        Prefetch(query=dense_vector, using="dense", limit=20),
        Prefetch(
            query=SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=10,
)

Milvus supports hybrid search with BM25 full-text search integrated into the 2.6 release. The BM25 implementation is benchmarked by the Milvus team at 400 percent higher throughput than Elasticsearch on equivalent workloads, which lets teams collapse a two-system stack (Elasticsearch plus vector DB) into a single Milvus deployment.

Pinecone supports hybrid search through sparse-dense vectors. You store both a dense and sparse vector per document and pass both in the query. The implementation is clean but less flexible than Qdrant's named vector design when combining more than two vector types.

Best for hybrid search: Weaviate for the most integrated and mature implementation. Qdrant for flexibility and control. Milvus for teams who also need keyword search throughput at scale.
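Reciprocal Rank Fusion, which the Qdrant example above relies on, is simple enough to sketch in a few lines. The version below is the textbook formula with the commonly used constant k=60. Whether any given database uses exactly this constant is an assumption, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense_hits = ["doc-a", "doc-b", "doc-c"]   # hypothetical semantic matches
sparse_hits = ["doc-c", "doc-a", "doc-d"]  # hypothetical keyword matches

print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
# ['doc-a', 'doc-c', 'doc-b', 'doc-d']
```

Because only ranks are used, the dense and sparse scores never need to be normalised against each other, which is the main appeal of RRF over score-based fusion.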

Cost Comparison

Cost is where teams get surprised the most. The number on the pricing page is rarely the number you pay in production.

Pinecone

Pinecone Serverless bills on three dimensions: read units, write units, and storage.

Storage: approximately $0.033 per GB per month
Standard plan read units: approximately $16 per million read units
Write units: approximately $2 per million write units

At 1536 dimensions, 1 million float32 vectors occupy roughly 6GB. Each query consumes approximately 6 to 20 read units depending on dimensions and filter complexity.
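For a rough sense of how these unit prices combine, here is a back-of-the-envelope sketch using the approximate figures quoted above. It covers storage and reads only, so it will not reproduce the table below exactly: write units, plan minimums, and metadata size are left out, and the function names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4

# Approximate Standard-plan prices quoted above (USD).
STORAGE_PER_GB_MONTH = 0.033
PRICE_PER_MILLION_READ_UNITS = 16.0


def storage_gb(num_vectors, dims):
    """Raw float32 vector size in decimal GB, ignoring metadata."""
    return num_vectors * dims * BYTES_PER_FLOAT32 / 1e9


def monthly_estimate(num_vectors, dims, queries, read_units_per_query):
    """Storage plus read cost per month; writes and plan minimums are excluded."""
    storage_cost = storage_gb(num_vectors, dims) * STORAGE_PER_GB_MONTH
    read_cost = queries * read_units_per_query / 1e6 * PRICE_PER_MILLION_READ_UNITS
    return storage_cost + read_cost


print(round(storage_gb(1_000_000, 1536), 2))  # 6.14

# 2M vectors, 500K queries per month, at 6 and at 20 read units per query.
low = monthly_estimate(2_000_000, 1536, 500_000, 6)
high = monthly_estimate(2_000_000, 1536, 500_000, 20)
print(round(low, 2), round(high, 2))  # 48.41 160.41
```

Storage is a rounding error at this scale. The bill is driven almost entirely by read units, so query volume and read units per query are the numbers to measure first.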

Scale	Vectors	Queries per month	Estimated monthly cost
Prototype	100K	10K	Under $5
Small production	500K	100K	$15 to $40
Medium production	2M	500K	$80 to $200
Large production	10M	2M	$400 to $800
Enterprise	50M+	10M+	$2,000 plus

The Builder plan at $20 per month flat is useful for individual developers who want higher limits than the free tier without committing to usage-based pricing.

Weaviate

Weaviate Cloud bills on vector dimensions, storage, and backups.

The formula: monthly cost = (vectors x dimensions x replication factor x $0.01668) / 1,000,000 + (storage GB x $0.255) + backup costs

Important detail: the replication factor multiplies your dimension billing. RF=2 doubles it. RF=3 triples it. At 5 million vectors at 1536 dimensions with RF=2, your dimension cost alone is around $256 per month before storage. Enabling Binary Quantization can reduce dimension billing by up to 97 percent, making it essential at any scale above 1 million vectors.
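The dimension term of the formula can be checked with a few lines of Python. This sketch covers only that term, using the rate quoted above; storage and backup charges are omitted, and the function name is hypothetical.

```python
PRICE_PER_MILLION_DIMENSIONS = 0.01668  # USD per month, from the formula above


def dimension_cost(num_vectors, dims, replication_factor=1):
    """Monthly dimension charge only; storage and backups are billed separately."""
    billed_dimensions = num_vectors * dims * replication_factor
    return billed_dimensions * PRICE_PER_MILLION_DIMENSIONS / 1_000_000


print(round(dimension_cost(5_000_000, 1536, replication_factor=2), 2))  # 256.2
print(round(dimension_cost(5_000_000, 1536, replication_factor=1), 2))  # 128.1
```

The replication factor scales the charge linearly, so moving from RF=1 to RF=2 for availability doubles this line item.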

Managed Cloud minimum costs: Flex at $45 per month, Premium at $400 per month.

At 10 million vectors (with Binary Quantization enabled), Weaviate Cloud runs approximately $135 per month. Without Binary Quantization, the same dataset costs significantly more.

Self-hosting Weaviate is free but requires meaningful DevOps investment. Community members report that a 120GB RAM VPS (around $80 to $120 per month on Hetzner or Contabo) handles 9 million embeddings comfortably.

Milvus

Milvus itself is free and open source. You pay only for the infrastructure you run it on.

For self-hosted Milvus Standalone (single node, suitable for up to roughly 10 million vectors):

A VPS with 16GB RAM and 8 vCPU on DigitalOcean or Hetzner: $60 to $100 per month

For Milvus Distributed on Kubernetes (necessary for billion-scale):

Minimum viable production cluster: $400 to $800 per month in cloud compute

Zilliz Cloud, the managed Milvus service, starts at roughly $65 per month for small deployments and scales with storage and compute.

Milvus 2.6's RaBitQ quantization compresses indexes to 1/32 of their original size with 95 percent recall maintained. At billion-scale, this materially changes the hardware required and therefore the cost.

Qdrant

Qdrant Cloud pricing starts at $0.014 per hour per node (approximately $10 per month for the smallest node), with a free tier providing 1GB storage.

For self-hosted Qdrant on a VPS:

Instance	RAM	Vectors at 1536d	Monthly cost
Small VPS (Hetzner CX22)	4GB	~500K	$5 to $10
Medium VPS (Hetzner CX32)	8GB	~1M	$15 to $25
Large VPS (DigitalOcean 8vCPU/16GB)	16GB	~2M	$40 to $60
Dedicated server	64GB	~8M	$80 to $150

At 10 million vectors, self-hosted Qdrant on a $40 per month VPS with binary quantization enabled handles the workload with sub-10ms p99. Qdrant Cloud for the same workload runs approximately $65 per month with no per-query fees.
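The jump from roughly 2 million vectors on a 16GB machine to 10 million on a similar budget comes down to how many bits each dimension keeps in RAM. The sketch below is a rough capacity estimate under stated assumptions: it counts vector data only, ignores the HNSW graph, payloads, and the operating system, and assumes that with binary quantization the original vectors are kept on disk. The function name is hypothetical.

```python
def max_vectors(ram_gb, dims, bits_per_dim=32):
    """How many vectors of the given width fit in ram_gb of memory (decimal GB)."""
    bytes_per_vector = dims * bits_per_dim / 8
    return int(ram_gb * 1e9 // bytes_per_vector)


print(max_vectors(16, 1536))                  # 2604166
print(max_vectors(16, 1536, bits_per_dim=1))  # 83333333
```

The float32 figure lines up with the roughly 2 million vectors listed for a 16GB instance once real overhead is subtracted. The 1-bit figure shows why 10 million quantized vectors fit comfortably on the same class of machine.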

Cost summary

Database	10M vectors managed (est.)	10M vectors self-hosted	Best cost scenario
Pinecone Serverless	$70 to $100/month	Not available	Low query volume, small scale
Weaviate Cloud	$135/month (BQ enabled)	$80 to $120/month (Hetzner)	Hybrid search included, no extra cost
Milvus (Zilliz)	$65 to $120/month	$60 to $100/month	Billion-scale with quantization
Qdrant Cloud	$65/month	$20 to $40/month	Best self-hosted economics

Best for cost: Self-hosted Qdrant for the lowest total cost at any scale. Managed Pinecone Serverless for small workloads where operational cost outweighs infrastructure savings.

Developer Experience

All four have Python SDKs. The differences are in configuration depth and integration ecosystem.

Pinecone is the fastest to get started. Create an account, generate an API key, create an index, and push vectors. No Docker, no Kubernetes, no configuration decisions.

python

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("my-index")

index.upsert(vectors=[
    {"id": "doc-1", "values": embedding, "metadata": {"text": "..."}}
])

results = index.query(vector=query_embedding, top_k=10, include_metadata=True)

Weaviate has a steeper initial learning curve due to its schema-first design. You define a schema (called a collection in v4) before inserting data. Once past that setup, the API is expressive and the built-in vectorization means you can skip the separate embedding call for supported models.

python

import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="YOUR_CLUSTER_URL",
    auth_credentials=weaviate.auth.AuthApiKey("YOUR_API_KEY"),
)

# Create a collection with built-in vectorization
client.collections.create(
    name="Documents",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
    ],
)

collection = client.collections.get("Documents")

# Insert without providing embeddings (Weaviate generates them)
collection.data.insert({"content": "Your document text here", "category": "technical"})

client.close()

Milvus has the most configuration surface area. You define schemas, choose index types (HNSW, IVF, DiskANN, FLAT, GPU variants), set quantization parameters, and configure consistency levels. More flexibility, more decisions.

python

from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")

schema = MilvusClient.create_schema(auto_id=False, enable_dynamic_field=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=1536)
schema.add_field(field_name="content", datatype=DataType.VARCHAR, max_length=65535)

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 16, "efConstruction": 256}
)

client.create_collection(
    collection_name="documents",
    schema=schema,
    index_params=index_params
)

Qdrant sits between Pinecone and Milvus on complexity. The Python client is clean, and the defaults are sane. You get access to quantization, named vectors, and payload filtering without a steep learning curve.

python

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=embedding,
            payload={"content": "Your document text here", "user_id": 42}
        )
    ]
)

results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    limit=10,
)

Best for developer experience: Pinecone for fastest setup. Weaviate if built-in vectorization is valuable to your team. Qdrant for a good balance of simplicity and features. Milvus for maximum control.

When to Use Pinecone

Pinecone is the right choice in these situations.

You want to build a product, not operate a database. Pinecone removes every infrastructure decision. Your team does not need to think about Kubernetes, Docker, index tuning, or on-call for a database.

Your dataset is under 10 million vectors. At this scale, Pinecone Serverless is competitively priced, the latency is acceptable for most applications, and the zero-ops benefit is real.

You have bursty or unpredictable query volume. Serverless means you pay almost nothing when your application is quiet overnight and scale automatically during traffic spikes. No idle pod costs.

You are building a RAG prototype quickly. Pinecone gets you from zero to working vector search in under thirty minutes. For proof-of-concept work where speed to demo matters, Pinecone wins.

Your team has no DevOps capacity. If running a database in production would require skills your team does not have, Pinecone's managed service removes that requirement entirely.

When to Use Weaviate

Weaviate is the right choice in these situations.

Native hybrid search is a core requirement. Weaviate processes BM25 keyword matching and dense vector similarity in a single query. If your content includes specific terminology, product codes, or proper nouns that semantic search alone handles poorly, Weaviate's hybrid implementation is the most mature available.

You want built-in vectorization without managing a separate embedding pipeline. Weaviate integrates directly with OpenAI, Cohere, Hugging Face, and other embedding providers. You insert text and Weaviate generates the embeddings automatically. This simplifies your application code.

Your application handles multi-modal data. Weaviate can store and search across text, images, and audio in the same vector space. If your retrieval system needs to work across data types, Weaviate handles this natively.

You are building AI agents that need to query data directly. Weaviate v1.37 ships a native MCP Server at /v1/mcp. LLMs and AI agents can query and write to Weaviate without a custom API layer, which is a meaningful simplification for agentic architectures.

You want open source flexibility with a managed option. Unlike Pinecone, Weaviate is open source. You can self-host for free and use Weaviate Cloud as a managed option without changing your application code.

When to Use Milvus

Milvus is the right choice in these situations.

Your dataset exceeds 100 million vectors. Milvus is the only database in this comparison with a distributed architecture mature enough for truly billion-scale workloads. Its horizontal scaling, separate query and data nodes, and automatic sharding handle this scale reliably.

You need GPU-accelerated indexing. Milvus has supported GPU indexing since 2024, delivering up to 10x faster HNSW construction versus CPU-only on equivalent hardware. For high-write workloads where new content is continuously indexed, this is a significant operational benefit.

You need the widest choice of index types. Milvus supports HNSW, IVF (multiple variants), FLAT, DiskANN, SCANN, and GPU-accelerated IVF and PQ variants. If you need to tune the recall, latency, memory, and cost trade-off with fine granularity, Milvus gives you more levers than any other option.

Your team already runs Kubernetes. Milvus requires Kubernetes for distributed mode. Teams with existing K8s infrastructure and platform engineering experience absorb the operational cost well. Teams without it should pick a different option.

You want to collapse your Elasticsearch and vector DB into one system. Milvus 2.6's BM25 full-text search benchmarks at 400 percent higher throughput than Elasticsearch on equivalent hardware. If you currently run both systems and want to simplify your stack, Milvus can replace both.

When to Use Qdrant

Qdrant is the right choice in these situations.

Your queries do heavy filtering. Qdrant's filtered HNSW maintains recall even when filters reduce the dataset to 1 to 2 percent of the total. This is technically the most correct approach for selective filtering and the reason it outperforms the others in filter-heavy production workloads.

You need native hybrid search with full control over retrieval. Qdrant's named vector collections let you store dense and sparse vectors per document and combine them with Reciprocal Rank Fusion. The implementation is more flexible than Pinecone's and comparable to Weaviate's in capability.

Cost matters. Self-hosted Qdrant on a $40 per month VPS handles 2 to 5 million vectors with sub-10ms p99. At 10 million vectors on self-hosted hardware, you are paying $40 to $60 per month for a workload that costs $400 to $800 on Pinecone pod-based equivalents.

You need low latency as a hard requirement. Qdrant consistently delivers 4ms p50 at 1 million vectors, the lowest of any purpose-built vector database in 2026 benchmarks. If your SLA requires single-digit millisecond retrieval, self-hosted Qdrant is the right starting point.

You need enterprise features with open-source flexibility. Qdrant Cloud added GPU-accelerated indexing, Multi-AZ clusters with 99.95 percent uptime SLAs, and audit logging for compliance in April 2026. You get enterprise-grade managed features without proprietary lock-in.

You want to avoid vendor lock-in. Qdrant is Apache 2.0 licensed. If pricing changes or you need to move infrastructure, you can migrate a self-hosted Qdrant deployment to any cloud or on-premise environment.

My Recommendation

For most teams building RAG applications in 2026, I would start with Qdrant.

The filtering quality is the strongest argument. In production RAG pipelines, almost every query has a filter, whether by user, date, tenant, or category. Qdrant's filtered HNSW keeps recall high regardless of filter selectivity. Combined with the lowest p50 latency, native sparse vector support, and the most cost-efficient self-hosted path, it is the most capable all-around option for teams that can run a single Docker container.

The one scenario where I would pick Pinecone is a small team in the early stages of building a product where database operations would genuinely pull engineering attention away from the product itself. Pinecone Serverless gets you live in under an hour, and the cost at small scale is reasonable.

The one scenario where I would pick Weaviate is a team whose retrieval quality depends heavily on hybrid search across text with specific terminology, or a team that wants to skip the embedding pipeline entirely by using Weaviate's built-in vectorization modules.

The one scenario where I would pick Milvus is a team with datasets above 100 million vectors, existing Kubernetes expertise, and a need for the flexibility of GPU-accelerated indexing or multiple index types.

If you are not sure yet, the decision framework in "How to Choose a Vector Database" walks through the full set of criteria, including pgvector and Chroma alongside these four.

The Short Version

If you want to skip ahead, here is the summary.

Use Pinecone if you want a fully managed service with zero infrastructure to operate and your dataset is under 10 million vectors.

Use Qdrant if you need the best filtering, native hybrid search, low latency, or lower cost at scale. It is the right default for most teams building RAG pipelines in 2026.

Use Weaviate if your application needs built-in vectorization, multi-modal search, or the most mature BM25 plus dense vector hybrid search implementation available.

Use Milvus if your dataset exceeds 100 million vectors, you need GPU-accelerated indexing, or you have Kubernetes expertise and want maximum control over indexing strategy.

For everything else, keep reading.

What Each One Is

What is Pinecone

Pinecone is a fully managed, closed-source vector database. You cannot run it on your own servers. You get an API key and use their cloud service.

Plans in 2026 run from a free Starter tier to Builder ($20 per month flat), Standard (usage-based, $50 per month minimum), and Enterprise.

What is Weaviate

Weaviate is an open-source vector database written in Go, released under Apache 2.0. You can self-host it or use Weaviate Cloud as a managed service.

What is Milvus

Milvus is an open-source vector database written in Go and C++, released under Apache 2.0 and hosted by the LF AI and Data Foundation. The managed version is Zilliz Cloud.

What is Qdrant

Qdrant is an open-source vector database written in Rust, released under Apache 2.0. The managed version is Qdrant Cloud.

Architecture and Deployment

How each database handles deployment defines most of the trade-offs that follow.

Deployment complexity, from simplest to hardest: Pinecone (no ops), Qdrant (single binary), Weaviate (Docker Compose), Milvus (Kubernetes).

Performance and Latency

Benchmarks from Salt Technologies AI's Vector Database Performance Benchmark 2026 covering 1 million vectors at 1536 dimensions:

Database	p50 Latency	p99 Latency	Notes
Qdrant (self-hosted)	4ms	8 to 12ms	Lowest latency of all purpose-built vector DBs
Milvus (with GPU)	6ms	12 to 18ms	GPU-accelerated; CPU-only is slower
Pinecone Serverless	20 to 30ms	40 to 80ms (cold)	Warm queries 10 to 15ms
Weaviate Cloud	50 to 70ms	100 to 150ms	Managed; self-hosted is faster

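A few things to understand from these numbers: the Milvus figures assume a GPU, and CPU-only deployments are slower; the Pinecone p99 reflects cold queries, while warm queries return in 10 to 15ms; and the Weaviate figures are for the managed cloud, with self-hosted deployments being faster.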

Best for raw performance: Qdrant (self-hosted) or Milvus with GPU. For zero-ops performance, Pinecone Serverless warm queries are competitive at moderate scale.
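
The p50 and p99 figures above are percentiles over many query timings. As a minimal sketch of how you might collect comparable numbers against your own deployment, the helper below times a callable repeatedly and reports nearest-rank percentiles. The names `percentile` and `measure_latencies_ms`, the run counts, and the stand-in query are illustrative assumptions, not the cited benchmark's methodology.

```python
import math
import time
from typing import Callable, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(run_query: Callable[[], object], runs: int = 200, warmup: int = 20) -> List[float]:
    """Call run_query() `warmup` times untimed, then time `runs` calls; returns milliseconds."""
    for _ in range(warmup):
        run_query()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000)
    return timings


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real call such as client.query_points(...).
    timings = measure_latencies_ms(lambda: sum(range(1000)))
    print(f"p50={percentile(timings, 50):.3f}ms  p99={percentile(timings, 99):.3f}ms")
```

Measuring from the same network location as your application matters: a managed service's latency as seen by your code includes the round trip, which a local self-hosted instance does not pay.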

Filtering

python

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

client = QdrantClient(url="http://localhost:6333")

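# query_vector is the embedding of the user's query (same dimension as the collection).
# Every condition under `must` has to match, so the three are combined with logical AND.
results = client.query_points(
    collection_name="documents",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(key="user_id", match=MatchValue(value=42)),
            FieldCondition(key="category", match=MatchValue(value="technical")),
            # created_at is stored as a Unix timestamp; keep points at or after it
            FieldCondition(
                key="created_at",
                range=Range(gte=1700000000)
            ),
        ]
    ),
    limit=10,
)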

Best for filtering: Qdrant and Weaviate for correctness under selective filters. Milvus for filtering at billion-scale. Pinecone for simple filters with moderate selectivity.
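
For comparison, the same three conditions from the Qdrant example above can be expressed in the other systems' filter syntax. This is a hedged sketch: the helper names `pinecone_filter` and `milvus_filter` are hypothetical, it assumes `user_id`, `category`, and `created_at` are stored as metadata or scalar fields, and it assumes the category value contains no quote characters.

```python
def pinecone_filter(user_id: int, category: str, created_after: int) -> dict:
    """Pinecone metadata filter: MongoDB-style operators combined with $and."""
    return {
        "$and": [
            {"user_id": {"$eq": user_id}},
            {"category": {"$eq": category}},
            {"created_at": {"$gte": created_after}},
        ]
    }


def milvus_filter(user_id: int, category: str, created_after: int) -> str:
    """Milvus boolean filter expression, passed to search() as a string."""
    return (
        f'user_id == {user_id} and category == "{category}" '
        f"and created_at >= {created_after}"
    )


# Usage (requires live clients, so shown as comments):
# index.query(vector=query_vector, top_k=10,
#             filter=pinecone_filter(42, "technical", 1700000000))
# client.search(collection_name="documents", data=[query_vector], limit=10,
#               filter=milvus_filter(42, "technical", 1700000000))
print(pinecone_filter(42, "technical", 1700000000))
print(milvus_filter(42, "technical", 1700000000))
```

The syntax is similar everywhere; the differences the section describes are in how each engine applies the filter during the vector search, which this sketch does not show.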

Hybrid Search

python

import weaviate
from weaviate.classes.query import HybridFusion

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="YOUR_CLUSTER_URL",
    auth_credentials=weaviate.auth.AuthApiKey("YOUR_API_KEY"),
)

collection = client.collections.get("Documents")

results = collection.query.hybrid(
    query="vector database filtering performance",
    vector=query_embedding,
    fusion_type=HybridFusion.RELATIVE_SCORE,
    limit=10,
)

client.close()

python

from qdrant_client import QdrantClient
from qdrant_client.models import SparseVector, Prefetch, FusionQuery, Fusion

client = QdrantClient(url="http://localhost:6333")

results = client.query_points(
    collection_name="documents",
    prefetch=[
        Prefetch(query=dense_vector, using="dense", limit=20),
        Prefetch(
            query=SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=10,
)

Best for hybrid search: Weaviate for the most integrated and mature implementation. Qdrant for flexibility and control. Milvus for teams who also need keyword search throughput at scale.
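
The Qdrant example above fuses the dense and sparse result lists with `Fusion.RRF`, reciprocal rank fusion. To make the idea concrete, here is a minimal pure-Python sketch of the textbook formula, where each document scores the sum of 1 / (k + rank) over the lists it appears in. The function name and the constant `k = 60` are assumptions for illustration; the databases' internal implementations may differ in detail, and the Weaviate example uses relative score fusion instead.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Merge several ranked ID lists into one, best first, using RRF scores."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_hits = ["a", "b", "c"]   # IDs ranked by dense vector similarity
sparse_hits = ["c", "a", "d"]  # IDs ranked by sparse (keyword-style) score
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))  # ['a', 'c', 'b', 'd']
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse retrievers, which is why it is a common default for hybrid search.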

Cost Comparison

Cost is where teams get surprised the most. The number on the pricing page is rarely the number you pay in production.

Pinecone

Pinecone Serverless bills on three dimensions: read units, write units, and storage.

Storage: approximately $0.033 per GB per month
Standard plan read units: approximately $16 per million read units
Write units: approximately $2 per million write units

At 1536 dimensions, 1 million vectors occupy roughly 6GB. Each query consumes approximately 6 to 20 read units depending on dimensions and filter complexity.

Scale	Vectors	Queries per month	Estimated monthly cost
Prototype	100K	10K	Under $5
Small production	500K	100K	$15 to $40
Medium production	2M	500K	$80 to $200
Large production	10M	2M	$400 to $800
Enterprise	50M+	10M+	$2,000 plus
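
The arithmetic behind these estimates can be sketched as follows. This is a rough model, assuming float32 vectors and the approximate unit prices quoted above; it counts storage and reads only, ignoring write units, metadata size, and plan minimums, so it will not necessarily reproduce the ranges in the table. The function names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4

# Approximate unit prices quoted above (USD); treat them as assumptions.
STORAGE_USD_PER_GB_MONTH = 0.033
READ_USD_PER_MILLION_RU = 16.0


def raw_vector_gb(num_vectors: int, dimensions: int) -> float:
    """Raw float32 vector size in decimal GB, excluding metadata and index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9


def estimate_monthly_usd(num_vectors: int, dimensions: int,
                         queries_per_month: int, read_units_per_query: float) -> float:
    """Storage plus read cost only; writes and plan minimums are not modelled."""
    storage = raw_vector_gb(num_vectors, dimensions) * STORAGE_USD_PER_GB_MONTH
    reads = queries_per_month * read_units_per_query / 1e6 * READ_USD_PER_MILLION_RU
    return storage + reads


print(f"1M vectors at 1536d: {raw_vector_gb(1_000_000, 1536):.3f} GB")
low = estimate_monthly_usd(2_000_000, 1536, 500_000, read_units_per_query=6)
high = estimate_monthly_usd(2_000_000, 1536, 500_000, read_units_per_query=20)
print(f"2M vectors, 500K queries: ${low:.2f} to ${high:.2f} before writes and minimums")
```

In this model storage is a small share of the bill; the read units consumed per query dominate, which is why query volume moves the estimate far more than vector count does.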

The Builder plan at $20 per month flat is useful for individual developers who want higher limits than the free tier without committing to usage-based pricing.

Weaviate

Weaviate Cloud bills on vector dimensions, storage, and backups.

The formula: monthly cost = (vectors x dimensions x replication factor x $0.01668) / 1,000,000 + (storage GB x $0.255) + backup costs
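
As a worked instance of the formula above, the sketch below transcribes it into a function. The function and parameter names are hypothetical, the rates are the ones quoted in the formula, and plan minimums and quantization discounts are not applied, so treat the result as a starting point rather than a quote.

```python
def weaviate_cloud_monthly_usd(
    num_vectors: int,
    dimensions: int,
    replication_factor: int = 1,
    storage_gb: float = 0.0,
    backup_usd: float = 0.0,
    usd_per_million_dimensions: float = 0.01668,
    usd_per_storage_gb: float = 0.255,
) -> float:
    """Direct transcription of the formula: dimension cost + storage cost + backups."""
    dimension_cost = (
        num_vectors * dimensions * replication_factor * usd_per_million_dimensions / 1_000_000
    )
    storage_cost = storage_gb * usd_per_storage_gb
    return dimension_cost + storage_cost + backup_usd


# 1M vectors at 1536 dimensions, no replication, 10 GB of storage, no backups.
cost = weaviate_cloud_monthly_usd(1_000_000, 1536, storage_gb=10)
print(f"${cost:.2f} per month before plan minimums")  # $28.17
```

Because the dimension term scales linearly with vector count, dimensions, and replication factor, doubling any one of them doubles that part of the bill.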

Managed Cloud minimum costs: Flex at $45 per month, Premium at $400 per month.

At 10 million vectors (with Binary Quantization enabled), Weaviate Cloud runs approximately $135 per month. Without Binary Quantization, the same dataset costs significantly more.

Milvus

Milvus itself is free and open source. You pay only for the infrastructure you run it on.

For self-hosted Milvus Standalone (single node, suitable for up to roughly 10 million vectors):

A VPS with 16GB RAM and 8 vCPU on DigitalOcean or Hetzner: $60 to $100 per month

For Milvus Distributed on Kubernetes (necessary for billion-scale):

Minimum viable production cluster: $400 to $800 per month in cloud compute

Zilliz Cloud, the managed Milvus service, starts at roughly $65 per month for small deployments and scales with storage and compute.

Qdrant

Qdrant Cloud pricing starts at $0.014 per hour per node (approximately $10 per month for the smallest node), with a free tier providing 1GB storage.

For self-hosted Qdrant on a VPS:

Instance	RAM	Vectors at 1536d	Monthly cost
Small VPS (Hetzner CX22)	4GB	~500K	$5 to $10
Medium VPS (Hetzner CX32)	8GB	~1M	$15 to $25
Large VPS (DigitalOcean 8vCPU/16GB)	16GB	~2M	$40 to $60
Dedicated server	64GB	~8M	$80 to $150
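
The vector counts in this table track the raw size of float32 vectors. As a quick sanity check, the sketch below computes that size for each row; it assumes 4 bytes per dimension with no quantization, and it leaves out the HNSW index, payloads, and operating system overhead. The helper name is hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_gb(num_vectors: int, dimensions: int = 1536) -> float:
    """Raw float32 storage in decimal GB; index, payload, and OS overhead are not included."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9


# (instance RAM in GB, approximate vector count) pairs taken from the table above.
TABLE_ROWS = [(4, 500_000), (8, 1_000_000), (16, 2_000_000), (64, 8_000_000)]

for ram_gb, vectors in TABLE_ROWS:
    needed = raw_vector_gb(vectors)
    print(f"{vectors:>9,} vectors -> {needed:5.1f} GB raw on a {ram_gb} GB instance "
          f"({needed / ram_gb:.0%} of RAM)")
```

In every row the raw vectors alone take about 77% of the instance's RAM, so the remaining headroom has to cover everything else. If your payloads are large, size up or enable quantization rather than relying on these counts.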

Cost summary

Database	10M vectors managed (est.)	10M vectors self-hosted	Best cost scenario
Pinecone Serverless	$70 to $100/month	Not available	Low query volume, small scale
Weaviate Cloud	$135/month (BQ enabled)	$80 to $120/month (Hetzner)	Hybrid search included, no extra cost
Milvus (Zilliz)	$65 to $120/month	$60 to $100/month	Billion-scale with quantization
Qdrant Cloud	$65/month	$20 to $40/month	Best self-hosted economics

Best for cost: Self-hosted Qdrant for the lowest total cost at any scale. Managed Pinecone Serverless for small workloads where operational cost outweighs infrastructure savings.

Developer Experience

All four have Python SDKs. The differences are in configuration depth and integration ecosystem.

Pinecone is the fastest to get started. Create an account, generate an API key, create an index, and push vectors. No Docker, no Kubernetes, no configuration decisions.

python

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("my-index")

index.upsert(vectors=[
    {"id": "doc-1", "values": embedding, "metadata": {"text": "..."}}
])

results = index.query(vector=query_embedding, top_k=10, include_metadata=True)

python

import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="YOUR_CLUSTER_URL",
    auth_credentials=weaviate.auth.AuthApiKey("YOUR_API_KEY"),
)

# Create a collection with built-in vectorization
client.collections.create(
    name="Documents",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
    ],
)

collection = client.collections.get("Documents")

# Insert without providing embeddings (Weaviate generates them)
collection.data.insert({"content": "Your document text here", "category": "technical"})

client.close()

python

from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")

schema = MilvusClient.create_schema(auto_id=False, enable_dynamic_field=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=1536)
schema.add_field(field_name="content", datatype=DataType.VARCHAR, max_length=65535)

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 16, "efConstruction": 256}
)

client.create_collection(
    collection_name="documents",
    schema=schema,
    index_params=index_params
)

python

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=embedding,
            payload={"content": "Your document text here", "user_id": 42}
        )
    ]
)

results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    limit=10,
)

When to Use Pinecone

Pinecone is the right choice in these situations.

Your dataset is under 10 million vectors. At this scale, Pinecone Serverless is competitively priced, the latency is acceptable for most applications, and the zero-ops benefit is real.

You have bursty or unpredictable query volume. Serverless means you pay almost nothing when your application is quiet overnight and scale automatically during traffic spikes. No idle pod costs.

You are building a RAG prototype quickly. Pinecone gets you from zero to working vector search in under thirty minutes. For proof-of-concept work where speed to demo matters, Pinecone wins.

Your team has no DevOps capacity. If running a database in production would require skills your team does not have, Pinecone's managed service removes that requirement entirely.

When to Use Weaviate

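Weaviate is the right choice when retrieval quality depends heavily on hybrid search over text with specific terminology, when you need multi-modal search, or when you want to skip the embedding pipeline by using Weaviate's built-in vectorization modules.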

When to Use Milvus

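Milvus is the right choice when your dataset exceeds 100 million vectors, you need GPU-accelerated indexing or multiple index types, and your team already has Kubernetes expertise.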

When to Use Qdrant

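Qdrant is the right choice when most queries carry a filter, when you need low latency or native hybrid search, or when you want the lowest self-hosted cost and can run a single Docker container.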

My Recommendation

For most teams building RAG applications in 2026, I would start with Qdrant.



Krunal Kanojiya
