Qdrant Tutorial: Getting Started with Vector Search in Python (2026)
A complete Qdrant tutorial for 2026. Covers Docker setup, creating collections, inserting vectors with Python, similarity search, metadata filtering, hybrid search, and payload indexing. Includes working code for every step.
When I started working with vector search, Pinecone was the default recommendation. The API was simple and the docs were good. But the moment my dataset grew past a few hundred thousand vectors, the bill started hurting.
Qdrant was the answer. Open source, written in Rust, free to self-host, and genuinely fast. I have been running it in production on RAG applications since early 2025 and have not found a meaningful reason to switch.
This guide takes you from zero to a working Qdrant setup. By the end you will have Qdrant running locally, a collection created, vectors inserted with Python, and a similarity search query running with metadata filters.
What Qdrant Is
Qdrant is a vector database built specifically for similarity search. You store embeddings (high-dimensional vectors) in it and query them to find the most semantically similar results.
It is written in Rust, which gives it very low memory overhead and fast query execution. It runs as a standalone service with an HTTP and gRPC API. The Python client wraps both.
Qdrant supports:
- HNSW indexing for fast approximate nearest neighbor search
- Payload filtering (filter by metadata at query time, efficiently)
- Sparse vectors for hybrid search
- Scalar and binary quantization to reduce memory usage
- Collections, named vector spaces within a collection, and sharding for large-scale setups
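To get a feel for what quantization buys, here is a back-of-the-envelope sketch. It assumes float32 originals (4 bytes per dimension), one byte per dimension for scalar (int8) quantization, and one bit per dimension for binary quantization. It ignores the HNSW index, payloads, and any original vectors kept alongside the quantized ones, so treat the numbers as rough lower bounds rather than Qdrant's actual footprint. The function name is illustrative.

```python
def vector_bytes(num_vectors: int, dims: int, bits_per_dim: int) -> int:
    """Raw bytes needed to hold the vectors alone (no index, no payload)."""
    bits = num_vectors * dims * bits_per_dim
    return (bits + 7) // 8  # round up to whole bytes


n, d = 1_000_000, 1536
for label, bits in [("float32", 32), ("scalar int8", 8), ("binary", 1)]:
    gib = vector_bytes(n, d, bits) / 2**30
    print(f"{label:>12}: {gib:.2f} GiB")
```

Under these assumptions scalar quantization cuts the vector storage to a quarter and binary quantization to one thirty-second.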
If you are deciding between Qdrant and other vector databases, see pgvector vs Pinecone and Pinecone vs Qdrant for full comparisons.
Step 1: Run Qdrant with Docker
The fastest way to get Qdrant running locally is Docker. One command and it is up.
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

Port 6333 is the HTTP API. Port 6334 is gRPC. The -v flag mounts a local directory so your data persists when the container restarts.
Once it is running, open http://localhost:6333/dashboard in your browser. You will see the Qdrant web UI where you can inspect collections and run queries visually.
To verify the API is alive:
curl http://localhost:6333/
# returns: {"title":"qdrant - vector search engine","version":"..."}

Alternative: download the binary
If you prefer not to use Docker, download a pre-built binary from the Qdrant releases page. Extract it and run:
./qdrant

Same API, same ports, no Docker required.
Alternative: Qdrant Cloud
If you want a managed service instead of running it yourself, sign up at cloud.qdrant.io. The free tier gives you 1GB of storage, which is enough for roughly 500K vectors at 1536 dimensions. You get a hosted URL and an API key instead of running Docker.
Step 2: Install the Python Client
pip install qdrant-client

For embeddings in this tutorial, also install the OpenAI SDK:

pip install openai

If you prefer a free local embedding model, install sentence-transformers instead:

pip install sentence-transformers

Both work. I will show both options.
Step 3: Connect to Qdrant
from qdrant_client import QdrantClient

# Connect to local Docker instance
client = QdrantClient(host="localhost", port=6333)

# Or connect to Qdrant Cloud
# client = QdrantClient(
#     url="https://your-cluster-url.qdrant.io",
#     api_key="your-api-key"
# )

# Verify the connection
info = client.get_collections()
print(info)

The client uses the HTTP API by default. For production, you can switch to gRPC by passing prefer_grpc=True, which reduces latency on high-QPS workloads.
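One way to keep the local and cloud setups in a single code path is to build the connection arguments from environment variables instead of hard-coding them. The sketch below only assembles a dictionary of keyword arguments; the variable names QDRANT_URL, QDRANT_API_KEY, and QDRANT_PREFER_GRPC are hypothetical names chosen for this example, not something Qdrant defines.

```python
import os


def qdrant_kwargs(env=os.environ) -> dict:
    """Build keyword arguments for QdrantClient from environment variables.

    The variable names are assumptions made for this sketch.
    """
    kwargs = {}
    url = env.get("QDRANT_URL")
    if url:
        kwargs["url"] = url  # e.g. a Qdrant Cloud cluster URL
        api_key = env.get("QDRANT_API_KEY")
        if api_key:
            kwargs["api_key"] = api_key
    else:
        kwargs["host"] = "localhost"  # local Docker instance
        kwargs["port"] = 6333
    if env.get("QDRANT_PREFER_GRPC", "").lower() in {"1", "true", "yes"}:
        kwargs["prefer_grpc"] = True
    return kwargs


print(qdrant_kwargs({}))
```

You would then connect with QdrantClient(**qdrant_kwargs()), which also keeps the API key out of your source code.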
Step 4: Create a Collection
A collection is where your vectors live. Every vector in a collection must have the same dimension. You also choose the distance metric here.
from qdrant_client.models import Distance, VectorParams

client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(
        size=1536,  # must match your embedding model output
        distance=Distance.COSINE
    )
)

Choosing the right distance metric:
| Metric | Use when |
|---|---|
| COSINE | OpenAI embeddings, most sentence transformers, when direction matters more than magnitude |
| DOT | When vectors are normalized (same as cosine but faster) |
| EUCLID | Pixel-level similarity, some image embeddings |
For OpenAI text-embedding-3-small or text-embedding-3-large, use COSINE. For Cohere and most sentence transformer models, also use COSINE.
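To make the difference between the three metrics concrete, here is a small plain-Python sketch using two made-up 2-dimensional vectors that point in the same direction but differ in length. It is only an illustration of the math, not how Qdrant computes distances internally.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


a, b = [3.0, 4.0], [6.0, 8.0]  # same direction, different length
print("cosine:", cosine(a, b))  # depends on direction only
print("dot:", dot(a, b))        # grows with magnitude
print("euclid:", euclid(a, b))  # straight-line distance
# After normalization, dot product and cosine similarity agree.
print(math.isclose(dot(normalize(a), normalize(b)), cosine(a, b)))
```

Cosine treats the two vectors as identical, while dot product and Euclidean distance both react to the difference in length. That is why DOT only behaves like COSINE once the vectors are normalized.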
Verify the collection was created:
collection_info = client.get_collection("articles")
print(collection_info)

Step 5: Generate Embeddings
Before inserting anything, you need vectors. Here is how to generate them with both OpenAI and a free local model.
Option A: OpenAI embeddings
from openai import OpenAI

openai_client = OpenAI(api_key="your-openai-api-key")

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",  # 1536 dimensions
        input=text
    )
    return response.data[0].embedding

# Test it
vector = get_embedding("Qdrant is a vector database written in Rust")
print(f"Dimension: {len(vector)}")  # 1536

Option B: Free local model (no API key needed)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384 dimensions

def get_embedding(text: str) -> list[float]:
    return model.encode(text).tolist()

# If using this model, create the collection with size=384
# client.create_collection(
#     collection_name="articles",
#     vectors_config=VectorParams(size=384, distance=Distance.COSINE)
# )

Step 6: Insert Vectors (Upsert)
In Qdrant, the operation to insert or update vectors is called upsert. Each point (Qdrant's name for a single vector entry) has an ID, a vector, and an optional payload.
The payload is arbitrary JSON attached to the vector. Qdrant stores it with the point and lets you filter on it at query time, so you do not need a separate metadata store.
from qdrant_client.models import PointStruct

# Sample documents
documents = [
    {
        "id": 1,
        "text": "Qdrant is a vector database written in Rust for high-performance similarity search.",
        "category": "vector-search",
        "author": "krunal",
        "published_year": 2026
    },
    {
        "id": 2,
        "text": "RAG combines retrieval from a vector database with language model generation.",
        "category": "rag",
        "author": "krunal",
        "published_year": 2026
    },
    {
        "id": 3,
        "text": "HNSW is the graph-based algorithm that powers fast approximate nearest neighbor search.",
        "category": "vector-search",
        "author": "alice",
        "published_year": 2025
    },
    {
        "id": 4,
        "text": "Pinecone is a fully managed vector database with a serverless option.",
        "category": "vector-search",
        "author": "alice",
        "published_year": 2025
    },
    {
        "id": 5,
        "text": "pgvector adds vector similarity search as an extension to PostgreSQL.",
        "category": "vector-search",
        "author": "krunal",
        "published_year": 2026
    },
]

# Generate embeddings and build points
points = []
for doc in documents:
    embedding = get_embedding(doc["text"])
    points.append(
        PointStruct(
            id=doc["id"],
            vector=embedding,
            payload={
                "text": doc["text"],
                "category": doc["category"],
                "author": doc["author"],
                "published_year": doc["published_year"]
            }
        )
    )

# Upsert all points at once
client.upsert(
    collection_name="articles",
    points=points
)
print(f"Inserted {len(points)} points")

Batch upsert for large datasets
For large datasets, upsert in batches to avoid memory issues and timeouts:
def batch_upsert(client, collection_name, documents, batch_size=100):
    total = len(documents)
    for i in range(0, total, batch_size):
        batch = documents[i:i + batch_size]
        points = []
        for doc in batch:
            embedding = get_embedding(doc["text"])
            points.append(
                PointStruct(
                    id=doc["id"],
                    vector=embedding,
                    payload=doc
                )
            )
        client.upsert(collection_name=collection_name, points=points)
        print(f"Upserted {min(i + batch_size, total)}/{total}")

Step 7: Run a Similarity Search
Now query the collection. You embed a question and Qdrant returns the most semantically similar points.
# Embed the query
query_text = "how does vector search work?"
query_vector = get_embedding(query_text)

# Search with the Query API (client.search is deprecated in recent qdrant-client releases)
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    limit=3,            # return top 3 results
    with_payload=True   # include the JSON payload in results
)

# Print results
for result in results.points:
    print(f"Score: {result.score:.4f}")
    print(f"Text: {result.payload['text']}")
    print(f"Category: {result.payload['category']}")
    print()

The score is the cosine similarity between your query vector and the stored vector. Higher is more similar. Cosine similarity ranges from -1 to 1, although text embeddings typically produce scores well above zero for related content.
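Conceptually, a similarity search scores every stored vector against the query and keeps the best few. The sketch below does exactly that, exhaustively, in plain Python with made-up 3-dimensional vectors. Qdrant's HNSW index approximates the same ranking without scanning the whole collection; the helper names here are illustrative only.

```python
import heapq
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, stored, k):
    """Exact nearest neighbours: score every vector, keep the k best."""
    scored = ((cosine(query, vec), point_id) for point_id, vec in stored.items())
    return heapq.nlargest(k, scored)


stored = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
    4: [-1.0, 0.0, 0.0],  # opposite direction: cosine is negative
}
for score, point_id in top_k([1.0, 0.0, 0.0], stored, k=3):
    print(f"{point_id}: {score:.4f}")
```

The exhaustive version is linear in the number of stored vectors per query, which is the cost an approximate index is designed to avoid.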
Step 8: Filter by Payload
This is where Qdrant genuinely stands out. You can filter results by any payload field at query time, and Qdrant applies the filter efficiently alongside the vector search rather than as a post-processing step.
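Qdrant filters are built from must, should, and must_not clauses. As a mental model of the boolean semantics only, the plain-Python sketch below evaluates those clauses against payload dictionaries. It applies the filter as a simple pass over the data, which is not how Qdrant executes it, and the helper names are hypothetical.

```python
def matches(payload: dict, must=(), should=(), must_not=()) -> bool:
    """Evaluate filter clauses against one payload.

    must: every condition holds. should: at least one holds (if any given).
    must_not: no condition holds. Each condition is a predicate on the payload.
    """
    if not all(cond(payload) for cond in must):
        return False
    if should and not any(cond(payload) for cond in should):
        return False
    if any(cond(payload) for cond in must_not):
        return False
    return True


def field_equals(key, value):
    return lambda payload: payload.get(key) == value


def field_gte(key, bound):
    return lambda payload: key in payload and payload[key] >= bound


payloads = [
    {"id": 1, "category": "vector-search", "author": "krunal", "published_year": 2026},
    {"id": 2, "category": "rag", "author": "krunal", "published_year": 2026},
    {"id": 3, "category": "vector-search", "author": "alice", "published_year": 2025},
]
kept = [
    p["id"]
    for p in payloads
    if matches(
        p,
        must=[field_equals("category", "vector-search"), field_gte("published_year", 2025)],
        must_not=[field_equals("author", "alice")],
    )
]
print(kept)
```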
Basic filter
from qdrant_client.models import Filter, FieldCondition, MatchValue

results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="vector-search")
            )
        ]
    ),
    limit=3,
    with_payload=True
)
for result in results.points:
    print(f"{result.score:.4f} | {result.payload['text'][:60]}")

Multiple conditions (AND logic)
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="vector-search")
            ),
            FieldCondition(
                key="author",
                match=MatchValue(value="krunal")
            ),
            FieldCondition(
                key="published_year",
                range=Range(gte=2026)  # published 2026 or later
            )
        ]
    ),
    limit=5,
    with_payload=True
)

OR logic
from qdrant_client.models import Filter, FieldCondition, MatchAny

results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchAny(any=["vector-search", "rag"])  # either category
            )
        ]
    ),
    limit=5,
    with_payload=True
)

NOT logic (exclude)
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must_not=[
            FieldCondition(
                key="author",
                match=MatchValue(value="alice")
            )
        ]
    ),
    limit=5,
    with_payload=True
)

Step 9: Add Payload Indexes for Fast Filtering
By default, Qdrant filters by scanning all candidate payload values. For large collections, add payload indexes to the fields you filter on most often. This dramatically speeds up filtered searches.
from qdrant_client.models import PayloadSchemaType

# Index the category field (keyword type for exact match)
client.create_payload_index(
    collection_name="articles",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

# Index the published_year field (integer type for range queries)
client.create_payload_index(
    collection_name="articles",
    field_name="published_year",
    field_schema=PayloadSchemaType.INTEGER
)

As a rule: if you filter on a field in more than 10% of your queries, index it.
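The intuition behind a keyword payload index can be sketched as a generic inverted index: a map from each field value to the IDs of the points that carry it. This is a simplified illustration, not Qdrant's actual on-disk structure, and the function name is made up for the example.

```python
from collections import defaultdict


def build_keyword_index(payloads: dict, field: str) -> dict:
    """Map each distinct field value to the set of point IDs that carry it."""
    index = defaultdict(set)
    for point_id, payload in payloads.items():
        if field in payload:
            index[payload[field]].add(point_id)
    return index


payloads = {
    1: {"category": "vector-search"},
    2: {"category": "rag"},
    3: {"category": "vector-search"},
}
index = build_keyword_index(payloads, "category")

# Without an index: inspect every payload.
scanned = {pid for pid, p in payloads.items() if p.get("category") == "rag"}
# With an index: a single dictionary lookup.
looked_up = index.get("rag", set())
print(scanned == looked_up, sorted(index["vector-search"]))
```

The scan touches every point, while the lookup cost depends only on the number of matches, which is where the speedup on large collections comes from.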
Step 10: Hybrid Search (Dense + Sparse)
Hybrid search combines semantic similarity (dense vectors) with keyword matching (sparse vectors). It is better than either alone for queries with specific terms, product codes, or proper nouns.
Qdrant has native sparse vector support. You store a sparse vector alongside each dense vector and query both at once.
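A sparse vector is just two parallel lists: the indices of the non-zero dimensions and their values. The sketch below builds one from raw term counts with a growing vocabulary and scores two sparse vectors by dot product over their shared indices. It is a toy under stated assumptions (whitespace tokenization, raw counts instead of TF-IDF or BM25 weights), and the function names are illustrative.

```python
from collections import Counter


def to_sparse(text: str, vocab: dict) -> tuple[list[int], list[float]]:
    """Turn text into parallel (indices, values) lists using raw term counts.

    Unseen words are added to `vocab` with the next free index.
    """
    counts = Counter(text.lower().split())
    pairs = sorted((vocab.setdefault(tok, len(vocab)), float(c)) for tok, c in counts.items())
    return [i for i, _ in pairs], [v for _, v in pairs]


def sparse_dot(a, b) -> float:
    """Dot product that only touches indices present in both vectors."""
    b_lookup = dict(zip(*b))
    return sum(v * b_lookup.get(i, 0.0) for i, v in zip(*a))


vocab = {}
doc = to_sparse("fast approximate nearest neighbor search", vocab)
query = to_sparse("fast search", vocab)
print(doc)
print(query)
print(sparse_dot(query, doc))
```

Only the words the query and document share contribute to the score, which is why sparse vectors capture exact keyword matches that dense embeddings can blur.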
Set up a collection with named vectors
from qdrant_client.models import VectorParams, SparseVectorParams, Distance

client.create_collection(
    collection_name="articles_hybrid",
    vectors_config={
        "dense": VectorParams(size=1536, distance=Distance.COSINE)
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams()
    }
)

Generate sparse vectors with TF-IDF
from qdrant_client.models import PointStruct, SparseVector
from sklearn.feature_extraction.text import TfidfVectorizer

# Build a TF-IDF encoder on your corpus.
# TF-IDF is used here as a simple stand-in for BM25-style term weighting.
corpus = [doc["text"] for doc in documents]
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(corpus)

def get_sparse_vector(text: str) -> SparseVector:
    """Convert text to a sparse vector using TF-IDF weights."""
    vec = tfidf.transform([text])
    cx = vec.tocoo()
    indices = cx.col.tolist()
    values = cx.data.tolist()
    return SparseVector(indices=indices, values=values)

# Upsert points with both dense and sparse vectors
points = []
for doc in documents:
    dense_vec = get_embedding(doc["text"])
    sparse_vec = get_sparse_vector(doc["text"])
    points.append(
        PointStruct(
            id=doc["id"],
            vector={
                "dense": dense_vec,
                "sparse": sparse_vec
            },
            payload={
                "text": doc["text"],
                "category": doc["category"]
            }
        )
    )
client.upsert(collection_name="articles_hybrid", points=points)

Hybrid query
from qdrant_client.models import Prefetch, FusionQuery, Fusion

query_text = "fast approximate search algorithm"
dense_query = get_embedding(query_text)
sparse_query = get_sparse_vector(query_text)

# Query using reciprocal rank fusion to merge dense and sparse results
results = client.query_points(
    collection_name="articles_hybrid",
    prefetch=[
        Prefetch(
            query=dense_query,
            using="dense",
            limit=20
        ),
        Prefetch(
            query=sparse_query,
            using="sparse",
            limit=20
        )
    ],
    query=FusionQuery(fusion=Fusion.RRF),  # reciprocal rank fusion
    limit=5,
    with_payload=True
)
for result in results.points:
    print(f"{result.score:.4f} | {result.payload['text'][:60]}")

Reciprocal rank fusion (RRF) combines the rankings from both searches without needing to tune a weight parameter. It is the recommended default for hybrid search.
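To show what reciprocal rank fusion does with the two result lists, here is a minimal plain-Python sketch. Each list contributes 1 / (k + rank) to an ID's score, so items ranked well in both lists rise to the top. The constant k = 60 is a common choice from the RRF literature and is an assumption here, not necessarily the value Qdrant uses internally; the ID lists are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Fuse best-first ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_ranking = [3, 1, 5]   # best-first IDs from the dense prefetch
sparse_ranking = [1, 4, 3]  # best-first IDs from the sparse prefetch
for point_id, score in reciprocal_rank_fusion([dense_ranking, sparse_ranking]):
    print(point_id, round(score, 5))
```

Point 1 wins because it ranks near the top of both lists, even though it is first in only one of them. Because only ranks are used, the raw dense and sparse scores never need to be put on a common scale.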
Step 11: Check Collection Stats
info = client.get_collection("articles")
print(f"Points count: {info.points_count}")
print(f"Indexed vectors: {info.indexed_vectors_count}")
print(f"Status: {info.status}")
print(f"Optimizer: {info.optimizer_status}")

The optimizer runs in the background to merge segments and rebuild indexes. Once it has finished, the collection status reads green and optimizer_status reads ok.
Scroll Through All Points
To retrieve all points in a collection (not just search results), use scroll. This is useful for bulk exports or debugging.
# scroll returns one page plus the offset of the next page (None when done)
offset = None
while True:
    points, offset = client.scroll(
        collection_name="articles",
        limit=100,
        offset=offset,
        with_payload=True,
        with_vectors=False  # set True if you need the vectors back
    )
    for point in points:
        print(f"ID: {point.id} | {point.payload['text'][:50]}")
    if offset is None:
        break

Delete Points
from qdrant_client.models import PointIdsList

# Delete by specific IDs
client.delete(
    collection_name="articles",
    points_selector=PointIdsList(points=[1, 2])
)

# Delete by filter (all points where category = "rag")
from qdrant_client.models import Filter, FieldCondition, MatchValue, FilterSelector

client.delete(
    collection_name="articles",
    points_selector=FilterSelector(
        filter=Filter(
            must=[
                FieldCondition(key="category", match=MatchValue(value="rag"))
            ]
        )
    )
)

Complete RAG Example
Here is a complete end-to-end example that ties everything together: index a set of documents and answer questions over them.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue
from openai import OpenAI
import uuid

# Clients
qdrant = QdrantClient(host="localhost", port=6333)
openai_client = OpenAI(api_key="your-openai-api-key")

COLLECTION = "knowledge_base"
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o-mini"

# Create the collection from scratch (recreate_collection is deprecated)
if qdrant.collection_exists(COLLECTION):
    qdrant.delete_collection(COLLECTION)
qdrant.create_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

def embed(text: str) -> list[float]:
    return openai_client.embeddings.create(
        model=EMBED_MODEL, input=text
    ).data[0].embedding

# Index documents
docs = [
    {"text": "Qdrant stores vectors and lets you search them by similarity.", "source": "qdrant-docs"},
    {"text": "RAG grounds LLM responses in retrieved documents to reduce hallucination.", "source": "rag-guide"},
    {"text": "HNSW is a graph-based index used for approximate nearest neighbor search.", "source": "hnsw-paper"},
    {"text": "Cosine similarity measures the angle between two vectors, not their length.", "source": "math-guide"},
]
points = [
    PointStruct(
        id=str(uuid.uuid4()),
        vector=embed(doc["text"]),
        payload=doc
    )
    for doc in docs
]
qdrant.upsert(collection_name=COLLECTION, points=points)

def answer_question(question: str, source_filter: str = None) -> str:
    query_vec = embed(question)

    # Build optional filter
    search_filter = None
    if source_filter:
        search_filter = Filter(
            must=[FieldCondition(key="source", match=MatchValue(value=source_filter))]
        )

    # Retrieve top 3 relevant chunks
    results = qdrant.query_points(
        collection_name=COLLECTION,
        query=query_vec,
        query_filter=search_filter,
        limit=3,
        with_payload=True
    )
    context = "\n\n".join(r.payload["text"] for r in results.points)

    # Generate answer with retrieved context
    response = openai_client.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {
                "role": "system",
                "content": "Answer the question using only the provided context. If the answer is not in the context, say so."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return response.choices[0].message.content

# Test it
print(answer_question("What is HNSW?"))
print(answer_question("How does RAG reduce hallucination?"))

Self-hosted vs Qdrant Cloud
| Consideration | Self-hosted | Qdrant Cloud |
|---|---|---|
| Cost | VPS only ($20 to $100/month) | Free tier, then $25+/month |
| Setup time | 10 minutes with Docker | 5 minutes, no Docker needed |
| Data control | Full control, stays on your infra | Hosted by Qdrant |
| Maintenance | You manage upgrades and backups | Qdrant manages everything |
| Scale | You provision the hardware | Auto-scales (paid tiers) |
| Best for | Teams with DevOps capacity, cost-sensitive workloads | Teams who want zero ops |
For prototypes and side projects, Qdrant Cloud free tier is the easiest start. For production workloads where cost matters, self-hosting on a $40/month VPS typically handles millions of vectors at a fraction of managed service pricing.
Common Mistakes to Avoid
Wrong dimension size. If you create a collection with size=1536 and then try to insert a 384-dimension vector, Qdrant will reject it with an error. Always match the collection dimension to your embedding model output.
No payload index on filtered fields. Filtering without a payload index works but scans every point's payload. On collections with millions of points, this slows down filtered searches significantly. Run create_payload_index on any field you filter on regularly.
Upserting without IDs. Qdrant requires each point to have an ID. Use either sequential integers or UUID strings. If you upsert a point with an ID that already exists, Qdrant overwrites the existing point. This is the correct behavior for updates.
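Forgetting with_payload=True. The HTTP API omits the payload from search results unless you ask for it, so you get the ID and score but not the text or metadata. The Python client defaults to with_payload=True, but passing it explicitly keeps the intent clear. Set it to False when you only need the IDs.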
Running searches before the optimizer finishes. After a large upsert, Qdrant optimizes its internal segments in the background. Searches work immediately, but performance improves once optimization finishes. Check optimizer_status on the result of client.get_collection() to see when it is done.
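Several of these mistakes can be caught before a request ever reaches Qdrant. The sketch below is a hypothetical pre-flight check, not part of the Qdrant client: it verifies that an ID is an unsigned integer or a UUID string and that the vector has the expected dimension. It also shows one way to derive stable UUIDs from a document key with uuid5, so that re-indexing the same document overwrites the old point instead of creating a duplicate.

```python
import uuid


def validate_point(point_id, vector, expected_dim: int) -> None:
    """Raise ValueError if a point has an unusable ID or the wrong dimension."""
    if isinstance(point_id, bool) or not isinstance(point_id, (int, str)):
        raise ValueError(f"ID must be an int or UUID string, got {type(point_id).__name__}")
    if isinstance(point_id, int) and point_id < 0:
        raise ValueError("integer IDs must be unsigned")
    if isinstance(point_id, str):
        uuid.UUID(point_id)  # raises ValueError when the string is not a UUID
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")


def stable_id(key: str) -> str:
    """Derive the same UUID from the same key, so re-upserts overwrite."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, key))


# A valid point passes silently.
validate_point(stable_id("https://example.com/post-1"), [0.0] * 384, expected_dim=384)

# A 384-dimension vector sent to a 1536-dimension collection is caught locally.
try:
    validate_point(1, [0.0] * 384, expected_dim=1536)
except ValueError as err:
    print(err)
```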
Summary
Qdrant is one of the most capable vector databases available in 2026. Getting it running takes under ten minutes with Docker. The Python client is well-designed and covers every operation you need.
The key concepts to internalize: collections hold vectors of one dimension, payloads are the metadata attached to each point, filters let you scope searches by payload at query time, and payload indexes make filtered searches fast at scale.
From here, the natural next steps are adding quantization to reduce memory usage on large collections, setting up replication for high availability, and tuning HNSW parameters for your specific recall and latency requirements.
Krunal Kanojiya
Technical Content Writer
I am a technical content writer and former software developer from India. I write clear, in-depth articles on blockchain, AI and machine learning, data engineering, web development, and developer careers. I work at Lucent Innovation now. Before that I wrote about blockchain at Cromtek Solution and did freelance work.