Vector Search & Databases·14 min read·2,617 words

Qdrant Tutorial: Getting Started with Vector Search in Python (2026)

A complete Qdrant tutorial for 2026. Covers Docker setup, creating collections, inserting vectors with Python, similarity search, metadata filtering, hybrid search, and payload indexing. Includes working code for every step.

Krunal Kanojiya

June 07, 2026

#qdrant#vector-database#vector-search#tutorial#python#RAG#embeddings#similarity-search#HNSW#getting-started

When I started working with vector search, Pinecone was the default recommendation. The API was simple and the docs were good. But the moment my dataset grew past a few hundred thousand vectors, the bill started hurting.

Qdrant was the answer. Open source, written in Rust, free to self-host, and genuinely fast. I have been running it in production on RAG applications since early 2025 and have not found a meaningful reason to switch.

This guide takes you from zero to a working Qdrant setup. By the end you will have Qdrant running locally, a collection created, vectors inserted with Python, and a similarity search query running with metadata filters.

What Qdrant Is

Qdrant is a vector database built specifically for similarity search. You store embeddings (high-dimensional vectors) in it and query them to find the most semantically similar results.

It is written in Rust, which gives it very low memory overhead and fast query execution. It runs as a standalone service with an HTTP and gRPC API. The Python client wraps both.

Qdrant supports:

HNSW indexing for fast approximate nearest neighbor search
Payload filtering (filter by metadata at query time, efficiently)
Sparse vectors for hybrid search
Scalar and binary quantization to reduce memory usage
Collections, named vector spaces within a collection, and sharding for large-scale setups

If you are deciding between Qdrant and other vector databases, see pgvector vs Pinecone and Pinecone vs Qdrant for full comparisons.

Step 1: Run Qdrant with Docker

The fastest way to get Qdrant running locally is Docker. One command and it is up.

bash

docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

Port 6333 is the HTTP API. Port 6334 is gRPC. The -v flag mounts a local directory so your data persists when the container restarts.

Once it is running, open http://localhost:6333/dashboard in your browser. You will see the Qdrant web UI where you can inspect collections and run queries visually.

To verify the API is alive:

bash

curl http://localhost:6333/
# returns: {"title":"qdrant - vector search engine","version":"..."}

Alternative: download the binary

If you prefer not to use Docker, download a pre-built binary from the Qdrant releases page. Extract it and run:

bash

./qdrant

Same API, same ports, no Docker required.

Alternative: Qdrant Cloud

If you want a managed service instead of running it yourself, sign up at cloud.qdrant.io. The free tier gives you 1GB of storage, which is enough for roughly 500K vectors at 1536 dimensions. You get a hosted URL and an API key instead of running Docker.

Step 2: Install the Python Client

bash

pip install qdrant-client

For embeddings in this tutorial, also install the OpenAI SDK:

bash

pip install openai

If you prefer a free local embedding model, install sentence-transformers instead:

bash

pip install sentence-transformers

Both work. I will show both options.

Step 3: Connect to Qdrant

python

from qdrant_client import QdrantClient

# Connect to local Docker instance
client = QdrantClient(host="localhost", port=6333)

# Or connect to Qdrant Cloud
# client = QdrantClient(
#     url="https://your-cluster-url.qdrant.io",
#     api_key="your-api-key"
# )

# Verify the connection
info = client.get_collections()
print(info)

The client uses the HTTP (REST) API by default. For production, you can switch to gRPC by passing prefer_grpc=True, which typically reduces latency on high-QPS workloads.

Step 4: Create a Collection

A collection is where your vectors live. Every vector in a collection must have the same dimension. You also choose the distance metric here.

python

from qdrant_client.models import Distance, VectorParams

client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(
        size=1536,          # must match your embedding model output
        distance=Distance.COSINE
    )
)

Choosing the right distance metric:

Metric	Use when
`COSINE`	OpenAI embeddings, most sentence transformers, when direction matters more than magnitude
`DOT`	When vectors are already normalized to unit length (same ranking as cosine, and it skips the normalization Qdrant applies on upload for COSINE)
`EUCLID`	Pixel-level similarity, some image embeddings

For OpenAI text-embedding-3-small or text-embedding-3-large, use COSINE. For Cohere and most sentence transformer models, also use COSINE.
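
To make the table concrete, here is a small pure-Python sketch of the three metrics. It does not touch Qdrant, and the helper names (dot, cosine, euclid, normalize) are illustrative, not part of qdrant-client. It shows that dot product and cosine similarity agree once both vectors are scaled to unit length, and that cosine similarity can be negative.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the two vector lengths
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclid(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a: list[float]) -> list[float]:
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]

a = [3.0, 4.0]
b = [4.0, 3.0]

print(cosine(a, b))                      # about 0.96
print(dot(a, b))                         # 24.0, magnitude-dependent
print(dot(normalize(a), normalize(b)))   # about 0.96, matches cosine
print(cosine(a, [-3.0, -4.0]))           # opposite direction, about -1.0
print(euclid(a, b))                      # straight-line distance, lower is closer
```

Note that COSINE and DOT are similarities (higher is better), while EUCLID is a distance (lower is better).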

Verify the collection was created:

python

collection_info = client.get_collection("articles")
print(collection_info)

Step 5: Generate Embeddings

Before inserting anything, you need vectors. Here is how to generate them with both OpenAI and a free local model.

Option A: OpenAI embeddings

python

from openai import OpenAI

openai_client = OpenAI(api_key="your-openai-api-key")

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",  # 1536 dimensions
        input=text
    )
    return response.data[0].embedding

# Test it
vector = get_embedding("Qdrant is a vector database written in Rust")
print(f"Dimension: {len(vector)}")  # 1536

Option B: Free local model (no API key needed)

python

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384 dimensions

def get_embedding(text: str) -> list[float]:
    return model.encode(text).tolist()

# If using this model, create the collection with size=384
# client.create_collection(
#     collection_name="articles",
#     vectors_config=VectorParams(size=384, distance=Distance.COSINE)
# )

Step 6: Insert Vectors (Upsert)

In Qdrant, the operation to insert or update vectors is called upsert. Each point (Qdrant's name for a single vector entry) has an ID, a vector, and an optional payload.

The payload is arbitrary JSON. You attach it to the vector and Qdrant lets you filter on it at query time, so you do not need a separate metadata store.

python

from qdrant_client.models import PointStruct

# Sample documents
documents = [
    {
        "id": 1,
        "text": "Qdrant is a vector database written in Rust for high-performance similarity search.",
        "category": "vector-search",
        "author": "krunal",
        "published_year": 2026
    },
    {
        "id": 2,
        "text": "RAG combines retrieval from a vector database with language model generation.",
        "category": "rag",
        "author": "krunal",
        "published_year": 2026
    },
    {
        "id": 3,
        "text": "HNSW is the graph-based algorithm that powers fast approximate nearest neighbor search.",
        "category": "vector-search",
        "author": "alice",
        "published_year": 2025
    },
    {
        "id": 4,
        "text": "Pinecone is a fully managed vector database with a serverless option.",
        "category": "vector-search",
        "author": "alice",
        "published_year": 2025
    },
    {
        "id": 5,
        "text": "pgvector adds vector similarity search as an extension to PostgreSQL.",
        "category": "vector-search",
        "author": "krunal",
        "published_year": 2026
    },
]

# Generate embeddings and build points
points = []
for doc in documents:
    embedding = get_embedding(doc["text"])
    points.append(
        PointStruct(
            id=doc["id"],
            vector=embedding,
            payload={
                "text": doc["text"],
                "category": doc["category"],
                "author": doc["author"],
                "published_year": doc["published_year"]
            }
        )
    )

# Upsert all points at once
client.upsert(
    collection_name="articles",
    points=points
)

print(f"Inserted {len(points)} points")

Batch upsert for large datasets

For large datasets, upsert in batches to avoid memory issues and timeouts:

python

def batch_upsert(client, collection_name, documents, batch_size=100):
    total = len(documents)
    for i in range(0, total, batch_size):
        batch = documents[i:i + batch_size]
        points = []
        for doc in batch:
            embedding = get_embedding(doc["text"])
            points.append(
                PointStruct(
                    id=doc["id"],
                    vector=embedding,
                    payload=doc
                )
            )
        client.upsert(collection_name=collection_name, points=points)
        print(f"Upserted {min(i + batch_size, total)}/{total}")
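
The function above slices a list, so the whole dataset must already be in memory. When documents come from a file or a database cursor, a small batching helper over any iterable avoids that. This is a sketch: batched is a hypothetical helper name (Python 3.12 ships a similar itertools.batched), and the generator below stands in for a real document source.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next batch_size items; an empty list ends the loop
    while batch := list(islice(iterator, batch_size)):
        yield batch

# A generator standing in for a large document source
doc_stream = ({"id": i, "text": f"document {i}"} for i in range(250))

sizes = [len(batch) for batch in batched(doc_stream, 100)]
print(sizes)  # [100, 100, 50]
```

Inside the loop you would build the PointStruct list for each batch and call client.upsert once per batch, exactly as in batch_upsert.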

Step 7: Run a Similarity Search

Now query the collection. You embed a question and Qdrant returns the most semantically similar points.

python

# Embed the query
query_text = "how does vector search work?"
query_vector = get_embedding(query_text)

# Search (query_points replaces the deprecated client.search)
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    limit=3,            # return top 3 results
    with_payload=True   # include the JSON payload in results
).points

# Print results
for result in results:
    print(f"Score: {result.score:.4f}")
    print(f"Text:  {result.payload['text']}")
    print(f"Category: {result.payload['category']}")
    print()

The score is the cosine similarity between your query vector and the stored vector. Higher is more similar. Cosine similarity ranges from -1 to 1, although scores for text embeddings mostly fall between 0 and 1 in practice.

Step 8: Filter by Payload

This is where Qdrant genuinely stands out. You can filter results by any payload field at query time, and Qdrant applies the filter efficiently alongside the vector search rather than as a post-processing step.

Basic filter

python

from qdrant_client.models import Filter, FieldCondition, MatchValue

# query_points replaces the deprecated client.search
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="vector-search")
            )
        ]
    ),
    limit=3,
    with_payload=True
).points

for result in results:
    print(f"{result.score:.4f} | {result.payload['text'][:60]}")

Multiple conditions (AND logic)

python

from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# All conditions under must have to hold (AND)
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="vector-search")
            ),
            FieldCondition(
                key="author",
                match=MatchValue(value="krunal")
            ),
            FieldCondition(
                key="published_year",
                range=Range(gte=2026)  # published 2026 or later
            )
        ]
    ),
    limit=5,
    with_payload=True
).points

OR logic

python

from qdrant_client.models import Filter, FieldCondition, MatchAny

# MatchAny is an OR over values of a single field
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchAny(any=["vector-search", "rag"])  # either category
            )
        ]
    ),
    limit=5,
    with_payload=True
).points

NOT logic (exclude)

python

# Filter and FieldCondition come from qdrant_client.models, as in the blocks above
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must_not=[
            FieldCondition(
                key="author",
                match=MatchValue(value="alice")
            )
        ]
    ),
    limit=5,
    with_payload=True
).points
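
As a mental model for how the clauses combine, here is a pure-Python sketch that evaluates must, should, and must_not against plain payload dictionaries. It is not how Qdrant implements filtering, and every name in it (match_value, match_any, range_gte, passes) is hypothetical. The assumption it encodes is that all must conditions hold, at least one should condition holds when any are given, and no must_not condition holds.

```python
from typing import Any, Callable, Sequence

Payload = dict[str, Any]
Condition = Callable[[Payload], bool]

def match_value(key: str, value: Any) -> Condition:
    return lambda payload: payload.get(key) == value

def match_any(key: str, values: list[Any]) -> Condition:
    return lambda payload: payload.get(key) in values

def range_gte(key: str, bound: float) -> Condition:
    return lambda payload: key in payload and payload[key] >= bound

def passes(
    payload: Payload,
    must: Sequence[Condition] = (),
    should: Sequence[Condition] = (),
    must_not: Sequence[Condition] = (),
) -> bool:
    """AND over must, OR over should (if present), NOT over must_not."""
    return (
        all(cond(payload) for cond in must)
        and (not should or any(cond(payload) for cond in should))
        and not any(cond(payload) for cond in must_not)
    )

payloads = [
    {"id": 1, "category": "vector-search", "author": "krunal", "published_year": 2026},
    {"id": 2, "category": "rag", "author": "krunal", "published_year": 2026},
    {"id": 3, "category": "vector-search", "author": "alice", "published_year": 2025},
]

and_ids = [
    p["id"] for p in payloads
    if passes(p, must=[match_value("category", "vector-search"), range_gte("published_year", 2026)])
]
not_ids = [p["id"] for p in payloads if passes(p, must_not=[match_value("author", "alice")])]

print(and_ids)  # [1]
print(not_ids)  # [1, 2]
```

For an OR across different fields, Qdrant's Filter also accepts a should list, which this sketch mirrors with the should argument.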

Step 9: Add Payload Indexes for Fast Filtering

By default, Qdrant filters by scanning all candidate payload values. For large collections, add payload indexes to the fields you filter on most often. This dramatically speeds up filtered searches.

python

from qdrant_client.models import PayloadSchemaType

# Index the category field (keyword type for exact match)
client.create_payload_index(
    collection_name="articles",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

# Index the published_year field (integer type for range queries)
client.create_payload_index(
    collection_name="articles",
    field_name="published_year",
    field_schema=PayloadSchemaType.INTEGER
)

As a rule: if you filter on a field in more than 10% of your queries, index it.

Step 10: Hybrid Search (Dense + Sparse)

Hybrid search combines semantic similarity (dense vectors) with keyword matching (sparse vectors). It is better than either alone for queries with specific terms, product codes, or proper nouns.

Qdrant has native sparse vector support. You store a sparse vector alongside each dense vector and query both at once.

Set up a collection with named vectors

python

from qdrant_client.models import VectorParams, SparseVectorParams, Distance

client.create_collection(
    collection_name="articles_hybrid",
    vectors_config={
        "dense": VectorParams(size=1536, distance=Distance.COSINE)
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams()
    }
)

Generate sparse vectors with TF-IDF (a simple stand-in for BM25)

python

from qdrant_client.models import PointStruct, SparseVector
from sklearn.feature_extraction.text import TfidfVectorizer

# Build a TF-IDF encoder on your corpus
corpus = [doc["text"] for doc in documents]
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(corpus)

def get_sparse_vector(text: str) -> SparseVector:
    """Convert text to a sparse vector using TF-IDF weights."""
    vec = tfidf.transform([text])
    cx = vec.tocoo()
    indices = cx.col.tolist()
    values = cx.data.tolist()
    return SparseVector(indices=indices, values=values)

# Upsert points with both dense and sparse vectors
points = []
for doc in documents:
    dense_vec = get_embedding(doc["text"])
    sparse_vec = get_sparse_vector(doc["text"])
    points.append(
        PointStruct(
            id=doc["id"],
            vector={
                "dense": dense_vec,
                "sparse": sparse_vec
            },
            payload={
                "text": doc["text"],
                "category": doc["category"]
            }
        )
    )

client.upsert(collection_name="articles_hybrid", points=points)
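
The block above weights terms with TF-IDF. If you want BM25-style weights instead, the sketch below swaps in a minimal pure-Python encoder. It is an illustration under assumptions, not a library API: BM25Encoder and tokenize are hypothetical names, the tokenizer is a naive whitespace split, and k1=1.5 and b=0.75 are common textbook defaults. Documents carry the BM25 term weights and queries carry 1.0 per known term, so the sparse dot product equals the BM25 score.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer; real pipelines also strip punctuation and stop words
    return text.lower().split()

class BM25Encoder:
    def __init__(self, corpus: list[str], k1: float = 1.5, b: float = 0.75):
        docs = [tokenize(text) for text in corpus]
        self.k1, self.b = k1, b
        self.avg_len = sum(len(doc) for doc in docs) / len(docs)
        # Number of documents each term appears in
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Sorted so the term -> index mapping is reproducible across runs
        self.vocab = {term: i for i, term in enumerate(sorted(doc_freq))}
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - df + 0.5) / (df + 0.5))
            for term, df in doc_freq.items()
        }

    def encode_document(self, text: str) -> tuple[list[int], list[float]]:
        tokens = tokenize(text)
        counts = Counter(t for t in tokens if t in self.vocab)
        # Longer-than-average documents are penalized through this term
        length_norm = self.k1 * (1 - self.b + self.b * len(tokens) / self.avg_len)
        indices = [self.vocab[t] for t in counts]
        values = [
            self.idf[t] * tf * (self.k1 + 1) / (tf + length_norm)
            for t, tf in counts.items()
        ]
        return indices, values

    def encode_query(self, text: str) -> tuple[list[int], list[float]]:
        # Terms outside the fitted vocabulary are dropped
        terms = sorted({t for t in tokenize(text) if t in self.vocab})
        return [self.vocab[t] for t in terms], [1.0] * len(terms)

corpus = [
    "HNSW is the graph-based algorithm that powers fast approximate nearest neighbor search.",
    "pgvector adds vector similarity search as an extension to PostgreSQL.",
]
encoder = BM25Encoder(corpus)
indices, values = encoder.encode_document(corpus[0])
print(len(indices), "non-zero terms")
```

The returned indices and values can be passed to SparseVector(indices=indices, values=values) in place of the TF-IDF output.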

Hybrid query

python

from qdrant_client.models import Prefetch, FusionQuery, Fusion

query_text = "fast approximate search algorithm"
dense_query = get_embedding(query_text)
sparse_query = get_sparse_vector(query_text)

# Query using reciprocal rank fusion to merge dense and sparse results
results = client.query_points(
    collection_name="articles_hybrid",
    prefetch=[
        Prefetch(
            query=dense_query,
            using="dense",
            limit=20
        ),
        Prefetch(
            query=sparse_query,
            using="sparse",
            limit=20
        )
    ],
    query=FusionQuery(fusion=Fusion.RRF),  # reciprocal rank fusion
    limit=5,
    with_payload=True
)

for result in results.points:
    print(f"{result.score:.4f} | {result.payload['text'][:60]}")

Reciprocal rank fusion (RRF) combines the rankings from both searches without needing to tune a weight parameter. It is the recommended default for hybrid search.
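
To see what the fusion step does, here is a minimal pure-Python sketch of the textbook RRF formula: each ranked list contributes 1 / (k + rank) to a point's score. The function name and the example IDs are hypothetical, and k=60 is the constant from the original RRF paper; the constant Qdrant uses internally may differ.

```python
from collections import defaultdict

def reciprocal_rank_fusion(
    rankings: list[list[int]], k: int = 60
) -> list[tuple[int, float]]:
    """Fuse ranked ID lists; each list adds 1 / (k + rank), with rank starting at 1."""
    scores: dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense_ranking = [3, 1, 5, 4]   # hypothetical IDs returned by the dense prefetch
sparse_ranking = [3, 2, 1]     # hypothetical IDs returned by the sparse prefetch

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print([point_id for point_id, _ in fused])  # [3, 1, 2, 5, 4]
```

Point 3 wins because it tops both lists, and point 1 beats point 2 because it appears in both lists even though point 2 ranks higher in the sparse one. Only ranks matter, which is why no score normalization or weight tuning is needed.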

Step 11: Check Collection Stats

python

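info = client.get_collection("articles")
# vectors_count is deprecated in recent releases; indexed_vectors_count is the
# number of vectors in the HNSW index (0 for small, not-yet-indexed collections)
print(f"Indexed vectors: {info.indexed_vectors_count}")
print(f"Points count:    {info.points_count}")
print(f"Status:          {info.status}")
print(f"Optimizer:       {info.optimizer_status}")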

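The optimizer runs in the background to merge segments and rebuild indexes. The collection status reads green once optimization has finished and yellow while it is still in progress; optimizer_status stays ok unless the optimizer hits an error.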
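
If a script needs to block until indexing has settled (before a benchmark, for example), a small polling loop is enough. This is a sketch: wait_until_green is a hypothetical helper, the timeout and poll interval are arbitrary, and it assumes the status is either an enum whose value is "green" or a plain string.

```python
import time

def wait_until_green(
    client, collection_name: str, timeout_s: float = 60.0, poll_s: float = 1.0
) -> bool:
    """Poll get_collection until the status is green; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.get_collection(collection_name).status
        # Accept both an enum (with .value) and a plain string
        if str(getattr(status, "value", status)).lower() == "green":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Usage would look like wait_until_green(client, "articles", timeout_s=120), called right after a large upsert.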

Scroll Through All Points

To retrieve points in bulk (not just search results), use scroll. Each call returns one page of points plus an offset for the next page, which is useful for bulk exports or debugging.

python

points, next_offset = client.scroll(
    collection_name="articles",
    limit=100,
    with_payload=True,
    with_vectors=False  # set True if you need the vectors back
)

for point in points:
    print(f"ID: {point.id} | {point.payload['text'][:50]}")
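
A single scroll call returns at most limit points. To walk an entire collection, keep passing the returned offset back in until it comes back as None. The generator below is a sketch of that loop; scroll_all is a hypothetical helper name and the page size is arbitrary.

```python
from typing import Any, Iterator

def scroll_all(client, collection_name: str, page_size: int = 100) -> Iterator[Any]:
    """Yield every point in the collection, following the offset until it is None."""
    offset = None
    while True:
        points, offset = client.scroll(
            collection_name=collection_name,
            limit=page_size,
            offset=offset,          # None on the first call means start from the beginning
            with_payload=True,
            with_vectors=False,
        )
        yield from points
        if offset is None:
            break
```

Because it is a generator, for point in scroll_all(client, "articles") holds only one page in memory at a time.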

Delete Points

python

from qdrant_client.models import PointIdsList

# Delete by specific IDs
client.delete(
    collection_name="articles",
    points_selector=PointIdsList(points=[1, 2])
)

# Delete by filter (all points where category = "rag")
from qdrant_client.models import Filter, FieldCondition, MatchValue, FilterSelector

client.delete(
    collection_name="articles",
    points_selector=FilterSelector(
        filter=Filter(
            must=[
                FieldCondition(key="category", match=MatchValue(value="rag"))
            ]
        )
    )
)

Complete RAG Example

Here is a complete end-to-end example that ties everything together: index a set of documents and answer questions over them.

python

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue
from openai import OpenAI
import uuid

# Clients
qdrant = QdrantClient(host="localhost", port=6333)
openai_client = OpenAI(api_key="your-openai-api-key")

COLLECTION = "knowledge_base"
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o-mini"

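# Create a fresh collection (recreate_collection is deprecated in recent clients)
if qdrant.collection_exists(COLLECTION):
    qdrant.delete_collection(COLLECTION)
qdrant.create_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)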

def embed(text: str) -> list[float]:
    return openai_client.embeddings.create(
        model=EMBED_MODEL, input=text
    ).data[0].embedding

# Index documents
docs = [
    {"text": "Qdrant stores vectors and lets you search them by similarity.", "source": "qdrant-docs"},
    {"text": "RAG grounds LLM responses in retrieved documents to reduce hallucination.", "source": "rag-guide"},
    {"text": "HNSW is a graph-based index used for approximate nearest neighbor search.", "source": "hnsw-paper"},
    {"text": "Cosine similarity measures the angle between two vectors, not their length.", "source": "math-guide"},
]

points = [
    PointStruct(
        id=str(uuid.uuid4()),
        vector=embed(doc["text"]),
        payload=doc
    )
    for doc in docs
]
qdrant.upsert(collection_name=COLLECTION, points=points)

def answer_question(question: str, source_filter: str = None) -> str:
    query_vec = embed(question)

    # Build optional filter
    search_filter = None
    if source_filter:
        search_filter = Filter(
            must=[FieldCondition(key="source", match=MatchValue(value=source_filter))]
        )

    # Retrieve top 3 relevant chunks (query_points replaces the deprecated search)
    results = qdrant.query_points(
        collection_name=COLLECTION,
        query=query_vec,
        query_filter=search_filter,
        limit=3,
        with_payload=True
    ).points

    context = "\n\n".join(r.payload["text"] for r in results)

    # Generate answer with retrieved context
    response = openai_client.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {
                "role": "system",
                "content": "Answer the question using only the provided context. If the answer is not in the context, say so."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return response.choices[0].message.content

# Test it
print(answer_question("What is HNSW?"))
print(answer_question("How does RAG reduce hallucination?"))

Self-hosted vs Qdrant Cloud

Consideration	Self-hosted	Qdrant Cloud
Cost	VPS only ($20 to $100/month)	Free tier, then $25+/month
Setup time	10 minutes with Docker	5 minutes, no Docker needed
Data control	Full control, stays on your infra	Hosted by Qdrant
Maintenance	You manage upgrades and backups	Qdrant manages everything
Scale	You provision the hardware	Auto-scales (paid tiers)
Best for	Teams with DevOps capacity, cost-sensitive workloads	Teams who want zero ops

For prototypes and side projects, Qdrant Cloud free tier is the easiest start. For production workloads where cost matters, self-hosting on a $40/month VPS typically handles millions of vectors at a fraction of managed service pricing.

Common Mistakes to Avoid

Wrong dimension size. If you create a collection with size=1536 and then try to insert a 384-dimension vector, Qdrant will reject it with an error. Always match the collection dimension to your embedding model output.
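
A cheap guard is to check vector lengths on the client before calling upsert, so the error names the offending item instead of failing a whole batch. This is a sketch: check_dimensions is a hypothetical helper, and expected_dim is whatever size you passed to VectorParams when creating the collection.

```python
def check_dimensions(vectors: list[list[float]], expected_dim: int) -> None:
    """Raise ValueError naming the first vector whose length differs from expected_dim."""
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {index} has {len(vector)} dimensions, expected {expected_dim}"
            )

# Two well-formed 1536-dimension vectors pass silently
check_dimensions([[0.1] * 1536, [0.2] * 1536], expected_dim=1536)

try:
    check_dimensions([[0.1] * 384], expected_dim=1536)
except ValueError as error:
    print(error)  # vector 0 has 384 dimensions, expected 1536
```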

No payload index on filtered fields. Filtering without a payload index works but scans every point's payload. On collections with millions of points, this slows down filtered searches significantly. Run create_payload_index on any field you filter on regularly.

Upserting without IDs. Qdrant requires each point to have an ID, either an unsigned integer or a UUID string. If you upsert a point with an ID that already exists, Qdrant overwrites the existing point, which is the intended behavior for updates.
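
Random IDs such as uuid.uuid4() defeat that overwrite behavior: re-running an indexing script inserts every document again under a new ID. One way around this is to derive the ID from something stable about the document. The sketch below uses uuid.uuid5 from the standard library; the namespace URL and the key format are made up for illustration.

```python
import uuid

# Hypothetical project namespace; any fixed UUID works as long as it never changes
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/articles")

def point_id_for(source_key: str) -> str:
    """Derive a stable UUID string from a document's natural key (URL, path, slug)."""
    return str(uuid.uuid5(NAMESPACE, source_key))

first = point_id_for("qdrant-docs/chunk-0")
second = point_id_for("qdrant-docs/chunk-0")
other = point_id_for("qdrant-docs/chunk-1")

print(first == second)  # True: same key, same ID, so an upsert overwrites
print(first == other)   # False: different keys get different IDs
```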

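Forgetting with_payload=True. The REST API leaves the payload out of search results unless you ask for it, so you get the ID and score but not the text or metadata. The Python client defaults to with_payload=True, but passing it explicitly keeps the intent obvious and protects you if you later switch to raw HTTP calls.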

Running searches before the optimizer finishes. After a large upsert, Qdrant optimizes its internal segments in the background. Searches work immediately, but performance improves once optimization finishes. Check the status field returned by get_collection: it reads green when optimization is done.

Summary

Qdrant is one of the most capable vector databases available in 2026. Getting it running takes under ten minutes with Docker. The Python client is well-designed and covers every operation you need.

The key concepts to internalize: collections hold vectors of one dimension, payloads are the metadata attached to each point, filters let you scope searches by payload at query time, and payload indexes make filtered searches fast at scale.

From here, the natural next steps are adding quantization to reduce memory usage on large collections, setting up replication for high availability, and tuning HNSW parameters for your specific recall and latency requirements.

Frequently Asked Questions

What is Qdrant?

Qdrant is an open-source vector database written in Rust. It stores high-dimensional vectors (embeddings) and lets you search them by semantic similarity. You can run it locally with Docker, deploy it on your own servers, or use Qdrant Cloud as a managed service. It is commonly used as the vector store in RAG applications, semantic search systems, and recommendation engines.

Is Qdrant free to use?

Yes. Qdrant is open source under Apache 2.0. You can run it locally or on any server for free with no usage limits. Qdrant Cloud has a free tier with 1GB of storage (enough for roughly 500K 1536-dimension vectors). Paid cloud tiers start when you need more storage or dedicated resources.

How do I install Qdrant locally?

The easiest way is Docker: run `docker run -p 6333:6333 qdrant/qdrant`. The HTTP API is available at localhost:6333 and the dashboard at localhost:6333/dashboard. You can also download a pre-built binary from the Qdrant GitHub releases page.

What Python library does Qdrant use?

The official Python client is qdrant-client. Install it with `pip install qdrant-client`. It supports both synchronous and async usage and works with local Docker instances and Qdrant Cloud.

How many vectors can Qdrant handle?

Qdrant scales to hundreds of millions of vectors on appropriate hardware. Its Rust implementation gives it low memory overhead compared to other vector databases. With quantization enabled (scalar or binary), you can store significantly more vectors in the same RAM. For most teams, Qdrant self-hosted on a single server handles 5 to 50 million vectors comfortably.
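
To put rough numbers on the quantization claim, the sketch below estimates raw vector storage at different precisions. It assumes 32 bits per dimension for float32, 8 for int8 scalar quantization, and 1 for binary quantization, and it ignores the HNSW graph, payloads, and any copy of the original vectors kept for rescoring. The function name is hypothetical.

```python
def vector_memory_bytes(num_vectors: int, dim: int, bits_per_dim: int) -> int:
    """Raw vector storage only: no HNSW graph, payloads, or allocator overhead."""
    return num_vectors * dim * bits_per_dim // 8

n, dim = 1_000_000, 1536
for label, bits in [("float32", 32), ("int8 scalar", 8), ("binary", 1)]:
    gb = vector_memory_bytes(n, dim, bits) / 1e9
    print(f"{label:12s} {gb:.3f} GB")
# float32      6.144 GB
# int8 scalar  1.536 GB
# binary       0.192 GB
```

That is a 4x reduction for scalar and 32x for binary quantization on the vector data itself; real memory use will be higher once the index and payloads are counted.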

Does Qdrant support hybrid search?

Yes. Qdrant has native support for sparse vectors, which lets you store both a dense embedding and a sparse keyword vector (BM25 or TF-IDF weights, for example) per document. A single query can combine both through the Query API, using prefetch stages merged by a fusion step such as RRF. This gives you hybrid search without any external keyword search engine.

What is a Qdrant collection?

A collection is Qdrant's equivalent of a table or an index. Each collection holds vectors of a fixed dimension and uses a configured distance metric (cosine, dot product, or Euclidean). A collection also stores payload: arbitrary JSON metadata attached to each vector that you can filter on at query time.

Krunal Kanojiya

Technical Content Writer

I am a technical writer and former software developer from India. I publish practical tutorials and in-depth guides on AI engineering, data engineering, programming, algorithms, blockchain, and modern software development.

GitHub LinkedIn X

pgvector: The Complete Guide to Vector Search in PostgreSQL (2026)

May 29, 2026 · 11 min read

Pinecone vs Weaviate vs Milvus vs Qdrant: Best Vector Database in 2026?

Jun 27, 2026 · 21 min read

Pinecone vs Qdrant: Which Vector Database Should You Use in 2026?

May 28, 2026 · 10 min read

Vector Search & Databases·14 min read·2,617 words

Qdrant Tutorial: Getting Started with Vector Search in Python (2026)

Krunal Kanojiya

June 07, 2026

#qdrant#vector-database#vector-search#tutorial#python#RAG#embeddings#similarity-search#HNSW#getting-started

What Qdrant Is

Qdrant is a vector database built specifically for similarity search. You store embeddings (high-dimensional vectors) in it and query them to find the most semantically similar results.

It is written in Rust, which gives it very low memory overhead and fast query execution. It runs as a standalone service with an HTTP and gRPC API. The Python client wraps both.

Qdrant supports:

HNSW indexing for fast approximate nearest neighbor search
Payload filtering (filter by metadata at query time, efficiently)
Sparse vectors for hybrid search
Scalar and binary quantization to reduce memory usage
Collections, named vector spaces within a collection, and sharding for large-scale setups

If you are deciding between Qdrant and other vector databases, see pgvector vs Pinecone and Pinecone vs Qdrant for full comparisons.

Step 1: Run Qdrant with Docker

The fastest way to get Qdrant running locally is Docker. One command and it is up.

bash

docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

Port 6333 is the HTTP API. Port 6334 is gRPC. The -v flag mounts a local directory so your data persists when the container restarts.

Once it is running, open http://localhost:6333/dashboard in your browser. You will see the Qdrant web UI where you can inspect collections and run queries visually.

To verify the API is alive:

bash

curl http://localhost:6333/
# returns: {"title":"qdrant - vector search engine","version":"..."}

Alternative: download the binary

If you prefer not to use Docker, download a pre-built binary from the Qdrant releases page. Extract it and run:

bash

./qdrant

Same API, same ports, no Docker required.

Alternative: Qdrant Cloud

Step 2: Install the Python Client

bash

pip install qdrant-client

For embeddings in this tutorial, also install the OpenAI SDK:

bash

pip install openai

If you prefer a free local embedding model, install sentence-transformers instead:

bash

pip install sentence-transformers

Both work. I will show both options.

Step 3: Connect to Qdrant

python

from qdrant_client import QdrantClient

# Connect to local Docker instance
client = QdrantClient(host="localhost", port=6333)

# Or connect to Qdrant Cloud
# client = QdrantClient(
#     url="https://your-cluster-url.qdrant.io",
#     api_key="your-api-key"
# )

# Verify the connection
info = client.get_collections()
print(info)

The client automatically uses HTTP. For production, you can switch to gRPC by passing prefer_grpc=True which reduces latency on high-QPS workloads.

Step 4: Create a Collection

A collection is where your vectors live. Every vector in a collection must have the same dimension. You also choose the distance metric here.

python

from qdrant_client.models import Distance, VectorParams

client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(
        size=1536,          # must match your embedding model output
        distance=Distance.COSINE
    )
)

Choosing the right distance metric:

Metric	Use when
`COSINE`	OpenAI embeddings, most sentence transformers, when direction matters more than magnitude
`DOT`	When vectors are normalized (same as cosine but faster)
`EUCLID`	Pixel-level similarity, some image embeddings

For OpenAI text-embedding-3-small or text-embedding-3-large, use COSINE. For Cohere and most sentence transformer models, also use COSINE.

Verify the collection was created:

python

collection_info = client.get_collection("articles")
print(collection_info)

Step 5: Generate Embeddings

Before inserting anything, you need vectors. Here is how to generate them with both OpenAI and a free local model.

Option A: OpenAI embeddings

python

from openai import OpenAI

openai_client = OpenAI(api_key="your-openai-api-key")

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",  # 1536 dimensions
        input=text
    )
    return response.data[0].embedding

# Test it
vector = get_embedding("Qdrant is a vector database written in Rust")
print(f"Dimension: {len(vector)}")  # 1536

Option B: Free local model (no API key needed)

python

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384 dimensions

def get_embedding(text: str) -> list[float]:
    return model.encode(text).tolist()

# If using this model, create the collection with size=384
# client.create_collection(
#     collection_name="articles",
#     vectors_config=VectorParams(size=384, distance=Distance.COSINE)
# )

Step 6: Insert Vectors (Upsert)

In Qdrant, the operation to insert or update vectors is called upsert. Each point (Qdrant's name for a single vector entry) has an ID, a vector, and an optional payload.

The payload is arbitrary JSON. You attach it to the vector and Qdrant lets you filter on it at query time without any extra storage cost.

python

from qdrant_client.models import PointStruct

# Sample documents
documents = [
    {
        "id": 1,
        "text": "Qdrant is a vector database written in Rust for high-performance similarity search.",
        "category": "vector-search",
        "author": "krunal",
        "published_year": 2026
    },
    {
        "id": 2,
        "text": "RAG combines retrieval from a vector database with language model generation.",
        "category": "rag",
        "author": "krunal",
        "published_year": 2026
    },
    {
        "id": 3,
        "text": "HNSW is the graph-based algorithm that powers fast approximate nearest neighbor search.",
        "category": "vector-search",
        "author": "alice",
        "published_year": 2025
    },
    {
        "id": 4,
        "text": "Pinecone is a fully managed vector database with a serverless option.",
        "category": "vector-search",
        "author": "alice",
        "published_year": 2025
    },
    {
        "id": 5,
        "text": "pgvector adds vector similarity search as an extension to PostgreSQL.",
        "category": "vector-search",
        "author": "krunal",
        "published_year": 2026
    },
]

# Generate embeddings and build points
points = []
for doc in documents:
    embedding = get_embedding(doc["text"])
    points.append(
        PointStruct(
            id=doc["id"],
            vector=embedding,
            payload={
                "text": doc["text"],
                "category": doc["category"],
                "author": doc["author"],
                "published_year": doc["published_year"]
            }
        )
    )

# Upsert all points at once
client.upsert(
    collection_name="articles",
    points=points
)

print(f"Inserted {len(points)} points")

Batch upsert for large datasets

For large datasets, upsert in batches to avoid memory issues and timeouts:

python

def batch_upsert(client, collection_name, documents, batch_size=100):
    total = len(documents)
    for i in range(0, total, batch_size):
        batch = documents[i:i + batch_size]
        points = []
        for doc in batch:
            embedding = get_embedding(doc["text"])
            points.append(
                PointStruct(
                    id=doc["id"],
                    vector=embedding,
                    payload=doc
                )
            )
        client.upsert(collection_name=collection_name, points=points)
        print(f"Upserted {min(i + batch_size, total)}/{total}")
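
The function above slices a list by index, which does not work if your documents come from a generator (for example, rows streamed from a file). Here is a minimal sketch of the same batching idea for any iterable. The name `chunked` is my own helper for illustration, not part of qdrant-client.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def chunked(items: Iterable[T], batch_size: int = 100) -> Iterator[list[T]]:
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls the next batch_size items without loading everything
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Example: 250 placeholder documents split into batches of 100
batch_sizes = [len(batch) for batch in chunked(range(250), batch_size=100)]
print(batch_sizes)  # [100, 100, 50]
```

Each yielded batch can be embedded and passed to `client.upsert` exactly as in `batch_upsert` above.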

Step 7: Run a Similarity Search

Now query the collection. You embed a question and Qdrant returns the most semantically similar points.

python

# Embed the query
query_text = "how does vector search work?"
query_vector = get_embedding(query_text)

# Search (query_points replaces the deprecated search method in recent qdrant-client versions)
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    limit=3,            # return top 3 results
    with_payload=True   # include the JSON payload in results
)

# Print results
for result in results.points:
    print(f"Score: {result.score:.4f}")
    print(f"Text:  {result.payload['text']}")
    print(f"Category: {result.payload['category']}")
    print()

The score is the cosine similarity between your query vector and the stored vector. Higher is more similar. Cosine similarity ranges from -1 to 1, although scores for text embeddings are usually positive in practice.
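
To make the score concrete, here is a small pure-Python sketch of cosine similarity. It is only meant to show what the number means; Qdrant computes this internally and you never need to call anything like it yourself. The function name and the toy vectors are my own.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different length: similarity close to 1
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: similarity 0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite direction: similarity close to -1
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(round(same, 6), round(orthogonal, 6), round(opposite, 6))
```

Note that the first pair scores as identical even though one vector is twice as long. Cosine similarity only looks at direction.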

Step 8: Filter by Payload

Basic filter

python

from qdrant_client.models import Filter, FieldCondition, MatchValue

results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="vector-search")
            )
        ]
    ),
    limit=3,
    with_payload=True
)

for result in results.points:
    print(f"{result.score:.4f} | {result.payload['text'][:60]}")

Multiple conditions (AND logic)

python

from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="vector-search")
            ),
            FieldCondition(
                key="author",
                match=MatchValue(value="krunal")
            ),
            FieldCondition(
                key="published_year",
                range=Range(gte=2026)  # published 2026 or later
            )
        ]
    ),
    limit=5,
    with_payload=True
)

OR logic

python

from qdrant_client.models import Filter, FieldCondition, MatchAny

results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchAny(any=["vector-search", "rag"])  # either category
            )
        ]
    ),
    limit=5,
    with_payload=True
)

NOT logic (exclude)

python

results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must_not=[
            FieldCondition(
                key="author",
                match=MatchValue(value="alice")
            )
        ]
    ),
    limit=5,
    with_payload=True
)
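
If the must and must_not clauses feel abstract, this pure-Python sketch mimics how they select points. It is a simplified model for intuition only, not how Qdrant evaluates filters internally. The helper names (`match_value`, `match_any`, `range_gte`, `passes`) are hypothetical, and the payloads are the five sample documents from Step 6.

```python
from typing import Any, Callable, Iterable

Payload = dict[str, Any]
Condition = Callable[[Payload], bool]

def match_value(key: str, value: Any) -> Condition:
    """True when the payload field equals the given value."""
    return lambda payload: payload.get(key) == value

def match_any(key: str, values: Iterable[Any]) -> Condition:
    """True when the payload field equals any of the given values."""
    allowed = set(values)
    return lambda payload: payload.get(key) in allowed

def range_gte(key: str, minimum: float) -> Condition:
    """True when the payload field exists and is >= minimum."""
    return lambda payload: key in payload and payload[key] >= minimum

def passes(payload: Payload, must=(), must_not=()) -> bool:
    """All must conditions hold and no must_not condition holds."""
    if not all(cond(payload) for cond in must):
        return False
    return not any(cond(payload) for cond in must_not)

payloads = {
    1: {"category": "vector-search", "author": "krunal", "published_year": 2026},
    2: {"category": "rag", "author": "krunal", "published_year": 2026},
    3: {"category": "vector-search", "author": "alice", "published_year": 2025},
    4: {"category": "vector-search", "author": "alice", "published_year": 2025},
    5: {"category": "vector-search", "author": "krunal", "published_year": 2026},
}

# AND: three must conditions
and_ids = [pid for pid, p in payloads.items() if passes(p, must=[
    match_value("category", "vector-search"),
    match_value("author", "krunal"),
    range_gte("published_year", 2026),
])]

# OR: one condition that accepts either category
or_ids = [pid for pid, p in payloads.items() if passes(p, must=[
    match_any("category", ["vector-search", "rag"]),
])]

# NOT: exclude one author
not_ids = [pid for pid, p in payloads.items() if passes(p, must_not=[
    match_value("author", "alice"),
])]

print(and_ids, or_ids, not_ids)  # [1, 5] [1, 2, 3, 4, 5] [1, 2, 5]
```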

Step 9: Add Payload Indexes for Fast Filtering

Without a payload index, Qdrant has to check the payload of each candidate point against the filter. For large collections, add payload indexes to the fields you filter on most often. This can speed up filtered searches substantially.

python

from qdrant_client.models import PayloadSchemaType

# Index the category field (keyword type for exact match)
client.create_payload_index(
    collection_name="articles",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

# Index the published_year field (integer type for range queries)
client.create_payload_index(
    collection_name="articles",
    field_name="published_year",
    field_schema=PayloadSchemaType.INTEGER
)

As a rule: if you filter on a field in more than 10% of your queries, index it.
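
To see why an index helps, compare a full scan with a lookup in a precomputed mapping from value to point IDs. This is a conceptual sketch in plain Python and not a description of Qdrant's internal data structures. The function names and sample payloads are made up for illustration.

```python
from collections import defaultdict
from typing import Any

payloads = {
    1: {"category": "vector-search"},
    2: {"category": "rag"},
    3: {"category": "vector-search"},
    4: {"category": "vector-search"},
    5: {"category": "vector-search"},
}

def scan(payloads: dict[int, dict[str, Any]], key: str, value: Any) -> set[int]:
    """No index: touch every payload on every query."""
    return {pid for pid, p in payloads.items() if p.get(key) == value}

def build_keyword_index(payloads: dict[int, dict[str, Any]], key: str) -> dict[Any, set[int]]:
    """Index: pay the cost once, then each lookup is a dictionary access."""
    index: dict[Any, set[int]] = defaultdict(set)
    for pid, p in payloads.items():
        if key in p:
            index[p[key]].add(pid)
    return index

category_index = build_keyword_index(payloads, "category")

scanned = scan(payloads, "category", "vector-search")
looked_up = category_index["vector-search"]
print(scanned == looked_up, sorted(looked_up))  # True [1, 3, 4, 5]
```

The trade-off is the same as in any database: the index costs memory and a little write time in exchange for faster reads.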

Step 10: Hybrid Search (Dense + Sparse)

Hybrid search combines semantic similarity (dense vectors) with keyword matching (sparse vectors). It often works better than either alone for queries that contain specific terms, product codes, or proper nouns.

Qdrant has native sparse vector support. You store a sparse vector alongside each dense vector and query both at once.

Set up a collection with named vectors

python

from qdrant_client.models import VectorParams, SparseVectorParams, Distance

client.create_collection(
    collection_name="articles_hybrid",
    vectors_config={
        "dense": VectorParams(size=1536, distance=Distance.COSINE)
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams()
    }
)

Generate sparse vectors with TF-IDF (a BM25 stand-in)

python

from qdrant_client.models import PointStruct, SparseVector
from sklearn.feature_extraction.text import TfidfVectorizer

# Build a TF-IDF encoder on your corpus
corpus = [doc["text"] for doc in documents]
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(corpus)

def get_sparse_vector(text: str) -> SparseVector:
    """Convert text to a sparse vector using TF-IDF weights."""
    vec = tfidf.transform([text])
    cx = vec.tocoo()
    indices = cx.col.tolist()
    values = cx.data.tolist()
    return SparseVector(indices=indices, values=values)

# Upsert points with both dense and sparse vectors
points = []
for doc in documents:
    dense_vec = get_embedding(doc["text"])
    sparse_vec = get_sparse_vector(doc["text"])
    points.append(
        PointStruct(
            id=doc["id"],
            vector={
                "dense": dense_vec,
                "sparse": sparse_vec
            },
            payload={
                "text": doc["text"],
                "category": doc["category"]
            }
        )
    )

client.upsert(collection_name="articles_hybrid", points=points)
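
A sparse vector keeps only the non-zero dimensions, as two parallel lists of `indices` and `values`. Two sparse vectors are compared with a dot product over the indices they share. The sketch below shows that calculation in plain Python; the function name and the toy numbers are my own, and it is meant for intuition rather than as Qdrant's implementation.

```python
import math

def sparse_dot(indices_a: list[int], values_a: list[float],
               indices_b: list[int], values_b: list[float]) -> float:
    """Dot product of two sparse vectors given as (indices, values) pairs."""
    weights_b = dict(zip(indices_b, values_b))
    # Only dimensions present in both vectors contribute to the score
    return sum(v * weights_b[i] for i, v in zip(indices_a, values_a) if i in weights_b)

# Document terms at vocabulary positions 3, 17 and 42
doc_indices, doc_values = [3, 17, 42], [0.5, 0.2, 0.8]
# Query shares positions 17 and 42; position 99 has no match
query_indices, query_values = [17, 42, 99], [1.0, 0.5, 0.3]

score = sparse_dot(doc_indices, doc_values, query_indices, query_values)
print(round(score, 4))  # 0.6 (0.2 * 1.0 + 0.8 * 0.5)
```

This is why sparse vectors are good at exact terms: a query word that never appears in a document contributes nothing to its score.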

Hybrid query

python

from qdrant_client.models import Prefetch, FusionQuery, Fusion

query_text = "fast approximate search algorithm"
dense_query = get_embedding(query_text)
sparse_query = get_sparse_vector(query_text)

# Query using reciprocal rank fusion to merge dense and sparse results
results = client.query_points(
    collection_name="articles_hybrid",
    prefetch=[
        Prefetch(
            query=dense_query,
            using="dense",
            limit=20
        ),
        Prefetch(
            query=sparse_query,
            using="sparse",
            limit=20
        )
    ],
    query=FusionQuery(fusion=Fusion.RRF),  # reciprocal rank fusion
    limit=5,
    with_payload=True
)

for result in results.points:
    print(f"{result.score:.4f} | {result.payload['text'][:60]}")

Reciprocal rank fusion (RRF) combines the rankings from both searches without needing to tune a weight parameter. It is a sensible default for hybrid search.
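
Here is a minimal sketch of the RRF idea in plain Python: each result earns 1 / (k + rank) from every ranking it appears in, and the sums decide the final order. The constant k=60 and the 1-based ranks are common choices that I am assuming here; Qdrant's exact constants may differ. The point IDs are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Merge several ranked ID lists into one list sorted by fused score."""
    scores: dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense_ranking = [3, 1, 5, 4]    # best to worst by dense similarity
sparse_ranking = [4, 3, 1, 2]   # best to worst by sparse score

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
fused_ids = [point_id for point_id, _ in fused]
print(fused_ids)  # [3, 4, 1, 5, 2]
```

Point 3 wins because it ranks near the top of both lists. Points 5 and 2 end up last because each appears in only one list. Only ranks matter, so the raw dense and sparse scores never need to be on the same scale.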

Step 11: Check Collection Stats

python

info = client.get_collection("articles")
print(f"Indexed vectors: {info.indexed_vectors_count}")
print(f"Points count:  {info.points_count}")
print(f"Status:        {info.status}")
print(f"Optimizer:     {info.optimizer_status}")

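The optimizer runs in the background to merge segments and rebuild indexes. The collection status shows green once optimization is complete and yellow while it is still running. The optimizer status shows ok as long as no errors have occurred.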
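
If a script needs to wait for the optimizer before it continues (for example, before running a benchmark), a small polling helper is enough. This is a sketch under assumptions: `wait_for_status` is a hypothetical helper, it expects a callable that returns the status as a plain string, and the demo uses a fake status sequence so it runs without a server.

```python
import time
from typing import Callable

def wait_for_status(get_status: Callable[[], str], target: str = "green",
                    timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll get_status until it returns target; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Demo with a fake status source: two "yellow" reads, then "green"
fake_statuses = iter(["yellow", "yellow", "green"])
became_green = wait_for_status(lambda: next(fake_statuses), interval=0.0)
print(became_green)  # True

# With a real client this might look like the line below (assuming the
# status attribute is an enum whose value is the lowercase status name):
# wait_for_status(lambda: client.get_collection("articles").status.value)
```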

Scroll Through All Points

To retrieve all points in a collection (not just search results), use scroll. This is useful for bulk exports or debugging.

python

points, next_offset = client.scroll(
    collection_name="articles",
    limit=100,
    with_payload=True,
    with_vectors=False  # set True if you need the vectors back
)

for point in points:
    print(f"ID: {point.id} | {point.payload['text'][:50]}")
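
The call above returns only the first page. To walk the whole collection, pass `next_offset` back in as `offset` until it comes back as None. Below is a sketch of that loop. `scroll_all` is a hypothetical helper, and `FakeClient` is a stand-in I wrote so the example runs without a Qdrant server; with the real client you would pass `client` instead.

```python
from typing import Any, Iterator

def scroll_all(client: Any, collection_name: str, page_size: int = 100) -> Iterator[Any]:
    """Yield every point in a collection, one page at a time."""
    offset = None
    while True:
        points, offset = client.scroll(
            collection_name=collection_name,
            limit=page_size,
            offset=offset,
            with_payload=True,
            with_vectors=False,
        )
        yield from points
        if offset is None:  # no more pages
            break

class FakeClient:
    """Stand-in that pages through a list the way scroll pages through points."""
    def __init__(self, items: list[Any]):
        self._items = items

    def scroll(self, collection_name, limit, offset=None,
               with_payload=True, with_vectors=False):
        start = offset or 0
        page = self._items[start:start + limit]
        has_more = start + limit < len(self._items)
        return page, (start + limit if has_more else None)

all_points = list(scroll_all(FakeClient(list(range(250))), "articles", page_size=100))
print(len(all_points))  # 250
```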

Delete Points

python

from qdrant_client.models import PointIdsList

# Delete by specific IDs
client.delete(
    collection_name="articles",
    points_selector=PointIdsList(points=[1, 2])
)

# Delete by filter (all points where category = "rag")
from qdrant_client.models import Filter, FieldCondition, MatchValue, FilterSelector

client.delete(
    collection_name="articles",
    points_selector=FilterSelector(
        filter=Filter(
            must=[
                FieldCondition(key="category", match=MatchValue(value="rag"))
            ]
        )
    )
)

Complete RAG Example

Here is a complete end-to-end example that ties everything together: index a set of documents and answer questions over them.

python

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue
from openai import OpenAI
import uuid

# Clients
qdrant = QdrantClient(host="localhost", port=6333)
openai_client = OpenAI(api_key="your-openai-api-key")

COLLECTION = "knowledge_base"
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o-mini"

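# Create the collection from scratch (this drops any existing data)
if qdrant.collection_exists(COLLECTION):
    qdrant.delete_collection(COLLECTION)
qdrant.create_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)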

def embed(text: str) -> list[float]:
    return openai_client.embeddings.create(
        model=EMBED_MODEL, input=text
    ).data[0].embedding

# Index documents
docs = [
    {"text": "Qdrant stores vectors and lets you search them by similarity.", "source": "qdrant-docs"},
    {"text": "RAG grounds LLM responses in retrieved documents to reduce hallucination.", "source": "rag-guide"},
    {"text": "HNSW is a graph-based index used for approximate nearest neighbor search.", "source": "hnsw-paper"},
    {"text": "Cosine similarity measures the angle between two vectors, not their length.", "source": "math-guide"},
]

points = [
    PointStruct(
        id=str(uuid.uuid4()),
        vector=embed(doc["text"]),
        payload=doc
    )
    for doc in docs
]
qdrant.upsert(collection_name=COLLECTION, points=points)

def answer_question(question: str, source_filter: str = None) -> str:
    query_vec = embed(question)

    # Build optional filter
    search_filter = None
    if source_filter:
        search_filter = Filter(
            must=[FieldCondition(key="source", match=MatchValue(value=source_filter))]
        )

    # Retrieve top 3 relevant chunks
    results = qdrant.query_points(
        collection_name=COLLECTION,
        query=query_vec,
        query_filter=search_filter,
        limit=3,
        with_payload=True
    )

    context = "\n\n".join(r.payload["text"] for r in results.points)

    # Generate answer with retrieved context
    response = openai_client.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {
                "role": "system",
                "content": "Answer the question using only the provided context. If the answer is not in the context, say so."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return response.choices[0].message.content

# Test it
print(answer_question("What is HNSW?"))
print(answer_question("How does RAG reduce hallucination?"))

Self-hosted vs Qdrant Cloud

Consideration	Self-hosted	Qdrant Cloud
Cost	VPS only ($20 to $100/month)	Free tier, then $25+/month
Setup time	10 minutes with Docker	5 minutes, no Docker needed
Data control	Full control, stays on your infra	Hosted by Qdrant
Maintenance	You manage upgrades and backups	Qdrant manages everything
Scale	You provision the hardware	Auto-scales (paid tiers)
Best for	Teams with DevOps capacity, cost-sensitive workloads	Teams who want zero ops

Common Mistakes to Avoid

Summary

Qdrant is one of the most capable vector databases available in 2026. Getting it running takes under ten minutes with Docker. The Python client is well-designed and covers every operation you need.

Frequently Asked Questions

What is Qdrant?

Is Qdrant free to use?

How do I install Qdrant locally?

What Python library does Qdrant use?

The official Python client is qdrant-client. Install it with `pip install qdrant-client`. It supports both synchronous and async usage and works with local Docker instances and Qdrant Cloud.

How many vectors can Qdrant handle?

Does Qdrant support hybrid search?

What is a Qdrant collection?

Krunal Kanojiya

Technical Content Writer
