Qdrant Tutorial: Getting Started with Vector Search in Python (2026)

A complete Qdrant tutorial for 2026. Covers Docker setup, creating collections, inserting vectors with Python, similarity search, metadata filtering, hybrid search, and payload indexing. Includes working code for every step.

Krunal Kanojiya

June 07, 2026

When I started working with vector search, Pinecone was the default recommendation. The API was simple and the docs were good. But the moment my dataset grew past a few hundred thousand vectors, the bill started hurting.

Qdrant was the answer. Open source, written in Rust, free to self-host, and genuinely fast. I have been running it in production on RAG applications since early 2025 and have not found a meaningful reason to switch.

This guide takes you from zero to a working Qdrant setup. By the end you will have Qdrant running locally, a collection created, vectors inserted with Python, and a similarity search query running with metadata filters.

What Qdrant Is

Qdrant is a vector database built specifically for similarity search. You store embeddings (high-dimensional vectors) in it and query them to find the most semantically similar results.

It is written in Rust, which gives it very low memory overhead and fast query execution. It runs as a standalone service with an HTTP and gRPC API. The Python client wraps both.

Qdrant supports:

  • HNSW indexing for fast approximate nearest neighbor search
  • Payload filtering (filter by metadata at query time, efficiently)
  • Sparse vectors for hybrid search
  • Scalar and binary quantization to reduce memory usage
  • Collections, multiple named vectors per point, and sharding for large-scale setups
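
To put the quantization bullet in numbers, here is a rough back-of-the-envelope sketch. It assumes float32 vectors use 32 bits per component, scalar quantization uses 8, and binary quantization uses 1. The helper name vector_memory_bytes is hypothetical, and the figures cover vector components only, not the HNSW graph or payloads.

```python
def vector_memory_bytes(num_vectors: int, dim: int, bits_per_component: int) -> int:
    """Raw size of the vector components only (no index, no payload)."""
    total_bits = num_vectors * dim * bits_per_component
    return total_bits // 8


num_vectors = 1_000_000
dim = 1536  # e.g. text-embedding-3-small

float32_bytes = vector_memory_bytes(num_vectors, dim, 32)  # unquantized
int8_bytes = vector_memory_bytes(num_vectors, dim, 8)      # scalar quantization
binary_bytes = vector_memory_bytes(num_vectors, dim, 1)    # binary quantization

for label, size in [
    ("float32", float32_bytes),
    ("int8", int8_bytes),
    ("binary", binary_bytes),
]:
    print(f"{label:8s} {size / 1024**3:.2f} GiB")
```

Treat the output as an estimate of the quantized copy alone. Real memory use depends on index settings and on whether the original vectors stay in RAM or on disk.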

If you are deciding between Qdrant and other vector databases, see pgvector vs Pinecone and Pinecone vs Qdrant for full comparisons.

Step 1: Run Qdrant with Docker

The fastest way to get Qdrant running locally is Docker. One command and it is up.

bash
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

Port 6333 is the HTTP API. Port 6334 is gRPC. The -v flag mounts a local directory so your data persists when the container restarts.

Once it is running, open http://localhost:6333/dashboard in your browser. You will see the Qdrant web UI where you can inspect collections and run queries visually.

To verify the API is alive:

bash
curl http://localhost:6333/
# returns: {"title":"qdrant - vector search engine","version":"..."}

Alternative: download the binary

If you prefer not to use Docker, download a pre-built binary from the Qdrant releases page. Extract it and run:

bash
./qdrant

Same API, same ports, no Docker required.

Alternative: Qdrant Cloud

If you want a managed service instead of running it yourself, sign up at cloud.qdrant.io. The free tier gives you 1GB of storage, which is enough for roughly 500K vectors at 1536 dimensions. You get a hosted URL and an API key instead of running Docker.

Step 2: Install the Python Client

bash
pip install qdrant-client

For embeddings in this tutorial, also install the OpenAI SDK:

bash
pip install openai

If you prefer a free local embedding model, install sentence-transformers instead:

bash
pip install sentence-transformers

Both work. I will show both options.

Step 3: Connect to Qdrant

python
from qdrant_client import QdrantClient

# Connect to local Docker instance
client = QdrantClient(host="localhost", port=6333)

# Or connect to Qdrant Cloud
# client = QdrantClient(
#     url="https://your-cluster-url.qdrant.io",
#     api_key="your-api-key"
# )

# Verify the connection
info = client.get_collections()
print(info)

The client uses HTTP (REST) by default. For production, you can switch to gRPC by passing prefer_grpc=True, which can reduce latency on high-QPS workloads.

Step 4: Create a Collection

A collection is where your vectors live. Every vector in a collection must have the same dimension. You also choose the distance metric here.

python
from qdrant_client.models import Distance, VectorParams

client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(
        size=1536,          # must match your embedding model output
        distance=Distance.COSINE
    )
)

Choosing the right distance metric:

Metric | Use when
COSINE | OpenAI embeddings, most sentence transformers, when direction matters more than magnitude
DOT | Vectors are already normalized (same ranking as cosine, without the normalization step)
EUCLID | Pixel-level similarity, some image embeddings

For OpenAI text-embedding-3-small or text-embedding-3-large, use COSINE. For Cohere and most sentence transformer models, also use COSINE.
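
To see how the three metrics differ, here is a small pure-Python sketch on toy 3-dimensional vectors. The helper functions and sample vectors are made up for illustration and are not part of the Qdrant client. It shows that cosine ignores vector length while dot product and Euclidean distance do not, and that dot product equals cosine once vectors are normalized.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]   # same direction as a, twice the length
c = [2.0, -1.0, 0.0]  # orthogonal to a

print("a vs b:", cosine(a, b), dot(a, b), euclid(a, b))
print("a vs c:", cosine(a, c), dot(a, c))
print("normalized dot:", dot(normalize(a), normalize(b)))
```

Because a and b point the same way, cosine treats them as identical even though the dot product and Euclidean distance both react to the difference in length.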

Verify the collection was created:

python
collection_info = client.get_collection("articles")
print(collection_info)

Step 5: Generate Embeddings

Before inserting anything, you need vectors. Here is how to generate them with both OpenAI and a free local model.

Option A: OpenAI embeddings

python
from openai import OpenAI

openai_client = OpenAI(api_key="your-openai-api-key")

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",  # 1536 dimensions
        input=text
    )
    return response.data[0].embedding

# Test it
vector = get_embedding("Qdrant is a vector database written in Rust")
print(f"Dimension: {len(vector)}")  # 1536

Option B: Free local model (no API key needed)

python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384 dimensions

def get_embedding(text: str) -> list[float]:
    return model.encode(text).tolist()

# If using this model, create the collection with size=384
# client.create_collection(
#     collection_name="articles",
#     vectors_config=VectorParams(size=384, distance=Distance.COSINE)
# )

Step 6: Insert Vectors (Upsert)

In Qdrant, the operation to insert or update vectors is called upsert. Each point (Qdrant's name for a single vector entry) has an ID, a vector, and an optional payload.

The payload is arbitrary JSON. You attach it to the vector and Qdrant lets you filter on it at query time, so you do not need a separate metadata store.

python
from qdrant_client.models import PointStruct

# Sample documents
documents = [
    {
        "id": 1,
        "text": "Qdrant is a vector database written in Rust for high-performance similarity search.",
        "category": "vector-search",
        "author": "krunal",
        "published_year": 2026
    },
    {
        "id": 2,
        "text": "RAG combines retrieval from a vector database with language model generation.",
        "category": "rag",
        "author": "krunal",
        "published_year": 2026
    },
    {
        "id": 3,
        "text": "HNSW is the graph-based algorithm that powers fast approximate nearest neighbor search.",
        "category": "vector-search",
        "author": "alice",
        "published_year": 2025
    },
    {
        "id": 4,
        "text": "Pinecone is a fully managed vector database with a serverless option.",
        "category": "vector-search",
        "author": "alice",
        "published_year": 2025
    },
    {
        "id": 5,
        "text": "pgvector adds vector similarity search as an extension to PostgreSQL.",
        "category": "vector-search",
        "author": "krunal",
        "published_year": 2026
    },
]

# Generate embeddings and build points
points = []
for doc in documents:
    embedding = get_embedding(doc["text"])
    points.append(
        PointStruct(
            id=doc["id"],
            vector=embedding,
            payload={
                "text": doc["text"],
                "category": doc["category"],
                "author": doc["author"],
                "published_year": doc["published_year"]
            }
        )
    )

# Upsert all points at once
client.upsert(
    collection_name="articles",
    points=points
)

print(f"Inserted {len(points)} points")

Batch upsert for large datasets

For large datasets, upsert in batches to avoid memory issues and timeouts:

python
def batch_upsert(client, collection_name, documents, batch_size=100):
    total = len(documents)
    for i in range(0, total, batch_size):
        batch = documents[i:i + batch_size]
        points = []
        for doc in batch:
            embedding = get_embedding(doc["text"])
            points.append(
                PointStruct(
                    id=doc["id"],
                    vector=embedding,
                    payload=doc
                )
            )
        client.upsert(collection_name=collection_name, points=points)
        print(f"Upserted {min(i + batch_size, total)}/{total}")

Step 7: Run a Similarity Search

Now query the collection. You embed a question and Qdrant returns the most semantically similar points.

python
# Embed the query
query_text = "how does vector search work?"
query_vector = get_embedding(query_text)

# Search
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    limit=3,            # return top 3 results
    with_payload=True   # include the JSON payload in results
).points  # query_points supersedes the deprecated client.search

# Print results
for result in results:
    print(f"Score: {result.score:.4f}")
    print(f"Text:  {result.payload['text']}")
    print(f"Category: {result.payload['category']}")
    print()

The score is the cosine similarity between your query vector and the stored vector. Higher is more similar. Cosine similarity ranges from -1 to 1, although scores for text embeddings usually land between 0 and 1.

Step 8: Filter by Payload

This is where Qdrant genuinely stands out. You can filter results by any payload field at query time, and Qdrant applies the filter efficiently alongside the vector search rather than as a post-processing step.

Basic filter

python
from qdrant_client.models import Filter, FieldCondition, MatchValue

# query_points supersedes the deprecated client.search
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="vector-search")
            )
        ]
    ),
    limit=3,
    with_payload=True
).points

for result in results:
    print(f"{result.score:.4f} | {result.payload['text'][:60]}")

Multiple conditions (AND logic)

python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# Every condition in must has to hold (AND)
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="vector-search")
            ),
            FieldCondition(
                key="author",
                match=MatchValue(value="krunal")
            ),
            FieldCondition(
                key="published_year",
                range=Range(gte=2026)  # published 2026 or later
            )
        ]
    ),
    limit=5,
    with_payload=True
).points

OR logic

python
from qdrant_client.models import Filter, FieldCondition, MatchAny

# MatchAny gives OR over the values of a single field
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchAny(any=["vector-search", "rag"])  # either category
            )
        ]
    ),
    limit=5,
    with_payload=True
).points

NOT logic (exclude)

python
# must_not drops every point that matches the condition
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must_not=[
            FieldCondition(
                key="author",
                match=MatchValue(value="alice")
            )
        ]
    ),
    limit=5,
    with_payload=True
).points
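
To make the filter semantics concrete, here is a minimal pure-Python sketch of how must, should, and must_not clauses combine. It is a mental model, not how Qdrant evaluates filters internally, and it assumes the three clauses are joined with AND. The matches helper and the condition functions are hypothetical stand-ins for Filter and FieldCondition.

```python
def matches(payload: dict, must=(), should=(), must_not=()) -> bool:
    """Each condition is a function payload -> bool (a stand-in for FieldCondition)."""
    if not all(cond(payload) for cond in must):
        return False  # AND: every must condition has to hold
    if should and not any(cond(payload) for cond in should):
        return False  # OR: at least one should condition has to hold
    if any(cond(payload) for cond in must_not):
        return False  # NOT: no must_not condition may hold
    return True


payloads = [
    {"id": 1, "category": "vector-search", "author": "krunal", "published_year": 2026},
    {"id": 2, "category": "rag", "author": "krunal", "published_year": 2026},
    {"id": 3, "category": "vector-search", "author": "alice", "published_year": 2025},
]


def is_vector_search(p):
    return p["category"] == "vector-search"


def is_rag(p):
    return p["category"] == "rag"


def by_alice(p):
    return p["author"] == "alice"


def since_2026(p):
    return p["published_year"] >= 2026


and_ids = [p["id"] for p in payloads if matches(p, must=[is_vector_search, since_2026])]
or_ids = [p["id"] for p in payloads if matches(p, should=[is_vector_search, is_rag])]
not_ids = [p["id"] for p in payloads if matches(p, must_not=[by_alice])]
print(and_ids, or_ids, not_ids)
```

The should clause is the general way to express OR across different fields, whereas MatchAny covers the common case of OR over values of one field.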

Step 9: Add Payload Indexes for Fast Filtering

By default, Qdrant filters by scanning all candidate payload values. For large collections, add payload indexes to the fields you filter on most often. This dramatically speeds up filtered searches.

python
from qdrant_client.models import PayloadSchemaType

# Index the category field (keyword type for exact match)
client.create_payload_index(
    collection_name="articles",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

# Index the published_year field (integer type for range queries)
client.create_payload_index(
    collection_name="articles",
    field_name="published_year",
    field_schema=PayloadSchemaType.INTEGER
)

As a rule of thumb: if you filter on a field in more than 10% of your queries, index it.

Step 10: Hybrid Search (Dense + Sparse)

Hybrid search combines semantic similarity (dense vectors) with keyword matching (sparse vectors). It is better than either alone for queries with specific terms, product codes, or proper nouns.

Qdrant has native sparse vector support. You store a sparse vector alongside each dense vector and query both at once.
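
A sparse vector is stored as two parallel lists: the indices of the non-zero dimensions and their values. The sketch below shows that representation and the dot product used to score it, in pure Python. The vocabulary, the term weights, and the helper names to_sparse and sparse_dot are made up for illustration.

```python
def to_sparse(
    term_weights: dict[str, float], vocabulary: dict[str, int]
) -> tuple[list[int], list[float]]:
    """Map term weights to parallel (indices, values) lists, skipping unknown terms."""
    pairs = sorted(
        (vocabulary[term], weight)
        for term, weight in term_weights.items()
        if term in vocabulary
    )
    return [i for i, _ in pairs], [w for _, w in pairs]


def sparse_dot(a, b) -> float:
    """Dot product over the indices that both sparse vectors share."""
    b_lookup = dict(zip(*b))  # index -> value for the second vector
    return sum(value * b_lookup.get(index, 0.0) for index, value in zip(*a))


vocabulary = {"qdrant": 0, "rust": 1, "hnsw": 2, "search": 3, "postgres": 4}

doc = to_sparse({"hnsw": 0.8, "search": 0.5}, vocabulary)
query = to_sparse({"search": 1.0, "rust": 1.0, "unknown": 1.0}, vocabulary)

print(doc)
print(query)
print(sparse_dot(doc, query))
```

Only the shared term "search" contributes to the score, which is why sparse vectors behave like keyword matching: a document with no terms in common with the query scores zero.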

Set up a collection with named vectors

python
from qdrant_client.models import VectorParams, SparseVectorParams, Distance

client.create_collection(
    collection_name="articles_hybrid",
    vectors_config={
        "dense": VectorParams(size=1536, distance=Distance.COSINE)
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams()
    }
)

Generate sparse vectors with TF-IDF (a simple stand-in for BM25)

python
# Requires scikit-learn: pip install scikit-learn
from qdrant_client.models import PointStruct, SparseVector
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit a TF-IDF encoder on your corpus so every term gets a fixed index
corpus = [doc["text"] for doc in documents]
tfidf = TfidfVectorizer()
tfidf.fit(corpus)

def get_sparse_vector(text: str) -> SparseVector:
    """Convert text to a sparse vector using TF-IDF weights."""
    vec = tfidf.transform([text])
    cx = vec.tocoo()
    indices = cx.col.tolist()
    values = cx.data.tolist()
    return SparseVector(indices=indices, values=values)

# Upsert points with both dense and sparse vectors
points = []
for doc in documents:
    dense_vec = get_embedding(doc["text"])
    sparse_vec = get_sparse_vector(doc["text"])
    points.append(
        PointStruct(
            id=doc["id"],
            vector={
                "dense": dense_vec,
                "sparse": sparse_vec
            },
            payload={
                "text": doc["text"],
                "category": doc["category"]
            }
        )
    )

client.upsert(collection_name="articles_hybrid", points=points)

Hybrid query

python
from qdrant_client.models import Prefetch, FusionQuery, Fusion

query_text = "fast approximate search algorithm"
dense_query = get_embedding(query_text)
sparse_query = get_sparse_vector(query_text)

# Query using reciprocal rank fusion to merge dense and sparse results
results = client.query_points(
    collection_name="articles_hybrid",
    prefetch=[
        Prefetch(
            query=dense_query,
            using="dense",
            limit=20
        ),
        Prefetch(
            query=sparse_query,
            using="sparse",
            limit=20
        )
    ],
    query=FusionQuery(fusion=Fusion.RRF),  # reciprocal rank fusion
    limit=5,
    with_payload=True
)

for result in results.points:
    print(f"{result.score:.4f} | {result.payload['text'][:60]}")

Reciprocal rank fusion (RRF) combines the rankings from both searches without needing to tune a weight parameter. It is the recommended default for hybrid search.
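
For intuition, here is a minimal pure-Python sketch of reciprocal rank fusion. It is not Qdrant's implementation. The constant k=60 comes from the original RRF paper and Qdrant's internal constant may differ. The function name rrf_fuse and the sample rankings are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Merge ranked lists of point IDs; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_ranking = [3, 1, 5]   # point IDs ordered by dense similarity
sparse_ranking = [3, 4, 1]  # point IDs ordered by sparse (keyword) score

fused = rrf_fuse([dense_ranking, sparse_ranking])
fused_ids = [point_id for point_id, _ in fused]
print(fused_ids)
```

Only ranks enter the formula, so the dense and sparse scores never need to be on the same scale. Points that appear in both lists rise to the top.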

Step 11: Check Collection Stats

python
info = client.get_collection("articles")
print(f"Indexed vectors: {info.indexed_vectors_count}")
print(f"Points count:  {info.points_count}")
print(f"Status:        {info.status}")
print(f"Optimizer:     {info.optimizer_status}")

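The optimizer runs in the background to merge segments and rebuild indexes. The collection status shows green once optimization is complete (yellow while it is still running), and optimizer_status shows ok as long as the optimizer has not hit an error.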

Scroll Through All Points

To retrieve all points in a collection (not just search results), use scroll. This is useful for bulk exports or debugging.

python
# scroll returns one page of points plus the offset of the next page
# (None once the last page has been reached)
points, next_offset = client.scroll(
    collection_name="articles",
    limit=100,
    with_payload=True,
    with_vectors=False  # set True if you need the vectors back
)

for point in points:
    print(f"ID: {point.id} | {point.payload['text'][:50]}")
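
A single scroll call returns only one page. A minimal sketch of paging through the whole collection follows, assuming client.scroll accepts an offset argument and returns (points, next_offset) with next_offset set to None on the last page. The generator name scroll_all is hypothetical.

```python
def scroll_all(client, collection_name: str, page_size: int = 100):
    """Yield every point in the collection, one page at a time."""
    offset = None  # None means "start from the beginning"
    while True:
        points, offset = client.scroll(
            collection_name=collection_name,
            limit=page_size,
            offset=offset,
            with_payload=True,
            with_vectors=False,
        )
        yield from points
        if offset is None:  # no further pages
            break
```

Usage would look like `for point in scroll_all(client, "articles"): ...`. A generator keeps memory flat during large exports because only one page is held at a time.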

Delete Points

python
from qdrant_client.models import PointIdsList

# Delete by specific IDs
client.delete(
    collection_name="articles",
    points_selector=PointIdsList(points=[1, 2])
)

# Delete by filter (all points where category = "rag")
from qdrant_client.models import Filter, FieldCondition, MatchValue, FilterSelector

client.delete(
    collection_name="articles",
    points_selector=FilterSelector(
        filter=Filter(
            must=[
                FieldCondition(key="category", match=MatchValue(value="rag"))
            ]
        )
    )
)

Complete RAG Example

Here is a complete end-to-end example that ties everything together: index a set of documents and answer questions over them.

python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue
from openai import OpenAI
import uuid

# Clients
qdrant = QdrantClient(host="localhost", port=6333)
openai_client = OpenAI(api_key="your-openai-api-key")

COLLECTION = "knowledge_base"
EMBED_MODEL = "text-embedding-3-small"
CHAT_MODEL = "gpt-4o-mini"

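# Create collection (drop any previous copy first so the script can be re-run;
# this replaces the deprecated recreate_collection)
if qdrant.collection_exists(COLLECTION):
    qdrant.delete_collection(COLLECTION)
qdrant.create_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)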

def embed(text: str) -> list[float]:
    return openai_client.embeddings.create(
        model=EMBED_MODEL, input=text
    ).data[0].embedding

# Index documents
docs = [
    {"text": "Qdrant stores vectors and lets you search them by similarity.", "source": "qdrant-docs"},
    {"text": "RAG grounds LLM responses in retrieved documents to reduce hallucination.", "source": "rag-guide"},
    {"text": "HNSW is a graph-based index used for approximate nearest neighbor search.", "source": "hnsw-paper"},
    {"text": "Cosine similarity measures the angle between two vectors, not their length.", "source": "math-guide"},
]

points = [
    PointStruct(
        id=str(uuid.uuid4()),
        vector=embed(doc["text"]),
        payload=doc
    )
    for doc in docs
]
qdrant.upsert(collection_name=COLLECTION, points=points)

def answer_question(question: str, source_filter: str = None) -> str:
    query_vec = embed(question)

    # Build optional filter
    search_filter = None
    if source_filter:
        search_filter = Filter(
            must=[FieldCondition(key="source", match=MatchValue(value=source_filter))]
        )

    # Retrieve top 3 relevant chunks
    results = qdrant.query_points(
        collection_name=COLLECTION,
        query=query_vec,
        query_filter=search_filter,
        limit=3,
        with_payload=True
    ).points  # query_points supersedes the deprecated search method

    context = "\n\n".join(r.payload["text"] for r in results)

    # Generate answer with retrieved context
    response = openai_client.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {
                "role": "system",
                "content": "Answer the question using only the provided context. If the answer is not in the context, say so."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return response.choices[0].message.content

# Test it
print(answer_question("What is HNSW?"))
print(answer_question("How does RAG reduce hallucination?"))

Self-hosted vs Qdrant Cloud

Consideration | Self-hosted | Qdrant Cloud
Cost | VPS only ($20 to $100/month) | Free tier, then $25+/month
Setup time | 10 minutes with Docker | 5 minutes, no Docker needed
Data control | Full control, stays on your infra | Hosted by Qdrant
Maintenance | You manage upgrades and backups | Qdrant manages everything
Scale | You provision the hardware | Auto-scales (paid tiers)
Best for | Teams with DevOps capacity, cost-sensitive workloads | Teams who want zero ops

For prototypes and side projects, Qdrant Cloud free tier is the easiest start. For production workloads where cost matters, self-hosting on a $40/month VPS typically handles millions of vectors at a fraction of managed service pricing.

Common Mistakes to Avoid

Wrong dimension size. If you create a collection with size=1536 and then try to insert a 384-dimension vector, Qdrant will reject it with an error. Always match the collection dimension to your embedding model output.
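
One way to catch this early is to validate vectors on the client side before calling upsert. This is a minimal sketch and check_dimensions is a hypothetical helper, not part of the Qdrant client.

```python
def check_dimensions(vectors: list[list[float]], expected_dim: int) -> None:
    """Raise ValueError if any vector does not match the collection dimension."""
    for position, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector at position {position} has {len(vector)} dimensions, "
                f"expected {expected_dim}"
            )


# Passes silently: both vectors match a size=1536 collection
check_dimensions([[0.0] * 1536, [0.1] * 1536], expected_dim=1536)
```

Failing before the network call gives an error that points at the offending vector, which is easier to debug than a rejected batch.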

No payload index on filtered fields. Filtering without a payload index works but scans every point's payload. On collections with millions of points, this slows down filtered searches significantly. Run create_payload_index on any field you filter on regularly.

Upserting without IDs. Qdrant requires each point to have an ID. Use either unsigned integers or UUID strings. If you upsert a point with an ID that already exists, Qdrant overwrites the existing point. That is the behavior you want for updates, but it also means a reused ID silently replaces older data.
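
If your documents have a stable identifier, you can derive the point ID from it so that re-indexing overwrites the same points instead of creating duplicates. This sketch uses uuid5 from the standard library. The helper point_id_for and the "source#chunk" key format are assumptions for illustration.

```python
import uuid


def point_id_for(source: str, chunk_index: int) -> str:
    """Deterministic UUID string: the same source and chunk always map to the same ID."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}#{chunk_index}"))


first = point_id_for("https://example.com/qdrant-guide", 0)
again = point_id_for("https://example.com/qdrant-guide", 0)
other = point_id_for("https://example.com/qdrant-guide", 1)

print(first == again)  # same input, same ID
print(first == other)  # different chunk, different ID
```

Random uuid4 IDs are fine for one-off loads, but they add a fresh copy of every document each time the indexing script runs.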

Misunderstanding with_payload. The REST API returns only the ID and score unless you request the payload, while the Python client defaults with_payload to True. Pass it explicitly so the behavior is obvious, and set with_payload=False when you only need the IDs.

Running searches before the optimizer finishes. After a large upsert, Qdrant optimizes its internal segments in the background. Searches work immediately, but performance improves once optimization finishes. Check the status and optimizer_status fields returned by client.get_collection() to see when it is done.

Summary

Qdrant is one of the most capable vector databases available in 2026. Getting it running takes under ten minutes with Docker. The Python client is well-designed and covers every operation you need.

The key concepts to internalize: collections hold vectors of one dimension, payloads are the metadata attached to each point, filters let you scope searches by payload at query time, and payload indexes make filtered searches fast at scale.

From here, the natural next steps are adding quantization to reduce memory usage on large collections, setting up replication for high availability, and tuning HNSW parameters for your specific recall and latency requirements.

Related Reading

  • Pinecone vs Qdrant
  • pgvector vs Pinecone
  • How to Choose a Vector Database
  • HNSW Algorithm Explained
  • RAG Architecture Explained
  • What Is a Vector Database?
  • Vector Database in RAG
