K
Krunal Kanojiya
HomeAboutServicesBlog
Hire Me
K
Krunal Kanojiya

Technical Content Writer

BlogRSSSitemapEmail
© 2026 Krunal Kanojiya · Built with Next.js
Privacy PolicyTerms of Service
  1. Home
  2. /
  3. Blog
  4. /
  5. Vector Search & Databases
  6. /
  7. pgvector: The Complete Guide to Vector Search in PostgreSQL (2026)
Vector Search & Databases11 min read2,078 words

pgvector: The Complete Guide to Vector Search in PostgreSQL (2026)

pgvector adds vector similarity search directly to PostgreSQL. This guide covers installation, creating vector columns, HNSW and IVFFlat indexes, cosine and L2 distance queries, filtering, Python integration, performance tuning, and when to migrate to a dedicated vector database.

Krunal Kanojiya

Krunal Kanojiya

May 29, 2026
Share:
#pgvector#postgresql#vector-database#vector-search#embeddings#HNSW#similarity-search#RAG#python
pgvector: The Complete Guide to Vector Search in PostgreSQL (2026)

Most teams that build their first RAG application do not need a separate vector database.

They are already running PostgreSQL. They already have backups, monitoring, and on-call procedures set up for it. Adding a new database means a new thing to operate, a new failure mode to watch, and a new bill to pay.

pgvector solves that. It adds vector similarity search directly to PostgreSQL as an extension. Your vectors live in the same database as your users, products, and orders. You can join them together in a single query. You do not need to manage anything new.

This guide covers everything you need to get pgvector working in production: installation, indexing, querying, Python integration, performance tuning, and knowing when you have outgrown it.

What pgvector Actually Is

pgvector is a PostgreSQL extension that adds three things:

  1. A vector data type for storing high-dimensional floating-point arrays
  2. Three distance operators for measuring similarity between vectors
  3. Two index types (HNSW and IVFFlat) for fast approximate nearest neighbor search

Without an index, pgvector does exact search. Every query scans the entire table and calculates the distance to every stored vector. That is accurate but slow at any meaningful scale. With an HNSW index, queries use approximate nearest neighbor search and return results in milliseconds.

pgvector is open source under the PostgreSQL License. It works on any PostgreSQL 12 or later instance. Supabase, AWS RDS, Google Cloud SQL, and most managed Postgres providers support it out of the box.

Installation

If you are running Postgres locally, install pgvector using your package manager.

On Ubuntu or Debian:

bash
sudo apt install postgresql-16-pgvector

On macOS with Homebrew:

bash
brew install pgvector

From source (any system):

bash
git clone --branch v0.7.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
make install

Once installed, enable the extension in your database:

sql
CREATE EXTENSION IF NOT EXISTS vector;

That is all. pgvector is now active in that database.

Creating a Table with a Vector Column

The vector(n) type takes one argument: the number of dimensions. This must match the output dimensions of your embedding model.

Common embedding dimensions:

  • OpenAI text-embedding-3-small: 1536
  • OpenAI text-embedding-3-large: 3072
  • Google text-embedding-004: 768
  • Sentence Transformers all-MiniLM-L6-v2: 384

Here is a table for storing document chunks with their embeddings:

sql
CREATE TABLE documents (
    id        BIGSERIAL PRIMARY KEY,
    content   TEXT        NOT NULL,
    source    TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    embedding vector(1536)
);

The embedding column stores a 1536-dimensional vector. Everything else is a normal Postgres column.

Inserting Vectors

You can insert vectors from SQL directly or from Python using the pgvector library.

From SQL:

sql
INSERT INTO documents (content, source, embedding)
VALUES (
    'pgvector adds vector search to PostgreSQL.',
    'blog-post',
    '[0.021, -0.134, 0.082, ...]'  -- 1536 values
);

From Python with psycopg2:

python
import psycopg2
from pgvector.psycopg2 import register_vector
import openai

# Connect and register the vector type
conn = psycopg2.connect("postgresql://user:password@localhost/mydb")
register_vector(conn)

# Get an embedding from OpenAI
client = openai.OpenAI(api_key="your-key")

def embed(text: str) -> list[float]:
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding

# Insert a document
text = "pgvector adds vector search to PostgreSQL."
embedding = embed(text)

with conn.cursor() as cur:
    cur.execute(
        "INSERT INTO documents (content, source, embedding) VALUES (%s, %s, %s)",
        (text, "blog-post", embedding)
    )
conn.commit()

From Python with SQLAlchemy:

python
from sqlalchemy import create_engine, Column, Integer, Text
from sqlalchemy.orm import declarative_base
from pgvector.sqlalchemy import Vector

Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"
    id        = Column(Integer, primary_key=True)
    content   = Column(Text)
    embedding = Column(Vector(1536))

engine = create_engine("postgresql://user:password@localhost/mydb")
Base.metadata.create_all(engine)

Querying: Finding Similar Vectors

pgvector adds three distance operators:

OperatorDistance typeBest for
<->L2 (Euclidean)Normalized vectors, general use
<=>Cosine distanceText embeddings from most models
<#>Negative inner productVectors trained with dot product objective

For text embeddings from OpenAI, Sentence Transformers, or most general-purpose models, use cosine distance (<=>).

To understand the difference, read our article on cosine similarity vs Euclidean distance.

Basic similarity query:

sql
SELECT id, content, embedding <=> '[0.021, -0.134, 0.082, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.021, -0.134, 0.082, ...]'
LIMIT 10;

From Python:

python
def search(query: str, limit: int = 10) -> list[dict]:
    query_embedding = embed(query)

    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, content, source,
                   embedding <=> %s AS distance
            FROM documents
            ORDER BY embedding <=> %s
            LIMIT %s
            """,
            (query_embedding, query_embedding, limit)
        )
        rows = cur.fetchall()

    return [
        {"id": r[0], "content": r[1], "source": r[2], "distance": r[3]}
        for r in rows
    ]

results = search("how does pgvector handle approximate search?")
for r in results:
    print(f"{r['distance']:.4f}  {r['content'][:80]}")

Filtering with WHERE Clauses

One of the biggest advantages of pgvector over dedicated vector databases is that filtering is just SQL. You do not need to learn a new filter syntax.

sql
-- Only search documents from a specific source
SELECT id, content, embedding <=> %s AS distance
FROM documents
WHERE source = 'blog-post'
ORDER BY embedding <=> %s
LIMIT 10;

-- Only search documents created in the last 30 days
SELECT id, content, embedding <=> %s AS distance
FROM documents
WHERE created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> %s
LIMIT 10;

-- Combine filters
SELECT id, content, embedding <=> %s AS distance
FROM documents
WHERE source = 'blog-post'
  AND created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> %s
LIMIT 10;

One thing to know: pgvector uses post-filtering by default. It runs the ANN search first and then applies the WHERE clause. For selective filters, this can return fewer than your requested LIMIT. If you are filtering on a highly selective condition and need guaranteed k results, you may need to increase your limit and re-filter in application code.

Building an Index

Without an index, every query is an exact scan of all rows. Fast for small tables, very slow above a few thousand rows.

pgvector supports two index types: HNSW and IVFFlat.

HNSW (Recommended)

HNSW (Hierarchical Navigable Small World) is the better choice for most production use cases. You do not need to train it, it handles inserts well, and it gives strong recall at low latency.

sql
-- Cosine distance index (most common for text embeddings)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- L2 distance index
CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);

-- Inner product index
CREATE INDEX ON documents USING hnsw (embedding vector_ip_ops);

HNSW has two parameters you can tune:

  • m: the number of connections per node in the graph (default 16). Higher m means better recall but more memory and slower index build.
  • ef_construction: how many candidates are considered during index build (default 64). Higher means better recall but slower build.
sql
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

For most RAG applications, the defaults work well. If your recall is lower than expected, increase ef_construction to 128 and rebuild.

IVFFlat

IVFFlat (Inverted File with Flat compression) divides the vector space into lists clusters and searches only the nearest clusters at query time. It requires training on your data and needs to be rebuilt when your dataset grows significantly.

sql
-- Train and build an IVFFlat index
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

The lists value is roughly the square root of your total row count. 100 lists for 10K rows. 316 lists for 100K rows. 1000 lists for 1M rows.

IVFFlat is faster to build than HNSW but needs periodic rebuilding as your data changes. Use HNSW unless you have a specific reason not to.

Tuning Query Recall

HNSW uses an ef_search parameter at query time. Higher values check more candidates and return better recall at the cost of slightly more latency.

sql
-- Set ef_search for this session
SET hnsw.ef_search = 100;

-- Now run your query
SELECT id, content, embedding <=> %s AS distance
FROM documents
ORDER BY embedding <=> %s
LIMIT 10;

The default ef_search is 40. If your top results feel off, try 80 or 100. For IVFFlat, the equivalent parameter is ivfflat.probes.

sql
SET ivfflat.probes = 10;

Full Python RAG Example

Here is a complete working example: ingest documents, index them, and retrieve relevant chunks for an LLM.

python
import psycopg2
from pgvector.psycopg2 import register_vector
import openai

conn = psycopg2.connect("postgresql://user:password@localhost/mydb")
register_vector(conn)
client = openai.OpenAI(api_key="your-key")

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        input=text, model="text-embedding-3-small"
    ).data[0].embedding

def ingest(texts: list[str], source: str) -> None:
    with conn.cursor() as cur:
        for text in texts:
            cur.execute(
                "INSERT INTO documents (content, source, embedding) VALUES (%s, %s, %s)",
                (text, source, embed(text))
            )
    conn.commit()

def retrieve(query: str, limit: int = 5) -> list[str]:
    q_vec = embed(query)
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT content
            FROM documents
            ORDER BY embedding <=> %s
            LIMIT %s
            """,
            (q_vec, limit)
        )
        return [row[0] for row in cur.fetchall()]

def answer(question: str) -> str:
    chunks = retrieve(question)
    context = "\n\n".join(chunks)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# Use it
ingest(["pgvector adds vector search to PostgreSQL.", "..."], source="docs")
print(answer("How do I run a vector similarity search in Postgres?"))

Performance at Scale

pgvector works well at these scale points:

Under 100K vectors: Any approach works. Even exact search is fast enough.

100K to 1M vectors: HNSW index with default settings. Expect sub-5ms query latency on a $20 to $50 per month cloud instance.

1M to 5M vectors: Still manageable with HNSW on a larger instance (8 to 16GB RAM). You may need to increase shared_buffers and work_mem in your Postgres config. The Postgres tuning guide from PGTune can generate a good starting config based on your hardware. Latency will stay reasonable but index build takes longer.

Above 5M vectors: This is where pgvector starts to show its limits. Memory pressure increases significantly because HNSW graphs are kept in memory. Query latency starts climbing. At this point, a dedicated vector database like Qdrant or Pinecone will give you better performance at lower cost.

When to Stop Using pgvector

pgvector is the right tool for a specific range of use cases. Here are the signals that tell you it is time to move to a dedicated vector database.

Your query latency is climbing. If p99 latency is above 50ms and you have already tuned ef_search and increased hardware, you have hit the limit.

You need hybrid search. pgvector has no native sparse vector support. Building hybrid search on top of pgvector requires running a separate keyword search (using Postgres full-text search) and merging results manually in application code. Dedicated databases like Qdrant handle this natively.

Filtering is returning incomplete results. Post-filtering on selective conditions produces fewer results than requested. If your application needs guaranteed k results under selective filters, a database with filtered HNSW traversal handles it correctly.

Your vector dataset is growing past 5M rows. The memory requirements for HNSW at this scale put pressure on the rest of your Postgres workload. You start trading off vector search performance against OLTP performance on the same instance.

If any of these apply to you, the guide to choosing a vector database walks through the full set of options and what to move to.

My Recommendation

Start with pgvector if you are already on PostgreSQL.

It removes the infrastructure overhead of a second database, keeps your data in one place, and handles all but the most demanding vector search workloads. The Postgres ecosystem tools you already use for migrations, backups, and monitoring work with pgvector data automatically.

Move to a dedicated vector database when your vector count exceeds 5M, your filtering patterns need guaranteed recall under selective conditions, or you need native hybrid search without building it yourself.

The transition is not difficult. pgvector and Qdrant use the same concept of embedding model outputs as stored vectors. The main work is moving data and updating query code. Waiting until you have a real reason to migrate is the right call.

Related Reading

  • How to choose a vector database in 2026
  • Pinecone vs Qdrant: which should you use?
  • HNSW algorithm explained with diagrams
  • IVF index explained for vector search
  • What is vector indexing and why it matters
  • Cosine similarity vs Euclidean distance
  • Vector database vs traditional database
  • How similarity search works in vector databases

On this page

What pgvector Actually IsInstallationCreating a Table with a Vector ColumnInserting VectorsQuerying: Finding Similar VectorsFiltering with WHERE ClausesBuilding an IndexHNSW (Recommended)IVFFlatTuning Query RecallFull Python RAG ExamplePerformance at ScaleWhen to Stop Using pgvectorMy RecommendationRelated Reading

Follow on Google

Add as a preferred source in Search & Discover

Add as preferred source
Appears in Google Discover
All posts

Follow on Google

Add as a preferred source in Search & Discover

Add as preferred source
Appears in Google Discover
Krunal Kanojiya

Krunal Kanojiya

Technical Content Writer

Technical Content Writer and former software developer from India. I write in-depth articles on blockchain, AI/ML, data engineering, web development, and developer careers. Currently at Lucent Innovation, previously at Cromtek Solution and freelance.

GitHubLinkedInX

Related Posts

Pinecone vs Qdrant: Which Vector Database Should You Use in 2026?

May 28, 2026 · 10 min read

What Is Vector Indexing and Why It Matters

Apr 29, 2026 · 19 min read

HNSW Algorithm Explained With Diagrams

Apr 27, 2026 · 25 min read