pgvector: The Complete Guide to Vector Search in PostgreSQL (2026)
pgvector adds vector similarity search directly to PostgreSQL. This guide covers installation, creating vector columns, HNSW and IVFFlat indexes, cosine and L2 distance queries, filtering, Python integration, performance tuning, and when to migrate to a dedicated vector database.
Most teams that build their first RAG application do not need a separate vector database.
They are already running PostgreSQL. They already have backups, monitoring, and on-call procedures set up for it. Adding a new database means a new thing to operate, a new failure mode to watch, and a new bill to pay.
pgvector solves that. It adds vector similarity search directly to PostgreSQL as an extension. Your vectors live in the same database as your users, products, and orders. You can join them together in a single query. You do not need to manage anything new.
This guide covers everything you need to get pgvector working in production: installation, indexing, querying, Python integration, performance tuning, and knowing when you have outgrown it.
What pgvector Actually Is
pgvector is a PostgreSQL extension that adds three things:
- A
vectordata type for storing high-dimensional floating-point arrays - Three distance operators for measuring similarity between vectors
- Two index types (HNSW and IVFFlat) for fast approximate nearest neighbor search
Without an index, pgvector does exact search. Every query scans the entire table and calculates the distance to every stored vector. That is accurate but slow at any meaningful scale. With an HNSW index, queries use approximate nearest neighbor search and return results in milliseconds.
pgvector is open source under the PostgreSQL License. It works on any PostgreSQL 12 or later instance. Supabase, AWS RDS, Google Cloud SQL, and most managed Postgres providers support it out of the box.
Installation
If you are running Postgres locally, install pgvector using your package manager.
On Ubuntu or Debian:
sudo apt install postgresql-16-pgvectorOn macOS with Homebrew:
brew install pgvectorFrom source (any system):
git clone --branch v0.7.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
make installOnce installed, enable the extension in your database:
CREATE EXTENSION IF NOT EXISTS vector;That is all. pgvector is now active in that database.
Creating a Table with a Vector Column
The vector(n) type takes one argument: the number of dimensions. This must match the output dimensions of your embedding model.
Common embedding dimensions:
- OpenAI
text-embedding-3-small: 1536 - OpenAI
text-embedding-3-large: 3072 - Google
text-embedding-004: 768 - Sentence Transformers
all-MiniLM-L6-v2: 384
Here is a table for storing document chunks with their embeddings:
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
source TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
embedding vector(1536)
);The embedding column stores a 1536-dimensional vector. Everything else is a normal Postgres column.
Inserting Vectors
You can insert vectors from SQL directly or from Python using the pgvector library.
From SQL:
INSERT INTO documents (content, source, embedding)
VALUES (
'pgvector adds vector search to PostgreSQL.',
'blog-post',
'[0.021, -0.134, 0.082, ...]' -- 1536 values
);From Python with psycopg2:
import psycopg2
from pgvector.psycopg2 import register_vector
import openai
# Connect and register the vector type
conn = psycopg2.connect("postgresql://user:password@localhost/mydb")
register_vector(conn)
# Get an embedding from OpenAI
client = openai.OpenAI(api_key="your-key")
def embed(text: str) -> list[float]:
response = client.embeddings.create(
input=text,
model="text-embedding-3-small"
)
return response.data[0].embedding
# Insert a document
text = "pgvector adds vector search to PostgreSQL."
embedding = embed(text)
with conn.cursor() as cur:
cur.execute(
"INSERT INTO documents (content, source, embedding) VALUES (%s, %s, %s)",
(text, "blog-post", embedding)
)
conn.commit()From Python with SQLAlchemy:
from sqlalchemy import create_engine, Column, Integer, Text
from sqlalchemy.orm import declarative_base
from pgvector.sqlalchemy import Vector
Base = declarative_base()
class Document(Base):
__tablename__ = "documents"
id = Column(Integer, primary_key=True)
content = Column(Text)
embedding = Column(Vector(1536))
engine = create_engine("postgresql://user:password@localhost/mydb")
Base.metadata.create_all(engine)Querying: Finding Similar Vectors
pgvector adds three distance operators:
| Operator | Distance type | Best for |
|---|---|---|
<-> | L2 (Euclidean) | Normalized vectors, general use |
<=> | Cosine distance | Text embeddings from most models |
<#> | Negative inner product | Vectors trained with dot product objective |
For text embeddings from OpenAI, Sentence Transformers, or most general-purpose models, use cosine distance (<=>).
To understand the difference, read our article on cosine similarity vs Euclidean distance.
Basic similarity query:
SELECT id, content, embedding <=> '[0.021, -0.134, 0.082, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.021, -0.134, 0.082, ...]'
LIMIT 10;From Python:
def search(query: str, limit: int = 10) -> list[dict]:
query_embedding = embed(query)
with conn.cursor() as cur:
cur.execute(
"""
SELECT id, content, source,
embedding <=> %s AS distance
FROM documents
ORDER BY embedding <=> %s
LIMIT %s
""",
(query_embedding, query_embedding, limit)
)
rows = cur.fetchall()
return [
{"id": r[0], "content": r[1], "source": r[2], "distance": r[3]}
for r in rows
]
results = search("how does pgvector handle approximate search?")
for r in results:
print(f"{r['distance']:.4f} {r['content'][:80]}")Filtering with WHERE Clauses
One of the biggest advantages of pgvector over dedicated vector databases is that filtering is just SQL. You do not need to learn a new filter syntax.
-- Only search documents from a specific source
SELECT id, content, embedding <=> %s AS distance
FROM documents
WHERE source = 'blog-post'
ORDER BY embedding <=> %s
LIMIT 10;
-- Only search documents created in the last 30 days
SELECT id, content, embedding <=> %s AS distance
FROM documents
WHERE created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> %s
LIMIT 10;
-- Combine filters
SELECT id, content, embedding <=> %s AS distance
FROM documents
WHERE source = 'blog-post'
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> %s
LIMIT 10;One thing to know: pgvector uses post-filtering by default. It runs the ANN search first and then applies the WHERE clause. For selective filters, this can return fewer than your requested LIMIT. If you are filtering on a highly selective condition and need guaranteed k results, you may need to increase your limit and re-filter in application code.
Building an Index
Without an index, every query is an exact scan of all rows. Fast for small tables, very slow above a few thousand rows.
pgvector supports two index types: HNSW and IVFFlat.
HNSW (Recommended)
HNSW (Hierarchical Navigable Small World) is the better choice for most production use cases. You do not need to train it, it handles inserts well, and it gives strong recall at low latency.
-- Cosine distance index (most common for text embeddings)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
-- L2 distance index
CREATE INDEX ON documents USING hnsw (embedding vector_l2_ops);
-- Inner product index
CREATE INDEX ON documents USING hnsw (embedding vector_ip_ops);HNSW has two parameters you can tune:
m: the number of connections per node in the graph (default 16). Higher m means better recall but more memory and slower index build.ef_construction: how many candidates are considered during index build (default 64). Higher means better recall but slower build.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);For most RAG applications, the defaults work well. If your recall is lower than expected, increase ef_construction to 128 and rebuild.
IVFFlat
IVFFlat (Inverted File with Flat compression) divides the vector space into lists clusters and searches only the nearest clusters at query time. It requires training on your data and needs to be rebuilt when your dataset grows significantly.
-- Train and build an IVFFlat index
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);The lists value is roughly the square root of your total row count. 100 lists for 10K rows. 316 lists for 100K rows. 1000 lists for 1M rows.
IVFFlat is faster to build than HNSW but needs periodic rebuilding as your data changes. Use HNSW unless you have a specific reason not to.
Tuning Query Recall
HNSW uses an ef_search parameter at query time. Higher values check more candidates and return better recall at the cost of slightly more latency.
-- Set ef_search for this session
SET hnsw.ef_search = 100;
-- Now run your query
SELECT id, content, embedding <=> %s AS distance
FROM documents
ORDER BY embedding <=> %s
LIMIT 10;The default ef_search is 40. If your top results feel off, try 80 or 100. For IVFFlat, the equivalent parameter is ivfflat.probes.
SET ivfflat.probes = 10;Full Python RAG Example
Here is a complete working example: ingest documents, index them, and retrieve relevant chunks for an LLM.
import psycopg2
from pgvector.psycopg2 import register_vector
import openai
conn = psycopg2.connect("postgresql://user:password@localhost/mydb")
register_vector(conn)
client = openai.OpenAI(api_key="your-key")
def embed(text: str) -> list[float]:
return client.embeddings.create(
input=text, model="text-embedding-3-small"
).data[0].embedding
def ingest(texts: list[str], source: str) -> None:
with conn.cursor() as cur:
for text in texts:
cur.execute(
"INSERT INTO documents (content, source, embedding) VALUES (%s, %s, %s)",
(text, source, embed(text))
)
conn.commit()
def retrieve(query: str, limit: int = 5) -> list[str]:
q_vec = embed(query)
with conn.cursor() as cur:
cur.execute(
"""
SELECT content
FROM documents
ORDER BY embedding <=> %s
LIMIT %s
""",
(q_vec, limit)
)
return [row[0] for row in cur.fetchall()]
def answer(question: str) -> str:
chunks = retrieve(question)
context = "\n\n".join(chunks)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": f"Answer using this context:\n\n{context}"},
{"role": "user", "content": question}
]
)
return response.choices[0].message.content
# Use it
ingest(["pgvector adds vector search to PostgreSQL.", "..."], source="docs")
print(answer("How do I run a vector similarity search in Postgres?"))Performance at Scale
pgvector works well at these scale points:
Under 100K vectors: Any approach works. Even exact search is fast enough.
100K to 1M vectors: HNSW index with default settings. Expect sub-5ms query latency on a $20 to $50 per month cloud instance.
1M to 5M vectors: Still manageable with HNSW on a larger instance (8 to 16GB RAM). You may need to increase shared_buffers and work_mem in your Postgres config. The Postgres tuning guide from PGTune can generate a good starting config based on your hardware. Latency will stay reasonable but index build takes longer.
Above 5M vectors: This is where pgvector starts to show its limits. Memory pressure increases significantly because HNSW graphs are kept in memory. Query latency starts climbing. At this point, a dedicated vector database like Qdrant or Pinecone will give you better performance at lower cost.
When to Stop Using pgvector
pgvector is the right tool for a specific range of use cases. Here are the signals that tell you it is time to move to a dedicated vector database.
Your query latency is climbing. If p99 latency is above 50ms and you have already tuned ef_search and increased hardware, you have hit the limit.
You need hybrid search. pgvector has no native sparse vector support. Building hybrid search on top of pgvector requires running a separate keyword search (using Postgres full-text search) and merging results manually in application code. Dedicated databases like Qdrant handle this natively.
Filtering is returning incomplete results. Post-filtering on selective conditions produces fewer results than requested. If your application needs guaranteed k results under selective filters, a database with filtered HNSW traversal handles it correctly.
Your vector dataset is growing past 5M rows. The memory requirements for HNSW at this scale put pressure on the rest of your Postgres workload. You start trading off vector search performance against OLTP performance on the same instance.
If any of these apply to you, the guide to choosing a vector database walks through the full set of options and what to move to.
My Recommendation
Start with pgvector if you are already on PostgreSQL.
It removes the infrastructure overhead of a second database, keeps your data in one place, and handles all but the most demanding vector search workloads. The Postgres ecosystem tools you already use for migrations, backups, and monitoring work with pgvector data automatically.
Move to a dedicated vector database when your vector count exceeds 5M, your filtering patterns need guaranteed recall under selective conditions, or you need native hybrid search without building it yourself.
The transition is not difficult. pgvector and Qdrant use the same concept of embedding model outputs as stored vectors. The main work is moving data and updating query code. Waiting until you have a real reason to migrate is the right call.
Related Reading
- How to choose a vector database in 2026
- Pinecone vs Qdrant: which should you use?
- HNSW algorithm explained with diagrams
- IVF index explained for vector search
- What is vector indexing and why it matters
- Cosine similarity vs Euclidean distance
- Vector database vs traditional database
- How similarity search works in vector databases
Follow on Google
Add as a preferred source in Search & Discover
Add as preferred sourceKrunal Kanojiya
Technical Content Writer
Technical Content Writer and former software developer from India. I write in-depth articles on blockchain, AI/ML, data engineering, web development, and developer careers. Currently at Lucent Innovation, previously at Cromtek Solution and freelance.