PostgreSQL pgvector vs Dedicated Vector DBs: Which Should You Use?

Compare pgvector with Pinecone, Weaviate, and Qdrant to choose the right vector storage for your AI app in 2026.

Problem: You Need Vector Search and Don't Know Where to Put It

You're building a RAG pipeline, a semantic search feature, or an embedding-powered recommender. Now you need somewhere to store and query vectors — and you have to decide: stick with PostgreSQL and pgvector, or spin up a dedicated vector database like Pinecone, Weaviate, or Qdrant?

You'll learn:

  • What pgvector can and can't do at scale
  • When a dedicated vector DB actually pays off
  • How to benchmark your own workload before committing

Time: 20 min | Level: Intermediate


Why This Happens

Vector search is fundamentally different from relational queries. Finding the 10 most similar vectors to a query embedding requires computing approximate nearest neighbors (ANN) across potentially millions of high-dimensional points — not a lookup by index key.
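For intuition, here's what an ANN index is replacing: a brute-force scan that computes one distance per stored vector. A minimal numpy sketch, with random data standing in for real embeddings:

```python
import numpy as np

def exact_top_k(corpus: np.ndarray, query: np.ndarray, k: int = 10) -> np.ndarray:
    """Brute-force cosine-similarity search: O(N * d) work per query."""
    # Normalize rows so a dot product equals cosine similarity
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    sims = corpus_n @ query_n           # one dot product per stored vector
    return np.argsort(-sims)[:k]        # indices of the k most similar

rng = np.random.default_rng(0)
corpus = rng.random((100_000, 1536), dtype=np.float32)  # toy stand-in for real embeddings
ids = exact_top_k(corpus, rng.random(1536, dtype=np.float32))
```

HNSW and friends navigate a graph instead of touching all N vectors, trading a small recall loss for orders of magnitude fewer comparisons.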

PostgreSQL wasn't designed for this. pgvector adds the capability, but the tradeoffs matter.

Common symptoms that push teams toward dedicated DBs:

  • Queries slow down past ~500k vectors even with HNSW indexes
  • Memory pressure from loading vector indexes alongside relational data
  • No native support for filtered ANN search that stays fast at scale
  • Operational complexity when tuning work_mem and index build parameters

What pgvector Actually Gives You

pgvector (v0.7+) supports two index types: IVFFlat and HNSW. HNSW is almost always the right choice for production — better recall, faster queries at the cost of more memory and slower build times.

Step 1: Set Up pgvector with HNSW

-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with embedding column
CREATE TABLE documents (
  id        BIGSERIAL PRIMARY KEY,
  content   TEXT,
  metadata  JSONB,
  embedding vector(1536)  -- Dimension must match your model (1536 = OpenAI ada-002)
);

-- Build HNSW index
-- m: connections per layer (higher = better recall, more memory)
-- ef_construction: build-time search depth (higher = better recall, slower build)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

Expected: Index builds in seconds for <100k rows. Budget ~10-15 minutes for 1M rows.

Step 2: Query with Filtering

-- Set runtime search depth (higher = better recall, slower)
SET hnsw.ef_search = 100;

-- Semantic search with metadata filter
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE metadata->>'source' = 'internal-wiki'  -- Pre-filter
ORDER BY embedding <=> $1                      -- Then rank by similarity
LIMIT 10;

The catch: filtering works the opposite way from what you might expect. pgvector's HNSW scan returns the nearest candidates first, and PostgreSQL applies the WHERE clause to them afterwards (post-filtering) — so a filter that matches 5% of rows discards roughly 95% of the candidates the index surfaces. You can get fewer than LIMIT results and poor recall unless you raise hnsw.ef_search. At small scale this is fine. At millions of rows with narrow filters, recall tanks. (Newer pgvector releases mitigate this with iterative index scans that keep fetching candidates until the limit is satisfied.)
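You can watch this happen in a toy simulation — random vectors, a synthetic 5%-selective filter, and a fixed candidate budget standing in for hnsw.ef_search (all numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, DIM, K, EF_SEARCH = 20_000, 128, 10, 100
vectors = rng.random((N, DIM), dtype=np.float32)
passes_filter = rng.random(N) < 0.05           # selective filter: ~5% of rows match
query = rng.random(DIM, dtype=np.float32)

dists = np.linalg.norm(vectors - query, axis=1)
order = np.argsort(dists)                       # exact nearest-first ordering

# Ground truth: exact top-K among only the rows that pass the filter
truth = [i for i in order if passes_filter[i]][:K]

# Post-filter search (pgvector-style): take the ef_search nearest candidates
# from the index, THEN apply the WHERE clause to whatever survives
candidates = order[:EF_SEARCH]
survivors = [i for i in candidates if passes_filter[i]][:K]

recall = len(set(survivors) & set(truth)) / K
print(f"returned {len(survivors)} of {K} requested rows, recall = {recall:.2f}")
```

With ~5% selectivity and a 100-candidate budget, you'd expect only about five survivors — half the rows you asked for. Raising ef_search (or loosening the filter) recovers recall at the cost of latency.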

Step 3: Tune for Your Workload

-- maintenance_work_mem speeds HNSW index builds (set before CREATE INDEX)
SET maintenance_work_mem = '2GB';

-- work_mem helps the sort when the planner falls back to an exact scan
SET work_mem = '256MB';

-- Check index usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;

You should see: Index Scan using documents_embedding_idx — not Seq Scan. If you see a sequential scan, your index isn't being used — most often because the query's distance operator doesn't match the index's operator class (e.g., <-> against a vector_cosine_ops index), or because the table is small enough that the planner prefers a full scan.

[Figure: EXPLAIN output — HNSW index scan (left) vs. sequential scan (right) for the same query]


When pgvector Is Enough

pgvector wins in these scenarios:

You already run PostgreSQL. No new infrastructure, no new operational burden, transactions work across your relational and vector data, and your existing backup/monitoring stack covers it.

Your corpus is under 1-2M vectors. HNSW recall stays above 95% and p99 latency stays under 50ms for most workloads at this scale. Past that, you need careful tuning or a different tool.

You need hybrid queries. Joining vector similarity with relational data (user permissions, date ranges, structured metadata) is natural SQL. Replicating this in a dedicated vector DB requires syncing two systems.
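The shape of such a hybrid query, sketched in plain Python over toy data (the permissions field and team values are hypothetical) — in PostgreSQL this collapses into a single SQL statement with a WHERE clause or join:

```python
import numpy as np

rng = np.random.default_rng(2)
docs = [
    # (id, allowed_teams, embedding) -- toy stand-ins for relational columns
    (i, {"eng"} if i % 2 == 0 else {"sales"}, rng.random(64, dtype=np.float32))
    for i in range(1_000)
]
query = rng.random(64, dtype=np.float32)

def hybrid_search(docs, query, team, k=10):
    """Rank by cosine similarity, restricted to rows the user may see --
    the equivalent of joining against a permissions table."""
    scored = []
    for doc_id, teams, emb in docs:
        if team not in teams:            # relational predicate
            continue
        sim = float(emb @ query / (np.linalg.norm(emb) * np.linalg.norm(query)))
        scored.append((sim, doc_id))
    return sorted(scored, reverse=True)[:k]

top = hybrid_search(docs, query, team="eng")
```

With a dedicated vector DB, the permissions data either has to be duplicated into vector metadata or fetched from PostgreSQL in a second round trip — both of which you then have to keep in sync.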

Budget matters. Pinecone's serverless tier is free for small workloads, but dedicated pod-based plans start at ~$70/month and climb quickly. Running pgvector on an existing RDS or Supabase instance costs nothing extra.


When You Need a Dedicated Vector DB

Pinecone

Best for teams that don't want to manage infrastructure. Serverless Pinecone handles scaling automatically and supports filtered search without recall degradation — it indexes metadata alongside vectors so filters don't reduce the ANN candidate pool.

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_KEY")
index = pc.Index("my-index")

# Upsert vectors
index.upsert(vectors=[
    {"id": "doc-1", "values": embedding, "metadata": {"source": "wiki", "year": 2025}}
])

# Query with filter — no recall penalty here
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"source": {"$eq": "wiki"}}  # Applied natively during ANN search
)

Trade-off: Vendor lock-in, no joins with external relational data, egress costs at scale.

Qdrant

Open-source, self-hosted or cloud, and arguably the best performance-per-dollar for high-throughput workloads. Supports payload-based filtering during ANN search (not after), which is the key architectural difference from pgvector.

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient("localhost", port=6333)

results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="wiki"))]
    ),
    limit=10
)

Qdrant also supports sparse vectors and sparse-dense hybrid search out of the box — useful if you want to combine BM25-style keyword relevance with semantic similarity (often outperforms either alone).
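If you roll your own fusion instead, reciprocal rank fusion (RRF) is a common merge strategy; a self-contained sketch with made-up result lists:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked result lists.
    Each document scores sum(1 / (k + rank)) across the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["doc-3", "doc-1", "doc-7"]   # from vector similarity
sparse = ["doc-1", "doc-9", "doc-3"]   # from BM25-style keyword match
fused = rrf([dense, sparse])           # → ['doc-1', 'doc-3', 'doc-9', 'doc-7']
```

Documents appearing near the top of both lists win; the constant k damps the influence of any single list's top rank.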

Weaviate

Strongest for teams that want a GraphQL-first API and built-in text vectorization (it can call your embedding model automatically on insert). Good for prototyping, slightly more complex operationally than Qdrant.


The Real Benchmark: Your Data

Don't trust generic benchmarks. Run this before you commit:

import time
import psycopg2
import numpy as np

conn = psycopg2.connect("postgresql://localhost/mydb")
cur = conn.cursor()

# Random 1536-dim query vector (assumes the documents table is already populated)
query_vec = np.random.rand(1536).tolist()

# Warm up
cur.execute("SELECT id FROM documents ORDER BY embedding <=> %s::vector LIMIT 10", (query_vec,))

# Measure p50 and p99 over 100 queries
latencies = []
for _ in range(100):
    start = time.perf_counter()
    cur.execute("SELECT id FROM documents ORDER BY embedding <=> %s::vector LIMIT 10", (query_vec,))
    cur.fetchall()
    latencies.append((time.perf_counter() - start) * 1000)

latencies.sort()
print(f"p50: {latencies[49]:.1f}ms")
print(f"p99: {latencies[98]:.1f}ms")

Decision thresholds (rough guidelines):

  • p99 < 50ms at your expected corpus size → pgvector is fine
  • p99 50-200ms → tune hnsw.ef_search, m, and work_mem first
  • p99 > 200ms or recall < 90% → evaluate dedicated options

Verification

# Check recall against brute-force exact search
# If HNSW recall < 0.90, increase m and ef_construction

psql -c "
  SET enable_indexscan = off;  -- force an exact, brute-force scan
  CREATE TEMP TABLE exact AS
    SELECT id FROM documents ORDER BY embedding <=> '[...]'::vector LIMIT 10;
  SET enable_indexscan = on;   -- back to the approximate HNSW scan
  SELECT COUNT(*) * 1.0 / 10 AS recall
  FROM exact WHERE id IN (
    SELECT id FROM documents ORDER BY embedding <=> '[...]'::vector LIMIT 10
  );
"

You should see: recall >= 0.90 for production use. Below that, bump m from 16 to 32.


What You Learned

  • pgvector with HNSW is production-ready for corpora under ~2M vectors with loose filters
  • Pre-filtering in pgvector degrades recall — dedicated DBs handle this natively
  • Qdrant is the strongest self-hosted alternative; Pinecone for managed simplicity
  • Benchmark with your actual data and filter selectivity before making the call

Limitation: These numbers shift with embedding dimension, hardware, and concurrent query load. A 768-dim model (e.g., nomic-embed-text) puts less pressure on pgvector than 3072-dim (OpenAI text-embedding-3-large).
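A back-of-envelope for how dimension drives memory — raw float32 vector storage only; the HNSW graph's neighbor links add more on top, so treat these as ballpark figures:

```python
def raw_vector_bytes(n_vectors: int, dim: int, bytes_per_float: int = 4) -> float:
    """Raw float32 storage for the vectors alone, in GiB (index overhead excluded)."""
    return n_vectors * dim * bytes_per_float / 1024**3

# 768 dims → ~2.9 GiB, 1536 → ~5.7 GiB, 3072 → ~11.4 GiB per million vectors
for dim in (768, 1536, 3072):
    print(f"{dim:>4} dims, 1M vectors: {raw_vector_bytes(1_000_000, dim):.1f} GiB")
```

Doubling the embedding dimension doubles what must fit in memory for fast queries, which is why a 3072-dim model strains a shared PostgreSQL instance much sooner.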

When NOT to use a dedicated vector DB: If you have fewer than 500k vectors and your filters match >20% of the corpus, pgvector will almost certainly be fast enough — and one less system to operate is worth a lot.


Tested on pgvector 0.7.4, PostgreSQL 16.2, Qdrant 1.9, Pinecone serverless (us-east-1), Ubuntu 24.04 / 32GB RAM