## Problem: Keyword Search Misses What Users Actually Mean
Your users search "fast car" and get zero results because your database stores "high-performance vehicle." Keyword search fails on synonyms, paraphrases, and intent. Semantic search fixes this — but wiring up the Gemini 2.0 Embeddings API, a vector store, and a query pipeline takes real effort to get right.
You'll learn:

- How to generate text embeddings with Gemini 2.0's `text-embedding-004` model
- How to store and index embeddings in pgvector (PostgreSQL)
- How to query by semantic similarity with cosine distance

Time: 25 min | Difficulty: Intermediate
## Why Gemini 2.0 Embeddings Over Alternatives
Gemini's text-embedding-004 outputs 768-dimensional vectors and supports a task_type parameter that lets you optimize embeddings for retrieval, classification, or clustering — without retraining. OpenAI's text-embedding-3-small lacks this. That task hint meaningfully improves recall in retrieval tasks.
Model comparison:
| Model | Dimensions | Task type support | Price (per 1M tokens) |
|---|---|---|---|
| text-embedding-004 | 768 | ✅ Yes | $0.000025 |
| text-embedding-3-small | 1536 | ❌ No | $0.020 |
| nomic-embed-text (local) | 768 | ❌ No | Free |
For production retrieval at scale, text-embedding-004 is the clear cost leader.
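To make the cost gap concrete, here's a quick back-of-envelope comparison using the per-1M-token prices listed in the table above (a sketch only; verify current pricing before budgeting):

```python
def embedding_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD to embed `tokens` tokens at a given per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


# Embedding a 100M-token corpus at the table's listed prices:
corpus_tokens = 100_000_000
gemini = embedding_cost(corpus_tokens, 0.000025)  # text-embedding-004
openai = embedding_cost(corpus_tokens, 0.020)     # text-embedding-3-small

print(f"text-embedding-004:     ${gemini:.4f}")
print(f"text-embedding-3-small: ${openai:.2f}")
```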
## Architecture Overview

```
Your text
    │
    ▼
Gemini Embeddings API ──▶ 768-dim float vector
    │
    ▼
pgvector (PostgreSQL) ──▶ HNSW index on vector column
    │
    ▼
cosine similarity query ──▶ top-K semantically similar results
```
## Solution

### Step 1: Install Dependencies

```shell
# Using uv (recommended) or pip
uv add google-generativeai psycopg2-binary pgvector python-dotenv

# Verify Gemini SDK
python -c "import google.generativeai as genai; print(genai.__version__)"
```

Expected: 0.8.x or higher
### Step 2: Set Up pgvector in PostgreSQL

```shell
# With Docker — fastest way to get pgvector locally
docker run -d \
  --name pgvector-db \
  -e POSTGRES_PASSWORD=secret \
  -e POSTGRES_DB=semantic_search \
  -p 5432:5432 \
  pgvector/pgvector:pg16
```

Connect and enable the extension:

```sql
-- Run once per database
CREATE EXTENSION IF NOT EXISTS vector;

-- Gemini text-embedding-004 outputs 768 dimensions
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    metadata JSONB,
    embedding vector(768)
);

-- HNSW index: faster queries than IVFFlat, no training step needed
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```
Why HNSW over IVFFlat: IVFFlat requires a training phase (you must INSERT data before building the index). HNSW builds incrementally — better for a pipeline where documents arrive continuously.
### Step 3: Generate Embeddings with Gemini 2.0

```python
import os

import google.generativeai as genai
from dotenv import load_dotenv

load_dotenv()
genai.configure(api_key=os.environ["GEMINI_API_KEY"])


def embed_for_storage(texts: list[str]) -> list[list[float]]:
    """Embed documents for indexing. task_type=RETRIEVAL_DOCUMENT
    tells Gemini to optimize for being retrieved, not for querying."""
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=texts,
        task_type="RETRIEVAL_DOCUMENT",
    )
    return result["embedding"]


def embed_for_query(query: str) -> list[float]:
    """Embed a search query. task_type=RETRIEVAL_QUERY produces
    a vector compatible with RETRIEVAL_DOCUMENT embeddings."""
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=query,
        task_type="RETRIEVAL_QUERY",
    )
    return result["embedding"]
```
The task_type distinction matters. Documents use RETRIEVAL_DOCUMENT; queries use RETRIEVAL_QUERY. Using the wrong task type for either side measurably reduces retrieval accuracy (Google reports ~5–10% recall drop in their benchmarks).
Available task types:
| task_type | Use for |
|---|---|
| `RETRIEVAL_DOCUMENT` | Chunks you store in the vector DB |
| `RETRIEVAL_QUERY` | User's search input |
| `SEMANTIC_SIMILARITY` | Comparing two pieces of text directly |
| `CLASSIFICATION` | Training classifiers on embeddings |
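For the `SEMANTIC_SIMILARITY` case you compare the two resulting vectors yourself. A minimal pure-Python cosine similarity sketch (shown with toy 3-dim vectors; in practice both inputs would be 768-dim embeddings returned by `embed_content`):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real 768-dim embeddings
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```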
### Step 4: Index Documents

```python
import psycopg2
from psycopg2.extras import Json, execute_values

DB_URL = "postgresql://postgres:secret@localhost:5432/semantic_search"


def index_documents(docs: list[dict]) -> None:
    """
    docs = [{"content": "...", "metadata": {...}}, ...]
    Batch embed then bulk insert — avoids N round-trips to the DB.
    """
    texts = [d["content"] for d in docs]
    # Gemini API allows up to 100 texts per embed_content call
    embeddings = embed_for_storage(texts)
    rows = [
        # Json() adapts the metadata dict for the jsonb column; str(emb)
        # renders the pgvector text format "[0.1, 0.2, ...]" for ::vector
        (d["content"], Json(d.get("metadata", {})), str(emb))
        for d, emb in zip(docs, embeddings)
    ]
    with psycopg2.connect(DB_URL) as conn:
        with conn.cursor() as cur:
            execute_values(
                cur,
                "INSERT INTO documents (content, metadata, embedding) VALUES %s",
                rows,
                template="(%s, %s::jsonb, %s::vector)",
            )
        conn.commit()


# Example usage
sample_docs = [
    {"content": "Electric vehicles reduce urban air pollution significantly.", "metadata": {"source": "env-report"}},
    {"content": "High-performance sports cars accelerate from 0 to 60 mph in under 3 seconds.", "metadata": {"source": "auto-blog"}},
    {"content": "Public transit cuts per-capita carbon emissions compared to private cars.", "metadata": {"source": "transit-study"}},
    {"content": "The latest GPU architecture doubles inference throughput for large language models.", "metadata": {"source": "tech-news"}},
]

index_documents(sample_docs)
print("Indexed", len(sample_docs), "documents")
### Step 5: Query by Semantic Similarity

```python
def semantic_search(query: str, top_k: int = 3) -> list[dict]:
    """Return top_k documents most semantically similar to query."""
    # pgvector accepts the text format "[0.1, 0.2, ...]" for ::vector
    query_vec = str(embed_for_query(query))
    with psycopg2.connect(DB_URL) as conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT content, metadata,
                       1 - (embedding <=> %s::vector) AS similarity
                FROM documents
                ORDER BY embedding <=> %s::vector  -- cosine distance, ascending
                LIMIT %s
                """,
                (query_vec, query_vec, top_k),
            )
            rows = cur.fetchall()
    return [
        {"content": row[0], "metadata": row[1], "similarity": round(row[2], 4)}
        for row in rows
    ]


# Test with a query that wouldn't match keywords
results = semantic_search("fast car", top_k=2)
for r in results:
    print(f"[{r['similarity']}] {r['content'][:80]}")
```
Expected output:

```
[0.8821] High-performance sports cars accelerate from 0 to 60 mph in under 3 seconds.
[0.6103] Electric vehicles reduce urban air pollution significantly.
```
"Fast car" correctly retrieves "high-performance sports cars" even with zero keyword overlap.
### Step 6: Handle Batching for Large Document Sets

The Gemini API accepts up to 100 texts per call. For larger corpora, batch in chunks:

```python
import time


def index_large_corpus(docs: list[dict], batch_size: int = 100) -> None:
    """Process large document sets without hitting API limits."""
    for i in range(0, len(docs), batch_size):
        batch = docs[i : i + batch_size]
        index_documents(batch)
        # Gemini free tier: 1500 requests/min; paid: much higher
        # Add a small delay only if you're on the free tier
        if i + batch_size < len(docs):
            time.sleep(0.1)
        print(f"Indexed {min(i + batch_size, len(docs))}/{len(docs)} documents")
```
## Verification

Run the full pipeline end-to-end:

```shell
python -c "
from your_module import semantic_search
results = semantic_search('reducing carbon footprint transportation', top_k=3)
for r in results:
    print(r['similarity'], r['content'][:60])
"
```
You should see the transit and EV documents ranked above the GPU article — they're semantically closer to "carbon footprint transportation" even though neither document contains that exact phrase.
Check that the HNSW index is being used:

```sql
EXPLAIN ANALYZE
SELECT content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 3;
```

Look for `Index Scan using documents_embedding_idx` in the output. If you see `Seq Scan`, the planner decided a sequential scan was cheaper — common when the table holds only a handful of rows. Run `SET enable_seqscan = off;` in the session to confirm the index itself is usable.
## Production Considerations
Chunking strategy: Don't embed entire documents. Split into ~300–500 token chunks with 50-token overlap. Embedding a 10,000-word article produces a single vector that averages out all meaning — retrieval becomes imprecise.
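A minimal sketch of such a chunker (word counts as a rough token proxy here; a production pipeline would use a tokenizer-aware splitter for precise budgets):

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks.

    Words approximate tokens; swap in a real tokenizer for exact limits.
    `overlap` words repeat between adjacent chunks so sentences that
    straddle a boundary stay retrievable from both sides.
    """
    words = text.split()
    if len(words) <= chunk_size:
        return [text] if words else []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk then gets its own row (and embedding) in the `documents` table.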
Metadata filtering: Add a WHERE metadata->>'source' = 'tech-news' clause before the ORDER BY to filter before ranking. This is cheaper than post-filtering after a top-K retrieval.
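One way to sketch that is a small helper that assembles the filtered query (hypothetical function, not part of any library; the caller still supplies the query vector and LIMIT as parameters):

```python
def build_search_sql(source=None):
    """Build the similarity query, optionally filtered by metadata source.

    Returns (sql, extra_params). The WHERE clause narrows candidates
    before ranking, rather than discarding rows after a top-K retrieval.
    """
    where = ""
    extra_params = []
    if source is not None:
        where = "WHERE metadata->>'source' = %s"
        extra_params.append(source)
    sql = f"""
        SELECT content, metadata, 1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        {where}
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """
    return sql, extra_params

# Usage sketch: interleave extra_params with the vector and top_k
sql, extra_params = build_search_sql("tech-news")
```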
Embedding freshness: Embeddings are tied to the model version. If you switch from text-embedding-004 to a future model, re-embed your entire corpus. Store the model name in your documents table to track this.
```sql
-- Add model version tracking to your schema
ALTER TABLE documents ADD COLUMN embedding_model TEXT DEFAULT 'text-embedding-004';
```
Rate limits: The Gemini API free tier allows 1,500 embed requests per minute. Each request can contain up to 100 texts. At 100 docs/request, that's 150,000 documents per minute — sufficient for most batch indexing jobs.
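If you do hit rate-limit errors during bulk indexing, an exponential-backoff wrapper helps. A generic sketch (the specific exception class to catch depends on the SDK version, so this version retries on any exception raised by the callable):

```python
import time


def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on failure.

    Delays grow as base_delay, 2x, 4x, ... `sleep` is injectable
    so the retry logic can be tested without real waiting.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Usage sketch (hypothetical): with_backoff(lambda: embed_for_storage(batch))
```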
## What You Learned

- `task_type` is unique to Gemini embeddings and meaningfully improves retrieval recall
- HNSW indexes in pgvector don't require pre-training, making them better for streaming ingestion than IVFFlat
- The `<=>` operator in pgvector computes cosine distance — subtract from 1 to get similarity
- Batch embedding up to 100 texts per API call to stay within rate limits efficiently

When not to use this approach: For exact lookup (IDs, SKUs, codes), stick with B-tree indexes and `=` queries. Semantic search introduces approximate matching — a similarity score of 0.95 is not a guarantee of correctness.
Tested on google-generativeai 0.8.3, pgvector 0.7.0, PostgreSQL 16, Python 3.12, Ubuntu 24.04