
RAG (Retrieval-Augmented Generation)

Browse articles on RAG (Retrieval-Augmented Generation) — tutorials, guides, and in-depth comparisons.

RAG (Retrieval-Augmented Generation) gives LLMs access to your private data without fine-tuning. You split documents into chunks, store the chunks as vector embeddings, retrieve the most relevant ones at query time, and inject them into the LLM prompt as context.

RAG Pipeline Architecture

User query
    ↓
Embedding model (converts query to vector)
    ↓
Vector store (finds similar document chunks)
    ↓
Reranker (optional: re-scores for precision)
    ↓
LLM prompt (query + retrieved context)
    ↓
Answer
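The pipeline above can be sketched end-to-end in a few lines of Python. This is an illustrative toy, not a real implementation: the bag-of-words `embed` function stands in for a real embedding model, and the in-memory list stands in for a vector store. All names (`embed`, `retrieve`, `build_prompt`, `VOCAB`) are invented for this sketch.

```python
import math
from collections import Counter

# Toy embedder: bag-of-words counts over a tiny fixed vocabulary.
# A real pipeline would call an embedding model here instead.
VOCAB = ["postgres", "index", "vector", "chunk", "cache", "redis"]

def embed(text: str) -> list[float]:
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# "Vector store": document chunks embedded once at ingest time.
docs = [
    "postgres stores each chunk with a vector index",
    "redis is an in-memory cache",
]
store = [(d, embed(d)) for d in docs]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank stored chunks by similarity to the query vector.
    qv = embed(query)
    ranked = sorted(store, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    # Inject the retrieved chunks into the LLM prompt as context.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("how does postgres index a vector chunk"))
```

Swapping in a real embedding model and a real vector store changes only `embed` and `store`; the retrieve-then-prompt shape stays the same.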

Vector Database Options

DB       | Self-host       | Cloud          | Best for
---------|-----------------|----------------|-------------------------------------
pgvector | ✅ (PostgreSQL) | Supabase, Neon | Existing Postgres stack
Qdrant   | ✅ (Docker)     | Qdrant Cloud   | Production, filtering, quantization
Chroma   | ✅ (pip/Docker) | Chroma Cloud   | Local dev, prototyping
Pinecone | —               | ✅ (managed)   | Managed, no infra
Weaviate | ✅ (Docker)     | Weaviate Cloud | GraphQL queries, hybrid search

Quick Start with pgvector

-- Enable pgvector in PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a table with vector column
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding VECTOR(1536),  -- OpenAI text-embedding-3-small dimensions
  metadata JSONB
);

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Query: find 5 most similar documents
SELECT content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;
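The `<=>` operator in the query above returns cosine distance, which is why the SQL computes similarity as `1 - distance`. The arithmetic behind it can be checked in plain Python (vectors here are made up for illustration):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # Cosine distance = 1 - cosine similarity, the same quantity
    # pgvector's <=> operator returns for vector_cosine_ops.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query_vec = [3.0, 4.0]
same_dir  = [6.0, 8.0]   # same direction, different magnitude
orthog    = [4.0, -3.0]  # orthogonal to the query

print(1 - cosine_distance(query_vec, same_dir))  # → 1.0
print(1 - cosine_distance(query_vec, orthog))    # → 0.0
```

Note that cosine distance ignores magnitude: a vector pointing the same way scores similarity 1.0 regardless of its length, which is why embedding models can be compared without normalizing first.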

Learning Path

  1. Basic RAG — chunking, embedding, store, retrieve, answer
  2. Chunking strategies — fixed size vs semantic vs recursive, overlap
  3. Embedding models — OpenAI vs Cohere vs local (nomic-embed, BGE)
  4. Hybrid search — combine dense vectors with BM25 keyword search
  5. Reranking — cross-encoder models for precision (Cohere Rerank, BGE)
  6. Agentic RAG — self-querying, multi-step retrieval, query decomposition
  7. Evaluation — RAGAS metrics: faithfulness, relevancy, context recall
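The fixed-size-with-overlap strategy from step 2 can be sketched in a few lines. `chunk_fixed` is a hypothetical helper written for this example; it counts characters for simplicity, whereas production chunkers typically count tokens and the semantic/recursive strategies split on sentence or section boundaries instead:

```python
def chunk_fixed(text: str, size: int, overlap: int) -> list[str]:
    # Slide a window of `size` characters forward by (size - overlap),
    # so consecutive chunks share `overlap` characters of context.
    assert 0 <= overlap < size
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = chunk_fixed("abcdefghijkl", size=6, overlap=2)
print(chunks)  # → ['abcdef', 'efghij', 'ijkl']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of some storage duplication.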
