# RAG (Retrieval-Augmented Generation)
Browse articles on RAG: tutorials, guides, and in-depth comparisons.
RAG (Retrieval-Augmented Generation) gives LLMs access to your private data without fine-tuning. You store documents as vector embeddings, retrieve the relevant ones at query time, and inject them into the LLM prompt as context.
## RAG Pipeline Architecture

```
User query
    ↓
Embedding model (converts the query to a vector)
    ↓
Vector store (finds similar document chunks)
    ↓
Reranker (optional: re-scores candidates for precision)
    ↓
LLM prompt (query + retrieved context)
    ↓
Answer
```
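The pipeline above can be sketched end to end in a few lines of Python. This is a toy illustration: the bag-of-words `embed` function and the tiny corpus are stand-ins for a real embedding model and vector store, which would replace them in production.

```python
import math

# Toy corpus standing in for a document store.
DOCS = [
    "RAG combines vector search with an LLM",
    "My dog chased the cat",
]

# Fixed vocabulary for the toy embedding below.
VOCAB = ["rag", "vector", "llm", "cat", "dog"]

def embed(text):
    # Bag-of-words toy embedding; a real pipeline would call an
    # embedding model such as OpenAI text-embedding-3-small.
    words = [w.strip("?.,!").lower() for w in text.split()]
    return [float(words.count(v)) for v in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# "Vector store": each document paired with its embedding.
INDEX = [(doc, embed(doc)) for doc in DOCS]

def retrieve(query, k=1):
    # Rank stored chunks by cosine similarity to the query vector.
    qv = embed(query)
    ranked = sorted(INDEX, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Inject the retrieved context into the LLM prompt.
question = "How does vector retrieval help an LLM?"
context = retrieve(question)[0]
prompt = f"Context:\n{context}\n\nQuestion: {question}"
```

The `prompt` string is what would be sent to the LLM: retrieved context first, user question after, exactly as in the last pipeline stage above.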
## Vector Database Options
| DB | Self-host | Cloud | Best for |
|---|---|---|---|
| pgvector | ✅ (PostgreSQL) | Supabase, Neon | Existing Postgres stack |
| Qdrant | ✅ Docker | ✅ | Production, filtering, quantization |
| Chroma | ✅ | ❌ | Local dev, prototyping |
| Pinecone | ❌ | ✅ | Managed, no infra |
| Weaviate | ✅ | ✅ | GraphQL queries, hybrid search |
## Quick Start with pgvector

```sql
-- Enable pgvector in PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a table with a vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(1536), -- OpenAI text-embedding-3-small dimensions
    metadata JSONB
);

-- Create an HNSW index for fast similarity search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Query: find the 5 most similar documents
SELECT content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;
```
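In the query above, `<=>` is pgvector's cosine-distance operator, which is why `1 - (embedding <=> $1::vector)` recovers cosine similarity and `ORDER BY embedding <=> $1::vector` puts the most similar rows first. A plain-Python sketch of the same math (the formula, not pgvector's indexed implementation):

```python
import math

def cosine_distance(a, b):
    # Mirrors the math behind pgvector's <=> operator:
    # distance = 1 - cosine similarity, so 0 means same direction,
    # 1 means orthogonal, and 2 means opposite vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Ascending distance = descending similarity, matching the SQL ORDER BY.
```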
## Learning Path
- Basic RAG — chunking, embedding, store, retrieve, answer
- Chunking strategies — fixed size vs semantic vs recursive, overlap
- Embedding models — OpenAI vs Cohere vs local (nomic-embed, BGE)
- Hybrid search — combine dense vectors with BM25 keyword search
- Reranking — cross-encoder models for precision (Cohere Rerank, BGE)
- Agentic RAG — self-querying, multi-step retrieval, query decomposition
- Evaluation — RAGAS metrics: faithfulness, relevancy, context recall
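The fixed-size-with-overlap strategy from the chunking item above can be sketched as follows. The function name and default sizes are illustrative, not from any particular library:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Slide a chunk_size window over the text, advancing by
    # chunk_size - overlap characters so neighbouring chunks share
    # `overlap` characters of context across the boundary.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks
```

Each chunk would then be embedded and stored; the overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.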
## Articles
- RAG Evaluation: RAGAS Metrics for Production Systems 2026
- Manage RAG Context Windows: Chunk Strategy Guide 2026
- Filter RAG Search Results with Document Metadata Tags 2026
- Build RAG with Tables: Extract Data from PDFs and Excel 2026
- Build RAG Reranking with Cohere and FlashRank for Better Retrieval 2026
- Build RAG Guardrails: Prevent Hallucination with Validation 2026
- Build Multimodal RAG with Images: Python Retrieval Tutorial 2026
- Build LlamaIndex Workflows: Complex Agentic RAG Patterns 2026
- Build LlamaIndex Property Graph: Knowledge Graph RAG 2026
- Build GraphRAG: Knowledge Graph Enhanced Retrieval Guide 2026
- Build Contextual Retrieval RAG: Anthropic's Technique Explained 2026
- Build ColBERT RAG Pipeline: Late Interaction Retrieval with PLAID 2026
- Build BGE Reranker: Cross-Encoder Reranking for Better RAG 2026
- Build Agentic RAG: Self-Querying and Adaptive Retrieval 2026
- Build a Local RAG Pipeline with Ollama and LangChain 2026
- Benchmark Cohere Command R+: Enterprise RAG Performance 2026
- Fine-Tune LlamaIndex Embeddings for Domain Adaptation 2026
- Gemini 2.0 Embeddings API: Semantic Search Implementation Guide
- Gemini 2.0 with LangChain: Production RAG Pipeline 2026
- Measuring RAG Quality: RAGAS Metrics, Answer Relevance, and Catching Hallucinations
- RAG vs Long Context: Do Vector Databases Still Matter in 2026?
- Troubleshoot Stale Data in Real-Time Streaming RAG Pipelines
- PostgreSQL pgvector vs Dedicated Vector DBs: Which Should You Use?
- Pinecone vs. Qdrant 2026: Which Handles Billion-Scale Vectors Best?
- Pick the Right Embedding Model: OpenAI vs. BGE-M3
- How to Build an Audio RAG System: Searching Podcasts by Meaning
- Fix Lost in the Middle Syndrome in RAG Retrieval
- Deploy a RAG API on Cloudflare Workers in 30 Minutes
- Build a RAG Chatbot for a 10,000-Page Legal Corpus