RAG (Retrieval-Augmented Generation)
Browse articles on RAG (Retrieval-Augmented Generation) — tutorials, guides, and in-depth comparisons.
RAG (Retrieval-Augmented Generation) gives LLMs access to your private data without fine-tuning. You embed your documents as vectors, store them in a vector database, retrieve the most relevant chunks at query time, and inject them into the LLM prompt as context.
RAG Pipeline Architecture
```text
User query
    ↓
Embedding model (converts the query to a vector)
    ↓
Vector store (finds similar document chunks)
    ↓
Reranker (optional: re-scores results for precision)
    ↓
LLM prompt (query + retrieved context)
    ↓
Answer
```
Vector Database Options
| DB | Self-host | Cloud | Best for |
|---|---|---|---|
| pgvector | ✅ (PostgreSQL) | Supabase, Neon | Existing Postgres stack |
| Qdrant | ✅ Docker | ✅ | Production, filtering, quantization |
| Chroma | ✅ | ❌ | Local dev, prototyping |
| Pinecone | ❌ | ✅ | Managed, no infra |
| Weaviate | ✅ | ✅ | GraphQL queries, hybrid search |
Quick Start with pgvector
```sql
-- Enable pgvector in PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a table with a vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(1536), -- OpenAI text-embedding-3-small dimensions
    metadata JSONB
);

-- Create an HNSW index for fast approximate similarity search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Query: find the 5 most similar documents
SELECT content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 5;
```
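The `<=>` operator returns cosine *distance*, which is why the query computes `1 - (embedding <=> $1::vector)` to get similarity. The relationship is easy to verify in plain Python:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's <=> operator computes: 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

a = [1.0, 0.0]
b = [1.0, 0.0]
c = [0.0, 1.0]

print(cosine_distance(a, b))      # 0.0 — identical direction
print(cosine_distance(a, c))      # 1.0 — orthogonal
print(1 - cosine_distance(a, c))  # 0.0 — the SQL's similarity expression
```

Smaller distance means more similar, which is why the SQL orders ascending (`ORDER BY embedding <=> $1::vector`) to put the best matches first.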
Learning Path
- Basic RAG — chunking, embedding, store, retrieve, answer
- Chunking strategies — fixed size vs semantic vs recursive, overlap
- Embedding models — OpenAI vs Cohere vs local (nomic-embed, BGE)
- Hybrid search — combine dense vectors with BM25 keyword search
- Reranking — cross-encoder models for precision (Cohere Rerank, BGE)
- Agentic RAG — self-querying, multi-step retrieval, query decomposition
- Evaluation — RAGAS metrics: faithfulness, relevancy, context recall
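The first strategy on the path, fixed-size chunking with overlap, can be sketched in a few lines. The parameters here (200-character chunks, 50-character overlap) are illustrative; real pipelines tune them per corpus and often measure in tokens rather than characters.

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks.

    Consecutive chunks share `overlap` characters, so a sentence cut at
    one chunk boundary still appears whole in the neighboring chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "".join(str(i % 10) for i in range(500))
chunks = chunk(doc, size=200, overlap=50)
print(len(chunks))     # 3 chunks covering [0:200], [150:350], [300:500]
print(len(chunks[0]))  # 200
```

The end of each chunk repeats as the start of the next (`chunks[0][-50:] == chunks[1][:50]`), which is the overlap doing its job; semantic and recursive splitters refine *where* the cuts land, but keep this same windowing idea.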
- Build a Multi-Source RAG with LlamaIndex in 20 Minutes
- Build a Production RAG Pipeline in 30 Minutes with LangChain 0.5
- Build a RAG System with Python and Pinecone in 45 Minutes
- Stop Building Keyword Search: Build AI-Powered Semantic Search with Pinecone in 30 Minutes
- How to Choose the Right Vector Database: Pinecone vs. Weaviate vs. Milvus in 2025
- ChromaDB vs Pinecone with Ollama: Complete Vector Database Comparison Guide
- BGE-M3 Embedding Setup: Multi-Modal RAG System Implementation Guide
- How to Implement RAG with Transformers: Complete Retrieval-Augmented Generation Guide
- Testing Strategies for LangChain and LlamaIndex Applications: Complete Guide for 2025
- LlamaIndex Graph Integration: Neo4j and Knowledge Graphs Guide
- How to Create Custom Retrievers in LlamaIndex: Complete Developer Guide
- How to Profile LlamaIndex Applications for Memory Bottlenecks
- BLEU vs ROUGE vs BERTScore: Choose the Right LLM Metric for Your Project
- Vector Database Comparison: Pinecone vs Weaviate vs Chroma - Complete Guide 2025
- Embedding Models Comparison: OpenAI vs Sentence-Transformers Performance Analysis
- Building RAG Pipelines with LangChain 1.0: A Practical Guide