CrewAI with RAG: Add a Knowledge Base to Your Agent Teams

Connect a vector knowledge base to CrewAI agents using pgvector and LangChain. Give your crew retrieval tools that answer from your own docs.

Problem: Your CrewAI Agents Don't Know Your Docs

CrewAI agents are good at reasoning. They're useless at remembering your internal documentation, product specs, or support history — because they've never seen it.

Without RAG, agents hallucinate answers or refuse to respond. With it, they retrieve the exact context they need before acting.

You'll learn:

  • How to embed your documents into pgvector
  • How to wrap a retriever as a CrewAI Tool
  • How to assign that tool to specific agents in a crew

Time: 25 min | Difficulty: Intermediate


Why Agents Need RAG, Not Just Prompts

Stuffing documents into a system prompt works for one or two pages. It breaks at scale:

  • Context windows fill up fast with large doc sets
  • Agents pay attention to the whole prompt, not just relevant chunks
  • Token costs multiply across every task the crew runs

RAG fixes this by retrieving only the 3–5 chunks an agent actually needs, at the moment it needs them.

Symptoms you need RAG:

  • Agent answers contradict your actual product docs
  • You're hitting context limits mid-crew
  • Different agents need different knowledge bases
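The token-cost point is easy to see with back-of-envelope arithmetic. The numbers below are illustrative assumptions (a 200-page doc set at ~500 tokens per page, a 10-task crew run, a hypothetical input price), not measured values:

```python
# Rough cost comparison: prompt-stuffing vs. RAG retrieval.
# All numbers are illustrative assumptions, not measured values.

PAGES = 200
TOKENS_PER_PAGE = 500        # assumed average
TASKS_PER_RUN = 10           # tasks the crew executes per run
PRICE_PER_M_TOKENS = 2.50    # assumed input price, USD per 1M tokens

# Stuffing: every task re-sends the whole doc set in the prompt
stuffed_tokens = PAGES * TOKENS_PER_PAGE * TASKS_PER_RUN

# RAG: every task retrieves only 5 chunks of ~512 tokens
rag_tokens = 5 * 512 * TASKS_PER_RUN

print(f"stuffed: {stuffed_tokens:,} tokens "
      f"(${stuffed_tokens / 1e6 * PRICE_PER_M_TOKENS:.2f})")
print(f"rag:     {rag_tokens:,} tokens "
      f"(${rag_tokens / 1e6 * PRICE_PER_M_TOKENS:.2f})")
```

Under these assumptions, stuffing burns roughly 40x the input tokens of retrieval, on every single crew run.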

Architecture Overview

User task
    │
    ▼
Crew (CrewAI)
    ├── Research Agent  ──uses──▶  KnowledgeBaseTool
    │                                   │
    │                             pgvector / Qdrant
    │                             (your embedded docs)
    │
    └── Writer Agent   ──uses──▶  (no retrieval needed)

The knowledge base is a standalone vector store. Only agents that need retrieval get the tool — agents that synthesize or write don't need it and run faster without it.


Solution

Step 1: Install Dependencies

# Requires Python 3.11+
pip install crewai crewai-tools langchain-community langchain-openai pgvector psycopg2-binary pypdf

Verify the install:

python -c "import crewai; print(crewai.__version__)"
# Expected: 0.80.x or later

Step 2: Embed Your Documents into pgvector

Start a local Postgres instance with the pgvector extension:

docker run -d \
  --name pgvector \
  -e POSTGRES_PASSWORD=secret \
  -e POSTGRES_DB=knowledge \
  -p 5432:5432 \
  pgvector/pgvector:pg16

Now ingest your documents. This script chunks PDFs and markdown files and stores embeddings in Postgres:

# ingest.py
import os
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.pgvector import PGVector

CONNECTION_STRING = "postgresql+psycopg2://postgres:secret@localhost:5432/knowledge"
COLLECTION_NAME = "company_docs"

def ingest(docs_path: str):
    # DirectoryLoader's glob doesn't expand brace patterns like {pdf,md},
    # so load PDFs and markdown separately and combine
    pdf_loader = DirectoryLoader(
        docs_path, glob="**/*.pdf", loader_cls=PyPDFLoader, show_progress=True
    )
    md_loader = DirectoryLoader(docs_path, glob="**/*.md", show_progress=True)
    raw_docs = pdf_loader.load() + md_loader.load()

    # 512-token chunks with 64-token overlap — works well for factual Q&A
    # (from_tiktoken_encoder measures chunk_size in tokens, not characters)
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=512, chunk_overlap=64
    )
    chunks = splitter.split_documents(raw_docs)

    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

    PGVector.from_documents(
        documents=chunks,
        embedding=embeddings,
        collection_name=COLLECTION_NAME,
        connection_string=CONNECTION_STRING,
    )
    print(f"Ingested {len(chunks)} chunks from {len(raw_docs)} documents")

if __name__ == "__main__":
    ingest("./docs")

Run it once before starting the crew:

OPENAI_API_KEY=sk-... python ingest.py
# Expected: Ingested 847 chunks from 23 documents

Step 3: Build the Retriever Tool

CrewAI tools are Pydantic models that subclass BaseTool, with name and description declared as typed fields. Wrap your retriever here:

# tools/knowledge_tool.py
from crewai.tools import BaseTool
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.pgvector import PGVector
from pydantic import Field

CONNECTION_STRING = "postgresql+psycopg2://postgres:secret@localhost:5432/knowledge"
COLLECTION_NAME = "company_docs"

class KnowledgeBaseTool(BaseTool):
    name: str = "knowledge_base_search"
    description: str = (
        "Search the company knowledge base for product documentation, "
        "support history, and internal policies. "
        "Input: a specific question or search phrase. "
        "Returns: up to 5 relevant text chunks."
    )
    # k controls how many chunks come back — 5 is a good default
    k: int = Field(default=5)
    # Override to point the tool at a different pgvector collection
    collection_name: str = Field(default=COLLECTION_NAME)

    def _run(self, query: str) -> str:
        embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
        store = PGVector(
            collection_name=self.collection_name,
            connection_string=CONNECTION_STRING,
            embedding_function=embeddings,
        )
        results = store.similarity_search(query, k=self.k)
        if not results:
            return "No relevant documents found."
        # Return chunks with source metadata so agents can cite sources
        return "\n\n---\n\n".join(
            f"Source: {doc.metadata.get('source', 'unknown')}\n{doc.page_content}"
            for doc in results
        )

The description field is critical. CrewAI's LLM reads it to decide when to call the tool. Be specific about what it knows and what format input should take.
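You can sanity-check the chunk-joining logic in _run without a live database by running the same join over stub documents. The StubDoc class here is a minimal stand-in for LangChain's Document, and format_results mirrors the formatting code above:

```python
from dataclasses import dataclass, field

@dataclass
class StubDoc:
    # Minimal stand-in for langchain_core.documents.Document
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_results(results) -> str:
    """Same formatting _run applies to similarity_search results."""
    if not results:
        return "No relevant documents found."
    return "\n\n---\n\n".join(
        f"Source: {doc.metadata.get('source', 'unknown')}\n{doc.page_content}"
        for doc in results
    )

docs = [
    StubDoc("Refunds are available within 30 days.", {"source": "refunds.md"}),
    StubDoc("Annual plans renew automatically.", {}),  # no source metadata
]
print(format_results(docs))
# First chunk is prefixed "Source: refunds.md", second "Source: unknown"
```

This is also a convenient seam for unit tests: the formatting stays deterministic even though the retrieval itself is not.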


Step 4: Define Agents and Assign the Tool

# crew.py
from crewai import Agent, Task, Crew, Process, LLM
from tools.knowledge_tool import KnowledgeBaseTool

# CrewAI 0.80+ routes model calls through its own LLM wrapper (LiteLLM under the hood)
llm = LLM(model="gpt-4o", temperature=0.1)
knowledge_tool = KnowledgeBaseTool()

# Research agent gets the retrieval tool
research_agent = Agent(
    role="Product Research Specialist",
    goal="Find accurate information from company documentation to answer user questions.",
    backstory=(
        "You are meticulous and never guess. "
        "If the knowledge base doesn't contain an answer, you say so."
    ),
    tools=[knowledge_tool],
    llm=llm,
    verbose=True,
)

# Writer agent synthesizes — no retrieval needed, faster and cheaper
writer_agent = Agent(
    role="Technical Writer",
    goal="Turn research findings into a clear, concise response for the end user.",
    backstory="You write in plain English. You never add information beyond what research provides.",
    tools=[],  # intentionally empty — synthesis only
    llm=llm,
    verbose=True,
)

Keeping the writer agent tool-free is intentional. It can't hallucinate from retrieval if it never calls the retriever.


Step 5: Define Tasks and Run the Crew

# crew.py (continued)

def answer_question(user_question: str) -> str:
    research_task = Task(
        description=f"Research the following question using the knowledge base: {user_question}",
        expected_output="A detailed answer with source citations from the knowledge base.",
        agent=research_agent,
    )

    write_task = Task(
        description="Rewrite the research output as a clear, user-facing response. No new information.",
        expected_output="A 2–4 paragraph response ready to send to the user.",
        agent=writer_agent,
        context=[research_task],  # passes research output as context
    )

    crew = Crew(
        agents=[research_agent, writer_agent],
        tasks=[research_task, write_task],
        process=Process.sequential,
        verbose=True,
    )

    result = crew.kickoff()
    return result.raw

if __name__ == "__main__":
    answer = answer_question("What is our refund policy for annual subscriptions?")
    print(answer)

Verification

Run the crew end to end:

OPENAI_API_KEY=sk-... python crew.py

You should see:

[Research Specialist] Using tool: knowledge_base_search
[Research Specialist] Input: refund policy annual subscriptions
[Research Specialist] Retrieved 5 chunks from knowledge base...
[Technical Writer] Synthesizing research output...

Final Answer:
Annual subscription refunds are available within 30 days of purchase...

If the research agent skips the tool and answers from memory, tighten the description on your tool and add this to the agent's backstory: "You always search the knowledge base before answering. Never rely on prior knowledge."

Confirm retrieval is hitting Postgres:

docker exec -it pgvector psql -U postgres -d knowledge \
  -c "SELECT COUNT(*) FROM langchain_pg_embedding;"
# Should match your ingested chunk count

Handling Multiple Knowledge Bases

Different agents can use different retrieval tools. Give each tool a distinct name and description:

hr_tool = KnowledgeBaseTool(
    name="hr_policy_search",
    description="Search HR policies, leave entitlements, and onboarding documents.",
    collection_name="hr_policies",
    k=3,
)

engineering_tool = KnowledgeBaseTool(
    name="engineering_docs_search",
    description="Search API references, architecture diagrams, and runbooks.",
    collection_name="engineering_docs",
    k=5,
)

hr_agent = Agent(..., tools=[hr_tool])
eng_agent = Agent(..., tools=[engineering_tool])

CrewAI doesn't route tool calls itself: at inference time, the agent's LLM picks a tool by reading its name and description, so the more specific your descriptions are, the fewer wrong tool calls you'll see.


Production Considerations

Embedding model: text-embedding-3-small costs ~$0.02 per million tokens. For private deployments, swap in an Ollama embedding model (nomic-embed-text) by replacing OpenAIEmbeddings with OllamaEmbeddings.

Chunk size: 512 tokens works well for factual Q&A. For longer narrative documents (legal contracts, long-form guides), increase to 1024 with 128-token overlap.
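Because consecutive chunks share the overlap, chunk count is driven by the stride (chunk size minus overlap). A quick worked comparison of the two settings for a hypothetical 100,000-token document:

```python
import math

def chunk_count(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Approximate chunks produced by a splitter with overlapping windows."""
    stride = chunk_size - overlap
    return max(1, math.ceil((total_tokens - overlap) / stride))

doc_tokens = 100_000  # hypothetical document size
print(chunk_count(doc_tokens, 512, 64))    # 224 chunks at the Q&A settings
print(chunk_count(doc_tokens, 1024, 128))  # 112 chunks at the narrative settings
```

Doubling both chunk size and overlap roughly halves the chunk count (and thus the embedding cost), at the price of coarser retrieval granularity.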

Re-ingestion: Run ingest.py as part of your CI pipeline when docs change. PGVector.from_documents with the same collection name appends duplicates rather than upserting. Either pass pre_delete_collection=True to from_documents, or clear the collection explicitly first:

# Before re-ingesting, clear the collection (embeddings as in ingest.py)
store = PGVector(
    connection_string=CONNECTION_STRING,
    collection_name=COLLECTION_NAME,
    embedding_function=embeddings,
)
store.delete_collection()

Observability: Wrap tool calls with LangSmith tracing to see exactly which chunks agents retrieved and whether they used them correctly.


What You Learned

  • RAG in CrewAI is a BaseTool wrapping a LangChain retriever — not a built-in feature
  • The tool description drives agent routing — write it like a search index entry
  • Separating retrieval agents from synthesis agents reduces hallucination and token cost
  • pgvector is the lowest-friction vector store if you're already on Postgres

Limitation: CrewAI doesn't natively deduplicate retrieved chunks across multiple tool calls in one task. If an agent calls the knowledge tool twice with similar queries, it may receive overlapping context. For high-accuracy use cases, add a reranking step (Cohere Rerank or a cross-encoder) before returning results from _run.
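Short of a full reranker, you can cheaply drop near-identical chunks before returning them from _run. A minimal sketch using word-level Jaccard similarity; the 0.8 threshold and the whitespace tokenization are assumptions to tune for your corpus:

```python
def dedupe_chunks(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Drop chunks whose word sets overlap heavily with an already-kept chunk."""
    kept: list[str] = []
    for chunk in chunks:
        words = set(chunk.lower().split())
        duplicate = False
        for existing in kept:
            other = set(existing.lower().split())
            union = words | other
            # Jaccard similarity: shared words over total distinct words
            if union and len(words & other) / len(union) >= threshold:
                duplicate = True
                break
        if not duplicate:
            kept.append(chunk)
    return kept

chunks = [
    "Refunds are available within 30 days of purchase.",
    "Refunds are available within 30 days of purchase.",  # exact duplicate
    "Annual plans renew automatically each year.",
]
print(len(dedupe_chunks(chunks)))  # 2
```

This keeps the first occurrence of each near-duplicate, preserving retrieval order; a cross-encoder reranker is still the better choice when accuracy matters more than latency.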

Tested on CrewAI 0.80.0, LangChain 0.3.x, pgvector 0.7.0, Python 3.12, Ubuntu 24.04