Problem: LangChain RAG Pipelines Are Painful to Wire Up
Building a RAG pipeline from scratch means writing ingestion scripts, embedding wrappers, retriever logic, and prompt chains — then re-deploying every time something changes.
n8n's LangChain nodes let you build the same pipeline visually, trigger it on a schedule or webhook, and iterate without touching Python.
You'll learn:
- How to ingest documents, chunk them, and store embeddings in a vector DB using n8n
- How to wire a retrieval + generation chain with n8n's AI Agent node
- How to trigger the full RAG pipeline from a webhook for real-time Q&A
Time: 25 min | Difficulty: Intermediate
Why This Works in n8n
n8n 1.x ships with first-class LangChain support: dedicated nodes for document loaders, text splitters, embedding models, vector stores, and chain execution. These are not HTTP wrappers — they use the LangChain.js SDK under the hood.
What you can build without writing code:
- Document ingestion pipelines (PDF, URL, Google Drive)
- Embedding + vector store upsert flows
- Retrieval-augmented generation chains
- Multi-step AI agents with memory
This article uses n8n 1.82+, OpenAI embeddings, and Qdrant as the vector store. Swap any of those for the provider you prefer — the node structure stays the same.
Solution
Step 1: Self-Host n8n with Docker
Early n8n releases gated the LangChain nodes behind an experimental feature flag; on 1.82+ the AI nodes are GA and no flag is needed. Run this Docker command:
docker run -d \
--name n8n \
-p 5678:5678 \
-e N8N_ENCRYPTION_KEY=your-secret-key \
-e OPENAI_API_KEY=sk-... \
-v ~/.n8n:/home/node/.n8n \
n8nio/n8n:1.82.0
The container starts detached, so docker run only prints the container ID. Confirm it came up with docker logs n8n, then open http://localhost:5678 and create your owner account. Add your OpenAI API key as a credential inside the n8n UI as well — the container environment variable alone doesn't register a credential for the AI nodes.
If it fails:
- Error: EACCES permission denied → run sudo chown -R 1000:1000 ~/.n8n first
- AI nodes missing from palette → confirm you're on 1.82+ with docker inspect n8n | grep Image
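Before moving on, you can confirm the instance is healthy via n8n's built-in health endpoint (available on recent 1.x releases):

```shell
# Should return {"status":"ok"} once the editor is ready
curl -s http://localhost:5678/healthz
```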
Step 2: Create the Document Ingestion Workflow
In n8n, create a new workflow. This workflow ingests documents into your vector store. You'll build it once and trigger it whenever your source data changes.
Nodes to add in order:
1. Manual Trigger (or Schedule Trigger for recurring ingestion)
No config needed for testing.
2. HTTP Request node — fetch your source document
Method: GET
URL: https://your-docs-site.com/api/content
Response Format: JSON
Or use the n8n Document Loader node directly:
Operation: Load from URL
URL: https://example.com/whitepaper.pdf
3. Recursive Character Text Splitter node
Chunk Size: 1000
Chunk Overlap: 200
Overlap of 200 preserves context across chunk boundaries — critical for retrieval quality.
4. Embeddings OpenAI node
Model: text-embedding-3-small
# text-embedding-3-small costs ~$0.02/1M tokens vs $0.13 for large
# Use large only for >100k chunk corpora where recall matters
5. Qdrant Vector Store node
Operation: Insert Documents
Qdrant URL: http://localhost:6333
Collection Name: docs-v1
Connect the nodes: Manual Trigger → HTTP Request → Text Splitter → Embeddings OpenAI → Qdrant.
Run the workflow. You should see green checkmarks across all nodes.
Expected output in Qdrant:
curl http://localhost:6333/collections/docs-v1
# Returns: {"result":{"vectors_count": 847, ...}}
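One more sanity check worth running: the collection's vector size must match your embedding model's output dimension — 1536 for text-embedding-3-small. A quick sketch, assuming jq is installed:

```shell
# Inspect the collection config; size should be 1536 for
# text-embedding-3-small. A mismatch means the collection was
# created against a different embedding model.
curl -s http://localhost:6333/collections/docs-v1 | jq '.result.config.params.vectors'
```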
Step 3: Start Qdrant Locally
If you don't have Qdrant running yet:
docker run -d \
--name qdrant \
-p 6333:6333 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant:v1.9.0
If n8n itself runs in Docker, localhost inside the n8n container refers to the container, not your machine. Use host.docker.internal:6333 in the n8n Qdrant node instead (on Linux, start the n8n container with --add-host=host.docker.internal:host-gateway), or put both containers on a shared Docker network.
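An alternative that sidesteps host.docker.internal entirely is a user-defined Docker network, where container names resolve as hostnames. A sketch, assuming the container names n8n and qdrant from the commands above:

```shell
docker network create rag-net
docker network connect rag-net n8n
docker network connect rag-net qdrant
# Then set the Qdrant URL in the n8n node to http://qdrant:6333
```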
Step 4: Build the RAG Q&A Workflow
Create a second workflow — this one handles incoming questions and returns answers.
Nodes to add:
1. Webhook node
HTTP Method: POST
Path: /ask
Authentication: Header Auth (add a secret header for production)
2. AI Agent node
This is the core node. Set:
Agent Type: Conversational Agent
System Prompt: |
You are a helpful assistant. Answer questions using only the
context provided by the retrieval tool. If the answer isn't
in the context, say "I don't know."
3. Qdrant Vector Store node (as a Tool for the Agent)
Operation: Retrieve Documents (as Tool)
Collection Name: docs-v1
Top K: 5
# Top 5 chunks balances context window usage vs recall
Connect it as a Tool input to the AI Agent node — drag from the Tools socket, not the main input.
4. OpenAI Chat Model node (connected to AI Agent)
Model: gpt-4o-mini
# gpt-4o-mini is 15x cheaper than gpt-4o for Q&A workloads
Temperature: 0
5. Respond to Webhook node
Response Body: {{ $json.output }}
Full node chain: Webhook → AI Agent (with Qdrant Tool + OpenAI Model) → Respond to Webhook.
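One n8n detail that trips people up: each webhook has two URLs. While you're listening in the editor, the path is prefixed with webhook-test; only an activated workflow answers on the production path. Assuming the /ask path and header secret configured above:

```shell
# Editor testing (click "Listen for test event" first):
curl -X POST http://localhost:5678/webhook-test/ask \
  -H "Content-Type: application/json" \
  -H "X-Secret: your-header-secret" \
  -d '{"question": "ping", "sessionId": "test-0"}'

# Production (after activating the workflow):
curl -X POST http://localhost:5678/webhook/ask \
  -H "Content-Type: application/json" \
  -H "X-Secret: your-header-secret" \
  -d '{"question": "ping", "sessionId": "test-0"}'
```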
Step 5: Add Memory for Multi-Turn Conversations
Single-turn Q&A is enough for search. For a chatbot, add session memory:
Add a Window Buffer Memory node to the AI Agent's Memory socket:
Session ID: {{ $('Webhook').item.json.sessionId }}
# Pass sessionId from the client with each request
Window Size: 10
# Keep last 10 message pairs; higher = more tokens per call
Your webhook payload now needs:
{
"question": "What does the whitepaper say about pricing?",
"sessionId": "user-abc-123"
}
n8n stores conversation history in its internal DB keyed by sessionId. No Redis needed.
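To see the memory in action, send two requests with the same sessionId — the follow-up only makes sense if history is retained:

```shell
SECRET="your-header-secret"  # the Header Auth value from Step 4

curl -s -X POST http://localhost:5678/webhook/ask \
  -H "Content-Type: application/json" -H "X-Secret: $SECRET" \
  -d '{"question": "What does the whitepaper say about pricing?", "sessionId": "demo-1"}'

# "that" resolves only via the history stored under demo-1
curl -s -X POST http://localhost:5678/webhook/ask \
  -H "Content-Type: application/json" -H "X-Secret: $SECRET" \
  -d '{"question": "Summarize that in one sentence.", "sessionId": "demo-1"}'
```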
Verification
Test the ingestion workflow first:
# Trigger ingestion manually in n8n UI, then check Qdrant
# (the count endpoint is POST, not GET)
curl -X POST http://localhost:6333/collections/docs-v1/points/count \
  -H "Content-Type: application/json" -d '{"exact": true}'
# Expected: {"result":{"count": 847}, ...}
Then test the Q&A webhook:
curl -X POST http://localhost:5678/webhook/ask \
-H "Content-Type: application/json" \
-H "X-Secret: your-header-secret" \
-d '{"question": "What are the main findings?", "sessionId": "test-1"}'
You should see a JSON response whose output field contains a grounded answer, typically within 3–5 seconds.
Check that retrieval is working — not just generation:
# In n8n, enable "Log" on the Qdrant node
# After a test run, inspect the output to confirm chunks are being retrieved
# If output.documents is empty, the collection name or embedding model may not match
Production Considerations
Re-ingestion strategy: When source docs change, don't append — delete the collection and re-ingest. Qdrant makes this a one-line operation and prevents stale chunks from polluting retrieval.
curl -X DELETE http://localhost:6333/collections/docs-v1
Then re-run your ingestion workflow.
Chunking tuning: 1000 characters / 200 overlap (the Recursive Character Text Splitter measures characters, not tokens) works for most prose. For code docs or structured data, drop chunk size to 500 and overlap to 50. Retrieval recall drops sharply when chunks are too large to fit a complete concept.
Model cost: text-embedding-3-small + gpt-4o-mini keeps a 10k-chunk corpus + 1000 daily queries under $5/month. Switch to text-embedding-3-large + gpt-4o only if you're seeing factual errors on complex queries.
Webhook security: Always add Header Auth or Basic Auth to n8n webhooks before exposing them publicly. The built-in auth options are under Webhook node → Authentication.
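With Header Auth enabled, a request that omits the secret header should be rejected before the workflow executes — easy to verify:

```shell
# No X-Secret header → n8n should refuse the request
curl -i -X POST http://localhost:5678/webhook/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "should fail", "sessionId": "test-1"}'
```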
What You Learned
- n8n's LangChain nodes handle ingestion, embedding, and retrieval without boilerplate code
- The AI Agent node with a Qdrant Tool input is the cleanest way to build retrieval-augmented generation in n8n
- Window Buffer Memory adds multi-turn support with zero extra infrastructure
- Separate ingestion and Q&A into two workflows — ingestion runs on schedule, Q&A runs on demand
Limitation: n8n's LangChain nodes use LangChain.js, not LangChain Python. If your pipeline needs Python-specific integrations (Unstructured, custom retrievers), use an Execute Command node to shell out, or call a FastAPI microservice via HTTP Request.
Tested on n8n 1.82.0, Qdrant 1.9.0, OpenAI text-embedding-3-small, gpt-4o-mini · Ubuntu 24.04 and macOS Sequoia