Problem: LangChain RAG Pipelines Are Painful to Wire Up
Building a RAG pipeline from scratch means writing ingestion scripts, embedding wrappers, retriever logic, and prompt chains — then re-deploying every time something changes.
n8n's LangChain nodes let you build the same pipeline visually, trigger it on a schedule or webhook, and iterate without touching Python.
You'll learn:
- How to ingest documents, chunk them, and store embeddings in a vector DB using n8n
- How to wire a retrieval + generation chain with n8n's AI Agent node
- How to trigger the full RAG pipeline from a webhook for real-time Q&A
Time: 25 min | Difficulty: Intermediate
Why This Works in n8n
n8n 1.x ships with first-class LangChain support: dedicated nodes for document loaders, text splitters, embedding models, vector stores, and chain execution. These are not HTTP wrappers — they use the LangChain.js SDK under the hood.
What you can build without writing code:
- Document ingestion pipelines (PDF, URL, Google Drive)
- Embedding + vector store upsert flows
- Retrieval-augmented generation chains
- Multi-step AI agents with memory
This article uses n8n 1.82+, OpenAI embeddings, and Qdrant as the vector store. Swap any of those for the provider you prefer — the node structure stays the same.
Solution
Step 1: Self-Host n8n with Docker
Early n8n releases gated the LangChain nodes behind an experimental feature flag; on 1.82+ the AI nodes are GA and no flag is needed. Run this Docker command:
docker run -d \
--name n8n \
-p 5678:5678 \
-e N8N_ENCRYPTION_KEY=your-secret-key \
-e OPENAI_API_KEY=sk-... \
-v ~/.n8n:/home/node/.n8n \
n8nio/n8n:1.82.0
The container starts detached, so docker run only prints the container ID. Confirm it came up with docker logs n8n, then open http://localhost:5678 and create your owner account. Add your OpenAI API key as a credential inside the n8n UI as well — the container environment variable alone doesn't register a credential for the AI nodes.
If it fails:
- Error: EACCES permission denied → run sudo chown -R 1000:1000 ~/.n8n first
- AI nodes missing from palette → confirm you're on 1.82+ with docker inspect n8n | grep Image
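Before moving on, you can confirm the instance is healthy via n8n's built-in health endpoint (available on recent 1.x releases):

```shell
# Should return {"status":"ok"} once the editor is ready
curl -s http://localhost:5678/healthz
```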
Step 2: Create the Document Ingestion Workflow
In n8n, create a new workflow. This workflow ingests documents into your vector store. You'll build it once and trigger it whenever your source data changes.
Nodes to add in order:
1. Manual Trigger (or Schedule Trigger for recurring ingestion)
No config needed for testing.
2. HTTP Request node — fetch your source document
Method: GET
URL: https://your-docs-site.com/api/content
Response Format: JSON
Or use the n8n Document Loader node directly:
Operation: Load from URL
URL: https://example.com/whitepaper.pdf
3. Recursive Character Text Splitter node
Chunk Size: 1000
Chunk Overlap: 200
Overlap of 200 preserves context across chunk boundaries — critical for retrieval quality.
4. Embeddings OpenAI node
Model: text-embedding-3-small
# text-embedding-3-small costs ~$0.02/1M tokens vs $0.13 for large
# Use large only for >100k chunk corpora where recall matters
5. Qdrant Vector Store node
Operation: Insert Documents
Qdrant URL: http://localhost:6333
Collection Name: docs-v1
Connect the nodes: Manual Trigger → HTTP Request → Text Splitter → Embeddings OpenAI → Qdrant.
Run the workflow. You should see green checkmarks across all nodes.
Expected output in Qdrant:
curl http://localhost:6333/collections/docs-v1
# Returns: {"result":{"vectors_count": 847, ...}}
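One more sanity check worth running: the collection's vector size must match your embedding model's output dimension — 1536 for text-embedding-3-small. A quick sketch, assuming jq is installed:

```shell
# Inspect the collection config; size should be 1536 for
# text-embedding-3-small. A mismatch means the collection was
# created against a different embedding model.
curl -s http://localhost:6333/collections/docs-v1 | jq '.result.config.params.vectors'
```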
Step 3: Start Qdrant Locally
If you don't have Qdrant running yet:
docker run -d \
--name qdrant \
-p 6333:6333 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant:v1.9.0
If n8n itself runs in Docker, localhost inside the n8n container refers to the container, not your machine. Use host.docker.internal:6333 in the n8n Qdrant node instead (on Linux, start the n8n container with --add-host=host.docker.internal:host-gateway), or put both containers on a shared Docker network.
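An alternative that sidesteps host.docker.internal entirely is a user-defined Docker network, where container names resolve as hostnames. A sketch, assuming the container names n8n and qdrant from the commands above:

```shell
docker network create rag-net
docker network connect rag-net n8n
docker network connect rag-net qdrant
# Then set the Qdrant URL in the n8n node to http://qdrant:6333
```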
Step 4: Build the RAG Q&A Workflow
Create a second workflow — this one handles incoming questions and returns answers.
Nodes to add:
1. Webhook node
HTTP Method: POST
Path: /ask
Authentication: Header Auth (add a secret header for production)
2. AI Agent node
This is the core node. Set:
Agent Type: Conversational Agent
System Prompt: |
You are a helpful assistant. Answer questions using only the
context provided by the retrieval tool. If the answer isn't
in the context, say "I don't know."
3. Qdrant Vector Store node (as a Tool for the Agent)
Operation: Retrieve Documents (as Tool)
Collection Name: docs-v1
Top K: 5
# Top 5 chunks balances context window usage vs recall
Connect it as a Tool input to the AI Agent node — drag from the Tools socket, not the main input.
4. OpenAI Chat Model node (connected to AI Agent)
Model: gpt-4o-mini
# gpt-4o-mini is 15x cheaper than gpt-4o for Q&A workloads
Temperature: 0
5. Respond to Webhook node
Response Body: {{ $json.output }}
Full node chain: Webhook → AI Agent (with Qdrant Tool + OpenAI Model) → Respond to Webhook.
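One n8n detail that trips people up: each webhook has two URLs. While you're listening in the editor, the path is prefixed with webhook-test; only an activated workflow answers on the production path. Assuming the /ask path and header secret configured above:

```shell
# Editor testing (click "Listen for test event" first):
curl -X POST http://localhost:5678/webhook-test/ask \
  -H "Content-Type: application/json" \
  -H "X-Secret: your-header-secret" \
  -d '{"question": "ping", "sessionId": "test-0"}'

# Production (after activating the workflow):
curl -X POST http://localhost:5678/webhook/ask \
  -H "Content-Type: application/json" \
  -H "X-Secret: your-header-secret" \
  -d '{"question": "ping", "sessionId": "test-0"}'
```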
Step 5: Add Memory for Multi-Turn Conversations
Single-turn Q&A is enough for search. For a chatbot, add session memory:
Add a Window Buffer Memory node to the AI Agent's Memory socket:
Session ID: {{ $('Webhook').item.json.sessionId }}
# Pass sessionId from the client with each request
Window Size: 10
# Keep last 10 message pairs; higher = more tokens per call
Your webhook payload now needs:
{
"question": "What does the whitepaper say about pricing?",
"sessionId": "user-abc-123"
}
n8n stores conversation history in its internal DB keyed by sessionId. No Redis needed.
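To see the memory in action, send two requests with the same sessionId — the follow-up only makes sense if history is retained:

```shell
SECRET="your-header-secret"  # the Header Auth value from Step 4

curl -s -X POST http://localhost:5678/webhook/ask \
  -H "Content-Type: application/json" -H "X-Secret: $SECRET" \
  -d '{"question": "What does the whitepaper say about pricing?", "sessionId": "demo-1"}'

# "that" resolves only via the history stored under demo-1
curl -s -X POST http://localhost:5678/webhook/ask \
  -H "Content-Type: application/json" -H "X-Secret: $SECRET" \
  -d '{"question": "Summarize that in one sentence.", "sessionId": "demo-1"}'
```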
Verification
Test the ingestion workflow first:
# Trigger ingestion manually in n8n UI, then check Qdrant
# (the count endpoint is POST, not GET)
curl -X POST http://localhost:6333/collections/docs-v1/points/count \
  -H "Content-Type: application/json" -d '{"exact": true}'
# Expected: {"result":{"count": 847}, ...}
Then test the Q&A webhook:
curl -X POST http://localhost:5678/webhook/ask \
-H "Content-Type: application/json" \
-H "X-Secret: your-header-secret" \
-d '{"question": "What are the main findings?", "sessionId": "test-1"}'
You should see a JSON response whose output field contains a grounded answer, typically within 3–5 seconds.
Check that retrieval is working — not just generation:
# In n8n, enable "Log" on the Qdrant node
# After a test run, inspect the output to confirm chunks are being retrieved
# If output.documents is empty, the collection name or embedding model may not match
Production Considerations
Re-ingestion strategy: When source docs change, don't append — delete the collection and re-ingest. Qdrant makes this a one-line operation and prevents stale chunks from polluting retrieval.
curl -X DELETE http://localhost:6333/collections/docs-v1
Then re-run your ingestion workflow.
Chunking tuning: 1000 characters / 200 overlap (the Recursive Character Text Splitter measures characters, not tokens) works for most prose. For code docs or structured data, drop chunk size to 500 and overlap to 50. Retrieval recall drops sharply when chunks are too large to fit a complete concept.
Model cost: text-embedding-3-small + gpt-4o-mini keeps a 10k-chunk corpus + 1000 daily queries under $5/month. Switch to text-embedding-3-large + gpt-4o only if you're seeing factual errors on complex queries.
Webhook security: Always add Header Auth or Basic Auth to n8n webhooks before exposing them publicly. The built-in auth options are under Webhook node → Authentication.
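With Header Auth enabled, a request that omits the secret header should be rejected before the workflow executes — easy to verify:

```shell
# No X-Secret header → n8n should refuse the request
curl -i -X POST http://localhost:5678/webhook/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "should fail", "sessionId": "test-1"}'
```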
What You Learned
- n8n's LangChain nodes handle ingestion, embedding, and retrieval without boilerplate code
- The AI Agent node with a Qdrant Tool input is the cleanest way to build retrieval-augmented generation in n8n
- Window Buffer Memory adds multi-turn support with zero extra infrastructure
- Separate ingestion and Q&A into two workflows — ingestion runs on schedule, Q&A runs on demand
Limitation: n8n's LangChain nodes use LangChain.js, not LangChain Python. If your pipeline needs Python-specific integrations (Unstructured, custom retrievers), use an Execute Command node to shell out, or call a FastAPI microservice via HTTP Request.
Tested on n8n 1.82.0, Qdrant 1.9.0, OpenAI text-embedding-3-small, gpt-4o-mini · Ubuntu 24.04 and macOS Sequoia