Problem: RAG Pipelines Feel Out of Reach Without Coding
Most RAG tutorials assume you can write Python, manage vector databases, and wire LangChain together by hand. If that's not you — or you just want a working chatbot in 30 minutes — Flowise is the answer.
You'll learn:
- How to spin up Flowise locally with Docker
- How to upload documents and embed them into a vector store
- How to build and test a full RAG chatbot flow with no code
Time: 30 min | Difficulty: Beginner
Why RAG Without Code Is Now Practical
RAG (Retrieval-Augmented Generation) used to mean a stack of Python scripts: document loaders, embedding models, vector DBs, prompt templates, and a glue layer. Flowise replaces all of that with a drag-and-drop canvas.
Flowise 2.x (released late 2025) added persistent vector store support and multi-session chatbots. That makes it production-viable, not just a demo toy.
Solution
Step 1: Run Flowise with Docker
# Pull and start Flowise — data persists in ./flowise-data
docker run -d \
--name flowise \
-p 3000:3000 \
-v $(pwd)/flowise-data:/root/.flowise \
flowiseai/flowise:latest
Expected output:
Status: Downloaded newer image for flowiseai/flowise:latest
<container-id>
Open http://localhost:3000. You should see the Flowise canvas.
If it fails:
- Port 3000 already in use → change to -p 3001:3000 and open http://localhost:3001
- docker: command not found → install Docker Desktop from docker.com first
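If the page doesn't load, a quick sanity check from the terminal helps narrow it down. This sketch assumes the container name flowise from the run command above:

```shell
# Is the container actually running?
docker ps --filter name=flowise --format '{{.Names}}: {{.Status}}'

# Check the last few log lines for startup errors
docker logs flowise --tail 20

# Confirm the UI answers on the mapped port
curl -sf http://localhost:3000 > /dev/null && echo "Flowise is up"
```

If the container exited immediately, the logs usually show why (most often a port conflict or a permissions issue on the mounted volume).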
Step 2: Create a New Chatflow
- Click Add New in the top-right corner
- Name it RAG Chatbot — Docs QA
- You'll land on a blank canvas
Each node on this canvas is one piece of the RAG pipeline.
Step 3: Add a Document Loader Node
Click + on the canvas and search for PDF File.
Set these fields:
- File → Upload your PDF (product docs, a manual, any reference doc)
- Chunk Size → 1000
- Chunk Overlap → 200
Chunk size 1000 with 200 overlap is a safe default. Smaller chunks (500) retrieve more precisely; larger chunks (2000) retain more context per result. Start here and tune later.
Caption: The PDF loader splits your document into overlapping chunks before embedding.
Step 4: Add an Embeddings Node
Click + and search for OpenAI Embeddings.
Set these fields:
- OpenAI API Key → Paste your key
- Model Name → text-embedding-3-small
Connect the PDF File output to the Document input on the Embeddings node.
Free alternative: Use the Ollama Embeddings node with model nomic-embed-text — no API key needed.
Step 5: Add an In-Memory Vector Store
Search for In-Memory Vector Store and drop it on the canvas.
Connect:
- Embeddings node → Embeddings input
- PDF File node → Document input
In-memory is fine for testing. For production, swap to the Qdrant or Postgres PGVector node so embeddings survive restarts.
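For the production swap, Qdrant can run as a second container next to Flowise. A minimal sketch with a persistent volume (the container name, port, and volume path here are illustrative defaults, not requirements):

```shell
# Qdrant with embeddings persisted in ./qdrant-data across restarts
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -v $(pwd)/qdrant-data:/qdrant/storage \
  qdrant/qdrant:latest
```

Note that because Flowise itself runs inside a container, pointing its Qdrant node at localhost:6333 won't work; on Docker Desktop, use http://host.docker.internal:6333 as the Qdrant URL instead.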
Step 6: Add a Retriever Node
Search for Vector Store Retriever and connect the vector store output to its input.
Set Top K → 4 (chunks retrieved per query).
Top K = 4 is the right starting point. Too low (1–2) and answers miss context. Too high (8+) and you fill the prompt with noise.
Step 7: Add a Chat Model
Search for ChatOpenAI and configure:
- OpenAI API Key → Same key as Step 4
- Model Name →
gpt-4o-mini - Temperature →
0
Local alternative: Use ChatOllama with model llama3.2 for fully local inference.
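If you take the local route, both models mentioned above need to be pulled once before Flowise can use them. This assumes Ollama is already installed (ollama.com):

```shell
# Embedding model for the Ollama Embeddings node (Step 4 alternative)
ollama pull nomic-embed-text

# Chat model for the ChatOllama node
ollama pull llama3.2
```

As with Qdrant, the Flowise container can't reach Ollama via localhost; on Docker Desktop, set the node's base URL to http://host.docker.internal:11434.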
Step 8: Connect with a Conversational RAG Chain
Search for Conversational Retrieval QA Chain and drop it on the canvas.
Connect:
- ChatOpenAI → Language Model input
- Vector Store Retriever → Vector Store Retriever input
This node takes the user's question, retrieves matching chunks, and passes both to the LLM.
Caption: The complete flow — PDF loader feeds the vector store, the retriever finds relevant chunks, the chain answers.
Step 9: Save and Test
- Click Save (top-right)
- Click the Chat bubble icon (bottom-right of canvas)
- Ask a question about your document
Verification
Ask something that can only be answered from your uploaded document:
What is the maximum operating temperature listed in the safety guidelines?
You should see: A direct answer pulled from your doc. If it's vague or wrong, reduce chunk size to 500 and retest.
Enable Return Source Documents on the QA Chain node to see which chunks were used — this confirms retrieval is working and not hallucinating.
Caption: Source documents confirm the chatbot is pulling answers from your file, not making things up.
What You Learned
- Flowise connects document loaders, embeddings, vector stores, and LLMs visually — no code required
- Chunk size (1000) and overlap (200) are the main retrieval quality levers
- In-memory vector store is fine for testing; use Qdrant or PGVector for production
Limitation: In-memory storage resets on every Docker restart. Switch to the Qdrant node with a persistent volume before going live.
Next step: Click API Endpoint in Flowise to get a REST endpoint — call it from any frontend or n8n workflow.
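Once you have the endpoint, a minimal call looks like this (the <chatflow-id> placeholder comes from the API Endpoint dialog; this sketch assumes the default port from Step 1 and no API-key auth enabled):

```shell
# POST a question to the chatflow's prediction endpoint
curl -s http://localhost:3000/api/v1/prediction/<chatflow-id> \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the maximum operating temperature?"}'
```

The response is JSON containing the answer text, which any frontend or n8n HTTP Request node can consume directly.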
Tested on Flowise 2.2.1, Docker 27.x, macOS 15 and Ubuntu 24.04