Problem: Flowise Defaults to Cloud APIs You Don't Want
Flowise works great out of the box with OpenAI — but that means API keys, usage costs, and your data leaving your machine. If you're building internal tools, handling sensitive documents, or just want zero-cost inference, you need Flowise talking to Ollama instead.
The blocker most developers hit: Flowise's Ollama node silently fails if the base URL is wrong or the model name doesn't match exactly what Ollama reports.
You'll learn:
- How to wire Flowise to a local Ollama instance correctly
- How to build a working RAG chatbot with a local embedding model
- How to avoid the three configuration mistakes that cause silent failures
Time: 20 min | Difficulty: Intermediate
Why This Breaks Silently
Flowise's Ollama integration uses the /api/generate and /api/embeddings endpoints directly. If Ollama isn't running, or the model name has a typo, Flowise shows a generic "Something went wrong" error — it doesn't tell you the model wasn't found.
Symptoms:
- Chat node returns empty responses with no error
- Embeddings node hangs, then eventually times out
- Flow runs fine with OpenAI but fails immediately after switching to Ollama
Root cause is almost always one of three things: Ollama isn't running, the base URL uses localhost instead of 127.0.0.1 (Docker networking issue), or the model name doesn't match ollama list output exactly.
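The three root causes above can be checked in one preflight script before you ever open Flowise. This is a sketch, not part of Flowise itself; the BASE_URL and MODEL defaults are assumptions to adjust for your setup:

```shell
# Preflight for the three usual root causes. Adjust BASE_URL/MODEL as needed.
BASE_URL="${OLLAMA_BASE_URL:-http://127.0.0.1:11434}"
MODEL="${OLLAMA_MODEL:-llama3.2:3b}"

# Causes 1 and 2: is anything answering at this base URL?
if curl -sf --max-time 3 "$BASE_URL/api/tags" > /dev/null 2>&1; then
  REACHABLE=yes
  echo "OK: Ollama answers at $BASE_URL"
else
  REACHABLE=no
  echo "FAIL: nothing at $BASE_URL (is 'ollama serve' running? Docker networking?)"
fi

# Cause 3: does the exact model name exist? (Only meaningful if reachable.)
if [ "$REACHABLE" = "yes" ]; then
  curl -s "$BASE_URL/api/tags" \
    | python3 -c 'import sys,json; print("\n".join(m["name"] for m in json.load(sys.stdin)["models"]))' \
    | grep -qx "$MODEL" \
    && echo "OK: $MODEL is pulled" \
    || echo "FAIL: $MODEL not found; run: ollama pull $MODEL"
fi
```

If both checks print OK, any remaining failure is in the Flowise node configuration, not in Ollama.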
Solution
Step 1: Get Ollama Running with Your Target Model
Before touching Flowise, confirm Ollama is up and the model is downloaded.
# Start Ollama (it runs as a background service after install)
ollama serve
# In a second terminal — pull the model you'll use for chat
ollama pull llama3.2:3b
# Pull a lightweight embedding model (required for RAG flows)
ollama pull nomic-embed-text
# Verify both are available
ollama list
Expected output:
NAME               ID              SIZE      MODIFIED
llama3.2:3b        a80c4f17acd5    2.0 GB    2 minutes ago
nomic-embed-text   0a109f422b47    274 MB    1 minute ago
If ollama serve says address already in use — Ollama is already running as a service. Skip this step.
Step 2: Confirm the Ollama API Is Reachable
# Test the API directly before involving Flowise
curl http://127.0.0.1:11434/api/tags
Expected: JSON listing your pulled models.
If it fails:
- Connection refused → Ollama isn't running. Run ollama serve.
- Works on host but not in the Flowise Docker container → see Step 3 for the Docker fix.
Step 3: Start Flowise (Handle Docker Networking)
If you're running Flowise directly on your machine:
npx flowise start
Flowise starts at http://localhost:3000. Use http://127.0.0.1:11434 as your Ollama base URL inside Flowise.
If you're running Flowise in Docker:
# Run Flowise with host networking so it can reach Ollama on the host
docker run -d \
--network host \
-v ~/.flowise:/root/.flowise \
--name flowise \
flowiseai/flowise
With --network host, use http://127.0.0.1:11434 as the Ollama base URL inside Flowise. Without this flag, localhost inside the container points to the container itself — not your host machine where Ollama runs.
Alternative without --network host:
docker run -d \
  -p 3000:3000 \
  -v ~/.flowise:/root/.flowise \
  --name flowise \
  flowiseai/flowise
With this variant, enter http://host.docker.internal:11434 as the Ollama base URL in the node fields; the base URL is configured per node in the Flowise UI, not via an environment variable.
host.docker.internal works on Docker Desktop (macOS/Windows). On Linux, use --add-host=host.docker.internal:host-gateway instead.
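On Linux, the full command would look like the sketch below; it assumes Docker 20.10+ (where the host-gateway special value was introduced) and Ollama listening on the host's default port 11434:

```shell
# Linux: map host.docker.internal to the host gateway so the container
# can reach Ollama running on the host.
docker run -d \
  -p 3000:3000 \
  --add-host=host.docker.internal:host-gateway \
  -v ~/.flowise:/root/.flowise \
  --name flowise \
  flowiseai/flowise
```

As with Docker Desktop, enter http://host.docker.internal:11434 as the Ollama base URL inside Flowise.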
Step 4: Create a Basic Chat Flow
Open Flowise at http://localhost:3000 → Add New → blank canvas.
Add these two nodes:
Node 1: ChatOllama
Drag ChatOllama onto the canvas (find it under Chat Models).
Configure it:
- Base URL: http://127.0.0.1:11434
- Model Name: llama3.2:3b (must match ollama list exactly, including the tag)
- Temperature: 0.7
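Because the model name must match exactly, it's worth checking it programmatically. This sketch runs the check against a canned /api/tags response so the shape is clear; in practice you'd fetch the real JSON with curl from your base URL:

```shell
# Simulated /api/tags response; in practice:
#   TAGS=$(curl -s http://127.0.0.1:11434/api/tags)
TAGS='{"models":[{"name":"llama3.2:3b"},{"name":"nomic-embed-text:latest"}]}'
MODEL="llama3.2:3b"

# Extract every model name, then require an exact whole-line match.
if echo "$TAGS" \
  | python3 -c 'import sys,json; print("\n".join(m["name"] for m in json.load(sys.stdin)["models"]))' \
  | grep -qx "$MODEL"; then
  RESULT="match"
else
  RESULT="no-match"
fi
echo "$RESULT"
```

Note that setting MODEL to plain llama3.2 would yield no-match here: the tag is part of the identifier, which is exactly the mistake that makes the Flowise node fail silently.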
Node 2: ConversationChain
Drag ConversationChain onto the canvas (under Chains).
Connect ChatOllama → ConversationChain via the Language Model input.
Click Save, then hit the chat bubble icon (bottom right). Type a message.
If you get an empty response:
# Check Ollama logs for the actual error
journalctl -u ollama -f # Linux systemd
# or check the terminal where you ran `ollama serve`
Step 5: Add RAG with Local Embeddings
This is where local LLMs shine — your documents never leave your machine.
Add these nodes to a new flow:
Document Loader → Text Splitter → Ollama Embeddings → In-Memory Vector Store → Conversational Retrieval QA Chain → ChatOllama
Here's the exact node config for each:
PDF File Loader (under Document Loaders)
- Upload your PDF via the file input
Recursive Character Text Splitter (under Text Splitters)
- Chunk Size: 1000
- Chunk Overlap: 200
- Connect to: Document input on the vector store
OllamaEmbeddings (under Embeddings)
- Base URL: http://127.0.0.1:11434
- Model Name: nomic-embed-text
- Connect to: Embeddings input on the vector store
In-Memory Vector Store (under Vector Stores)
- Receives Document and Embeddings inputs
- Connect its output to the Vector Store input on the QA chain
Conversational Retrieval QA Chain (under Chains)
- Connect Vector Store and ChatOllama
ChatOllama — same config as Step 4.
Save the flow and test it by asking a question about your uploaded document.
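If retrieval returns nothing, you can sanity-check the embedding endpoint itself before blaming the flow. The parsing below runs on a simulated payload so the response shape is clear; swap in a real curl call against your instance (nomic-embed-text normally returns 768-dimensional vectors):

```shell
# In practice, fetch a real embedding:
#   RESP=$(curl -s http://127.0.0.1:11434/api/embeddings \
#     -d '{"model":"nomic-embed-text","prompt":"hello"}')
# Simulated response with a 3-dimensional vector for illustration:
RESP='{"embedding":[0.1,-0.2,0.3]}'

# Count the vector's dimensions; 0 or an error means embeddings are broken.
DIMS=$(echo "$RESP" | python3 -c 'import sys,json; print(len(json.load(sys.stdin)["embedding"]))')
echo "dimensions: $DIMS"
```

A non-empty vector confirms the embedding model is loaded and reachable, which isolates any remaining failure to the Flowise node wiring.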
Step 6: Expose the Flow as an API
Once the flow works in the canvas, grab its API endpoint for use in your apps.
# Get your Flow ID from the URL: /chatflows/{FLOW_ID}
FLOW_ID="your-flow-id-here"
# Test via curl
curl -X POST http://localhost:3000/api/v1/prediction/$FLOW_ID \
-H "Content-Type: application/json" \
-d '{"question": "Summarize the main points of the document"}'
Expected:
{
"text": "The document covers...",
"sourceDocuments": [...]
}
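In a script, you'll usually want just the text field from that response. A sketch on a simulated payload; pipe the real curl output in its place:

```shell
# Simulated Flowise prediction response; in practice:
#   RESPONSE=$(curl -s -X POST http://localhost:3000/api/v1/prediction/$FLOW_ID \
#     -H "Content-Type: application/json" -d '{"question":"..."}')
RESPONSE='{"text":"The document covers...","sourceDocuments":[]}'

# Pull out the answer text with a one-line JSON parse.
ANSWER=$(echo "$RESPONSE" | python3 -c 'import sys,json; print(json.load(sys.stdin)["text"])')
echo "$ANSWER"
```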
Add an API key for production:
# In Flowise Settings → API Keys → Add new key
# Then use it in requests:
curl -X POST http://localhost:3000/api/v1/prediction/$FLOW_ID \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"question": "What are the action items?"}'
Verification
Run this end-to-end check:
# 1. Confirm Ollama has both models
ollama list | grep -E "llama3.2:3b|nomic-embed-text"
# 2. Test Ollama inference directly
curl http://127.0.0.1:11434/api/generate \
-d '{"model":"llama3.2:3b","prompt":"Say hello","stream":false}' \
| python3 -m json.tool | grep response
# 3. Test Flowise API (replace with your actual flow ID)
curl -s -X POST http://localhost:3000/api/v1/prediction/YOUR_FLOW_ID \
-H "Content-Type: application/json" \
-d '{"question":"Hello, are you running locally?"}' \
| python3 -m json.tool
You should see a response from Llama 3.2 in check 2, and the same model answering via Flowise in check 3, all without any external API calls.
Monitor resource usage during inference:
# NVIDIA GPU
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 2
# Apple Silicon
sudo powermetrics --samplers gpu_power -i 2000 -n 5
What You Learned
- Flowise's Ollama node needs a base URL that actually resolves to Ollama: prefer 127.0.0.1 over localhost (which can resolve to IPv6 ::1 and miss Ollama), and host.docker.internal from a container without host networking
- Model names in Flowise must exactly match the tag shown in ollama list: llama3.2 and llama3.2:3b are different identifiers
- nomic-embed-text is a strong local embedding model for RAG with Ollama: it's small (274 MB) and delivers solid retrieval quality for its size
Limitation: Local inference is slower than cloud APIs. On an 8GB VRAM GPU, llama3.2:3b delivers ~30–50 tokens/sec. For latency-sensitive apps, consider streaming responses via the Flowise WebSocket API rather than waiting for full completion.
When NOT to use this setup: If you need GPT-4-class reasoning for complex agentic tasks, local 3B–8B models will frustrate you. Use local Ollama for document Q&A, summarization, classification, and structured extraction — tasks where 7B models perform well.
Tested on Flowise 2.2.x, Ollama 0.5.4, llama3.2:3b, nomic-embed-text, Ubuntu 24.04 and macOS Sequoia