Problem: Flowise Defaults to Cloud APIs You Don't Want
Flowise works great out of the box with OpenAI — but that means API keys, usage costs, and your data leaving your machine. If you're building internal tools, handling sensitive documents, or just want zero-cost inference, you need Flowise talking to Ollama instead.
The blocker most developers hit: Flowise's Ollama node silently fails if the base URL is wrong or the model name doesn't match exactly what Ollama reports.
You'll learn:
- How to wire Flowise to a local Ollama instance correctly
- How to build a working RAG chatbot with a local embedding model
- How to avoid the three configuration mistakes that cause silent failures
Time: 20 min | Difficulty: Intermediate
Why This Breaks Silently
Flowise's Ollama integration uses the /api/generate and /api/embeddings endpoints directly. If Ollama isn't running, or the model name has a typo, Flowise shows a generic "Something went wrong" error — it doesn't tell you the model wasn't found.
Symptoms:
- Chat node returns empty responses with no error
- Embeddings node hangs, then eventually times out
- Flow runs fine with OpenAI but fails immediately after switching to Ollama
Root cause is almost always one of three things: Ollama isn't running, the base URL uses localhost instead of 127.0.0.1 (Docker networking issue), or the model name doesn't match ollama list output exactly.
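The three root causes above can be checked in one preflight script before you ever open Flowise. This is a sketch, not part of Flowise itself; the BASE_URL and MODEL defaults are assumptions to adjust for your setup:

```shell
# Preflight for the three usual root causes. Adjust BASE_URL/MODEL as needed.
BASE_URL="${OLLAMA_BASE_URL:-http://127.0.0.1:11434}"
MODEL="${OLLAMA_MODEL:-llama3.2:3b}"

# Causes 1 and 2: is anything answering at this base URL?
if curl -sf --max-time 3 "$BASE_URL/api/tags" > /dev/null 2>&1; then
  REACHABLE=yes
  echo "OK: Ollama answers at $BASE_URL"
else
  REACHABLE=no
  echo "FAIL: nothing at $BASE_URL (is 'ollama serve' running? Docker networking?)"
fi

# Cause 3: does the exact model name exist? (Only meaningful if reachable.)
if [ "$REACHABLE" = "yes" ]; then
  curl -s "$BASE_URL/api/tags" \
    | python3 -c 'import sys,json; print("\n".join(m["name"] for m in json.load(sys.stdin)["models"]))' \
    | grep -qx "$MODEL" \
    && echo "OK: $MODEL is pulled" \
    || echo "FAIL: $MODEL not found; run: ollama pull $MODEL"
fi
```

If both checks print OK, any remaining failure is in the Flowise node configuration, not in Ollama.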
Solution
Step 1: Get Ollama Running with Your Target Model
Before touching Flowise, confirm Ollama is up and the model is downloaded.
# Start Ollama (it runs as a background service after install)
ollama serve
# In a second terminal — pull the model you'll use for chat
ollama pull llama3.2:3b
# Pull a lightweight embedding model (required for RAG flows)
ollama pull nomic-embed-text
# Verify both are available
ollama list
Expected output:
NAME               ID              SIZE      MODIFIED
llama3.2:3b        a80c4f17acd5    2.0 GB    2 minutes ago
nomic-embed-text   0a109f422b47    274 MB    1 minute ago
If ollama serve says address already in use — Ollama is already running as a service. Skip this step.
Step 2: Confirm the Ollama API Is Reachable
# Test the API directly before involving Flowise
curl http://127.0.0.1:11434/api/tags
Expected: JSON listing your pulled models.
If it fails:
- Connection refused → Ollama isn't running. Run ollama serve.
- Works on host but not in the Flowise Docker container → see Step 3 for the Docker fix.
Step 3: Start Flowise (Handle Docker Networking)
If you're running Flowise directly on your machine:
npx flowise start
Flowise starts at http://localhost:3000. Use http://127.0.0.1:11434 as your Ollama base URL inside Flowise.
If you're running Flowise in Docker:
# Run Flowise with host networking so it can reach Ollama on the host
docker run -d \
--network host \
-v ~/.flowise:/root/.flowise \
--name flowise \
flowiseai/flowise
With --network host, use http://127.0.0.1:11434 as the Ollama base URL inside Flowise. Without this flag, localhost inside the container points to the container itself — not your host machine where Ollama runs.
Alternative without --network host:
docker run -d \
  -p 3000:3000 \
  -v ~/.flowise:/root/.flowise \
  --name flowise \
  flowiseai/flowise
With this variant, enter http://host.docker.internal:11434 as the Ollama base URL in the node fields; the base URL is configured per node in the Flowise UI, not via an environment variable.
host.docker.internal works on Docker Desktop (macOS/Windows). On Linux, use --add-host=host.docker.internal:host-gateway instead.
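On Linux, the full command would look like the sketch below; it assumes Docker 20.10+ (where the host-gateway special value was introduced) and Ollama listening on the host's default port 11434:

```shell
# Linux: map host.docker.internal to the host gateway so the container
# can reach Ollama running on the host.
docker run -d \
  -p 3000:3000 \
  --add-host=host.docker.internal:host-gateway \
  -v ~/.flowise:/root/.flowise \
  --name flowise \
  flowiseai/flowise
```

As with Docker Desktop, enter http://host.docker.internal:11434 as the Ollama base URL inside Flowise.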
Step 4: Create a Basic Chat Flow
Open Flowise at http://localhost:3000 → Add New → blank canvas.
Add these two nodes:
Node 1: ChatOllama
Drag ChatOllama onto the canvas (find it under Chat Models).
Configure it:
- Base URL: http://127.0.0.1:11434
- Model Name: llama3.2:3b (must match ollama list exactly, including the tag)
- Temperature: 0.7
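Because the model name must match exactly, it's worth checking it programmatically. This sketch runs the check against a canned /api/tags response so the shape is clear; in practice you'd fetch the real JSON with curl from your base URL:

```shell
# Simulated /api/tags response; in practice:
#   TAGS=$(curl -s http://127.0.0.1:11434/api/tags)
TAGS='{"models":[{"name":"llama3.2:3b"},{"name":"nomic-embed-text:latest"}]}'
MODEL="llama3.2:3b"

# Extract every model name, then require an exact whole-line match.
if echo "$TAGS" \
  | python3 -c 'import sys,json; print("\n".join(m["name"] for m in json.load(sys.stdin)["models"]))' \
  | grep -qx "$MODEL"; then
  RESULT="match"
else
  RESULT="no-match"
fi
echo "$RESULT"
```

Note that setting MODEL to plain llama3.2 would yield no-match here: the tag is part of the identifier, which is exactly the mistake that makes the Flowise node fail silently.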
Node 2: ConversationChain
Drag ConversationChain onto the canvas (under Chains).
Connect ChatOllama → ConversationChain via the Language Model input.
Click Save, then hit the chat bubble icon (bottom right). Type a message.
If you get an empty response:
# Check Ollama logs for the actual error
journalctl -u ollama -f # Linux systemd
# or check the terminal where you ran `ollama serve`
Step 5: Add RAG with Local Embeddings
This is where local LLMs shine — your documents never leave your machine.
Add these nodes to a new flow:
Document Loader → Text Splitter → Ollama Embeddings → In-Memory Vector Store → Conversational Retrieval QA Chain → ChatOllama
Here's the exact node config for each:
PDF File Loader (under Document Loaders)
- Upload your PDF via the file input
Recursive Character Text Splitter (under Text Splitters)
- Chunk Size: 1000
- Chunk Overlap: 200
- Connect to: Document input on the vector store
OllamaEmbeddings (under Embeddings)
- Base URL: http://127.0.0.1:11434
- Model Name: nomic-embed-text
- Connect to: Embeddings input on the vector store
In-Memory Vector Store (under Vector Stores)
- Receives Document and Embeddings inputs
- Connect its output to the Vector Store input on the QA chain
Conversational Retrieval QA Chain (under Chains)
- Connect Vector Store and ChatOllama
ChatOllama — same config as Step 4.
Save the flow and test it by asking a question about your uploaded document.
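If retrieval returns nothing, you can sanity-check the embedding endpoint itself before blaming the flow. The parsing below runs on a simulated payload so the response shape is clear; swap in a real curl call against your instance (nomic-embed-text normally returns 768-dimensional vectors):

```shell
# In practice, fetch a real embedding:
#   RESP=$(curl -s http://127.0.0.1:11434/api/embeddings \
#     -d '{"model":"nomic-embed-text","prompt":"hello"}')
# Simulated response with a 3-dimensional vector for illustration:
RESP='{"embedding":[0.1,-0.2,0.3]}'

# Count the vector's dimensions; 0 or an error means embeddings are broken.
DIMS=$(echo "$RESP" | python3 -c 'import sys,json; print(len(json.load(sys.stdin)["embedding"]))')
echo "dimensions: $DIMS"
```

A non-empty vector confirms the embedding model is loaded and reachable, which isolates any remaining failure to the Flowise node wiring.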
Step 6: Expose the Flow as an API
Once the flow works in the canvas, grab its API endpoint for use in your apps.
# Get your Flow ID from the URL: /chatflows/{FLOW_ID}
FLOW_ID="your-flow-id-here"
# Test via curl
curl -X POST http://localhost:3000/api/v1/prediction/$FLOW_ID \
-H "Content-Type: application/json" \
-d '{"question": "Summarize the main points of the document"}'
Expected:
{
"text": "The document covers...",
"sourceDocuments": [...]
}
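In a script, you'll usually want just the text field from that response. A sketch on a simulated payload; pipe the real curl output in its place:

```shell
# Simulated Flowise prediction response; in practice:
#   RESPONSE=$(curl -s -X POST http://localhost:3000/api/v1/prediction/$FLOW_ID \
#     -H "Content-Type: application/json" -d '{"question":"..."}')
RESPONSE='{"text":"The document covers...","sourceDocuments":[]}'

# Pull out the answer text with a one-line JSON parse.
ANSWER=$(echo "$RESPONSE" | python3 -c 'import sys,json; print(json.load(sys.stdin)["text"])')
echo "$ANSWER"
```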
Add an API key for production:
# In Flowise Settings → API Keys → Add new key
# Then use it in requests:
curl -X POST http://localhost:3000/api/v1/prediction/$FLOW_ID \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{"question": "What are the action items?"}'
Verification
Run this end-to-end check:
# 1. Confirm Ollama has both models
ollama list | grep -E "llama3.2:3b|nomic-embed-text"
# 2. Test Ollama inference directly
curl http://127.0.0.1:11434/api/generate \
-d '{"model":"llama3.2:3b","prompt":"Say hello","stream":false}' \
| python3 -m json.tool | grep response
# 3. Test Flowise API (replace with your actual flow ID)
curl -s -X POST http://localhost:3000/api/v1/prediction/YOUR_FLOW_ID \
-H "Content-Type: application/json" \
-d '{"question":"Hello, are you running locally?"}' \
| python3 -m json.tool
You should see a response from Llama 3.2 in check 2, and the same model answering via Flowise in check 3, all without any external API calls.
Monitor resource usage during inference:
# NVIDIA GPU
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 2
# Apple Silicon
sudo powermetrics --samplers gpu_power -i 2000 -n 5
What You Learned
- Flowise's Ollama node needs a base URL that actually resolves to Ollama: prefer 127.0.0.1 over localhost (which can resolve to IPv6 ::1 and miss Ollama), and host.docker.internal from a container without host networking
- Model names in Flowise must exactly match the tag shown in ollama list: llama3.2 and llama3.2:3b are different identifiers
- nomic-embed-text is a strong local embedding model for RAG with Ollama: it's small (274 MB) and delivers solid retrieval quality for its size
Limitation: Local inference is slower than cloud APIs. On an 8GB VRAM GPU, llama3.2:3b delivers ~30–50 tokens/sec. For latency-sensitive apps, consider streaming responses via the Flowise WebSocket API rather than waiting for full completion.
When NOT to use this setup: If you need GPT-4-class reasoning for complex agentic tasks, local 3B–8B models will frustrate you. Use local Ollama for document Q&A, summarization, classification, and structured extraction — tasks where 7B models perform well.
Tested on Flowise 2.2.x, Ollama 0.5.4, llama3.2:3b, nomic-embed-text, Ubuntu 24.04 and macOS Sequoia