Flowise Ollama Integration: Local LLM Workflows 2026

Connect Flowise to Ollama for fully local LLM workflows. No API keys, no costs, complete data privacy. Step-by-step setup with RAG and chatbot examples.

Problem: Flowise Defaults to Cloud APIs You Don't Want

Flowise works great out of the box with OpenAI — but that means API keys, usage costs, and your data leaving your machine. If you're building internal tools, handling sensitive documents, or just want zero-cost inference, you need Flowise talking to Ollama instead.

The blocker most developers hit: Flowise's Ollama node silently fails if the base URL is wrong or the model name doesn't match exactly what Ollama reports.

You'll learn:

  • How to wire Flowise to a local Ollama instance correctly
  • How to build a working RAG chatbot with a local embedding model
  • How to avoid the three configuration mistakes that cause silent failures

Time: 20 min | Difficulty: Intermediate


Why This Breaks Silently

Flowise's Ollama integration uses the /api/generate and /api/embeddings endpoints directly. If Ollama isn't running, or the model name has a typo, Flowise shows a generic "Something went wrong" error — it doesn't tell you the model wasn't found.

Symptoms:

  • Chat node returns empty responses with no error
  • Embeddings node hangs indefinitely then times out
  • Flow runs fine with OpenAI but fails immediately after switching to Ollama

Root cause is almost always one of three things: Ollama isn't running, the base URL uses localhost instead of 127.0.0.1 (Docker networking issue), or the model name doesn't match ollama list output exactly.
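Those three checks can be scripted before you ever open Flowise. Here's a sketch (the helper names are illustrative; the base URL assumes Ollama's default port):

```shell
# Triage helpers for the three silent-failure causes.
BASE_URL="${OLLAMA_BASE_URL:-http://127.0.0.1:11434}"

# Causes 1 and 2: is the Ollama API answering at this exact URL?
ollama_reachable() {
  curl -sf "$BASE_URL/api/tags" > /dev/null
}

# Cause 3: does the model name (tag included) appear in /api/tags output?
# Usage: model_in_tags "$(curl -sf "$BASE_URL/api/tags")" "llama3.2:3b"
model_in_tags() {
  printf '%s' "$1" | grep -q "\"name\":\"$2\""
}

# Example triage:
# ollama_reachable || echo "Ollama not reachable at $BASE_URL"
# model_in_tags "$(curl -sf "$BASE_URL/api/tags")" "llama3.2:3b" \
#   || echo "model tag mismatch: compare with 'ollama list'"
```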


Solution

Step 1: Get Ollama Running with Your Target Model

Before touching Flowise, confirm Ollama is up and the model is downloaded.

# Start Ollama (it runs as a background service after install)
ollama serve

# In a second terminal — pull the model you'll use for chat
ollama pull llama3.2:3b

# Pull a lightweight embedding model (required for RAG flows)
ollama pull nomic-embed-text

# Verify both are available
ollama list

Expected output:

NAME                    ID              SIZE    MODIFIED
llama3.2:3b             a80c4f17acd5    2.0 GB  2 minutes ago
nomic-embed-text        0a109f422b47    274 MB  1 minute ago

If ollama serve says address already in use — Ollama is already running as a service. Skip this step.


Step 2: Confirm the Ollama API Is Reachable

# Test the API directly before involving Flowise
curl http://127.0.0.1:11434/api/tags

Expected: JSON listing your pulled models.

If it fails:

  • Connection refused → Ollama isn't running. Run ollama serve.
  • Works on host but not in Flowise Docker container → See Step 3 for the Docker fix.

Step 3: Start Flowise (Handle Docker Networking)

If you're running Flowise directly on your machine:

npx flowise start

Flowise starts at http://localhost:3000. Use http://127.0.0.1:11434 as your Ollama base URL inside Flowise.

If you're running Flowise in Docker:

# Run Flowise with host networking so it can reach Ollama on the host
docker run -d \
  --network host \
  -v ~/.flowise:/root/.flowise \
  --name flowise \
  flowiseai/flowise

With --network host, use http://127.0.0.1:11434 as the Ollama base URL inside Flowise. Without this flag, localhost inside the container points to the container itself — not your host machine where Ollama runs.

Alternative without --network host:

docker run -d \
  -p 3000:3000 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v ~/.flowise:/root/.flowise \
  --name flowise \
  flowiseai/flowise

host.docker.internal works on Docker Desktop (macOS/Windows). On Linux, use --add-host=host.docker.internal:host-gateway instead.
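Putting that together, a Linux variant might look like the sketch below. One assumption worth flagging: Ollama binds to loopback by default, so for the container to reach it through the host gateway, Ollama on the host needs to listen more broadly (e.g. started with OLLAMA_HOST=0.0.0.0).

```shell
# Linux: map host.docker.internal to the Docker host gateway.
# Assumes Ollama on the host was started with OLLAMA_HOST=0.0.0.0,
# since its default loopback-only bind is unreachable from the container.
docker run -d \
  -p 3000:3000 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v ~/.flowise:/root/.flowise \
  --name flowise \
  flowiseai/flowise
```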


Step 4: Create a Basic Chat Flow

Open Flowise at http://localhost:3000, then click Add New to open a blank canvas.

Add these two nodes:

Node 1: ChatOllama

Drag ChatOllama onto the canvas (find it under Chat Models).

Configure it:

  • Base URL: http://127.0.0.1:11434
  • Model Name: llama3.2:3b (must match ollama list exactly — including the tag)
  • Temperature: 0.7

Node 2: ConversationChain

Drag ConversationChain onto the canvas (under Chains).

Connect ChatOllama → ConversationChain via the Language Model input.

Click Save, then hit the chat bubble icon (bottom right). Type a message.

If you get an empty response:

# Check Ollama logs for the actual error
journalctl -u ollama -f   # Linux systemd
# or check the terminal where you ran `ollama serve`
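It also helps to bypass Flowise entirely and hit Ollama's chat endpoint directly, which is essentially what the chat node's request boils down to. A sketch, with a small helper to pull the reply out of the JSON:

```shell
# chat_reply: extract the assistant's text from an Ollama /api/chat response.
chat_reply() {
  printf '%s' "$1" | python3 -c 'import sys, json; print(json.load(sys.stdin)["message"]["content"])'
}

# One non-streaming chat turn against the same model the ChatOllama node uses:
# RESP=$(curl -s http://127.0.0.1:11434/api/chat \
#   -d '{"model":"llama3.2:3b","messages":[{"role":"user","content":"Say hello"}],"stream":false}')
# chat_reply "$RESP"
```

If this returns text but the Flowise chat stays empty, the problem is in the flow configuration, not Ollama.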

Step 5: Add RAG with Local Embeddings

This is where local LLMs shine — your documents never leave your machine.

Add these nodes to a new flow:

Document Loader → Text Splitter → Ollama Embeddings → In-Memory Vector Store → Conversational Retrieval QA Chain → ChatOllama

Here's the exact node config for each:

PDF File Loader (under Document Loaders)

  • Upload your PDF via the file input

Recursive Character Text Splitter (under Text Splitters)

  • Chunk Size: 1000
  • Chunk Overlap: 200
  • Connect to: Document input on the vector store

OllamaEmbeddings (under Embeddings)

  • Base URL: http://127.0.0.1:11434
  • Model Name: nomic-embed-text
  • Connect to: Embeddings input on the vector store

In-Memory Vector Store (under Vector Stores)

  • Receives Document and Embeddings inputs
  • Connect its output to the Vector Store input on the QA chain

Conversational Retrieval QA Chain (under Chains)

  • Connect Vector Store and ChatOllama

ChatOllama — same config as Step 4.

Save the flow and test it by asking a question about your uploaded document.
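If answers come back empty or irrelevant, test the embedding model in isolation before debugging the rest of the flow. A sketch against the /api/embeddings endpoint mentioned earlier (nomic-embed-text produces 768-dimensional vectors):

```shell
# embedding_dims: count the dimensions in an Ollama /api/embeddings response.
embedding_dims() {
  printf '%s' "$1" | python3 -c 'import sys, json; print(len(json.load(sys.stdin)["embedding"]))'
}

# Embed a test string; expect 768 for nomic-embed-text:
# RESP=$(curl -s http://127.0.0.1:11434/api/embeddings \
#   -d '{"model":"nomic-embed-text","prompt":"hello world"}')
# embedding_dims "$RESP"
```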


Step 6: Expose the Flow as an API

Once the flow works in the canvas, grab its API endpoint for use in your apps.

# Get your Flow ID from the URL: /chatflows/{FLOW_ID}
FLOW_ID="your-flow-id-here"

# Test via curl
curl -X POST http://localhost:3000/api/v1/prediction/$FLOW_ID \
  -H "Content-Type: application/json" \
  -d '{"question": "Summarize the main points of the document"}'

Expected:

{
  "text": "The document covers...",
  "sourceDocuments": [...]
}

Add an API key for production:

# In Flowise Settings → API Keys → Add new key
# Then use it in requests:
curl -X POST http://localhost:3000/api/v1/prediction/$FLOW_ID \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the action items?"}'
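For repeated calls, the two curl invocations above can be folded into a small helper. The function names are illustrative, and the JSON quoting is naive (use jq if the question text may contain quotes):

```shell
# build_payload: JSON body for the prediction endpoint (naive quoting).
build_payload() {
  printf '{"question": "%s"}' "$1"
}

# ask_flow <flow-id> <question> [api-key]: call a saved Flowise flow.
ask_flow() {
  if [ -n "$3" ]; then
    curl -s -X POST "http://localhost:3000/api/v1/prediction/$1" \
      -H "Authorization: Bearer $3" \
      -H "Content-Type: application/json" \
      -d "$(build_payload "$2")"
  else
    curl -s -X POST "http://localhost:3000/api/v1/prediction/$1" \
      -H "Content-Type: application/json" \
      -d "$(build_payload "$2")"
  fi
}

# Example:
# ask_flow "your-flow-id-here" "What are the action items?"
```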

Verification

Run this end-to-end check:

# 1. Confirm Ollama has both models
ollama list | grep -E "llama3.2:3b|nomic-embed-text"

# 2. Test Ollama inference directly
curl http://127.0.0.1:11434/api/generate \
  -d '{"model":"llama3.2:3b","prompt":"Say hello","stream":false}' \
  | python3 -m json.tool | grep response

# 3. Test Flowise API (replace with your actual flow ID)
curl -s -X POST http://localhost:3000/api/v1/prediction/YOUR_FLOW_ID \
  -H "Content-Type: application/json" \
  -d '{"question":"Hello, are you running locally?"}' \
  | python3 -m json.tool

You should see: A response from Llama 3.2 in step 2, and the same model answering via Flowise in step 3 — all without any external API calls.

Monitor resource usage during inference:

# NVIDIA GPU
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 2

# Apple Silicon
sudo powermetrics --samplers gpu_power -i 2000 -n 5

What You Learned

  • When Flowise runs in Docker, localhost inside the container points at the container itself; use 127.0.0.1 with --network host, or host.docker.internal otherwise
  • Model names in Flowise must exactly match the tag shown in ollama list: llama3.2 and llama3.2:3b are different identifiers
  • nomic-embed-text is a strong default embedding model for local RAG with Ollama — it's small (274 MB) and accurate for its size

Limitation: Local inference is slower than cloud APIs. On an 8GB VRAM GPU, llama3.2:3b delivers ~30–50 tokens/sec. For latency-sensitive apps, consider streaming responses via the Flowise WebSocket API rather than waiting for full completion.

When NOT to use this setup: If you need GPT-4-class reasoning for complex agentic tasks, local 3B–8B models will frustrate you. Use local Ollama for document Q&A, summarization, classification, and structured extraction — tasks where small local models perform well.

Tested on Flowise 2.2.x, Ollama 0.5.4, llama3.2:3b, nomic-embed-text, Ubuntu 24.04 and macOS Sequoia