Local AI Development Stack with Docker Compose: Ollama, PostgreSQL, Redis in One Command

Build a reproducible local AI development environment using Docker Compose — wiring Ollama for LLM inference, PostgreSQL + pgvector for embeddings, and Redis for caching with health checks and auto-restart.

'Works on my machine' ends today. This docker-compose.yml gives your entire team the same AI stack in 90 seconds.

Your shiny new RTX 4090 is crying tears of silicon—it's trying to run Llama 3.1 70B alone, while your teammate's MacBook Air is still downloading Python 3.11. Your data scientist is debugging a pip install torch error from last Tuesday, and your backend engineer just rm -rf'd their virtualenv trying to clean up. This isn't development; it's a distributed systems failure happening on one desk. With 57% of professional developers already using Docker (Stack Overflow 2025), the infrastructure for consistency is sitting right there. The problem isn't the AI models—it's the 47-step README that gets your team's environments talking to each other.

Let's kill that README with a single file. We're building a local AI stack that runs Ollama for models, PostgreSQL with pgvector for embeddings, Redis for caching, and your application—all orchestrated with one command. This isn't just a tutorial; it's an extermination of "it works for me."

Why Your Virtualenv is a Liability for AI Development

You've seen the pattern: python3 -m venv .venv, source .venv/bin/activate, pip install -r requirements.txt. It works until it doesn't. The moment you add system dependencies for CUDA drivers, or your teammate is on Windows with WSL, or you need the exact same version of libstdc++ that the llama-cpp-python wheel expects, the house of cards collapses. AI development stacks are monstrously stateful. Ollama models are multi-gigabyte downloads. Vector databases need specific PostgreSQL extensions. GPU toolchains are a dependency hellscape.

Docker Compose solves this by treating your entire stack—not just your Python app—as a single, versioned unit. The docker-compose.yml file is your new requirements.txt, but for every service, library, and system dependency. When you run docker-compose up, you're not just starting containers; you're deploying a precise, isolated, and reproducible environment. Need to test with PostgreSQL 15 instead of 16? Change one line in the compose file. Want to give a new hire a working stack on day one? They run one command. Multi-stage builds alone can reduce your final image size by 60–80% (Docker Hub analysis, 2025), meaning you're not shipping a 1.1GB Python image to run a 50MB app.

Anatomy of a Four-Service AI Stack

Our target architecture is pragmatic, not over-engineered. We'll run four services that cover 90% of local AI app needs:

  1. Ollama (Port 11434): The model server. It pulls and runs open-weight LLMs (Llama 3.2, Mistral, Qwen2) via a simple OpenAI-compatible API. We run it in a container for consistency, but we'll give it direct GPU access for native speed.
  2. PostgreSQL + pgvector (Port 5432): The brain's memory. The pgvector extension turns PostgreSQL into a high-performance vector database for storing and searching embeddings (e.g., from all-MiniLM-L6-v2). No more separate ChromaDB or Weaviate instance for local dev.
  3. Redis (Port 6379): The conversation cache. Store session data, rate-limit counters, and prompt/response caches to avoid burning GPU cycles on repeat queries.
  4. Your App (Port 8000+): The orchestrator. A FastAPI, Flask, or Node.js app that calls Ollama, queries PostgreSQL, and uses Redis. It lives in the same network, talking to ollama:11434, not localhost:11434.

The magic is the private Docker network Compose creates. Services discover each other by name. Your app's connection string is postgresql://ai_user:password@postgres:5432/vectordb (matching the credentials defined in the compose file below), and it works identically on Linux, macOS, and Windows.
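Once the stack is up, you can watch that name resolution in action. Assuming curl is installed in your app image (an assumption; slim images often omit it), this resolves the ollama service name from inside the network:

docker-compose exec app curl -s http://ollama:11434/api/tags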

Crafting the Definitive docker-compose.yml

Enough theory. Here's the file that replaces your onboarding document. Save this as docker-compose.yml in your project root.

version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ai-stack-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    networks:
      - ai-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped

  postgres:
    image: pgvector/pgvector:pg16
    container_name: ai-stack-postgres
    environment:
      POSTGRES_DB: vectordb
      POSTGRES_USER: ai_user
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    networks:
      - ai-network
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ai_user -d vectordb"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    container_name: ai-stack-redis
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes
    networks:
      - ai-network
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped

  app:
    build:
      context: ./app
      dockerfile: Dockerfile
      args:
        BUILDKIT_INLINE_CACHE: 1
    container_name: ai-stack-app
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_HOST=ollama
      - DATABASE_URL=postgresql://ai_user:${POSTGRES_PASSWORD:-changeme}@postgres:5432/vectordb
      - REDIS_URL=redis://redis:6379/0
    networks:
      - ai-network
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      ollama:
        condition: service_healthy
    restart: unless-stopped

volumes:
  ollama_data:
  postgres_data:
  redis_data:

networks:
  ai-network:
    driver: bridge

Key moves explained:

  • Volumes: ollama_data, postgres_data, redis_data are named volumes. Your downloaded models and database state persist across container restarts. They're not in ./data to avoid permission nightmares.
  • Healthchecks: These aren't optional. The depends_on: condition: service_healthy for the app service means your app won't start until PostgreSQL is accepting connections and Ollama's API is responsive. This kills race conditions on startup.
  • BuildKit: The BUILDKIT_INLINE_CACHE: 1 arg embeds cache metadata in the built image so later builds (including CI runners using --cache-from) can reuse its layers. BuildKit itself can be 2–5x faster than the legacy builder (Docker, 2025).
  • Network: A custom ai-network isolates this stack from other Compose projects on your machine, preventing port collisions.
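The compose file builds the app from ./app/Dockerfile, which isn't shown above. A minimal sketch, assuming a FastAPI app with main.py and requirements.txt in ./app (both assumptions):

# Minimal sketch of ./app/Dockerfile
FROM python:3.12-slim

WORKDIR /code

# Copy the dependency list first so this layer caches across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application source last; it changes most often
COPY . .

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Ordering COPY and RUN this way is what earns the 90%+ layer cache hit rate discussed later: dependency installs only rerun when requirements.txt changes.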

Create a simple ./init.sql next to the compose file (it's mounted into PostgreSQL's init directory above) for pgvector:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS document_embeddings (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384),
    metadata JSONB
);
CREATE INDEX ON document_embeddings USING ivfflat (embedding vector_cosine_ops);
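To exercise the whole stack end to end, here's a hedged Python sketch. It assumes the requests and psycopg packages are installed and that the all-minilm model has been pulled in Ollama (it produces 384-dimensional embeddings, matching the vector(384) column); the embed helper is illustrative, not part of any library:

# Sketch: embed a document via Ollama and store/search it in pgvector.
# Assumes `ollama pull all-minilm` has run and `pip install requests psycopg`.
import json
import os

import psycopg
import requests
from psycopg.types.json import Jsonb

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "ollama")
DATABASE_URL = os.getenv(
    "DATABASE_URL", "postgresql://ai_user:changeme@postgres:5432/vectordb"
)


def embed(text: str) -> list[float]:
    # /api/embeddings returns {"embedding": [...]}; all-minilm yields 384 dims
    resp = requests.post(
        f"http://{OLLAMA_HOST}:11434/api/embeddings",
        json={"model": "all-minilm", "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]


with psycopg.connect(DATABASE_URL) as conn:
    doc = "Docker Compose orchestrates multi-container stacks."
    conn.execute(
        "INSERT INTO document_embeddings (content, embedding, metadata) "
        "VALUES (%s, %s::vector, %s)",
        (doc, json.dumps(embed(doc)), Jsonb({"source": "demo"})),
    )
    # Nearest neighbor by cosine distance, via pgvector's <=> operator
    row = conn.execute(
        "SELECT content FROM document_embeddings "
        "ORDER BY embedding <=> %s::vector LIMIT 1",
        (json.dumps(embed("container orchestration")),),
    ).fetchone()
    print(row[0])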

Giving Ollama its GPU Wings (NVIDIA Container Toolkit)

The Ollama service definition includes a deploy.resources.reservations.devices section. This is how modern Docker Compose requests GPU access. For it to work, you must install the NVIDIA Container Toolkit on your host machine. This is a one-time setup.

For Linux:


curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

For Docker Desktop (Windows): use the WSL 2 backend (Settings > General > Use the WSL 2 based engine) with a current NVIDIA driver installed on Windows; GPU passthrough then works through WSL 2 with no extra toolkit install. On macOS, Docker containers cannot access the GPU at all (Apple silicon or otherwise), so containerized Ollama runs CPU-only there; for Metal acceleration on a Mac, run Ollama natively on the host and point OLLAMA_HOST at it.

After setup, verify with docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi. You should see your GPU info. Now, when you docker-compose up, Ollama will use your GPU directly. No CUDA installation inside the container needed.
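You can also confirm the container itself sees the card: with the toolkit configured, the NVIDIA runtime injects nvidia-smi into GPU-enabled containers, so the same table should print from inside the stack:

docker-compose exec ollama nvidia-smi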

Keeping Secrets Out of Your Git History

You spotted the ${POSTGRES_PASSWORD:-changeme} syntax. That's Docker Compose reading from a shell environment variable, with a default fallback. The professional move is to use a .env file. Create a .env file in the same directory as your docker-compose.yml:

POSTGRES_PASSWORD=your_secure_password_here
OLLAMA_API_KEY=sk-ollama-... # If using Ollama Cloud
OPENAI_API_KEY=sk-... # For fallback or embeddings

Crucially, add .env to your .gitignore. The docker-compose.yml file can be committed; the .env file is personal or environment-specific. Docker Compose automatically loads a .env file from the current directory. For production, you'd use Docker Secrets (with Docker Swarm) or a dedicated secrets manager, but for local dev, .env is the right tool.
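One sanity check worth knowing: docker-compose config prints the fully merged, interpolated configuration, so you can confirm your .env values (and fallbacks like changeme) resolved as expected without starting anything:

docker-compose config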

The Override Pattern: Dev Speed vs. Prod Parity

Your docker-compose.yml should be the production-like baseline. Then, use docker-compose.override.yml for development conveniences. Docker Compose automatically merges them when you run docker-compose up. This file is typically .gitignore'd.

Create docker-compose.override.yml:

version: '3.8'

services:
  app:
    # Override the command to run a dev server with hot reload
    command: uvicorn main:app --reload --host 0.0.0.0 --port 8000
    # Mount your local source code for instant changes
    volumes:
      - ./app:/code
      - ./app/.venv:/opt/venv # Optional: persist a virtualenv inside the container
    # Disable the production healthcheck for faster restart loops
    healthcheck:
      disable: true
    # Add a debugger port
    ports:
      - "8000:8000"
      - "5678:5678" # For debugpy

  # Want a quick DB GUI in dev? Uncomment to add Adminer at http://localhost:8080
  # adminer:
  #   image: adminer
  #   ports:
  #     - "8080:8080"
  #   networks:
  #     - ai-network

Now, run docker-compose up for development (with overrides). Run docker-compose -f docker-compose.yml up for a clean, production-like stack. This pattern is cleaner than commenting/uncommenting blocks in a single file.
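Merge order follows the -f flags, so the same mechanism extends to more files; for example, stacking a hypothetical docker-compose.prod.yml (described in the next steps) on top of the base:

docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d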

Taming the Beast: Common Pitfalls and Their Fixes

You will hit these. Here's how to win.

1. Port Conflicts: "Address already in use" A local PostgreSQL is already listening on port 5432, so Docker can't publish the container's port mapping. Fix: Stop the local service (sudo systemctl stop postgresql on Linux) or change the host port mapping in the compose file: "5433:5432". Now connect to localhost:5433.

2. Volume Permission Denied (Linux) Your app container crashes with Permission denied trying to write to a mounted volume. This happens because the container's user (often uid=1000) doesn't match your host user's uid. The quick fix: run sudo chown -R 1000:1000 ./app on your host project directory. The right fix: craft your app's Dockerfile to create a user with a dynamic UID using ARG and USER.

3. Startup Race Conditions Your app starts before PostgreSQL is ready, crashes, and Docker restarts it in a loop. We solved this with healthcheck and depends_on: condition: service_healthy. A related one: if you ever see Error response from daemon: network not found, run docker-compose down followed by docker-compose up; down removes the project's networks by default, and up recreates them.
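Healthchecks gate the first startup, but they won't save you if PostgreSQL restarts mid-session. A minimal app-side retry sketch, assuming the psycopg package; the function name is illustrative:

# Sketch: retry a Postgres connection with a fixed backoff.
import time

import psycopg


def connect_with_retry(dsn: str, attempts: int = 10, delay: float = 2.0):
    for attempt in range(1, attempts + 1):
        try:
            return psycopg.connect(dsn)
        except psycopg.OperationalError as exc:
            if attempt == attempts:
                raise
            print(f"Postgres not ready ({exc}); retry {attempt}/{attempts}")
            time.sleep(delay)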

4. The Dreaded "no space left on device" You've been pulling models and Docker images for weeks. Your disk is full. The fix: run docker system prune -a --volumes. This is aggressive: it removes all stopped containers, unused networks, dangling images, build cache, and unused volumes. Beware that if the stack is stopped via docker-compose down, your named volumes count as unused and will be deleted, models and database included. To prevent recurrence, set Docker Desktop's disk image size limit, or on Linux prune on a schedule; the per-container "storage-opts" size limit in /etc/docker/daemon.json only applies on specific storage-driver setups (e.g., overlay2 on an xfs filesystem with pquota), so don't rely on it.
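Before reaching for the prune hammer, see where the space actually went; docker system df breaks usage down by images, containers, local volumes, and build cache:

docker system df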

5. Apple Silicon / ARM64 Woes You build an image on your Mac M3 and try to run it on an AMD64 Linux server. You get OCI runtime exec failed: exec format error. The fix: always build with explicit platform flags when targeting a different architecture. Use docker build --platform linux/amd64 -t your-image . or set DOCKER_DEFAULT_PLATFORM=linux/amd64 in your environment. Better yet, use Docker Buildx to build multi-architecture images.
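For the multi-arch route, a single Buildx invocation can build and push both architectures at once (assuming a configured builder and a registry you can push to; the image name is a placeholder):

docker buildx build --platform linux/amd64,linux/arm64 -t your-registry/your-image:latest --push .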

Optimizing Your Stack: From Working to Fast

A running stack is good. A fast and efficient stack is professional. These four ecosystem tools earn their keep:

  1. Lint your Dockerfile: Run hadolint ./app/Dockerfile to catch common mistakes like not combining RUN apt-get update && apt-get install into one layer.
  2. Analyze image layers: Run dive your-image:tag to visualize each layer's size and inefficiencies. Aim for a layer cache hit rate of 90%+ by ordering your COPY and RUN commands wisely.
  3. Scan for vulnerabilities: Run trivy image your-image:tag before you run it. Don't deploy a container with critical CVEs.
  4. Monitor in terminal: Run lazydocker for a glorious terminal UI that shows logs, stats, and lets you manage containers without memorizing 50 CLI flags.

Let's look at the impact of base image choice, a critical Dockerfile decision:

Base Image         | Size    | Use Case                    | Key Trade-off
python:3.12        | ~1.1 GB | "It just works" development | Massive overhead; includes build tools.
python:3.12-slim   | ~130 MB | General-purpose production  | Stripped of common build deps; may need apt-get install for some packages.
python:3.12-alpine | ~45 MB  | Size-constrained production | Uses musl libc, which breaks some Python wheels (e.g., grpcio).

For AI apps, python:3.12-slim is often the sweet spot. You can build a final image under 200MB easily.

Next Steps: From Local Stack to Shared Reality

You now have a single command that spins up a complete, GPU-accelerated AI development environment. The docker-compose.yml file is the source of truth. Commit it. When your teammate pulls the repo, they run docker-compose up, wait for the ~45s first image pull (subsequent starts are ~8s; the multi-gigabyte model downloads happen separately inside Ollama), and they're in business. No "can you send me the model file?" No "what's the Redis password?"

Take this foundation and extend it:

  1. Add a testing service: Integrate a container running pytest with the same network access, so your integration tests hit the real Ollama and Postgres containers.
  2. Implement Docker Scout: Use it in your CI pipeline to get a software bill of materials (SBOM) and continuous vulnerability assessment for your images. With Docker Hub serving 13M+ image pulls per day (Docker blog), knowing what's in yours is non-negotiable.
  3. Prepare for Prod: Create a docker-compose.prod.yml that removes the development mounts, sets restart: always, and points to your private image registry. Use Docker Buildx to build and push multi-arch images.
  4. Orchestrate Model Downloads: Add an initialization step that uses the Ollama API (curl -X POST http://ollama:11434/api/pull -d '{"name": "llama3.2:3b"}') to pull frequently used models when the stack comes up, so they're pre-cached; a sketch follows this list.
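A minimal sketch of that idea as a one-shot Compose service, assuming the curlimages/curl image and the llama3.2:3b tag (both assumptions); it waits for Ollama's healthcheck, fires the pull request, and exits:

  # Hypothetical one-shot helper: pre-pulls a model once Ollama is healthy
  ollama-pull:
    image: curlimages/curl:latest
    networks:
      - ai-network
    depends_on:
      ollama:
        condition: service_healthy
    entrypoint:
      - curl
      - -X
      - POST
      - http://ollama:11434/api/pull
      - -d
      - '{"name": "llama3.2:3b"}'
    restart: "no"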

The goal isn't just to run containers. It's to make your AI development workflow as reliable and shareable as the Git repository that holds your code. Stop debugging environments and start building. Your GPU will thank you.