'Works on my machine' ends today. This docker-compose.yml gives your entire team the same AI stack in 90 seconds.
Your shiny new RTX 4090 is crying tears of silicon—it's trying to run Llama 3.1 70B alone, while your teammate's MacBook Air is still downloading Python 3.11. Your data scientist is debugging a pip install torch error from last Tuesday, and your backend engineer just rm -rf'd their virtualenv trying to clean up. This isn't development; it's a distributed systems failure happening on one desk. With 57% of professional developers already using Docker (Stack Overflow 2025), the infrastructure for consistency is sitting right there. The problem isn't the AI models—it's the 47-step README that gets your team's environments talking to each other.
Let's kill that README with a single file. We're building a local AI stack that runs Ollama for models, PostgreSQL with pgvector for embeddings, Redis for caching, and your application—all orchestrated with one command. This isn't just a tutorial; it's an extermination of "it works for me."
Why Your Virtualenv is a Liability for AI Development
You've seen the pattern: python3 -m venv .venv, source .venv/bin/activate, pip install -r requirements.txt. It works until it doesn't. The moment you add system dependencies for CUDA drivers, or your teammate is on Windows with WSL, or you need the exact same version of libstdc++ that the llama-cpp-python wheel expects, the house of cards collapses. AI development stacks are monstrously stateful. Ollama models are multi-gigabyte downloads. Vector databases need specific PostgreSQL extensions. GPU toolchains are a dependency hellscape.
Docker Compose solves this by treating your entire stack—not just your Python app—as a single, versioned unit. The docker-compose.yml file is your new requirements.txt, but for every service, library, and system dependency. When you run docker-compose up, you're not just starting containers; you're deploying a precise, isolated, and reproducible environment. Need to test with PostgreSQL 15 instead of 16? Change one line in the compose file. Want to give a new hire a working stack on day one? They run one command. Multi-stage builds alone can reduce your final image size by 60–80% (Docker Hub analysis, 2025), meaning you're not shipping a 1.1GB Python image to run a 50MB app.
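To make the multi-stage point concrete, here's a sketch of a two-stage Dockerfile for the app service (file names, paths, and the uvicorn command are illustrative assumptions, not the one true layout): the first stage carries the full build toolchain, the final stage ships only installed packages.

```dockerfile
# Stage 1: build wheels with the full toolchain available
FROM python:3.12 AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Stage 2: slim runtime image, no compilers or build deps
FROM python:3.12-slim
WORKDIR /code
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The builder stage is discarded at the end; only the final stage's layers end up in the image you ship.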
Anatomy of a Four-Service AI Stack
Our target architecture is pragmatic, not over-engineered. We'll run four services that cover 90% of local AI app needs:
- Ollama (Port 11434): The model server. It pulls and runs open-weight LLMs (Llama 3.2, Mistral, Qwen2) via a simple OpenAI-compatible API. We run it in a container for consistency, but we'll give it direct GPU access for native speed.
- PostgreSQL + pgvector (Port 5432): The brain's memory. The `pgvector` extension turns PostgreSQL into a high-performance vector database for storing and searching embeddings (e.g., from `all-MiniLM-L6-v2`). No more separate ChromaDB or Weaviate instance for local dev.
- Redis (Port 6379): The conversation cache. Store session data, rate-limit counters, and prompt/response caches to avoid burning GPU cycles on repeat queries.
- Your App (Port 8000+): The orchestrator. A FastAPI, Flask, or Node.js app that calls Ollama, queries PostgreSQL, and uses Redis. It lives in the same network, talking to `ollama:11434`, not `localhost:11434`.
The magic is the private Docker network Compose creates. Services discover each other by name. Your app's connection string is `postgresql://ai_user:changeme@postgres:5432/vectordb`, and it works identically on Linux, macOS, and Windows.
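Here's a minimal sketch of the app side of that service discovery: assemble connection URLs from the environment variables the compose file injects, falling back to in-network defaults. The helper name and the fallback values are illustrative, not a fixed API.

```python
import os

def service_urls(env=os.environ):
    """Build connection URLs from Compose-injected environment variables.

    Inside the Compose network, hostnames are service names (ollama,
    postgres, redis) rather than localhost. Defaults are illustrative.
    """
    ollama_host = env.get("OLLAMA_HOST", "ollama")
    return {
        "ollama": f"http://{ollama_host}:11434",
        "database": env.get(
            "DATABASE_URL",
            "postgresql://ai_user:changeme@postgres:5432/vectordb",
        ),
        "redis": env.get("REDIS_URL", "redis://redis:6379/0"),
    }

# With no env set, you get the in-network defaults:
print(service_urls({})["ollama"])  # http://ollama:11434
```

The same code runs unchanged on every OS because the hostnames resolve inside Docker's network, not on the host.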
Crafting the Definitive docker-compose.yml
Enough theory. Here's the file that replaces your onboarding document. Save this as docker-compose.yml in your project root.
```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ai-stack-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    networks:
      - ai-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    healthcheck:
      # The ollama image ships without curl, so probe the CLI instead
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped

  postgres:
    image: pgvector/pgvector:pg16
    container_name: ai-stack-postgres
    environment:
      POSTGRES_DB: vectordb
      POSTGRES_USER: ai_user
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    networks:
      - ai-network
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ai_user -d vectordb"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    container_name: ai-stack-redis
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes
    networks:
      - ai-network
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped

  app:
    build:
      context: ./app
      dockerfile: Dockerfile
      args:
        BUILDKIT_INLINE_CACHE: 1
    container_name: ai-stack-app
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_HOST=ollama
      - DATABASE_URL=postgresql://ai_user:${POSTGRES_PASSWORD:-changeme}@postgres:5432/vectordb
      - REDIS_URL=redis://redis:6379/0
    networks:
      - ai-network
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      ollama:
        condition: service_healthy
    restart: unless-stopped

volumes:
  ollama_data:
  postgres_data:
  redis_data:

networks:
  ai-network:
    driver: bridge
```
Key moves explained:
- Volumes: `ollama_data`, `postgres_data`, and `redis_data` are named volumes. Your downloaded models and database state persist across container restarts. They're not in `./data`, which avoids permission nightmares.
- Healthchecks: These aren't optional. The `depends_on: condition: service_healthy` on the `app` service means your app won't start until PostgreSQL is accepting connections and Ollama is responsive. This kills race conditions on startup.
- BuildKit: The `BUILDKIT_INLINE_CACHE: 1` arg enables layer caching for your app's Dockerfile, which can be 2–5x faster than a legacy Docker build (Docker, 2025) on subsequent runs.
- Network: A custom `ai-network` isolates this stack from other Compose projects on your machine, preventing port collisions.
Create a simple `./init.sql` in your project root (the compose file mounts it into the postgres container):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS document_embeddings (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384),
    metadata JSONB
);

CREATE INDEX ON document_embeddings USING ivfflat (embedding vector_cosine_ops);
```
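When your app writes to that table, embeddings go over the wire in pgvector's bracketed text literal. A small sketch, with a helper name of my own invention and a parameterized query using pgvector's `<=>` cosine-distance operator:

```python
def to_pgvector(embedding):
    """Serialize a list of floats into pgvector's text literal, e.g. [0.25,-1.0]."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Nearest-neighbor query by cosine distance; with psycopg or asyncpg,
# pass to_pgvector(query_embedding) for both placeholders.
NEAREST_SQL = """
SELECT id, content, embedding <=> %s AS distance
FROM document_embeddings
ORDER BY embedding <=> %s
LIMIT 5;
"""

print(to_pgvector([0.25, -1.0]))  # [0.25,-1.0]
```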
Giving Ollama its GPU Wings (NVIDIA Container Toolkit)
The Ollama service definition includes a deploy.resources.reservations.devices section. This is the modern Docker Compose way to request GPU access. For it to work, you must install the NVIDIA Container Toolkit on your host machine. This is a one-time setup.
For Linux:
```bash
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
For Docker Desktop (Windows): GPU passthrough requires the WSL 2 backend and an up-to-date NVIDIA driver installed on Windows; recent Docker Desktop releases pass the GPU through automatically once both are in place. For macOS: there is no NVIDIA GPU passthrough at all. On Apple Silicon, run Ollama natively on the host for Metal acceleration and point the stack at host.docker.internal:11434, or accept CPU-only inference in the container.
After setup, verify with docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi. You should see your GPU info. Now, when you docker-compose up, Ollama will use your GPU directly. No CUDA installation inside the container needed.
Keeping Secrets Out of Your Git History
You spotted the ${POSTGRES_PASSWORD:-changeme} syntax. That's Docker Compose reading from a shell environment variable, with a default fallback. The professional move is to use a .env file. Create a .env file in the same directory as your docker-compose.yml:
```bash
POSTGRES_PASSWORD=your_secure_password_here
OLLAMA_API_KEY=sk-ollama-...   # If using Ollama Cloud
OPENAI_API_KEY=sk-...          # For fallback or embeddings
```
Crucially, add .env to your .gitignore. The docker-compose.yml file can be committed; the .env file is personal or environment-specific. Docker Compose automatically loads a .env file from the current directory. For production, you'd use Docker Secrets (with Docker Swarm) or a dedicated secrets manager, but for local dev, .env is the right tool.
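Compose's .env handling boils down to simple KEY=VALUE line parsing. If your app ever needs those values outside Compose (say, in a bare test runner), a tiny parser like this sketch does the job; it's illustrative, not a replacement for python-dotenv:

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

env = parse_env("POSTGRES_PASSWORD=s3cret\n# local only\nREDIS_URL=redis://redis:6379/0")
print(env["POSTGRES_PASSWORD"])  # s3cret
```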
The Override Pattern: Dev Speed vs. Prod Parity
Your docker-compose.yml should be the production-like baseline. Then, use docker-compose.override.yml for development conveniences. Docker Compose automatically merges them when you run docker-compose up. This file is typically .gitignore'd.
Create docker-compose.override.yml:
```yaml
version: '3.8'

services:
  app:
    # Run a dev server with hot reload instead of the baked-in command
    command: uvicorn main:app --reload --host 0.0.0.0 --port 8000
    # Mount your local source code for instant changes
    volumes:
      - ./app:/code
      # - ./app/.venv:/opt/venv  # Optional: persist a virtualenv inside the container
    # Disable the healthcheck for faster restart loops
    healthcheck:
      disable: true
    ports:
      - "8000:8000"
      - "5678:5678"  # For debugpy

  # Optional: uncomment for a quick DB GUI on port 8080 in dev.
  # Note this is its own service, not an override of postgres.
  # adminer:
  #   image: adminer
  #   ports:
  #     - "8080:8080"
  #   networks:
  #     - ai-network
```
Now, run docker-compose up for development (with overrides). Run docker-compose -f docker-compose.yml up for a clean, production-like stack. This pattern is cleaner than commenting/uncommenting blocks in a single file.
Taming the Beast: Common Pitfalls and Their Fixes
You will hit these. Here's how to win.
1. Port Conflicts: "Address already in use"
You have a local PostgreSQL running on port 5432, and the container can't bind to it. Fix: Stop the local service (sudo systemctl stop postgresql on Linux) or change the host port mapping in the compose file: "5433:5432". Now connect to localhost:5433.
2. Volume Permission Denied (Linux)
Your app container crashes with Permission denied trying to write to a mounted volume. This happens because the container's user (often uid=1000) doesn't match your host user's uid. The quick fix: run sudo chown -R 1000:1000 ./app on your host project directory. The right fix: craft your app's Dockerfile to create a user with a dynamic UID using ARG and USER.
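A sketch of that Dockerfile pattern looks like this (the user name and defaults are illustrative); build with `docker build --build-arg UID=$(id -u) --build-arg GID=$(id -g) .` so the container user matches your host user:

```dockerfile
FROM python:3.12-slim
# Accept the host user's IDs at build time; 1000 is a common default
ARG UID=1000
ARG GID=1000
RUN groupadd -g ${GID} appuser && useradd -m -u ${UID} -g ${GID} appuser
WORKDIR /code
COPY --chown=appuser:appuser . .
# Drop root: files written to bind mounts now belong to your host user
USER appuser
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```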
3. Startup Race Conditions
Your app starts before PostgreSQL is ready, crashes, and Docker restarts it in a loop. We solved this with healthcheck and depends_on: condition: service_healthy. If you ever see Error response from daemon: network not found, run docker-compose down followed by docker-compose up: down removes the project's networks by default, and up recreates them. (Manually creating the network rarely helps, because Compose prefixes network names with the project name.)
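If you need the same guard inside application code, before Compose healthchecks are in play, a TCP wait loop with exponential backoff is enough. A sketch with function names of my own choosing:

```python
import socket
import time

def backoff_delays(retries=5, base=0.5, cap=8.0):
    """Exponential backoff schedule: 0.5, 1.0, 2.0, 4.0, 8.0 seconds."""
    return [min(base * (2 ** i), cap) for i in range(retries)]

def wait_for_port(host, port, retries=5):
    """Return True once a TCP connection to host:port succeeds, else False."""
    for delay in backoff_delays(retries):
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(delay)
    return False

# Inside the Compose network you'd call, e.g.: wait_for_port("postgres", 5432)
```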
4. The Dreaded "no space left on device"
You've been pulling models and Docker images for weeks. Your disk is full. The fix: run docker system prune -a --volumes. This is aggressive: it removes all stopped containers, unused networks, all unused images, unused volumes, and the build cache. To prevent it, cap Docker Desktop's virtual disk size in its settings, or on Linux schedule periodic docker image prune and docker builder prune runs. (The daemon's storage-opts size setting only applies to specific storage drivers, such as overlay2 on xfs with pquota, so don't count on it.)
5. Apple Silicon / ARM64 Woes
You build an image on your Mac M3 and try to run it on an AMD64 Linux server. You get OCI runtime exec failed: exec format error. The fix: always build with explicit platform flags when targeting a different architecture. Use docker build --platform linux/amd64 -t your-image . or set DOCKER_DEFAULT_PLATFORM=linux/amd64 in your environment. Better yet, use Docker Buildx to build multi-architecture images.
Optimizing Your Stack: From Working to Fast
A running stack is good. A fast and efficient stack is professional. Use these tools from the approved ecosystem:
- Lint your Dockerfile: Run `hadolint ./app/Dockerfile` to catch common mistakes like not combining `RUN apt-get update && apt-get install` into one layer.
- Analyze image layers: Run `dive your-image:tag` to visualize each layer's size and inefficiencies. Aim for a layer cache hit rate of 90%+ by ordering your `COPY` and `RUN` commands wisely.
- Scan for vulnerabilities: Run `trivy image your-image:tag` before you run it. Don't deploy a container with critical CVEs.
- Monitor in terminal: Run `lazydocker` for a glorious terminal UI that shows logs, stats, and lets you manage containers without memorizing 50 CLI flags.
Let's look at the impact of base image choice, a critical Dockerfile decision:
| Base Image | Size | Use Case | Key Trade-off |
|---|---|---|---|
| `python:3.12` | ~1.1 GB | "It just works" development | Massive overhead; includes build tools. |
| `python:3.12-slim` | ~130 MB | General-purpose production | Stripped of common build deps; may need `apt-get install` for some packages. |
| `python:3.12-alpine` | ~45 MB | Size-constrained production | Uses musl libc; can break some Python wheels (e.g., `grpcio`). |
For AI apps, python:3.12-slim is often the sweet spot. You can build a final image under 200MB easily.
Next Steps: From Local Stack to Shared Reality
You now have a single command that spins up a complete, GPU-accelerated AI development environment. The docker-compose.yml file is the source of truth. Commit it. When your teammate pulls the repo, they run docker-compose up, wait out the one-time image pull (subsequent starts take seconds), and they're in business. No "can you send me the model file?" No "what's the Redis password?"
Take this foundation and extend it:
- Add a testing service: Integrate a container running `pytest` with the same network access, so your integration tests hit the real Ollama and Postgres containers.
- Implement Docker Scout: Use it in your CI pipeline to get a software bill of materials (SBOM) and continuous vulnerability assessment for your images. With Docker Hub serving 13M+ image pulls per day (Docker blog), knowing what's in yours is non-negotiable.
- Prepare for Prod: Create a `docker-compose.prod.yml` that removes the development mounts, sets `restart: always`, and points to your private image registry. Use Docker Buildx to build and push multi-arch images.
- Orchestrate Model Downloads: Add an initialization script that uses the Ollama API (`curl -X POST http://ollama:11434/api/pull -d '{"name": "llama3.2:3b"}'`) to pull frequently used models as part of stack startup, so they're pre-cached.
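A pre-pull script along those lines might look like this sketch; /api/pull is Ollama's documented pull endpoint, while the model name and error handling here are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"  # service name resolves inside the Compose network

def pull_payload(model):
    """Build the JSON body for Ollama's /api/pull endpoint."""
    return json.dumps({"name": model}).encode()

def pull_model(model, base_url=OLLAMA_URL):
    """POST to /api/pull; Ollama streams JSON progress lines until done."""
    req = urllib.request.Request(
        f"{base_url}/api/pull",
        data=pull_payload(model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            print(json.loads(line).get("status", ""))

# Requires the stack to be running, e.g.: pull_model("llama3.2:3b")
```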
The goal isn't just to run containers. It's to make your AI development workflow as reliable and shareable as the Git repository that holds your code. Stop debugging environments and start building. Your GPU will thank you.