Blue-Green Deployment: Zero-Downtime Ollama Model Updates

Achieve zero-downtime Ollama model updates with blue-green deployment strategies. Learn implementation steps, traffic switching, and rollback procedures.

Ever had your AI service crash during a model update, leaving users staring at error messages? You're not alone. Model deployments fail spectacularly when done wrong, but blue-green deployment transforms this nightmare into a smooth, professional operation.

Blue-green deployment creates two identical production environments: one serves traffic (blue) while the other stays idle (green). When updating your Ollama models, you deploy to the idle environment, test thoroughly, then switch traffic instantly. Zero downtime. Zero drama.

This guide shows you how to implement blue-green deployment for Ollama models using Docker, load balancers, and automated scripts. You'll learn traffic switching, health checks, and rollback procedures that keep your AI services running 24/7.

Why Blue-Green Deployment Matters for Ollama Models

Traditional deployment methods create service interruptions. Users lose access while you restart containers, pull new models, or fix broken configurations. Blue-green deployment eliminates these problems.

Benefits of blue-green deployment:

  • Zero downtime: Users never experience service interruptions
  • Instant rollbacks: Switch back to working version in seconds
  • Safe testing: Validate new models before serving traffic
  • Reduced risk: Isolate deployment issues from production traffic

Model updates become routine operations instead of stressful events. Your monitoring dashboards stay green, and your users stay happy.
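Under the hood, the bookkeeping is tiny: whichever environment is live, the other one is the deploy target. A minimal sketch of that alternation (the helper name `other_env` is purely illustrative):

```shell
# Given the live environment, return the idle one; default to blue when
# nothing is live yet. Illustrative bookkeeping only.
other_env() {
    case "$1" in
        blue)  echo "green" ;;
        green) echo "blue" ;;
        *)     echo "blue" ;;
    esac
}

other_env blue    # → green
other_env green   # → blue
```

The deployment script later in this guide embeds the same decision inline.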

Prerequisites and Environment Setup

Before implementing blue-green deployment, prepare your infrastructure and tools.

Required components:

  • Docker and Docker Compose
  • Load balancer (HAProxy, Nginx, or cloud load balancer)
  • Monitoring system (Prometheus, Grafana, or equivalent)
  • Container orchestration platform (optional but recommended)

System requirements:

  • Sufficient memory for two model instances
  • Network connectivity between environments
  • Automated deployment scripts
  • Health check endpoints

Set up your base environment with these commands:

# Create project directory
mkdir ollama-blue-green
cd ollama-blue-green

# Initialize Docker Compose configuration
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  ollama-blue:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ./models:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    healthcheck:
      # the ollama image does not bundle curl, so probe with the bundled CLI
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  ollama-green:
    image: ollama/ollama:latest
    ports:
      - "11435:11434"
    volumes:
      - ./models:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
EOF

# Create models directory
mkdir models

This configuration creates two identical Ollama instances: blue publishes host port 11434 and green publishes 11435, while inside each container Ollama listens on 11434 either way. Note that both instances share the same ./models directory; if you want model pulls isolated per environment, give each instance its own directory (for example ./models-blue and ./models-green) and adjust the volume mounts accordingly.
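Containers report healthy only after Ollama's API answers, so scripts acting on a freshly started environment should poll rather than sleep. A small wait loop with an injectable probe command (in real use the probe would be curl against port 11434 or 11435; the helper name `wait_ready` is illustrative):

```shell
# Poll a probe command until it succeeds or the attempt budget runs out.
wait_ready() {
    local attempts=$1; shift
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@" > /dev/null 2>&1; then
            echo "ready after $i attempt(s)"
            return 0
        fi
        sleep 1
    done
    echo "gave up after $attempts attempts"
    return 1
}

# Real usage once the stack is up:
#   wait_ready 30 curl -sf http://localhost:11434/api/tags
wait_ready 3 true    # → ready after 1 attempt(s)
```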

Implementing Blue-Green Infrastructure

Set up load balancing and traffic routing between your blue and green environments.

HAProxy Configuration

Create an HAProxy configuration that routes traffic based on backend availability:

cat > haproxy.cfg << 'EOF'
global
    log stdout local0 info
    # Runtime API used by traffic-switch.sh; a TCP bind keeps this lab setup
    # simple, but restrict it (or use a unix socket) in production
    stats socket ipv4@0.0.0.0:9999 level admin

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    option httplog
    log global

# Stats page consumed by the monitoring scripts
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s

frontend ollama_frontend
    bind *:8080
    default_backend ollama_active

backend ollama_active
    balance roundrobin
    option httpchk GET /api/tags
    http-check expect status 200
    server blue ollama-blue:11434 check
    server green ollama-green:11434 check backup

backend ollama_blue
    balance roundrobin
    option httpchk GET /api/tags
    http-check expect status 200
    server blue ollama-blue:11434 check

backend ollama_green
    balance roundrobin
    option httpchk GET /api/tags
    http-check expect status 200
    server green ollama-green:11434 check
EOF

Add HAProxy to your Docker Compose file:

  haproxy:
    image: haproxy:2.8
    ports:
      - "8080:8080"
      - "8404:8404"   # stats page
      - "9999:9999"   # runtime API (if enabled in haproxy.cfg)
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    depends_on:
      - ollama-blue
      - ollama-green

Health Check Implementation

Create robust health checks that verify model availability:

cat > health-check.sh << 'EOF'
#!/bin/bash

check_ollama_health() {
    local host=$1
    local port=$2
    local model_name=$3
    
    # Check if service is running
    if ! curl -sf "http://${host}:${port}/api/tags" > /dev/null; then
        echo "ERROR: Ollama service not responding at ${host}:${port}"
        return 1
    fi
    
    # Check if specific model is available
    if ! curl -sf "http://${host}:${port}/api/tags" | grep -q "$model_name"; then
        echo "ERROR: Model $model_name not found at ${host}:${port}"
        return 1
    fi
    
    # Test model inference. Declare and assign separately: "local var=$(cmd)"
    # would mask curl's exit status with local's own (always 0).
    local response
    response=$(curl -sf "http://${host}:${port}/api/generate" \
        -H "Content-Type: application/json" \
        -d "{\"model\":\"$model_name\",\"prompt\":\"Hello\",\"stream\":false}")
    
    if [[ $? -ne 0 ]] || [[ -z "$response" ]]; then
        echo "ERROR: Model inference failed at ${host}:${port}"
        return 1
    fi
    
    echo "SUCCESS: Health check passed for ${host}:${port}"
    return 0
}

# Usage: ./health-check.sh <host> <port> <model_name>
check_ollama_health "$1" "$2" "$3"
EOF

chmod +x health-check.sh
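The model check above greps the JSON that /api/tags returns: an object with a models array whose entries carry a name field. An abridged sample (fields trimmed) shows what the grep matches, and why a plain substring match can be too loose:

```shell
# Abridged /api/tags payload; real responses also include size, digest, etc.
tags_json='{"models":[{"name":"llama2:latest"},{"name":"codellama:latest"}]}'

model_present() {
    echo "$1" | grep -q "$2"
}

model_present "$tags_json" "llama2" && echo "llama2 installed"
model_present "$tags_json" "mistral" || echo "mistral missing"

# Caveat: the substring "llama" would also match "codellama"; if jq is
# available, matching the exact name field is stricter.
```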

Deployment Strategy and Traffic Switching

Implement automated deployment scripts that handle the complete blue-green workflow.

Deployment Script

Create a script that automates the entire deployment process:

cat > deploy.sh << 'EOF'
#!/bin/bash

set -e

# Configuration
BLUE_PORT=11434
GREEN_PORT=11435
MODEL_NAME=""
ACTIVE_ENV=""

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

log() {
    echo -e "${BLUE}[$(date '+%Y-%m-%d %H:%M:%S')]${NC} $1"
}

error() {
    echo -e "${RED}[ERROR]${NC} $1" >&2
}

success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

get_active_environment() {
    # Nothing live at all?
    if ! curl -sf "http://localhost:8080/api/tags" > /dev/null; then
        echo "none"
        return
    fi
    # Inside the ollama_active backend, the server without the "backup" flag
    # is the one receiving traffic
    local active_line
    active_line=$(sed -n '/^backend ollama_active/,/^backend ollama_blue/p' haproxy.cfg \
        | grep 'server ' | grep -v 'backup' || true)
    if echo "$active_line" | grep -q 'server blue'; then
        echo "blue"
    elif echo "$active_line" | grep -q 'server green'; then
        echo "green"
    else
        echo "unknown"
    fi
}

deploy_to_environment() {
    local env=$1
    local port=$2
    
    log "Deploying to $env environment (port $port)"
    
    # Restart the target environment
    docker-compose stop ollama-$env
    docker-compose up -d ollama-$env
    
    # Wait for the Ollama API before pulling models into the container
    local i
    for i in {1..30}; do
        if curl -sf "http://localhost:${port}/api/tags" > /dev/null; then
            break
        fi
        sleep 5
    done
    
    # Pull the model inside the running container ("ollama pull" talks to the
    # server, so the instance must already be up)
    log "Installing model: $MODEL_NAME"
    if ! docker-compose exec -T ollama-$env ollama pull "$MODEL_NAME"; then
        error "Model pull failed in $env environment"
        return 1
    fi
    
    # Verify the environment end to end
    log "Waiting for $env environment to be ready..."
    for i in {1..30}; do
        if ./health-check.sh localhost "$port" "$MODEL_NAME"; then
            success "$env environment is ready"
            return 0
        fi
        sleep 10
    done
    
    error "$env environment failed to start properly"
    return 1
}

switch_traffic() {
    local target_env=$1
    
    log "Switching traffic to $target_env environment"
    
    # Toggle the "backup" flag inside the ollama_active backend only.
    # Write back through cat so the file keeps its inode: haproxy.cfg is
    # bind-mounted into the container, and sed -i would replace the inode,
    # leaving the container looking at the old file.
    local tmp_cfg
    tmp_cfg=$(mktemp)
    if [[ "$target_env" == "blue" ]]; then
        sed '/^backend ollama_active/,/^backend ollama_blue/{
            s/server blue ollama-blue:11434 check backup/server blue ollama-blue:11434 check/
            s/server green ollama-green:11434 check$/server green ollama-green:11434 check backup/
        }' haproxy.cfg > "$tmp_cfg"
    else
        sed '/^backend ollama_active/,/^backend ollama_blue/{
            s/server blue ollama-blue:11434 check$/server blue ollama-blue:11434 check backup/
            s/server green ollama-green:11434 check backup/server green ollama-green:11434 check/
        }' haproxy.cfg > "$tmp_cfg"
    fi
    cat "$tmp_cfg" > haproxy.cfg && rm -f "$tmp_cfg"
    
    # Reload HAProxy; the official image runs in master-worker mode, so SIGHUP
    # triggers a seamless reload
    docker-compose kill -s HUP haproxy
    
    # Verify traffic switch
    sleep 5
    if ./health-check.sh localhost 8080 "$MODEL_NAME"; then
        success "Traffic successfully switched to $target_env"
        return 0
    else
        error "Traffic switch verification failed"
        return 1
    fi
}

rollback() {
    local current_env=$1
    local target_env=$2
    
    log "Rolling back from $current_env to $target_env"
    
    if switch_traffic "$target_env"; then
        success "Rollback completed successfully"
    else
        error "Rollback failed"
        exit 1
    fi
}

# Main deployment logic
main() {
    if [[ $# -lt 1 ]]; then
        echo "Usage: $0 <model_name>"
        exit 1
    fi
    
    MODEL_NAME=$1
    
    # Determine current active environment
    ACTIVE_ENV=$(get_active_environment)
    log "Current active environment: $ACTIVE_ENV"
    
    # Determine target environment
    if [[ "$ACTIVE_ENV" == "blue" ]]; then
        TARGET_ENV="green"
        TARGET_PORT=$GREEN_PORT
    else
        TARGET_ENV="blue"
        TARGET_PORT=$BLUE_PORT
    fi
    
    log "Target environment: $TARGET_ENV"
    
    # Deploy to inactive environment
    if deploy_to_environment "$TARGET_ENV" "$TARGET_PORT"; then
        log "Deployment successful, switching traffic..."
        
        # Switch traffic to new environment
        if switch_traffic "$TARGET_ENV"; then
            success "Blue-green deployment completed successfully"
        else
            error "Traffic switch failed, rolling back..."
            rollback "$TARGET_ENV" "$ACTIVE_ENV"
        fi
    else
        error "Deployment failed"
        exit 1
    fi
}

# Handle subcommands used by the other scripts
case "$1" in
    rollback)
        MODEL_NAME=${2:-llama2}   # model used when verifying the switch
        ACTIVE_ENV=$(get_active_environment)
        if [[ "$ACTIVE_ENV" == "blue" ]]; then
            rollback "blue" "green"
        else
            rollback "green" "blue"
        fi
        exit 0
        ;;
    get_active_environment)
        get_active_environment
        exit 0
        ;;
esac

main "$@"
EOF

chmod +x deploy.sh
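The backup-flag toggle that switch_traffic applies is easy to rehearse on a scratch copy of the active backend before touching the live config (GNU sed assumed; the file path is a throwaway):

```shell
# Scratch copy of the active backend in its "blue live" state
cat > /tmp/ollama-active-test.cfg << 'CFG'
backend ollama_active
    server blue ollama-blue:11434 check
    server green ollama-green:11434 check backup
CFG

# Promote green: blue gains the backup flag, green loses it
sed -i 's/server blue ollama-blue:11434 check$/server blue ollama-blue:11434 check backup/' /tmp/ollama-active-test.cfg
sed -i 's/server green ollama-green:11434 check backup$/server green ollama-green:11434 check/' /tmp/ollama-active-test.cfg

grep 'server' /tmp/ollama-active-test.cfg
# →     server blue ollama-blue:11434 check backup
# →     server green ollama-green:11434 check
```

Because only the `backup` keyword moves, HAProxy's health checks keep running against both servers the whole time.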

Traffic Switching Implementation

Implement smooth traffic switching with validation:

cat > traffic-switch.sh << 'EOF'
#!/bin/bash

# HAProxy runtime API endpoint (bound to TCP port 9999 in haproxy.cfg;
# expose it to trusted hosts only)
HAPROXY_ADMIN="tcp:localhost:9999"

switch_to_environment() {
    local target_env=$1
    
    echo "Switching traffic to $target_env environment..."
    
    if [[ "$target_env" == "blue" ]]; then
        # Enable blue, disable green
        echo "enable server ollama_active/blue" | socat stdio "$HAPROXY_ADMIN"
        echo "disable server ollama_active/green" | socat stdio "$HAPROXY_ADMIN"
    else
        # Enable green, disable blue
        echo "enable server ollama_active/green" | socat stdio "$HAPROXY_ADMIN"
        echo "disable server ollama_active/blue" | socat stdio "$HAPROXY_ADMIN"
    fi
    
    # Verify switch
    sleep 2
    echo "show stat" | socat stdio "$HAPROXY_ADMIN" | grep ollama_active
}

# Usage: ./traffic-switch.sh <blue|green>
switch_to_environment "$1"
EOF

chmod +x traffic-switch.sh

Monitoring and Validation

Set up comprehensive monitoring to track deployment success and system health.

Deployment Monitoring

Create monitoring scripts that track key metrics during deployments:

cat > monitor-deployment.sh << 'EOF'
#!/bin/bash

# Monitor deployment metrics
monitor_deployment() {
    local duration=${1:-300}  # Default 5 minutes
    local interval=${2:-10}   # Default 10 seconds
    
    echo "Monitoring deployment for ${duration} seconds..."
    
    local start_time=$(date +%s)
    local end_time=$((start_time + duration))
    
    while [[ $(date +%s) -lt $end_time ]]; do
        local current_time=$(date '+%Y-%m-%d %H:%M:%S')
        
        # Check response times
        local response_time=$(curl -w "%{time_total}" -s -o /dev/null http://localhost:8080/api/tags)
        
        # Check error rates
        local status_code=$(curl -w "%{http_code}" -s -o /dev/null http://localhost:8080/api/tags)
        
        # Count healthy servers via the stats CSV export
        local healthy_servers
        healthy_servers=$(curl -s "http://localhost:8404/stats;csv" | grep -c ',UP,')
        
        printf "%s | Response: %.3fs | Status: %s | Healthy servers: %s\n" \
               "$current_time" "$response_time" "$status_code" "$healthy_servers"
        
        sleep $interval
    done
    
    echo "Monitoring completed"
}

# Usage: ./monitor-deployment.sh [duration] [interval]
monitor_deployment "$1" "$2"
EOF

chmod +x monitor-deployment.sh
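The per-sample log lines are most useful when reduced to outliers. A tiny filter that keeps only response times above a threshold (pure awk on canned sample values; `slow_samples` is an illustrative name):

```shell
# Print only the samples (in seconds) exceeding the threshold.
slow_samples() {
    local threshold=$1; shift
    printf '%s\n' "$@" | awk -v th="$threshold" '$1 + 0 > th + 0'
}

slow_samples 0.5 0.120 0.734 0.201 0.912
# → 0.734
# → 0.912
```

Piping the Response column of the monitoring output through a filter like this makes regressions during a cutover obvious at a glance.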

Health Check Dashboard

Create a simple dashboard to visualize environment health:

cat > health-dashboard.sh << 'EOF'
#!/bin/bash

show_environment_status() {
    echo "=== Ollama Blue-Green Deployment Status ==="
    echo "Timestamp: $(date)"
    echo ""
    
    # Check blue environment
    echo "🔵 Blue Environment (Port 11434):"
    if ./health-check.sh localhost 11434 "llama2" > /dev/null 2>&1; then
        echo "   Status: ✅ Healthy"
    else
        echo "   Status: ❌ Unhealthy"
    fi
    
    # Check green environment
    echo "🟢 Green Environment (Port 11435):"
    if ./health-check.sh localhost 11435 "llama2" > /dev/null 2>&1; then
        echo "   Status: ✅ Healthy"
    else
        echo "   Status: ❌ Unhealthy"
    fi
    
    # Check load balancer
    echo "⚖️  Load Balancer (Port 8080):"
    if curl -sf http://localhost:8080/api/tags > /dev/null 2>&1; then
        echo "   Status: ✅ Active"
        echo "   Active Backend: $(sed -n '/^backend ollama_active/,/^backend ollama_blue/p' haproxy.cfg | grep '^ *server ' | grep -v 'backup' | awk '{print $2}')"
    else
        echo "   Status: ❌ Inactive"
    fi
    
    echo ""
    echo "=== Resource Usage ==="
    docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep ollama
}

# Continuous monitoring mode
if [[ "$1" == "--watch" ]]; then
    while true; do
        clear
        show_environment_status
        sleep 5
    done
else
    show_environment_status
fi
EOF

chmod +x health-dashboard.sh

Rollback Procedures

Implement fast rollback capabilities for when deployments go wrong.

Automated Rollback

Create automated rollback triggers based on health check failures:

cat > auto-rollback.sh << 'EOF'
#!/bin/bash

# Configuration
HEALTH_CHECK_INTERVAL=30
FAILURE_THRESHOLD=3
ROLLBACK_TIMEOUT=120

perform_rollback() {
    local current_env=$1
    local target_env=$2
    
    echo "CRITICAL: Performing automatic rollback from $current_env to $target_env"
    
    # Log rollback event
    echo "$(date): Automatic rollback initiated" >> rollback.log
    
    # Execute rollback
    if ./deploy.sh rollback; then
        echo "SUCCESS: Rollback completed successfully"
        echo "$(date): Rollback completed successfully" >> rollback.log
    else
        echo "ERROR: Rollback failed - manual intervention required"
        echo "$(date): Rollback failed - manual intervention required" >> rollback.log
        exit 1
    fi
}

monitor_and_rollback() {
    local failure_count=0
    local current_env=""
    
    while true; do
        # Determine current active environment
        current_env=$(./deploy.sh get_active_environment)
        
        # Perform health check
        if ./health-check.sh localhost 8080 "llama2"; then
            failure_count=0
            echo "$(date): Health check passed for $current_env environment"
        else
            failure_count=$((failure_count + 1))
            echo "$(date): Health check failed for $current_env environment (failure $failure_count/$FAILURE_THRESHOLD)"
            
            if [[ $failure_count -ge $FAILURE_THRESHOLD ]]; then
                # Determine rollback target
                if [[ "$current_env" == "blue" ]]; then
                    target_env="green"
                else
                    target_env="blue"
                fi
                
                perform_rollback "$current_env" "$target_env"
                failure_count=0
            fi
        fi
        
        sleep $HEALTH_CHECK_INTERVAL
    done
}

# Start monitoring
echo "Starting automatic rollback monitoring..."
monitor_and_rollback
EOF

chmod +x auto-rollback.sh
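The core of auto-rollback.sh is the consecutive-failure counter: it resets on any success and trips only after FAILURE_THRESHOLD misses in a row. The logic in isolation (1 = healthy sample, 0 = failed sample; `should_rollback` is an illustrative name):

```shell
# Decide whether a sequence of health samples should trigger a rollback.
should_rollback() {
    local threshold=$1; shift
    local count=0 ok
    for ok in "$@"; do
        if [ "$ok" -eq 1 ]; then
            count=0            # any success resets the streak
        else
            count=$((count + 1))
        fi
        if [ "$count" -ge "$threshold" ]; then
            echo "rollback"
            return 0
        fi
    done
    echo "healthy"
}

should_rollback 3 1 0 0 1 0 0 0   # → rollback (three misses in a row)
should_rollback 3 0 0 1 0 0 1     # → healthy (streak never reaches 3)
```

Requiring consecutive failures keeps a single flaky probe from bouncing traffic back and forth.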

Manual Rollback Commands

Provide simple commands for manual rollback scenarios:

cat > rollback-commands.sh << 'EOF'
#!/bin/bash

# Quick rollback commands for operators

rollback_to_blue() {
    echo "Rolling back to blue environment..."
    ./traffic-switch.sh blue
    echo "Rollback to blue completed"
}

rollback_to_green() {
    echo "Rolling back to green environment..."
    ./traffic-switch.sh green
    echo "Rollback to green completed"
}

emergency_stop() {
    echo "Emergency stop - taking all traffic offline..."
    docker-compose stop haproxy
    echo "Load balancer stopped - all traffic halted"
}

case "$1" in
    blue)
        rollback_to_blue
        ;;
    green)
        rollback_to_green
        ;;
    emergency)
        emergency_stop
        ;;
    *)
        echo "Usage: $0 {blue|green|emergency}"
        echo "  blue      - Switch traffic to blue environment"
        echo "  green     - Switch traffic to green environment"
        echo "  emergency - Stop all traffic (emergency use only)"
        exit 1
        ;;
esac
EOF

chmod +x rollback-commands.sh

Testing and Validation

Thoroughly test your blue-green deployment setup before production use.

End-to-End Testing

Create comprehensive tests that validate the entire deployment process:

cat > test-deployment.sh << 'EOF'
#!/bin/bash

run_deployment_tests() {
    echo "=== Starting Blue-Green Deployment Tests ==="
    
    # Test 1: Basic health checks
    echo "Test 1: Basic health checks"
    if ./health-check.sh localhost 11434 "llama2" && \
       ./health-check.sh localhost 11435 "llama2"; then
        echo "✅ Basic health checks passed"
    else
        echo "❌ Basic health checks failed"
        return 1
    fi
    
    # Test 2: Load balancer connectivity
    echo "Test 2: Load balancer connectivity"
    if curl -sf http://localhost:8080/api/tags > /dev/null; then
        echo "✅ Load balancer connectivity passed"
    else
        echo "❌ Load balancer connectivity failed"
        return 1
    fi
    
    # Test 3: Traffic switching
    echo "Test 3: Traffic switching"
    current_env=$(./deploy.sh get_active_environment)
    if [[ "$current_env" == "blue" ]]; then
        test_env="green"
    else
        test_env="blue"
    fi
    
    if ./traffic-switch.sh "$test_env"; then
        echo "✅ Traffic switching passed"
        # Switch back
        ./traffic-switch.sh "$current_env"
    else
        echo "❌ Traffic switching failed"
        return 1
    fi
    
    # Test 4: Model inference
    echo "Test 4: Model inference"
    response=$(curl -sf "http://localhost:8080/api/generate" \
        -H "Content-Type: application/json" \
        -d '{"model":"llama2","prompt":"Hello","stream":false}')
    
    if [[ -n "$response" ]]; then
        echo "✅ Model inference passed"
    else
        echo "❌ Model inference failed"
        return 1
    fi
    
    echo "=== All tests passed! ==="
    return 0
}

# Run tests
run_deployment_tests
EOF

chmod +x test-deployment.sh

Performance Testing

Validate performance during traffic switches:

cat > performance-test.sh << 'EOF'
#!/bin/bash

# Performance testing during deployments
test_performance() {
    local duration=${1:-60}
    local concurrent_requests=${2:-10}
    
    echo "Running performance test for ${duration} seconds with ${concurrent_requests} concurrent requests"
    
    # Create test script
    cat > load_test.sh << 'LOADTEST'
#!/bin/bash
while true; do
    curl -sf "http://localhost:8080/api/generate" \
        -H "Content-Type: application/json" \
        -d '{"model":"llama2","prompt":"Test","stream":false}' \
        -w "Time: %{time_total}s, Status: %{http_code}\n" \
        -o /dev/null
    sleep 1
done
LOADTEST
    
    chmod +x load_test.sh
    
    # Start concurrent requests
    for i in $(seq 1 $concurrent_requests); do
        timeout $duration ./load_test.sh &
    done
    
    # Wait for completion
    wait
    
    echo "Performance test completed"
    rm -f load_test.sh
}

# Usage: ./performance-test.sh [duration] [concurrent_requests]
test_performance "$1" "$2"
EOF

chmod +x performance-test.sh
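Raw timings from the load test are easier to compare across runs as a percentile. A nearest-rank p95 in shell and awk, shown on canned samples (`p95` is an illustrative helper; the floor-based rank is an approximation):

```shell
# Nearest-rank 95th percentile of the arguments (seconds).
p95() {
    printf '%s\n' "$@" | sort -n | awk '
        { a[NR] = $1 }
        END {
            idx = int(NR * 0.95)
            if (idx < 1) idx = 1
            print a[idx]
        }'
}

p95 0.21 0.19 0.25 0.22 0.31 0.18 0.95 0.2 0.23 0.24
# → 0.31
```

Comparing p95 before, during, and after a cutover tells you whether the switch itself added latency.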

Common Issues and Troubleshooting

Address frequent problems that occur during blue-green deployments.

Memory Management

Ollama models consume significant memory. Running two instances requires careful resource management:

# Check memory usage (use docker-compose to resolve the service names to the
# actual container IDs, which carry project-name prefixes)
docker stats --no-stream $(docker-compose ps -q ollama-blue ollama-green)

# Optimize model loading
cat > optimize-memory.sh << 'EOF'
#!/bin/bash

# Preload models to reduce startup time
preload_models() {
    local env=$1
    local models=("llama2" "codellama" "mistral")
    
    echo "Preloading models for $env environment..."
    
    for model in "${models[@]}"; do
        echo "Loading $model..."
        docker-compose exec ollama-$env ollama pull "$model"
    done
}

# Usage: ./optimize-memory.sh <blue|green>
preload_models "$1"
EOF

chmod +x optimize-memory.sh
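Because both colours run side by side during a cutover, size RAM for two resident copies of your largest model plus headroom. A back-of-envelope check (all figures are illustrative assumptions, not measurements):

```shell
model_gb=4        # e.g. a 7B model at 4-bit quantization, roughly
instances=2       # blue + green both loaded during the switch
headroom_gb=2     # OS, HAProxy, monitoring

echo "minimum RAM: $(( model_gb * instances + headroom_gb )) GB"
# → minimum RAM: 10 GB
```

Redo the arithmetic with your actual model sizes before committing to hardware.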

Network Configuration

Ensure proper network connectivity between components:

# Test network connectivity
cat > test-network.sh << 'EOF'
#!/bin/bash

test_connectivity() {
    echo "Testing network connectivity..."
    
    # Test container-to-container communication; the haproxy image does not
    # ship netcat, so use bash's built-in /dev/tcp redirection instead
    if docker-compose exec haproxy bash -c 'echo > /dev/tcp/ollama-blue/11434'; then
        echo "✅ HAProxy -> Blue connectivity OK"
    else
        echo "❌ HAProxy -> Blue connectivity failed"
    fi
    
    if docker-compose exec haproxy bash -c 'echo > /dev/tcp/ollama-green/11434'; then
        echo "✅ HAProxy -> Green connectivity OK"
    else
        echo "❌ HAProxy -> Green connectivity failed"
    fi
    
    # Test external connectivity
    if curl -sf http://localhost:8080/api/tags > /dev/null; then
        echo "✅ External -> HAProxy connectivity OK"
    else
        echo "❌ External -> HAProxy connectivity failed"
    fi
}

test_connectivity
EOF

chmod +x test-network.sh

Debugging Deployment Issues

Create debugging tools for common deployment problems:

cat > debug-deployment.sh << 'EOF'
#!/bin/bash

debug_deployment() {
    echo "=== Deployment Debug Information ==="
    
    # Container status
    echo "Container Status:"
    docker-compose ps
    echo ""
    
    # Container logs
    echo "Recent container logs:"
    echo "--- Blue Environment ---"
    docker-compose logs --tail=10 ollama-blue
    echo ""
    echo "--- Green Environment ---"
    docker-compose logs --tail=10 ollama-green
    echo ""
    echo "--- HAProxy ---"
    docker-compose logs --tail=10 haproxy
    echo ""
    
    # Network connectivity
    echo "Network Connectivity:"
    ./test-network.sh
    echo ""
    
    # Resource usage
    echo "Resource Usage:"
    docker stats --no-stream
    echo ""
    
    # HAProxy statistics
    echo "HAProxy Statistics:"
    curl -s http://localhost:8404/stats | grep ollama
}

debug_deployment
EOF

chmod +x debug-deployment.sh

Best Practices and Optimization

Implement production-ready optimizations for reliability and performance.

Automated Health Checks

Set up comprehensive health monitoring:

cat > production-monitoring.sh << 'EOF'
#!/bin/bash

# Production monitoring configuration
setup_monitoring() {
    # Create monitoring configuration
    cat > monitoring-config.json << 'JSON'
{
    "health_checks": {
        "interval": 30,
        "timeout": 10,
        "retries": 3,
        "endpoints": [
            "http://localhost:11434/api/tags",
            "http://localhost:11435/api/tags",
            "http://localhost:8080/api/tags"
        ]
    },
    "alerts": {
        "email": "admin@example.com",
        "slack_webhook": "https://hooks.slack.com/...",
        "failure_threshold": 3
    },
    "metrics": {
        "response_time": true,
        "error_rate": true,
        "throughput": true,
        "memory_usage": true
    }
}
JSON

    echo "Monitoring configuration created"
}

setup_monitoring
EOF

chmod +x production-monitoring.sh

Performance Optimization

Optimize for production workloads:

cat > optimize-production.sh << 'EOF'
#!/bin/bash

# Production optimization settings
optimize_for_production() {
    echo "Applying production optimizations..."
    
    # Update Docker Compose with production settings
    # Use a distinct heredoc delimiter here; reusing EOF would terminate the
    # outer heredoc that wraps this whole script
    cat > docker-compose.prod.yml << 'PRODYML'
version: '3.8'
services:
  ollama-blue:
    image: ollama/ollama:latest
    deploy:
      resources:
        limits:
          memory: 8G
        reservations:
          memory: 4G
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=2
    volumes:
      - ./models:/root/.ollama
      - /tmp:/tmp
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "ollama", "list"]   # image has no curl
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 60s

  ollama-green:
    image: ollama/ollama:latest
    deploy:
      resources:
        limits:
          memory: 8G
        reservations:
          memory: 4G
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_NUM_PARALLEL=4
      - OLLAMA_MAX_LOADED_MODELS=2
    volumes:
      - ./models:/root/.ollama
      - /tmp:/tmp
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 60s

  haproxy:
    image: haproxy:2.8
    deploy:
      resources:
        limits:
          memory: 512M
        reservations:
          memory: 256M
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    restart: unless-stopped
    depends_on:
      - ollama-blue
      - ollama-green
PRODYML

    echo "Production configuration created"
    echo "Apply it as an override on top of the base file:"
    echo "  docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d"
}

optimize_for_production
EOF

chmod +x optimize-production.sh

Conclusion

Blue-green deployment transforms risky model updates into routine operations. Your Ollama services stay online while you deploy new models, test functionality, and switch traffic instantly. Users never experience downtime, and you gain confidence in your deployment process.

The scripts and configurations in this guide provide a complete blue-green deployment system. Start with the basic setup, then add monitoring, automation, and optimization features as your needs grow. Remember to test thoroughly in a staging environment before implementing in production.

Key benefits you've gained:

  • Zero-downtime model updates for better user experience
  • Instant rollback capabilities for rapid issue resolution
  • Comprehensive monitoring and health checks for system reliability
  • Automated deployment scripts for consistent operations

Your AI services now operate with enterprise-grade reliability. Model updates become strategic advantages rather than operational risks, and your users enjoy uninterrupted access to AI capabilities.

Ready to implement blue-green deployment for your Ollama models? Start with the Docker Compose setup, test the health checks, and gradually add automation features. Your future self will thank you when deployments work flawlessly every time.