3 AM. Your phone buzzes. Ollama is down. Users are screaming. Your coffee is cold. Welcome to every DevOps engineer's favorite nightmare.
When your Ollama incident response playbook becomes your lifeline, you need procedures that work under pressure. This guide provides battle-tested emergency procedures to restore your Ollama production environment quickly and efficiently.
Why You Need an Ollama Incident Response Strategy
Production emergencies don't wait for convenient moments. Without a structured Ollama incident response playbook, your team wastes precious minutes figuring out basic troubleshooting steps while users experience downtime.
Common Ollama production failures include:
- Model loading failures
- Memory exhaustion
- Network connectivity issues
- Container crashes
- Storage problems
Incident Classification and Priority Matrix
Severity 1: Complete Service Outage
- Impact: All users cannot access Ollama services
- Response time: Immediate (< 5 minutes)
- Examples: Container crashes, complete model failures
Severity 2: Partial Service Degradation
- Impact: Some users experience slow responses
- Response time: 30 minutes
- Examples: Memory pressure, slow model inference
Severity 3: Minor Issues
- Impact: Individual user problems
- Response time: 4 hours
- Examples: Specific model timeouts, logging issues
Emergency Response Procedures
Phase 1: Immediate Assessment (0-5 minutes)
Step 1: Verify the Incident
# Check Ollama service status
docker ps | grep ollama
# Quick health check
curl -f http://localhost:11434/api/tags || echo "Ollama API unreachable"
# Check system resources
free -h
df -h
Step 2: Gather Critical Information
# Capture current state
docker logs ollama --tail 100 > incident_logs_$(date +%Y%m%d_%H%M%S).txt
# Check running processes
ps aux | grep ollama
# Network connectivity test
netstat -tlnp | grep :11434
Expected outcomes: You should have clear evidence of the failure type and recent log entries.
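Under pressure it helps to have the whole Phase 1 assessment behind one command. A sketch of a triage script; the container name ollama and port 11434 are assumptions, overridable via environment variables:

```shell
#!/usr/bin/env bash
# triage.sh -- first-five-minutes assessment in one command.
# Container name and port are assumptions; override via environment.
set -u
PORT="${OLLAMA_PORT:-11434}"
CONTAINER="${OLLAMA_CONTAINER:-ollama}"

status() { printf '%-16s %s\n' "$1" "$2"; }

# Is the container up?
if command -v docker >/dev/null 2>&1 &&
   docker ps --format '{{.Names}}' 2>/dev/null | grep -qx "$CONTAINER"; then
  status "container:" "running"
else
  status "container:" "NOT RUNNING"
fi

# Does the API answer?
if command -v curl >/dev/null 2>&1 &&
   curl -sf "http://localhost:$PORT/api/tags" >/dev/null; then
  status "api:" "reachable"
else
  status "api:" "UNREACHABLE"
fi

# Headline resource numbers
status "mem available:" "$(free -h 2>/dev/null | awk '/^Mem:/{print $7}')"
status "disk used (/):" "$(df -h / | awk 'NR==2{print $5}')"
echo "triage complete"
```

The script degrades gracefully: a missing docker or curl binary reports as a failure rather than aborting the triage.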
Phase 2: Immediate Stabilization (5-15 minutes)
Step 3: Implement Quick Fixes
For container crashes:
# Restart Ollama container
docker restart ollama
# Verify restart success
docker ps | grep ollama
curl -f http://localhost:11434/api/tags
For memory exhaustion:
# Free memory by removing models (WARNING: this deletes every installed
# model; each one must be pulled again before it can serve requests)
docker exec ollama ollama list | awk 'NR>1 {print $1}' | xargs -r -n1 docker exec ollama ollama rm
# Replace the container with one that has a memory limit; keep the port
# mapping and model volume so traffic and downloaded models survive the swap
docker rm -f ollama
docker run -d --name ollama --memory="8g" -p 11434:11434 \
  -v ollama_data:/root/.ollama ollama/ollama
For model loading failures:
# Check available models
docker exec ollama ollama list
# Reload specific model
docker exec ollama ollama pull llama2:7b
# Test model functionality
docker exec ollama ollama run llama2:7b "Test prompt"
Expected outcomes: Service should be partially or fully restored within 15 minutes.
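For on-call use, the three quick fixes can sit behind one dispatcher so the responder only has to name the failure mode. A sketch; the container name ollama and the llama2:7b default model are assumptions:

```shell
# quickfix: run one of the Phase 2 stabilization paths by failure type.
# Usage: quickfix {crash|memory|model} [model-name]
quickfix() {
  local mode="${1:-}" model="${2:-llama2:7b}"
  case "$mode" in
    crash)
      docker restart ollama &&
        curl -sf http://localhost:11434/api/tags >/dev/null &&
        echo "restart OK"
      ;;
    memory)
      # WARNING: removes every installed model to free memory
      docker exec ollama ollama list | awk 'NR>1 {print $1}' |
        xargs -r -n1 docker exec ollama ollama rm
      ;;
    model)
      docker exec ollama ollama pull "$model" &&
        docker exec ollama ollama run "$model" "Test prompt"
      ;;
    *)
      echo "usage: quickfix {crash|memory|model} [model]" >&2
      return 2
      ;;
  esac
}
```

Example: `quickfix crash` after a container death, or `quickfix model mistral:7b` to reload a specific model.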
Phase 3: Root Cause Analysis (15-60 minutes)
Step 4: Deep Diagnostic Investigation
# Comprehensive log analysis
docker logs ollama --since="1h" | grep -i "error\|failed\|exception"
# System resource analysis
iostat -x 1 5 # Check disk I/O
top -bn1 | head -20 # CPU and memory usage
# Network analysis
ss -tuln | grep :11434
tcpdump -i any port 11434 -c 10
Step 5: Identify Infrastructure Issues
# Check Docker daemon health
docker system df
docker system events --since="1h"
# Verify storage health
lsblk
mount | grep docker
Expected outcomes: You should identify the root cause and have supporting evidence.
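A quick pattern match over the logs captured in Step 2 can point at a likely root cause before you dig deeper. The patterns below are illustrative examples, not an official catalogue of Ollama error messages:

```shell
# classify_logs <logfile>: map common error strings to likely root causes.
# The patterns are illustrative assumptions, not exact Ollama messages.
classify_logs() {
  local f="${1:?usage: classify_logs <logfile>}"
  grep -qiE 'out of memory|oom|killed process' "$f" && echo "likely cause: memory exhaustion"
  grep -qiE 'no space left on device' "$f"          && echo "likely cause: storage full"
  grep -qiE 'connection refused|i/o timeout' "$f"   && echo "likely cause: network issue"
  grep -qiE 'error loading model|failed to load' "$f" && echo "likely cause: model loading failure"
  true  # exit 0 even when nothing matched
}
```

Run it against the capture from Step 2, e.g. `classify_logs incident_logs_20250101_030500.txt`, and treat the output as a starting hypothesis, not a verdict.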
Phase 4: Full Recovery Implementation (60+ minutes)
Step 6: Implement Permanent Solutions
For persistent memory issues:
# Create optimized Ollama deployment
cat > docker-compose.yml << EOF
version: '3.8'
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_MAX_LOADED_MODELS=2
      - OLLAMA_NUM_PARALLEL=4
    deploy:
      resources:
        limits:
          memory: 8G
        reservations:
          memory: 4G

volumes:
  ollama_data:
EOF
docker-compose up -d
Step 7: Verification and Monitoring
# Comprehensive service test
models=("llama2:7b" "codellama:7b" "mistral:7b")
for model in "${models[@]}"; do
  echo "Testing $model..."
  timeout 30 docker exec ollama ollama run "$model" "Hello world" || echo "$model failed"
done
# Set up a recurring health check; the ollama/ollama container runs only
# the server as PID 1, so schedule the check from the host crontab rather
# than inside the container
( crontab -l 2>/dev/null; echo '* * * * * curl -sf http://localhost:11434/api/tags >/dev/null || logger -t ollama "health check failed"' ) | crontab -
Expected outcomes: All models should load successfully, and monitoring should be active.
Prevention and Monitoring Strategies
Proactive Monitoring Setup
# Resource monitoring script
cat > monitor_ollama.sh << 'EOF'
#!/bin/bash
THRESHOLD_CPU=80
THRESHOLD_MEM=85
THRESHOLD_DISK=90
CPU_USAGE=$(docker stats ollama --no-stream --format "{{.CPUPerc}}" | sed 's/%//')
MEM_USAGE=$(docker stats ollama --no-stream --format "{{.MemPerc}}" | sed 's/%//')
DISK_USAGE=$(df -h /var/lib/docker | awk 'NR==2 {print $5}' | sed 's/%//')
if (( $(echo "$CPU_USAGE > $THRESHOLD_CPU" | bc -l) )); then
  echo "WARNING: CPU usage high: $CPU_USAGE%"
fi
if (( $(echo "$MEM_USAGE > $THRESHOLD_MEM" | bc -l) )); then
  echo "WARNING: Memory usage high: $MEM_USAGE%"
fi
if (( $(echo "$DISK_USAGE > $THRESHOLD_DISK" | bc -l) )); then
  echo "WARNING: Disk usage high: $DISK_USAGE%"
fi
EOF
chmod +x monitor_ollama.sh
Health Check Implementation
# Docker health check (the ollama/ollama image does not include curl, so
# probe with the ollama CLI; keep the port and volume so the container is
# actually usable)
docker run -d \
  --name ollama-monitored \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  --health-cmd="ollama list || exit 1" \
  --health-interval=30s \
  --health-timeout=10s \
  --health-retries=3 \
  ollama/ollama
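If you deploy through docker-compose as in Phase 4, the equivalent health check belongs in the compose file. A sketch; it probes with the ollama CLI rather than an HTTP client, since the image ships without curl:

```yaml
services:
  ollama:
    image: ollama/ollama
    healthcheck:
      test: ["CMD-SHELL", "ollama list || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
```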
Communication and Documentation
Incident Communication Template
## Incident Status Update
**Time**: [TIMESTAMP]
**Status**: [INVESTIGATING/IDENTIFIED/MONITORING/RESOLVED]
**Impact**: [DESCRIPTION]
**Next Update**: [TIME]
### Summary
[Brief description of current situation]
### Actions Taken
- [Action 1]
- [Action 2]
### Next Steps
- [Next action with timeline]
Post-Incident Review Process
- Timeline reconstruction: Document all actions taken with timestamps
- Root cause analysis: Identify why the incident occurred
- Process improvements: Update procedures based on lessons learned
- Prevention measures: Implement safeguards to prevent recurrence
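The timeline-reconstruction step can be bootstrapped from the logs captured in Step 2. A sketch; the incident_logs_*.txt naming comes from that step, while the postmortem filename is an assumption:

```shell
#!/usr/bin/env bash
# Seed a post-incident review doc from the most recent capture made in
# Step 2 (incident_logs_*.txt). Output filename is an assumption.
latest=$(ls -t incident_logs_*.txt 2>/dev/null | head -1)
out="postmortem_$(date +%Y%m%d).md"
{
  echo "## Post-Incident Review ($(date +%F))"
  echo "### Timeline (source: ${latest:-no incident log found})"
  if [ -n "$latest" ]; then
    # Error lines become the raw material for timeline reconstruction
    grep -iE 'error|failed|exception' "$latest" | head -20
  fi
  echo "### Root cause"
  echo "### Process improvements"
  echo "### Prevention measures"
} > "$out"
echo "wrote $out"
```

The result is a skeleton to fill in during the review meeting, not a finished report.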
Advanced Troubleshooting Scenarios
Scenario 1: Model Corruption Recovery
# Identify corrupted models (take only the NAME column, skip the header)
docker exec ollama ollama list | awk 'NR>1 {print $1}' | while read -r model; do
  docker exec ollama ollama run "$model" "test" >/dev/null || echo "$model may be corrupted"
done
# Recreate corrupted models
docker exec ollama ollama rm corrupted_model
docker exec ollama ollama pull corrupted_model
Scenario 2: Network Isolation Issues
# Check network connectivity
docker network ls
# The image ships no ping binary; check name resolution with getent instead
docker exec ollama getent hosts registry.ollama.ai
# Recreate the network if needed, then reattach with port and volume intact
docker network create ollama-network
docker rm -f ollama
docker run -d --name ollama --network ollama-network \
  -p 11434:11434 -v ollama_data:/root/.ollama ollama/ollama
Scenario 3: Storage Recovery
# Check storage health
docker exec ollama df -h /root/.ollama
docker volume inspect ollama_data
# Backup and restore via a throwaway container (the running ollama
# container has no /backup directory, so mount one from the host)
docker run --rm -v ollama_data:/data:ro -v "$PWD":/backup alpine \
  tar -czf /backup/ollama_backup.tar.gz -C /data .
docker run --rm -v ollama_data:/restore -v "$PWD":/backup alpine \
  sh -c 'cd /restore && tar -xzf /backup/ollama_backup.tar.gz'
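Storage recovery is far less stressful when a recent snapshot already exists. A host-side helper for taking one; the ollama_data volume name matches the compose file in Phase 4, and the destination directory is whatever you pass in:

```shell
# backup_ollama <dest-dir>: snapshot the ollama_data volume to a dated
# tarball via a throwaway alpine container. The service can stay up,
# though a quiesced service yields a more consistent archive.
backup_ollama() {
  local dest="${1:?usage: backup_ollama <dest-dir>}"
  mkdir -p "$dest"
  docker run --rm \
    -v ollama_data:/data:ro \
    -v "$(cd "$dest" && pwd)":/backup \
    alpine tar -czf "/backup/ollama_$(date +%Y%m%d).tar.gz" -C /data .
}
```

Pair it with a daily cron entry and Scenario 3 becomes a restore, not a rebuild.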
Key Takeaways
Your Ollama incident response playbook should focus on speed and accuracy. The procedures outlined here provide a structured approach to handle production emergencies effectively.
Remember these critical points:
- Immediate assessment prevents wasted time on wrong solutions
- Quick stabilization minimizes user impact
- Root cause analysis prevents future incidents
- Comprehensive recovery ensures long-term stability
Keep this playbook accessible and regularly update it based on new incidents and system changes. Your future self (and your users) will thank you when the next emergency strikes.
Need help implementing these procedures? Consider setting up automated monitoring and alerting systems to catch issues before they become emergencies.