3 AM. Your phone buzzes. Ollama is down. Users are screaming. Your coffee is cold. Welcome to every DevOps engineer's favorite nightmare.
When your Ollama incident response playbook becomes your lifeline, you need procedures that work under pressure. This guide provides battle-tested emergency procedures to restore your Ollama production environment quickly and efficiently.
Why You Need an Ollama Incident Response Strategy
Production emergencies don't wait for convenient moments. Without a structured Ollama incident response playbook, your team wastes precious minutes figuring out basic troubleshooting steps while users experience downtime.
Common Ollama production failures include:
- Model loading failures
- Memory exhaustion
- Network connectivity issues
- Container crashes
- Storage problems
Incident Classification and Priority Matrix
Severity 1: Complete Service Outage
- Impact: All users cannot access Ollama services
- Response time: Immediate (< 5 minutes)
- Examples: Container crashes, complete model failures
Severity 2: Partial Service Degradation
- Impact: Some users experience slow responses
- Response time: 30 minutes
- Examples: Memory pressure, slow model inference
Severity 3: Minor Issues
- Impact: Individual user problems
- Response time: 4 hours
- Examples: Specific model timeouts, logging issues
Emergency Response Procedures
Phase 1: Immediate Assessment (0-5 minutes)
Step 1: Verify the Incident
# Check Ollama service status
docker ps | grep ollama
# Quick health check
curl -f http://localhost:11434/api/tags || echo "Ollama API unreachable"
# Check system resources
free -h
df -h
Step 2: Gather Critical Information
# Capture current state
docker logs ollama --tail 100 > incident_logs_$(date +%Y%m%d_%H%M%S).txt
# Check running processes
ps aux | grep ollama
# Network connectivity test
netstat -tlnp | grep :11434
Expected outcomes: You should have clear evidence of the failure type and recent log entries.
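Under pressure it helps to have the whole Phase 1 assessment behind one command. A sketch of a triage script; the container name ollama and port 11434 are assumptions, overridable via environment variables:

```shell
#!/usr/bin/env bash
# triage.sh -- first-five-minutes assessment in one command.
# Container name and port are assumptions; override via environment.
set -u
PORT="${OLLAMA_PORT:-11434}"
CONTAINER="${OLLAMA_CONTAINER:-ollama}"

status() { printf '%-16s %s\n' "$1" "$2"; }

# Is the container up?
if command -v docker >/dev/null 2>&1 &&
   docker ps --format '{{.Names}}' 2>/dev/null | grep -qx "$CONTAINER"; then
  status "container:" "running"
else
  status "container:" "NOT RUNNING"
fi

# Does the API answer?
if command -v curl >/dev/null 2>&1 &&
   curl -sf "http://localhost:$PORT/api/tags" >/dev/null; then
  status "api:" "reachable"
else
  status "api:" "UNREACHABLE"
fi

# Headline resource numbers
status "mem available:" "$(free -h 2>/dev/null | awk '/^Mem:/{print $7}')"
status "disk used (/):" "$(df -h / | awk 'NR==2{print $5}')"
echo "triage complete"
```

The script degrades gracefully: a missing docker or curl binary reports as a failure rather than aborting the triage.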
Phase 2: Immediate Stabilization (5-15 minutes)
Step 3: Implement Quick Fixes
For container crashes:
# Restart Ollama container
docker restart ollama
# Verify restart success
docker ps | grep ollama
curl -f http://localhost:11434/api/tags
For memory exhaustion:
# Free memory by removing models (WARNING: this deletes every installed
# model; each one must be pulled again before it can serve requests)
docker exec ollama ollama list | awk 'NR>1 {print $1}' | xargs -r -n1 docker exec ollama ollama rm
# Replace the container with one that has a memory limit; keep the port
# mapping and model volume so traffic and downloaded models survive the swap
docker rm -f ollama
docker run -d --name ollama --memory="8g" -p 11434:11434 \
  -v ollama_data:/root/.ollama ollama/ollama
For model loading failures:
# Check available models
docker exec ollama ollama list
# Reload specific model
docker exec ollama ollama pull llama2:7b
# Test model functionality
docker exec ollama ollama run llama2:7b "Test prompt"
Expected outcomes: Service should be partially or fully restored within 15 minutes.
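For on-call use, the three quick fixes can sit behind one dispatcher so the responder only has to name the failure mode. A sketch; the container name ollama and the llama2:7b default model are assumptions:

```shell
# quickfix: run one of the Phase 2 stabilization paths by failure type.
# Usage: quickfix {crash|memory|model} [model-name]
quickfix() {
  local mode="${1:-}" model="${2:-llama2:7b}"
  case "$mode" in
    crash)
      docker restart ollama &&
        curl -sf http://localhost:11434/api/tags >/dev/null &&
        echo "restart OK"
      ;;
    memory)
      # WARNING: removes every installed model to free memory
      docker exec ollama ollama list | awk 'NR>1 {print $1}' |
        xargs -r -n1 docker exec ollama ollama rm
      ;;
    model)
      docker exec ollama ollama pull "$model" &&
        docker exec ollama ollama run "$model" "Test prompt"
      ;;
    *)
      echo "usage: quickfix {crash|memory|model} [model]" >&2
      return 2
      ;;
  esac
}
```

Example: `quickfix crash` after a container death, or `quickfix model mistral:7b` to reload a specific model.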
Phase 3: Root Cause Analysis (15-60 minutes)
Step 4: Deep Diagnostic Investigation
# Comprehensive log analysis
docker logs ollama --since="1h" | grep -i "error\|failed\|exception"
# System resource analysis
iostat -x 1 5 # Check disk I/O
top -bn1 | head -20 # CPU and memory usage
# Network analysis
ss -tuln | grep :11434
tcpdump -i any port 11434 -c 10
Step 5: Identify Infrastructure Issues
# Check Docker daemon health
docker system df
docker system events --since="1h"
# Verify storage health
lsblk
mount | grep docker
Expected outcomes: You should identify the root cause and have supporting evidence.
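A quick pattern match over the logs captured in Step 2 can point at a likely root cause before you dig deeper. The patterns below are illustrative examples, not an official catalogue of Ollama error messages:

```shell
# classify_logs <logfile>: map common error strings to likely root causes.
# The patterns are illustrative assumptions, not exact Ollama messages.
classify_logs() {
  local f="${1:?usage: classify_logs <logfile>}"
  grep -qiE 'out of memory|oom|killed process' "$f" && echo "likely cause: memory exhaustion"
  grep -qiE 'no space left on device' "$f"          && echo "likely cause: storage full"
  grep -qiE 'connection refused|i/o timeout' "$f"   && echo "likely cause: network issue"
  grep -qiE 'error loading model|failed to load' "$f" && echo "likely cause: model loading failure"
  true  # exit 0 even when nothing matched
}
```

Run it against the capture from Step 2, e.g. `classify_logs incident_logs_20250101_030500.txt`, and treat the output as a starting hypothesis, not a verdict.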
Phase 4: Full Recovery Implementation (60+ minutes)
Step 6: Implement Permanent Solutions
For persistent memory issues:
# Create optimized Ollama deployment
cat > docker-compose.yml << EOF
version: '3.8'
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_MAX_LOADED_MODELS=2
      - OLLAMA_NUM_PARALLEL=4
    deploy:
      resources:
        limits:
          memory: 8G
        reservations:
          memory: 4G

volumes:
  ollama_data:
EOF
docker-compose up -d
Step 7: Verification and Monitoring
# Comprehensive service test
models=("llama2:7b" "codellama:7b" "mistral:7b")
for model in "${models[@]}"; do
  echo "Testing $model..."
  timeout 30 docker exec ollama ollama run "$model" "Hello world" || echo "$model failed"
done
# Set up a recurring health check; the ollama/ollama container runs only
# the server as PID 1, so schedule the check from the host crontab rather
# than inside the container
( crontab -l 2>/dev/null; echo '* * * * * curl -sf http://localhost:11434/api/tags >/dev/null || logger -t ollama "health check failed"' ) | crontab -
Expected outcomes: All models should load successfully, and monitoring should be active.
Prevention and Monitoring Strategies
Proactive Monitoring Setup
# Resource monitoring script
cat > monitor_ollama.sh << 'EOF'
#!/bin/bash
THRESHOLD_CPU=80
THRESHOLD_MEM=85
THRESHOLD_DISK=90
CPU_USAGE=$(docker stats ollama --no-stream --format "{{.CPUPerc}}" | sed 's/%//')
MEM_USAGE=$(docker stats ollama --no-stream --format "{{.MemPerc}}" | sed 's/%//')
DISK_USAGE=$(df -h /var/lib/docker | awk 'NR==2 {print $5}' | sed 's/%//')
if (( $(echo "$CPU_USAGE > $THRESHOLD_CPU" | bc -l) )); then
  echo "WARNING: CPU usage high: $CPU_USAGE%"
fi
if (( $(echo "$MEM_USAGE > $THRESHOLD_MEM" | bc -l) )); then
  echo "WARNING: Memory usage high: $MEM_USAGE%"
fi
if (( $(echo "$DISK_USAGE > $THRESHOLD_DISK" | bc -l) )); then
  echo "WARNING: Disk usage high: $DISK_USAGE%"
fi
EOF
chmod +x monitor_ollama.sh
Health Check Implementation
# Docker health check (the ollama/ollama image does not include curl, so
# probe with the ollama CLI; keep the port and volume so the container is
# actually usable)
docker run -d \
  --name ollama-monitored \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  --health-cmd="ollama list || exit 1" \
  --health-interval=30s \
  --health-timeout=10s \
  --health-retries=3 \
  ollama/ollama
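If you deploy through docker-compose as in Phase 4, the equivalent health check belongs in the compose file. A sketch; it probes with the ollama CLI rather than an HTTP client, since the image ships without curl:

```yaml
services:
  ollama:
    image: ollama/ollama
    healthcheck:
      test: ["CMD-SHELL", "ollama list || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
```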
Communication and Documentation
Incident Communication Template
## Incident Status Update
**Time**: [TIMESTAMP]
**Status**: [INVESTIGATING/IDENTIFIED/MONITORING/RESOLVED]
**Impact**: [DESCRIPTION]
**Next Update**: [TIME]
### Summary
[Brief description of current situation]
### Actions Taken
- [Action 1]
- [Action 2]
### Next Steps
- [Next action with timeline]
Post-Incident Review Process
- Timeline reconstruction: Document all actions taken with timestamps
- Root cause analysis: Identify why the incident occurred
- Process improvements: Update procedures based on lessons learned
- Prevention measures: Implement safeguards to prevent recurrence
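The timeline-reconstruction step can be bootstrapped from the logs captured in Step 2. A sketch; the incident_logs_*.txt naming comes from that step, while the postmortem filename is an assumption:

```shell
#!/usr/bin/env bash
# Seed a post-incident review doc from the most recent capture made in
# Step 2 (incident_logs_*.txt). Output filename is an assumption.
latest=$(ls -t incident_logs_*.txt 2>/dev/null | head -1)
out="postmortem_$(date +%Y%m%d).md"
{
  echo "## Post-Incident Review ($(date +%F))"
  echo "### Timeline (source: ${latest:-no incident log found})"
  if [ -n "$latest" ]; then
    # Error lines become the raw material for timeline reconstruction
    grep -iE 'error|failed|exception' "$latest" | head -20
  fi
  echo "### Root cause"
  echo "### Process improvements"
  echo "### Prevention measures"
} > "$out"
echo "wrote $out"
```

The result is a skeleton to fill in during the review meeting, not a finished report.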
Advanced Troubleshooting Scenarios
Scenario 1: Model Corruption Recovery
# Identify corrupted models (take only the NAME column, skip the header)
docker exec ollama ollama list | awk 'NR>1 {print $1}' | while read -r model; do
  docker exec ollama ollama run "$model" "test" >/dev/null || echo "$model may be corrupted"
done
# Recreate corrupted models
docker exec ollama ollama rm corrupted_model
docker exec ollama ollama pull corrupted_model
Scenario 2: Network Isolation Issues
# Check network connectivity
docker network ls
# The image ships no ping binary; check name resolution with getent instead
docker exec ollama getent hosts registry.ollama.ai
# Recreate the network if needed, then reattach with port and volume intact
docker network create ollama-network
docker rm -f ollama
docker run -d --name ollama --network ollama-network \
  -p 11434:11434 -v ollama_data:/root/.ollama ollama/ollama
Scenario 3: Storage Recovery
# Check storage health
docker exec ollama df -h /root/.ollama
docker volume inspect ollama_data
# Backup and restore via a throwaway container (the running ollama
# container has no /backup directory, so mount one from the host)
docker run --rm -v ollama_data:/data:ro -v "$PWD":/backup alpine \
  tar -czf /backup/ollama_backup.tar.gz -C /data .
docker run --rm -v ollama_data:/restore -v "$PWD":/backup alpine \
  sh -c 'cd /restore && tar -xzf /backup/ollama_backup.tar.gz'
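Storage recovery is far less stressful when a recent snapshot already exists. A host-side helper for taking one; the ollama_data volume name matches the compose file in Phase 4, and the destination directory is whatever you pass in:

```shell
# backup_ollama <dest-dir>: snapshot the ollama_data volume to a dated
# tarball via a throwaway alpine container. The service can stay up,
# though a quiesced service yields a more consistent archive.
backup_ollama() {
  local dest="${1:?usage: backup_ollama <dest-dir>}"
  mkdir -p "$dest"
  docker run --rm \
    -v ollama_data:/data:ro \
    -v "$(cd "$dest" && pwd)":/backup \
    alpine tar -czf "/backup/ollama_$(date +%Y%m%d).tar.gz" -C /data .
}
```

Pair it with a daily cron entry and Scenario 3 becomes a restore, not a rebuild.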
Key Takeaways
Your Ollama incident response playbook should focus on speed and accuracy. The procedures outlined here provide a structured approach to handle production emergencies effectively.
Remember these critical points:
- Immediate assessment prevents wasted time on wrong solutions
- Quick stabilization minimizes user impact
- Root cause analysis prevents future incidents
- Comprehensive recovery ensures long-term stability
Keep this playbook accessible and regularly update it based on new incidents and system changes. Your future self (and your users) will thank you when the next emergency strikes.
Need help implementing these procedures? Consider setting up automated monitoring and alerting systems to catch issues before they become emergencies.