Problem: AI Code Can Do Anything Your User Can
You're using Claude, GPT-4, or Copilot to generate code, but running it directly on your machine means AI mistakes could delete files, leak credentials, or max out your CPU.
You'll learn:
- Isolate AI code in throwaway containers
- Set resource limits to prevent runaway processes
- Block network access and filesystem writes
- Detect malicious patterns before execution
Time: 15 min | Level: Intermediate
Why This Happens
AI models generate code based on patterns, not intent. They can't verify if code is safe, has bugs, or contains harmful commands. Running AI output directly means:
Common risks:
- File deletion (rm -rf in scripts)
- Credential theft (reading .env files)
- Network exfiltration (sending data externally)
- Resource exhaustion (infinite loops, memory leaks)
Real example: An AI-generated cleanup script accidentally included / in a recursive delete.
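The same failure mode is easy to reproduce in Python. The cleanup helper below is hypothetical, written only to show how empty config values silently collapse a path to the filesystem root:

```python
def cleanup(base_dir: str, subdir: str) -> str:
    """Build the path an AI 'cleanup' script would hand to a recursive delete."""
    target = f"{base_dir}/{subdir}"
    # shutil.rmtree(target) would go here - deliberately omitted, because
    # with empty inputs the target is the filesystem root
    return target

print(cleanup("", ""))         # prints "/"
print(cleanup("/var", "log"))  # prints "/var/log"
```

If base_dir and subdir come from unset environment variables or a missing config key, the script deletes from "/" without any error until it is far too late.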
Solution
Step 1: Create a Minimal Sandbox Image
# Dockerfile.sandbox
FROM python:3.12-slim

# Remove the package manager so nothing can be installed at runtime
# (apt-get is part of the apt package; removing it needs the essential override)
RUN apt-get purge -y --allow-remove-essential apt && \
    rm -rf /var/lib/apt/lists/*

# Non-root user with no sudo
RUN useradd -m -u 1000 sandbox && \
    chmod 755 /home/sandbox

# Code is mounted read-only into /workspace at run time
WORKDIR /workspace
USER sandbox

ENV PYTHONUNBUFFERED=1
Why this works: Minimal attack surface - no package managers, no root, no network utilities.
docker build -f Dockerfile.sandbox -t ai-sandbox:latest .
Expected: Image builds in ~30 seconds, size under 200MB.
Step 2: Run Code with Security Constraints
#!/bin/bash
# run-sandbox.sh
# Mount the AI-generated code read-only and run it under tight limits
CODE_FILE="$1"
docker run \
  --rm \
  --network none \
  --memory="256m" \
  --memory-swap="256m" \
  --cpus="0.5" \
  --pids-limit=50 \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=50m \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  -v "$(pwd)/${CODE_FILE}:/workspace/code.py:ro" \
  ai-sandbox:latest \
  timeout 10s python /workspace/code.py
Security layers explained:
- --network none - No internet access
- --memory="256m" - Prevents memory bombs
- --cpus="0.5" - Limits CPU to half a core
- --read-only - Entire filesystem immutable except /tmp
- --cap-drop=ALL - Removes all Linux capabilities
- timeout 10s - Kills the process after 10 seconds
If it fails:
- Error: "OCI runtime create failed": Update Docker to 24.0+
- Permission denied: Ensure CODE_FILE has read permissions
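If you are calling the sandbox from Python rather than a shell script, the same flags can be assembled with subprocess. This wrapper is a sketch under the same assumptions as run-sandbox.sh (image named ai-sandbox:latest, 10-second timeout):

```python
import subprocess
from pathlib import Path

def build_sandbox_cmd(code_file: str) -> list[str]:
    """Assemble the same docker run invocation used by run-sandbox.sh."""
    host_path = str(Path(code_file).resolve())
    return [
        "docker", "run", "--rm",
        "--network", "none",
        "--memory=256m", "--memory-swap=256m",
        "--cpus=0.5", "--pids-limit=50",
        "--read-only",
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=50m",
        "--security-opt=no-new-privileges",
        "--cap-drop=ALL",
        "-v", f"{host_path}:/workspace/code.py:ro",
        "ai-sandbox:latest",
        "timeout", "10s", "python", "/workspace/code.py",
    ]

def run_sandboxed(code_file: str) -> subprocess.CompletedProcess:
    """Run the code in the sandbox and capture its output."""
    return subprocess.run(build_sandbox_cmd(code_file),
                          capture_output=True, text=True)
```

Building the command as a list (instead of a shell string) also avoids shell-injection risks if the filename itself came from untrusted input.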
Step 3: Scan Code Before Execution
# pre_check.py
import re
import sys
DANGER_PATTERNS = [
    r'rm\s+-rf\s+/',           # Root deletion
    r'eval\(',                 # Code injection
    r'exec\(',                 # Dynamic execution
    r'__import__\(["\']os',    # OS module import
    r'subprocess\.',           # Shell commands
    r'open\(.+["\']w',         # File writes
    r'requests\.',             # Network calls
    r'socket\.',               # Raw sockets
]

def scan_code(filepath):
    with open(filepath, 'r') as f:
        code = f.read()
    risks = []
    for pattern in DANGER_PATTERNS:
        if re.search(pattern, code, re.IGNORECASE):
            risks.append(pattern)
    if risks:
        print(f"⚠️ Found {len(risks)} risky patterns:")
        for r in risks:
            print(f"  - {r}")
        return False
    print("✅ No obvious risks detected")
    return True

if __name__ == "__main__":
    sys.exit(0 if scan_code(sys.argv[1]) else 1)
Use it:
python pre_check.py ai_script.py && ./run-sandbox.sh ai_script.py
Why pattern matching: A quick regex pass catches the most common dangerous constructs before anything executes. It is easy to evade and produces false positives, so treat it as one layer of defense-in-depth, not a gate you can trust on its own.
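Regexes miss obfuscations like getattr(__builtins__, 'ev' + 'al'). An AST pass is harder to fool for direct calls and imports, and pairs well with the regex scan. This ast_scan helper is a sketch, not part of the original pre_check.py:

```python
import ast

RISKY_CALLS = {"eval", "exec", "__import__", "compile"}
RISKY_MODULES = {"os", "subprocess", "socket", "ctypes"}

def ast_scan(code: str) -> list[str]:
    """Return descriptions of risky constructs found in the code."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        # Code that won't even parse shouldn't run either
        return [f"unparseable code: {exc.msg}"]
    risks = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in RISKY_CALLS):
            risks.append(f"call to {node.func.id}() at line {node.lineno}")
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            modules = [alias.name.split(".")[0] for alias in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                modules = [node.module.split(".")[0]]
            for mod in modules:
                if mod in RISKY_MODULES:
                    risks.append(f"import of {mod} at line {node.lineno}")
    return risks
```

A determined attacker can still defeat static analysis entirely, which is why the container isolation above remains the real safety boundary.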
Step 4: Capture and Review Output
#!/bin/bash
# run-with-logging.sh
TIMESTAMP=$(date +%s)
LOG_DIR="./sandbox-logs"
mkdir -p "$LOG_DIR"
# Run and capture everything
docker run \
  --rm \
  --network none \
  --memory="256m" \
  --cpus="0.5" \
  --read-only \
  --tmpfs /tmp:noexec,nosuid \
  --cap-drop=ALL \
  -v "$(pwd)/$1:/workspace/code.py:ro" \
  ai-sandbox:latest \
  timeout 10s python /workspace/code.py \
  > "${LOG_DIR}/${TIMESTAMP}.stdout" \
  2> "${LOG_DIR}/${TIMESTAMP}.stderr"
EXIT_CODE=$?
echo "Exit code: ${EXIT_CODE}"
echo "Logs saved to ${LOG_DIR}/${TIMESTAMP}.*"
# Flag suspicious outputs
if grep -qi "error\|exception\|killed" "${LOG_DIR}/${TIMESTAMP}.stderr"; then
  echo "⚠️ Check stderr - possible issues detected"
fi
Expected: All output saved to timestamped files, easy to review later.
Advanced: Multi-Language Support
# docker-compose.sandbox.yml
services:
  python-sandbox:
    build:
      context: .
      dockerfile: Dockerfile.python-sandbox
    network_mode: none
    mem_limit: 256m
    cpus: 0.5
    read_only: true
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    volumes:
      - ./code:/code:ro

  node-sandbox:
    build:
      context: .
      dockerfile: Dockerfile.node-sandbox
    network_mode: none
    mem_limit: 512m # Node needs more memory
    cpus: 0.5
    read_only: true
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    volumes:
      - ./code:/code:ro
Usage:
# Run Python code
docker compose -f docker-compose.sandbox.yml run --rm python-sandbox python /code/script.py
# Run JavaScript code
docker compose -f docker-compose.sandbox.yml run --rm node-sandbox node /code/app.js
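To route a file to the right sandbox automatically, a small dispatcher can map file extensions to services. This sketch assumes the two services defined in docker-compose.sandbox.yml and a ./code directory mounted at /code:

```python
from pathlib import Path

# Maps extensions to (compose service, interpreter) - assumed to match
# the services in docker-compose.sandbox.yml
SERVICES = {
    ".py": ("python-sandbox", "python"),
    ".js": ("node-sandbox", "node"),
}

def compose_command(code_path: str) -> list[str]:
    """Build the docker compose invocation for a given code file."""
    ext = Path(code_path).suffix
    if ext not in SERVICES:
        raise ValueError(f"no sandbox configured for {ext!r} files")
    service, interpreter = SERVICES[ext]
    return [
        "docker", "compose", "-f", "docker-compose.sandbox.yml",
        "run", "--rm", service,
        interpreter, f"/code/{Path(code_path).name}",
    ]
```

Adding a language then only means adding a Dockerfile, a compose service, and one dictionary entry.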
Verification
Test the sandbox:
# Create a malicious test script
cat > test_malicious.py << 'EOF'
import os
os.system('rm -rf /') # Should fail
EOF
./run-sandbox.sh test_malicious.py
You should see:
rm: it is dangerous to operate recursively on '/'
rm: use --no-preserve-root to override this failsafe
Even if a script bypasses that failsafe, every delete fails with "Read-only file system" - the container exits safely with no damage done.
Production Checklist
- Rate limiting: Limit executions per user/hour
- Code size limits: Reject files over 100KB
- Execution timeout: Max 30s for production
- Log retention: Keep logs for 7 days minimum
- Alert on failures: Monitor high error rates
- Image updates: Rebuild base images monthly
- Secret scanning: Check for hardcoded API keys before running
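Several checklist items (size limit, secret scanning) can be enforced in one pre-flight function before code ever reaches the sandbox. The patterns here are illustrative only; real secret scanners use far larger rule sets:

```python
import re

MAX_CODE_BYTES = 100 * 1024  # matches the 100KB checklist limit

# Illustrative secret patterns - not exhaustive
SECRET_PATTERNS = [
    r"AKIA[0-9A-Z]{16}",                          # AWS access key ID
    r"(?i)api[_-]?key\s*[:=]\s*['\"][^'\"]+",     # hardcoded API key
    r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----",  # embedded private key
]

def preflight(code: str) -> list[str]:
    """Return reasons to reject the code before it ever runs."""
    problems = []
    if len(code.encode()) > MAX_CODE_BYTES:
        problems.append("code exceeds 100KB size limit")
    for pattern in SECRET_PATTERNS:
        if re.search(pattern, code):
            problems.append(f"possible secret matching {pattern!r}")
    return problems
```

Run this before the regex scan from Step 3; a file that carries a credential should never execute, even in a sandbox.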
Example rate limit (Redis):
import redis
import hashlib

r = redis.Redis()

def check_rate_limit(user_id, code):
    code_hash = hashlib.sha256(code.encode()).hexdigest()
    key = f"sandbox:{user_id}:{code_hash}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 3600)  # start the one-hour window on the first run
    return count <= 10  # allow 10 runs per hour
What You Learned
- Docker provides OS-level isolation for untrusted code
- Resource limits prevent denial-of-service attacks
- Read-only filesystems block most destructive operations
- Pattern scanning catches obvious risks before execution
Limitations:
- Not bulletproof - advanced exploits can escape containers
- CPU limits slow legitimate heavy computations
- No GPU access in this basic setup
When NOT to use this:
- Code needing network access (use proxy containers instead)
- GPU-accelerated AI workloads (requires different approach)
- Windows-specific code (needs Windows containers)
Real-World Example: CI/CD Integration
# .github/workflows/ai-code-review.yml
name: Sandbox AI Suggestions
on:
  pull_request:
    paths:
      - '**.py'
jobs:
  sandbox-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history so origin/main...HEAD diffs work
      - name: Build sandbox
        run: docker build -f Dockerfile.sandbox -t ai-sandbox .
      - name: Test AI-modified files
        run: |
          for file in $(git diff --name-only origin/main...HEAD | grep '\.py$'); do
            echo "Testing $file"
            docker run --rm --network none --memory=256m \
              -e PYTHONPYCACHEPREFIX=/tmp \
              -v "$PWD/$file:/workspace/code.py:ro" \
              ai-sandbox timeout 10s python -m py_compile /workspace/code.py
          done
Why this matters: Catches syntax errors and import issues in AI changes before merging.
Debugging Tips
Container won't start:
# Check Docker version
docker --version # Need 24.0+
# Test without security flags first
docker run --rm ai-sandbox python --version
Code times out:
# Increase timeout for legitimate long-running tasks
timeout 60s python /workspace/code.py # 60 seconds instead of 10
Memory errors:
# Check actual memory usage
docker stats --no-stream
# Increase if needed for data processing
--memory="512m"
Tested on Docker 25.0.2, Ubuntu 24.04, macOS 14.3, Python 3.12, Node.js 22.x