Your AI model just broke production. Again. The team stares at error logs while users flood support channels. Sound familiar?
Automated Ollama model testing in CI/CD pipelines prevents these disasters. This guide shows you how to build bulletproof model deployment workflows that catch issues before they reach users.
You'll learn to set up automated testing, create robust pipelines, and deploy with confidence. No more 3 AM deployment panics.
What Is CI/CD Pipeline Integration for Ollama Models?
CI/CD pipeline integration automates Ollama model testing throughout your development workflow. Your pipeline runs tests automatically when code changes, validates model performance, and prevents broken deployments.
Traditional model deployment relies on manual testing. Developers run a few prompts, check outputs, and cross their fingers. This approach fails at scale.
Automated testing catches three critical issues:
- Model loading failures
- Performance degradation
- Output quality problems
Integration happens at multiple pipeline stages. Pre-commit hooks test basic functionality. Build stages validate model compatibility. Deployment stages verify production readiness.
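A pre-commit hook can be as simple as a script that refuses the commit when the local Ollama instance is down or the fast smoke tests fail. Here is a minimal sketch; the hook path, the default port, and the smoke-test location are assumptions to adapt to your repository:

```python
#!/usr/bin/env python3
# Hypothetical .git/hooks/pre-commit script: block the commit when the
# local Ollama instance is unreachable or the fast smoke tests fail.
# Copy into .git/hooks/pre-commit and mark it executable.
import subprocess
import sys
import urllib.error
import urllib.request

OLLAMA_VERSION_URL = "http://localhost:11434/api/version"  # assumed default port


def ollama_reachable(url=OLLAMA_VERSION_URL, timeout=5):
    """Return True if the Ollama version endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def main():
    if not ollama_reachable():
        print("pre-commit: Ollama is not running; start it before committing.")
        return 1
    # Only the fast smoke tests run at commit time; the full suite runs in CI
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "tests/test_smoke.py", "-q"]
    )
    return result.returncode


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the hook to smoke tests only preserves fast commits; slower integration and performance tests belong in the pipeline stages that follow.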
Key Benefits of Automated Ollama Testing
Automated testing sharply reduces deployment failures. Teams ship faster with confidence, and quality stays consistent across releases.
Cost savings add up quickly. Manual testing takes 2-3 hours per release; automation completes the same checks in about 5 minutes. At roughly 3 hours saved per release, a team releasing weekly recovers more than 150 hours annually.
Error detection improves dramatically. Automated tests catch edge cases humans miss, and regression testing prevents old bugs from returning.
Setting Up Your Ollama Testing Environment
Your testing environment needs isolation from production systems. Create dedicated test instances that mirror production configurations without affecting live services.
Prerequisites and Dependencies
Install these tools before starting:
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Install testing frameworks (pytest-html is used later for HTML reports)
pip install pytest pytest-html httpx docker-compose

# CI/CD workflows (GitHub Actions example) live in the
# .github/workflows/ directory of your repository
```
Docker containers provide consistent testing environments. Your tests run identically across local machines, CI servers, and staging environments.
Docker Configuration for Ollama Testing
Create a docker-compose.test.yml file:
```yaml
version: '3.8'

services:
  ollama-test:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    environment:
      - OLLAMA_HOST=0.0.0.0
    volumes:
      - ollama_test_data:/root/.ollama
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/version"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  test-runner:
    build: ./tests
    depends_on:
      ollama-test:
        condition: service_healthy
    environment:
      - OLLAMA_BASE_URL=http://ollama-test:11434
    volumes:
      - ./tests:/app/tests
      - ./models:/app/models

volumes:
  ollama_test_data:
```
This configuration ensures clean test runs. Each test cycle starts with a fresh Ollama instance. Dependencies wait for service readiness before running tests.
Building Automated Test Suites
Effective test suites cover model loading, API responses, and output quality. Structure tests in three categories: smoke tests, integration tests, and performance tests.
Smoke Tests for Basic Functionality
Smoke tests verify core operations work correctly. These fast tests catch obvious failures before deeper testing begins.
```python
# tests/test_smoke.py
import pytest
import httpx
import time


class TestOllamaSmoke:
    def setup_method(self):
        """Initialize the test client and wait for Ollama"""
        self.base_url = "http://localhost:11434"
        self.client = httpx.Client(base_url=self.base_url)
        self._wait_for_service()

    def _wait_for_service(self, timeout=60):
        """Wait for the Ollama service to be ready"""
        start_time = time.time()
        while time.time() - start_time < timeout:
            try:
                response = self.client.get("/api/version")
                if response.status_code == 200:
                    return
            except httpx.RequestError:
                pass
            time.sleep(2)
        pytest.fail("Ollama service failed to start within timeout")

    def test_service_health(self):
        """Verify the Ollama service responds to health checks"""
        response = self.client.get("/api/version")
        assert response.status_code == 200
        assert "version" in response.json()

    def test_model_list(self):
        """Check model listing functionality"""
        response = self.client.get("/api/tags")
        assert response.status_code == 200
        data = response.json()
        assert "models" in data
```
These tests run in under 10 seconds. Quick feedback helps developers fix issues immediately.
Integration Tests for Model Interactions
Integration tests validate complete model workflows. Test model pulling, loading, and inference operations together.
```python
# tests/test_integration.py
import pytest
import httpx
import json


class TestOllamaIntegration:
    @pytest.fixture(autouse=True)
    def setup(self):
        # Model pulls can take minutes, so disable httpx's 5-second default timeout
        self.client = httpx.Client(base_url="http://localhost:11434", timeout=None)
        self.test_model = "llama2:7b"

    def test_model_pull(self):
        """Test pulling a model from the registry"""
        payload = {"name": self.test_model}

        # Stream the pull request
        with self.client.stream("POST", "/api/pull", json=payload) as response:
            assert response.status_code == 200

            # Verify pull completion
            for line in response.iter_lines():
                if line:
                    data = json.loads(line)
                    if data.get("status") == "success":
                        break

    def test_model_inference(self):
        """Test basic model inference"""
        payload = {
            "model": self.test_model,
            "prompt": "What is 2+2? Answer with just the number.",
            "stream": False
        }

        response = self.client.post("/api/generate", json=payload)
        assert response.status_code == 200

        result = response.json()
        assert "response" in result
        assert "4" in result["response"]

    def test_model_chat_format(self):
        """Test chat-formatted requests"""
        payload = {
            "model": self.test_model,
            "messages": [
                {"role": "user", "content": "Hello! Respond with 'Hi there!'"}
            ],
            "stream": False
        }

        response = self.client.post("/api/chat", json=payload)
        assert response.status_code == 200

        result = response.json()
        assert "message" in result
        assert "Hi there" in result["message"]["content"]
```
Integration tests catch compatibility issues between model versions. Run these tests when updating models or Ollama versions.
Performance and Quality Tests
Performance tests ensure models meet speed and accuracy requirements. Set clear benchmarks for response times and output quality.
```python
# tests/test_performance.py
import time
import concurrent.futures

import httpx


class TestOllamaPerformance:
    def setup_method(self):
        self.client = httpx.Client(base_url="http://localhost:11434", timeout=30.0)
        self.test_model = "llama2:7b"

    def test_response_time(self):
        """Verify response times meet requirements"""
        payload = {
            "model": self.test_model,
            "prompt": "What is machine learning?",
            "stream": False
        }

        start_time = time.time()
        response = self.client.post("/api/generate", json=payload)
        end_time = time.time()

        assert response.status_code == 200

        response_time = end_time - start_time
        # Fail if the response takes longer than 10 seconds
        assert response_time < 10.0, f"Response took {response_time:.2f}s, expected < 10s"

    def test_output_quality(self):
        """Test that output meets quality standards"""
        test_cases = [
            {
                "prompt": "What is 2+2?",
                "expected_keywords": ["4", "four"],
                "description": "Basic math"
            },
            {
                "prompt": "Name three colors.",
                "expected_keywords": ["red", "blue", "green", "yellow"],
                "description": "Simple listing"
            }
        ]

        for case in test_cases:
            payload = {
                "model": self.test_model,
                "prompt": case["prompt"],
                "stream": False
            }

            response = self.client.post("/api/generate", json=payload)
            assert response.status_code == 200

            result = response.json()["response"].lower()

            # Check that at least one expected keyword appears
            found_keywords = [kw for kw in case["expected_keywords"] if kw in result]
            assert found_keywords, f"No expected keywords found for {case['description']}"

    def test_concurrent_requests(self):
        """Test handling multiple simultaneous requests"""
        def make_request():
            payload = {
                "model": self.test_model,
                "prompt": "Count to 5.",
                "stream": False
            }
            response = self.client.post("/api/generate", json=payload)
            return response.status_code

        # Run 3 concurrent requests
        with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
            futures = [executor.submit(make_request) for _ in range(3)]
            results = [future.result() for future in futures]

        # All requests should succeed
        assert all(status == 200 for status in results)
```
Quality tests prevent deploying models that give poor responses. Set minimum accuracy thresholds for your use cases.
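One way to encode such a threshold is a small scoring helper: measure what fraction of sampled responses mention an expected keyword, and fail the test when the rate drops below a floor you choose. A sketch, with the 80% floor as an illustrative assumption:

```python
def keyword_hit_rate(responses, expected_keywords):
    """Fraction of responses that contain at least one expected keyword."""
    if not responses:
        return 0.0
    hits = sum(
        1 for text in responses
        if any(kw.lower() in text.lower() for kw in expected_keywords)
    )
    return hits / len(responses)


def assert_quality_floor(responses, expected_keywords, floor=0.8):
    """Fail when too few responses mention an expected keyword."""
    rate = keyword_hit_rate(responses, expected_keywords)
    assert rate >= floor, f"Hit rate {rate:.0%} below quality floor {floor:.0%}"
```

Because model output is nondeterministic, a quality test would sample the same prompt several times and feed the collected responses to `assert_quality_floor`, rather than judging a single generation.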
GitHub Actions Pipeline Configuration
GitHub Actions provides excellent CI/CD integration for Ollama testing. Configure workflows that run tests automatically on code changes.
Complete Workflow Example
Create .github/workflows/ollama-test.yml:
```yaml
name: Ollama Model CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}/ollama-app

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytest pytest-html httpx docker-compose

      - name: Start Ollama test environment
        run: |
          docker-compose -f docker-compose.test.yml up -d
          # Wait for services to be ready
          sleep 30

      - name: Run smoke tests
        run: |
          pytest tests/test_smoke.py -v --tb=short

      - name: Pull test model
        run: |
          docker-compose -f docker-compose.test.yml exec -T ollama-test ollama pull llama2:7b

      - name: Run integration tests
        run: |
          pytest tests/test_integration.py -v --tb=short

      - name: Run performance tests
        run: |
          pytest tests/test_performance.py -v --tb=short

      - name: Generate test report
        if: always()
        run: |
          pytest --html=reports/report.html --self-contained-html

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: test-results
          path: reports/

      - name: Clean up
        if: always()
        run: |
          docker-compose -f docker-compose.test.yml down -v

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - name: Deploy to staging
        run: |
          echo "Deploying to staging environment..."
          # Add your deployment commands here

      - name: Run deployment verification
        run: |
          echo "Running post-deployment tests..."
          # Add verification tests here
```
This workflow provides complete automation. Tests run on every pull request. Builds happen on main branch merges. Deployments include verification steps.
Advanced Pipeline Features
Matrix testing validates multiple model configurations. Test different model sizes and versions simultaneously.
```yaml
# Add to the test job in your workflow file
strategy:
  matrix:
    model: ['llama2:7b', 'mistral:7b', 'codellama:7b']
    python-version: ['3.9', '3.10', '3.11']

steps:
  - name: Test with ${{ matrix.model }}
    env:
      TEST_MODEL: ${{ matrix.model }}
    run: |
      # --model is a custom pytest option; register it in conftest.py
      pytest tests/ -v --model=${{ matrix.model }}
```
Parallel testing reduces pipeline execution time. Multiple jobs run simultaneously across different runners.
Jenkins Pipeline Implementation
Jenkins offers powerful pipeline capabilities for complex Ollama testing workflows. Use Jenkinsfiles for organizations with existing Jenkins infrastructure.
Declarative Pipeline Script
Create Jenkinsfile in your repository root:
```groovy
pipeline {
    agent any

    environment {
        DOCKER_COMPOSE_FILE = 'docker-compose.test.yml'
        TEST_RESULTS_DIR = 'test-results'
        OLLAMA_MODEL = 'llama2:7b'
    }

    stages {
        stage('Checkout') {
            steps {
                checkout scm
                script {
                    env.BUILD_VERSION = sh(
                        script: 'git rev-parse --short HEAD',
                        returnStdout: true
                    ).trim()
                }
            }
        }

        stage('Setup Test Environment') {
            steps {
                script {
                    // Clean up any existing containers
                    sh 'docker-compose -f ${DOCKER_COMPOSE_FILE} down -v || true'

                    // Start the test environment
                    sh 'docker-compose -f ${DOCKER_COMPOSE_FILE} up -d'

                    // Wait for Ollama to be ready
                    timeout(time: 5, unit: 'MINUTES') {
                        waitUntil {
                            script {
                                def result = sh(
                                    script: 'docker-compose -f ${DOCKER_COMPOSE_FILE} exec -T ollama-test curl -f http://localhost:11434/api/version',
                                    returnStatus: true
                                )
                                return result == 0
                            }
                        }
                    }
                }
            }
        }

        stage('Run Tests') {
            parallel {
                stage('Smoke Tests') {
                    steps {
                        sh '''
                            mkdir -p ${TEST_RESULTS_DIR}
                            pytest tests/test_smoke.py \
                                --junitxml=${TEST_RESULTS_DIR}/smoke-results.xml \
                                --html=${TEST_RESULTS_DIR}/smoke-report.html \
                                --self-contained-html
                        '''
                    }
                }

                stage('Integration Tests') {
                    steps {
                        sh '''
                            # Pull the model for integration tests
                            docker-compose -f ${DOCKER_COMPOSE_FILE} exec -T ollama-test \
                                ollama pull ${OLLAMA_MODEL}

                            # Run integration tests
                            pytest tests/test_integration.py \
                                --junitxml=${TEST_RESULTS_DIR}/integration-results.xml \
                                --html=${TEST_RESULTS_DIR}/integration-report.html \
                                --self-contained-html
                        '''
                    }
                }

                stage('Performance Tests') {
                    steps {
                        sh '''
                            pytest tests/test_performance.py \
                                --junitxml=${TEST_RESULTS_DIR}/performance-results.xml \
                                --html=${TEST_RESULTS_DIR}/performance-report.html \
                                --self-contained-html
                        '''
                    }
                }
            }
        }

        stage('Build Image') {
            when {
                branch 'main'
            }
            steps {
                script {
                    def image = docker.build("ollama-app:${env.BUILD_VERSION}")
                    docker.withRegistry('https://registry.hub.docker.com', 'docker-hub-credentials') {
                        image.push()
                        image.push('latest')
                    }
                }
            }
        }

        stage('Deploy to Staging') {
            when {
                branch 'main'
            }
            steps {
                sh '''
                    kubectl set image deployment/ollama-app \
                        ollama-app=ollama-app:${BUILD_VERSION} \
                        --namespace=staging

                    kubectl rollout status deployment/ollama-app \
                        --namespace=staging --timeout=300s
                '''
            }
        }

        stage('Staging Tests') {
            when {
                branch 'main'
            }
            steps {
                sh '''
                    # Run tests against the staging environment
                    OLLAMA_BASE_URL=https://staging.example.com \
                        pytest tests/test_smoke.py tests/test_integration.py
                '''
            }
        }
    }

    post {
        always {
            // Clean up the test environment
            sh 'docker-compose -f ${DOCKER_COMPOSE_FILE} down -v || true'

            // Publish test results
            junit 'test-results/*.xml'
            publishHTML([
                allowMissing: false,
                alwaysLinkToLastBuild: true,
                keepAll: true,
                reportDir: 'test-results',
                reportFiles: '*.html',
                reportName: 'Test Report'
            ])
        }
        failure {
            emailext(
                subject: "Pipeline Failed: ${env.JOB_NAME} - ${env.BUILD_NUMBER}",
                body: "Build failed. Check Jenkins for details: ${env.BUILD_URL}",
                to: "${env.CHANGE_AUTHOR_EMAIL}"
            )
        }
        success {
            script {
                if (env.BRANCH_NAME == 'main') {
                    slackSend(
                        channel: '#deployments',
                        color: 'good',
                        message: "✅ Ollama app deployed successfully: ${env.BUILD_VERSION}"
                    )
                }
            }
        }
    }
}
```
This Jenkins pipeline handles complex scenarios. Parallel testing, conditional deployments, and comprehensive reporting work together seamlessly.
Best Practices for Model Testing
Effective model testing requires careful planning and consistent execution. Follow these practices to build reliable testing workflows.
Test Data Management
Version control your test datasets alongside code. Changes to test data should trigger pipeline updates just like code changes.
```python
# tests/conftest.py
import pytest
import json
import os


@pytest.fixture(scope="session")
def test_prompts():
    """Load standardized test prompts from version-controlled files"""
    prompts_file = os.path.join(os.path.dirname(__file__), 'data', 'test_prompts.json')
    with open(prompts_file, 'r') as f:
        return json.load(f)


@pytest.fixture(scope="session")
def expected_responses():
    """Load expected response patterns for validation"""
    responses_file = os.path.join(os.path.dirname(__file__), 'data', 'expected_responses.json')
    with open(responses_file, 'r') as f:
        return json.load(f)
```
Create tests/data/test_prompts.json with your standard prompts:
```json
{
  "basic_math": [
    "What is 2+2?",
    "Calculate 10*5",
    "What is 100/4?"
  ],
  "reasoning": [
    "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?",
    "A farmer has 17 sheep. All but 9 die. How many are left?"
  ],
  "code_generation": [
    "Write a Python function to reverse a string",
    "Create a simple calculator in JavaScript"
  ]
}
```
Separate test data prevents brittle tests. Update prompts without modifying test code.
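With prompts under version control, the test code itself can stay generic: load the file, flatten it into cases, and run every prompt through whatever client you use. A sketch of that pattern, where the `generate` callable stands in for your Ollama client wrapper:

```python
import json


def load_prompt_cases(path):
    """Flatten a {category: [prompt, ...]} JSON file into (category, prompt) pairs."""
    with open(path) as f:
        data = json.load(f)
    return [
        (category, prompt)
        for category, prompts in data.items()
        for prompt in prompts
    ]


def run_prompt_checks(cases, generate):
    """Run every prompt through `generate` and return the cases that failed.

    `generate` is any callable mapping a prompt string to a response string,
    e.g. a thin wrapper around POST /api/generate. Here a prompt fails only
    when the model returns an empty response; plug in stricter checks as needed.
    """
    return [
        (category, prompt)
        for category, prompt in cases
        if not (generate(prompt) or "").strip()
    ]
```

Adding a prompt to test_prompts.json now extends coverage with no code change; pytest users can feed the same pairs to `@pytest.mark.parametrize`.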
Environment Isolation
Isolate test environments completely from production systems. Use dedicated namespaces, networks, and resources.
```yaml
# docker-compose.test.yml - Enhanced isolation
version: '3.8'

services:
  ollama-test:
    image: ollama/ollama:latest
    container_name: ollama-test-${BUILD_ID:-local}
    networks:
      - test-network
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_MODELS=/tmp/models  # Temporary model storage
    volumes:
      - type: tmpfs
        target: /root/.ollama
        tmpfs:
          size: 8G  # Adjust based on model sizes
    mem_limit: 8g
    cpus: 4.0

networks:
  test-network:
    driver: bridge
    name: ollama-test-${BUILD_ID:-local}
```
Resource limits prevent test interference. Tests can't consume excessive memory or CPU cycles.
Monitoring and Alerting
Monitor test execution metrics to identify performance trends. Track test duration, failure rates, and resource usage.
```python
# tests/utils/metrics.py
import time
import json
import os
from datetime import datetime


class TestMetrics:
    def __init__(self):
        self.metrics = {
            'test_start': datetime.utcnow().isoformat(),
            'test_duration': 0,
            'total_requests': 0,
            'failed_requests': 0,
            'avg_response_time': 0,
            'model_name': None
        }

    def start_timer(self):
        self._start_time = time.time()

    def end_timer(self):
        if hasattr(self, '_start_time'):
            self.metrics['test_duration'] = time.time() - self._start_time

    def record_request(self, success=True, response_time=0):
        self.metrics['total_requests'] += 1
        if not success:
            self.metrics['failed_requests'] += 1

        # Update the rolling average response time
        current_avg = self.metrics['avg_response_time']
        total_requests = self.metrics['total_requests']
        self.metrics['avg_response_time'] = (
            (current_avg * (total_requests - 1) + response_time) / total_requests
        )

    def save_metrics(self, filename='test_metrics.json'):
        """Save metrics for trend analysis"""
        metrics_dir = os.path.join(os.path.dirname(__file__), '..', '..', 'metrics')
        os.makedirs(metrics_dir, exist_ok=True)

        filepath = os.path.join(metrics_dir, filename)

        # Append to existing metrics
        if os.path.exists(filepath):
            with open(filepath, 'r') as f:
                existing_metrics = json.load(f)
            existing_metrics.append(self.metrics)
        else:
            existing_metrics = [self.metrics]

        with open(filepath, 'w') as f:
            json.dump(existing_metrics, f, indent=2)


# Usage in tests (e.g. in tests/conftest.py, which already imports pytest)
@pytest.fixture(scope="session")
def test_metrics():
    metrics = TestMetrics()
    metrics.start_timer()
    yield metrics
    metrics.end_timer()
    metrics.save_metrics()
```
Trend analysis helps optimize pipeline performance. Identify slow tests, resource bottlenecks, and degradation patterns.
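The saved metrics file makes simple regression detection possible: compare the latest run's average response time against a rolling baseline of recent runs, and flag it when it drifts past a tolerance. A sketch; the window size and the 1.25x tolerance are illustrative assumptions:

```python
def detect_latency_regression(history, window=5, tolerance=1.25):
    """Flag the newest run if its avg response time drifts past the baseline.

    `history` is the list of metric dicts that TestMetrics.save_metrics()
    appends to, ordered oldest to newest. The baseline is the mean
    avg_response_time of the `window` runs preceding the newest one.
    Returns a warning string, or None when there is nothing to flag.
    """
    if len(history) < window + 1:
        return None  # not enough history to form a baseline
    baseline_runs = history[-(window + 1):-1]
    baseline = sum(run["avg_response_time"] for run in baseline_runs) / window
    latest = history[-1]["avg_response_time"]
    if baseline > 0 and latest > baseline * tolerance:
        return (
            f"avg_response_time regressed: {latest:.2f}s "
            f"vs {baseline:.2f}s baseline"
        )
    return None
```

Run it over the metrics JSON at the end of a pipeline and fail (or alert) on a non-None result, so creeping slowdowns surface before users notice them.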
Troubleshooting Common Issues
Pipeline failures often stem from predictable causes. Address these common problems systematically.
Model Loading Failures
Models fail to load due to insufficient resources or network issues. Implement robust retry logic and resource checks.
```python
# tests/utils/retry.py
import time
import httpx
from functools import wraps


def retry_on_failure(max_attempts=3, delay=5, exceptions=(httpx.RequestError,)):
    """Decorator to retry failed operations"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    last_exception = e
                    if attempt < max_attempts - 1:
                        print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
                        time.sleep(delay)
                    else:
                        print(f"All {max_attempts} attempts failed")
            raise last_exception
        return wrapper
    return decorator


# Enhanced model operations with retry logic
class RobustOllamaClient:
    def __init__(self, base_url="http://localhost:11434"):
        self.client = httpx.Client(base_url=base_url, timeout=60.0)

    @retry_on_failure(max_attempts=3, delay=10)
    def pull_model(self, model_name):
        """Pull a model with retry logic"""
        response = self.client.post("/api/pull", json={"name": model_name})
        response.raise_for_status()
        return response

    @retry_on_failure(max_attempts=2, delay=5)
    def test_model_inference(self, model_name, prompt):
        """Test inference with retry on failure"""
        payload = {
            "model": model_name,
            "prompt": prompt,
            "stream": False
        }
        response = self.client.post("/api/generate", json=payload)
        response.raise_for_status()
        return response.json()

    def check_model_availability(self, model_name):
        """Verify the model is loaded and ready"""
        try:
            response = self.client.get("/api/tags")
            models = response.json().get("models", [])
            return any(model["name"] == model_name for model in models)
        except httpx.RequestError:
            return False
```
Resource monitoring prevents out-of-memory failures:
```bash
# Add to your pipeline scripts
check_resources() {
    # Check available memory
    available_mem=$(free -m | awk 'NR==2{printf "%.0f", $7}')
    required_mem=4096  # 4GB minimum for 7B models

    if [ "$available_mem" -lt "$required_mem" ]; then
        echo "Insufficient memory: ${available_mem}MB available, ${required_mem}MB required"
        exit 1
    fi

    # Check disk space
    available_disk=$(df -m . | awk 'NR==2{print $4}')
    required_disk=10240  # 10GB for model storage

    if [ "$available_disk" -lt "$required_disk" ]; then
        echo "Insufficient disk space: ${available_disk}MB available, ${required_disk}MB required"
        exit 1
    fi
}
```
Network and Connectivity Problems
Container networking issues block API communication. Implement connection validation and debugging tools.
```python
# tests/utils/diagnostics.py
import socket
import subprocess
import time

import httpx
import pytest


class NetworkDiagnostics:
    def __init__(self, ollama_host="localhost", ollama_port=11434):
        self.host = ollama_host
        self.port = ollama_port
        self.base_url = f"http://{ollama_host}:{ollama_port}"

    def diagnose_connection(self):
        """Run comprehensive connection diagnostics"""
        results = {}

        # Test basic connectivity
        results['ping'] = self._test_ping()
        results['port_open'] = self._test_port()
        results['http_response'] = self._test_http()
        results['api_version'] = self._test_api()

        return results

    def _test_ping(self):
        """Test basic network connectivity"""
        try:
            result = subprocess.run(
                ['ping', '-c', '1', self.host],
                capture_output=True,
                text=True,
                timeout=10
            )
            return result.returncode == 0
        except (subprocess.TimeoutExpired, FileNotFoundError):
            return False

    def _test_port(self):
        """Test whether the port is accessible"""
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(5)
                result = sock.connect_ex((self.host, self.port))
                return result == 0
        except Exception:
            return False

    def _test_http(self):
        """Test HTTP connectivity"""
        try:
            with httpx.Client(timeout=10.0) as client:
                response = client.get(f"{self.base_url}/")
                return response.status_code in [200, 404]  # 404 is OK for the Ollama root
        except Exception:
            return False

    def _test_api(self):
        """Test the API endpoint specifically"""
        try:
            with httpx.Client(timeout=10.0) as client:
                response = client.get(f"{self.base_url}/api/version")
                return response.status_code == 200
        except Exception:
            return False

    def wait_for_ready(self, timeout=120):
        """Wait for Ollama to become ready"""
        start_time = time.time()

        while time.time() - start_time < timeout:
            if self._test_api():
                return True
            print(f"Waiting for Ollama at {self.base_url}...")
            time.sleep(5)

        # Run diagnostics on timeout
        print("Timeout waiting for Ollama. Running diagnostics...")
        diagnostics = self.diagnose_connection()
        for test, result in diagnostics.items():
            status = "PASS" if result else "FAIL"
            print(f"  {test}: {status}")
        return False


# Usage in tests
def test_with_diagnostics():
    diagnostics = NetworkDiagnostics()
    if not diagnostics.wait_for_ready():
        pytest.fail("Ollama service not ready after diagnostics")
```
Performance Bottlenecks
Identify and resolve pipeline performance issues systematically. Profile test execution and optimize slow operations.
```python
# tests/utils/profiler.py
import cProfile
import io
import os
import pstats
from contextlib import contextmanager


@contextmanager
def profile_test(test_name):
    """Profile test execution for performance analysis"""
    pr = cProfile.Profile()
    pr.enable()
    try:
        yield
    finally:
        pr.disable()

        # Save profile data
        s = io.StringIO()
        ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
        ps.print_stats()

        # Write to a file for analysis
        os.makedirs('profiles', exist_ok=True)
        with open(f'profiles/{test_name}_profile.txt', 'w') as f:
            f.write(s.getvalue())

        # Print the top time consumers to the console
        print(f"\nTop 10 functions in {test_name}:")
        pstats.Stats(pr).sort_stats('cumulative').print_stats(10)


# Usage in performance tests
def test_inference_performance():
    with profile_test("inference_performance"):
        # Your test code here
        client = RobustOllamaClient()
        result = client.test_model_inference("llama2:7b", "Hello world")
        assert result is not None
```
Pipeline optimization strategies:
```yaml
# Optimize the GitHub Actions workflow
jobs:
  test:
    runs-on: ubuntu-latest-8-cores  # Use larger runners
    strategy:
      matrix:
        test-group: [smoke, integration, performance]
    steps:
      - name: Cache Docker layers
        uses: actions/cache@v3
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-

      - name: Cache model downloads
        uses: actions/cache@v3
        with:
          path: ~/.ollama/models
          key: ollama-models-${{ hashFiles('tests/models.txt') }}

      - name: Run tests in parallel
        run: |
          # --numprocesses and --dist require the pytest-xdist plugin
          pytest tests/test_${{ matrix.test-group }}.py \
            --numprocesses=auto \
            --dist=worksteal
```
Advanced Pipeline Configurations
Advanced configurations handle complex deployment scenarios. Multi-environment testing, canary deployments, and rollback strategies ensure production stability.
Multi-Environment Testing
Test across development, staging, and production-like environments. Each environment validates different aspects of your deployment.
```yaml
# .github/workflows/multi-env-test.yml
name: Multi-Environment Ollama Testing

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test-development:
    runs-on: ubuntu-latest
    environment: development
    steps:
      - uses: actions/checkout@v4

      - name: Setup Development Environment
        run: |
          # Lightweight config for dev testing
          docker-compose -f docker-compose.dev.yml up -d

      - name: Run Development Tests
        run: |
          # Fast smoke tests only
          pytest tests/test_smoke.py --maxfail=1

  test-staging:
    needs: test-development
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4

      - name: Deploy to Staging
        run: |
          # Deploy to staging infrastructure
          kubectl apply -f k8s/staging/ --namespace=staging
          kubectl wait --for=condition=ready pod -l app=ollama --namespace=staging

      - name: Run Staging Tests
        run: |
          # Full test suite against staging
          OLLAMA_BASE_URL=${{ secrets.STAGING_OLLAMA_URL }} \
            pytest tests/ --tb=short

  test-production-simulation:
    needs: test-staging
    runs-on: ubuntu-latest
    environment: production-simulation
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4

      - name: Setup Production-like Environment
        run: |
          # Production resource limits and configuration
          docker-compose -f docker-compose.prod-sim.yml up -d

      - name: Load Test
        run: |
          # Simulate production load
          pytest tests/test_load.py --users=100 --duration=300s

      - name: Chaos Testing
        run: |
          # Test resilience under failure conditions
          pytest tests/test_chaos.py
```
Environment-specific configurations ensure realistic testing:
```yaml
# docker-compose.prod-sim.yml - Production simulation
version: '3.8'

services:
  ollama-prod-sim:
    image: ollama/ollama:latest
    deploy:
      resources:
        limits:
          memory: 16G
          cpus: '8.0'
        reservations:
          memory: 8G
          cpus: '4.0'
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_MAX_QUEUE=10
      - OLLAMA_MAX_CONCURRENT=4
    volumes:
      - prod_sim_models:/root/.ollama
    networks:
      - prod-sim-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/version"]
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 60s

  load-balancer:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.prod-sim.conf:/etc/nginx/nginx.conf
    depends_on:
      ollama-prod-sim:
        condition: service_healthy
    networks:
      - prod-sim-network

networks:
  prod-sim-network:
    driver: bridge

volumes:
  prod_sim_models:
```
Canary Deployment Integration
Canary deployments reduce risk by gradually rolling out changes. Test new model versions with limited traffic before full deployment.
```python
# tests/test_canary.py
import random
import time
from typing import Dict

import httpx


class CanaryTester:
    def __init__(self, stable_url: str, canary_url: str, traffic_split: float = 0.1):
        self.stable_url = stable_url
        self.canary_url = canary_url
        self.traffic_split = traffic_split
        # Generation can take a while, so raise httpx's 5-second default timeout
        self.stable_client = httpx.Client(base_url=stable_url, timeout=60.0)
        self.canary_client = httpx.Client(base_url=canary_url, timeout=60.0)

    def route_request(self, payload: Dict) -> tuple[httpx.Response, str]:
        """Route a request to stable or canary based on the traffic split"""
        if random.random() < self.traffic_split:
            response = self.canary_client.post("/api/generate", json=payload)
            return response, "canary"
        response = self.stable_client.post("/api/generate", json=payload)
        return response, "stable"

    def compare_responses(self, payload: Dict, iterations: int = 50) -> Dict:
        """Compare response quality between stable and canary"""
        stable_responses = []
        canary_responses = []

        for _ in range(iterations):
            # Get a stable response
            stable_resp = self.stable_client.post("/api/generate", json=payload)
            if stable_resp.status_code == 200:
                stable_responses.append(stable_resp.json()["response"])

            # Get a canary response
            canary_resp = self.canary_client.post("/api/generate", json=payload)
            if canary_resp.status_code == 200:
                canary_responses.append(canary_resp.json()["response"])

        return {
            "stable_success_rate": len(stable_responses) / iterations,
            "canary_success_rate": len(canary_responses) / iterations,
            "stable_avg_length": sum(len(r) for r in stable_responses) / len(stable_responses) if stable_responses else 0,
            "canary_avg_length": sum(len(r) for r in canary_responses) / len(canary_responses) if canary_responses else 0,
            "stable_responses": stable_responses[:5],  # Sample responses
            "canary_responses": canary_responses[:5]
        }


def test_canary_deployment():
    """The canary deployment must perform acceptably compared to stable"""
    tester = CanaryTester(
        stable_url="https://api.stable.example.com",
        canary_url="https://api.canary.example.com"
    )

    test_prompts = [
        "What is machine learning?",
        "Explain quantum computing",
        "Write a Python function to sort a list"
    ]

    for prompt in test_prompts:
        payload = {
            "model": "llama2:7b",
            "prompt": prompt,
            "stream": False
        }

        comparison = tester.compare_responses(payload)

        # The canary must maintain acceptable quality
        assert comparison["canary_success_rate"] >= 0.95, "Canary success rate too low"

        # Response lengths should be similar (within a factor of two)
        stable_length = comparison["stable_avg_length"]
        canary_length = comparison["canary_avg_length"]
        if stable_length > 0:
            length_ratio = canary_length / stable_length
            assert 0.5 <= length_ratio <= 2.0, f"Canary response length too different: {length_ratio}"


def test_canary_performance():
    """The canary deployment must meet performance requirements"""
    canary_client = httpx.Client(base_url="https://api.canary.example.com", timeout=60.0)

    response_times = []
    for _ in range(20):
        start_time = time.time()
        response = canary_client.post("/api/generate", json={
            "model": "llama2:7b",
            "prompt": "Hello world",
            "stream": False
        })
        end_time = time.time()

        if response.status_code == 200:
            response_times.append(end_time - start_time)

    assert response_times, "No successful canary responses"
    avg_response_time = sum(response_times) / len(response_times)

    # The canary should perform within 20% of the expected baseline
    expected_baseline = 3.0  # seconds
    assert avg_response_time <= expected_baseline * 1.2, f"Canary too slow: {avg_response_time}s"
```
Automated rollback triggers prevent bad deployments:
```yaml
# Add to your workflow
- name: Monitor Canary Health
  run: |
    # Run canary tests
    pytest tests/test_canary.py --maxfail=1

    # Check error rates from monitoring
    error_rate=$(curl -s "https://monitoring.example.com/api/error-rate/canary" | jq '.rate')
    if (( $(echo "$error_rate > 0.05" | bc -l) )); then
      echo "High error rate detected: $error_rate"
      echo "Triggering rollback..."
      kubectl rollout undo deployment/ollama-app --namespace=production
      exit 1
    fi
```
Conclusion
Automated Ollama model testing in CI/CD pipelines transforms deployment reliability. Your team deploys confidently, catches issues early, and maintains consistent quality.
Implementation delivers measurable benefits: far fewer deployment failures, a hundred-plus hours of manual testing recovered each year, and dramatically improved error detection. These improvements compound over time as your testing suite grows more comprehensive.
Start with basic smoke tests and expand gradually. Add integration tests once smoke tests run reliably. Include performance testing when your deployment frequency increases. Build canary deployment capabilities for critical production systems.
The investment in automated testing pays dividends immediately. Your next deployment crisis becomes a prevented incident. Your team focuses on building features instead of fixing production issues.
Ready to implement automated Ollama model testing? Begin with the smoke test examples and GitHub Actions workflow. Your future self will thank you when the 3 AM deployment pages stop coming.
Remember: automated testing isn't about perfection. It's about catching problems before your users do. Start small, iterate quickly, and build confidence with every successful deployment.