Problem: Your Model Works in Dev, Fails in Production
Your ML model had 95% accuracy last month. Today it's at 78% and you only noticed when users complained. Production data shifted, but you had no way to detect it.
You'll learn how to:
- Set up Evidently AI for drift detection
- Monitor feature and prediction drift automatically
- Build alerts before accuracy degrades
Time: 15 min | Level: Intermediate
Why This Happens
Production data changes over time (seasonality, user behavior, market shifts). Your model trained on historical data becomes outdated, but traditional metrics don't catch drift until performance tanks.
Common symptoms:
- Accuracy drops with no code changes
- Predictions skew toward one class
- Feature distributions shift silently
- Alerts trigger only after user impact
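The "silent shift" above is exactly what a two-sample statistical test surfaces. As a sketch on synthetic data (column name and numbers are illustrative), a plain Kolmogorov-Smirnov test from scipy flags the shift long before accuracy metrics would:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
train_ages = rng.normal(loc=35, scale=8, size=5000)  # reference (training) distribution
prod_ages = rng.normal(loc=42, scale=8, size=5000)   # shifted production distribution

# Two-sample KS test: a small p-value means the distributions differ (drift)
statistic, p_value = stats.ks_2samp(train_ages, prod_ages)
print(f"KS statistic={statistic:.3f}, p-value={p_value:.2e}")
print("drift" if p_value < 0.05 else "no drift")
```

Evidently runs tests like this per feature automatically, which is what the rest of this guide sets up.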
Solution
Step 1: Install Evidently AI
```
pip install evidently
```
(Install inside a virtual environment rather than forcing it onto the system Python with --break-system-packages.)
Expected: Version 0.4.x or higher (supports production monitoring)
Step 2: Create Your First Drift Report
```python
# drift_detector.py
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
import pandas as pd

# Load your reference data (training set)
reference_data = pd.read_csv('train_data.csv')

# Load current production data
current_data = pd.read_csv('production_week_6.csv')

# Generate drift report
report = Report(metrics=[
    DataDriftPreset(),
])
report.run(
    reference_data=reference_data,
    current_data=current_data,
    column_mapping=None,  # Auto-detect feature types
)

# Save as interactive HTML
report.save_html('drift_report.html')
```
Why this works: Evidently compares statistical distributions between reference (training) and current (production) data using multiple drift detection algorithms.
If it fails:
- Error: "Column mismatch": Ensure both datasets have identical column names
- Empty report: Check that current_data has >100 rows for statistical significance
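A quick fix for the column-mismatch error is to align the production frame to the reference schema before running the report. This is a sketch (the helper name and toy columns are mine); it assumes any production-only columns are safe to drop from the comparison:

```python
import pandas as pd

def align_columns(reference_data: pd.DataFrame, current_data: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns both frames share, in the reference order."""
    shared = [col for col in reference_data.columns if col in current_data.columns]
    dropped = sorted(set(current_data.columns) - set(shared))
    if dropped:
        print(f"Ignoring production-only columns: {dropped}")
    return current_data[shared]

# Demo with toy frames (in practice: reference_data, current_data from Step 2)
ref = pd.DataFrame({'age': [35], 'amount': [120.0]})
cur = pd.DataFrame({'amount': [80.0], 'age': [51], 'debug_flag': [1]})
aligned = align_columns(ref, cur)
print(list(aligned.columns))  # ['age', 'amount']
```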
Step 3: Interpret Drift Metrics
Open drift_report.html in your browser. Key sections:
Dataset Drift:
- ✅ No drift detected: share of drifted features is below the drift_share threshold (50% by default)
- ⚠️ Drift detected: ≥50% of features drifted
- 🚨 Critical drift: the prediction column itself drifted
Per-Feature Drift:
```python
# Programmatic access to drift metrics
drift_info = report.as_dict()
for metric in drift_info['metrics']:
    result = metric.get('result', {})
    # drift_by_columns maps each column name to its per-feature drift result
    for name, info in result.get('drift_by_columns', {}).items():
        if info['drift_detected']:
            print(f"⚠️ {name}: drift score {info['drift_score']:.3f}")
```
Expected output:
```
⚠️ customer_age: drift score 0.842
⚠️ transaction_amount: drift score 0.651
```
Step 4: Set Up Continuous Monitoring
```python
# monitor.py
import time

import pandas as pd
import schedule
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, DataQualityPreset
from evidently.ui.workspace import Workspace

# Create workspace for production monitoring
ws = Workspace.create("production_monitoring")
project = ws.create_project("fraud_detection_model")
project.description = "Monitors drift for fraud detection in prod"
project.save()

reference_data = pd.read_csv('train_data.csv')

def check_drift():
    """Run every hour in production"""
    current_data = fetch_latest_predictions()  # Your data pipeline
    report = Report(metrics=[
        DataDriftPreset(),
        DataQualityPreset(),  # Catches missing values, type changes
    ])
    report.run(
        reference_data=reference_data,
        current_data=current_data,
    )
    # Save to workspace for historical tracking
    ws.add_report(project.id, report)

    # Alert on drift (DatasetDriftMetric is the first metric in the preset)
    drift_result = report.as_dict()['metrics'][0]['result']
    if drift_result['dataset_drift']:
        send_alert(f"🚨 Data drift detected: {drift_result['number_of_drifted_columns']} features")  # Your alerting hook

# Run every hour
schedule.every().hour.do(check_drift)
while True:
    schedule.run_pending()
    time.sleep(60)
```
Why hourly: Catches drift early without overwhelming compute. Adjust based on data volume.
Step 5: Configure Drift Detection Method
```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Customize for your data type via preset parameters (Evidently 0.4.x API)
report = Report(metrics=[
    DataDriftPreset(
        # For continuous features: Kolmogorov-Smirnov test
        num_stattest="ks",
        num_stattest_threshold=0.05,  # p-value < 0.05 = drift
        # For categorical features: Chi-squared test
        cat_stattest="chisquare",
        cat_stattest_threshold=0.05,
        # Dataset drifts if >50% of features drift
        drift_share=0.5,
    ),
])
```
When to adjust thresholds (drift is flagged when the test's p-value falls below the threshold):
- Higher threshold (0.1): flags drift on weaker evidence; for high-stakes models (medical, financial) where missing drift is costly
- Lower threshold (0.01): requires stronger evidence; for noisy data where you tolerate more variation
- Custom tests: time-series data needs different stat tests (autocorrelation breaks the i.i.d. assumption)
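On very large or noisy samples, p-value tests can flag statistically significant but practically trivial shifts. A common alternative statistic is the Population Stability Index (PSI); here is a self-contained numpy sketch (the 0.1/0.25 cut-offs are the conventional rule of thumb, not Evidently defaults):

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Clip so empty bins don't produce log(0) or division by zero
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(42)
stable = psi(rng.normal(0, 1, 10_000), rng.normal(0, 1, 10_000))
shifted = psi(rng.normal(0, 1, 10_000), rng.normal(1, 1, 10_000))
print(f"stable: {stable:.3f}, shifted: {shifted:.3f}")
# Rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift
```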
Step 6: Monitor Prediction Drift
```python
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import TargetDriftPreset

# Monitors the target/prediction distribution. Works on predictions alone;
# ground-truth labels are only needed to monitor the target column.
report = Report(metrics=[
    TargetDriftPreset(),
])
report.run(
    reference_data=train_data,
    current_data=prod_data,
    column_mapping=ColumnMapping(target='is_fraud', prediction='fraud_score'),
)
```
Critical insight: Prediction drift often precedes accuracy drop by days/weeks. Catch it here.
Verification
Test your setup:
```
python drift_detector.py
```
You should see:
- drift_report.html generated (open in browser)
- Clear visualization of drifted features
- Dataset-level drift verdict
Smoke test with synthetic drift:
```python
# Force drift for testing
test_data = reference_data.copy()
test_data['feature_1'] = test_data['feature_1'] * 2  # Artificial drift
report.run(reference_data=reference_data, current_data=test_data)
# Should detect drift in feature_1
```
What You Learned
- Evidently AI compares production vs training data distributions
- Statistical tests (KS, Chi-squared) quantify drift automatically
- Monitor continuously, alert before accuracy degrades
Limitations:
- Requires labeled data for target drift (prediction drift works without)
- Statistical tests need >100 samples for reliability
- Doesn't explain why drift occurred, just that it did
When NOT to use this:
- Natural drift is expected (e.g., seasonal models)
- Data volume too low (<50 samples/day)
- Cost of false alerts exceeds drift risk
Production Deployment Tips
Integrate with Existing Stack
```python
# Export to Prometheus for Grafana dashboards
from prometheus_client import Gauge, start_http_server

drift_gauge = Gauge('model_drift_score', 'Current drift score', ['feature'])
start_http_server(8000)  # Prometheus scrapes http://host:8000/metrics

# drift_scores: {column_name: drift_score}, extracted from report.as_dict() (see Step 3)
for feature, score in drift_scores.items():
    drift_gauge.labels(feature=feature).set(score)
```
Storage Optimization
```python
# For high-throughput models, sample production data
current_data = prod_data.sample(n=10000, random_state=42)
```
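A global sample can under-represent quiet periods. One sketch of a fairer scheme is to cap rows per calendar day instead (assumes your production frame has a datetime `timestamp` column; adjust to your schema):

```python
import pandas as pd

def sample_per_day(prod_data: pd.DataFrame, per_day: int = 500) -> pd.DataFrame:
    """Cap rows per calendar day so quiet days stay represented in the sample."""
    return (
        prod_data.groupby(prod_data['timestamp'].dt.date, group_keys=False)
        .apply(lambda day: day.sample(n=min(len(day), per_day), random_state=42))
    )

# Demo: 1,000 rows on a busy day, 40 on a quiet one
ts = pd.to_datetime(['2024-06-01'] * 1000 + ['2024-06-02'] * 40)
demo = pd.DataFrame({'timestamp': ts, 'amount': range(1040)})
sampled = sample_per_day(demo, per_day=500)
print(len(sampled))  # 540: 500 from the busy day, all 40 from the quiet one
```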
Alert Thresholds
- Warning (30% drift): Review in next sprint
- Critical (50% drift): Investigate within 24h
- Emergency (prediction drift): Page on-call engineer
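The escalation rules above can be sketched as a small helper (a hypothetical function of mine encoding those thresholds; wire it into check_drift in place of a bare send_alert call):

```python
def drift_severity(drifted_share: float, prediction_drifted: bool) -> str:
    """Map the report's drift measurements onto the escalation levels above."""
    if prediction_drifted:
        return "emergency"   # page on-call engineer
    if drifted_share >= 0.5:
        return "critical"    # investigate within 24h
    if drifted_share >= 0.3:
        return "warning"     # review in next sprint
    return "ok"

print(drift_severity(0.35, False))  # warning
print(drift_severity(0.20, True))  # emergency
```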
Tested on Evidently 0.4.25, Python 3.11, with 500K+ production predictions