Credit Risk Modeling using Ollama: Default Probability and Loss Prediction

Build accurate credit risk models with Ollama for default probability and loss prediction. Learn machine learning techniques with practical code examples.

Banks lose billions annually to loan defaults, and traditional credit scoring methods miss subtle patterns that modern AI can detect. Ollama, a tool for running large language models locally, gives financial institutions a new way to assess credit risk and estimate potential losses.

This guide shows you how to build credit risk models around Ollama's locally hosted language models. You'll learn to predict default probabilities, estimate losses, and produce actionable risk assessments that protect your loan portfolio.

What is Credit Risk Modeling?

Credit risk modeling evaluates the likelihood that borrowers will default on their obligations. Banks and lenders use these models to:

  • Calculate default probabilities for loan applications
  • Estimate potential losses from credit portfolios
  • Set appropriate interest rates and credit limits
  • Meet regulatory capital requirements
  • Make informed lending decisions

Why Traditional Methods Fall Short

Legacy credit scoring systems rely on simple rules and basic statistical models. They struggle with:

  • Complex data patterns: Modern datasets contain hundreds of variables with intricate relationships
  • Non-linear interactions: Traditional models miss subtle connections between risk factors
  • Real-time adaptation: Static models can't adjust to changing market conditions
  • Unstructured data: Text-based information from applications often goes unused
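
For contrast, the kind of baseline that legacy scoring systems rely on can be sketched in a few lines with scikit-learn. The features and simulated defaults below are illustrative only, not a real dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(550, 800, n),   # credit_score
    rng.uniform(0.1, 0.6, n),    # debt_to_income
    rng.integers(1, 15, n),      # employment_years
])
# Simulated defaults: higher DTI and lower credit score raise default odds
logits = -4 + 6 * X[:, 1] - 0.005 * (X[:, 0] - 650)
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Baseline test accuracy: {baseline.score(X_test, y_test):.3f}")
```

A linear model like this captures each factor's marginal effect but not the non-linear interactions described above, which is exactly the gap the rest of this guide addresses.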

Understanding Ollama for Financial Risk Assessment

Ollama provides a local, privacy-focused platform for running large language models. For credit risk modeling, Ollama offers several advantages:

  • Data privacy: Process sensitive financial data locally without cloud dependencies
  • Cost efficiency: No API fees or usage limits for model inference
  • Customization: Fine-tune models on specific financial datasets
  • Integration: Easy integration with existing risk management systems

Key Ollama Models for Credit Risk

Different Ollama models excel at various aspects of credit risk assessment:

  • Llama 3.1: Excellent for structured data analysis and numerical predictions
  • Mistral: Strong performance on financial text analysis and document processing
  • Code Llama: Ideal for generating and validating risk calculation code
  • Phi-3: Lightweight option for real-time scoring applications

Setting Up Your Credit Risk Environment

Prerequisites

Before building credit risk models, ensure you have:

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull required models
ollama pull llama3.1:8b
ollama pull mistral:7b

# Install Python dependencies
pip install pandas numpy scikit-learn matplotlib seaborn requests
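
Before pulling models, you can confirm the Ollama server is reachable. This sketch assumes the default local endpoint (http://localhost:11434) and queries `/api/tags`, which lists the models available on this machine:

```python
import requests

def list_ollama_models(base_url="http://localhost:11434"):
    """Return locally available model names, or None if the server is unreachable."""
    try:
        # /api/tags lists the models pulled onto this machine
        resp = requests.get(f"{base_url}/api/tags", timeout=5)
        resp.raise_for_status()
        return [m["name"] for m in resp.json().get("models", [])]
    except requests.RequestException:
        return None

models = list_ollama_models()
print("Ollama models:", models if models is not None else "server not reachable")
```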

Data Requirements

Credit risk models require comprehensive borrower information:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import requests
import json

# Sample credit risk dataset structure
credit_data = {
    'loan_amount': [25000, 50000, 15000, 75000],
    'income': [60000, 85000, 45000, 120000],
    'debt_to_income': [0.3, 0.4, 0.2, 0.5],
    'credit_score': [720, 650, 780, 600],
    'employment_years': [5, 3, 8, 2],
    'loan_term': [36, 60, 24, 72],
    'default': [0, 1, 0, 1]  # Target variable
}

df = pd.DataFrame(credit_data)
print("Credit Risk Dataset Structure:")
print(df.head())

Building Default Probability Models

Data Preprocessing for Risk Assessment

Clean and prepare your credit data for modeling:

def preprocess_credit_data(df):
    """
    Preprocess credit data for risk modeling
    """
    # Handle missing values (numeric columns only)
    df = df.fillna(df.median(numeric_only=True))
    
    # Create derived features
    df['loan_to_income'] = df['loan_amount'] / df['income']
    # Simplified payment estimate (principal only, ignoring interest)
    df['monthly_payment'] = df['loan_amount'] / df['loan_term']
    df['payment_to_income'] = df['monthly_payment'] / (df['income'] / 12)
    
    # Bucket credit scores into risk categories (higher score = lower risk)
    df['risk_category'] = pd.cut(df['credit_score'],
                                 bins=[0, 580, 670, 740, 850],
                                 labels=['High', 'Medium', 'Low', 'Very Low'])
    
    return df

# Preprocess the data
df_processed = preprocess_credit_data(df)
print("Processed features:")
print(df_processed.columns.tolist())

Creating Risk Assessment Prompts

Design effective prompts for Ollama to analyze credit risk:

def create_risk_prompt(borrower_data):
    """
    Generate risk assessment prompt for Ollama
    """
    prompt = f"""
    Analyze the following borrower profile for credit risk:
    
    Loan Amount: ${borrower_data['loan_amount']:,}
    Annual Income: ${borrower_data['income']:,}
    Credit Score: {borrower_data['credit_score']}
    Debt-to-Income Ratio: {borrower_data['debt_to_income']:.2%}
    Employment Years: {borrower_data['employment_years']}
    Loan Term: {borrower_data['loan_term']} months
    
    Please provide:
    1. Default probability (0-1 scale)
    2. Risk factors (list top 3)
    3. Mitigation strategies
    4. Recommended interest rate adjustment
    
    Format response as JSON with keys: default_probability, risk_factors, mitigation, rate_adjustment
    """
    return prompt

# Example usage
sample_borrower = {
    'loan_amount': 45000,
    'income': 75000,
    'credit_score': 680,
    'debt_to_income': 0.35,
    'employment_years': 4,
    'loan_term': 48
}

risk_prompt = create_risk_prompt(sample_borrower)
print("Risk Assessment Prompt:")
print(risk_prompt)

Implementing Ollama Risk Scoring

Connect to Ollama and perform risk assessment:

def assess_credit_risk(borrower_data, model_name="llama3.1:8b"):
    """
    Assess credit risk using Ollama
    """
    prompt = create_risk_prompt(borrower_data)
    
    # Ollama API call (local server; generous timeout, since inference can be slow)
    response = requests.post('http://localhost:11434/api/generate',
                             json={
                                 'model': model_name,
                                 'prompt': prompt,
                                 'stream': False
                             },
                             timeout=120)
    
    if response.status_code == 200:
        result = response.json()
        try:
            # Parse JSON response
            risk_assessment = json.loads(result['response'])
            return risk_assessment
        except json.JSONDecodeError:
            # Fallback parsing for non-JSON responses
            return parse_text_response(result['response'])
    else:
        return {"error": "Failed to get assessment"}

def parse_text_response(text_response):
    """
    Parse text response when JSON parsing fails
    """
    import re
    # Extract default probability; accept "15%" or "0.15" style values
    prob_match = re.search(r'(\d+\.?\d*)\s*%?\s*(?:probability|chance)', text_response.lower())
    if prob_match:
        default_prob = float(prob_match.group(1))
        if default_prob > 1:  # value was expressed as a percentage
            default_prob /= 100
    else:
        default_prob = 0.5  # neutral fallback when nothing can be parsed
    
    return {
        'default_probability': default_prob,
        'risk_factors': ['Unable to parse'],
        'mitigation': 'Review manually',
        'rate_adjustment': 0
    }

# Assess risk for sample borrower
risk_result = assess_credit_risk(sample_borrower)
print("Credit Risk Assessment:")
print(json.dumps(risk_result, indent=2))
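
To reduce reliance on the text fallback, recent Ollama releases accept a `format` parameter on the generate endpoint that constrains the model to emit valid JSON. A variant of the call above (the `generate_json` helper name is ours; it takes a prebuilt prompt such as the output of create_risk_prompt):

```python
import json
import requests

def generate_json(prompt, model_name="llama3.1:8b", base_url="http://localhost:11434"):
    """Ask Ollama for a response constrained to valid JSON via format="json"."""
    response = requests.post(
        f"{base_url}/api/generate",
        json={
            'model': model_name,
            'prompt': prompt,
            'format': 'json',   # constrain the model to emit parseable JSON
            'stream': False
        },
        timeout=120)
    response.raise_for_status()
    # With format="json", the response field should parse without a text fallback
    return json.loads(response.json()['response'])

# Usage: risk = generate_json(create_risk_prompt(sample_borrower))
```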

Loss Prediction and Expected Loss Calculation

Estimating Loss Given Default (LGD)

Calculate potential losses if default occurs:

def calculate_expected_loss(loan_data, risk_assessment):
    """
    Calculate expected loss using probability of default and loss given default
    """
    # Probability of Default (PD); avoid the name `pd`, which shadows pandas
    prob_default = risk_assessment['default_probability']
    
    # Loss Given Default (LGD) - typically 40-60% for unsecured loans
    lgd = estimate_lgd(loan_data)
    
    # Exposure at Default (EAD)
    ead = loan_data['loan_amount']
    
    # Expected Loss = PD × LGD × EAD
    expected_loss = prob_default * lgd * ead
    
    return {
        'probability_of_default': prob_default,
        'loss_given_default': lgd,
        'exposure_at_default': ead,
        'expected_loss': expected_loss,
        'expected_loss_percentage': (expected_loss / ead) * 100
    }

def estimate_lgd(loan_data):
    """
    Estimate Loss Given Default based on loan characteristics
    """
    base_lgd = 0.45  # Base LGD for unsecured loans
    
    # Adjust based on loan characteristics
    if loan_data.get('collateral'):
        base_lgd *= 0.7  # Secured loans have lower LGD
    
    if loan_data['loan_term'] > 60:
        base_lgd += 0.05  # Longer terms increase LGD
    
    if loan_data['loan_amount'] > 50000:
        base_lgd += 0.03  # Higher amounts may have higher recovery costs
    
    return min(base_lgd, 0.8)  # Cap at 80%

# Calculate expected loss
loss_metrics = calculate_expected_loss(sample_borrower, risk_result)
print("Loss Prediction:")
for key, value in loss_metrics.items():
    if isinstance(value, float):
        print(f"{key}: {value:.4f}")
    else:
        print(f"{key}: {value}")

Portfolio Risk Aggregation

Analyze risk across multiple loans:

def analyze_portfolio_risk(loan_portfolio):
    """
    Analyze risk across entire loan portfolio
    """
    portfolio_results = []
    
    for idx, loan in loan_portfolio.iterrows():
        # Convert pandas Series to dict
        loan_dict = loan.to_dict()
        
        # Assess individual loan risk
        risk_assessment = assess_credit_risk(loan_dict)
        loss_metrics = calculate_expected_loss(loan_dict, risk_assessment)
        
        # Combine results
        loan_result = {
            'loan_id': idx,
            'loan_amount': loan_dict['loan_amount'],
            **risk_assessment,
            **loss_metrics
        }
        portfolio_results.append(loan_result)
    
    # Calculate portfolio metrics
    portfolio_df = pd.DataFrame(portfolio_results)
    
    portfolio_summary = {
        'total_exposure': portfolio_df['loan_amount'].sum(),
        'weighted_avg_pd': (portfolio_df['probability_of_default'] * 
                           portfolio_df['loan_amount']).sum() / portfolio_df['loan_amount'].sum(),
        'total_expected_loss': portfolio_df['expected_loss'].sum(),
        'portfolio_loss_rate': (portfolio_df['expected_loss'].sum() / 
                              portfolio_df['loan_amount'].sum()) * 100
    }
    
    return portfolio_df, portfolio_summary

# Example portfolio analysis
sample_portfolio = pd.DataFrame({
    'loan_amount': [25000, 50000, 15000, 75000, 40000],
    'income': [60000, 85000, 45000, 120000, 70000],
    'credit_score': [720, 650, 780, 600, 690],
    'debt_to_income': [0.3, 0.4, 0.2, 0.5, 0.35],
    'employment_years': [5, 3, 8, 2, 6],
    'loan_term': [36, 60, 24, 72, 48]
})

portfolio_results, portfolio_summary = analyze_portfolio_risk(sample_portfolio)
print("Portfolio Risk Summary:")
for key, value in portfolio_summary.items():
    print(f"{key}: {value:.2f}")
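
Expected loss alone can hide concentration risk. A quick supplementary check, sketched here on the sample exposures, is the Herfindahl-Hirschman index (HHI) of exposure shares: it ranges from 1/n for an evenly spread portfolio up to 1 when a single loan dominates.

```python
import numpy as np

def exposure_hhi(loan_amounts):
    """Herfindahl-Hirschman index of exposure shares (1/n = even, 1 = one loan)."""
    amounts = np.asarray(loan_amounts, dtype=float)
    shares = amounts / amounts.sum()
    return float((shares ** 2).sum())

# Exposures from the sample portfolio above
amounts = [25000, 50000, 15000, 75000, 40000]
hhi = exposure_hhi(amounts)
print(f"Exposure HHI: {hhi:.3f} (vs. {1 / len(amounts):.3f} for an even split)")
```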

Advanced Risk Modeling Techniques

Stress Testing and Scenario Analysis

Test model performance under adverse conditions:

def stress_test_portfolio(portfolio_df, stress_scenarios):
    """
    Perform stress testing on credit portfolio
    """
    stress_results = {}
    
    for scenario_name, scenario_params in stress_scenarios.items():
        stressed_portfolio = portfolio_df.copy()
        
        # Apply stress scenario
        if 'unemployment_rate' in scenario_params:
            # Increase default probability based on unemployment
            stress_multiplier = 1 + (scenario_params['unemployment_rate'] - 0.05) * 2
            stressed_portfolio['default_probability'] *= stress_multiplier
        
        if 'interest_rate_change' in scenario_params:
            # Adjust for interest rate changes affecting borrower capacity
            rate_impact = scenario_params['interest_rate_change'] * 0.1
            stressed_portfolio['default_probability'] *= (1 + rate_impact)
        
        # Recalculate expected losses
        stressed_portfolio['expected_loss'] = (
            stressed_portfolio['default_probability'] * 
            stressed_portfolio['loss_given_default'] * 
            stressed_portfolio['loan_amount']
        )
        
        stress_results[scenario_name] = {
            'total_expected_loss': stressed_portfolio['expected_loss'].sum(),
            'loss_rate': (stressed_portfolio['expected_loss'].sum() / 
                         stressed_portfolio['loan_amount'].sum()) * 100
        }
    
    return stress_results

# Define stress scenarios
stress_scenarios = {
    'recession': {
        'unemployment_rate': 0.10,
        'interest_rate_change': 0.02
    },
    'severe_recession': {
        'unemployment_rate': 0.15,
        'interest_rate_change': 0.05
    },
    'market_correction': {
        'unemployment_rate': 0.08,
        'interest_rate_change': 0.01
    }
}

# Prepare portfolio with risk metrics
portfolio_with_metrics = portfolio_results.copy()
portfolio_with_metrics['default_probability'] = portfolio_with_metrics['probability_of_default']

stress_results = stress_test_portfolio(portfolio_with_metrics, stress_scenarios)
print("Stress Test Results:")
for scenario, results in stress_results.items():
    print(f"{scenario}: Loss Rate = {results['loss_rate']:.2f}%")

Model Validation and Backtesting

Validate model accuracy using historical data:

def validate_model_performance(predictions, actual_outcomes):
    """
    Validate credit risk model performance
    """
    from sklearn.metrics import roc_auc_score, accuracy_score, classification_report
    
    # Convert probabilities to binary predictions
    binary_predictions = (predictions > 0.5).astype(int)
    
    # Calculate performance metrics
    auc_score = roc_auc_score(actual_outcomes, predictions)
    accuracy = accuracy_score(actual_outcomes, binary_predictions)
    
    # Gini coefficient (common in credit risk)
    gini_coefficient = 2 * auc_score - 1
    
    # Kolmogorov-Smirnov statistic
    from scipy import stats
    good_scores = predictions[actual_outcomes == 0]
    bad_scores = predictions[actual_outcomes == 1]
    ks_statistic = stats.ks_2samp(good_scores, bad_scores)[0]
    
    return {
        'auc_score': auc_score,
        'gini_coefficient': gini_coefficient,
        'ks_statistic': ks_statistic,
        'accuracy': accuracy
    }

# Example validation (using simulated data)
np.random.seed(42)
sample_predictions = np.random.beta(0.8, 4, 100)  # Simulate predicted probabilities
sample_outcomes = np.random.binomial(1, sample_predictions)  # Simulate actual outcomes

validation_results = validate_model_performance(sample_predictions, sample_outcomes)
print("Model Validation Results:")
for metric, value in validation_results.items():
    print(f"{metric}: {value:.4f}")
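
AUC, Gini, and KS measure discrimination (how well the model ranks borrowers), but the expected-loss figures above also depend on calibration: a predicted 10% default probability should default about 10% of the time. The Brier score is a simple calibration-sensitive check, sketched here on the same simulated data:

```python
import numpy as np
from sklearn.metrics import brier_score_loss

np.random.seed(42)
predictions = np.random.beta(0.8, 4, 100)          # simulated predicted probabilities
outcomes = np.random.binomial(1, predictions)      # simulated actual defaults

brier = brier_score_loss(outcomes, predictions)
# Compare against a no-skill baseline that predicts the base rate for everyone
no_skill = brier_score_loss(outcomes, np.full_like(predictions, outcomes.mean()))
print(f"Brier score: {brier:.4f} (no-skill baseline: {no_skill:.4f})")
```

Lower is better; a well-calibrated model should beat the no-skill baseline comfortably.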

Real-World Implementation Strategies

Production Deployment Considerations

Deploy credit risk models in production environments:

class CreditRiskAPI:
    """
    Production-ready credit risk assessment API
    """
    def __init__(self, model_name="llama3.1:8b"):
        self.model_name = model_name
        self.feature_columns = [
            'loan_amount', 'income', 'credit_score', 
            'debt_to_income', 'employment_years', 'loan_term'
        ]
    
    def preprocess_request(self, loan_data):
        """
        Preprocess incoming loan application
        """
        # Validate required fields
        required_fields = self.feature_columns
        for field in required_fields:
            if field not in loan_data:
                raise ValueError(f"Missing required field: {field}")
        
        # Data validation
        if loan_data['credit_score'] < 300 or loan_data['credit_score'] > 850:
            raise ValueError("Credit score must be between 300 and 850")
        
        if loan_data['debt_to_income'] < 0 or loan_data['debt_to_income'] > 1:
            raise ValueError("Debt-to-income ratio must be between 0 and 1")
        
        return loan_data
    
    def assess_risk(self, loan_data):
        """
        Assess credit risk for loan application
        """
        try:
            # Preprocess data
            clean_data = self.preprocess_request(loan_data)
            
            # Get risk assessment from Ollama
            risk_assessment = assess_credit_risk(clean_data, self.model_name)
            
            # Calculate expected loss
            loss_metrics = calculate_expected_loss(clean_data, risk_assessment)
            
            # Determine decision
            decision = self.make_lending_decision(risk_assessment, loss_metrics)
            
            return {
                'loan_id': loan_data.get('loan_id', 'N/A'),
                'decision': decision,
                'risk_assessment': risk_assessment,
                'loss_metrics': loss_metrics,
                'timestamp': pd.Timestamp.now().isoformat()
            }
        
        except Exception as e:
            return {
                'error': str(e),
                'timestamp': pd.Timestamp.now().isoformat()
            }
    
    def make_lending_decision(self, risk_assessment, loss_metrics):
        """
        Make lending decision based on risk assessment
        """
        # Avoid the name `pd`, which shadows the pandas import
        prob_default = risk_assessment['default_probability']
        expected_loss_pct = loss_metrics['expected_loss_percentage']
        
        if prob_default > 0.15 or expected_loss_pct > 8:
            return 'REJECT'
        elif prob_default > 0.08 or expected_loss_pct > 4:
            return 'MANUAL_REVIEW'
        else:
            return 'APPROVE'

# Example usage
credit_api = CreditRiskAPI()

test_application = {
    'loan_id': 'APP123456',
    'loan_amount': 35000,
    'income': 65000,
    'credit_score': 700,
    'debt_to_income': 0.32,
    'employment_years': 3,
    'loan_term': 60
}

result = credit_api.assess_risk(test_application)
print("Credit Decision:")
print(json.dumps(result, indent=2, default=str))

Integration with Existing Systems

Connect Ollama models with banking infrastructure:

def integrate_with_core_banking(loan_application):
    """
    Integrate credit risk assessment with core banking system
    """
    # Simulate core banking system integration
    class CoreBankingSystem:
        def __init__(self):
            self.customers = {}
            self.loans = {}
        
        def get_customer_history(self, customer_id):
            # Simulate customer history retrieval
            return {
                'previous_loans': 2,
                'payment_history': 'Good',
                'account_age_months': 36,
                'average_balance': 15000
            }
        
        def create_loan_record(self, loan_data, risk_assessment):
            # Simulate loan record creation
            loan_id = f"LOAN_{len(self.loans) + 1:06d}"
            self.loans[loan_id] = {
                'loan_data': loan_data,
                'risk_assessment': risk_assessment,
                'status': 'PENDING',
                'created_at': pd.Timestamp.now()
            }
            return loan_id
    
    # Initialize systems
    core_banking = CoreBankingSystem()
    credit_api = CreditRiskAPI()
    
    # Get customer history
    customer_history = core_banking.get_customer_history(
        loan_application.get('customer_id', 'CUST123')
    )
    
    # Enhanced loan data with customer history
    enhanced_application = {
        **loan_application,
        'customer_history': customer_history
    }
    
    # Assess risk
    risk_result = credit_api.assess_risk(enhanced_application)
    
    # Create loan record
    if 'error' not in risk_result:
        loan_id = core_banking.create_loan_record(
            enhanced_application, 
            risk_result['risk_assessment']
        )
        risk_result['loan_id'] = loan_id
    
    return risk_result

# Example integration
integration_result = integrate_with_core_banking(test_application)
print("Core Banking Integration Result:")
print(json.dumps(integration_result, indent=2, default=str))

Performance Optimization and Scaling

Batch Processing for High Volume

Handle multiple loan applications efficiently:

def batch_process_applications(applications_batch, batch_size=50):
    """
    Process multiple loan applications in batches
    """
    results = []
    credit_api = CreditRiskAPI()
    
    for i in range(0, len(applications_batch), batch_size):
        batch = applications_batch[i:i + batch_size]
        batch_results = []
        
        for application in batch:
            try:
                result = credit_api.assess_risk(application)
                batch_results.append(result)
            except Exception as e:
                batch_results.append({
                    'loan_id': application.get('loan_id', 'N/A'),
                    'error': str(e)
                })
        
        results.extend(batch_results)
        
        # Progress tracking
        print(f"Processed {min(i + batch_size, len(applications_batch))} / {len(applications_batch)} applications")
    
    return results

# Generate sample batch
sample_batch = []
for i in range(100):
    sample_batch.append({
        'loan_id': f'BATCH_{i:03d}',
        'loan_amount': np.random.randint(10000, 100000),
        'income': np.random.randint(40000, 150000),
        'credit_score': np.random.randint(550, 800),
        'debt_to_income': np.random.uniform(0.1, 0.6),
        'employment_years': np.random.randint(1, 15),
        'loan_term': np.random.choice([24, 36, 48, 60, 72])
    })

# Process batch
batch_results = batch_process_applications(sample_batch[:10])  # Process first 10 for demo
print(f"Batch processing completed. Processed {len(batch_results)} applications.")
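
The loop above issues Ollama calls one at a time. Because each call mostly waits on the model server, a thread pool can overlap requests. This is a sketch, not a drop-in replacement: the `assess_fn` argument and pool size are illustrative, and the right concurrency depends on what your Ollama server can sustain.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def batch_process_concurrent(applications, assess_fn, max_workers=4):
    """Run risk assessments concurrently; returns results keyed by loan_id."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every application and remember which loan each future belongs to
        futures = {
            pool.submit(assess_fn, app): app.get("loan_id", "N/A")
            for app in applications
        }
        for future in as_completed(futures):
            loan_id = futures[future]
            try:
                results[loan_id] = future.result()
            except Exception as exc:
                results[loan_id] = {"error": str(exc)}
    return results

# Usage (with the CreditRiskAPI defined earlier):
# results = batch_process_concurrent(sample_batch[:10], credit_api.assess_risk)
```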

Monitoring and Alerting

Implement monitoring for production credit risk models:

class CreditRiskMonitor:
    """
    Monitor credit risk model performance in production
    """
    def __init__(self):
        self.metrics_history = []
        self.alert_thresholds = {
            'high_default_rate': 0.20,
            'low_approval_rate': 0.30,
            'model_drift': 0.15
        }
    
    def log_decision(self, decision_result):
        """
        Log credit decision for monitoring
        """
        self.metrics_history.append({
            'timestamp': pd.Timestamp.now(),
            'decision': decision_result['decision'],
            'default_probability': decision_result['risk_assessment']['default_probability'],
            'expected_loss_pct': decision_result['loss_metrics']['expected_loss_percentage']
        })
    
    def calculate_daily_metrics(self):
        """
        Calculate daily performance metrics
        """
        if not self.metrics_history:
            return {}
        
        df = pd.DataFrame(self.metrics_history)
        today = pd.Timestamp.now().date()
        today_data = df[df['timestamp'].dt.date == today]
        
        if len(today_data) == 0:
            return {}
        
        metrics = {
            'total_applications': len(today_data),
            'approval_rate': (today_data['decision'] == 'APPROVE').mean(),
            'rejection_rate': (today_data['decision'] == 'REJECT').mean(),
            'manual_review_rate': (today_data['decision'] == 'MANUAL_REVIEW').mean(),
            'avg_default_probability': today_data['default_probability'].mean(),
            'avg_expected_loss': today_data['expected_loss_pct'].mean()
        }
        
        return metrics
    
    def check_alerts(self):
        """
        Check for alert conditions
        """
        metrics = self.calculate_daily_metrics()
        alerts = []
        
        if metrics.get('avg_default_probability', 0) > self.alert_thresholds['high_default_rate']:
            alerts.append("HIGH_DEFAULT_RATE: Average default probability exceeds threshold")
        
        if metrics.get('approval_rate', 1) < self.alert_thresholds['low_approval_rate']:
            alerts.append("LOW_APPROVAL_RATE: Approval rate below threshold")
        
        return alerts

# Example monitoring usage
monitor = CreditRiskMonitor()

# Simulate logging decisions
for result in batch_results:
    if 'error' not in result:
        monitor.log_decision(result)

# Check metrics and alerts
daily_metrics = monitor.calculate_daily_metrics()
alerts = monitor.check_alerts()

print("Daily Metrics:")
for metric, value in daily_metrics.items():
    # total_applications is an int; only format the float-valued metrics
    print(f"{metric}: {value:.3f}" if isinstance(value, float) else f"{metric}: {value}")

if alerts:
    print("\nAlerts:")
    for alert in alerts:
        print(f"⚠️  {alert}")

Conclusion

Credit risk modeling with Ollama transforms traditional lending decisions into data-driven processes. You've learned to build comprehensive risk assessment systems that predict default probabilities, calculate expected losses, and make automated lending decisions.

Key takeaways from this implementation:

Technical Benefits: Ollama provides local, cost-effective model inference that keeps sensitive financial data on-premises while supporting consistent risk predictions.

Business Impact: Automated credit risk assessment reduces manual review time, improves decision consistency, and enables real-time lending decisions at scale.

Scalability: The batch processing and monitoring frameworks ensure your credit risk models can handle high-volume applications while maintaining performance standards.

Start with the basic risk assessment framework, then gradually add portfolio analysis, stress testing, and production monitoring. Remember to validate your models regularly and adjust parameters based on actual loan performance data.

Your credit risk modeling journey with Ollama begins with understanding borrower patterns and evolves into sophisticated portfolio management systems that protect your institution while serving customers effectively.

Ready to implement these credit risk models in your financial system? Begin with the preprocessing steps and gradually build toward full production deployment with proper monitoring and validation frameworks.