Generate Realistic K6 Load Tests with Claude in 20 Minutes

Build production-grade load testing scenarios using Claude (via the Anthropic SDK) to generate realistic user behavior patterns for K6 performance testing.

Problem: Your Load Tests Don't Match Real User Behavior

You're running K6 load tests, but they hammer the same endpoints with identical patterns—nothing like how actual users navigate your app. Performance issues slip into production because synthetic tests miss realistic usage patterns.

You'll learn:

  • How to use Claude to generate realistic user journeys
  • How to build K6 scenarios with variable think times and realistic data
  • How to create maintainable load tests that evolve with your app

Time: 20 min | Level: Intermediate


Why This Happens

Traditional load tests use hardcoded sequences that don't capture:

  • Variable navigation paths (users don't follow scripts)
  • Realistic think times (humans pause, machines don't)
  • Contextual data (form submissions vary by user intent)
  • Edge cases (real users do unexpected things)

Common symptoms:

  • Load tests pass but production fails under real traffic
  • Tests require constant manual updates
  • Missing coverage of critical user flows
  • Performance bottlenecks only appear with actual users

Solution

Step 1: Set Up K6 and Anthropic SDK

# Install K6 (macOS)
brew install k6

# Install Node.js dependencies
npm init -y
npm pkg set type=module  # the scripts below use ES module imports and top-level await
npm install @anthropic-ai/sdk dotenv

Create a .env file in the project root:

ANTHROPIC_API_KEY=your_api_key_here

Expected: K6 and Anthropic SDK ready to use


Step 2: Create the Scenario Generator

Create generate-scenarios.js:

import Anthropic from '@anthropic-ai/sdk';
import 'dotenv/config';
import fs from 'fs';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

async function generateScenarios(appContext) {
  const message = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 4000,
    messages: [{
      role: 'user',
      content: `Generate 3 realistic user scenarios for load testing this application:

${appContext}

For each scenario, provide:
1. User persona and goal
2. Step-by-step actions with realistic think times (2-8 seconds)
3. Expected API endpoints hit
4. Sample request bodies with realistic data
5. Probability weights for this scenario (out of 100)

Format as JSON with this structure:
{
  "scenarios": [
    {
      "name": "string",
      "persona": "string",
      "weight": number,
      "steps": [
        {
          "action": "string",
          "endpoint": "string",
          "method": "string",
          "thinkTime": number,
          "body": object
        }
      ]
    }
  ]
}`
    }]
  });

  return JSON.parse(message.content[0].text);
}

// Example usage
const appContext = `
E-commerce API for outdoor gear:
- GET /api/products - List products with filters
- GET /api/products/:id - Product details
- POST /api/cart/add - Add to cart
- GET /api/cart - View cart
- POST /api/checkout - Complete purchase
- POST /api/reviews - Submit review

Typical user flow: Browse → View details → Add to cart → Checkout
Average session: 3-5 minutes
Peak traffic: 1000 concurrent users
`;

const scenarios = await generateScenarios(appContext);
fs.writeFileSync('scenarios.json', JSON.stringify(scenarios, null, 2));
console.log('Scenarios generated:', scenarios.scenarios.length);

Run it:

node generate-scenarios.js

Expected output: scenarios.json file created with realistic user journeys

Why this works: Given the endpoint list and traffic notes, Claude proposes several personas with different goals, paths, and pacing, so the test no longer replays one repetitive script.
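The generator above passes the model's reply straight to JSON.parse, which throws if the reply is wrapped in a code fence or preceded by a sentence of prose. The following is a minimal sketch of a more tolerant parser with a basic shape check. The name parseScenarios is hypothetical and not part of the Anthropic SDK; the fields it checks are the ones requested in the prompt.

```javascript
// Hypothetical helper: extract and sanity-check the scenario JSON from a model reply.
function parseScenarios(replyText) {
  // Keep everything between the first "{" and the last "}" so that
  // code fences or surrounding prose are ignored.
  const start = replyText.indexOf('{');
  const end = replyText.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model reply');
  }
  const data = JSON.parse(replyText.slice(start, end + 1));

  if (!Array.isArray(data.scenarios) || data.scenarios.length === 0) {
    throw new Error('Reply has no "scenarios" array');
  }
  for (const scenario of data.scenarios) {
    if (typeof scenario.name !== 'string' || !Array.isArray(scenario.steps)) {
      throw new Error('Scenario is missing "name" or "steps"');
    }
    for (const step of scenario.steps) {
      if (typeof step.endpoint !== 'string' || typeof step.method !== 'string') {
        throw new Error(`Step in "${scenario.name}" is missing "endpoint" or "method"`);
      }
    }
  }
  return data;
}
```

To try it, call parseScenarios(message.content[0].text) in place of the JSON.parse call in generateScenarios. Failing early here is cheaper than discovering a malformed scenario when K6 rejects the generated script.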


Step 3: Convert Scenarios to K6 Test

Create build-k6-test.js:

import fs from 'fs';

// Turn a model-supplied name into a valid JavaScript identifier. Exported
// function names must be identifiers, and K6 rejects scenario names with spaces.
function toIdentifier(name) {
  const id = String(name).trim().replace(/\W+/g, '_');
  return /^[A-Za-z_]/.test(id) ? id : `_${id}`;
}

function buildK6Test(scenariosData) {
  const scenarios = scenariosData.scenarios;

  // Generate scenario configs for K6
  const k6Scenarios = scenarios.reduce((acc, scenario) => {
    const name = toIdentifier(scenario.name);
    // Map weight (out of 100) to VUs; keep at least 1 so low-weight scenarios still run
    const targetVUs = Math.max(1, Math.floor(scenario.weight / 10));
    acc[name] = {
      executor: 'ramping-vus',
      exec: name,
      startVUs: 0,
      stages: [
        { duration: '2m', target: targetVUs }, // Ramp up
        { duration: '5m', target: targetVUs }, // Steady
        { duration: '2m', target: 0 }, // Ramp down
      ],
    };
    return acc;
  }, {});

  // Generate function for each scenario
  const scenarioFunctions = scenarios.map(scenario => {
    const functionName = toIdentifier(scenario.name);
    // Index-based variable names avoid clashes when two steps share an action name.
    // http.request covers every method (http.delete does not exist; K6 names it http.del).
    const steps = scenario.steps.map((step, i) => `
    // ${String(step.action).replace(/\s+/g, ' ')}
    sleep(${Number(step.thinkTime) || 0});
    const res${i} = http.request(
      ${JSON.stringify(step.method.toUpperCase())},
      BASE_URL + ${JSON.stringify(step.endpoint)},
      ${step.body ? `JSON.stringify(${JSON.stringify(step.body)}),` : 'null,'}
      { headers: { 'Content-Type': 'application/json' } }
    );
    check(res${i}, {
      'status is 200': (r) => r.status === 200,
    });`).join('\n');

    return `
export function ${functionName}() {
  // Persona: ${String(scenario.persona).replace(/\s+/g, ' ')}
  ${steps}
}`;
  }).join('\n');

  return `
import http from 'k6/http';
import { check, sleep } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export const options = {
  scenarios: ${JSON.stringify(k6Scenarios, null, 2)},
  thresholds: {
    'http_req_duration': ['p(95)<500'], // 95% of requests under 500ms
    'http_req_failed': ['rate<0.01'],   // Less than 1% failure rate
  },
};

${scenarioFunctions}
`;
}

const scenariosData = JSON.parse(fs.readFileSync('scenarios.json', 'utf8'));
const k6Script = buildK6Test(scenariosData);
fs.writeFileSync('load-test.js', k6Script);
console.log('K6 test script generated: load-test.js');

Run it:

node build-k6-test.js

Expected: load-test.js created with weighted scenarios and realistic think times
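The generated script calls sleep() with one fixed number per step, so every virtual user pauses for exactly the same time. One way to vary the pacing is to randomize each pause around the suggested value. The sketch below assumes a hypothetical helper, jitteredThinkTime, and a hypothetical renderSleep function for the build script; neither is part of K6.

```javascript
// Hypothetical helper: pick a think time within +/- spread of the base value.
// `random` is injectable so the function can be tested deterministically.
function jitteredThinkTime(baseSeconds, spread = 0.5, random = Math.random) {
  const min = baseSeconds * (1 - spread);
  const max = baseSeconds * (1 + spread);
  return min + random() * (max - min);
}

// Emit the sleep line for one generated step. The generated script must also
// contain the helper's source for this call to resolve.
function renderSleep(step) {
  return `sleep(jitteredThinkTime(${Number(step.thinkTime) || 0}));`;
}

console.log(renderSleep({ thinkTime: 4 })); // prints: sleep(jitteredThinkTime(4));
```

To wire this in, the build script could emit renderSleep(step) instead of the fixed sleep line and prepend jitteredThinkTime.toString() to the generated file, which keeps load-test.js free of extra imports.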


Step 4: Run the Load Test

# Run locally
BASE_URL=http://localhost:3000 k6 run load-test.js

# Run with K6 Cloud (optional)
k6 cloud load-test.js

You should see:

scenarios: (100.00%) 3 scenarios, 10 max VUs, 9m30s max duration

✓ status is 200

checks.........................: 100.00% ✓ 4532    ✗ 0
http_req_duration..............: avg=234ms   p(95)=421ms
http_req_failed................: 0.00%   ✓ 0       ✗ 4532

If it fails:

  • Connection errors: Check BASE_URL is correct and server is running
  • High failure rate: The model may have generated endpoints or request bodies your API does not accept. Review scenarios.json against your routes
  • Timeouts: Lower the stage targets in Step 3 to reduce concurrent users
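Step 3 derives each scenario's VU target from weight / 10, which fixes the total at roughly 10 VUs. To raise or lower concurrency without editing every stage, one option is to split a single VU budget across scenarios in proportion to their weights. This is a sketch under that assumption; allocateVUs is a hypothetical helper, and it uses the largest-remainder method so the parts always add up to the budget.

```javascript
// Hypothetical helper: split a total VU budget across scenarios by weight.
function allocateVUs(scenarios, totalVUs) {
  const totalWeight = scenarios.reduce((sum, s) => sum + s.weight, 0);
  if (totalWeight <= 0) {
    throw new Error('Scenario weights must sum to a positive number');
  }

  // Start from the rounded-down proportional share of each scenario.
  const shares = scenarios.map((s) => {
    const exact = (s.weight / totalWeight) * totalVUs;
    return { name: s.name, vus: Math.floor(exact), remainder: exact - Math.floor(exact) };
  });

  // Hand out the VUs lost to rounding, largest fractional part first.
  let leftover = totalVUs - shares.reduce((sum, s) => sum + s.vus, 0);
  const byRemainder = [...shares].sort((a, b) => b.remainder - a.remainder);
  for (const share of byRemainder) {
    if (leftover <= 0) break;
    share.vus += 1;
    leftover -= 1;
  }
  return Object.fromEntries(shares.map((s) => [s.name, s.vus]));
}

const plan = allocateVUs(
  [
    { name: 'browser', weight: 50 },
    { name: 'buyer', weight: 30 },
    { name: 'reviewer', weight: 20 },
  ],
  7
);
console.log(plan); // { browser: 4, buyer: 2, reviewer: 1 }
```

The returned numbers can replace the per-scenario target values in the stages array, so a timeout investigation only needs a smaller totalVUs.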

Step 5: Iterate on Scenarios

Feed the test results back into the generator so the next set of scenarios addresses what you found:

// Add to generate-scenarios.js
const refinementPrompt = `
Previous load test results:
- p95 latency: 421ms (target: <500ms)
- Failed endpoints: POST /api/reviews (401 Unauthorized)
- Most common path: Browse → Details → Exit (no purchase)

Regenerate scenarios to:
1. Add authentication for protected endpoints
2. Increase checkout completion rate
3. Add error handling for realistic failures
`;

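const refined = await generateScenarios(appContext + refinementPrompt);
fs.writeFileSync('scenarios.json', JSON.stringify(refined, null, 2)); // so build-k6-test.js picks up the refined set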

Why this matters: Load tests have to evolve with your app. Claude can revise the scenarios from real findings, so you review a regenerated file rather than rewriting steps by hand.
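The refinement prompt above hard-codes the results. As a sketch, the same text could be built from a summary written by `k6 run --summary-export=summary.json load-test.js`. The helper name buildRefinementPrompt is hypothetical, and the field paths it reads (metrics.http_req_duration["p(95)"] and metrics.http_req_failed.value) are an assumption about the exported JSON; check them against the file your K6 version writes.

```javascript
// Hypothetical helper: turn a K6 summary export into refinement instructions.
// Assumed shape: { metrics: { http_req_duration: { 'p(95)': ms },
//                             http_req_failed: { value: rate } } }
function buildRefinementPrompt(summary, targetP95Ms = 500) {
  const p95 = summary.metrics.http_req_duration['p(95)'];
  const failRate = summary.metrics.http_req_failed.value;

  const lines = [
    'Previous load test results:',
    `- p95 latency: ${Math.round(p95)}ms (target: <${targetP95Ms}ms)`,
    `- Failed request rate: ${(failRate * 100).toFixed(2)}%`,
    '',
    'Regenerate scenarios to:',
  ];
  if (failRate > 0.01) {
    lines.push('- Remove or correct steps that call endpoints returning errors');
  }
  if (p95 > targetP95Ms) {
    lines.push('- Keep the slowest flows so the latency problem stays reproducible');
  }
  if (failRate <= 0.01 && p95 <= targetP95Ms) {
    lines.push('- Add less common user paths to widen coverage');
  }
  return '\n' + lines.join('\n') + '\n';
}
```

In generate-scenarios.js this could be used as generateScenarios(appContext + buildRefinementPrompt(JSON.parse(fs.readFileSync('summary.json', 'utf8')))). The thresholds of 1% and 500 ms mirror the ones in the generated K6 options.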


Verification

Run a full test cycle:

# Generate scenarios
node generate-scenarios.js

# Build K6 test
node build-k6-test.js

# Run test
BASE_URL=https://staging.yourapp.com k6 run load-test.js

You should see:

  • Varied user paths (not all users hit the same sequence)
  • Realistic think times between requests (2-8 seconds)
  • Multiple scenarios running concurrently
  • Performance thresholds passing consistently

What You Learned

  • Claude can generate realistic user behavior patterns from a short API description
  • K6 scenarios can be weighted to match production traffic distribution
  • Load tests become maintainable documentation of user flows
  • Combining AI generation with load testing catches realistic bottlenecks

Limitations:

  • Claude doesn't know your auth flow, so add token handling manually
  • Generated data may not match production constraints (e.g., real email formats)
  • Initial scenarios need human review for accuracy

Bonus: Advanced Patterns

Dynamic Data Generation

// In generate-scenarios.js, add context-aware data
const dataPrompt = `
Generate realistic test data for:
- User emails (following company domain patterns)
- Product IDs (based on actual catalog)
- Payment methods (realistic card numbers for testing)
- Addresses (valid US format)
`;

Scenario Chaining

// Make scenarios depend on previous results
const steps = [
  {
    action: "Login",
    endpoint: "/api/auth/login",
    method: "POST",
    saveToken: true, // Extract token for next steps
  },
  {
    action: "View Profile",
    endpoint: "/api/user/profile",
    method: "GET",
    useToken: true, // Use saved token
  }
];
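The saveToken and useToken flags above are not understood by the Step 3 builder, which renders every step the same way. A minimal sketch of how a builder could translate them into K6 code follows. renderChainedStep is a hypothetical function, and it assumes the login response is JSON with a top-level "token" field and that protected endpoints accept a Bearer header; adjust both to your API.

```javascript
// Hypothetical renderer for one step; returns K6 source code as a string.
// Assumes a JSON login response with a "token" field and Bearer authentication.
function renderChainedStep(step, index) {
  // Double-quoted on purpose: the backticks and ${token} must reach the
  // generated script unevaluated.
  const headers = step.useToken
    ? "{ 'Content-Type': 'application/json', Authorization: `Bearer ${token}` }"
    : "{ 'Content-Type': 'application/json' }";
  const body = step.body ? `JSON.stringify(${JSON.stringify(step.body)})` : 'null';

  const lines = [
    `// ${step.action}`,
    `const res${index} = http.request(${JSON.stringify(step.method)}, ` +
      `BASE_URL + ${JSON.stringify(step.endpoint)}, ${body}, { headers: ${headers} });`,
  ];
  if (step.saveToken) {
    // In K6, Response.json(selector) reads one field from a JSON body.
    lines.push(`token = res${index}.json('token');`);
  }
  return lines.join('\n');
}

const login = { action: 'Login', endpoint: '/api/auth/login', method: 'POST', saveToken: true };
console.log(renderChainedStep(login, 0));
```

The generated scenario function would need a `let token;` declaration before the first step so that later steps can read the saved value.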

Performance Regression Detection

// Compare runs and alert on degradation
const historicalP95 = 380; // Previous run
const currentP95 = 421;

if (currentP95 > historicalP95 * 1.1) {
  console.error('Performance regression: p95 increased by >10%');
  process.exit(1);
}
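The check above compares two hard-coded numbers. As a sketch, the same comparison could read both values from K6 summary exports of the previous and current runs. checkRegression is a hypothetical helper, and the path metrics.http_req_duration["p(95)"] is an assumption about the summary export format that should be verified against your own files.

```javascript
// Hypothetical helper: compare p95 latency between two K6 summary exports.
function checkRegression(baseline, current, tolerance = 0.1) {
  const before = baseline.metrics.http_req_duration['p(95)'];
  const after = current.metrics.http_req_duration['p(95)'];
  // Relative change; 0.1 means the current run is 10% slower than the baseline.
  const change = (after - before) / before;
  return { before, after, change, regressed: change > tolerance };
}

const previousRun = { metrics: { http_req_duration: { 'p(95)': 380 } } };
const latestRun = { metrics: { http_req_duration: { 'p(95)': 421 } } };
const report = checkRegression(previousRun, latestRun);
console.log(report.regressed); // true
```

In a CI job, a regressed result would be followed by process.exit(1), as in the snippet above, so the pipeline fails on the slowdown.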

Tested on K6 v0.49.0, Claude Sonnet 4 (claude-sonnet-4-20250514), Node.js 22.x, macOS Sonoma