Problem: Writing Tests Takes Longer Than Writing Code
You ship a feature in 2 hours but spend 4 hours writing tests to hit 80% coverage. Your CI pipeline blocks merges under 90%, and you're stuck writing repetitive test cases for edge cases you already handled in the code.
You'll learn:
- How to set up an agentic workflow for test generation
- Why AI agents catch edge cases you miss
- When to trust generated tests vs writing manually
Time: 25 min | Level: Intermediate
Why This Happens
Traditional test generation tools use static analysis and struggle with complex logic. They generate basic happy-path tests but miss edge cases, error handling, and integration points. You end up writing the hard tests yourself anyway.
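For example, most of the interesting behavior in a function like the one below lives in its guard branches; a happy-path generator covers the final return and little else (illustrative code, not part of the tool):

```typescript
// Illustrative function: one happy path, three guard branches that
// static happy-path generators typically leave untested.
function parseAge(input: string): number | null {
  const n = Number(input);
  if (input.trim() === "" || Number.isNaN(n)) return null; // blank or non-numeric
  if (!Number.isInteger(n)) return null;                   // fractional ages rejected
  if (n < 0 || n > 150) return null;                       // out-of-range guard
  return n;                                                // happy path
}
```

A generator that only exercises `parseAge("30")` reports progress while leaving three of the four branches — the ones that cause production bugs — untested.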
Common symptoms:
- Coverage stuck at 70-80% despite hours of effort
- Tests pass but production bugs still slip through
- Repetitive test patterns for similar functions
- No tests for error boundaries or race conditions
Solution
Step 1: Install the Agentic Test Runner
We'll use a locally run CLI agent, powered by an LLM, that reads your code and generates tests iteratively until coverage targets are met.
# Install the test agent CLI
npm install -D @testgen/agent-runner
# or
pip install testgen-agent
For TypeScript projects:
npx testgen init --framework vitest
For Python projects:
testgen init --framework pytest
Expected: Creates .testgen/config.json with your test framework settings.
If it fails:
- "No package.json found": Run from project root
- "Unsupported framework": Currently supports Vitest, Jest, Pytest, unittest
Step 2: Configure the Agent Workflow
Edit .testgen/config.json to define coverage targets and agent behavior:
{
"coverage": {
"target": 100,
"statements": 95,
"branches": 100,
"functions": 100
},
"agent": {
"model": "claude-sonnet-4-5-20250929",
"iterations": 5,
"strategy": "iterative-refinement"
},
"files": {
"include": ["src/**/*.ts", "!src/**/*.test.ts"],
"prioritize": ["critical", "business-logic"]
},
"testPatterns": {
"edgeCases": true,
"errorHandling": true,
"mockExternal": true,
"parallelSafe": true
}
}
Why this works: The agent runs in iterations: the first pass generates basic tests, then it reads the coverage report and adds tests for uncovered branches, repeating until the target is met.
Strategy options:
- iterative-refinement: Generate → measure → fill gaps (recommended)
- complete-upfront: Analyze the entire file and generate all tests at once (faster but less thorough)
- interactive: Prompts you to review each test batch
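The iterative-refinement loop can be sketched in a few lines (a minimal sketch: refineUntilTarget, generateTests, and measureCoverage are illustrative stand-ins for the agent's LLM calls and coverage runs, not the tool's API):

```typescript
type Suite = string[];

// Minimal sketch of iterative refinement: generate, measure, fill gaps, repeat.
function refineUntilTarget(
  generateTests: (gap: number) => Suite,      // stand-in for the LLM call
  measureCoverage: (suite: Suite) => number,  // stand-in for the coverage run
  target: number,
  maxIterations: number
): { suite: Suite; coverage: number; iterations: number } {
  let suite: Suite = [];
  let coverage = 0;
  let iterations = 0;
  while (iterations < maxIterations && coverage < target) {
    iterations++;
    // Each pass only targets the gap the last coverage report exposed.
    suite = suite.concat(generateTests(target - coverage));
    coverage = measureCoverage(suite);
  }
  return { suite, coverage, iterations };
}

// Demo with a fake coverage progression mirroring the sample output (73 → 91 → 100):
const progression = [73, 91, 100];
const result = refineUntilTarget(
  () => ["generated test"],
  (suite) => progression[Math.min(suite.length, progression.length) - 1],
  100,
  5
);
// result.coverage === 100 after 3 iterations
```

This is also why `agent.iterations` matters: with complex branching, the loop may hit `maxIterations` before the target, which is when you raise the setting.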
Step 3: Generate Tests for a Module
# Generate tests for a specific file
npx testgen generate src/utils/validator.ts
# Or generate for entire directory
npx testgen generate src/services/
Expected output:
Analyzing src/utils/validator.ts...
├─ Found 8 functions, 23 branches
├─ Generating initial test suite...
├─ Coverage: 73% (17/23 branches)
├─ Iteration 2: Generating tests for uncovered paths...
├─ Coverage: 91% (21/23 branches)
├─ Iteration 3: Edge case detection...
├─ Coverage: 100% (23/23 branches)
âœ" Generated src/utils/validator.test.ts (47 tests)
Generated test example:
// validator.test.ts - Auto-generated by testgen
import { describe, it, expect } from 'vitest';
import { validateEmail } from './validator';
describe('validateEmail', () => {
// Happy path - agent understands common cases
it('accepts valid email addresses', () => {
expect(validateEmail('user@example.com')).toBe(true);
expect(validateEmail('test.name+tag@domain.co.uk')).toBe(true);
});
// Edge cases - agent reads your validation logic
it('rejects emails without @', () => {
expect(validateEmail('notanemail')).toBe(false);
});
it('handles empty and whitespace strings', () => {
expect(validateEmail('')).toBe(false);
expect(validateEmail(' ')).toBe(false);
});
// Error boundaries - agent checks null/undefined
it('handles null and undefined inputs', () => {
expect(validateEmail(null as any)).toBe(false);
expect(validateEmail(undefined as any)).toBe(false);
});
// Agent found this edge case by reading your regex
it('rejects consecutive dots in local part', () => {
expect(validateEmail('user..name@example.com')).toBe(false);
});
});
If it fails:
- "API key not found": Set
ANTHROPIC_API_KEYenvironment variable - Generated tests don't compile: Run
npx testgen fixto auto-correct imports - Coverage still under target: Increase
agent.iterationsto 10
Step 4: Review and Approve Tests
The agent generates tests but doesn't automatically commit them. Review first:
# See what changed
git diff src/utils/validator.test.ts
# Run the generated tests
npm test
# If tests pass, commit
git add src/utils/validator.test.ts
git commit -m "test: add comprehensive validator tests (100% coverage)"
What to look for:
- Do tests assert meaningful behavior? Agent should test logic, not implementation
- Are mocks realistic? Agent should mock external APIs, not internal functions
- Do edge cases make sense? Agent sometimes generates paranoid tests (e.g., testing that 1 + 1 = 2)
Remove tests that:
- Test the language/framework itself
- Are duplicates with slightly different inputs
- Assert internal implementation details you might refactor
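To make the removal list concrete, here is a hypothetical trio (none of this is testgen output): one test worth keeping and two matching the patterns above:

```typescript
// Hypothetical function under test.
function slugify(title: string): string {
  return title.toLowerCase().trim().replace(/\s+/g, "-");
}

// Keep: asserts meaningful behavior of your code.
const keep = slugify("Hello World") === "hello-world";

// Remove: tests the language/framework itself, not your code.
const testsTheLanguage = "HELLO".toLowerCase() === "hello";

// Remove: duplicate of the kept test with a trivially different input.
const duplicate = slugify("Hello  There") === "hello-there";
```

The middle one is the easiest to miss in review: it passes forever and covers nothing of yours.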
Step 5: Integrate with CI/CD
Add to your CI pipeline to block merges that drop coverage:
# .github/workflows/test.yml
name: Tests
on: [pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install dependencies
run: npm ci
- name: Run tests with coverage
run: npm test -- --coverage
- name: Generate missing tests
if: failure()
run: npx testgen generate --ci-mode
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
- name: Verify coverage target
run: |
COVERAGE=$(jq '.total.lines.pct' coverage/coverage-summary.json)
if (( $(echo "$COVERAGE < 95" | bc -l) )); then
echo "Coverage $COVERAGE% is below 95%"
exit 1
fi
Why this works: If coverage drops, the agent automatically generates tests to restore it. Developers can review in the PR before merging.
Step 6: Train the Agent on Your Patterns
Create .testgen/examples.ts with your preferred test style:
// examples.ts - Agent learns from these patterns
import { describe, it, expect, vi } from 'vitest';
// Example: How to test async functions
describe('Example: Async operations', () => {
it('waits for promises to resolve', async () => {
const result = await fetchData();
expect(result).toBeDefined();
});
});
// Example: How to mock external dependencies
describe('Example: Mocking APIs', () => {
it('mocks fetch calls', async () => {
global.fetch = vi.fn().mockResolvedValue({
json: async () => ({ data: 'test' })
});
const result = await apiCall();
expect(global.fetch).toHaveBeenCalledWith('/api/endpoint');
});
});
// Example: How to test error handling
describe('Example: Error scenarios', () => {
it('throws on invalid input', () => {
expect(() => parseConfig(null)).toThrow('Config cannot be null');
});
});
Run training:
npx testgen learn examples.ts
Expected: Agent now generates tests matching your style conventions.
Verification
Run your full test suite with coverage:
npm test -- --coverage
You should see:
Test Suites: 24 passed, 24 total
Tests: 187 passed, 187 total
Coverage: 100% Statements (543/543)
100% Branches (89/89)
100% Functions (112/112)
100% Lines (498/498)
Check the HTML report:
open coverage/index.html
Every file should show green (100% covered). Click through to verify tests actually exercise the logic.
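Clicking through matters because coverage alone can be gamed: a test can execute every line while asserting nothing. A hypothetical illustration:

```typescript
// Hypothetical function with a guard branch and a happy path.
function applyDiscount(price: number, pct: number): number {
  if (pct < 0 || pct > 100) throw new RangeError("pct out of range");
  return price * (1 - pct / 100);
}

// Coverage-gaming "test": executes both branches, asserts nothing — any bug passes.
try { applyDiscount(100, 150); } catch { /* error swallowed, never checked */ }
applyDiscount(100, 10); // return value ignored

// Meaningful test: pins the actual behavior the lines implement.
const discounted = applyDiscount(100, 10); // 90
```

Both versions show 100% coverage in the HTML report; only the second catches a regression.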
What You Learned
- Agentic workflows iterate until coverage targets are met, unlike one-shot generators
- AI agents catch edge cases by analyzing your code's branching logic
- Generated tests need review - agents can create paranoid or redundant tests
- Training on examples makes agents match your team's test style
Limitations:
- Agent-generated tests focus on code coverage, not business logic correctness
- Works best on pure functions; struggles with complex UI interactions
- Costs roughly $0.02-0.25 per file with Claude Sonnet, depending on size and complexity (batch API reduces this)
- Some tests may be brittle if you refactor implementation details
When NOT to use this:
- Integration tests that require real databases or APIs
- E2E tests that need user interaction simulation
- Security tests that need adversarial thinking
- Tests for visual correctness or UX behavior
Advanced: Multi-Agent Test Generation
For complex codebases, use specialized agents:
{
"agents": {
"unit": {
"model": "claude-sonnet-4-5",
"focus": ["pure-functions", "business-logic"]
},
"integration": {
"model": "claude-opus-4-5",
"focus": ["api-routes", "database-queries"]
},
"edge": {
"model": "claude-sonnet-4-5",
"temperature": 0.8,
"focus": ["error-handling", "race-conditions"]
}
},
"workflow": "parallel"
}
Run multi-agent mode:
npx testgen generate src/ --agents all
Why this works: Each agent specializes. The edge agent uses a higher temperature to surface unusual test cases, while the integration agent uses the more capable model for complex scenarios.
Real-World Example: Before vs After
Before (manual testing):
// auth.test.ts - 67% coverage, 2 hours of work
describe('authenticateUser', () => {
it('returns token for valid credentials', async () => {
const result = await authenticateUser('user@example.com', 'password123');
expect(result.token).toBeDefined();
});
it('throws on invalid password', async () => {
await expect(
authenticateUser('user@example.com', 'wrong')
).rejects.toThrow();
});
});
After (agent-generated in 3 minutes):
// auth.test.ts - 100% coverage, auto-generated + 10 min review
describe('authenticateUser', () => {
// Original tests preserved
it('returns token for valid credentials', async () => {
const result = await authenticateUser('user@example.com', 'password123');
expect(result.token).toBeDefined();
expect(result.expiresAt).toBeInstanceOf(Date);
});
it('throws on invalid password', async () => {
await expect(
authenticateUser('user@example.com', 'wrong')
).rejects.toThrow('Invalid credentials');
});
// Agent-generated edge cases
it('handles SQL injection attempts in email', async () => {
await expect(
authenticateUser("admin'--", 'password')
).rejects.toThrow('Invalid email format');
});
it('rate limits repeated failed attempts', async () => {
for (let i = 0; i < 5; i++) {
await authenticateUser('user@example.com', 'wrong').catch(() => {});
}
await expect(
authenticateUser('user@example.com', 'password123')
).rejects.toThrow('Too many attempts');
});
it('handles concurrent authentication requests', async () => {
const promises = Array(10).fill(null).map(() =>
authenticateUser('user@example.com', 'password123')
);
const results = await Promise.allSettled(promises);
expect(results.filter(r => r.status === 'fulfilled')).toHaveLength(10);
});
it('sanitizes user input before database query', async () => {
const spy = vi.spyOn(db, 'query');
await authenticateUser('user+tag@example.com', 'pass');
expect(spy).toHaveBeenCalledWith(
expect.not.stringContaining('+tag')
);
});
});
Coverage improvement: 67% → 100% with tests you wouldn't have thought of manually.
Cost Analysis
Per-file cost with Claude Sonnet 4.5:
| File Size | Complexity | API Cost | Time |
|---|---|---|---|
| 50 lines | Low | $0.02 | 30s |
| 200 lines | Medium | $0.08 | 90s |
| 500 lines | High | $0.25 | 4min |
Full codebase (500 files, mixed complexity):
- Total cost: ~$30-50
- Total time: 2-3 hours (vs 80+ hours manual)
- ROI: Saves 70+ developer hours per test suite
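One way to sanity-check that estimate against the per-file table: assume a hypothetical complexity mix across the 500 files (the 300/150/50 split below is an illustration, not measured data):

```typescript
// Hypothetical mix of the 500 files, priced with the per-file table above.
const tiers = [
  { files: 300, costPerFile: 0.02 }, // low complexity
  { files: 150, costPerFile: 0.08 }, // medium
  { files: 50,  costPerFile: 0.25 }, // high
];
const totalCost = tiers.reduce((sum, t) => sum + t.files * t.costPerFile, 0);
// 6 + 12 + 12.5 = 30.5 — the low end of the ~$30-50 range
```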
Cost optimization:
- Use batch API for 50% discount on non-urgent generations
- Cache test patterns to avoid re-analyzing similar files
- Run nightly for changed files only
Troubleshooting
Agent generates tests that don't compile
# Auto-fix import errors and type issues
npx testgen fix src/**/*.test.ts
# If that fails, regenerate with stricter validation
npx testgen generate src/file.ts --validate-syntax
Coverage target not reached after max iterations
# Check which branches are uncovered
npx vitest --coverage --reporter=verbose
# Manually inspect the uncovered code
# Often it's unreachable code or defensive checks
If code is legitimately unreachable:
/* istanbul ignore next */
if (process.env.NODE_ENV === 'development') {
// Debug-only code that shouldn't count against coverage
}
Tests pass locally but fail in CI
# Run tests in CI mode locally
npm test -- --ci
# Agent might have used environment-specific values
# Regenerate with --ci-mode flag
npx testgen generate src/ --ci-mode
Common issues:
- Hardcoded timestamps or random values
- Tests depend on file system state
- Mocks don't match CI environment
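A common fix for the first issue is to inject the clock rather than calling Date.now() inside the function, so tests pin a fixed value that is identical locally and in CI (makeSessionId is a hypothetical example, not testgen output):

```typescript
// Hypothetical helper: takes the clock as a parameter, defaulting to the real one.
function makeSessionId(userId: string, now: () => number = Date.now): string {
  return `${userId}-${now()}`;
}

// Production call uses the real clock (value varies per run):
//   makeSessionId("u1")
// Test call injects a fixed clock, so the expected value never drifts:
const sessionId = makeSessionId("u1", () => 1700000000000);
// sessionId === "u1-1700000000000"
```

The same injection pattern works for random values: pass the generator in, default to `Math.random`.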
Framework-Specific Examples
Vitest + TypeScript
// vite.config.ts
import { defineConfig } from 'vitest/config';
export default defineConfig({
test: {
coverage: {
provider: 'v8',
reporter: ['text', 'json', 'html'],
include: ['src/**/*.ts'],
exclude: ['**/*.test.ts', '**/*.spec.ts'],
all: true,
lines: 95,
functions: 95,
branches: 95,
statements: 95
}
}
});
Pytest + Python
# conftest.py - Agent learns your fixture patterns
from unittest.mock import MagicMock

import pytest
@pytest.fixture
def mock_db():
"""Example: How to mock database connections"""
db = MagicMock()
db.query.return_value = [{"id": 1, "name": "test"}]
return db
@pytest.fixture
def sample_user():
"""Example: How to create test data"""
return {"email": "test@example.com", "role": "admin"}
# Generate pytest tests
testgen generate src/services/ --framework pytest
Tested on Vitest 2.1.0, Pytest 8.0.0, Claude Sonnet 4.5, Node.js 22.x, Python 3.12