Problem: Writing Tests Takes Longer Than Writing Code
You ship a feature in 2 hours but spend 4 hours writing tests to hit 80% coverage. Your CI pipeline blocks merges under 90%, and you're stuck writing repetitive test cases for edge cases you already handled in the code.
You'll learn:
- How to set up an agentic workflow for test generation
- Why AI agents catch edge cases you miss
- When to trust generated tests vs writing manually
Time: 25 min | Level: Intermediate
Why This Happens
Traditional test generation tools use static analysis and struggle with complex logic. They generate basic happy-path tests but miss edge cases, error handling, and integration points. You end up writing the hard tests yourself anyway.
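For example, most of the interesting behavior in a function like the one below lives in its guard branches; a happy-path generator covers the final return and little else (illustrative code, not part of the tool):

```typescript
// Illustrative function: one happy path, three guard branches that
// static happy-path generators typically leave untested.
function parseAge(input: string): number | null {
  const n = Number(input);
  if (input.trim() === "" || Number.isNaN(n)) return null; // blank or non-numeric
  if (!Number.isInteger(n)) return null;                   // fractional ages rejected
  if (n < 0 || n > 150) return null;                       // out-of-range guard
  return n;                                                // happy path
}
```

A generator that only exercises `parseAge("30")` reports progress while leaving three of the four branches — the ones that cause production bugs — untested.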
Common symptoms:
- Coverage stuck at 70-80% despite hours of effort
- Tests pass but production bugs still slip through
- Repetitive test patterns for similar functions
- No tests for error boundaries or race conditions
Solution
Step 1: Install the Agentic Test Runner
We'll use a locally run CLI agent, powered by an LLM, that reads your code and generates tests iteratively until coverage targets are met.
# Install the test agent CLI
npm install -D @testgen/agent-runner
# or
pip install testgen-agent
For TypeScript projects:
npx testgen init --framework vitest
For Python projects:
testgen init --framework pytest
Expected: Creates .testgen/config.json with your test framework settings.
If it fails:
- "No package.json found": Run from project root
- "Unsupported framework": Currently supports Vitest, Jest, Pytest, unittest
Step 2: Configure the Agent Workflow
Edit .testgen/config.json to define coverage targets and agent behavior:
{
"coverage": {
"target": 100,
"statements": 95,
"branches": 100,
"functions": 100
},
"agent": {
"model": "claude-sonnet-4-5-20250929",
"iterations": 5,
"strategy": "iterative-refinement"
},
"files": {
"include": ["src/**/*.ts", "!src/**/*.test.ts"],
"prioritize": ["critical", "business-logic"]
},
"testPatterns": {
"edgeCases": true,
"errorHandling": true,
"mockExternal": true,
"parallelSafe": true
}
}
Why this works: The agent runs in iterations: the first pass generates basic tests, then it reads the coverage report and adds tests for uncovered branches, repeating until the target is met.
Strategy options:
- iterative-refinement: Generate → measure → fill gaps (recommended)
- complete-upfront: Analyze the entire file and generate all tests at once (faster but less thorough)
- interactive: Prompts you to review each test batch
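The iterative-refinement loop can be sketched in a few lines (a minimal sketch: refineUntilTarget, generateTests, and measureCoverage are illustrative stand-ins for the agent's LLM calls and coverage runs, not the tool's API):

```typescript
type Suite = string[];

// Minimal sketch of iterative refinement: generate, measure, fill gaps, repeat.
function refineUntilTarget(
  generateTests: (gap: number) => Suite,      // stand-in for the LLM call
  measureCoverage: (suite: Suite) => number,  // stand-in for the coverage run
  target: number,
  maxIterations: number
): { suite: Suite; coverage: number; iterations: number } {
  let suite: Suite = [];
  let coverage = 0;
  let iterations = 0;
  while (iterations < maxIterations && coverage < target) {
    iterations++;
    // Each pass only targets the gap the last coverage report exposed.
    suite = suite.concat(generateTests(target - coverage));
    coverage = measureCoverage(suite);
  }
  return { suite, coverage, iterations };
}

// Demo with a fake coverage progression mirroring the sample output (73 → 91 → 100):
const progression = [73, 91, 100];
const result = refineUntilTarget(
  () => ["generated test"],
  (suite) => progression[Math.min(suite.length, progression.length) - 1],
  100,
  5
);
// result.coverage === 100 after 3 iterations
```

This is also why `agent.iterations` matters: with complex branching, the loop may hit `maxIterations` before the target, which is when you raise the setting.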
Step 3: Generate Tests for a Module
# Generate tests for a specific file
npx testgen generate src/utils/validator.ts
# Or generate for entire directory
npx testgen generate src/services/
Expected output:
Analyzing src/utils/validator.ts...
├─ Found 8 functions, 23 branches
├─ Generating initial test suite...
├─ Coverage: 73% (17/23 branches)
├─ Iteration 2: Generating tests for uncovered paths...
├─ Coverage: 91% (21/23 branches)
├─ Iteration 3: Edge case detection...
├─ Coverage: 100% (23/23 branches)
âœ" Generated src/utils/validator.test.ts (47 tests)
Generated test example:
// validator.test.ts - Auto-generated by testgen
import { describe, it, expect } from 'vitest';
import { validateEmail } from './validator';
describe('validateEmail', () => {
// Happy path - agent understands common cases
it('accepts valid email addresses', () => {
expect(validateEmail('user@example.com')).toBe(true);
expect(validateEmail('test.name+tag@domain.co.uk')).toBe(true);
});
// Edge cases - agent reads your validation logic
it('rejects emails without @', () => {
expect(validateEmail('notanemail')).toBe(false);
});
it('handles empty and whitespace strings', () => {
expect(validateEmail('')).toBe(false);
expect(validateEmail(' ')).toBe(false);
});
// Error boundaries - agent checks null/undefined
it('handles null and undefined inputs', () => {
expect(validateEmail(null as any)).toBe(false);
expect(validateEmail(undefined as any)).toBe(false);
});
// Agent found this edge case by reading your regex
it('rejects consecutive dots in local part', () => {
expect(validateEmail('user..name@example.com')).toBe(false);
});
});
If it fails:
- "API key not found": Set
ANTHROPIC_API_KEYenvironment variable - Generated tests don't compile: Run
npx testgen fixto auto-correct imports - Coverage still under target: Increase
agent.iterationsto 10
Step 4: Review and Approve Tests
The agent generates tests but doesn't automatically commit them. Review first:
# See what changed
git diff src/utils/validator.test.ts
# Run the generated tests
npm test
# If tests pass, commit
git add src/utils/validator.test.ts
git commit -m "test: add comprehensive validator tests (100% coverage)"
What to look for:
- Do tests assert meaningful behavior? Agent should test logic, not implementation
- Are mocks realistic? Agent should mock external APIs, not internal functions
- Do edge cases make sense? Agent sometimes generates paranoid tests (e.g., testing that 1 + 1 = 2)
Remove tests that:
- Test the language/framework itself
- Are duplicates with slightly different inputs
- Assert internal implementation details you might refactor
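To make the removal list concrete, here is a hypothetical trio (none of this is testgen output): one test worth keeping and two matching the patterns above:

```typescript
// Hypothetical function under test.
function slugify(title: string): string {
  return title.toLowerCase().trim().replace(/\s+/g, "-");
}

// Keep: asserts meaningful behavior of your code.
const keep = slugify("Hello World") === "hello-world";

// Remove: tests the language/framework itself, not your code.
const testsTheLanguage = "HELLO".toLowerCase() === "hello";

// Remove: duplicate of the kept test with a trivially different input.
const duplicate = slugify("Hello  There") === "hello-there";
```

The middle one is the easiest to miss in review: it passes forever and covers nothing of yours.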
Step 5: Integrate with CI/CD
Add to your CI pipeline to block merges that drop coverage:
# .github/workflows/test.yml
name: Tests
on: [pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install dependencies
run: npm ci
- name: Run tests with coverage
run: npm test -- --coverage
- name: Generate missing tests
if: failure()
run: npx testgen generate --ci-mode
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
- name: Verify coverage target
run: |
COVERAGE=$(jq '.total.lines.pct' coverage/coverage-summary.json)
if (( $(echo "$COVERAGE < 95" | bc -l) )); then
echo "Coverage $COVERAGE% is below 95%"
exit 1
fi
Why this works: If coverage drops, the agent automatically generates tests to restore it. Developers can review in the PR before merging.
Step 6: Train the Agent on Your Patterns
Create .testgen/examples.ts with your preferred test style:
// examples.ts - Agent learns from these patterns
import { describe, it, expect, vi } from 'vitest';
// Example: How to test async functions
describe('Example: Async operations', () => {
it('waits for promises to resolve', async () => {
const result = await fetchData();
expect(result).toBeDefined();
});
});
// Example: How to mock external dependencies
describe('Example: Mocking APIs', () => {
it('mocks fetch calls', async () => {
global.fetch = vi.fn().mockResolvedValue({
json: async () => ({ data: 'test' })
});
const result = await apiCall();
expect(global.fetch).toHaveBeenCalledWith('/api/endpoint');
});
});
// Example: How to test error handling
describe('Example: Error scenarios', () => {
it('throws on invalid input', () => {
expect(() => parseConfig(null)).toThrow('Config cannot be null');
});
});
Run training:
npx testgen learn examples.ts
Expected: Agent now generates tests matching your style conventions.
Verification
Run your full test suite with coverage:
npm test -- --coverage
You should see:
Test Suites: 24 passed, 24 total
Tests: 187 passed, 187 total
Coverage: 100% Statements (543/543)
100% Branches (89/89)
100% Functions (112/112)
100% Lines (498/498)
Check the HTML report:
open coverage/index.html
Every file should show green (100% covered). Click through to verify tests actually exercise the logic.
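Clicking through matters because coverage alone can be gamed: a test can execute every line while asserting nothing. A hypothetical illustration:

```typescript
// Hypothetical function with a guard branch and a happy path.
function applyDiscount(price: number, pct: number): number {
  if (pct < 0 || pct > 100) throw new RangeError("pct out of range");
  return price * (1 - pct / 100);
}

// Coverage-gaming "test": executes both branches, asserts nothing — any bug passes.
try { applyDiscount(100, 150); } catch { /* error swallowed, never checked */ }
applyDiscount(100, 10); // return value ignored

// Meaningful test: pins the actual behavior the lines implement.
const discounted = applyDiscount(100, 10); // 90
```

Both versions show 100% coverage in the HTML report; only the second catches a regression.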
What You Learned
- Agentic workflows iterate until coverage targets are met, unlike one-shot generators
- AI agents catch edge cases by analyzing your code's branching logic
- Generated tests need review - agents can create paranoid or redundant tests
- Training on examples makes agents match your team's test style
Limitations:
- Agent-generated tests focus on code coverage, not business logic correctness
- Works best on pure functions; struggles with complex UI interactions
- Costs roughly $0.02-0.25 per file with Claude Sonnet, depending on size and complexity (batch API reduces this)
- Some tests may be brittle if you refactor implementation details
When NOT to use this:
- Integration tests that require real databases or APIs
- E2E tests that need user interaction simulation
- Security tests that need adversarial thinking
- Tests for visual correctness or UX behavior
Advanced: Multi-Agent Test Generation
For complex codebases, use specialized agents:
{
"agents": {
"unit": {
"model": "claude-sonnet-4-5",
"focus": ["pure-functions", "business-logic"]
},
"integration": {
"model": "claude-opus-4-5",
"focus": ["api-routes", "database-queries"]
},
"edge": {
"model": "claude-sonnet-4-5",
"temperature": 0.8,
"focus": ["error-handling", "race-conditions"]
}
},
"workflow": "parallel"
}
Run multi-agent mode:
npx testgen generate src/ --agents all
Why this works: Each agent specializes. The edge agent uses a higher temperature to surface unusual test cases, while the integration agent uses the more capable model for complex scenarios.
Real-World Example: Before vs After
Before (manual testing):
// auth.test.ts - 67% coverage, 2 hours of work
describe('authenticateUser', () => {
it('returns token for valid credentials', async () => {
const result = await authenticateUser('user@example.com', 'password123');
expect(result.token).toBeDefined();
});
it('throws on invalid password', async () => {
await expect(
authenticateUser('user@example.com', 'wrong')
).rejects.toThrow();
});
});
After (agent-generated in 3 minutes):
// auth.test.ts - 100% coverage, auto-generated + 10 min review
describe('authenticateUser', () => {
// Original tests preserved
it('returns token for valid credentials', async () => {
const result = await authenticateUser('user@example.com', 'password123');
expect(result.token).toBeDefined();
expect(result.expiresAt).toBeInstanceOf(Date);
});
it('throws on invalid password', async () => {
await expect(
authenticateUser('user@example.com', 'wrong')
).rejects.toThrow('Invalid credentials');
});
// Agent-generated edge cases
it('handles SQL injection attempts in email', async () => {
await expect(
authenticateUser("admin'--", 'password')
).rejects.toThrow('Invalid email format');
});
it('rate limits repeated failed attempts', async () => {
for (let i = 0; i < 5; i++) {
await authenticateUser('user@example.com', 'wrong').catch(() => {});
}
await expect(
authenticateUser('user@example.com', 'password123')
).rejects.toThrow('Too many attempts');
});
it('handles concurrent authentication requests', async () => {
const promises = Array(10).fill(null).map(() =>
authenticateUser('user@example.com', 'password123')
);
const results = await Promise.allSettled(promises);
expect(results.filter(r => r.status === 'fulfilled')).toHaveLength(10);
});
it('sanitizes user input before database query', async () => {
const spy = vi.spyOn(db, 'query');
await authenticateUser('user+tag@example.com', 'pass');
expect(spy).toHaveBeenCalledWith(
expect.not.stringContaining('+tag')
);
});
});
Coverage improvement: 67% → 100% with tests you wouldn't have thought of manually.
Cost Analysis
Per-file cost with Claude Sonnet 4.5:
| File Size | Complexity | API Cost | Time |
|---|---|---|---|
| 50 lines | Low | $0.02 | 30s |
| 200 lines | Medium | $0.08 | 90s |
| 500 lines | High | $0.25 | 4min |
Full codebase (500 files, mixed complexity):
- Total cost: ~$30-50
- Total time: 2-3 hours (vs 80+ hours manual)
- ROI: Saves 70+ developer hours per test suite
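One way to sanity-check that estimate against the per-file table: assume a hypothetical complexity mix across the 500 files (the 300/150/50 split below is an illustration, not measured data):

```typescript
// Hypothetical mix of the 500 files, priced with the per-file table above.
const tiers = [
  { files: 300, costPerFile: 0.02 }, // low complexity
  { files: 150, costPerFile: 0.08 }, // medium
  { files: 50,  costPerFile: 0.25 }, // high
];
const totalCost = tiers.reduce((sum, t) => sum + t.files * t.costPerFile, 0);
// 6 + 12 + 12.5 = 30.5 — the low end of the ~$30-50 range
```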
Cost optimization:
- Use batch API for 50% discount on non-urgent generations
- Cache test patterns to avoid re-analyzing similar files
- Run nightly for changed files only
Troubleshooting
Agent generates tests that don't compile
# Auto-fix import errors and type issues
npx testgen fix src/**/*.test.ts
# If that fails, regenerate with stricter validation
npx testgen generate src/file.ts --validate-syntax
Coverage target not reached after max iterations
# Check which branches are uncovered
npx vitest --coverage --reporter=verbose
# Manually inspect the uncovered code
# Often it's unreachable code or defensive checks
If code is legitimately unreachable:
/* istanbul ignore next */
if (process.env.NODE_ENV === 'development') {
// Debug-only code that shouldn't count against coverage
}
Tests pass locally but fail in CI
# Run tests in CI mode locally
npm test -- --ci
# Agent might have used environment-specific values
# Regenerate with --ci-mode flag
npx testgen generate src/ --ci-mode
Common issues:
- Hardcoded timestamps or random values
- Tests depend on file system state
- Mocks don't match CI environment
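A common fix for the first issue is to inject the clock rather than calling Date.now() inside the function, so tests pin a fixed value that is identical locally and in CI (makeSessionId is a hypothetical example, not testgen output):

```typescript
// Hypothetical helper: takes the clock as a parameter, defaulting to the real one.
function makeSessionId(userId: string, now: () => number = Date.now): string {
  return `${userId}-${now()}`;
}

// Production call uses the real clock (value varies per run):
//   makeSessionId("u1")
// Test call injects a fixed clock, so the expected value never drifts:
const sessionId = makeSessionId("u1", () => 1700000000000);
// sessionId === "u1-1700000000000"
```

The same injection pattern works for random values: pass the generator in, default to `Math.random`.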
Framework-Specific Examples
Vitest + TypeScript
// vite.config.ts
import { defineConfig } from 'vitest/config';
export default defineConfig({
test: {
coverage: {
provider: 'v8',
reporter: ['text', 'json', 'html'],
include: ['src/**/*.ts'],
exclude: ['**/*.test.ts', '**/*.spec.ts'],
all: true,
lines: 95,
functions: 95,
branches: 95,
statements: 95
}
}
});
Pytest + Python
# conftest.py - Agent learns your fixture patterns
from unittest.mock import MagicMock

import pytest
@pytest.fixture
def mock_db():
"""Example: How to mock database connections"""
db = MagicMock()
db.query.return_value = [{"id": 1, "name": "test"}]
return db
@pytest.fixture
def sample_user():
"""Example: How to create test data"""
return {"email": "test@example.com", "role": "admin"}
# Generate pytest tests
testgen generate src/services/ --framework pytest
Tested on Vitest 2.1.0, Pytest 8.0.0, Claude Sonnet 4.5, Node.js 22.x, Python 3.12