Problem: Your 100% Coverage Doesn't Catch Bugs
Your CI shows 100% test coverage, but production bugs still slip through. The AI-generated tests execute every line but don't actually validate behavior.
You'll learn:
- Why line coverage misleads teams
- How to use AI for behavior-driven test generation
- A 3-step workflow that catches real bugs
Time: 30 min | Level: Intermediate
Why This Happens
Coverage tools measure lines executed, not logic validated. A test that calls a function without assertions gives you 100% coverage and 0% confidence.
Common symptoms:
- Tests pass but features break in production
- Changing implementation breaks tests that shouldn't care
- Coverage reports look great but you're afraid to refactor
- AI generates assertion-light tests like `expect(result).toBeDefined()`
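To see the difference concretely, here's a minimal sketch (the `calculateTotal` function is hypothetical): the coverage-only assertion executes every line while tolerating any wrong answer, whereas the behavior assertions pin the actual contract.

```typescript
// Hypothetical function under test.
function calculateTotal(prices: number[]): number {
  return prices.reduce((sum, p) => sum + p, 0);
}

// Coverage-only "test": every line runs, nothing is validated.
// This passes even if calculateTotal returned the wrong sum.
console.assert(calculateTotal([10, 20]) !== undefined);

// Behavior tests: these fail the moment the logic is wrong.
console.assert(calculateTotal([10, 20]) === 30);
console.assert(calculateTotal([]) === 0); // boundary: empty cart
```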
Solution
Step 1: Define Test Boundaries
Before generating any tests, identify what actually matters.
```typescript
// payment-processor.ts
export class PaymentProcessor {
  constructor(private db: Database) {}

  async processPayment(amount: number, currency: string): Promise<Receipt> {
    // 50 lines of implementation
  }
}
```
Ask the AI this instead of "write tests":

```text
Analyze this PaymentProcessor class. What are:
1. Edge cases that could cause real money errors?
2. Error conditions users will hit?
3. State changes that could corrupt data?

Don't write tests yet. List scenarios.
```
Expected: You get a list like "negative amounts", "unsupported currency", "network timeouts", "duplicate transaction IDs"
If it fails:
- AI jumps to code: Explicitly say "no code, scenarios only"
- Generic answers: Provide context: "this handles $2M daily, chargebacks cost $50 each"
Step 2: Generate Behavior Tests
Now ask for tests that validate those scenarios.
```text
// Prompt to AI
Based on the scenario "negative amounts should be rejected", write a test that:
- Tries amount: -100
- Verifies the specific error thrown
- Checks that no database writes happened
- Validates the error includes the amount for debugging

Use vitest. Show the complete test with setup/teardown.
```
AI generates:
```typescript
import { describe, it, expect, beforeEach, vi } from 'vitest';
import { PaymentProcessor } from './payment-processor';

describe('PaymentProcessor - negative amount handling', () => {
  let processor: PaymentProcessor;
  let mockDb: { write: ReturnType<typeof vi.fn> };

  beforeEach(() => {
    mockDb = { write: vi.fn() };
    processor = new PaymentProcessor(mockDb);
  });

  it('rejects negative amounts before touching database', async () => {
    // This validates the actual business requirement
    await expect(
      processor.processPayment(-100, 'USD')
    ).rejects.toThrow('Amount must be positive, received: -100');

    // This ensures atomicity - critical for financial code
    expect(mockDb.write).not.toHaveBeenCalled();
  });
});
```
Why this works: The test validates business rules (positive amounts) and side effects (no DB writes). Line coverage is a byproduct.
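For reference, here is a minimal implementation sketch that passes that test. The `Db` and `Receipt` shapes are assumptions for illustration; the article elides the real implementation. The key design point is that validation precedes every side effect.

```typescript
// Assumed shapes, not the article's actual types.
interface Receipt { id: string; amount: number; currency: string }
interface Db { write(record: Receipt): Promise<void> }

class PaymentProcessor {
  constructor(private db: Db) {}

  async processPayment(amount: number, currency: string): Promise<Receipt> {
    // Guard before any side effect: a rejection must leave no partial state.
    if (amount <= 0) {
      throw new Error(`Amount must be positive, received: ${amount}`);
    }
    const receipt: Receipt = { id: `rcpt-${Date.now()}`, amount, currency };
    await this.db.write(receipt);
    return receipt;
  }
}
```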
Step 3: Validate with Mutation Testing
Coverage says you tested the code. Mutation testing proves your tests catch bugs.
```bash
# Install Stryker for mutation testing
npm install --save-dev @stryker-mutator/core @stryker-mutator/vitest-runner

# Run it
npx stryker run
```
What happens: Stryker modifies your code (e.g., changes `amount > 0` to `amount >= 0`) and reruns tests. If tests still pass, you have weak tests.
```typescript
// Stryker found this survived mutation
if (amount > 0) { ... } // Changed to >= 0, tests passed

// Fix: Add a boundary test
it('rejects zero amount', async () => {
  await expect(
    processor.processPayment(0, 'USD')
  ).rejects.toThrow('Amount must be positive, received: 0');
});
```
Expected: Stryker reports an 80%+ mutation score (the percentage of mutants your tests kill)
If it fails:
- Too slow: Configure Stryker to test only modified files in CI
- Low score (<60%): Focus on boundary conditions AI might miss
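For the speed and scoping problems above, Stryker's config file lets you restrict mutation to specific files and enforce a minimum score. A `stryker.config.json` along these lines (paths and thresholds are illustrative; `break` fails the run below that score):

```json
{
  "testRunner": "vitest",
  "mutate": ["src/payment-processor.ts"],
  "reporters": ["clear-text", "progress"],
  "thresholds": { "high": 80, "low": 60, "break": 60 }
}
```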
Verification
Test it:
```bash
# Check coverage
npm run test -- --coverage

# Check mutation score
npx stryker run

# Verify both metrics
```
You should see:
- Line coverage: 95-100%
- Mutation score: 75-85% (perfect 100% is often impractical)
- Tests describe behaviors, not implementation
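One way to wire both gates into CI (GitHub Actions syntax; the script names are assumed to match your `package.json`):

```yaml
# Illustrative CI job: fail the build on weak tests, not just low coverage.
test-quality:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with: { node-version: 20 }
    - run: npm ci
    - run: npm run test -- --coverage   # line coverage gate
    - run: npx stryker run              # mutation score gate (via "break" threshold)
```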
Real Example: Before/After
Before (AI default behavior)
```typescript
// AI prompt: "write tests for calculateDiscount"
it('should calculate discount', () => {
  const result = calculateDiscount(100, 'SAVE10');
  expect(result).toBeDefined(); // Useless assertion
  expect(typeof result).toBe('number'); // Still useless
});
```
Coverage: 100% | Mutation score: 20% | Bugs caught: 0
After (behavior-driven prompt)
```typescript
// AI prompt: "write tests for: invalid coupon should return full price,
// expired coupon throws error, percentage coupons cap at 90% off"
it('returns full price for invalid coupon code', () => {
  expect(calculateDiscount(100, 'FAKE')).toBe(100);
});

it('throws for expired coupon with expiry date in error', () => {
  expect(() => calculateDiscount(100, 'EXPIRED2025'))
    .toThrow(/expired on 2025-01-01/);
});

it('caps percentage discounts at 90% to prevent negative prices', () => {
  // Catches bug: 100% off coupon caused negative prices
  expect(calculateDiscount(100, 'EVERYTHING_FREE')).toBe(10);
});
```
Coverage: 100% | Mutation score: 78% | Bugs caught: 3 production issues
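For comparison, here is a `calculateDiscount` sketch that satisfies all three behaviors. The coupon table and cap value are invented for illustration; the real implementation is not shown in the article.

```typescript
// Hypothetical coupon table; real code would load this from a store.
const coupons: Record<string, { percentOff: number; expires?: string }> = {
  SAVE10: { percentOff: 10 },
  EXPIRED2025: { percentOff: 20, expires: '2025-01-01' },
  EVERYTHING_FREE: { percentOff: 100 },
};

function calculateDiscount(price: number, code: string): number {
  const coupon = coupons[code];
  if (!coupon) return price; // invalid coupon: full price, no error

  // Assumes the current date is past the expiry for the EXPIRED2025 example.
  if (coupon.expires && new Date(coupon.expires) < new Date()) {
    throw new Error(`Coupon ${code} expired on ${coupon.expires}`);
  }

  // Cap at 90% off so a misconfigured coupon can never yield a negative price.
  const percent = Math.min(coupon.percentOff, 90);
  return price - (price * percent) / 100;
}
```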
What You Learned
- Coverage measures execution, not correctness
- Ask AI for scenarios before code
- Mutation testing validates your test quality
Limitation: High mutation scores take time. Focus on critical paths first (payments, auth, data writes).
When NOT to use this:
- Simple getters/setters (skip them in coverage)
- Third-party library wrappers (integration tests better)
- UI component snapshots (different testing strategy)
Language-Specific Quick Starts
Python (pytest + mutmut)
```bash
# Coverage
pytest --cov=src --cov-report=html

# Mutation testing
pip install mutmut
mutmut run
mutmut results  # See which mutants survived
```
Go (go test + go-mutesting)
```bash
# Coverage
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out

# Mutation testing
go install github.com/zimmski/go-mutesting/cmd/go-mutesting@latest
go-mutesting ./...
```
Rust (cargo tarpaulin + cargo-mutants)
```bash
# Coverage
cargo tarpaulin --out Html

# Mutation testing
cargo install cargo-mutants
cargo mutants
```
AI Prompting Cheat Sheet
❌ Bad prompt: "Write unit tests for this code"
✅ Good prompt: "List 5 scenarios where this payment function could lose money or corrupt data. Include edge cases around currency conversion and timeouts. No code yet."
Then:
"Write a vitest test for scenario #2 (duplicate transaction IDs). The test should verify we return the original receipt without charging twice, and that we log the duplicate attempt."
Why: Specificity forces AI to think about behavior, not just syntax.
Tested with Claude Code 1.0, GPT-4, TypeScript 5.5, Python 3.12, Go 1.23, Rust 1.75