Migrating a Python Codebase to Full Type Safety with AI: mypy Strict Mode

Step-by-step guide to using AI agents to add type hints to an existing Python project and reach mypy strict mode compliance — without breaking production.

Running mypy on your existing codebase returns 847 errors. AI can fix 70% of them in one pass — here's the order that doesn't break production.

You know you should type-hint your Python. The stats are clear: adoption of type hints in Python projects exploded from 48% to 71% between 2022 and 2025 (JetBrains). It’s not just a trend; it’s the new baseline for maintainable code. But the gap between sprinkling a few : str annotations and achieving full type safety is a chasm filled with Any, None, and the existential dread of List[Dict[str, Optional[Union[int, float]]]]. Turning on mypy --strict feels like kicking over a hornet's nest. The secret isn't to swat each error by hand. It's to use AI as your automated swarm control, directing the firepower while you handle the strategic strikes.

Why mypy Strict Mode Is the Professional's Choice

Let's be clear: partial typing is like a partial parachute. It gives you a false sense of security. mypy in its default, lenient mode lets Any types propagate silently, turning your beautifully hinted function arguments into untyped sludge by the third line. Strict mode (--strict or strict = true in pyproject.toml) removes the training wheels. It enforces:

  • No implicit Any (every type must be known or explicitly declared).
  • Checking all generic types (not just the outermost container).
  • Ensuring functions have return type annotations.
  • Verifying that decorated functions have matching signatures.
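
To make that checklist concrete, here's a minimal sketch: the first function fails the first and third checks above, while its annotated twin satisfies them.

```python
# Under mypy --strict, this triggers "Function is missing a type annotation"
# (its parameters and return value are all implicit Any):
def scale(values, factor):
    return [v * factor for v in values]

# The strict-compliant version makes every type explicit, including the
# element type inside the generic container:
def scale_typed(values: list[float], factor: float) -> list[float]:
    return [v * factor for v in values]
```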

The payoff isn't theoretical. In a mature codebase, enabling strict mode typically surfaces dozens of latent bugs: operations on potentially None values, mismatched sequence types passed to functions expecting specific iterables, and API calls with swapped argument order that just happened to work because everything was a string. It transforms runtime TypeError: 'NoneType' object is not subscriptable surprises into clear, immediate feedback in your editor. The pain is upfront; the benefit is catching the production bug before you write the commit message.
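
The None-guard pattern that strict mode forces on you is worth seeing once in miniature (a hypothetical example; the function names are illustrative):

```python
import re
from typing import Optional

def extract_user_id(header: str) -> Optional[str]:
    """Return the user id from a header like 'user=alice', or None if absent."""
    match = re.match(r"user=(\w+)", header)
    return match.group(1) if match else None

def greet(header: str) -> str:
    user_id = extract_user_id(header)
    # Without this guard, mypy --strict rejects the concatenation below
    # ("str" and "None") — the same bug that would have been a runtime
    # TypeError in production.
    if user_id is None:
        return "Hello, anonymous"
    return "Hello, " + user_id
```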

Phase 1: Let AI Annotate the Obvious Signatures

Your first move is not to open a single .py file. It's to configure your AI assistant. In VS Code (Ctrl+Shift+P to open the command palette), ensure you have GitHub Copilot, Continue.dev, or Codeium active. These tools are trained on millions of typed Python functions and can infer types from context, default values, and docstrings with shocking accuracy.

Start with the lowest-hanging fruit: function and method signatures that lack annotations entirely. Create a script or use your AI's chat panel with a directive like: "Add complete type annotations, including return types, to every function below. Infer types from default values, docstrings, and call sites, and add any imports the annotations require."

def process_data(items, threshold=0.5):
    """Filters items based on score exceeding threshold."""
    return [item for item in items if item['score'] > threshold]

def connect_to_db(host, port=5432):
    engine = create_engine(f"postgresql://{host}:{port}/db")
    return engine

# Example AI Output:
from typing import Any, List, Dict
from sqlalchemy.engine import Engine
from sqlalchemy import create_engine

def process_data(items: List[Dict[str, Any]], threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Filters items based on score exceeding threshold."""
    return [item for item in items if item['score'] > threshold]

def connect_to_db(host: str, port: int = 5432) -> Engine:
    engine = create_engine(f"postgresql://{host}:{port}/db")
    return engine

Run this pass across your entire codebase, focusing on one module at a time. Use ruff to manage the formatting afterwards (ruff format .). This single step can eliminate 40-50% of your mypy errors. The key is to review, not blindly accept. The AI gets it right most of the time, but you are the final arbiter.

Taming the Complex: Unions, Optionals, and Generics

Now you hit the hard cases. AI starts to hesitate, and you need to apply actual type theory. The most common patterns:

  1. Optional[T]: A value that can be T or None. The fix for an error like Argument 1 to "process" has incompatible type "None"; expected "str" is almost always to change arg: str to arg: Optional[str] and add a None guard.

    • Real Error Fix: TypeError: 'NoneType' object is not subscriptable
    • Exact Fix: Add a None guard before indexing: if data is not None: value = data["key"]
  2. Union[T, U]: Used when a function accepts multiple, distinct types (e.g., str or int). Modern Python 3.10+ syntax T | U is cleaner.

  3. Generics (list[T], dict[K, V]): Specify what's inside your containers. Changing List to List[str] is where strict mode earns its keep, preventing you from accidentally mixing types.

For complex, project-specific types, define a TypeAlias. This is invaluable for data processing pipelines.

# Instead of this mess haunting every signature:
def analyze(df: pd.DataFrame, config: Dict[str, Union[str, int, List[float], None]]) -> Tuple[pd.Series, Optional[Exception]]:

# Define a TypeAlias at the top of your module or a types.py file
from typing import TypeAlias, Union
import pandas as pd

AnalysisConfig: TypeAlias = dict[str, Union[str, int, list[float], None]]
AnalysisResult: TypeAlias = tuple[pd.Series, Union[Exception, None]]

def analyze(df: pd.DataFrame, config: AnalysisConfig) -> AnalysisResult:
    ...  # function logic

Use your AI assistant with a prompt like: "Refactor the following complex type hints into TypeAlias definitions for better readability." It will handle the boilerplate extraction.

Pyright vs mypy: Which AI Understands Better?

You're not locked into one tool. mypy is the established, rigorous champion. Pyright (which powers Pylance in VS Code) is the faster, more developer-friendly contender. Here’s the breakdown:

| Tool | Speed (Typical Project) | Strengths | Weaknesses | Best for AI Pairing |
|---|---|---|---|---|
| mypy | Slower; caching helps | Extreme rigor, vast ecosystem of plugin stubs, definitive for strict mode | Can be pedantic; slower feedback in editor | Final gatekeeper. Use AI to fix the errors it reports. |
| Pyright | Very fast (~ruff-speed) | Excellent inference, fantastic editor integration (instant squiggles), better with complex expressions | Slightly less strict by default on some generic rules | Interactive refactoring. Its speed lets you and your AI iterate in real time. |

The winning strategy: Use Pyright in your editor for live feedback (it's that fast, akin to ruff linting 1M lines in 0.29s vs flake8's 16s). Then, use mypy with strict mode in your CI pipeline as the final, unforgiving arbiter. Configure both in your pyproject.toml:

[tool.mypy]
strict = true
python_version = "3.10"
warn_unused_configs = true

[tool.pyright]
typeCheckingMode = "strict"
pythonVersion = "3.10"

Ask your AI: "Explain the difference between this Pyright error and this mypy error for the same line of code." It's excellent at translating between the two linters' dialects.

Migrating Third-Party Library Stubs with AI Help

Your code is now clean, but mypy screams about sqlalchemy, pandas, or fastapi. These libraries have complex, dynamic behaviors that are hard to type. The solution is type stubs.

  1. Install official stubs or types-* packages: uv add --dev sqlalchemy2-stubs or uv add --dev types-requests.
  2. For libraries without stubs (or partial stubs), use AI to draft your own. Create a stubs/ directory and add it to your MYPYPATH.

Example: You have a legacy internal utility lib/legacy_helper.py that's untyped. Instead of refactoring it immediately, you can create stubs/legacy_helper.pyi:

# AI Prompt: Generate a .pyi type stub interface for the following module. Focus on public functions and classes.

# legacy_helper.py (original, messy)
def get_config(key):
    # ... complex logic returning str, dict, or None
    pass

# AI-generated stub: stubs/legacy_helper.pyi
from typing import Any, Optional, Union

def get_config(key: str) -> Union[str, dict[str, Any], None]: ...

This silences mypy for that external dependency, allowing you to progress with your core migration. You can then gradually replace the stubbed library with a typed alternative.
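
If you'd rather keep the stub search path in version control than in the MYPYPATH environment variable, mypy also accepts it as a config setting:

```toml
[tool.mypy]
strict = true
mypy_path = "stubs"  # mypy looks here for .pyi files before the real modules
```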

Benchmark: The Tangible Impact on Code Safety

What do you gain from this ordeal? Let's quantify it with a pre- and post-migration analysis of a typical API module.

Scenario: A FastAPI endpoint (/data/{id}) fetching and processing user data. FastAPI, used by 42% of new Python API projects (JetBrains 2025), integrates deeply with type hints for validation and OpenAPI generation.

| Metric | Before Strict Typing | After Strict Typing + AI Refactor | Tool That Caught It |
|---|---|---|---|
| Potential runtime TypeErrors | 3 (e.g., id as str vs int, missing None guard) | 0 | mypy (strict) |
| Incorrect API response shape | 1 (dict missing expected key) | 0 | Pydantic model validation |
| None-related bugs | 2 | 0 | Pyright (editor squiggles) |
| Code completion accuracy | Low/generic | High/specific | VS Code Pylance |
| Refactor confidence | Low (fear of breaking hidden contracts) | High (types define the contract) | N/A |

The table shows the shift from hoping it works to knowing it works. The types become a live, machine-verifiable specification.
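
The "types define the contract" row deserves one concrete illustration. A TypedDict (a minimal sketch; the field names are illustrative) turns a response shape into something mypy can verify at every construction site:

```python
from typing import TypedDict

class UserResponse(TypedDict):
    id: int
    name: str
    email: str

def build_response(user_id: int, name: str, email: str) -> UserResponse:
    # Omitting a key (or adding an unexpected one) here is a mypy error,
    # not a production incident discovered by a client.
    return {"id": user_id, "name": name, "email": email}
```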

Enforcing the New Standard: CI and Pre-commit

Your local environment is clean. To keep it that way and ensure team-wide adoption, automate enforcement.

  1. Pre-commit Hooks (Local Guard): Use pre-commit to run checks before every commit.

    # .pre-commit-config.yaml
    repos:
      - repo: https://github.com/astral-sh/ruff-pre-commit
        rev: v0.4.4
        hooks: [ { id: ruff, args: ['--fix'] }, { id: ruff-format } ]
      - repo: https://github.com/pre-commit/mirrors-mypy
        rev: v1.10.0
        hooks: [ { id: mypy } ]
    

    Run pre-commit install. Now, git commit triggers ruff and mypy. If they fail, the commit is blocked.

  2. GitHub Actions (CI Gate): This is your final, non-negotiable check. The workflow should install dependencies with a fast tool like uv (which is 10–100x faster than pip for cold installs) and run the full test suite against strictly typed code.

    # .github/workflows/ci.yml
    - name: Install with uv
      run: uv sync --all-extras --dev
    - name: Type Check
      run: uv run mypy --strict src/
    - name: Test
      run: uv run pytest tests/  # pytest is used by 84% of Python devs (Python Dev Survey 2025)
    

Next Steps: Living in a Typed World

You've migrated. The CI is green. The 847 errors are a memory. What now?

First, leverage your investment. Use pytest with type-aware plugins like pytest-mypy or property-based testing with Hypothesis, which can use your type hints to generate more intelligent test cases. Refactor with confidence: use your IDE's "Rename Symbol" (F2) or "Extract Method" knowing that type errors will immediately flag any missteps.

Second, tackle technical debt incrementally. Enable specific strictness flags one by one in pyproject.toml (e.g., warn_unused_ignores, warn_redundant_casts). This will surface the next layer of minor issues without overwhelming you.
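
For modules that aren't yet under strict = true, mypy's per-module overrides let you ratchet individual flags for just those files; a sketch (the module name is illustrative):

```toml
[[tool.mypy.overrides]]
module = "legacy_helper.*"
# Ratchet these up one at a time as the module is cleaned:
warn_unused_ignores = true
warn_redundant_casts = true
disallow_untyped_defs = false  # still permitted here, for now
```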

Finally, write new code with types from line one. The muscle memory changes. You'll start writing the signature def transform(data: pd.DataFrame) -> pl.DataFrame: before the function body, designing the data flow upfront. Your AI assistant becomes more powerful in this context, suggesting correct methods because it understands the expected types.

The goal was never just to satisfy a linter. It was to transform your codebase from a dynamic script collection into a statically verifiable, self-documenting system. With AI handling the bulk of the annotation grunt work and you steering through the complex architectural decisions, you've not only fixed those 847 errors—you've built a foundation that prevents the next 847 from ever being written. Your production server, and your future self debugging at 2 AM, will thank you.