# Problem: LLMs Return Inconsistent JSON
You're building an AI feature and the LLM sometimes returns valid JSON, sometimes broken strings, and occasionally hallucinates fields that don't exist.
You'll learn:
- How Pydantic AI enforces schemas automatically
- Why this beats manual JSON parsing and validation
- How to handle retries when LLMs fail validation
Time: 12 min | Level: Intermediate
## Why This Happens
LLMs are probabilistic—they don't guarantee structure. Even with "JSON mode" enabled, models can:
- Return extra fields you didn't ask for
- Miss required fields
- Use wrong data types (`"42"` instead of `42`)
- Hallucinate nested objects
Common symptoms:
- `JSONDecodeError` in production
- Silent failures when optional fields are missing
- Type errors downstream (`int` expected, got `str`)
- Inconsistent behavior between API calls
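For contrast, the manual approach this replaces tends to look like the sketch below: every failure mode above needs its own hand-written check (`parse_review` and the sample payload are invented for illustration):

```python
import json

def parse_review(raw: str) -> dict:
    """Manually parse one LLM reply; every failure mode needs its own check."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if "rating" not in data:
        raise ValueError("missing required field: rating")
    rating = data["rating"]
    if isinstance(rating, str):  # model returned "42" instead of 42
        rating = int(rating)
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    data["rating"] = rating
    return data

# The string rating slips past json.loads and needs a manual coercion step
print(parse_review('{"rating": "4", "sentiment": "positive"}')["rating"])  # 4
```

Multiply this by every field and every schema in your codebase, and the appeal of declarative validation becomes obvious.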
## Solution
### Step 1: Install Pydantic AI
```bash
# Requires Python 3.9+ (install into a virtual environment rather than the system Python)
pip install 'pydantic-ai[openai]'

# Or for Anthropic (other providers have their own extras)
pip install 'pydantic-ai[anthropic]'
```
Expected: Installation completes without errors. Check the version:

```bash
python -c "import pydantic_ai; print(pydantic_ai.__version__)"
```
### Step 2: Define Your Schema
```python
from pydantic import BaseModel, Field

class ProductReview(BaseModel):
    """Structured product review extraction"""

    sentiment: str = Field(
        description="positive, negative, or neutral",
        pattern="^(positive|negative|neutral)$",  # Enforce exact values
    )
    rating: int = Field(ge=1, le=5)  # Between 1 and 5
    key_points: list[str] = Field(
        min_length=1,
        max_length=5,
        description="Main points from review",
    )
    would_recommend: bool
    # Optional field with default
    price_mentioned: float | None = None
```
Why this works: Pydantic validates:
- Field types (int, str, bool)
- Constraints (min/max, regex patterns)
- Required vs optional fields
- Nested structures automatically
### Step 3: Create Agent with Schema
```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

# Agent automatically uses ProductReview as the return type
agent = Agent(
    model=OpenAIModel('gpt-4o-mini'),
    result_type=ProductReview,  # This enforces the schema
    retries=2,  # Auto-retry if validation fails
)

# Simple usage - returns a validated ProductReview object
result = agent.run_sync(
    "Review: This laptop is amazing! Fast, great screen. "
    "Worth the $1299. Highly recommend. 5/5"
)
print(result.data.sentiment)   # "positive" (guaranteed string)
print(result.data.rating)      # 5 (guaranteed int 1-5)
print(result.data.key_points)  # ["fast", "great screen", ...] (guaranteed list)
```
Expected: Returns a fully validated ProductReview object. No manual parsing needed.
If it fails:
- Error "Exceeded maximum retries": the LLM couldn't produce output matching the schema after 2 attempts. Check that your prompt is clear.
- Error "API key not found": set the `OPENAI_API_KEY` environment variable.
### Step 4: Handle Validation Errors
```python
from pydantic import ValidationError
# UnexpectedModelBehavior is raised when retries are exhausted;
# ModelRetry is what tools/validators raise to *request* a retry
from pydantic_ai.exceptions import UnexpectedModelBehavior

async def extract_review(text: str) -> ProductReview | None:
    """Safe extraction with error handling"""
    try:
        result = await agent.run(text)
        return result.data
    except UnexpectedModelBehavior as e:
        # LLM failed validation after all retries
        print(f"Model couldn't generate valid schema: {e}")
        return None
    except ValidationError as e:
        # Should rarely happen (the agent retries automatically)
        print(f"Validation error: {e}")
        return None
```
Why this matters: Pydantic AI retries automatically when validation fails, sending the error back to the LLM to fix itself.
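The mechanism can be sketched in plain Python (an illustrative toy, not the library's internals; `retry_until_valid` and the fake model are invented names): the validation error message is appended to the prompt so the next attempt can correct it.

```python
from pydantic import BaseModel, Field, ValidationError

class Review(BaseModel):
    rating: int = Field(ge=1, le=5)

def retry_until_valid(model_fn, prompt: str, retries: int = 2) -> Review:
    """Toy retry loop: feed the validation error back into the prompt
    so the model can fix its own mistake on the next attempt."""
    for _attempt in range(retries + 1):
        raw = model_fn(prompt)
        try:
            return Review.model_validate(raw)
        except ValidationError as exc:
            prompt += f"\nYour last answer was invalid: {exc.errors()[0]['msg']}"
    raise RuntimeError("retry limit exceeded")

# Fake model: returns an out-of-range rating once, then a corrected one
answers = iter([{"rating": 9}, {"rating": 5}])
print(retry_until_valid(lambda prompt: next(answers), "Rate this").rating)  # 5
```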
### Step 5: Advanced - Multiple Schemas
```python
from typing import Literal

class PositiveReview(BaseModel):
    sentiment: Literal["positive"]
    rating: int = Field(ge=4, le=5)
    highlights: list[str]

class NegativeReview(BaseModel):
    sentiment: Literal["negative"]
    rating: int = Field(ge=1, le=2)
    complaints: list[str]

# Union type - the LLM picks the right one
agent = Agent(
    model=OpenAIModel('gpt-4o-mini'),
    result_type=PositiveReview | NegativeReview,
)

result = agent.run_sync("This product is terrible")
if isinstance(result.data, NegativeReview):
    print(result.data.complaints)  # Type-safe access
```
Why this works: Pydantic AI uses function calling under the hood to force the LLM to pick one schema variant.
## Verification
Test it:
```python
# Test script (uses the ProductReview agent from Step 3)
test_reviews = [
    "Great product! 5 stars",
    "Broke after 2 days. Waste of money.",
    "It's okay, nothing special. $50 seemed fair.",
]

for review in test_reviews:
    result = agent.run_sync(review)
    assert isinstance(result.data, ProductReview)
    assert 1 <= result.data.rating <= 5
    assert result.data.sentiment in ["positive", "negative", "neutral"]
    print(f"✓ Valid: {result.data.rating}/5 - {result.data.sentiment}")
```
You should see: All assertions pass, no JSON parsing errors.
## What You Learned
- Pydantic AI enforces schemas using LLM function calling
- Automatic retries when validation fails (LLM fixes itself)
- Type-safe outputs—no manual JSON parsing
- Works with OpenAI, Anthropic, Google, local models
Limitations:
- Requires models that support function calling (GPT-3.5+, Claude 3+, Gemini 1.5+)
- Retries use extra tokens (monitor costs)
- Complex nested schemas may confuse smaller models
When NOT to use this:
- Simple text generation (no structure needed)
- When you need the raw LLM response
- Streaming raw text to users (structured output is only fully validated once the response completes)
## Bonus: Streaming with Validation
```python
# Stream partial results while building the validated object.
# Note: stream_text() is for plain-text results; with a structured
# result_type, use stream() to get partially validated objects.
async with agent.run_stream("Review this product...") as stream:
    async for partial in stream.stream():
        print(partial)  # Partially validated ProductReview as it arrives

    # Final result is still validated
    result = await stream.get_data()
    assert isinstance(result, ProductReview)
```
Use case: Show progress to users while guaranteeing valid output.
## Real-World Example: Content Moderation
```python
from enum import Enum

class ModerationDecision(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REVIEW = "needs_review"

class ModerationResult(BaseModel):
    decision: ModerationDecision
    confidence: float = Field(ge=0.0, le=1.0)
    reason: str = Field(min_length=10)
    flagged_content: list[str] = Field(default_factory=list)

moderator = Agent(
    model=OpenAIModel('gpt-4o-mini'),
    result_type=ModerationResult,
    retries=3,
    system_prompt=(
        "You are a content moderator. Reject harmful content, "
        "flag suspicious content for review, approve safe content."
    ),
)

# Production usage (inside an async handler)
content = "User submitted comment here..."
result = await moderator.run(content)

if result.data.decision == ModerationDecision.REJECT:
    # Guaranteed to have a valid reason (min 10 chars)
    log_rejection(result.data.reason)
elif result.data.confidence < 0.8:
    # Send to human review
    queue_for_review(content, result.data)
```
Why this beats manual parsing: You get autocomplete, type checking, and guaranteed valid enums. No `if "decision" in response` checks needed.
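The enum guarantee is easy to demonstrate in isolation (a small sketch with an invented payload): the raw string comes back as a real enum member, so typos can't slip through comparisons.

```python
from enum import Enum
from pydantic import BaseModel, Field

class ModerationDecision(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REVIEW = "needs_review"

class ModerationResult(BaseModel):
    decision: ModerationDecision
    confidence: float = Field(ge=0.0, le=1.0)
    reason: str = Field(min_length=10)

# The raw string "reject" is coerced into the enum member on validation
r = ModerationResult.model_validate(
    {"decision": "reject", "confidence": 0.95, "reason": "contains spam links"}
)
print(r.decision is ModerationDecision.REJECT)  # True
```

Because `ModerationDecision` subclasses `str`, the member still compares equal to `"reject"`, so existing string-based code keeps working during a migration.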
Tested on Pydantic AI 0.0.14, Python 3.12, OpenAI GPT-4o-mini, macOS & Ubuntu