# Problem: LLMs Return Inconsistent JSON
You're building an AI feature and the LLM sometimes returns valid JSON, sometimes broken strings, and occasionally hallucinates fields that don't exist.
You'll learn:
- How Pydantic AI enforces schemas automatically
- Why this beats manual JSON parsing and validation
- How to handle retries when LLMs fail validation
Time: 12 min | Level: Intermediate
## Why This Happens
LLMs are probabilistic—they don't guarantee structure. Even with "JSON mode" enabled, models can:
- Return extra fields you didn't ask for
- Miss required fields
- Use wrong data types (`"42"` instead of `42`)
- Hallucinate nested objects
Common symptoms:
- `JSONDecodeError` in production
- Silent failures when optional fields are missing
- Type errors downstream (`int` expected, got `str`)
- Inconsistent behavior between API calls
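For contrast, the manual approach this replaces tends to look like the sketch below: every failure mode above needs its own hand-written check (`parse_review` and the sample payload are invented for illustration):

```python
import json

def parse_review(raw: str) -> dict:
    """Manually parse one LLM reply; every failure mode needs its own check."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if "rating" not in data:
        raise ValueError("missing required field: rating")
    rating = data["rating"]
    if isinstance(rating, str):  # model returned "42" instead of 42
        rating = int(rating)
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    data["rating"] = rating
    return data

# The string rating slips past json.loads and needs a manual coercion step
print(parse_review('{"rating": "4", "sentiment": "positive"}')["rating"])  # 4
```

Multiply this by every field and every schema in your codebase, and the appeal of declarative validation becomes obvious.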
## Solution
### Step 1: Install Pydantic AI
```bash
# Requires Python 3.9+ (install into a virtual environment rather than the system Python)
pip install 'pydantic-ai[openai]'

# Or for Anthropic (other providers have their own extras)
pip install 'pydantic-ai[anthropic]'
```
Expected: Installation completes without errors. Check the version:

```bash
python -c "import pydantic_ai; print(pydantic_ai.__version__)"
```
### Step 2: Define Your Schema
```python
from pydantic import BaseModel, Field

class ProductReview(BaseModel):
    """Structured product review extraction"""

    sentiment: str = Field(
        description="positive, negative, or neutral",
        pattern="^(positive|negative|neutral)$",  # Enforce exact values
    )
    rating: int = Field(ge=1, le=5)  # Between 1 and 5
    key_points: list[str] = Field(
        min_length=1,
        max_length=5,
        description="Main points from review",
    )
    would_recommend: bool
    # Optional field with default
    price_mentioned: float | None = None
```
Why this works: Pydantic validates:
- Field types (int, str, bool)
- Constraints (min/max, regex patterns)
- Required vs optional fields
- Nested structures automatically
### Step 3: Create Agent with Schema
```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel

# Agent automatically uses ProductReview as the return type
agent = Agent(
    model=OpenAIModel('gpt-4o-mini'),
    result_type=ProductReview,  # This enforces the schema
    retries=2,  # Auto-retry if validation fails
)

# Simple usage - returns a validated ProductReview object
result = agent.run_sync(
    "Review: This laptop is amazing! Fast, great screen. "
    "Worth the $1299. Highly recommend. 5/5"
)
print(result.data.sentiment)   # "positive" (guaranteed string)
print(result.data.rating)      # 5 (guaranteed int 1-5)
print(result.data.key_points)  # ["fast", "great screen", ...] (guaranteed list)
```
Expected: Returns a fully validated ProductReview object. No manual parsing needed.
If it fails:
- Error "Exceeded maximum retries": the LLM couldn't produce output matching the schema after 2 attempts. Check that your prompt is clear.
- Error "API key not found": set the `OPENAI_API_KEY` environment variable.
### Step 4: Handle Validation Errors
```python
from pydantic import ValidationError
# UnexpectedModelBehavior is raised when retries are exhausted;
# ModelRetry is what tools/validators raise to *request* a retry
from pydantic_ai.exceptions import UnexpectedModelBehavior

async def extract_review(text: str) -> ProductReview | None:
    """Safe extraction with error handling"""
    try:
        result = await agent.run(text)
        return result.data
    except UnexpectedModelBehavior as e:
        # LLM failed validation after all retries
        print(f"Model couldn't generate valid schema: {e}")
        return None
    except ValidationError as e:
        # Should rarely happen (the agent retries automatically)
        print(f"Validation error: {e}")
        return None
```
Why this matters: Pydantic AI retries automatically when validation fails, sending the error back to the LLM to fix itself.
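The mechanism can be sketched in plain Python (an illustrative toy, not the library's internals; `retry_until_valid` and the fake model are invented names): the validation error message is appended to the prompt so the next attempt can correct it.

```python
from pydantic import BaseModel, Field, ValidationError

class Review(BaseModel):
    rating: int = Field(ge=1, le=5)

def retry_until_valid(model_fn, prompt: str, retries: int = 2) -> Review:
    """Toy retry loop: feed the validation error back into the prompt
    so the model can fix its own mistake on the next attempt."""
    for _attempt in range(retries + 1):
        raw = model_fn(prompt)
        try:
            return Review.model_validate(raw)
        except ValidationError as exc:
            prompt += f"\nYour last answer was invalid: {exc.errors()[0]['msg']}"
    raise RuntimeError("retry limit exceeded")

# Fake model: returns an out-of-range rating once, then a corrected one
answers = iter([{"rating": 9}, {"rating": 5}])
print(retry_until_valid(lambda prompt: next(answers), "Rate this").rating)  # 5
```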
### Step 5: Advanced - Multiple Schemas
```python
from typing import Literal

class PositiveReview(BaseModel):
    sentiment: Literal["positive"]
    rating: int = Field(ge=4, le=5)
    highlights: list[str]

class NegativeReview(BaseModel):
    sentiment: Literal["negative"]
    rating: int = Field(ge=1, le=2)
    complaints: list[str]

# Union type - the LLM picks the right one
agent = Agent(
    model=OpenAIModel('gpt-4o-mini'),
    result_type=PositiveReview | NegativeReview,
)

result = agent.run_sync("This product is terrible")
if isinstance(result.data, NegativeReview):
    print(result.data.complaints)  # Type-safe access
```
Why this works: Pydantic AI uses function calling under the hood to force the LLM to pick one schema variant.
## Verification
Test it:
```python
# Test script (uses the ProductReview agent from Step 3)
test_reviews = [
    "Great product! 5 stars",
    "Broke after 2 days. Waste of money.",
    "It's okay, nothing special. $50 seemed fair.",
]

for review in test_reviews:
    result = agent.run_sync(review)
    assert isinstance(result.data, ProductReview)
    assert 1 <= result.data.rating <= 5
    assert result.data.sentiment in ["positive", "negative", "neutral"]
    print(f"✓ Valid: {result.data.rating}/5 - {result.data.sentiment}")
```
You should see: All assertions pass, no JSON parsing errors.
## What You Learned
- Pydantic AI enforces schemas using LLM function calling
- Automatic retries when validation fails (LLM fixes itself)
- Type-safe outputs—no manual JSON parsing
- Works with OpenAI, Anthropic, Google, local models
Limitations:
- Requires models that support function calling (GPT-3.5+, Claude 3+, Gemini 1.5+)
- Retries use extra tokens (monitor costs)
- Complex nested schemas may confuse smaller models
When NOT to use this:
- Simple text generation (no structure needed)
- When you need the raw LLM response
- Streaming raw text to users (structured output is only fully validated once the response completes)
## Bonus: Streaming with Validation
```python
# Stream partial results while building the validated object.
# Note: stream_text() is for plain-text results; with a structured
# result_type, use stream() to get partially validated objects.
async with agent.run_stream("Review this product...") as stream:
    async for partial in stream.stream():
        print(partial)  # Partially validated ProductReview as it arrives

    # Final result is still validated
    result = await stream.get_data()
    assert isinstance(result, ProductReview)
```
Use case: Show progress to users while guaranteeing valid output.
## Real-World Example: Content Moderation
```python
from enum import Enum

class ModerationDecision(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REVIEW = "needs_review"

class ModerationResult(BaseModel):
    decision: ModerationDecision
    confidence: float = Field(ge=0.0, le=1.0)
    reason: str = Field(min_length=10)
    flagged_content: list[str] = Field(default_factory=list)

moderator = Agent(
    model=OpenAIModel('gpt-4o-mini'),
    result_type=ModerationResult,
    retries=3,
    system_prompt=(
        "You are a content moderator. Reject harmful content, "
        "flag suspicious content for review, approve safe content."
    ),
)

# Production usage (inside an async handler)
content = "User submitted comment here..."
result = await moderator.run(content)

if result.data.decision == ModerationDecision.REJECT:
    # Guaranteed to have a valid reason (min 10 chars)
    log_rejection(result.data.reason)
elif result.data.confidence < 0.8:
    # Send to human review
    queue_for_review(content, result.data)
```
Why this beats manual parsing: You get autocomplete, type checking, and guaranteed valid enums. No `if "decision" in response` checks needed.
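The enum guarantee is easy to demonstrate in isolation (a small sketch with an invented payload): the raw string comes back as a real enum member, so typos can't slip through comparisons.

```python
from enum import Enum
from pydantic import BaseModel, Field

class ModerationDecision(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REVIEW = "needs_review"

class ModerationResult(BaseModel):
    decision: ModerationDecision
    confidence: float = Field(ge=0.0, le=1.0)
    reason: str = Field(min_length=10)

# The raw string "reject" is coerced into the enum member on validation
r = ModerationResult.model_validate(
    {"decision": "reject", "confidence": 0.95, "reason": "contains spam links"}
)
print(r.decision is ModerationDecision.REJECT)  # True
```

Because `ModerationDecision` subclasses `str`, the member still compares equal to `"reject"`, so existing string-based code keeps working during a migration.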
Tested on Pydantic AI 0.0.14, Python 3.12, OpenAI GPT-4o-mini, macOS & Ubuntu