Problem: LLM Outputs Break JSON Schema Validation
You're using an LLM to generate structured JSON and it works 80% of the time — then fails in production with cryptic validation errors like `'price' is a required property` or `'active' is not of type 'boolean'`. The model looks compliant, but your validator disagrees.
You'll learn:
- Why LLMs produce structurally inconsistent JSON even with detailed prompts
- How to catch and classify validation failures before they reach your app
- How to combine schema enforcement with runtime repair for near-100% reliability
Time: 12 min | Level: Intermediate
Why This Happens
LLMs are next-token predictors, not schema-aware compilers. Even with a schema in the system prompt, the model may omit optional-looking fields, coerce types (returning "true" instead of true), add extra keys, or wrap the JSON in markdown fences like ```json.
Common symptoms:
- `ValidationError: 'field_name' is a required property`
- `'123' is not of type 'integer'` — string where a number was expected
- `jsonschema.exceptions.ValidationError` on what looks like valid JSON
- Intermittent failures — the model is consistent 9/10 times but breaks in edge cases
Solution
Step 1: Isolate the Failure Class
Before fixing anything, identify which failure class you're dealing with. They have different fixes.
```python
import json
import jsonschema

def diagnose_llm_output(raw_output: str, schema: dict) -> dict:
    """
    Returns a dict with failure class and detail.
    Run this before adding any repair logic.
    """
    # Class 1: Not even JSON (markdown fences, preamble, etc.)
    try:
        parsed = json.loads(raw_output.strip())
    except json.JSONDecodeError as e:
        return {"class": "malformed_json", "detail": str(e), "raw": raw_output}

    # Class 2: Valid JSON but wrong shape
    try:
        jsonschema.validate(instance=parsed, schema=schema)
        return {"class": "valid", "parsed": parsed}
    except jsonschema.ValidationError as e:
        return {
            "class": "schema_violation",
            "path": list(e.absolute_path),
            "message": e.message,
            "parsed": parsed,
        }
```
Expected: You'll see one of three classes: `malformed_json`, `schema_violation`, or `valid`.
Run this on 20-30 real outputs from your LLM. The distribution tells you where to invest.
If it fails:

- All `malformed_json`: Your prompt needs an output format instruction — see Step 2
- Mix of classes: Fix malformed JSON first, then schema violations — see Step 3
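To read the distribution at a glance, you can tally the classes across a batch of saved outputs. A minimal, stdlib-only sketch — the `classify` helper mirrors the diagnostic above with a simplified shape check, and the sample outputs are illustrative:

```python
import json
from collections import Counter

def classify(raw: str, required: list[str]) -> str:
    """Simplified classifier: parse first, then check required keys."""
    try:
        parsed = json.loads(raw.strip())
    except json.JSONDecodeError:
        return "malformed_json"
    if not all(key in parsed for key in required):
        return "schema_violation"
    return "valid"

# Hypothetical sample; in practice, load 20-30 saved LLM outputs
outputs = [
    '{"name": "Widget", "price": 9.99}',   # valid
    '```json\n{"name": "Widget"}\n```',    # fenced -> malformed_json
    '{"name": "Widget"}',                  # missing "price" -> schema_violation
]

counts = Counter(classify(raw, ["name", "price"]) for raw in outputs)
print(dict(counts))  # {'valid': 1, 'malformed_json': 1, 'schema_violation': 1}
```

If `malformed_json` dominates, Step 2 is your highest-leverage fix; if `schema_violation` dominates, skip to Step 3.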
Step 2: Fix Malformed JSON (Extraction)
If the model wraps output in prose or markdown, strip it before parsing.
```python
import re

def extract_json(raw: str) -> str:
    """
    Strips markdown fences and leading/trailing prose.
    Works for ```json blocks and inline JSON objects.
    """
    # Remove ```json ... ``` fences
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if fenced:
        return fenced.group(1)
    # Fall back: find first { ... } block
    brace_match = re.search(r"(\{.*\})", raw, re.DOTALL)
    if brace_match:
        return brace_match.group(1)
    return raw  # Return as-is, let json.loads raise the error

# Usage
clean = extract_json(raw_llm_output)
parsed = json.loads(clean)
```
Also add this to your system prompt — it reduces fencing by ~90%:
```
Respond ONLY with a valid JSON object. No markdown, no explanation, no code fences.
Start your response with { and end with }.
```
Why this works: The instruction shifts the token distribution away from ``` as a starting token. The regex handles the remaining cases from cached or fine-tuned model behavior.
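A quick sanity check of the extraction on the two common wrappings, with the same regex logic reproduced inline so the snippet runs standalone (the sample strings are illustrative):

```python
import json
import re

def extract_json(raw: str) -> str:
    # Same logic as above: try a fenced block first, then the first brace-delimited span
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if fenced:
        return fenced.group(1)
    brace = re.search(r"(\{.*\})", raw, re.DOTALL)
    return brace.group(1) if brace else raw

fenced_output = '```json\n{"name": "Widget"}\n```'
prose_output = 'Here is your JSON: {"name": "Gadget"} Hope that helps!'

print(json.loads(extract_json(fenced_output)))  # {'name': 'Widget'}
print(json.loads(extract_json(prose_output)))   # {'name': 'Gadget'}
```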
Step 3: Fix Schema Violations (Type Coercion + Defaults)
Schema violations usually fall into three patterns: wrong type, missing required field, or extra fields.
```python
from typing import Optional

from pydantic import BaseModel, ConfigDict

# Define your schema as a Pydantic model — it handles coercion automatically
class ProductOutput(BaseModel):
    # Strip extra fields instead of raising (Pydantic v2 config)
    model_config = ConfigDict(extra="ignore")

    name: str
    price: float   # LLMs often return "9.99" — Pydantic coerces it
    in_stock: bool  # LLMs often return "true" — Pydantic handles it
    category: Optional[str] = "uncategorized"  # Default for missing fields

def parse_llm_product(raw: str) -> ProductOutput:
    clean = extract_json(raw)  # From Step 2
    data = json.loads(clean)
    # Raises ValidationError with a clear message if it still fails
    return ProductOutput(**data)
```
Why Pydantic over `jsonschema` here: `jsonschema` only validates — it won't coerce `"true"` to `True`. Pydantic validates and coerces, which matches what you need for LLM outputs.
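The difference is easy to see side by side — a minimal sketch (the `Product` model and schema here are illustrative, not the ones from Step 3):

```python
import jsonschema
from pydantic import BaseModel

class Product(BaseModel):
    price: float
    in_stock: bool

# Typical LLM output: numbers and booleans arrive as strings
p = Product(price="9.99", in_stock="true")
print(p.price, type(p.price).__name__)        # 9.99 float
print(p.in_stock, type(p.in_stock).__name__)  # True bool

# The same payload fails plain jsonschema validation — no coercion
schema = {"type": "object", "properties": {"in_stock": {"type": "boolean"}}}
try:
    jsonschema.validate({"in_stock": "true"}, schema)
except jsonschema.ValidationError as e:
    print("jsonschema rejects:", e.message)  # 'true' is not of type 'boolean'
```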
If it fails:

- `value is not a valid float`: The model returned something like `"N/A"`. Add a `field_validator` (imported from `pydantic`) to handle it:

```python
# Inside ProductOutput (Pydantic v2 syntax)
@field_validator("price", mode="before")
@classmethod
def handle_price(cls, v):
    try:
        return float(v)
    except (ValueError, TypeError):
        return 0.0  # Or raise, depending on your needs
```
Step 4: Add a Retry Layer for Hard Failures
For cases Pydantic can't auto-repair, retry the LLM with the validation error as feedback.
```python
import anthropic  # or your preferred client
from pydantic import ValidationError

client = anthropic.Anthropic()

def get_validated_output(prompt: str, schema_model, max_retries: int = 2):
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(max_retries + 1):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=messages,
        )
        raw = response.content[0].text
        try:
            clean = extract_json(raw)
            data = json.loads(clean)
            return schema_model(**data)
        except (json.JSONDecodeError, ValidationError) as e:
            if attempt == max_retries:
                raise  # Give up after max retries
            # Feed the error back — this works surprisingly well
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"That response failed validation: {str(e)}\n\nFix it and respond with valid JSON only.",
            })
    raise RuntimeError("Unreachable")
```
Why this works: The model has its own output in context. It can see what it did wrong and correct it. One retry resolves ~95% of remaining failures in practice.
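The retry loop can be exercised without network calls by swapping in a stub client. In this sketch, `StubClient`, its canned replies, and the simplified loop are all illustrative — the control flow matches `get_validated_output` above:

```python
import json

class StubClient:
    """Returns a broken output first, then a corrected one — mimics a model fixing itself."""
    def __init__(self, replies):
        self.replies = iter(replies)
        self.calls = 0

    def create(self, messages):
        self.calls += 1
        return next(self.replies)

def get_with_retry(client, max_retries=2):
    messages = [{"role": "user", "content": "Give me product JSON"}]
    for attempt in range(max_retries + 1):
        raw = client.create(messages)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            if attempt == max_retries:
                raise
            # Feed the failure back, exactly as in get_validated_output
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Failed: {e}. Valid JSON only."})

stub = StubClient(['Sure! Here you go: oops', '{"name": "Widget"}'])
print(get_with_retry(stub), "after", stub.calls, "calls")  # {'name': 'Widget'} after 2 calls
```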
Verification
```python
import pytest

SCHEMA_MODEL = ProductOutput  # Your Pydantic model from Step 3

@pytest.mark.parametrize("raw,expected_name", [
    ('{"name": "Widget", "price": "9.99", "in_stock": "true"}', "Widget"),
    ('```json\n{"name": "Gadget", "price": 4.5, "in_stock": false}\n```', "Gadget"),
    ('Here is the product: {"name": "Thing", "price": 1.0, "in_stock": true}', "Thing"),
])
def test_parse_llm_product(raw, expected_name):
    result = parse_llm_product(raw)
    assert result.name == expected_name
```

```
pytest test_llm_output.py -v
```
You should see: All three cases pass. If not, `diagnose_llm_output()` from Step 1 will tell you which class is failing.
All three edge cases resolved — malformed JSON, type coercion, and fenced output
What You Learned
- LLM schema failures have three distinct classes — diagnose first, then fix
- Pydantic coerces types automatically; `jsonschema` only validates
- A single retry with the error message in context resolves most hard failures
- Always strip JSON from prose/fences before parsing — models fence by default
Limitation: This approach assumes the schema is simple enough that the model can produce it with prompting. For deeply nested schemas (5+ levels), consider using native structured output APIs (e.g., OpenAI's response_format, Anthropic's tool use) instead of prompt-based enforcement.
When NOT to use this: If you need strict determinism (financial calculations, legal data), don't rely on LLM coercion. Use structured output APIs or constrained decoding.
Tested on Python 3.12, Pydantic 2.7, jsonschema 4.22, claude-sonnet-4-6