Fix JSON Schema Validation Failures in LLM Outputs in 12 Minutes

Debug and fix JSON Schema validation errors from LLM-generated outputs using Pydantic, jsonschema, and structured prompting techniques.

Problem: LLM Outputs Break JSON Schema Validation

You're using an LLM to generate structured JSON and it works 80% of the time — then fails in production with cryptic validation errors like 'price' is a required property or 'active' is not of type 'boolean'. The model looks compliant, but your validator disagrees.

You'll learn:

  • Why LLMs produce structurally inconsistent JSON even with detailed prompts
  • How to catch and classify validation failures before they reach your app
  • How to combine schema enforcement with runtime repair for near-100% reliability

Time: 12 min | Level: Intermediate


Why This Happens

LLMs are next-token predictors, not schema-aware compilers. Even with a schema in the system prompt, the model may omit optional-looking fields, coerce types (returning "true" instead of true), add extra keys, or wrap the JSON in markdown fences like ```json.

Common symptoms:

  • ValidationError: 'field_name' is a required property
  • '123' is not of type 'integer' — string where number expected
  • jsonschema.exceptions.ValidationError on what looks like valid JSON
  • Intermittent failures — model is consistent 9/10 times but breaks in edge cases
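The type-coercion symptom is easy to reproduce directly with jsonschema (a minimal self-contained sketch; the schema here is illustrative):

```python
import jsonschema

# The schema expects a real boolean...
schema = {
    "type": "object",
    "properties": {"active": {"type": "boolean"}},
    "required": ["active"],
}

# ...but the model returned the string "true" instead of the boolean true
try:
    jsonschema.validate(instance={"active": "true"}, schema=schema)
except jsonschema.ValidationError as e:
    print(e.message)  # 'true' is not of type 'boolean'
```

The JSON is syntactically fine; the schema is what rejects it. That distinction drives the diagnosis in Step 1.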

Solution

Step 1: Isolate the Failure Class

Before fixing anything, identify which failure class you're dealing with. They have different fixes.

import json
import jsonschema

def diagnose_llm_output(raw_output: str, schema: dict) -> dict:
    """
    Returns a dict with failure class and detail.
    Run this before adding any repair logic.
    """
    # Class 1: Not even JSON (markdown fences, preamble, etc.)
    try:
        parsed = json.loads(raw_output.strip())
    except json.JSONDecodeError as e:
        return {"class": "malformed_json", "detail": str(e), "raw": raw_output}

    # Class 2: Valid JSON but wrong shape
    try:
        jsonschema.validate(instance=parsed, schema=schema)
        return {"class": "valid", "parsed": parsed}
    except jsonschema.ValidationError as e:
        return {
            "class": "schema_violation",
            "path": list(e.absolute_path),
            "message": e.message,
            "parsed": parsed
        }

Expected: You'll see one of three classes: malformed_json, schema_violation, or valid.

Run this on 20-30 real outputs from your LLM. The distribution tells you where to invest.

If it fails:

  • All malformed_json: Your prompt needs output format instruction — see Step 2
  • Mix of classes: Fix malformed first, then schema violations — see Step 3
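Getting that distribution over a batch is a one-liner with Counter. The sketch below uses a condensed `diagnose` helper standing in for `diagnose_llm_output` above, and a hypothetical `samples` list in place of your logged outputs:

```python
import json
from collections import Counter

import jsonschema

def diagnose(raw_output: str, schema: dict) -> str:
    # Condensed version of diagnose_llm_output, returning just the class
    try:
        parsed = json.loads(raw_output.strip())
    except json.JSONDecodeError:
        return "malformed_json"
    try:
        jsonschema.validate(instance=parsed, schema=schema)
        return "valid"
    except jsonschema.ValidationError:
        return "schema_violation"

schema = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}},
}

# Hypothetical captured outputs; replace with real logged responses
samples = [
    '{"name": "ok"}',
    '```json\n{"name": "fenced"}\n```',
    '{"price": 1}',
]

print(Counter(diagnose(s, schema) for s in samples))
# e.g. Counter({'valid': 1, 'malformed_json': 1, 'schema_violation': 1})
```

If most of the mass sits in one class, you only need one of the fixes below.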

Step 2: Fix Malformed JSON (Extraction)

If the model wraps output in prose or markdown, strip it before parsing.

import re

def extract_json(raw: str) -> str:
    """
    Strips markdown fences and leading/trailing prose.
    Works for ```json blocks and inline JSON objects.
    """
    # Remove ```json ... ``` fences
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if fenced:
        return fenced.group(1)

    # Fall back: find first { ... } block
    brace_match = re.search(r"(\{.*\})", raw, re.DOTALL)
    if brace_match:
        return brace_match.group(1)

    return raw  # Return as-is, let json.loads raise the error

# Usage
clean = extract_json(raw_llm_output)
parsed = json.loads(clean)

Also add this to your system prompt — it reduces fencing by ~90%:

Respond ONLY with a valid JSON object. No markdown, no explanation, no code fences.
Start your response with { and end with }.

Why this works: The instruction shifts the token distribution away from ``` as a starting token. The regex handles the remaining cases where the model's fine-tuned formatting habits win out anyway.


Step 3: Fix Schema Violations (Type Coercion + Defaults)

Schema violations usually fall into three patterns: wrong type, missing required field, or extra fields.

from pydantic import BaseModel, ConfigDict
from typing import Optional

# Define your schema as a Pydantic model — it handles coercion automatically
class ProductOutput(BaseModel):
    # Pydantic v2 config: strip extra fields instead of raising
    model_config = ConfigDict(extra="ignore")

    name: str
    price: float           # LLMs often return "9.99" — Pydantic coerces it
    in_stock: bool         # LLMs often return "true" — Pydantic handles it
    category: Optional[str] = "uncategorized"  # Default for missing fields

def parse_llm_product(raw: str) -> ProductOutput:
    clean = extract_json(raw)            # From Step 2
    data = json.loads(clean)
    return ProductOutput(**data)         # Raises ValidationError with a clear message if it still fails

Why Pydantic over jsonschema here: jsonschema validates — it won't coerce "true" to True. Pydantic validates and coerces, which matches what you need for LLM outputs.
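The difference is easy to see side by side. The `Product` model below is a trimmed-down stand-in for `ProductOutput`, and the input mimics typical LLM output with stringly-typed values:

```python
import jsonschema
from pydantic import BaseModel, ConfigDict

data = {"name": "Widget", "price": "9.99", "in_stock": "true"}

# jsonschema: validation only, so the string values are rejected outright
json_schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
}
try:
    jsonschema.validate(instance=data, schema=json_schema)
except jsonschema.ValidationError as e:
    print("jsonschema:", e.message)  # e.g. '9.99' is not of type 'number'

# Pydantic: validation plus lax-mode coercion of the same payload
class Product(BaseModel):
    model_config = ConfigDict(extra="ignore")
    name: str
    price: float
    in_stock: bool

p = Product(**data)
print("pydantic:", p.price, p.in_stock)  # 9.99 True
```

Same input, one library rejects it, the other repairs it. That is exactly the gap LLM outputs fall into.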

If it fails:

  • value is not a valid float: The model returned something like "N/A". Add a validator inside the ProductOutput class (Pydantic v2 uses field_validator instead of the deprecated validator):

from pydantic import field_validator

    @field_validator("price", mode="before")
    @classmethod
    def handle_price(cls, v):
        try:
            return float(v)
        except (ValueError, TypeError):
            return 0.0  # Or raise, depending on your needs

Step 4: Add a Retry Layer for Hard Failures

For cases Pydantic can't auto-repair, retry the LLM with the validation error as feedback.

from pydantic import ValidationError
import anthropic  # or your preferred client

client = anthropic.Anthropic()

def get_validated_output(prompt: str, schema_model, max_retries: int = 2):
    messages = [{"role": "user", "content": prompt}]

    for attempt in range(max_retries + 1):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=messages
        )
        raw = response.content[0].text

        try:
            clean = extract_json(raw)
            data = json.loads(clean)
            return schema_model(**data)

        except (json.JSONDecodeError, ValidationError) as e:
            if attempt == max_retries:
                raise  # Give up after max retries

            # Feed the error back — this works surprisingly well
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"That response failed validation: {str(e)}\n\nFix it and respond with valid JSON only."
            })

    raise RuntimeError("Unreachable")

Why this works: The model has its own output in context. It can see what it did wrong and correct it. One retry resolves ~95% of remaining failures in practice.
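Retries are also rarer when the first attempt already sees the schema. One approach, sketched here with a hypothetical `build_prompt` helper, embeds the JSON Schema that Pydantic generates from your model:

```python
import json
from pydantic import BaseModel

class ProductOutput(BaseModel):  # restated from Step 3
    name: str
    price: float
    in_stock: bool

def build_prompt(task: str, schema_model: type[BaseModel]) -> str:
    # Hypothetical helper: embed the generated JSON Schema so the model
    # sees exact field names and types before its first attempt.
    schema = json.dumps(schema_model.model_json_schema(), indent=2)
    return (
        f"{task}\n\n"
        "Respond ONLY with a valid JSON object matching this JSON Schema:\n"
        f"{schema}\n"
        "Start your response with { and end with }."
    )

print(build_prompt("Extract the product from this listing: ...", ProductOutput))
```

Pass the result as the `prompt` argument to `get_validated_output` above; the retry loop stays as the safety net rather than the primary mechanism.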


Verification

import pytest

SCHEMA_MODEL = ProductOutput  # Your Pydantic model from Step 3

@pytest.mark.parametrize("raw,expected_name", [
    ('{"name": "Widget", "price": "9.99", "in_stock": "true"}', "Widget"),
    ('```json\n{"name": "Gadget", "price": 4.5, "in_stock": false}\n```', "Gadget"),
    ('Here is the product: {"name": "Thing", "price": 1.0, "in_stock": true}', "Thing"),
])
def test_parse_llm_product(raw, expected_name):
    result = parse_llm_product(raw)
    assert result.name == expected_name
Run the suite:

pytest test_llm_output.py -v

You should see: All three cases pass. If not, diagnose_llm_output() from Step 1 will tell you which class is failing.

[Screenshot: terminal output showing all tests passing. All three edge cases resolved: malformed JSON, type coercion, and fenced output]


What You Learned

  • LLM schema failures have three distinct classes — diagnose first, then fix
  • Pydantic coerces types automatically; jsonschema only validates
  • A single retry with the error message in context resolves most hard failures
  • Always strip JSON from prose/fences before parsing — models fence by default

Limitation: This approach assumes the schema is simple enough that the model can produce it with prompting. For deeply nested schemas (5+ levels), consider using native structured output APIs (e.g., OpenAI's response_format, Anthropic's tool use) instead of prompt-based enforcement.

When NOT to use this: If you need strict determinism (financial calculations, legal data), don't rely on LLM coercion. Use structured output APIs or constrained decoding.
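For reference, here is what the tool-use route looks like with the Anthropic client. This is a sketch: `record_product` is a hypothetical tool name, and the claim is only that the API strongly steers tool input toward `input_schema`, which removes the extraction and retry machinery for most cases:

```python
from pydantic import BaseModel

class ProductOutput(BaseModel):  # restated from Step 3
    name: str
    price: float
    in_stock: bool

# Tool definition: input_schema is the JSON Schema Pydantic generates
product_tool = {
    "name": "record_product",  # hypothetical tool name
    "description": "Record a product extracted from the text.",
    "input_schema": ProductOutput.model_json_schema(),
}

# Then (not run here; requires an API key):
# client = anthropic.Anthropic()
# response = client.messages.create(
#     model="claude-sonnet-4-6",
#     max_tokens=1024,
#     tools=[product_tool],
#     tool_choice={"type": "tool", "name": "record_product"},
#     messages=[{"role": "user", "content": "..."}],
# )
# data = response.content[0].input  # a dict shaped by input_schema
```

You can still run the result through `ProductOutput(**data)` as a final check; the two approaches compose.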


Tested on Python 3.12, Pydantic 2.7, jsonschema 4.22, claude-sonnet-4-6