Fix JSON Schema Validation Failures in LLM Outputs in 12 Minutes

Debug and fix JSON Schema validation errors from LLM-generated outputs using Pydantic, jsonschema, and structured prompting techniques.

Problem: LLM Outputs Break JSON Schema Validation

You're using an LLM to generate structured JSON and it works 80% of the time — then fails in production with cryptic validation errors like 'price' is a required property or 'active' is not of type 'boolean'. The model looks compliant, but your validator disagrees.

You'll learn:

  • Why LLMs produce structurally inconsistent JSON even with detailed prompts
  • How to catch and classify validation failures before they reach your app
  • How to combine schema enforcement with runtime repair for near-100% reliability

Time: 12 min | Level: Intermediate


Why This Happens

LLMs are next-token predictors, not schema-aware compilers. Even with a schema in the system prompt, the model may omit optional-looking fields, coerce types (returning "true" instead of true), add extra keys, or wrap the JSON in markdown fences like ```json.

Common symptoms:

  • ValidationError: 'field_name' is a required property
  • '123' is not of type 'integer' — string where number expected
  • jsonschema.exceptions.ValidationError on what looks like valid JSON
  • Intermittent failures — model is consistent 9/10 times but breaks in edge cases
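The type-coercion symptom is easy to reproduce directly with jsonschema (a minimal self-contained sketch; the schema here is illustrative):

```python
import jsonschema

# The schema expects a real boolean...
schema = {
    "type": "object",
    "properties": {"active": {"type": "boolean"}},
    "required": ["active"],
}

# ...but the model returned the string "true" instead of the boolean true
try:
    jsonschema.validate(instance={"active": "true"}, schema=schema)
except jsonschema.ValidationError as e:
    print(e.message)  # 'true' is not of type 'boolean'
```

The JSON is syntactically fine; the schema is what rejects it. That distinction drives the diagnosis in Step 1.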

Solution

Step 1: Isolate the Failure Class

Before fixing anything, identify which failure class you're dealing with. They have different fixes.

import json
import jsonschema

def diagnose_llm_output(raw_output: str, schema: dict) -> dict:
    """
    Returns a dict with failure class and detail.
    Run this before adding any repair logic.
    """
    # Class 1: Not even JSON (markdown fences, preamble, etc.)
    try:
        parsed = json.loads(raw_output.strip())
    except json.JSONDecodeError as e:
        return {"class": "malformed_json", "detail": str(e), "raw": raw_output}

    # Class 2: Valid JSON but wrong shape
    try:
        jsonschema.validate(instance=parsed, schema=schema)
        return {"class": "valid", "parsed": parsed}
    except jsonschema.ValidationError as e:
        return {
            "class": "schema_violation",
            "path": list(e.absolute_path),
            "message": e.message,
            "parsed": parsed
        }

Expected: You'll see one of three classes: malformed_json, schema_violation, or valid.

Run this on 20-30 real outputs from your LLM. The distribution tells you where to invest.

If it fails:

  • All malformed_json: Your prompt needs output format instruction — see Step 2
  • Mix of classes: Fix malformed first, then schema violations — see Step 3
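Getting that distribution over a batch is a one-liner with Counter. The sketch below uses a condensed `diagnose` helper standing in for `diagnose_llm_output` above, and a hypothetical `samples` list in place of your logged outputs:

```python
import json
from collections import Counter

import jsonschema

def diagnose(raw_output: str, schema: dict) -> str:
    # Condensed version of diagnose_llm_output, returning just the class
    try:
        parsed = json.loads(raw_output.strip())
    except json.JSONDecodeError:
        return "malformed_json"
    try:
        jsonschema.validate(instance=parsed, schema=schema)
        return "valid"
    except jsonschema.ValidationError:
        return "schema_violation"

schema = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}},
}

# Hypothetical captured outputs; replace with real logged responses
samples = [
    '{"name": "ok"}',
    '```json\n{"name": "fenced"}\n```',
    '{"price": 1}',
]

print(Counter(diagnose(s, schema) for s in samples))
# e.g. Counter({'valid': 1, 'malformed_json': 1, 'schema_violation': 1})
```

If most of the mass sits in one class, you only need one of the fixes below.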

Step 2: Fix Malformed JSON (Extraction)

If the model wraps output in prose or markdown, strip it before parsing.

import re

def extract_json(raw: str) -> str:
    """
    Strips markdown fences and leading/trailing prose.
    Works for ```json blocks and inline JSON objects.
    """
    # Remove ```json ... ``` fences
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if fenced:
        return fenced.group(1)

    # Fall back: find first { ... } block
    brace_match = re.search(r"(\{.*\})", raw, re.DOTALL)
    if brace_match:
        return brace_match.group(1)

    return raw  # Return as-is, let json.loads raise the error

# Usage
clean = extract_json(raw_llm_output)
parsed = json.loads(clean)

Also add this to your system prompt — it reduces fencing by ~90%:

Respond ONLY with a valid JSON object. No markdown, no explanation, no code fences.
Start your response with { and end with }.

Why this works: The instruction shifts the token distribution away from ``` as a starting token. The regex handles the remaining cases where the model's fine-tuned formatting habits win out anyway.


Step 3: Fix Schema Violations (Type Coercion + Defaults)

Schema violations usually fall into three patterns: wrong type, missing required field, or extra fields.

from pydantic import BaseModel, ConfigDict
from typing import Optional

# Define your schema as a Pydantic model — it handles coercion automatically
class ProductOutput(BaseModel):
    # Pydantic v2 config: strip extra fields instead of raising
    model_config = ConfigDict(extra="ignore")

    name: str
    price: float           # LLMs often return "9.99" — Pydantic coerces it
    in_stock: bool         # LLMs often return "true" — Pydantic handles it
    category: Optional[str] = "uncategorized"  # Default for missing fields

def parse_llm_product(raw: str) -> ProductOutput:
    clean = extract_json(raw)            # From Step 2
    data = json.loads(clean)
    return ProductOutput(**data)         # Raises ValidationError with a clear message if it still fails

Why Pydantic over jsonschema here: jsonschema validates — it won't coerce "true" to True. Pydantic validates and coerces, which matches what you need for LLM outputs.
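The difference is easy to see side by side. The `Product` model below is a trimmed-down stand-in for `ProductOutput`, and the input mimics typical LLM output with stringly-typed values:

```python
import jsonschema
from pydantic import BaseModel, ConfigDict

data = {"name": "Widget", "price": "9.99", "in_stock": "true"}

# jsonschema: validation only, so the string values are rejected outright
json_schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
}
try:
    jsonschema.validate(instance=data, schema=json_schema)
except jsonschema.ValidationError as e:
    print("jsonschema:", e.message)  # e.g. '9.99' is not of type 'number'

# Pydantic: validation plus lax-mode coercion of the same payload
class Product(BaseModel):
    model_config = ConfigDict(extra="ignore")
    name: str
    price: float
    in_stock: bool

p = Product(**data)
print("pydantic:", p.price, p.in_stock)  # 9.99 True
```

Same input, one library rejects it, the other repairs it. That is exactly the gap LLM outputs fall into.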

If it fails:

  • value is not a valid float: The model returned something like "N/A". Add a validator inside the ProductOutput class (Pydantic v2 uses field_validator instead of the deprecated validator):

from pydantic import field_validator

    @field_validator("price", mode="before")
    @classmethod
    def handle_price(cls, v):
        try:
            return float(v)
        except (ValueError, TypeError):
            return 0.0  # Or raise, depending on your needs

Step 4: Add a Retry Layer for Hard Failures

For cases Pydantic can't auto-repair, retry the LLM with the validation error as feedback.

from pydantic import ValidationError
import anthropic  # or your preferred client

client = anthropic.Anthropic()

def get_validated_output(prompt: str, schema_model, max_retries: int = 2):
    messages = [{"role": "user", "content": prompt}]

    for attempt in range(max_retries + 1):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=messages
        )
        raw = response.content[0].text

        try:
            clean = extract_json(raw)
            data = json.loads(clean)
            return schema_model(**data)

        except (json.JSONDecodeError, ValidationError) as e:
            if attempt == max_retries:
                raise  # Give up after max retries

            # Feed the error back — this works surprisingly well
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"That response failed validation: {str(e)}\n\nFix it and respond with valid JSON only."
            })

    raise RuntimeError("Unreachable")

Why this works: The model has its own output in context. It can see what it did wrong and correct it. One retry resolves ~95% of remaining failures in practice.
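Retries are also rarer when the first attempt already sees the schema. One approach, sketched here with a hypothetical `build_prompt` helper, embeds the JSON Schema that Pydantic generates from your model:

```python
import json
from pydantic import BaseModel

class ProductOutput(BaseModel):  # restated from Step 3
    name: str
    price: float
    in_stock: bool

def build_prompt(task: str, schema_model: type[BaseModel]) -> str:
    # Hypothetical helper: embed the generated JSON Schema so the model
    # sees exact field names and types before its first attempt.
    schema = json.dumps(schema_model.model_json_schema(), indent=2)
    return (
        f"{task}\n\n"
        "Respond ONLY with a valid JSON object matching this JSON Schema:\n"
        f"{schema}\n"
        "Start your response with { and end with }."
    )

print(build_prompt("Extract the product from this listing: ...", ProductOutput))
```

Pass the result as the `prompt` argument to `get_validated_output` above; the retry loop stays as the safety net rather than the primary mechanism.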


Verification

import pytest

SCHEMA_MODEL = ProductOutput  # Your Pydantic model from Step 3

@pytest.mark.parametrize("raw,expected_name", [
    ('{"name": "Widget", "price": "9.99", "in_stock": "true"}', "Widget"),
    ('```json\n{"name": "Gadget", "price": 4.5, "in_stock": false}\n```', "Gadget"),
    ('Here is the product: {"name": "Thing", "price": 1.0, "in_stock": true}', "Thing"),
])
def test_parse_llm_product(raw, expected_name):
    result = parse_llm_product(raw)
    assert result.name == expected_name
Run the suite:

pytest test_llm_output.py -v

You should see: All three cases pass. If not, diagnose_llm_output() from Step 1 will tell you which class is failing.

[Screenshot: terminal output showing all tests passing. All three edge cases resolved: malformed JSON, type coercion, and fenced output]


What You Learned

  • LLM schema failures have three distinct classes — diagnose first, then fix
  • Pydantic coerces types automatically; jsonschema only validates
  • A single retry with the error message in context resolves most hard failures
  • Always strip JSON from prose/fences before parsing — models fence by default

Limitation: This approach assumes the schema is simple enough that the model can produce it with prompting. For deeply nested schemas (5+ levels), consider using native structured output APIs (e.g., OpenAI's response_format, Anthropic's tool use) instead of prompt-based enforcement.

When NOT to use this: If you need strict determinism (financial calculations, legal data), don't rely on LLM coercion. Use structured output APIs or constrained decoding.
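For reference, here is what the tool-use route looks like with the Anthropic client. This is a sketch: `record_product` is a hypothetical tool name, and the claim is only that the API strongly steers tool input toward `input_schema`, which removes the extraction and retry machinery for most cases:

```python
from pydantic import BaseModel

class ProductOutput(BaseModel):  # restated from Step 3
    name: str
    price: float
    in_stock: bool

# Tool definition: input_schema is the JSON Schema Pydantic generates
product_tool = {
    "name": "record_product",  # hypothetical tool name
    "description": "Record a product extracted from the text.",
    "input_schema": ProductOutput.model_json_schema(),
}

# Then (not run here; requires an API key):
# client = anthropic.Anthropic()
# response = client.messages.create(
#     model="claude-sonnet-4-6",
#     max_tokens=1024,
#     tools=[product_tool],
#     tool_choice={"type": "tool", "name": "record_product"},
#     messages=[{"role": "user", "content": "..."}],
# )
# data = response.content[0].input  # a dict shaped by input_schema
```

You can still run the result through `ProductOutput(**data)` as a final check; the two approaches compose.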


Tested on Python 3.12, Pydantic 2.7, jsonschema 4.22, claude-sonnet-4-6