
Structured Outputs: Making LLMs Reliably Return JSON

A comprehensive guide to getting reliable structured JSON output from LLMs, covering native structured output modes, Pydantic validation, retry strategies, and production patterns for building robust data extraction pipelines.

The Structured Output Problem

LLMs generate text. Applications consume structured data. Bridging this gap reliably is one of the most common challenges in production AI systems. A model that returns valid JSON 95% of the time means 5% of your requests fail -- at scale, that is hundreds or thousands of errors per day.
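The arithmetic is worth making concrete; a quick sketch with hypothetical traffic numbers (the volume here is illustrative):

```python
# Hypothetical: 50,000 extraction requests per day through the pipeline
requests_per_day = 50_000
json_valid_rate = 0.95  # model returns parseable JSON 95% of the time

failures_per_day = requests_per_day * (1 - json_valid_rate)
print(f"Expected failures per day: {failures_per_day:.0f}")  # 2500
```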

In 2026, three main approaches address this problem, each with different reliability guarantees.

Approach 1: Native Structured Output Modes

Both Anthropic and OpenAI now offer native structured output support that guarantees valid JSON matching a schema.

Anthropic Claude: Tool Use for Structured Output

Claude uses its tool use mechanism to return structured data. You define the expected schema as a tool, and Claude returns data matching that schema:

import anthropic
from pydantic import BaseModel

class ProductReview(BaseModel):
    sentiment: str  # "positive", "negative", "neutral"
    score: float    # 0.0 to 1.0
    key_themes: list[str]
    summary: str
    recommended: bool

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "analyze_review",
        "description": "Analyze a product review and return structured data",
        "input_schema": ProductReview.model_json_schema()
    }],
    tool_choice={"type": "tool", "name": "analyze_review"},
    messages=[{
        "role": "user",
        "content": "Analyze this review: 'The laptop is incredibly fast and the "
                   "battery lasts all day. Build quality is excellent though the "
                   "trackpad could be more responsive. Best purchase this year.'"
    }]
)

# Extract the structured result
tool_use_block = next(b for b in response.content if b.type == "tool_use")
result = ProductReview(**tool_use_block.input)
print(result.sentiment)  # "positive"
print(result.score)      # 0.88
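Pydantic's model_json_schema() produces a standard JSON Schema dict, which is what the input_schema field above expects. You can inspect it locally without any API call:

```python
from pydantic import BaseModel

class ProductReview(BaseModel):
    sentiment: str  # "positive", "negative", "neutral"
    score: float    # 0.0 to 1.0
    key_themes: list[str]
    summary: str
    recommended: bool

schema = ProductReview.model_json_schema()
print(schema["type"])             # object
print(sorted(schema["required"])) # all five fields are listed as required
```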

OpenAI: response_format with JSON Schema

OpenAI provides a response_format parameter that constrains the model output to match a JSON schema:

from openai import OpenAI
from pydantic import BaseModel

class ExtractedEntity(BaseModel):
    name: str
    entity_type: str
    confidence: float
    context: str

class ExtractionResult(BaseModel):
    entities: list[ExtractedEntity]
    raw_text_length: int

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract named entities from the text."},
        {"role": "user", "content": "Apple CEO Tim Cook announced new AI features for iPhone at WWDC in San Jose."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "extraction",
            "schema": ExtractionResult.model_json_schema(),
            "strict": True
        }
    }
)

result = ExtractionResult.model_validate_json(response.choices[0].message.content)
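Because model_validate_json is plain Pydantic, the parsing step can be exercised locally; here is a sketch using hand-written sample JSON rather than real model output:

```python
from pydantic import BaseModel

class ExtractedEntity(BaseModel):
    name: str
    entity_type: str
    confidence: float
    context: str

class ExtractionResult(BaseModel):
    entities: list[ExtractedEntity]
    raw_text_length: int

# Hand-written sample standing in for a model response
sample = '''{
  "entities": [
    {"name": "Tim Cook", "entity_type": "PERSON",
     "confidence": 0.98, "context": "Apple CEO Tim Cook announced"}
  ],
  "raw_text_length": 78
}'''

result = ExtractionResult.model_validate_json(sample)
print(result.entities[0].name)  # Tim Cook
```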

Reliability Comparison

Method                         JSON Valid Rate   Schema Match Rate   Latency Overhead
Claude tool_choice (forced)    100%              99.8%               ~50ms
OpenAI strict JSON schema      100%              99.9%               ~30ms
Prompt-based ("return JSON")   92-97%            85-93%              None

Native modes achieve near-perfect reliability because token generation is constrained at the decoding level -- the sampler masks out any token that would produce invalid JSON, so the model cannot emit a malformed payload.
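The difference shows up on exactly the small deviations unconstrained models tend to emit. A trailing comma, for instance, is enough for a strict parser to reject the whole payload:

```python
import json

almost_json = '{"sentiment": "positive", "score": 0.9,}'  # trailing comma

try:
    parsed = json.loads(almost_json)
except json.JSONDecodeError as e:
    parsed = None
    print(f"Rejected: {e}")  # strict JSON parsing fails on the trailing comma
```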

Approach 2: Pydantic Validation with Retry

For cases where you need more complex validation logic than a JSON schema can express, use Pydantic models with automatic retry:

from pydantic import BaseModel, field_validator, model_validator
from typing import Optional
import json

class MeetingExtraction(BaseModel):
    title: str
    date: str  # ISO format
    time: str  # HH:MM format
    duration_minutes: int
    attendees: list[str]
    location: Optional[str] = None
    is_recurring: bool

    @field_validator("date")
    @classmethod
    def validate_date(cls, v):
        from datetime import datetime
        try:
            datetime.strptime(v, "%Y-%m-%d")
        except ValueError:
            raise ValueError(f"Date must be in YYYY-MM-DD format, got: {v}")
        return v

    @field_validator("time")
    @classmethod
    def validate_time(cls, v):
        parts = v.split(":")
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            raise ValueError(f"Time must be in HH:MM format, got: {v}")
        return v

    @field_validator("duration_minutes")
    @classmethod
    def validate_duration(cls, v):
        if v < 5 or v > 480:
            raise ValueError(f"Duration must be 5-480 minutes, got: {v}")
        return v

    @model_validator(mode="after")
    def validate_attendees(self):
        if len(self.attendees) == 0:
            raise ValueError("Must have at least one attendee")
        return self


async def extract_with_retry(
    client, text: str, model_class: type[BaseModel], max_retries: int = 3
) -> BaseModel:
    messages = [{
        "role": "user",
        "content": f"Extract the requested data from this text as JSON "
                   f"matching this schema:\n"
                   f"{json.dumps(model_class.model_json_schema(), indent=2)}\n\n"
                   f"Text: {text}"
    }]

    for attempt in range(max_retries):
        response = await client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
        )

        text_content = response.content[0].text

        # Try to extract JSON from the response
        try:
            # Handle markdown code blocks
            if "```json" in text_content:
                json_str = text_content.split("```json")[1].split("```")[0]
            elif "```" in text_content:
                json_str = text_content.split("```")[1].split("```")[0]
            else:
                json_str = text_content

            data = json.loads(json_str.strip())
            return model_class(**data)

        except (json.JSONDecodeError, ValueError) as e:
            # Feed the error back to the model
            messages.append({"role": "assistant", "content": text_content})
            messages.append({
                "role": "user",
                "content": f"That output had a validation error: {e}. "
                           f"Please fix and return valid JSON."
            })

    raise ValueError(f"Failed to extract valid data after {max_retries} attempts")
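Because the validators fire on plain dicts, this layer is easy to unit-test without an LLM in the loop. A trimmed sketch with just the duration rule, showing the error message the retry loop would feed back:

```python
from pydantic import BaseModel, ValidationError, field_validator

class Meeting(BaseModel):  # trimmed version of MeetingExtraction
    title: str
    duration_minutes: int

    @field_validator("duration_minutes")
    @classmethod
    def validate_duration(cls, v):
        if v < 5 or v > 480:
            raise ValueError(f"Duration must be 5-480 minutes, got: {v}")
        return v

ok = Meeting(title="Standup", duration_minutes=15)  # passes validation

try:
    Meeting(title="Marathon", duration_minutes=600)
    error_msg = None
except ValidationError as e:
    error_msg = e.errors()[0]["msg"]  # the text fed back to the model

print(error_msg)
```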

Approach 3: Instructor Library

The Instructor library wraps LLM clients to provide automatic Pydantic validation, retry, and streaming for structured outputs:

import instructor
from anthropic import Anthropic
from pydantic import BaseModel

# Patch the client
client = instructor.from_anthropic(Anthropic())

class ClassificationResult(BaseModel):
    category: str
    confidence: float
    reasoning: str
    suggested_tags: list[str]

# Automatic validation, retry, and type safety
result = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Classify this support ticket: 'My payment failed but "
                   "I was still charged. I need a refund immediately.'"
    }],
    response_model=ClassificationResult,
    max_retries=3,
)

print(result.category)     # "billing"
print(result.confidence)   # 0.96
print(result.suggested_tags)  # ["payment", "refund", "urgent"]

Production Patterns

Pattern 1: Schema Versioning

As your structured output schemas evolve, version them to maintain backward compatibility:

from pydantic import BaseModel
from typing import Union

class ReviewAnalysisV1(BaseModel):
    sentiment: str
    score: float

class ReviewAnalysisV2(BaseModel):
    sentiment: str
    score: float
    themes: list[str]
    confidence: float

# Route to the correct schema version
ReviewAnalysis = Union[ReviewAnalysisV1, ReviewAnalysisV2]

def get_schema(version: int = 2):
    schemas = {1: ReviewAnalysisV1, 2: ReviewAnalysisV2}
    return schemas.get(version, ReviewAnalysisV2)
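Consumers can then validate stored records against the schema version they were written with. A self-contained sketch of the same routing helper in use:

```python
from pydantic import BaseModel

class ReviewAnalysisV1(BaseModel):
    sentiment: str
    score: float

class ReviewAnalysisV2(BaseModel):
    sentiment: str
    score: float
    themes: list[str]
    confidence: float

def get_schema(version: int = 2) -> type[BaseModel]:
    schemas = {1: ReviewAnalysisV1, 2: ReviewAnalysisV2}
    return schemas.get(version, ReviewAnalysisV2)

# An old record validates against the schema it was written with;
# unknown versions fall back to the latest.
record = {"sentiment": "positive", "score": 0.9}
parsed = get_schema(1).model_validate(record)
print(type(parsed).__name__)  # ReviewAnalysisV1
```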

Pattern 2: Streaming Structured Output

For long structured outputs, stream partial results so the UI can render incrementally:

import instructor
from anthropic import Anthropic
from pydantic import BaseModel

client = instructor.from_anthropic(Anthropic())

class Report(BaseModel):
    title: str
    sections: list[str]
    conclusion: str

# Stream partial results
for partial in client.messages.create_partial(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Write an analysis report..."}],
    response_model=Report,
):
    # partial has whatever fields have been populated so far
    if partial.title:
        print(f"Title: {partial.title}")
    if partial.sections:
        print(f"Sections so far: {len(partial.sections)}")

Pattern 3: Fallback Chain

For critical data extraction, use a fallback chain that starts with the cheapest, fastest option and escalates to more expensive, more reliable ones:

async def extract_with_fallback(text: str, schema: type[BaseModel]):
    # Try 1: Native structured output (cheapest, fastest)
    try:
        return await extract_native(text, schema)
    except Exception:
        pass

    # Try 2: Prompt-based with validation retry
    try:
        return await extract_with_retry(text, schema, max_retries=2)
    except Exception:
        pass

    # Try 3: Stronger model with forced tool use
    try:
        return await extract_with_opus(text, schema)
    except Exception:
        pass

    # Final fallback: Return partial data with flag
    return {"_extraction_failed": True, "raw_text": text}
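The chain is straightforward to exercise with stub extractors standing in for the real calls (the stubs below are illustrative, not part of any SDK):

```python
import asyncio

async def flaky_native(text: str) -> dict:
    # Stand-in for the native structured output call failing
    raise RuntimeError("native mode unavailable")

async def retry_based(text: str) -> dict:
    # Stand-in for the prompt-plus-validation path succeeding
    return {"sentiment": "positive"}

async def extract_with_fallback(text: str) -> dict:
    for extractor in (flaky_native, retry_based):
        try:
            return await extractor(text)
        except Exception:
            continue  # fall through to the next, more reliable tier
    return {"_extraction_failed": True, "raw_text": text}

result = asyncio.run(extract_with_fallback("Great product!"))
print(result)  # {'sentiment': 'positive'}
```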

Key Takeaways

For production structured output in 2026:

  1. Use native structured output modes as default -- they provide the highest reliability with minimal overhead
  2. Add Pydantic validation for business logic that JSON schemas cannot express
  3. Always implement retry with error feedback -- it recovers most transient failures
  4. Version your schemas to handle evolution without breaking existing consumers
  5. Monitor extraction success rates and set alerts when they drop below 99%

The gap between "LLM output" and "application data" is now a solved problem for teams that use the right combination of native constraints, validation, and error handling.
