Structured Outputs: Making LLMs Reliably Return JSON
A comprehensive guide to getting reliable structured JSON output from LLMs, covering native structured output modes, Pydantic validation, retry strategies, and production patterns for building robust data extraction pipelines.
The Structured Output Problem
LLMs generate text. Applications consume structured data. Bridging this gap reliably is one of the most common challenges in production AI systems. A model that returns valid JSON 95% of the time means 5% of your requests fail -- at scale, that is hundreds or thousands of errors per day.
In 2026, three approaches exist to solve this problem, each with different reliability guarantees.
Approach 1: Native Structured Output Modes
Both Anthropic and OpenAI now offer native structured output support that guarantees valid JSON matching a schema.
Anthropic Claude: Tool Use for Structured Output
Claude uses its tool use mechanism to return structured data. You define the expected schema as a tool, and Claude returns data matching that schema:
import anthropic
from pydantic import BaseModel
class ProductReview(BaseModel):
sentiment: str # "positive", "negative", "neutral"
score: float # 0.0 to 1.0
key_themes: list[str]
summary: str
recommended: bool
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
tools=[{
"name": "analyze_review",
"description": "Analyze a product review and return structured data",
"input_schema": ProductReview.model_json_schema()
}],
tool_choice={"type": "tool", "name": "analyze_review"},
messages=[{
"role": "user",
"content": "Analyze this review: 'The laptop is incredibly fast and the "
"battery lasts all day. Build quality is excellent though the "
"trackpad could be more responsive. Best purchase this year.'"
}]
)
# Extract the structured result
tool_use_block = next(b for b in response.content if b.type == "tool_use")
result = ProductReview(**tool_use_block.input)
print(result.sentiment) # "positive"
print(result.score) # 0.88
OpenAI: response_format with JSON Schema
OpenAI provides a response_format parameter that constrains the model's output to match a JSON schema:
from openai import OpenAI
from pydantic import BaseModel
class ExtractedEntity(BaseModel):
name: str
entity_type: str
confidence: float
context: str
class ExtractionResult(BaseModel):
entities: list[ExtractedEntity]
raw_text_length: int
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o-2024-08-06",
messages=[
{"role": "system", "content": "Extract named entities from the text."},
{"role": "user", "content": "Apple CEO Tim Cook announced new AI features for iPhone at WWDC in San Jose."}
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "extraction",
"schema": ExtractionResult.model_json_schema(),
"strict": True
}
}
)
result = ExtractionResult.model_validate_json(response.choices[0].message.content)
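Note that strict mode requires every object in the schema to declare "additionalProperties": false, which Pydantic's model_json_schema() does not emit by default. A small wrapper can build the envelope; this is a sketch (to_response_format is a hypothetical helper, and it only patches the top-level object -- nested objects would need the same treatment):

```python
def to_response_format(name: str, schema: dict) -> dict:
    """Wrap a JSON schema in the strict response_format envelope.

    Only the top-level object gets "additionalProperties": false here;
    in real use, every nested object must set it as well.
    """
    patched = {**schema, "additionalProperties": False}
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": patched, "strict": True},
    }

fmt = to_response_format("extraction", {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
})
```

The returned dict can be passed directly as the response_format argument shown above.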
Reliability Comparison
| Method | JSON Valid Rate | Schema Match Rate | Latency Overhead |
|---|---|---|---|
| Claude tool_choice (forced) | 100% | 99.8% | ~50ms |
| OpenAI strict JSON schema | 100% | 99.9% | ~30ms |
| Prompt-based ("return JSON") | 92-97% | 85-93% | None |
Native modes achieve near-perfect reliability because the model's token generation is constrained at the decoding level -- the sampler simply cannot emit a token that would produce invalid JSON.
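The decoding-level constraint can be pictured as a token mask: at each step, any candidate token that would take the output outside the grammar is filtered out before sampling. A drastically simplified toy (real engines operate over the tokenizer's full vocabulary and a compiled grammar, not string lists):

```python
def allowed_next_tokens(expected: str, candidates: list[str]) -> list[str]:
    """Toy token mask: given what the schema expects next, return only
    the candidate tokens that keep the output valid."""
    masks = {
        "boolean": {"true", "false"},
        "null": {"null"},
        "object_open": {"{"},
    }
    allowed = masks.get(expected)
    if allowed is None:  # e.g. free-form string content: no mask applies
        return candidates
    return [t for t in candidates if t in allowed]

# Where the schema expects a boolean, "maybe" can never be emitted.
print(allowed_next_tokens("boolean", ["true", "maybe", "false", "{"]))
# ['true', 'false']
```

This is why a schema-constrained model cannot produce a stray trailing comma or an unquoted key: those tokens are masked out before they can be sampled.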
Approach 2: Pydantic Validation with Retry
For cases where you need more complex validation logic than a JSON schema can express, use Pydantic models with automatic retry:
from pydantic import BaseModel, field_validator, model_validator
from typing import Optional
import json
class MeetingExtraction(BaseModel):
title: str
date: str # ISO format
time: str # HH:MM format
duration_minutes: int
attendees: list[str]
location: Optional[str] = None
is_recurring: bool
@field_validator("date")
@classmethod
def validate_date(cls, v):
from datetime import datetime
try:
datetime.strptime(v, "%Y-%m-%d")
except ValueError:
raise ValueError(f"Date must be in YYYY-MM-DD format, got: {v}")
return v
@field_validator("time")
@classmethod
def validate_time(cls, v):
parts = v.split(":")
if len(parts) != 2 or not all(p.isdigit() for p in parts):
raise ValueError(f"Time must be in HH:MM format, got: {v}")
return v
@field_validator("duration_minutes")
@classmethod
def validate_duration(cls, v):
if v < 5 or v > 480:
raise ValueError(f"Duration must be 5-480 minutes, got: {v}")
return v
@model_validator(mode="after")
def validate_attendees(self):
if len(self.attendees) == 0:
raise ValueError("Must have at least one attendee")
return self
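When one of these validators rejects a value, Pydantic raises a ValidationError whose message is exactly what the retry loop below feeds back to the model. A minimal standalone demonstration (Slot is an illustrative model, not part of the schema above):

```python
from pydantic import BaseModel, ValidationError, field_validator

class Slot(BaseModel):
    time: str  # HH:MM format

    @field_validator("time")
    @classmethod
    def check_time(cls, v):
        parts = v.split(":")
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            raise ValueError(f"Time must be in HH:MM format, got: {v}")
        return v

Slot(time="14:30")      # passes validation
try:
    Slot(time="2pm")    # fails validation
except ValidationError as e:
    # The error message names the field and the problem -- useful
    # feedback for the model on the next attempt
    print(e.errors()[0]["msg"])
```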
async def extract_with_retry(
client, text: str, model_class: type[BaseModel], max_retries: int = 3
) -> BaseModel:
messages = [{
"role": "user",
"content": f"Extract meeting details from this text as JSON "
f"matching this schema:\n{model_class.model_json_schema()}\n\n"
f"Text: {text}"
}]
for attempt in range(max_retries):
response = await client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=messages,
)
text_content = response.content[0].text
# Try to extract JSON from the response
try:
# Handle markdown code blocks
if "```json" in text_content:
json_str = text_content.split("```json")[1].split("```")[0]
elif "```" in text_content:
json_str = text_content.split("```")[1].split("```")[0]
else:
json_str = text_content
data = json.loads(json_str.strip())
return model_class(**data)
except (json.JSONDecodeError, ValueError) as e:
# Feed the error back to the model
messages.append({"role": "assistant", "content": text_content})
messages.append({
"role": "user",
"content": f"That output had a validation error: {e}. "
f"Please fix and return valid JSON."
})
raise ValueError(f"Failed to extract valid data after {max_retries} attempts")
Approach 3: Instructor Library
The Instructor library wraps LLM clients to provide automatic Pydantic validation, retry, and streaming for structured outputs:
import instructor
from anthropic import Anthropic
from pydantic import BaseModel
# Patch the client
client = instructor.from_anthropic(Anthropic())
class ClassificationResult(BaseModel):
category: str
confidence: float
reasoning: str
suggested_tags: list[str]
# Automatic validation, retry, and type safety
result = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{
"role": "user",
"content": "Classify this support ticket: 'My payment failed but "
"I was still charged. I need a refund immediately.'"
}],
response_model=ClassificationResult,
max_retries=3,
)
print(result.category) # "billing"
print(result.confidence) # 0.96
print(result.suggested_tags) # ["payment", "refund", "urgent"]
Production Patterns
Pattern 1: Schema Versioning
As your structured output schemas evolve, version them to maintain backward compatibility:
from pydantic import BaseModel
from typing import Union
class ReviewAnalysisV1(BaseModel):
sentiment: str
score: float
class ReviewAnalysisV2(BaseModel):
sentiment: str
score: float
themes: list[str]
confidence: float
# Route to the correct schema version
ReviewAnalysis = Union[ReviewAnalysisV1, ReviewAnalysisV2]
def get_schema(version: int = 2):
schemas = {1: ReviewAnalysisV1, 2: ReviewAnalysisV2}
return schemas.get(version, ReviewAnalysisV2)
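Version routing also implies a migration path for results already stored in the V1 shape. One sketch, operating on plain dicts (the helper name and default values are illustrative, not a prescribed convention):

```python
def upgrade_review_v1_to_v2(payload: dict) -> dict:
    """Upgrade a stored V1 payload to the V2 shape, filling the new
    fields with conservative defaults."""
    return {
        "sentiment": payload["sentiment"],
        "score": payload["score"],
        "themes": [],        # V2 field that V1 never captured
        "confidence": 0.0,   # unknown for legacy records
    }

old = {"sentiment": "positive", "score": 0.9}
print(upgrade_review_v1_to_v2(old))
```

Upgrading on read keeps downstream consumers working against a single current schema while old records remain untouched in storage.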
Pattern 2: Streaming Structured Output
For long structured outputs, stream partial results so the UI can render incrementally:
import instructor
from anthropic import Anthropic
client = instructor.from_anthropic(Anthropic())
class Report(BaseModel):
title: str
sections: list[str]
conclusion: str
# Stream partial results
for partial in client.messages.create_partial(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[{"role": "user", "content": "Write an analysis report..."}],
response_model=Report,
):
# partial has whatever fields have been populated so far
if partial.title:
print(f"Title: {partial.title}")
if partial.sections:
print(f"Sections so far: {len(partial.sections)}")
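Under the hood, partial streaming works by repairing the incomplete JSON buffer after each chunk -- closing any open string and unbalanced braces, then attempting a parse. A toy version of that repair (libraries like Instructor do this robustly against the target schema; this sketch ignores escaped quotes and nested arrays):

```python
import json

def best_effort_parse(buffer: str):
    """Try to parse an incomplete JSON object by closing any open
    string and unbalanced braces. Returns None if repair fails."""
    candidate = buffer
    if candidate.count('"') % 2 == 1:   # an open string literal
        candidate += '"'
    candidate += "}" * (candidate.count("{") - candidate.count("}"))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None

stream = ['{"title": "Q3 rep', 'ort", "sections": ', '["intro"]}']
buffer = ""
for chunk in stream:
    buffer += chunk
    partial = best_effort_parse(buffer)
    if partial is not None:
        print(partial)  # fields populated so far
```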
Pattern 3: Fallback Chain
For critical data extraction, use a fallback chain that starts with the cheapest, fastest option and escalates to progressively more expensive, more reliable ones:
async def extract_with_fallback(client, text: str, schema: type[BaseModel]):
    # Try 1: Native structured output (cheapest, fastest)
    try:
        return await extract_native(client, text, schema)
    except Exception:
        pass
    # Try 2: Prompt-based with validation retry
    try:
        return await extract_with_retry(client, text, schema, max_retries=2)
    except Exception:
        pass
    # Try 3: Stronger model with forced tool use
    try:
        return await extract_with_opus(client, text, schema)
    except Exception:
        pass
    # Final fallback: Return partial data with flag
    return {"_extraction_failed": True, "raw_text": text}
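The chain above hard-codes three tiers, but the same logic generalizes to an ordered list of extractors. A sketch with stub extractors standing in for the real API-calling functions:

```python
import asyncio

async def run_fallback_chain(extractors, text: str):
    """Try each async extractor in order; return the first success.
    If all fail, return a flagged fallback payload with the errors."""
    errors = []
    for extract in extractors:
        try:
            return await extract(text)
        except Exception as e:
            errors.append(str(e))
    return {"_extraction_failed": True, "raw_text": text, "errors": errors}

# Stub extractors for illustration -- real ones would call the API
async def cheap_extractor(text):
    raise RuntimeError("schema mismatch")

async def expensive_extractor(text):
    return {"ok": True}

result = asyncio.run(run_fallback_chain([cheap_extractor, expensive_extractor], "some text"))
print(result)  # {'ok': True}
```

Keeping the chain as data also makes it easy to reorder tiers or add a new model without touching the control flow.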
Key Takeaways
For production structured output in 2026:
- Use native structured output modes as default -- they provide the highest reliability with minimal overhead
- Add Pydantic validation for business logic that JSON schemas cannot express
- Always implement retry with error feedback -- it recovers most transient failures
- Version your schemas to handle evolution without breaking existing consumers
- Monitor extraction success rates and set alerts when they drop below 99%
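The monitoring point can be sketched as a rolling success-rate counter with an alert threshold (window size and threshold here are illustrative, not recommendations):

```python
from collections import deque

class ExtractionMonitor:
    """Rolling success-rate monitor over the last `window` extractions."""

    def __init__(self, window: int = 1000, threshold: float = 0.99):
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, success: bool) -> None:
        self.results.append(success)

    @property
    def success_rate(self) -> float:
        if not self.results:
            return 1.0
        return sum(self.results) / len(self.results)

    def should_alert(self) -> bool:
        # Avoid alerting on a near-empty window right after startup
        return len(self.results) >= 100 and self.success_rate < self.threshold

monitor = ExtractionMonitor()
for _ in range(95):
    monitor.record(True)
for _ in range(5):
    monitor.record(False)
print(monitor.success_rate)   # 0.95
print(monitor.should_alert())  # True
```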
The gap between "LLM output" and "application data" is now a solved problem for teams that use the right combination of native constraints, validation, and error handling.