Agentic AI Structured Outputs: JSON Schema Enforcement and Type-Safe Patterns
Enforce structured JSON outputs from agentic AI with schema validation, Pydantic models, retry logic, and streaming structured responses.
Why Structured Outputs Matter for Agents
When an agent calls a tool, it must format the arguments as structured data. When an agent produces a final result — a booking confirmation, an order summary, an analysis report — downstream systems often need that result in a specific format. Free-form text output is useful for human consumption but useless for machine consumption.
Structured outputs transform agent responses from unpredictable prose into reliable, machine-parseable data. This is the foundation for building agentic systems that integrate with APIs, databases, and other software components. Without structured outputs, every agent response requires brittle text parsing that breaks when the model phrases things differently.
This guide covers schema definition, Pydantic integration, retry strategies, streaming structured output, nested objects, and enum constraints.
JSON Schema Definition for Agent Outputs
The first step is defining exactly what structure you expect from the agent. JSON Schema provides a standard way to describe the shape of JSON data.
Basic Schema Definition
from pydantic import BaseModel, Field
from typing import Literal
from datetime import datetime

class FlightBookingResult(BaseModel):
    """Structured result from a flight booking agent."""
    booking_confirmed: bool
    confirmation_code: str | None = Field(
        None,
        description="Airline confirmation code if booking succeeded"
    )
    flight_number: str = Field(
        ...,
        pattern=r"^[A-Z]{2}\d{1,4}$",
        description="Flight number in IATA format (e.g., AA1234)"
    )
    departure_city: str
    arrival_city: str
    departure_time: datetime
    arrival_time: datetime
    total_price: float = Field(..., ge=0)
    currency: Literal["USD", "EUR", "GBP", "CAD"]
    passenger_count: int = Field(..., ge=1, le=9)
    error_message: str | None = None

# Generate JSON Schema for the LLM
schema = FlightBookingResult.model_json_schema()
Schema as Part of the Agent Prompt
Include the expected output schema directly in the agent's system prompt so the model knows exactly what structure to produce.
import json

SYSTEM_PROMPT = f"""You are a flight booking agent. After completing
a booking, return your result as a JSON object matching this schema:

{json.dumps(FlightBookingResult.model_json_schema(), indent=2)}

Always return valid JSON. Do not include any text before or after
the JSON object.
"""
Pydantic Models for Agent Output Validation
Pydantic is the standard library for validating structured agent outputs in Python. It provides automatic type coercion, constraint validation, and clear error messages.
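As a quick illustration of coercion and field-level error reporting, a minimal sketch with a throwaway `Classification` model:

```python
from pydantic import BaseModel, Field, ValidationError

class Classification(BaseModel):
    label: str
    confidence: float = Field(ge=0.0, le=1.0)

# Lax-mode coercion: the string "0.9" becomes the float 0.9
result = Classification(label="billing", confidence="0.9")
print(result.confidence)  # 0.9

# Constraint violations report the offending field in "loc"
try:
    Classification(label="billing", confidence=1.5)
except ValidationError as e:
    print(e.errors()[0]["loc"])  # ('confidence',)
```

That `loc` tuple is what makes automatic retry correction possible: the error message can tell the model exactly which field to fix.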
Validation with Automatic Retry
import json

from pydantic import BaseModel, ValidationError


class StructuredOutputError(Exception):
    """Raised when the model cannot produce valid structured output."""


class StructuredOutputParser:
    def __init__(self, model_class: type[BaseModel], max_retries: int = 3):
        self.model_class = model_class
        self.max_retries = max_retries

    async def parse_response(
        self,
        llm_client,
        messages: list[dict],
        system_prompt: str,
    ) -> BaseModel:
        last_error = None
        for attempt in range(self.max_retries):
            if attempt > 0:
                # Add correction message for retries
                messages.append({
                    "role": "user",
                    "content": (
                        f"Your previous response had a validation error: "
                        f"{last_error}. Please fix the JSON and try again. "
                        f"Return only valid JSON matching the schema."
                    ),
                })
            response = await llm_client.chat(
                system=system_prompt,
                messages=messages,
            )
            try:
                # Extract JSON from response
                json_str = self._extract_json(response)
                parsed = json.loads(json_str)
                return self.model_class(**parsed)
            except json.JSONDecodeError as e:
                last_error = f"Invalid JSON: {e}"
                messages.append({"role": "assistant", "content": response})
            except ValidationError as e:
                last_error = self._format_validation_error(e)
                messages.append({"role": "assistant", "content": response})
        raise StructuredOutputError(
            f"Failed to get valid structured output after "
            f"{self.max_retries} attempts. Last error: {last_error}"
        )

    def _extract_json(self, text: str) -> str:
        """Extract JSON from an LLM response that may include surrounding text."""
        # Try to find a fenced JSON block first
        if "```json" in text:
            start = text.index("```json") + len("```json")
            end = text.index("```", start)
            return text[start:end].strip()
        # Fall back to the outermost braces of a raw JSON object
        brace_start = text.find("{")
        brace_end = text.rfind("}") + 1
        if brace_start != -1 and brace_end > brace_start:
            return text[brace_start:brace_end]
        return text.strip()

    def _format_validation_error(self, error: ValidationError) -> str:
        issues = []
        for err in error.errors():
            field = " -> ".join(str(p) for p in err["loc"])
            issues.append(f"Field '{field}': {err['msg']}")
        return "; ".join(issues)
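To see the retry flow end to end without a real model, here is a compressed, self-contained sketch of the same pattern using a stubbed client. The `StubLLM` class, its canned replies, and the `CityQuote` model are illustrative, not part of any SDK:

```python
import asyncio
import json

from pydantic import BaseModel, ValidationError

class CityQuote(BaseModel):
    city: str
    price: float

class StubLLM:
    """Stand-in client: the first reply is malformed, the second is valid."""
    def __init__(self, replies: list[str]):
        self._replies = iter(replies)

    async def chat(self, system: str, messages: list[dict]) -> str:
        return next(self._replies)

async def parse_with_retry(client: StubLLM, max_retries: int = 3) -> CityQuote:
    messages = [{"role": "user", "content": "Quote a flight to Lisbon."}]
    last_error = None
    for attempt in range(max_retries):
        if attempt > 0:
            messages.append({
                "role": "user",
                "content": f"Validation error: {last_error}. Return only valid JSON.",
            })
        response = await client.chat(system="...", messages=messages)
        try:
            return CityQuote(**json.loads(response))
        except (json.JSONDecodeError, ValidationError) as e:
            last_error = str(e)
            messages.append({"role": "assistant", "content": response})
    raise RuntimeError(f"No valid output after {max_retries} attempts: {last_error}")

quote = asyncio.run(parse_with_retry(
    StubLLM(["{'city': 'Lisbon'}",                   # invalid: single quotes
             '{"city": "Lisbon", "price": 420.0}'])  # valid on retry
))
print(quote.price)  # 420.0
```

The first reply fails `json.loads`, the correction message is appended, and the second attempt parses cleanly.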
Using Native Structured Output APIs
Modern LLM APIs increasingly support structured output natively, constraining the model to produce valid JSON that matches a specific schema. This largely eliminates the need for client-side retry loops.
OpenAI Structured Outputs
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def get_structured_output(
    messages: list[dict],
    response_model: type[BaseModel],
) -> BaseModel:
    response = await client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=messages,
        response_format=response_model,
    )
    return response.choices[0].message.parsed
Anthropic Tool-Based Structured Output
Claude does not have a native structured output mode, but you can achieve the same effect with tool definitions: define a tool whose parameters match your desired output schema, then force the model to call that tool via tool_choice.
import anthropic

anthropic_client = anthropic.AsyncAnthropic()

async def get_structured_from_claude(
    messages: list[dict],
    output_schema: dict,
    schema_name: str = "format_response",
) -> dict:
    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        tools=[
            {
                "name": schema_name,
                "description": "Format the response as structured data",
                "input_schema": output_schema,
            }
        ],
        tool_choice={"type": "tool", "name": schema_name},
        messages=messages,
    )
    for block in response.content:
        if block.type == "tool_use":
            return block.input
    raise ValueError("No structured output in response")
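Because the tool_use block's input comes back as a plain dict, it pairs naturally with Pydantic: generate the tool's input_schema from a model and validate the dict on the way out. A minimal sketch (the `TicketResult` model and the `raw` dict are illustrative):

```python
from pydantic import BaseModel, Field, ValidationError

class TicketResult(BaseModel):
    category: str
    confidence: float = Field(ge=0.0, le=1.0)

# Pass the model's generated schema as the tool's input_schema
output_schema = TicketResult.model_json_schema()

# Validate the dict returned from the tool_use block
raw = {"category": "billing", "confidence": 0.92}  # e.g. block.input
try:
    result = TicketResult(**raw)
except ValidationError:
    raise  # or feed the error back into a retry loop

print(result.category)  # billing
```

This keeps a single Pydantic model as the source of truth for both the schema the model sees and the validation you perform.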
Streaming Structured Output
Streaming structured output is challenging because you receive the JSON token by token, and the output is not valid JSON until the stream completes. Two approaches handle this.
Buffer and Validate
Accumulate tokens until the stream completes, then validate the full JSON. This is simpler but provides no incremental output to the client.
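A minimal sketch of buffer-and-validate, assuming an illustrative `Summary` model and an async token stream:

```python
import asyncio
import json

from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    score: float

async def buffer_and_validate(token_stream) -> Summary:
    """Accumulate every streamed token, then parse and validate once."""
    buffer = ""
    async for token in token_stream:
        buffer += token
    return Summary(**json.loads(buffer))

async def demo_stream():
    # Chunks split mid-key and mid-value, as a real token stream would be
    for chunk in ['{"title": "Q3 rep', 'ort", "sco', 're": 0.87}']:
        yield chunk

summary = asyncio.run(buffer_and_validate(demo_stream()))
print(summary.score)  # 0.87
```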
Partial JSON Streaming
Parse partial JSON as it arrives and emit validated fields as they become complete. Libraries like partial-json-parser handle incomplete JSON objects.
import partial_json_parser
from pydantic import BaseModel, TypeAdapter, ValidationError

class StreamingStructuredParser:
    def __init__(self, model_class: type[BaseModel]):
        self.model_class = model_class
        self.buffer = ""
        self.emitted_fields: set[str] = set()

    async def process_token(self, token: str) -> dict | None:
        self.buffer += token
        try:
            partial = partial_json_parser.loads(self.buffer)
        except Exception:
            return None
        if not isinstance(partial, dict):
            return None
        # Check for newly complete fields
        new_fields = {}
        for field_name, field_info in self.model_class.model_fields.items():
            if field_name in partial and field_name not in self.emitted_fields:
                try:
                    # Validate the individual field against its annotation.
                    # Note: string fields may validate before they finish
                    # streaming, since any prefix is still a valid str.
                    TypeAdapter(field_info.annotation).validate_python(
                        partial[field_name]
                    )
                    new_fields[field_name] = partial[field_name]
                    self.emitted_fields.add(field_name)
                except ValidationError:
                    pass  # Field value not yet complete
        return new_fields if new_fields else None
Nested Object Handling
Real-world agent outputs often have deeply nested structures. A customer analysis might include nested order histories, each containing nested line items.
class LineItem(BaseModel):
    product_id: str
    product_name: str
    quantity: int = Field(..., ge=1)
    unit_price: float = Field(..., ge=0)
    total_price: float = Field(..., ge=0)

class Order(BaseModel):
    order_id: str
    order_date: datetime
    status: Literal["pending", "shipped", "delivered", "cancelled"]
    items: list[LineItem] = Field(..., min_length=1)
    subtotal: float = Field(..., ge=0)
    tax: float = Field(..., ge=0)
    total: float = Field(..., ge=0)

class CustomerAnalysis(BaseModel):
    customer_id: str
    customer_name: str
    lifetime_value: float
    order_count: int
    recent_orders: list[Order] = Field(..., max_length=10)
    risk_level: Literal["low", "medium", "high"]
    recommended_actions: list[str]
    analysis_summary: str
When the LLM generates nested structures, validation errors often occur deep in the hierarchy. Pydantic's error locations (the "loc" field) tell you exactly where the error is — for example, "recent_orders -> 2 -> items -> 0 -> quantity" — which helps the agent self-correct on retry.
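A small demonstration of how a nested failure surfaces in "loc", using throwaway models:

```python
from pydantic import BaseModel, Field, ValidationError

class Item(BaseModel):
    quantity: int = Field(ge=1)

class Order(BaseModel):
    items: list[Item]

try:
    Order(items=[{"quantity": 3}, {"quantity": 0}])  # second item invalid
except ValidationError as e:
    loc = e.errors()[0]["loc"]
    print(" -> ".join(str(p) for p in loc))  # items -> 1 -> quantity
```

Formatting that path into the correction message pinpoints the exact list index and field the model needs to fix.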
Enum Constraints and Controlled Vocabularies
Enums prevent the agent from inventing values. Without enum constraints, an agent asked about sentiment might return "positive", "good", "favorable", or "thumbs up" — all meaning the same thing but impossible to process programmatically.
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class TicketClassification(BaseModel):
    category: Literal[
        "billing", "technical", "account", "feature_request", "complaint"
    ]
    sentiment: Sentiment
    priority: Priority
    requires_human: bool
    confidence: float = Field(..., ge=0.0, le=1.0)
    reasoning: str = Field(
        ...,
        max_length=500,
        description="Brief explanation of the classification"
    )
By constraining the output to enumerated values, every downstream system knows exactly what values to expect. Dashboards, analytics pipelines, routing logic, and SLA calculations all work reliably because the agent cannot produce unexpected values.
Best Practices for Production Structured Outputs
Always validate, even with native structured output APIs. Models can still produce logically invalid data (a total that does not match the sum of line items) even when the JSON structure is valid.
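Semantic checks like that can live in the model itself via a model_validator, so they run inside the same retry loop as structural validation. A sketch with an illustrative `Invoice` model:

```python
from pydantic import BaseModel, Field, model_validator

class Invoice(BaseModel):
    line_totals: list[float]
    total: float = Field(ge=0)

    @model_validator(mode="after")
    def total_matches_lines(self) -> "Invoice":
        # Structural validation passes even if the arithmetic is wrong;
        # catch that here so a retry loop can correct it.
        if abs(self.total - sum(self.line_totals)) > 0.01:
            raise ValueError(
                f"total {self.total} does not match sum of "
                f"line items {sum(self.line_totals)}"
            )
        return self
```

A mismatched total now raises a ValidationError just like a missing field would, so the same correction prompt handles both cases.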
Use descriptive field descriptions. The field name and description in your Pydantic model are part of the effective prompt. A field named "x" with no description will be filled arbitrarily. A field named "customer_satisfaction_score" with description "Score from 1-10 based on conversation tone" will be filled meaningfully.
Set appropriate defaults. For optional fields, use None as the default rather than empty strings or zeros. This makes it clear when the agent did not have information for a field versus when it intentionally set a value.
Log failed validations. Track how often structured output validation fails and which fields cause failures. This data guides prompt improvements and schema adjustments.
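One lightweight way to do this is a counter keyed by field path, incremented on every failure (a sketch; the logger name and helper function are illustrative):

```python
import logging
from collections import Counter

from pydantic import BaseModel, Field, ValidationError

logger = logging.getLogger("structured_output")
field_failures: Counter = Counter()

def record_validation_failure(error: ValidationError) -> None:
    """Log each failing field and tally it for later schema/prompt tuning."""
    for err in error.errors():
        field = " -> ".join(str(p) for p in err["loc"])
        field_failures[field] += 1
        logger.warning("validation failed at %s: %s", field, err["msg"])

# Example: tally a failure from a throwaway model
class Demo(BaseModel):
    score: float = Field(ge=0)

try:
    Demo(score=-1)
except ValidationError as e:
    record_validation_failure(e)

print(field_failures["score"])  # 1
```

Periodically reviewing the top entries in the counter shows which fields the model struggles with most.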
Frequently Asked Questions
What is the difference between structured outputs and function calling?
Function calling is the mechanism LLMs use to invoke tools — the model produces structured arguments for a specific function. Structured outputs are the mechanism for getting the model's final response in a specific format. They use the same underlying JSON schema technology but serve different purposes. Function calling triggers actions; structured outputs format results.
How reliable are native structured output APIs?
OpenAI's structured output mode uses constrained decoding, so the JSON matches your schema in virtually all cases. Claude's tool-based approach is highly reliable but not formally guaranteed. In both cases the JSON will be structurally valid, but the values may still be semantically incorrect (a price of -50 in a field that should be positive). Always add semantic validation on top of structural validation.
Should I use Pydantic v1 or v2 for agent output schemas?
Use Pydantic v2. It is significantly faster (5-50x for validation), has better JSON Schema generation, and is the actively maintained version. Pydantic v1 is in maintenance mode. If you are on an older codebase using v1, the migration is straightforward and well-documented.
How do you handle structured outputs with streaming responses?
Two approaches work: buffer the entire stream and validate at completion (simpler, higher latency to first field), or use partial JSON parsing to emit fields as they become complete (more complex, lower latency). For user-facing applications where responsiveness matters, partial parsing is worth the implementation complexity. For backend agent-to-agent communication, buffer-and-validate is simpler and sufficient.
What happens when the LLM cannot fill a required field?
Design your schema so that fields the LLM might not have information for are optional (with None defaults). For truly required fields, the validation retry loop gives the LLM a chance to either find the information or explain why it cannot. If retries exhaust, surface the error to the calling system with context about which field could not be populated, allowing the system to request the missing information from the user.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.