Agentic AI Structured Outputs: JSON Schema Enforcement and Type-Safe Patterns
Enforce structured JSON outputs from agentic AI with schema validation, Pydantic models, retry logic, and streaming structured responses.
Why Structured Outputs Matter for Agents
When an agent calls a tool, it must format the arguments as structured data. When an agent produces a final result — a booking confirmation, an order summary, an analysis report — downstream systems often need that result in a specific format. Free-form text output is useful for human consumption but useless for machine consumption.
Structured outputs transform agent responses from unpredictable prose into reliable, machine-parseable data. This is the foundation for building agentic systems that integrate with APIs, databases, and other software components. Without structured outputs, every agent response requires brittle text parsing that breaks when the model phrases things differently.
This guide covers schema definition, Pydantic integration, retry strategies, streaming structured output, nested objects, and enum constraints.
JSON Schema Definition for Agent Outputs
The first step is defining exactly what structure you expect from the agent. JSON Schema provides a standard way to describe the shape of JSON data.
Basic Schema Definition
from pydantic import BaseModel, Field
from typing import Literal
from datetime import datetime

class FlightBookingResult(BaseModel):
    """Structured result from a flight booking agent."""
    booking_confirmed: bool
    confirmation_code: str | None = Field(
        None,
        description="Airline confirmation code if booking succeeded"
    )
    flight_number: str = Field(
        ...,
        pattern=r"^[A-Z]{2}\d{1,4}$",
        description="Flight number in IATA format (e.g., AA1234)"
    )
    departure_city: str
    arrival_city: str
    departure_time: datetime
    arrival_time: datetime
    total_price: float = Field(..., ge=0)
    currency: Literal["USD", "EUR", "GBP", "CAD"]
    passenger_count: int = Field(..., ge=1, le=9)
    error_message: str | None = None

# Generate JSON Schema for the LLM
schema = FlightBookingResult.model_json_schema()
Schema as Part of the Agent Prompt
Include the expected output schema directly in the agent's system prompt so the model knows exactly what structure to produce.
import json

SYSTEM_PROMPT = f"""You are a flight booking agent. After completing
a booking, return your result as a JSON object matching this schema:

{json.dumps(FlightBookingResult.model_json_schema(), indent=2)}

Always return valid JSON. Do not include any text before or after
the JSON object.
"""
Pydantic Models for Agent Output Validation
Pydantic is the standard library for validating structured agent outputs in Python. It provides automatic type coercion, constraint validation, and clear error messages.
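As a quick illustration of coercion and field-level error reporting, a minimal sketch with a throwaway `Classification` model:

```python
from pydantic import BaseModel, Field, ValidationError

class Classification(BaseModel):
    label: str
    confidence: float = Field(ge=0.0, le=1.0)

# Lax-mode coercion: the string "0.9" becomes the float 0.9
result = Classification(label="billing", confidence="0.9")
print(result.confidence)  # 0.9

# Constraint violations report the offending field in "loc"
try:
    Classification(label="billing", confidence=1.5)
except ValidationError as e:
    print(e.errors()[0]["loc"])  # ('confidence',)
```

That `loc` tuple is what makes automatic retry correction possible: the error message can tell the model exactly which field to fix.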
Validation with Automatic Retry
import json

from pydantic import BaseModel, ValidationError


class StructuredOutputError(Exception):
    """Raised when the model cannot produce valid structured output."""


class StructuredOutputParser:
    def __init__(self, model_class: type[BaseModel], max_retries: int = 3):
        self.model_class = model_class
        self.max_retries = max_retries

    async def parse_response(
        self,
        llm_client,
        messages: list[dict],
        system_prompt: str,
    ) -> BaseModel:
        last_error = None
        for attempt in range(self.max_retries):
            if attempt > 0:
                # Add correction message for retries
                messages.append({
                    "role": "user",
                    "content": (
                        f"Your previous response had a validation error: "
                        f"{last_error}. Please fix the JSON and try again. "
                        f"Return only valid JSON matching the schema."
                    ),
                })
            response = await llm_client.chat(
                system=system_prompt,
                messages=messages,
            )
            try:
                # Extract JSON from response
                json_str = self._extract_json(response)
                parsed = json.loads(json_str)
                return self.model_class(**parsed)
            except json.JSONDecodeError as e:
                last_error = f"Invalid JSON: {e}"
                messages.append({"role": "assistant", "content": response})
            except ValidationError as e:
                last_error = self._format_validation_error(e)
                messages.append({"role": "assistant", "content": response})
        raise StructuredOutputError(
            f"Failed to get valid structured output after "
            f"{self.max_retries} attempts. Last error: {last_error}"
        )

    def _extract_json(self, text: str) -> str:
        """Extract JSON from an LLM response that may include surrounding text."""
        # Try to find a fenced JSON block first
        if "```json" in text:
            start = text.index("```json") + len("```json")
            end = text.index("```", start)
            return text[start:end].strip()
        # Fall back to the outermost braces of a raw JSON object
        brace_start = text.find("{")
        brace_end = text.rfind("}") + 1
        if brace_start != -1 and brace_end > brace_start:
            return text[brace_start:brace_end]
        return text.strip()

    def _format_validation_error(self, error: ValidationError) -> str:
        issues = []
        for err in error.errors():
            field = " -> ".join(str(p) for p in err["loc"])
            issues.append(f"Field '{field}': {err['msg']}")
        return "; ".join(issues)
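To see the retry flow end to end without a real model, here is a compressed, self-contained sketch of the same pattern using a stubbed client. The `StubLLM` class, its canned replies, and the `CityQuote` model are illustrative, not part of any SDK:

```python
import asyncio
import json

from pydantic import BaseModel, ValidationError

class CityQuote(BaseModel):
    city: str
    price: float

class StubLLM:
    """Stand-in client: the first reply is malformed, the second is valid."""
    def __init__(self, replies: list[str]):
        self._replies = iter(replies)

    async def chat(self, system: str, messages: list[dict]) -> str:
        return next(self._replies)

async def parse_with_retry(client: StubLLM, max_retries: int = 3) -> CityQuote:
    messages = [{"role": "user", "content": "Quote a flight to Lisbon."}]
    last_error = None
    for attempt in range(max_retries):
        if attempt > 0:
            messages.append({
                "role": "user",
                "content": f"Validation error: {last_error}. Return only valid JSON.",
            })
        response = await client.chat(system="...", messages=messages)
        try:
            return CityQuote(**json.loads(response))
        except (json.JSONDecodeError, ValidationError) as e:
            last_error = str(e)
            messages.append({"role": "assistant", "content": response})
    raise RuntimeError(f"No valid output after {max_retries} attempts: {last_error}")

quote = asyncio.run(parse_with_retry(
    StubLLM(["{'city': 'Lisbon'}",                   # invalid: single quotes
             '{"city": "Lisbon", "price": 420.0}'])  # valid on retry
))
print(quote.price)  # 420.0
```

The first reply fails `json.loads`, the correction message is appended, and the second attempt parses cleanly.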
Using Native Structured Output APIs
Modern LLM APIs increasingly support structured output natively, constraining the model to produce valid JSON that matches a specific schema. This largely eliminates the need for client-side retry loops.
OpenAI Structured Outputs
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def get_structured_output(
    messages: list[dict],
    response_model: type[BaseModel],
) -> BaseModel:
    response = await client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=messages,
        response_format=response_model,
    )
    return response.choices[0].message.parsed
Anthropic Tool-Based Structured Output
Claude does not have a native structured output mode, but you can achieve the same effect with tool definitions: define a tool whose parameters match your desired output schema, then force the model to call that tool via tool_choice.
import anthropic

anthropic_client = anthropic.AsyncAnthropic()

async def get_structured_from_claude(
    messages: list[dict],
    output_schema: dict,
    schema_name: str = "format_response",
) -> dict:
    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        tools=[
            {
                "name": schema_name,
                "description": "Format the response as structured data",
                "input_schema": output_schema,
            }
        ],
        tool_choice={"type": "tool", "name": schema_name},
        messages=messages,
    )
    for block in response.content:
        if block.type == "tool_use":
            return block.input
    raise ValueError("No structured output in response")
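Because the tool_use block's input comes back as a plain dict, it pairs naturally with Pydantic: generate the tool's input_schema from a model and validate the dict on the way out. A minimal sketch (the `TicketResult` model and the `raw` dict are illustrative):

```python
from pydantic import BaseModel, Field, ValidationError

class TicketResult(BaseModel):
    category: str
    confidence: float = Field(ge=0.0, le=1.0)

# Pass the model's generated schema as the tool's input_schema
output_schema = TicketResult.model_json_schema()

# Validate the dict returned from the tool_use block
raw = {"category": "billing", "confidence": 0.92}  # e.g. block.input
try:
    result = TicketResult(**raw)
except ValidationError:
    raise  # or feed the error back into a retry loop

print(result.category)  # billing
```

This keeps a single Pydantic model as the source of truth for both the schema the model sees and the validation you perform.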
Streaming Structured Output
Streaming structured output is challenging because you receive the JSON token by token, and the output is not valid JSON until the stream completes. Two approaches handle this.
Buffer and Validate
Accumulate tokens until the stream completes, then validate the full JSON. This is simpler but provides no incremental output to the client.
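A minimal sketch of buffer-and-validate, assuming an illustrative `Summary` model and an async token stream:

```python
import asyncio
import json

from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    score: float

async def buffer_and_validate(token_stream) -> Summary:
    """Accumulate every streamed token, then parse and validate once."""
    buffer = ""
    async for token in token_stream:
        buffer += token
    return Summary(**json.loads(buffer))

async def demo_stream():
    # Chunks split mid-key and mid-value, as a real token stream would be
    for chunk in ['{"title": "Q3 rep', 'ort", "sco', 're": 0.87}']:
        yield chunk

summary = asyncio.run(buffer_and_validate(demo_stream()))
print(summary.score)  # 0.87
```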
Partial JSON Streaming
Parse partial JSON as it arrives and emit validated fields as they become complete. Libraries like partial-json-parser handle incomplete JSON objects.
import partial_json_parser
from pydantic import BaseModel, TypeAdapter, ValidationError

class StreamingStructuredParser:
    def __init__(self, model_class: type[BaseModel]):
        self.model_class = model_class
        self.buffer = ""
        self.emitted_fields: set[str] = set()

    async def process_token(self, token: str) -> dict | None:
        self.buffer += token
        try:
            partial = partial_json_parser.loads(self.buffer)
        except Exception:
            return None
        if not isinstance(partial, dict):
            return None
        # Check for newly complete fields
        new_fields = {}
        for field_name, field_info in self.model_class.model_fields.items():
            if field_name in partial and field_name not in self.emitted_fields:
                try:
                    # Validate the individual field against its annotation.
                    # Note: string fields may validate before they finish
                    # streaming, since any prefix is still a valid str.
                    TypeAdapter(field_info.annotation).validate_python(
                        partial[field_name]
                    )
                    new_fields[field_name] = partial[field_name]
                    self.emitted_fields.add(field_name)
                except ValidationError:
                    pass  # Field value not yet complete
        return new_fields if new_fields else None
Nested Object Handling
Real-world agent outputs often have deeply nested structures. A customer analysis might include nested order histories, each containing nested line items.
class LineItem(BaseModel):
    product_id: str
    product_name: str
    quantity: int = Field(..., ge=1)
    unit_price: float = Field(..., ge=0)
    total_price: float = Field(..., ge=0)

class Order(BaseModel):
    order_id: str
    order_date: datetime
    status: Literal["pending", "shipped", "delivered", "cancelled"]
    items: list[LineItem] = Field(..., min_length=1)
    subtotal: float = Field(..., ge=0)
    tax: float = Field(..., ge=0)
    total: float = Field(..., ge=0)

class CustomerAnalysis(BaseModel):
    customer_id: str
    customer_name: str
    lifetime_value: float
    order_count: int
    recent_orders: list[Order] = Field(..., max_length=10)
    risk_level: Literal["low", "medium", "high"]
    recommended_actions: list[str]
    analysis_summary: str
When the LLM generates nested structures, validation errors often occur deep in the hierarchy. Pydantic's error locations (the "loc" field) tell you exactly where the error is — for example, "recent_orders -> 2 -> items -> 0 -> quantity" — which helps the agent self-correct on retry.
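A small demonstration of how a nested failure surfaces in "loc", using throwaway models:

```python
from pydantic import BaseModel, Field, ValidationError

class Item(BaseModel):
    quantity: int = Field(ge=1)

class Order(BaseModel):
    items: list[Item]

try:
    Order(items=[{"quantity": 3}, {"quantity": 0}])  # second item invalid
except ValidationError as e:
    loc = e.errors()[0]["loc"]
    print(" -> ".join(str(p) for p in loc))  # items -> 1 -> quantity
```

Formatting that path into the correction message pinpoints the exact list index and field the model needs to fix.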
Enum Constraints and Controlled Vocabularies
Enums prevent the agent from inventing values. Without enum constraints, an agent asked about sentiment might return "positive", "good", "favorable", or "thumbs up" — all meaning the same thing but impossible to process programmatically.
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class TicketClassification(BaseModel):
    category: Literal[
        "billing", "technical", "account", "feature_request", "complaint"
    ]
    sentiment: Sentiment
    priority: Priority
    requires_human: bool
    confidence: float = Field(..., ge=0.0, le=1.0)
    reasoning: str = Field(
        ...,
        max_length=500,
        description="Brief explanation of the classification"
    )
By constraining the output to enumerated values, every downstream system knows exactly what values to expect. Dashboards, analytics pipelines, routing logic, and SLA calculations all work reliably because the agent cannot produce unexpected values.
Best Practices for Production Structured Outputs
Always validate, even with native structured output APIs. Models can still produce logically invalid data (a total that does not match the sum of line items) even when the JSON structure is valid.
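Semantic checks like that can live in the model itself via a model_validator, so they run inside the same retry loop as structural validation. A sketch with an illustrative `Invoice` model:

```python
from pydantic import BaseModel, Field, model_validator

class Invoice(BaseModel):
    line_totals: list[float]
    total: float = Field(ge=0)

    @model_validator(mode="after")
    def total_matches_lines(self) -> "Invoice":
        # Structural validation passes even if the arithmetic is wrong;
        # catch that here so a retry loop can correct it.
        if abs(self.total - sum(self.line_totals)) > 0.01:
            raise ValueError(
                f"total {self.total} does not match sum of "
                f"line items {sum(self.line_totals)}"
            )
        return self
```

A mismatched total now raises a ValidationError just like a missing field would, so the same correction prompt handles both cases.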
Use descriptive field descriptions. The field name and description in your Pydantic model are part of the effective prompt. A field named "x" with no description will be filled arbitrarily. A field named "customer_satisfaction_score" with description "Score from 1-10 based on conversation tone" will be filled meaningfully.
Set appropriate defaults. For optional fields, use None as the default rather than empty strings or zeros. This makes it clear when the agent did not have information for a field versus when it intentionally set a value.
Log failed validations. Track how often structured output validation fails and which fields cause failures. This data guides prompt improvements and schema adjustments.
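One lightweight way to do this is a counter keyed by field path, incremented on every failure (a sketch; the logger name and helper function are illustrative):

```python
import logging
from collections import Counter

from pydantic import BaseModel, Field, ValidationError

logger = logging.getLogger("structured_output")
field_failures: Counter = Counter()

def record_validation_failure(error: ValidationError) -> None:
    """Log each failing field and tally it for later schema/prompt tuning."""
    for err in error.errors():
        field = " -> ".join(str(p) for p in err["loc"])
        field_failures[field] += 1
        logger.warning("validation failed at %s: %s", field, err["msg"])

# Example: tally a failure from a throwaway model
class Demo(BaseModel):
    score: float = Field(ge=0)

try:
    Demo(score=-1)
except ValidationError as e:
    record_validation_failure(e)

print(field_failures["score"])  # 1
```

Periodically reviewing the top entries in the counter shows which fields the model struggles with most.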
Frequently Asked Questions
What is the difference between structured outputs and function calling?
Function calling is the mechanism LLMs use to invoke tools — the model produces structured arguments for a specific function. Structured outputs are the mechanism for getting the model's final response in a specific format. They use the same underlying JSON schema technology but serve different purposes. Function calling triggers actions; structured outputs format results.
How reliable are native structured output APIs?
OpenAI's structured output mode uses constrained decoding, so the JSON matches your schema in virtually all cases. Claude's tool-based approach is highly reliable but not formally guaranteed. In both cases the JSON will be structurally valid, but the values may still be semantically incorrect (a price of -50 in a field that should be positive). Always add semantic validation on top of structural validation.
Should I use Pydantic v1 or v2 for agent output schemas?
Use Pydantic v2. It is significantly faster (5-50x for validation), has better JSON Schema generation, and is the actively maintained version. Pydantic v1 is in maintenance mode. If you are on an older codebase using v1, the migration is straightforward and well-documented.
How do you handle structured outputs with streaming responses?
Two approaches work: buffer the entire stream and validate at completion (simpler, higher latency to first field), or use partial JSON parsing to emit fields as they become complete (more complex, lower latency). For user-facing applications where responsiveness matters, partial parsing is worth the implementation complexity. For backend agent-to-agent communication, buffer-and-validate is simpler and sufficient.
What happens when the LLM cannot fill a required field?
Design your schema so that fields the LLM might not have information for are optional (with None defaults). For truly required fields, the validation retry loop gives the LLM a chance to either find the information or explain why it cannot. If retries exhaust, surface the error to the calling system with context about which field could not be populated, allowing the system to request the missing information from the user.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.