Pydantic Models for LLM Output: Type-Safe AI Responses in Python
Learn how to use Pydantic BaseModel, Field validators, and nested models to parse and validate LLM responses into type-safe Python objects. Build AI pipelines that fail fast with clear errors instead of breaking silently on malformed output.
Why Type Safety Matters for LLM Outputs
Large language models return strings. Sometimes that string is valid JSON, sometimes it is almost-valid JSON with trailing commas, and sometimes the model ignores your formatting instructions entirely. If your application blindly calls json.loads() on raw LLM output, you are one creative hallucination away from a runtime crash.
Pydantic solves this by letting you define a Python class that describes exactly what your data should look like. When you parse LLM output through a Pydantic model, you get automatic type coercion, validation, and clear error messages when the data does not match your expectations.
Defining a Basic Output Model
Start with a simple model that describes a structured answer from an LLM:
from pydantic import BaseModel, Field
from typing import List, Optional
class AnalysisResult(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence score")
    key_phrases: List[str] = Field(description="Important phrases from the text")
    summary: Optional[str] = Field(default=None, description="Brief summary")
The Field function adds constraints and descriptions. The ge and le parameters enforce that confidence stays between 0 and 1. The description strings serve double duty: they document your code and they can be fed back to the LLM as schema instructions.
Parsing Raw LLM Responses
Here is how you parse a JSON string from an LLM into your model:
import json
raw_response = '''
{
    "sentiment": "positive",
    "confidence": 0.92,
    "key_phrases": ["excellent product", "fast shipping"],
    "summary": "Customer is satisfied with purchase."
}
'''
result = AnalysisResult.model_validate_json(raw_response)
print(result.sentiment) # "positive"
print(result.confidence) # 0.92
print(result.key_phrases) # ["excellent product", "fast shipping"]
If the LLM returns a confidence of 1.5, Pydantic raises a ValidationError with a clear message explaining the constraint violation. No silent failures.
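To see that failure mode concretely, here is a minimal sketch (with a trimmed-down AnalysisResult so the snippet runs standalone) feeding a hypothetical out-of-range payload through the model:

```python
from pydantic import BaseModel, Field, ValidationError
from typing import List, Optional

class AnalysisResult(BaseModel):
    sentiment: str
    confidence: float = Field(ge=0.0, le=1.0)
    key_phrases: List[str]
    summary: Optional[str] = None

# Hypothetical malformed response: confidence exceeds the le=1.0 bound
bad_response = '{"sentiment": "positive", "confidence": 1.5, "key_phrases": []}'

try:
    AnalysisResult.model_validate_json(bad_response)
except ValidationError as e:
    # Each error names the offending field and the violated constraint
    print(e.errors()[0]["loc"])   # ('confidence',)
    print(e.errors()[0]["type"])  # 'less_than_equal'
```

The structured `errors()` list is what makes retry loops practical: you can feed the exact constraint violation back to the model.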
Nested Models for Complex Structures
Real-world extraction often requires nested data. Define models that compose together:
class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str = Field(pattern=r"^\d{5}(-\d{4})?$")

class Person(BaseModel):
    name: str
    age: Optional[int] = Field(default=None, ge=0, le=150)
    email: Optional[str] = None
    address: Optional[Address] = None

class ExtractionResult(BaseModel):
    people: List[Person]
    document_type: str
    extraction_confidence: float = Field(ge=0.0, le=1.0)
When you call ExtractionResult.model_validate_json(llm_output), Pydantic recursively validates every nested object. The zip code regex runs automatically. Ages outside 0-150 are rejected.
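A runnable sketch of that recursive validation, repeating the models above and using a hypothetical sample payload:

```python
from pydantic import BaseModel, Field, ValidationError
from typing import List, Optional

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str = Field(pattern=r"^\d{5}(-\d{4})?$")

class Person(BaseModel):
    name: str
    age: Optional[int] = Field(default=None, ge=0, le=150)
    email: Optional[str] = None
    address: Optional[Address] = None

class ExtractionResult(BaseModel):
    people: List[Person]
    document_type: str
    extraction_confidence: float = Field(ge=0.0, le=1.0)

# Hypothetical LLM output for a document-extraction task
llm_output = '''
{
    "people": [{"name": "Jane Doe", "age": 34,
                "address": {"street": "1 Main St", "city": "Springfield",
                            "state": "IL", "zip_code": "62704"}}],
    "document_type": "lease",
    "extraction_confidence": 0.88
}
'''

result = ExtractionResult.model_validate_json(llm_output)
print(result.people[0].address.zip_code)  # 62704

# The zip-code regex fires at the nested level, too
try:
    Address(street="1 Main St", city="Springfield", state="IL", zip_code="ABCDE")
except ValidationError as e:
    print(e.errors()[0]["loc"])  # ('zip_code',)
```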
Custom Validators for Domain Logic
Add custom validators when built-in constraints are not enough:
from pydantic import field_validator, model_validator
class InvoiceItem(BaseModel):
    description: str
    quantity: int = Field(gt=0)
    unit_price: float = Field(gt=0)
    total: float

    @field_validator("description")
    @classmethod
    def description_not_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("Description cannot be blank")
        return v.strip()

    @model_validator(mode="after")
    def check_total(self) -> "InvoiceItem":
        expected = round(self.quantity * self.unit_price, 2)
        if abs(self.total - expected) > 0.01:
            raise ValueError(
                f"Total {self.total} does not match "
                f"quantity * unit_price = {expected}"
            )
        return self
The field_validator runs on a single field. The model_validator with mode="after" runs after all fields are parsed, so you can do cross-field checks like verifying that the total equals quantity times price.
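Exercising the cross-field check with a hypothetical mismatched line item (the model is repeated here so the snippet runs standalone):

```python
from pydantic import BaseModel, Field, ValidationError, field_validator, model_validator

class InvoiceItem(BaseModel):
    description: str
    quantity: int = Field(gt=0)
    unit_price: float = Field(gt=0)
    total: float

    @field_validator("description")
    @classmethod
    def description_not_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("Description cannot be blank")
        return v.strip()

    @model_validator(mode="after")
    def check_total(self) -> "InvoiceItem":
        expected = round(self.quantity * self.unit_price, 2)
        if abs(self.total - expected) > 0.01:
            raise ValueError(f"Total {self.total} != {expected}")
        return self

# 3 * 9.99 = 29.97, so a claimed total of 31.00 fails the after-validator
try:
    InvoiceItem(description="Widget", quantity=3, unit_price=9.99, total=31.00)
except ValidationError as e:
    print(e.errors()[0]["msg"])  # the message names both values: 31.0 vs 29.97
```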
Generating JSON Schema for the LLM Prompt
One of Pydantic's most powerful features is automatic JSON schema generation. Pass the schema directly to the LLM so it knows exactly what to produce:
schema = AnalysisResult.model_json_schema()
print(json.dumps(schema, indent=2))
prompt = f"""Analyze the following customer review and return your
analysis as JSON matching this exact schema:
{json.dumps(schema, indent=2)}
Review: "The product arrived quickly and works perfectly."
"""
This creates a tight feedback loop: the model sees the schema, generates matching JSON, and Pydantic validates the result. If validation fails, you can retry with the error message included in the prompt.
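That retry loop can be sketched as below. `call_llm` is a hypothetical stand-in for your provider's SDK call (stubbed here to return fixed JSON so the snippet runs); the key idea is appending the `ValidationError` text to the prompt before retrying:

```python
import json
from pydantic import BaseModel, Field, ValidationError
from typing import List, Optional

class AnalysisResult(BaseModel):
    sentiment: str
    confidence: float = Field(ge=0.0, le=1.0)
    key_phrases: List[str]
    summary: Optional[str] = None

def call_llm(prompt: str) -> str:
    # Stub: a real implementation would call your LLM provider here
    return '{"sentiment": "positive", "confidence": 0.92, "key_phrases": []}'

def analyze_with_retry(review: str, max_attempts: int = 3) -> AnalysisResult:
    schema = json.dumps(AnalysisResult.model_json_schema(), indent=2)
    prompt = f"Return JSON matching this schema:\n{schema}\n\nReview: {review}"
    last_error = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            return AnalysisResult.model_validate_json(raw)
        except ValidationError as e:
            last_error = e
            # Feed the validation errors back so the model can self-correct
            prompt += f"\n\nYour last reply was invalid:\n{e}\nTry again."
    raise ValueError(f"Validation failed after {max_attempts} attempts: {last_error}")

result = analyze_with_retry("The product arrived quickly and works perfectly.")
print(result.sentiment)  # positive
```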
Handling Partial and Malformed Output
LLMs sometimes return JSON wrapped in markdown code fences or with extra text. Write a helper to clean up common issues:
import re
def parse_llm_json(raw: str, model_class: type[BaseModel]):
    """Extract JSON from LLM output and parse with Pydantic."""
    # Strip markdown code fences (opening ```json or ```, and closing ```)
    cleaned = re.sub(r"```(?:json)?\n?", "", raw)
    cleaned = cleaned.strip()
    try:
        return model_class.model_validate_json(cleaned)
    except Exception as e:
        # Fall back to Python-literal parsing, which tolerates trailing
        # commas and single quotes (but not JSON's true/false/null)
        try:
            import ast
            data = ast.literal_eval(cleaned)
            return model_class.model_validate(data)
        except (ValueError, SyntaxError):
            raise ValueError(f"Could not parse LLM output: {e}")
This two-stage approach handles the most common failure modes: markdown wrapping and minor JSON syntax issues.
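A quick check of the helper against fenced output (the helper and a minimal hypothetical `Answer` model are repeated here so the snippet runs standalone):

```python
import re
from pydantic import BaseModel

class Answer(BaseModel):
    verdict: str

def parse_llm_json(raw: str, model_class: type[BaseModel]):
    # Strip opening ```json / ``` fences and the closing fence
    cleaned = re.sub(r"```(?:json)?\n?", "", raw).strip()
    return model_class.model_validate_json(cleaned)

# Typical LLM habit: wrapping the JSON in a markdown code fence
fenced = '```json\n{"verdict": "approved"}\n```'
print(parse_llm_json(fenced, Answer).verdict)  # approved
```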
FAQ
How does Pydantic v2 differ from v1 for LLM output parsing?
Pydantic v2 introduced model_validate_json() which parses JSON strings directly without an intermediate json.loads() call. It is also significantly faster thanks to the Rust-based core. Use model_validate() for dictionaries and model_validate_json() for raw JSON strings.
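The distinction in two lines, using a hypothetical `Item` model:

```python
from pydantic import BaseModel

class Item(BaseModel):
    name: str

a = Item.model_validate({"name": "widget"})         # dict input
b = Item.model_validate_json('{"name": "widget"}')  # raw JSON string
```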
What happens when the LLM returns fields not in my schema?
By default, Pydantic v2 ignores extra fields. If you want strict parsing, add model_config = ConfigDict(extra="forbid") to your model class. This causes validation to fail if the LLM includes unexpected fields.
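A minimal sketch of strict mode, using a hypothetical `StrictResult` model and an unexpected `vibe` field:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictResult(BaseModel):
    model_config = ConfigDict(extra="forbid")
    sentiment: str

try:
    StrictResult.model_validate({"sentiment": "positive", "vibe": "great"})
except ValidationError as e:
    print(e.errors()[0]["type"])  # extra_forbidden
```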
Can I use Pydantic models with streaming LLM responses?
Not directly, because streaming delivers partial JSON that is not valid until complete. You need a partial JSON parser to handle incremental tokens. Libraries like instructor handle this by buffering the stream and validating once the JSON object is complete.
#Pydantic #StructuredOutputs #Python #TypeSafety #LLM #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.