LangChain Output Parsers: Pydantic, JSON, and Structured Output Parsing
Learn how to extract structured data from LLM responses using LangChain output parsers — Pydantic models, JSON parsing, format instructions, and retry parsers for robust extraction.
Why Structured Output Matters
LLMs produce free-form text by default. But downstream code needs structured data — objects, lists, dictionaries, typed fields. Output parsers bridge this gap by defining an expected schema, generating format instructions for the prompt, and parsing the LLM's response into the target structure.
Without structured parsing, you end up writing fragile regex or string-splitting logic that breaks when the model changes phrasing. LangChain's parsers standardize this process and include retry mechanisms for when the model produces malformed output.
The with_structured_output Approach
Modern LangChain models support with_structured_output(), which uses the model's native structured output capability (function calling or JSON mode) rather than text parsing.
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    rating: float = Field(description="Rating from 0 to 10")
    summary: str = Field(description="One sentence summary")
    recommended: bool = Field(description="Whether you recommend this movie")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(MovieReview)
result = structured_llm.invoke("Review the movie Inception")
print(type(result)) # <class 'MovieReview'>
print(result.title) # "Inception"
print(result.rating) # 8.5
print(result.recommended) # True
This is the recommended approach for models that support it. The Pydantic schema is converted to a function/tool schema, and the model returns structured JSON that is automatically parsed.
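To make the conversion concrete, here is a hand-written sketch of the kind of JSON Schema a Pydantic model like MovieReview is turned into before being sent as a tool definition. The field names mirror the class above; the exact wire format varies by provider and LangChain version.

```python
# Hand-written sketch of the tool schema derived from MovieReview.
# Real conversion is done by LangChain from the Pydantic model's JSON Schema.
movie_review_schema = {
    "name": "MovieReview",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "The movie title"},
            "rating": {"type": "number", "description": "Rating from 0 to 10"},
            "summary": {"type": "string", "description": "One sentence summary"},
            "recommended": {"type": "boolean", "description": "Whether you recommend this movie"},
        },
        "required": ["title", "rating", "summary", "recommended"],
    },
}
print(sorted(movie_review_schema["parameters"]["properties"]))
```

The model is constrained to emit arguments matching this schema, which is why no text parsing is needed afterward.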
PydanticOutputParser
For models without native structured output, the PydanticOutputParser adds format instructions to the prompt and parses the text response.
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
class Recipe(BaseModel):
    name: str = Field(description="Name of the recipe")
    ingredients: list[str] = Field(description="List of ingredients")
    prep_time_minutes: int = Field(description="Preparation time in minutes")
    difficulty: str = Field(description="Easy, Medium, or Hard")
parser = PydanticOutputParser(pydantic_object=Recipe)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful cooking assistant."),
    ("human", "Give me a recipe for {dish}.\n\n{format_instructions}"),
])
chain = prompt.partial(
    format_instructions=parser.get_format_instructions()
) | llm | parser
recipe = chain.invoke({"dish": "pasta carbonara"})
print(recipe.name)
print(recipe.ingredients)
print(recipe.prep_time_minutes)
parser.get_format_instructions() returns a string that tells the model exactly what JSON structure to produce. The parser then validates the response against the Pydantic model.
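A stdlib-only sketch of what happens after the model replies: the parser pulls the JSON payload out of the text (models often wrap it in a ```json fence) and checks it against the expected fields. The real parser validates with the Pydantic model rather than these manual checks, so this is illustrative, not LangChain's actual implementation.

```python
import json
import re

# Fields expected by the Recipe schema above.
REQUIRED = {"name", "ingredients", "prep_time_minutes", "difficulty"}

def parse_recipe(text: str) -> dict:
    # Extract JSON from a markdown fence if present, else treat the whole
    # reply as JSON — roughly what format-instruction parsing relies on.
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text
    data = json.loads(payload)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

reply = """```json
{"name": "Pasta Carbonara", "ingredients": ["spaghetti", "eggs"],
 "prep_time_minutes": 25, "difficulty": "Medium"}
```"""
recipe = parse_recipe(reply)
print(recipe["name"])  # Pasta Carbonara
```

When validation fails, LangChain's parser raises an OutputParserException instead of a plain ValueError.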
JsonOutputParser
When you want raw dictionaries instead of Pydantic objects, use JsonOutputParser.
from langchain_core.output_parsers import JsonOutputParser
parser = JsonOutputParser()
# Reuse the recipe prompt; JsonOutputParser supplies its own format instructions.
chain = prompt.partial(
    format_instructions=parser.get_format_instructions()
) | llm | parser
result = chain.invoke({"dish": "tacos"})
print(type(result)) # <class 'dict'>
You can optionally provide a Pydantic model for format instructions without strict validation:
parser = JsonOutputParser(pydantic_object=Recipe)
# Generates format instructions but returns a dict, not a Recipe object
StrOutputParser and CommaSeparatedListOutputParser
For simpler outputs, use lightweight parsers.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.output_parsers import CommaSeparatedListOutputParser
# Plain string
str_parser = StrOutputParser()
result = str_parser.invoke(ai_message) # "Just the text content"
# Comma-separated list
list_parser = CommaSeparatedListOutputParser()
chain = prompt | llm | list_parser
result = chain.invoke({"topic": "Python frameworks"})
# ["Django", "Flask", "FastAPI", "LangChain"]
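The list parser's behavior is easy to picture: split the raw model text on commas and trim whitespace. The sketch below is a stdlib stand-in for that logic, not LangChain's implementation — the real parser also ships format instructions telling the model to answer as a single comma-separated line.

```python
def parse_comma_list(text: str) -> list[str]:
    # Split on commas, strip surrounding whitespace, drop empty entries.
    return [item.strip() for item in text.strip().split(",") if item.strip()]

print(parse_comma_list("Django, Flask, FastAPI, LangChain"))
# ['Django', 'Flask', 'FastAPI', 'LangChain']
```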
Output-Fixing and Retry Parsers
LLMs sometimes produce invalid output. Retry parsers automatically fix these failures.
from langchain.output_parsers import OutputFixingParser, RetryOutputParser
from langchain_openai import ChatOpenAI
base_parser = PydanticOutputParser(pydantic_object=Recipe)
# Option 1: Use another LLM call to fix malformed output
fixing_parser = OutputFixingParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),
)
# If the base parser fails, the fixing parser sends the bad output
# to the LLM with instructions to fix the formatting
result = fixing_parser.parse(bad_output_string)
OutputFixingParser receives the malformed output and asks the LLM to reformat it. RetryOutputParser goes further by resending the original prompt along with the failing output, giving the LLM full context to produce a corrected response. Because it needs that original prompt, you call retry_parser.parse_with_prompt(completion, prompt_value) rather than parse().
retry_parser = RetryOutputParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),
    max_retries=2,
)
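The retry pattern itself is simple: try to parse, and on failure re-invoke the model with the error attached. The sketch below shows that loop with a stubbed fake_llm standing in for a real LLM call — a purely illustrative stand-in, not LangChain's internals.

```python
import json

def fake_llm(prompt: str) -> str:
    # Illustrative stub: "fixes" its output once it sees an error message.
    return '{"name": "Tacos", "prep_time_minutes": 20}' if "Error" in prompt else "not json"

def parse_with_retry(prompt: str, max_retries: int = 2) -> dict:
    completion = fake_llm(prompt)
    for _ in range(max_retries):
        try:
            return json.loads(completion)
        except json.JSONDecodeError as err:
            # Resend the original prompt plus the failure, like RetryOutputParser.
            completion = fake_llm(f"{prompt}\nError: {err}\nBad output: {completion}")
    return json.loads(completion)

result = parse_with_retry("Give me a recipe as JSON")
print(result["name"])  # Tacos
```

Each retry costs an extra LLM call, so cap max_retries to bound latency and spend.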
Enum and Datetime Parsers
LangChain includes specialized parsers for common types.
from langchain.output_parsers import EnumOutputParser
from enum import Enum
class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
parser = EnumOutputParser(enum=Sentiment)
result = parser.parse("positive")
print(result) # Sentiment.POSITIVE
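Conceptually, the enum parser maps the model's raw text onto an Enum member and fails loudly on anything outside the allowed values. Here is a stdlib sketch of that behavior (the normalization and error message are assumptions, not EnumOutputParser's exact logic):

```python
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

def parse_sentiment(text: str) -> Sentiment:
    # Normalize whitespace and case before matching against the enum values.
    value = text.strip().lower()
    try:
        return Sentiment(value)
    except ValueError:
        raise ValueError(f"'{text}' is not one of {[s.value for s in Sentiment]}")

print(parse_sentiment(" Positive\n").name)  # POSITIVE
```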
Composing Parsers in LCEL
Parsers are runnables, so they integrate seamlessly into LCEL chains.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
class Analysis(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(description="Confidence score 0-1")
    key_phrases: list[str] = Field(description="Important phrases")
parser = PydanticOutputParser(pydantic_object=Analysis)
chain = (
    ChatPromptTemplate.from_template(
        "Analyze this text: {text}\n{format_instructions}"
    ).partial(format_instructions=parser.get_format_instructions())
    | ChatOpenAI(model="gpt-4o-mini")
    | parser
)
analysis = chain.invoke({"text": "The product quality is outstanding!"})
print(analysis.sentiment) # "positive"
print(analysis.confidence) # 0.95
FAQ
Should I use with_structured_output or PydanticOutputParser?
Use with_structured_output() whenever the model supports it — it is more reliable because the model returns structured JSON natively rather than embedding JSON in free text. Fall back to PydanticOutputParser for models that lack native structured output support.
What happens when the LLM ignores format instructions?
The parser raises an OutputParserException. Wrap your parser with OutputFixingParser or RetryOutputParser to handle these failures automatically. Alternatively, with_structured_output avoids this issue entirely by constraining the output format at the API level.
Can I parse streaming output into structured objects?
Yes, if the model supports streaming structured output. Use JsonOutputParser with chain.stream() to receive partial JSON objects as they are generated. For Pydantic parsing, you typically need the full response before validation can occur.