LangChain Output Parsers: Pydantic, JSON, and Structured Output Parsing
Learn how to extract structured data from LLM responses using LangChain output parsers — Pydantic models, JSON parsing, format instructions, and retry parsers for robust extraction.
Why Structured Output Matters
LLMs produce free-form text by default. But downstream code needs structured data — objects, lists, dictionaries, typed fields. Output parsers bridge this gap by defining an expected schema, generating format instructions for the prompt, and parsing the LLM's response into the target structure.
Without structured parsing, you end up writing fragile regex or string-splitting logic that breaks when the model changes phrasing. LangChain's parsers standardize this process and include retry mechanisms for when the model produces malformed output.
The with_structured_output Approach
Modern LangChain models support with_structured_output(), which uses the model's native structured output capability (function calling or JSON mode) rather than text parsing.
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    rating: float = Field(description="Rating from 0 to 10")
    summary: str = Field(description="One sentence summary")
    recommended: bool = Field(description="Whether you recommend this movie")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(MovieReview)
result = structured_llm.invoke("Review the movie Inception")
print(type(result)) # <class 'MovieReview'>
print(result.title) # "Inception"
print(result.rating) # 8.5
print(result.recommended) # True
This is the recommended approach for models that support it. The Pydantic schema is converted to a function/tool schema, and the model returns structured JSON that is automatically parsed.
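To make the conversion concrete, here is a hand-written sketch of the kind of JSON Schema a Pydantic model like MovieReview is turned into before being sent as a tool definition. The field names mirror the class above; the exact wire format varies by provider and LangChain version.

```python
# Hand-written sketch of the tool schema derived from MovieReview.
# Real conversion is done by LangChain from the Pydantic model's JSON Schema.
movie_review_schema = {
    "name": "MovieReview",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "The movie title"},
            "rating": {"type": "number", "description": "Rating from 0 to 10"},
            "summary": {"type": "string", "description": "One sentence summary"},
            "recommended": {"type": "boolean", "description": "Whether you recommend this movie"},
        },
        "required": ["title", "rating", "summary", "recommended"],
    },
}
print(sorted(movie_review_schema["parameters"]["properties"]))
```

The model is constrained to emit arguments matching this schema, which is why no text parsing is needed afterward.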
PydanticOutputParser
For models without native structured output, the PydanticOutputParser adds format instructions to the prompt and parses the text response.
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
class Recipe(BaseModel):
    name: str = Field(description="Name of the recipe")
    ingredients: list[str] = Field(description="List of ingredients")
    prep_time_minutes: int = Field(description="Preparation time in minutes")
    difficulty: str = Field(description="Easy, Medium, or Hard")
parser = PydanticOutputParser(pydantic_object=Recipe)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful cooking assistant."),
    ("human", "Give me a recipe for {dish}.\n\n{format_instructions}"),
])
chain = prompt.partial(
    format_instructions=parser.get_format_instructions()
) | llm | parser
recipe = chain.invoke({"dish": "pasta carbonara"})
print(recipe.name)
print(recipe.ingredients)
print(recipe.prep_time_minutes)
parser.get_format_instructions() returns a string that tells the model exactly what JSON structure to produce. The parser then validates the response against the Pydantic model.
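A stdlib-only sketch of what happens after the model replies: the parser pulls the JSON payload out of the text (models often wrap it in a ```json fence) and checks it against the expected fields. The real parser validates with the Pydantic model rather than these manual checks, so this is illustrative, not LangChain's actual implementation.

```python
import json
import re

# Fields expected by the Recipe schema above.
REQUIRED = {"name", "ingredients", "prep_time_minutes", "difficulty"}

def parse_recipe(text: str) -> dict:
    # Extract JSON from a markdown fence if present, else treat the whole
    # reply as JSON — roughly what format-instruction parsing relies on.
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text
    data = json.loads(payload)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

reply = """```json
{"name": "Pasta Carbonara", "ingredients": ["spaghetti", "eggs"],
 "prep_time_minutes": 25, "difficulty": "Medium"}
```"""
recipe = parse_recipe(reply)
print(recipe["name"])  # Pasta Carbonara
```

When validation fails, LangChain's parser raises an OutputParserException instead of a plain ValueError.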
JsonOutputParser
When you want raw dictionaries instead of Pydantic objects, use JsonOutputParser.
from langchain_core.output_parsers import JsonOutputParser
parser = JsonOutputParser()
# Reuse the recipe prompt; JsonOutputParser supplies its own format instructions.
chain = prompt.partial(
    format_instructions=parser.get_format_instructions()
) | llm | parser
result = chain.invoke({"dish": "tacos"})
print(type(result)) # <class 'dict'>
You can optionally provide a Pydantic model for format instructions without strict validation:
parser = JsonOutputParser(pydantic_object=Recipe)
# Generates format instructions but returns a dict, not a Recipe object
StrOutputParser and CommaSeparatedListOutputParser
For simpler outputs, use lightweight parsers.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.output_parsers import CommaSeparatedListOutputParser
# Plain string
str_parser = StrOutputParser()
result = str_parser.invoke(ai_message) # "Just the text content"
# Comma-separated list
list_parser = CommaSeparatedListOutputParser()
chain = prompt | llm | list_parser
result = chain.invoke({"topic": "Python frameworks"})
# ["Django", "Flask", "FastAPI", "LangChain"]
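The list parser's behavior is easy to picture: split the raw model text on commas and trim whitespace. The sketch below is a stdlib stand-in for that logic, not LangChain's implementation — the real parser also ships format instructions telling the model to answer as a single comma-separated line.

```python
def parse_comma_list(text: str) -> list[str]:
    # Split on commas, strip surrounding whitespace, drop empty entries.
    return [item.strip() for item in text.strip().split(",") if item.strip()]

print(parse_comma_list("Django, Flask, FastAPI, LangChain"))
# ['Django', 'Flask', 'FastAPI', 'LangChain']
```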
Output-Fixing and Retry Parsers
LLMs sometimes produce invalid output. Retry parsers automatically fix these failures.
from langchain.output_parsers import OutputFixingParser, RetryOutputParser
from langchain_openai import ChatOpenAI
base_parser = PydanticOutputParser(pydantic_object=Recipe)
# Option 1: Use another LLM call to fix malformed output
fixing_parser = OutputFixingParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),
)
# If the base parser fails, the fixing parser sends the bad output
# to the LLM with instructions to fix the formatting
result = fixing_parser.parse(bad_output_string)
OutputFixingParser receives the malformed output and asks the LLM to reformat it. RetryOutputParser goes further by resending the original prompt along with the failing output, giving the LLM full context to produce a corrected response. Because it needs that original prompt, you call retry_parser.parse_with_prompt(completion, prompt_value) rather than parse().
retry_parser = RetryOutputParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),
    max_retries=2,
)
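The retry pattern itself is simple: try to parse, and on failure re-invoke the model with the error attached. The sketch below shows that loop with a stubbed fake_llm standing in for a real LLM call — a purely illustrative stand-in, not LangChain's internals.

```python
import json

def fake_llm(prompt: str) -> str:
    # Illustrative stub: "fixes" its output once it sees an error message.
    return '{"name": "Tacos", "prep_time_minutes": 20}' if "Error" in prompt else "not json"

def parse_with_retry(prompt: str, max_retries: int = 2) -> dict:
    completion = fake_llm(prompt)
    for _ in range(max_retries):
        try:
            return json.loads(completion)
        except json.JSONDecodeError as err:
            # Resend the original prompt plus the failure, like RetryOutputParser.
            completion = fake_llm(f"{prompt}\nError: {err}\nBad output: {completion}")
    return json.loads(completion)

result = parse_with_retry("Give me a recipe as JSON")
print(result["name"])  # Tacos
```

Each retry costs an extra LLM call, so cap max_retries to bound latency and spend.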
Enum and Datetime Parsers
LangChain includes specialized parsers for common types.
from langchain.output_parsers import EnumOutputParser
from enum import Enum
class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
parser = EnumOutputParser(enum=Sentiment)
result = parser.parse("positive")
print(result) # Sentiment.POSITIVE
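Conceptually, the enum parser maps the model's raw text onto an Enum member and fails loudly on anything outside the allowed values. Here is a stdlib sketch of that behavior (the normalization and error message are assumptions, not EnumOutputParser's exact logic):

```python
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

def parse_sentiment(text: str) -> Sentiment:
    # Normalize whitespace and case before matching against the enum values.
    value = text.strip().lower()
    try:
        return Sentiment(value)
    except ValueError:
        raise ValueError(f"'{text}' is not one of {[s.value for s in Sentiment]}")

print(parse_sentiment(" Positive\n").name)  # POSITIVE
```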
Composing Parsers in LCEL
Parsers are runnables, so they integrate seamlessly into LCEL chains.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
class Analysis(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(description="Confidence score 0-1")
    key_phrases: list[str] = Field(description="Important phrases")
parser = PydanticOutputParser(pydantic_object=Analysis)
chain = (
    ChatPromptTemplate.from_template(
        "Analyze this text: {text}\n{format_instructions}"
    ).partial(format_instructions=parser.get_format_instructions())
    | ChatOpenAI(model="gpt-4o-mini")
    | parser
)
analysis = chain.invoke({"text": "The product quality is outstanding!"})
print(analysis.sentiment) # "positive"
print(analysis.confidence) # 0.95
FAQ
Should I use with_structured_output or PydanticOutputParser?
Use with_structured_output() whenever the model supports it — it is more reliable because the model returns structured JSON natively rather than embedding JSON in free text. Fall back to PydanticOutputParser for models that lack native structured output support.
What happens when the LLM ignores format instructions?
The parser raises an OutputParserException. Wrap your parser with OutputFixingParser or RetryOutputParser to handle these failures automatically. Alternatively, with_structured_output avoids this issue entirely by constraining the output format at the API level.
Can I parse streaming output into structured objects?
Yes, if the model supports streaming structured output. Use JsonOutputParser with chain.stream() to receive partial JSON objects as they are generated. For Pydantic parsing, you typically need the full response before validation can occur.