Python Dataclasses vs Pydantic: Choosing the Right Data Structure for Agent State

Two Approaches to Structured Data

Python offers two mainstream ways to define structured data: the built-in dataclasses module and the third-party pydantic library. Both eliminate boilerplate compared to plain classes, but they serve fundamentally different purposes. Dataclasses are data containers. Pydantic models are data validators and serializers.

For AI agent applications, the choice between them affects your codebase's safety, performance, and maintainability. This guide gives you a clear framework for deciding which to use where.

Dataclasses: Lightweight Internal State

Dataclasses generate __init__, __repr__, __eq__, and optionally __hash__ from field definitions. They perform zero validation — whatever you pass in is what you get.

from dataclasses import dataclass, field
from typing import Optional
from datetime import datetime

@dataclass
class ConversationTurn:
    role: str
    content: str
    timestamp: datetime = field(default_factory=datetime.now)
    token_count: Optional[int] = None

@dataclass
class AgentState:
    agent_id: str
    turns: list[ConversationTurn] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    total_tokens: int = 0

    def add_turn(self, role: str, content: str, tokens: int = 0) -> None:
        self.turns.append(ConversationTurn(role=role, content=content, token_count=tokens))
        self.total_tokens += tokens

# No validation - this silently accepts bad data
state = AgentState(agent_id=12345)  # int instead of str, no error

Pydantic: Validated External Data

Pydantic validates every field on construction. Invalid data raises clear errors instead of corrupting state silently.

from pydantic import BaseModel, Field, field_validator
from datetime import datetime

class ConversationTurn(BaseModel):
    role: str
    content: str
    timestamp: datetime = Field(default_factory=datetime.now)
    token_count: int = Field(default=0, ge=0)

    @field_validator("role")
    @classmethod
    def validate_role(cls, v: str) -> str:
        allowed = {"user", "assistant", "system", "tool"}
        if v not in allowed:
            raise ValueError(f"role must be one of {allowed}")
        return v

class AgentState(BaseModel):
    model_config = {"extra": "forbid"}

    agent_id: str = Field(min_length=1)
    turns: list[ConversationTurn] = Field(default_factory=list)
    total_tokens: int = Field(default=0, ge=0)

# This raises a ValidationError with a clear message
# AgentState(agent_id=12345)  # int coerced to "12345" in lax mode

Performance Comparison

Dataclasses are faster for construction because they skip validation. The difference matters in hot loops.

import timeit
from dataclasses import dataclass
from pydantic import BaseModel

@dataclass
class PointDC:
    x: float
    y: float
    z: float

class PointPydantic(BaseModel):
    x: float
    y: float
    z: float

# Benchmark: 1 million instantiations
dc_time = timeit.timeit(lambda: PointDC(1.0, 2.0, 3.0), number=1_000_000)
py_time = timeit.timeit(lambda: PointPydantic(x=1.0, y=2.0, z=3.0), number=1_000_000)

# Typical results:
# Dataclass: ~0.3s
# Pydantic v2: ~1.5s (5x slower, but still fast in absolute terms)

For most AI applications, the validation overhead is negligible compared to LLM API latency. Optimize for correctness first.

See AI Voice Agents Handle Real Calls

Book a free demo or calculate how much you can save with AI voice automation.

Book a Demo ROI Calculator

Serialization Differences

Pydantic has built-in JSON serialization. Dataclasses require manual handling or the dataclasses.asdict helper, which has significant limitations.

from dataclasses import asdict
import json

# Dataclass serialization - fails with non-serializable types
state_dc = AgentStateDC(agent_id="agent-1")
data = asdict(state_dc)
# json.dumps(data) fails if any field contains datetime, UUID, etc.

# Pydantic serialization - handles everything
state_py = AgentStatePydantic(agent_id="agent-1")
json_str = state_py.model_dump_json()  # always works
dict_data = state_py.model_dump()      # clean dict

Decision Framework

Use this practical guide for AI agent projects.

Use dataclasses when:

The data is created internally by your own code
No external input or LLM output touches the structure
You need maximum instantiation speed in tight loops
The structure is simple with no validation rules

Use Pydantic when:

Data comes from external sources (APIs, LLMs, user input)
You need validation, coercion, or error messages
Serialization to JSON is required
You use FastAPI (which requires Pydantic models)
Settings and configuration management

FAQ

Can I convert between dataclasses and Pydantic models?

Yes. Pydantic can validate dataclass instances with model_validate, and you can create a dataclass from a Pydantic model using model.model_dump() unpacked into the dataclass constructor. Some teams define a Pydantic model at the API boundary and convert to a dataclass for internal processing.

Should I use frozen dataclasses or Pydantic's frozen config for immutable state?

Both work. @dataclass(frozen=True) prevents attribute assignment after creation. Pydantic's model_config = {"frozen": True} does the same but also enables hashing. For agent state that should not change after initialization, frozen models prevent subtle mutation bugs in concurrent systems.

What about attrs as a third option?

attrs is a mature library that sits between dataclasses and Pydantic in features. It supports validators and converters without the full serialization machinery. However, the AI ecosystem has standardized heavily on Pydantic, so using attrs means losing compatibility with frameworks like FastAPI and LangChain that expect Pydantic models.

#Python #Dataclasses #Pydantic #DataModeling #AgenticAI #LearnAI #AIEngineering

Python Dataclasses vs Pydantic: Choosing the Right Data Structure for Agent State

Two Approaches to Structured Data

Dataclasses: Lightweight Internal State

Pydantic: Validated External Data

Performance Comparison

Serialization Differences

Decision Framework

FAQ

Can I convert between dataclasses and Pydantic models?

Should I use frozen dataclasses or Pydantic's frozen config for immutable state?

What about attrs as a third option?

Try CallSphere AI Voice Agents

Related Articles

WebArena and Real-World Web Agent Benchmarks: How We Measure Browser Agent Performance

Taking Screenshots and Recording Videos with Playwright for AI Analysis

Playwright Selectors Deep Dive: CSS, XPath, Text, and Role-Based Element Finding