Skip to content
Learn Agentic AI11 min read0 views

Python Dataclasses vs Pydantic: Choosing the Right Data Structure for Agent State

Compare Python dataclasses and Pydantic models for AI agent state management including performance benchmarks, validation capabilities, serialization, and practical use cases.

Two Approaches to Structured Data

Python offers two mainstream ways to define structured data: the built-in dataclasses module and the third-party pydantic library. Both eliminate boilerplate compared to plain classes, but they serve fundamentally different purposes. Dataclasses are data containers. Pydantic models are data validators and serializers.

For AI agent applications, the choice between them affects your codebase's safety, performance, and maintainability. This guide gives you a clear framework for deciding which to use where.

Dataclasses: Lightweight Internal State

Dataclasses generate __init__, __repr__, __eq__, and optionally __hash__ from field definitions. They perform zero validation — whatever you pass in is what you get.

from dataclasses import dataclass, field
from typing import Optional
from datetime import datetime

@dataclass
class ConversationTurn:
    role: str
    content: str
    timestamp: datetime = field(default_factory=datetime.now)
    token_count: Optional[int] = None

@dataclass
class AgentState:
    agent_id: str
    turns: list[ConversationTurn] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    total_tokens: int = 0

    def add_turn(self, role: str, content: str, tokens: int = 0) -> None:
        self.turns.append(ConversationTurn(role=role, content=content, token_count=tokens))
        self.total_tokens += tokens

# No validation - this silently accepts bad data
state = AgentState(agent_id=12345)  # int instead of str, no error

Pydantic: Validated External Data

Pydantic validates every field on construction. Invalid data raises clear errors instead of corrupting state silently.

from pydantic import BaseModel, Field, field_validator
from datetime import datetime

class ConversationTurn(BaseModel):
    role: str
    content: str
    timestamp: datetime = Field(default_factory=datetime.now)
    token_count: int = Field(default=0, ge=0)

    @field_validator("role")
    @classmethod
    def validate_role(cls, v: str) -> str:
        allowed = {"user", "assistant", "system", "tool"}
        if v not in allowed:
            raise ValueError(f"role must be one of {allowed}")
        return v

class AgentState(BaseModel):
    model_config = {"extra": "forbid"}

    agent_id: str = Field(min_length=1)
    turns: list[ConversationTurn] = Field(default_factory=list)
    total_tokens: int = Field(default=0, ge=0)

# This raises a ValidationError with a clear message
# AgentState(agent_id=12345)  # int coerced to "12345" in lax mode

Performance Comparison

Dataclasses are faster for construction because they skip validation. The difference matters in hot loops.

import timeit
from dataclasses import dataclass
from pydantic import BaseModel

@dataclass
class PointDC:
    x: float
    y: float
    z: float

class PointPydantic(BaseModel):
    x: float
    y: float
    z: float

# Benchmark: 1 million instantiations
dc_time = timeit.timeit(lambda: PointDC(1.0, 2.0, 3.0), number=1_000_000)
py_time = timeit.timeit(lambda: PointPydantic(x=1.0, y=2.0, z=3.0), number=1_000_000)

# Typical results:
# Dataclass: ~0.3s
# Pydantic v2: ~1.5s (5x slower, but still fast in absolute terms)

For most AI applications, the validation overhead is negligible compared to LLM API latency. Optimize for correctness first.

See AI Voice Agents Handle Real Calls

Book a free demo or calculate how much you can save with AI voice automation.

Serialization Differences

Pydantic has built-in JSON serialization. Dataclasses require manual handling or the dataclasses.asdict helper, which has significant limitations.

from dataclasses import asdict
import json

# Dataclass serialization - fails with non-serializable types
state_dc = AgentStateDC(agent_id="agent-1")
data = asdict(state_dc)
# json.dumps(data) fails if any field contains datetime, UUID, etc.

# Pydantic serialization - handles everything
state_py = AgentStatePydantic(agent_id="agent-1")
json_str = state_py.model_dump_json()  # always works
dict_data = state_py.model_dump()      # clean dict

Decision Framework

Use this practical guide for AI agent projects.

Use dataclasses when:

  • The data is created internally by your own code
  • No external input or LLM output touches the structure
  • You need maximum instantiation speed in tight loops
  • The structure is simple with no validation rules

Use Pydantic when:

  • Data comes from external sources (APIs, LLMs, user input)
  • You need validation, coercion, or error messages
  • Serialization to JSON is required
  • You use FastAPI (which requires Pydantic models)
  • Settings and configuration management

FAQ

Can I convert between dataclasses and Pydantic models?

Yes. Pydantic can validate dataclass instances with model_validate, and you can create a dataclass from a Pydantic model using model.model_dump() unpacked into the dataclass constructor. Some teams define a Pydantic model at the API boundary and convert to a dataclass for internal processing.

Should I use frozen dataclasses or Pydantic's frozen config for immutable state?

Both work. @dataclass(frozen=True) prevents attribute assignment after creation. Pydantic's model_config = {"frozen": True} does the same but also enables hashing. For agent state that should not change after initialization, frozen models prevent subtle mutation bugs in concurrent systems.

What about attrs as a third option?

attrs is a mature library that sits between dataclasses and Pydantic in features. It supports validators and converters without the full serialization machinery. However, the AI ecosystem has standardized heavily on Pydantic, so using attrs means losing compatibility with frameworks like FastAPI and LangChain that expect Pydantic models.


#Python #Dataclasses #Pydantic #DataModeling #AgenticAI #LearnAI #AIEngineering

Share this article
C

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.