Dataclasses, Pydantic, TypedDict: Which to Choose and Why
This is a question every Python team eventually confronts. The answers found online tend toward the superficial: "Pydantic for APIs, dataclasses for everything else." That is a reasonable starting point, but it says nothing about the situations where the choice actually matters. Below are the decision rules that hold up in practice.
Understanding What Each Tool Actually Does
Before the rules, a clear-headed reminder of what each tool is for:
TypedDict is a pure type annotation. At runtime a TypedDict value is an ordinary dict: the class only informs the type checker (mypy, pyright) about the shape of the dictionary. Zero overhead, zero validation.
Dataclass is a Python class generator. It automatically creates __init__, __repr__, and __eq__ from annotations. No runtime type validation.
Pydantic BaseModel is a complete validation system. It converts and validates data at runtime, raises detailed errors, serialises and deserialises JSON, and generates JSON Schema.
These are not three ways to accomplish the same thing — they are three tools with distinct responsibilities.
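A quick way to see the difference is to hand each tool data that violates its annotations. The sketch below uses two hypothetical types, PointDict and PointClass, and only the standard library; a Pydantic BaseModel with the same fields would instead raise a ValidationError for y at construction time.

```python
from dataclasses import dataclass
from typing import TypedDict


class PointDict(TypedDict):  # hypothetical example type
    x: int
    y: int


@dataclass
class PointClass:  # hypothetical example type
    x: int
    y: int


# Calling a TypedDict class just builds a plain dict; annotations are not enforced.
p1 = PointDict(x=1, y="oops")  # a type checker flags this line, Python does not
print(type(p1) is dict)

# @dataclass generates __init__, __repr__ and __eq__, but still skips type checks.
p2 = PointClass(x=1, y="oops")
print(p2)
print(p2 == PointClass(1, "oops"))  # generated __eq__ compares field by field
```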
Rule 1: TypedDict for Dictionaries You Do Not Control
```python
from typing import TypedDict

# Data returned by an external API: you read it, you don't construct it
class GrxCertificate(TypedDict):
    id: str
    volume: float
    period_from: str
    period_to: str
    status: str

# Usage
def process_certificate(cert: GrxCertificate) -> float:
    return cert["volume"] * 1.05  # The type checker validates the key and its type
```
TypedDict is ideal for typing dictionaries that come from outside the system — JSON API responses, SQL query results, YAML configs — without converting them into objects. The runtime overhead is zero; it is purely a static construct.
The limitation: TypedDict validates nothing at runtime. If the API returns volume as a string, the code will fail further downstream rather than at deserialisation.
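To make that failure mode concrete, here is a minimal sketch that feeds a hypothetical response body, with volume sent as a string, through the GrxCertificate type. Because json.loads returns Any, the type checker accepts the annotation, and nothing complains until the value is actually used.

```python
import json
from typing import TypedDict


class GrxCertificate(TypedDict):
    id: str
    volume: float
    period_from: str
    period_to: str
    status: str


def process_certificate(cert: GrxCertificate) -> float:
    return cert["volume"] * 1.05


# Hypothetical response body in which the API sends volume as a string.
raw_json = (
    '{"id": "C-1", "volume": "120.5", "period_from": "2024-01-01",'
    ' "period_to": "2024-01-31", "status": "ACTIVE"}'
)

# json.loads returns Any, so this annotation type-checks, and nothing is
# verified at runtime: cert is simply a dict holding a str volume.
cert: GrxCertificate = json.loads(raw_json)

# The mismatch only surfaces where the value is used, far from where it entered.
try:
    process_certificate(cert)
    error = None
except TypeError as exc:
    error = exc

print(repr(error))
```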
Rule 2: Dataclasses for Internal Models Without Validation
```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CertificateAggregate:
    account_id: str
    total_volume: float
    certificate_count: int
    computed_at: datetime = field(default_factory=datetime.now)

    def average_volume(self) -> float:
        if self.certificate_count == 0:
            return 0.0
        return self.total_volume / self.certificate_count
```
Dataclasses are the right choice for objects you construct yourself within business logic — aggregation results, intermediate processing objects, domain value objects. They are lighter than Pydantic and more explicit than raw dictionaries.
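In use, a dataclass like CertificateAggregate is typically the output of an internal computation. The aggregate helper below is hypothetical, a sketch of how business logic might build one from a list of volumes. Note the default_factory: datetime.now is called once per instance, whereas a plain default of datetime.now() would be evaluated only once, when the class is defined.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CertificateAggregate:
    account_id: str
    total_volume: float
    certificate_count: int
    computed_at: datetime = field(default_factory=datetime.now)

    def average_volume(self) -> float:
        if self.certificate_count == 0:
            return 0.0
        return self.total_volume / self.certificate_count


def aggregate(account_id: str, volumes: list[float]) -> CertificateAggregate:
    """Hypothetical helper: fold raw volumes into an internal aggregate object."""
    return CertificateAggregate(
        account_id=account_id,
        total_volume=sum(volumes),
        certificate_count=len(volumes),
    )


agg = aggregate("ACC-001", [100.0, 250.0, 150.0])
print(agg.average_volume())  # 166.66666666666666
```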
@dataclass(frozen=True) makes them immutable — useful for value objects:
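```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DateRange:
    start: str  # ISO 8601 dates ("YYYY-MM-DD") order correctly as plain strings
    end: str

    def __post_init__(self) -> None:
        # Runs right after the generated __init__; equal dates are allowed
        if self.start > self.end:
            raise ValueError(f"start ({self.start}) must not be after end ({self.end})")
```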
__post_init__ allows adding simple invariant validation without Pydantic — sufficient for most domain-level constraints.
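A short sketch of how a frozen dataclass behaves in practice, using a copy of DateRange: assignment raises FrozenInstanceError, dataclasses.replace() builds a new instance (so __post_init__ runs again and the invariant still holds), and frozen instances are hashable, so equal ranges collapse in a set.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class DateRange:
    start: str  # ISO 8601 dates compare correctly as strings
    end: str

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError(f"start ({self.start}) must not be after end ({self.end})")


q1 = DateRange("2024-01-01", "2024-03-31")

# Frozen: assigning to a field after construction is rejected.
try:
    q1.end = "2024-02-29"  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() goes through __init__, so __post_init__ validates the new instance.
q1_short = replace(q1, end="2024-02-29")
try:
    replace(q1, end="2023-12-31")
    rejected = False
except ValueError:
    rejected = True

# Frozen dataclasses get a generated __hash__ consistent with __eq__.
unique = {q1, DateRange("2024-01-01", "2024-03-31")}
print(mutated, rejected, len(unique))  # False True 1
```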
Rule 3: Pydantic for Everything That Touches System Boundaries
```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationInfo, field_validator

class CertificateRequest(BaseModel):
    account_id: str = Field(min_length=3, max_length=50)
    volume: float = Field(gt=0, description="Volume in MWh")
    technology: Literal["WIND", "SOLAR", "HYDRO", "BIOMASS"]
    period_from: str
    period_to: str

    @field_validator("period_to")
    @classmethod
    def period_to_after_from(cls, v: str, info: ValidationInfo) -> str:
        # info.data holds the fields declared earlier that already validated;
        # period_from is absent if it failed its own validation
        if "period_from" in info.data and v <= info.data["period_from"]:
            raise ValueError("period_to must be after period_from")
        return v

class CertificateResponse(BaseModel):
    model_config = ConfigDict(from_attributes=True)  # ORM compatibility

    id: str
    volume: float
    status: Literal["ACTIVE", "CANCELLED", "TRANSFERRED"]
```
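Exercising CertificateRequest with one valid and one invalid body shows what "converts and validates" means in practice: a numeric string is coerced to float in Pydantic's default lax mode, and an invalid body produces one ValidationError that lists every failing field rather than stopping at the first. This sketch assumes Pydantic v2; the request bodies are made up for illustration.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError, ValidationInfo, field_validator


class CertificateRequest(BaseModel):
    account_id: str = Field(min_length=3, max_length=50)
    volume: float = Field(gt=0, description="Volume in MWh")
    technology: Literal["WIND", "SOLAR", "HYDRO", "BIOMASS"]
    period_from: str
    period_to: str

    @field_validator("period_to")
    @classmethod
    def period_to_after_from(cls, v: str, info: ValidationInfo) -> str:
        if "period_from" in info.data and v <= info.data["period_from"]:
            raise ValueError("period_to must be after period_from")
        return v


# A body as it might look after JSON parsing; volume arrives as a string.
ok = CertificateRequest.model_validate({
    "account_id": "ACC-001",
    "volume": "120.5",  # coerced to float in lax mode
    "technology": "WIND",
    "period_from": "2024-01-01",
    "period_to": "2024-01-31",
})
print(ok.volume)  # 120.5

bad_body = {
    "account_id": "AB",          # shorter than min_length=3
    "volume": -5,                # violates gt=0
    "technology": "COAL",        # not one of the Literal values
    "period_from": "2024-02-01",
    "period_to": "2024-01-01",   # before period_from: the custom validator fires
}

failed_fields: list[str] = []
try:
    CertificateRequest.model_validate(bad_body)
except ValidationError as exc:
    # Each entry of exc.errors() carries a "loc" tuple naming the field.
    failed_fields = sorted(str(err["loc"][0]) for err in exc.errors())

print(failed_fields)  # ['account_id', 'period_to', 'technology', 'volume']
```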
Pydantic wins at every system boundary: HTTP inputs (request bodies, query parameters), JSON responses, configuration files, environment variables.
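```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str  # read from DATABASE_URL (case-insensitive by default)
    redis_url: str
    secret_key: str
    debug: bool = False

settings = Settings()  # Raises a ValidationError if DATABASE_URL or another required variable is missing
```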
pydantic-settings is particularly valuable — it reads environment variables, casts them to the correct types, and raises explicit errors at startup if a required variable is absent.
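To see what pydantic-settings saves you from writing, here is a rough, standard-library-only approximation of the same behaviour: read variables, cast them, and fail fast with every missing name listed. It is a hand-rolled sketch, not how pydantic-settings is implemented; ManualSettings and _parse_bool are hypothetical names, the accepted boolean spellings are an assumption, and .env files are ignored.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


def _parse_bool(raw: str) -> bool:
    # Hypothetical spellings; pydantic-settings applies its own parsing rules.
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"cannot interpret {raw!r} as a boolean")


@dataclass(frozen=True)
class ManualSettings:
    database_url: str
    redis_url: str
    secret_key: str
    debug: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ManualSettings":
        required = ("DATABASE_URL", "REDIS_URL", "SECRET_KEY")
        missing = [name for name in required if name not in env]
        if missing:
            # Fail at startup, naming every absent variable at once.
            raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
        return cls(
            database_url=env["DATABASE_URL"],
            redis_url=env["REDIS_URL"],
            secret_key=env["SECRET_KEY"],
            debug=_parse_bool(env.get("DEBUG", "false")),
        )


settings = ManualSettings.from_env({
    "DATABASE_URL": "postgresql://localhost/app",
    "REDIS_URL": "redis://localhost:6379/0",
    "SECRET_KEY": "dev-only",
    "DEBUG": "true",
})

try:
    ManualSettings.from_env({"DATABASE_URL": "postgresql://localhost/app"})
    startup_error = None
except RuntimeError as exc:
    startup_error = exc
```

Every new field means touching the required tuple, the constructor call, and possibly a new parser; with BaseSettings the field annotation alone drives all three.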
Performance: When It Actually Matters
Pydantic v2, whose validation core (pydantic-core) is written in Rust, is substantially faster than v1, but constructing objects is still slower than with dataclasses:
| Tool | Construction time (relative to TypedDict, lower is faster) | Runtime validation | JSON serialisation |
|---|---|---|---|
| TypedDict | 1x | None | Manual (json.dumps() on the dict) |
| Dataclass | 1.2x | Manual, via __post_init__ | Manual (dataclasses.asdict(), then json.dumps()) |
| Pydantic v2 | 3–5x | Complete | .model_dump_json() |
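For dataclasses, asdict() only gets you halfway to JSON: it recursively converts the instance to plain dicts and lists, but leaves values such as datetime untouched, so json.dumps needs a fallback. A minimal sketch, where to_json_default is a hypothetical helper that encodes datetimes as ISO 8601 and rejects anything else:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime


@dataclass
class CertificateAggregate:
    account_id: str
    total_volume: float
    certificate_count: int
    computed_at: datetime = field(default_factory=datetime.now)


def to_json_default(obj: object) -> str:
    """json.dumps fallback: datetimes become ISO 8601 strings, anything else fails."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"{type(obj).__name__} is not JSON serialisable")


agg = CertificateAggregate("ACC-001", 500.0, 3, computed_at=datetime(2024, 1, 31, 12, 0))
payload = json.dumps(asdict(agg), default=to_json_default)
print(payload)
# {"account_id": "ACC-001", "total_volume": 500.0, "certificate_count": 3, "computed_at": "2024-01-31T12:00:00"}
```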
On a FastAPI endpoint handling 1,000 requests per second with simple models, the difference between Pydantic and dataclasses is negligible. It becomes visible in data processing pipelines that instantiate millions of objects — ETL jobs, large file processing.
The practical rule: do not optimise prematurely. Pydantic at boundaries, dataclasses internally — this separation delivers good performance by default without micro-optimisation.
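Relative costs like those in the table depend heavily on Python version, Pydantic version, and model shape, so it is worth measuring on your own models before acting on them. A minimal timeit sketch with hypothetical three-field types; the Pydantic row is included only if Pydantic is installed, and no particular output is implied.

```python
import timeit
from dataclasses import dataclass
from typing import TypedDict


class RowDict(TypedDict):  # hypothetical benchmark types
    id: str
    volume: float
    status: str


@dataclass
class RowClass:
    id: str
    volume: float
    status: str


def per_call_us(stmt, number: int = 50_000) -> float:
    """Best-of-5 time per call, in microseconds (min filters out scheduler noise)."""
    timings = timeit.repeat(stmt, number=number, repeat=5)
    return min(timings) / number * 1e6


results = {
    "TypedDict": per_call_us(lambda: RowDict(id="C-1", volume=1.0, status="ACTIVE")),
    "dataclass": per_call_us(lambda: RowClass(id="C-1", volume=1.0, status="ACTIVE")),
}

try:
    from pydantic import BaseModel

    class RowModel(BaseModel):
        id: str
        volume: float
        status: str

    results["pydantic"] = per_call_us(lambda: RowModel(id="C-1", volume=1.0, status="ACTIVE"))
except ImportError:
    pass  # Pydantic not installed: compare the standard-library tools only

baseline = results["TypedDict"]
for name, us in results.items():
    print(f"{name:>10}: {us:.3f} us/object ({us / baseline:.1f}x)")
```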
Combining All Three
In a real project, all three coexist naturally:
```python
from typing import TypedDict
from dataclasses import dataclass
from pydantic import BaseModel

# TypedDict: raw response from the external API
class RawApiResponse(TypedDict):
    data: list[dict]
    meta: dict

# Pydantic: validates and parses the response at the boundary
class Certificate(BaseModel):
    id: str
    volume: float
    status: str

# Dataclass: internal business object after processing
@dataclass
class CertificateReport:
    total_volume: float
    active_count: int
    cancelled_count: int

def process_response(raw: RawApiResponse) -> CertificateReport:
    certificates = [Certificate.model_validate(item) for item in raw["data"]]
    active = [c for c in certificates if c.status == "ACTIVE"]
    cancelled = [c for c in certificates if c.status == "CANCELLED"]
    return CertificateReport(
        total_volume=sum(c.volume for c in certificates),
        active_count=len(active),
        cancelled_count=len(cancelled),
    )
```
Each tool in its place: TypedDict for raw external data, Pydantic for boundary validation, dataclass for internal logic. This separation is what keeps the code readable and maintainable over the long term.
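Running the pipeline on a small, made-up payload shows each layer doing its job: lax coercion of a numeric string and an int at the Pydantic boundary, and a ValidationError raised there, before any business logic runs, when a value cannot be converted. The sketch repeats the three types so it runs standalone and assumes Pydantic v2.

```python
from dataclasses import dataclass
from typing import TypedDict

from pydantic import BaseModel, ValidationError


class RawApiResponse(TypedDict):
    data: list[dict]
    meta: dict


class Certificate(BaseModel):
    id: str
    volume: float
    status: str


@dataclass
class CertificateReport:
    total_volume: float
    active_count: int
    cancelled_count: int


def process_response(raw: RawApiResponse) -> CertificateReport:
    certificates = [Certificate.model_validate(item) for item in raw["data"]]
    active = [c for c in certificates if c.status == "ACTIVE"]
    cancelled = [c for c in certificates if c.status == "CANCELLED"]
    return CertificateReport(
        total_volume=sum(c.volume for c in certificates),
        active_count=len(active),
        cancelled_count=len(cancelled),
    )


# Hypothetical payload, e.g. the parsed body of an HTTP response.
raw: RawApiResponse = {
    "data": [
        {"id": "C-1", "volume": 100.0, "status": "ACTIVE"},
        {"id": "C-2", "volume": "50.5", "status": "CANCELLED"},  # str coerced to float
        {"id": "C-3", "volume": 25, "status": "TRANSFERRED"},    # int coerced to float
    ],
    "meta": {"page": 1},
}

report = process_response(raw)
print(report)  # CertificateReport(total_volume=175.5, active_count=1, cancelled_count=1)

# A value that cannot be converted is rejected at the boundary.
bad: RawApiResponse = {"data": [{"id": "C-4", "volume": "lots", "status": "ACTIVE"}], "meta": {}}
try:
    process_response(bad)
    boundary_error = None
except ValidationError as exc:
    boundary_error = exc
```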