# Dataclasses

## Basic
Decorating a class with `@dataclass` is a convenient way to add special methods such as `__init__`,
`__repr__`, and `__eq__` to a class. The following example shows the basic usage of the
`@dataclass` decorator. By default, only the three mentioned methods are generated.
```py
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int | None = None

alice1 = Person("Alice", 30)
alice2 = Person(name="Alice", age=30)
alice3 = Person(age=30, name="Alice")
alice4 = Person("Alice", age=30)

reveal_type(alice1)  # revealed: Person
reveal_type(type(alice1))  # revealed: type[Person]

reveal_type(alice1.name)  # revealed: str
reveal_type(alice1.age)  # revealed: int | None

reveal_type(repr(alice1))  # revealed: str

reveal_type(alice1 == alice2)  # revealed: bool
reveal_type(alice1 == "Alice")  # revealed: bool

bob = Person("Bob")
bob2 = Person("Bob", None)
bob3 = Person(name="Bob")
bob4 = Person(name="Bob", age=None)
```
The signature of the `__init__` method is generated based on the class's attributes. The following
calls are not valid:
```py
# error: [missing-argument]
Person()

# error: [too-many-positional-arguments]
Person("Eve", 20, "too many arguments")

# error: [invalid-argument-type]
Person("Eve", "string instead of int")

# error: [invalid-argument-type]
# error: [invalid-argument-type]
Person(20, "Eve")
```
## Signature of `__init__`

Declarations in the class body are used to generate the signature of the `__init__` method. If the
attributes are not just declarations, but also bindings, the type inferred from bindings is used as
the default value.
```py
from dataclasses import dataclass

@dataclass
class D:
    x: int
    y: str = "default"
    z: int | None = 1 + 2

reveal_type(D.__init__)  # revealed: (self: D, x: int, y: str = Literal["default"], z: int | None = Literal[3]) -> None
```
This also works if the declaration and binding are split:
```py
@dataclass
class D:
    x: int | None
    x = None

reveal_type(D.__init__)  # revealed: (self: D, x: int | None = None) -> None
```
Non-fully static types are handled correctly:
```py
from typing import Any

@dataclass
class C:
    x: Any
    y: int | Any
    z: tuple[int, Any]

reveal_type(C.__init__)  # revealed: (self: C, x: Any, y: int | Any, z: tuple[int, Any]) -> None
```
Variables without annotations are ignored:
```py
@dataclass
class D:
    x: int
    y = 1

reveal_type(D.__init__)  # revealed: (self: D, x: int) -> None
```
If attributes without default values are declared after attributes with default values, a
`TypeError` will be raised at runtime. Ideally, we would emit a diagnostic in that case:
```py
@dataclass
class D:
    x: int = 1

    # TODO: this should be an error: field without default defined after field with default
    y: str
```
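For reference, this is the runtime behavior described above (an illustrative sketch, not one of the
checked examples; the exact error message may vary between Python versions):

```py
from dataclasses import dataclass

try:

    @dataclass
    class Broken:
        x: int = 1
        y: str  # non-default field after a field with a default

except TypeError as exc:
    # At class-definition time, CPython raises roughly:
    # "non-default argument 'y' follows default argument"
    print(exc)
```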
Pure class attributes (`ClassVar`) are not included in the signature of `__init__`:
```py
from typing import ClassVar

@dataclass
class D:
    x: int
    y: ClassVar[str] = "default"
    z: bool

reveal_type(D.__init__)  # revealed: (self: D, x: int, z: bool) -> None

d = D(1, True)

reveal_type(d.x)  # revealed: int
reveal_type(d.y)  # revealed: str
reveal_type(d.z)  # revealed: bool
```
Function declarations do not affect the signature of `__init__`:
```py
@dataclass
class D:
    x: int

    def y(self) -> str:
        return ""

reveal_type(D.__init__)  # revealed: (self: D, x: int) -> None
```
And neither do nested class declarations:
```py
@dataclass
class D:
    x: int

    class Nested:
        y: str

reveal_type(D.__init__)  # revealed: (self: D, x: int) -> None
```
But if there is a variable annotation with a function or class literal type, the signature of
`__init__` will include this field:
```py
from ty_extensions import TypeOf

class SomeClass: ...

def some_function() -> None: ...

@dataclass
class D:
    function_literal: TypeOf[some_function]
    class_literal: TypeOf[SomeClass]
    class_subtype_of: type[SomeClass]

# revealed: (self: D, function_literal: def some_function() -> None, class_literal: <class 'SomeClass'>, class_subtype_of: type[SomeClass]) -> None
reveal_type(D.__init__)
```
More realistically, dataclasses can have `Callable` attributes:
```py
from typing import Callable

@dataclass
class D:
    c: Callable[[int], str]

reveal_type(D.__init__)  # revealed: (self: D, c: (int, /) -> str) -> None
```
Implicit instance attributes do not affect the signature of `__init__`:
```py
@dataclass
class D:
    x: int

    def f(self, y: str) -> None:
        self.y: str = y

reveal_type(D(1).y)  # revealed: str

reveal_type(D.__init__)  # revealed: (self: D, x: int) -> None
```
Annotating expressions does not lead to an entry in `__annotations__` at runtime, and so it wouldn't
be included in the signature of `__init__`. This is a case that we currently don't detect:
```py
@dataclass
class D:
    # (x) is an expression, not a "simple name"
    (x): int = 1

# TODO: should ideally not include a `x` parameter
reveal_type(D.__init__)  # revealed: (self: D, x: int = Literal[1]) -> None
```
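For context, this is the runtime behavior the paragraph above refers to (an illustrative sketch, not
one of the checked examples):

```py
class E:
    # A parenthesized target is not a "simple name", so the annotation is
    # evaluated but not stored in `__annotations__`; the assignment still runs.
    (x): int = 1

print(E.__annotations__)  # expected: {}
print(E.__dict__["x"])  # expected: 1
```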
## `@dataclass` calls with arguments

The `@dataclass` decorator can take several arguments to customize which of the special methods are
generated. The following test makes sure that we still treat the class as a dataclass if (the
default) arguments are passed in:
```py
from dataclasses import dataclass

@dataclass(init=True, repr=True, eq=True)
class Person:
    name: str
    age: int | None = None

alice = Person("Alice", 30)

reveal_type(repr(alice))  # revealed: str
reveal_type(alice == alice)  # revealed: bool
```
If `init` is set to `False`, no `__init__` method is generated:
```py
from dataclasses import dataclass

@dataclass(init=False)
class C:
    x: int

C()  # Okay

# error: [too-many-positional-arguments]
C(1)

repr(C())

C() == C()
```
## Other dataclass parameters

### `repr`

A custom `__repr__` method is generated by default. It can be disabled by passing `repr=False`, but
in that case `__repr__` is still available via `object.__repr__`:
```py
from dataclasses import dataclass

@dataclass(repr=False)
class WithoutRepr:
    x: int

reveal_type(WithoutRepr(1).__repr__)  # revealed: bound method WithoutRepr.__repr__() -> str
```
### `eq`

The same is true for `__eq__`. Setting `eq=False` disables the generated `__eq__` method, but
`__eq__` is still available via `object.__eq__`:
```py
from dataclasses import dataclass

@dataclass(eq=False)
class WithoutEq:
    x: int

reveal_type(WithoutEq(1) == WithoutEq(2))  # revealed: bool
```
### `order`

```toml
[environment]
python-version = "3.12"
```

`order` is set to `False` by default. If `order=True`, the `__lt__`, `__le__`, `__gt__`, and
`__ge__` methods will be generated:
```py
from dataclasses import dataclass

@dataclass
class WithoutOrder:
    x: int

WithoutOrder(1) < WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) <= WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) > WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) >= WithoutOrder(2)  # error: [unsupported-operator]

@dataclass(order=True)
class WithOrder:
    x: int

WithOrder(1) < WithOrder(2)
WithOrder(1) <= WithOrder(2)
WithOrder(1) > WithOrder(2)
WithOrder(1) >= WithOrder(2)
```
Comparisons are only allowed for `WithOrder` instances:
```py
WithOrder(1) < 2  # error: [unsupported-operator]
WithOrder(1) <= 2  # error: [unsupported-operator]
WithOrder(1) > 2  # error: [unsupported-operator]
WithOrder(1) >= 2  # error: [unsupported-operator]
```
This also works for generic dataclasses:
```py
from dataclasses import dataclass

@dataclass(order=True)
class GenericWithOrder[T]:
    x: T

GenericWithOrder[int](1) < GenericWithOrder[int](1)

GenericWithOrder[int](1) < GenericWithOrder[str]("a")  # error: [unsupported-operator]
```
If a class already defines one of the comparison methods, a `TypeError` is raised at runtime.
Ideally, we would emit a diagnostic in that case:
```py
@dataclass(order=True)
class AlreadyHasCustomDunderLt:
    x: int

    # TODO: Ideally, we would emit a diagnostic here
    def __lt__(self, other: object) -> bool:
        return False
```
### `unsafe_hash`

To do
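Not yet covered by tests above. For reference, an illustrative sketch of the runtime semantics (no
assertions about inferred types):

```py
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Hashed:
    x: int

# With `eq=True` (the default) and `frozen=False`, a dataclass normally sets
# `__hash__` to `None`; `unsafe_hash=True` forces a field-based `__hash__` to be
# generated instead, so instances are hashable:
hash(Hashed(1))
```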
### `frozen`

If `True` (the default is `False`), assigning to fields will generate a diagnostic:
```py
from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenClass:
    x: int

frozen_instance = MyFrozenClass(1)
frozen_instance.x = 2  # error: [invalid-assignment]
```
If `__setattr__()` or `__delattr__()` is defined in the class, we should emit a diagnostic:
```py
from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenClass:
    x: int

    # TODO: Emit a diagnostic here
    def __setattr__(self, name: str, value: object) -> None: ...

    # TODO: Emit a diagnostic here
    def __delattr__(self, name: str) -> None: ...
```
This also works for generic dataclasses:

```toml
[environment]
python-version = "3.12"
```

```py
from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenGeneric[T]:
    x: T

frozen_instance = MyFrozenGeneric[int](1)
frozen_instance.x = 2  # error: [invalid-assignment]
```
When attempting to mutate an unresolved attribute on a frozen dataclass, only `unresolved-attribute`
is emitted:
```py
from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenClass: ...

frozen = MyFrozenClass()
frozen.x = 2  # error: [unresolved-attribute]
```
### `match_args`

To do
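Not yet covered by tests above. For reference, an illustrative sketch of the runtime semantics (no
assertions about inferred types):

```py
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

# With `match_args=True` (the default), a `__match_args__` tuple of the
# positional `__init__` parameter names is generated, enabling positional
# patterns in `match` statements:
print(Point.__match_args__)  # ("x", "y")
```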
### `kw_only`

To do
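Not yet covered by tests above. For reference, an illustrative sketch (no assertions about inferred
types):

```py
from dataclasses import dataclass

# `kw_only=True` (Python 3.10+) makes every generated `__init__` parameter
# keyword-only:
@dataclass(kw_only=True)
class Config:
    host: str
    port: int = 8080

Config(host="localhost", port=80)
```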
### `slots`

To do
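Not yet covered by tests above. For reference, an illustrative sketch (no assertions about inferred
types):

```py
from dataclasses import dataclass

# `slots=True` (Python 3.10+) creates and returns a new class with `__slots__`
# generated from the fields, so instances have no `__dict__`:
@dataclass(slots=True)
class Slotted:
    x: int

print(Slotted.__slots__)  # ("x",)
```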
### `weakref_slot`

To do
## Inheritance

### Normal class inheriting from a dataclass
```py
from dataclasses import dataclass

@dataclass
class Base:
    x: int

class Derived(Base): ...

d = Derived(1)  # OK

reveal_type(d.x)  # revealed: int
```
### Dataclass inheriting from normal class
```py
from dataclasses import dataclass

class Base:
    x: int = 1

@dataclass
class Derived(Base):
    y: str

d = Derived("a")

# error: [too-many-positional-arguments]
# error: [invalid-argument-type]
Derived(1, "a")
```
### Dataclass inheriting from another dataclass
```py
from dataclasses import dataclass

@dataclass
class Base:
    x: int
    y: str

@dataclass
class Derived(Base):
    z: bool

d = Derived(1, "a", True)  # OK

reveal_type(d.x)  # revealed: int
reveal_type(d.y)  # revealed: str
reveal_type(d.z)  # revealed: bool

# error: [missing-argument]
Derived(1, "a")

# error: [missing-argument]
Derived(True)
```
### Overwriting attributes from base class

The following example comes from the Python documentation. The `x` attribute appears just once in
the `__init__` signature, and the default value is taken from the derived class:
```py
from dataclasses import dataclass
from typing import Any

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class C(Base):
    z: int = 10
    x: int = 15

reveal_type(C.__init__)  # revealed: (self: C, x: int = Literal[15], y: int = Literal[0], z: int = Literal[10]) -> None
```
## Generic dataclasses

```toml
[environment]
python-version = "3.12"
```
```py
from dataclasses import dataclass

@dataclass
class DataWithDescription[T]:
    data: T
    description: str

reveal_type(DataWithDescription[int])  # revealed: <class 'DataWithDescription[int]'>

d_int = DataWithDescription[int](1, "description")  # OK

reveal_type(d_int.data)  # revealed: int
reveal_type(d_int.description)  # revealed: str

# error: [invalid-argument-type]
DataWithDescription[int](None, "description")
```
## Descriptor-typed fields

### Same type in `__get__` and `__set__`

For the following descriptor, the return type of `__get__` and the type of the `value` parameter in
`__set__` are the same. The generated `__init__` method takes an argument of this type (instead of
the type of the descriptor), and the default value is also of this type:
```py
from typing import overload
from dataclasses import dataclass

class UppercaseString:
    _value: str = ""

    def __get__(self, instance: object, owner: None | type) -> str:
        return self._value

    def __set__(self, instance: object, value: str) -> None:
        self._value = value.upper()

@dataclass
class C:
    upper: UppercaseString = UppercaseString()

reveal_type(C.__init__)  # revealed: (self: C, upper: str = str) -> None

c = C("abc")

reveal_type(c.upper)  # revealed: str

# This is also okay:
C()

# error: [invalid-argument-type]
C(1)

# error: [too-many-positional-arguments]
C("a", "b")
```
### Different types in `__get__` and `__set__`

In general, the type of the `__init__` parameter is determined by the `value` parameter type of the
`__set__` method (`str` in the example below). However, the default value is generated by calling
the descriptor's `__get__` method as if it had been called on the class itself, i.e. passing `None`
for the `instance` argument.
```py
from typing import Literal, overload
from dataclasses import dataclass

class ConvertToLength:
    _len: int = 0

    @overload
    def __get__(self, instance: None, owner: type) -> Literal[""]: ...
    @overload
    def __get__(self, instance: object, owner: type | None) -> int: ...
    def __get__(self, instance: object | None, owner: type | None) -> str | int:
        if instance is None:
            return ""

        return self._len

    def __set__(self, instance, value: str) -> None:
        self._len = len(value)

@dataclass
class C:
    converter: ConvertToLength = ConvertToLength()

reveal_type(C.__init__)  # revealed: (self: C, converter: str = Literal[""]) -> None

c = C("abc")

reveal_type(c.converter)  # revealed: int

# This is also okay:
C()

# error: [invalid-argument-type]
C(1)

# error: [too-many-positional-arguments]
C("a", "b")
```
### With overloaded `__set__` method

If the `__set__` method is overloaded, we determine the type for the `__init__` parameter as the
union of all possible `value` parameter types:
```py
from typing import overload
from dataclasses import dataclass

class AcceptsStrAndInt:
    def __get__(self, instance, owner) -> int:
        return 0

    @overload
    def __set__(self, instance: object, value: str) -> None: ...
    @overload
    def __set__(self, instance: object, value: int) -> None: ...
    def __set__(self, instance: object, value) -> None:
        pass

@dataclass
class C:
    field: AcceptsStrAndInt = AcceptsStrAndInt()

reveal_type(C.__init__)  # revealed: (self: C, field: str | int = int) -> None
```
## `dataclasses.field`

To do
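Not yet covered by tests above. For reference, an illustrative sketch of the `dataclasses.field` API
(no assertions about inferred types):

```py
from dataclasses import dataclass, field

@dataclass
class Inventory:
    # `default_factory` is called to produce a fresh default per instance
    # (mutable defaults like `[]` cannot be used directly):
    items: list[str] = field(default_factory=list)
    # `init=False` removes the field from the generated `__init__`:
    count: int = field(default=0, init=False)

inv = Inventory()
print(inv.items, inv.count)  # [] 0
```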
## `dataclasses.fields`

Dataclasses have a special `__dataclass_fields__` class variable. The `DataclassInstance` protocol
checks for the presence of this attribute. It is used in the `dataclasses.fields` and
`dataclasses.asdict` functions, for example:
```py
from dataclasses import dataclass, fields, asdict

@dataclass
class Foo:
    x: int

foo = Foo(1)

reveal_type(foo.__dataclass_fields__)  # revealed: dict[str, Field[Any]]
reveal_type(fields(Foo))  # revealed: tuple[Field[Any], ...]
reveal_type(asdict(foo))  # revealed: dict[str, Any]
```
The class objects themselves also have a `__dataclass_fields__` attribute:

```py
reveal_type(Foo.__dataclass_fields__)  # revealed: dict[str, Field[Any]]
```
They can be passed into `fields` as well, because it also accepts `type[DataclassInstance]`
arguments:

```py
reveal_type(fields(Foo))  # revealed: tuple[Field[Any], ...]
```
But calling `asdict` on the class object is not allowed:

```py
# TODO: this should be an invalid-argument-type error, but we don't properly check the
# types (and more importantly, the `ClassVar` type qualifier) of protocol members yet.
asdict(Foo)
```
## `dataclasses.KW_ONLY`

If an attribute is annotated with `dataclasses.KW_ONLY`, it is not added to the synthesized
`__init__` of the class. Instead, this special marker annotation causes all fields declared after it
to become keyword-only parameters in the synthesized `__init__` method.

```toml
[environment]
python-version = "3.10"
```
```py
from dataclasses import dataclass, field, KW_ONLY

@dataclass
class C:
    x: int
    _: KW_ONLY
    y: str

# error: [missing-argument]
# error: [too-many-positional-arguments]
C(3, "")

C(3, y="")
```
Using `KW_ONLY` to annotate more than one field in a dataclass causes a `TypeError` to be raised at
runtime:
```py
@dataclass
class Fails:
    a: int
    b: KW_ONLY
    c: str
    # TODO: we should emit an error here
    # (two different names with `KW_ONLY` annotations in the same dataclass means the class fails at runtime)
    d: KW_ONLY
```
## Other special cases

### `dataclasses.dataclass`

We also understand dataclasses if they are decorated with the fully qualified name:
```py
import dataclasses

@dataclasses.dataclass
class C:
    x: str

reveal_type(C.__init__)  # revealed: (self: C, x: str) -> None
```
### Dataclass with custom `__init__` method

If a class already defines `__init__`, it is not replaced by the `dataclass` decorator.
```py
from dataclasses import dataclass

@dataclass(init=True)
class C:
    x: str

    def __init__(self, x: int) -> None:
        self.x = str(x)

C(1)  # OK

# error: [invalid-argument-type]
C("a")
```
Similarly, if we set `init=False`, we still recognize the custom `__init__` method:
```py
@dataclass(init=False)
class D:
    def __init__(self, x: int) -> None:
        self.x = str(x)

D(1)  # OK

D()  # error: [missing-argument]
```
### Accessing instance attributes on the class itself

Just like for normal classes, accessing instance attributes on the class itself is not allowed:
```py
from dataclasses import dataclass

@dataclass
class C:
    x: int

# error: [unresolved-attribute] "Attribute `x` can only be accessed on instances, not on the class object `<class 'C'>` itself."
C.x
```
### Return type of `dataclass(...)`

A call like `dataclass(order=True)` returns a callable itself, which is then used as the decorator.
We can store the callable in a variable and later use it as a decorator:
```py
from dataclasses import dataclass

dataclass_with_order = dataclass(order=True)

reveal_type(dataclass_with_order)  # revealed: <decorator produced by dataclass-like function>

@dataclass_with_order
class C:
    x: int

C(1) < C(2)  # ok
```
### Using `dataclass` as a function
```py
from dataclasses import dataclass

class B:
    x: int

# error: [missing-argument]
dataclass(B)()

# error: [invalid-argument-type]
dataclass(B)("a")

reveal_type(dataclass(B)(3).x)  # revealed: int
```
## Internals

The `dataclass` decorator returns the class itself. This means that the type of `Person` is `type`,
and attributes like the MRO are unchanged:
```py
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int | None = None

reveal_type(type(Person))  # revealed: <class 'type'>
reveal_type(Person.__mro__)  # revealed: tuple[<class 'Person'>, <class 'object'>]
```
The generated methods have the following signatures:
```py
reveal_type(Person.__init__)  # revealed: (self: Person, name: str, age: int | None = None) -> None
reveal_type(Person.__repr__)  # revealed: def __repr__(self) -> str
reveal_type(Person.__eq__)  # revealed: def __eq__(self, value: object, /) -> bool
```
## Function-like behavior of synthesized methods

Here, we make sure that the synthesized methods of dataclasses behave like proper functions.

```toml
[environment]
python-version = "3.12"
```
```py
from dataclasses import dataclass
from typing import Callable
from types import FunctionType
from ty_extensions import CallableTypeOf, TypeOf, static_assert, is_subtype_of, is_assignable_to

@dataclass
class C:
    x: int

reveal_type(C.__init__)  # revealed: (self: C, x: int) -> None
reveal_type(type(C.__init__))  # revealed: <class 'FunctionType'>

# We can access attributes that are defined on functions:
reveal_type(type(C.__init__).__code__)  # revealed: CodeType
reveal_type(C.__init__.__code__)  # revealed: CodeType

def equivalent_signature(self: C, x: int) -> None:
    pass

type DunderInitType = TypeOf[C.__init__]
type EquivalentPureCallableType = Callable[[C, int], None]
type EquivalentFunctionLikeCallableType = CallableTypeOf[equivalent_signature]

static_assert(is_subtype_of(DunderInitType, EquivalentPureCallableType))
static_assert(is_assignable_to(DunderInitType, EquivalentPureCallableType))
static_assert(not is_subtype_of(EquivalentPureCallableType, DunderInitType))
static_assert(not is_assignable_to(EquivalentPureCallableType, DunderInitType))

static_assert(is_subtype_of(DunderInitType, EquivalentFunctionLikeCallableType))
static_assert(is_assignable_to(DunderInitType, EquivalentFunctionLikeCallableType))
static_assert(not is_subtype_of(EquivalentFunctionLikeCallableType, DunderInitType))
static_assert(not is_assignable_to(EquivalentFunctionLikeCallableType, DunderInitType))

static_assert(is_subtype_of(DunderInitType, FunctionType))
```