# Dataclasses

## Basic

Decorating a class with `@dataclass` is a convenient way to add special methods such as
`__init__`, `__repr__`, and `__eq__` to a class. The following example shows the basic usage of
the `@dataclass` decorator. By default, only the three mentioned methods are generated.
```py
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int | None = None

alice1 = Person("Alice", 30)
alice2 = Person(name="Alice", age=30)
alice3 = Person(age=30, name="Alice")
alice4 = Person("Alice", age=30)

reveal_type(alice1)  # revealed: Person
reveal_type(type(alice1))  # revealed: type[Person]

reveal_type(alice1.name)  # revealed: str
reveal_type(alice1.age)  # revealed: int | None

reveal_type(repr(alice1))  # revealed: str

reveal_type(alice1 == alice2)  # revealed: bool
reveal_type(alice1 == "Alice")  # revealed: bool

bob = Person("Bob")
bob2 = Person("Bob", None)
bob3 = Person(name="Bob")
bob4 = Person(name="Bob", age=None)
```
The signature of the `__init__` method is generated based on the class's attributes. The
following calls are not valid:
```py
# error: [missing-argument]
Person()

# error: [too-many-positional-arguments]
Person("Eve", 20, "too many arguments")

# error: [invalid-argument-type]
Person("Eve", "string instead of int")

# error: [invalid-argument-type]
# error: [invalid-argument-type]
Person(20, "Eve")
```
## Signature of `__init__`

TODO: All of the following tests are missing the `self` argument in the `__init__` signature.

Declarations in the class body are used to generate the signature of the `__init__` method. If
the attributes are not just declarations, but also bindings, the type inferred from the bindings
is used as the default value.
```py
from dataclasses import dataclass

@dataclass
class D:
    x: int
    y: str = "default"
    z: int | None = 1 + 2

reveal_type(D.__init__)  # revealed: (x: int, y: str = Literal["default"], z: int | None = Literal[3]) -> None
```
This also works if the declaration and binding are split:
```py
@dataclass
class D:
    x: int | None
    x = None

reveal_type(D.__init__)  # revealed: (x: int | None = None) -> None
```
Non-fully static types are handled correctly:
```py
from typing import Any

@dataclass
class C:
    x: Any
    y: int | Any
    z: tuple[int, Any]

reveal_type(C.__init__)  # revealed: (x: Any, y: int | Any, z: tuple[int, Any]) -> None
```
Variables without annotations are ignored:
```py
@dataclass
class D:
    x: int
    y = 1

reveal_type(D.__init__)  # revealed: (x: int) -> None
```
If attributes without default values are declared after attributes with default values, a
`TypeError` will be raised at runtime. Ideally, we would emit a diagnostic in that case:
```py
@dataclass
class D:
    x: int = 1
    # TODO: this should be an error: field without default defined after field with default
    y: str
```
Pure class attributes (`ClassVar`) are not included in the signature of `__init__`:
```py
from typing import ClassVar

@dataclass
class D:
    x: int
    y: ClassVar[str] = "default"
    z: bool

reveal_type(D.__init__)  # revealed: (x: int, z: bool) -> None

d = D(1, True)

reveal_type(d.x)  # revealed: int
reveal_type(d.y)  # revealed: str
reveal_type(d.z)  # revealed: bool
```
Function declarations do not affect the signature of `__init__`:
```py
@dataclass
class D:
    x: int

    def y(self) -> str:
        return ""

reveal_type(D.__init__)  # revealed: (x: int) -> None
```
And neither do nested class declarations:
```py
@dataclass
class D:
    x: int

    class Nested:
        y: str

reveal_type(D.__init__)  # revealed: (x: int) -> None
```
But if there is a variable annotation with a function or class literal type, the signature of
`__init__` will include this field:
```py
from ty_extensions import TypeOf

class SomeClass: ...

def some_function() -> None: ...

@dataclass
class D:
    function_literal: TypeOf[some_function]
    class_literal: TypeOf[SomeClass]
    class_subtype_of: type[SomeClass]

# revealed: (function_literal: def some_function() -> None, class_literal: Literal[SomeClass], class_subtype_of: type[SomeClass]) -> None
reveal_type(D.__init__)
```
More realistically, dataclasses can have `Callable` attributes:
```py
from typing import Callable

@dataclass
class D:
    c: Callable[[int], str]

reveal_type(D.__init__)  # revealed: (c: (int, /) -> str) -> None
```
Implicit instance attributes do not affect the signature of `__init__`:
```py
@dataclass
class D:
    x: int

    def f(self, y: str) -> None:
        self.y: str = y

reveal_type(D(1).y)  # revealed: str

reveal_type(D.__init__)  # revealed: (x: int) -> None
```
Annotating expressions does not lead to an entry in `__annotations__` at runtime, and so it
wouldn't be included in the signature of `__init__`. This is a case that we currently don't
detect:
```py
@dataclass
class D:
    # (x) is an expression, not a "simple name"
    (x): int = 1

# TODO: should ideally not include an `x` parameter
reveal_type(D.__init__)  # revealed: (x: int = Literal[1]) -> None
```
## `@dataclass` calls with arguments

The `@dataclass` decorator can take several arguments to customize the existence of the
generated methods. The following test makes sure that we still treat the class as a dataclass if
(the default) arguments are passed in:
```py
from dataclasses import dataclass

@dataclass(init=True, repr=True, eq=True)
class Person:
    name: str
    age: int | None = None

alice = Person("Alice", 30)

reveal_type(repr(alice))  # revealed: str
reveal_type(alice == alice)  # revealed: bool
```
If `init` is set to `False`, no `__init__` method is generated:
```py
from dataclasses import dataclass

@dataclass(init=False)
class C:
    x: int

C()  # Okay

# error: [too-many-positional-arguments]
C(1)

repr(C())

C() == C()
```
## Other dataclass parameters

### `repr`

A custom `__repr__` method is generated by default. It can be disabled by passing `repr=False`,
but in that case `__repr__` is still available via `object.__repr__`:
```py
from dataclasses import dataclass

@dataclass(repr=False)
class WithoutRepr:
    x: int

reveal_type(WithoutRepr(1).__repr__)  # revealed: bound method WithoutRepr.__repr__() -> str
```
### `eq`

The same is true for `__eq__`. Setting `eq=False` disables the generated `__eq__` method, but
`__eq__` is still available via `object.__eq__`:
```py
from dataclasses import dataclass

@dataclass(eq=False)
class WithoutEq:
    x: int

reveal_type(WithoutEq(1) == WithoutEq(2))  # revealed: bool
```
### `order`

```toml
[environment]
python-version = "3.12"
```

`order` is set to `False` by default. If `order=True`, the `__lt__`, `__le__`, `__gt__`, and
`__ge__` methods will be generated:
```py
from dataclasses import dataclass

@dataclass
class WithoutOrder:
    x: int

WithoutOrder(1) < WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) <= WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) > WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) >= WithoutOrder(2)  # error: [unsupported-operator]

@dataclass(order=True)
class WithOrder:
    x: int

WithOrder(1) < WithOrder(2)
WithOrder(1) <= WithOrder(2)
WithOrder(1) > WithOrder(2)
WithOrder(1) >= WithOrder(2)
```
Comparisons are only allowed for `WithOrder` instances:
```py
WithOrder(1) < 2  # error: [unsupported-operator]
WithOrder(1) <= 2  # error: [unsupported-operator]
WithOrder(1) > 2  # error: [unsupported-operator]
WithOrder(1) >= 2  # error: [unsupported-operator]
```
This also works for generic dataclasses:
```py
from dataclasses import dataclass

@dataclass(order=True)
class GenericWithOrder[T]:
    x: T

GenericWithOrder[int](1) < GenericWithOrder[int](1)

GenericWithOrder[int](1) < GenericWithOrder[str]("a")  # error: [unsupported-operator]
```
If a class already defines one of the comparison methods, a `TypeError` is raised at runtime.
Ideally, we would emit a diagnostic in that case:
```py
@dataclass(order=True)
class AlreadyHasCustomDunderLt:
    x: int

    # TODO: Ideally, we would emit a diagnostic here
    def __lt__(self, other: object) -> bool:
        return False
```
### `unsafe_hash`

To do
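As a sketch of the runtime semantics this section will eventually cover (class names here are
illustrative, and no checker assertions are included): with the defaults `eq=True` and
`frozen=False`, the generated class sets `__hash__` to `None`, while `unsafe_hash=True` forces
generation of a `__hash__` method based on the fields.

```py
from dataclasses import dataclass

@dataclass
class Unhashable:
    x: int

@dataclass(unsafe_hash=True)
class Hashable:
    x: int

# `Unhashable` instances are unhashable at runtime; `Hashable` instances hash by field values.
assert Unhashable.__hash__ is None
assert hash(Hashable(1)) == hash(Hashable(1))
```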
### `frozen`

To do
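As a sketch of the runtime semantics (illustrative names, no checker assertions yet): with
`frozen=True`, assigning to a field after construction raises `dataclasses.FrozenInstanceError`.

```py
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int

p = FrozenPoint(1, 2)

# Assigning to a field of a frozen dataclass raises `FrozenInstanceError` at runtime;
# ideally, a checker would also flag such an assignment.
try:
    p.x = 3
except FrozenInstanceError:
    pass
```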
### `match_args`

To do
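As a sketch of the runtime semantics (Python 3.10+, illustrative names, no checker assertions
yet): `match_args=True` is the default and generates `__match_args__` from the parameters of the
synthesized `__init__`; `match_args=False` omits it.

```py
from dataclasses import dataclass

@dataclass
class WithMatchArgs:
    x: int
    y: str

@dataclass(match_args=False)
class WithoutMatchArgs:
    x: int
    y: str

# `__match_args__` is generated only when `match_args=True` (the default).
assert WithMatchArgs.__match_args__ == ("x", "y")
assert not hasattr(WithoutMatchArgs, "__match_args__")
```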
### `kw_only`

To do
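As a sketch of the runtime semantics (Python 3.10+, illustrative names, no checker assertions
yet): with `kw_only=True`, all fields become keyword-only parameters of the generated `__init__`.

```py
from dataclasses import dataclass

@dataclass(kw_only=True)
class KwOnly:
    x: int
    y: str = "default"

# Fields can only be passed by keyword:
KwOnly(x=1)
KwOnly(x=1, y="a")

# Passing them positionally, e.g. `KwOnly(1, "a")`, raises a `TypeError` at runtime.
```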
### `slots`

To do
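As a sketch of the runtime semantics (Python 3.10+, illustrative names, no checker assertions
yet): with `slots=True`, the decorator returns a new class with `__slots__` generated from the
fields, so instances have no `__dict__`.

```py
from dataclasses import dataclass

@dataclass(slots=True)
class WithSlots:
    x: int
    y: str

# `__slots__` is a tuple of the field names, in definition order.
assert WithSlots.__slots__ == ("x", "y")
```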
### `weakref_slot`

To do
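As a sketch of the runtime semantics (Python 3.11+, illustrative names, no checker assertions
yet): `weakref_slot=True` requires `slots=True` and adds a `__weakref__` slot so that instances
can be weakly referenced.

```py
import weakref
from dataclasses import dataclass

@dataclass(slots=True, weakref_slot=True)
class WithWeakrefSlot:
    x: int

# Without `weakref_slot=True`, a slotted dataclass instance could not be weakly referenced.
ref = weakref.ref(WithWeakrefSlot(1))
```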
## Inheritance

### Normal class inheriting from a dataclass
```py
from dataclasses import dataclass

@dataclass
class Base:
    x: int

class Derived(Base): ...

d = Derived(1)  # OK
reveal_type(d.x)  # revealed: int
```
### Dataclass inheriting from normal class
```py
from dataclasses import dataclass

class Base:
    x: int = 1

@dataclass
class Derived(Base):
    y: str

d = Derived("a")

# error: [too-many-positional-arguments]
# error: [invalid-argument-type]
Derived(1, "a")
```
### Dataclass inheriting from another dataclass
```py
from dataclasses import dataclass

@dataclass
class Base:
    x: int
    y: str

@dataclass
class Derived(Base):
    z: bool

d = Derived(1, "a", True)  # OK

reveal_type(d.x)  # revealed: int
reveal_type(d.y)  # revealed: str
reveal_type(d.z)  # revealed: bool

# error: [missing-argument]
Derived(1, "a")

# error: [missing-argument]
Derived(True)
```
### Overwriting attributes from base class

The following example comes from the Python documentation. The `x` attribute appears just once
in the `__init__` signature, and the default value is taken from the derived class:
```py
from dataclasses import dataclass
from typing import Any

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class C(Base):
    z: int = 10
    x: int = 15

reveal_type(C.__init__)  # revealed: (x: int = Literal[15], y: int = Literal[0], z: int = Literal[10]) -> None
```
## Generic dataclasses

```toml
[environment]
python-version = "3.12"
```
```py
from dataclasses import dataclass

@dataclass
class DataWithDescription[T]:
    data: T
    description: str

reveal_type(DataWithDescription[int])  # revealed: Literal[DataWithDescription[int]]

d_int = DataWithDescription[int](1, "description")  # OK

reveal_type(d_int.data)  # revealed: int
reveal_type(d_int.description)  # revealed: str

# error: [invalid-argument-type]
DataWithDescription[int](None, "description")
```
## Descriptor-typed fields

### Same type in `__get__` and `__set__`

For the following descriptor, the return type of `__get__` and the type of the `value` parameter
in `__set__` are the same. The generated `__init__` method takes an argument of this type
(instead of the type of the descriptor), and the default value is also of this type:
```py
from typing import overload
from dataclasses import dataclass

class UppercaseString:
    _value: str = ""

    def __get__(self, instance: object, owner: None | type) -> str:
        return self._value

    def __set__(self, instance: object, value: str) -> None:
        self._value = value.upper()

@dataclass
class C:
    upper: UppercaseString = UppercaseString()

reveal_type(C.__init__)  # revealed: (upper: str = str) -> None

c = C("abc")
reveal_type(c.upper)  # revealed: str

# This is also okay:
C()

# error: [invalid-argument-type]
C(1)

# error: [too-many-positional-arguments]
C("a", "b")
```
### Different types in `__get__` and `__set__`

In general, the type of the `__init__` parameter is determined by the `value` parameter type of
the `__set__` method (`str` in the example below). However, the default value is generated by
calling the descriptor's `__get__` method as if it had been called on the class itself, i.e.
passing `None` for the `instance` argument.
```py
from typing import Literal, overload
from dataclasses import dataclass

class ConvertToLength:
    _len: int = 0

    @overload
    def __get__(self, instance: None, owner: type) -> Literal[""]: ...
    @overload
    def __get__(self, instance: object, owner: type | None) -> int: ...
    def __get__(self, instance: object | None, owner: type | None) -> str | int:
        if instance is None:
            return ""
        return self._len

    def __set__(self, instance, value: str) -> None:
        self._len = len(value)

@dataclass
class C:
    converter: ConvertToLength = ConvertToLength()

reveal_type(C.__init__)  # revealed: (converter: str = Literal[""]) -> None

c = C("abc")
reveal_type(c.converter)  # revealed: int

# This is also okay:
C()

# error: [invalid-argument-type]
C(1)

# error: [too-many-positional-arguments]
C("a", "b")
```
### With overloaded `__set__` method

If the `__set__` method is overloaded, we determine the type for the `__init__` parameter as the
union of all possible `value` parameter types:
```py
from typing import overload
from dataclasses import dataclass

class AcceptsStrAndInt:
    def __get__(self, instance, owner) -> int:
        return 0

    @overload
    def __set__(self, instance: object, value: str) -> None: ...
    @overload
    def __set__(self, instance: object, value: int) -> None: ...
    def __set__(self, instance: object, value) -> None:
        pass

@dataclass
class C:
    field: AcceptsStrAndInt = AcceptsStrAndInt()

reveal_type(C.__init__)  # revealed: (field: str | int = int) -> None
```
## `dataclasses.field`

To do
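As a sketch of the runtime semantics this section will eventually cover (illustrative names, no
checker assertions yet): `field(...)` customizes individual fields, e.g. with a
`default_factory` or by excluding a field from the generated `__init__`.

```py
from dataclasses import dataclass, field

@dataclass
class WithField:
    # `default_factory` provides a default that is computed at instantiation time:
    tags: list[str] = field(default_factory=list)
    # `init=False` excludes the field from the generated `__init__`:
    counter: int = field(default=0, init=False)

w1 = WithField()
w2 = WithField(tags=["a"])

# `WithField(["a"], 1)` raises a `TypeError` at runtime, because `counter` is not a
# parameter of the generated `__init__`.
```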
## Other special cases

### `dataclasses.dataclass`

We also understand dataclasses if they are decorated with the fully qualified name:
```py
import dataclasses

@dataclasses.dataclass
class C:
    x: str

reveal_type(C.__init__)  # revealed: (x: str) -> None
```
### Dataclass with custom `__init__` method

If a class already defines `__init__`, it is not replaced by the `dataclass` decorator.
```py
from dataclasses import dataclass

@dataclass(init=True)
class C:
    x: str

    def __init__(self, x: int) -> None:
        self.x = str(x)

C(1)  # OK

# error: [invalid-argument-type]
C("a")
```
Similarly, if we set `init=False`, we still recognize the custom `__init__` method:
```py
@dataclass(init=False)
class D:
    def __init__(self, x: int) -> None:
        self.x = str(x)

D(1)  # OK
D()  # error: [missing-argument]
```
### Accessing instance attributes on the class itself

Just like for normal classes, accessing instance attributes on the class itself is not allowed:
```py
from dataclasses import dataclass

@dataclass
class C:
    x: int

# error: [unresolved-attribute] "Attribute `x` can only be accessed on instances, not on the class object `Literal[C]` itself."
C.x
```
### Return type of `dataclass(...)`

A call like `dataclass(order=True)` returns a callable, which is then used as the decorator. We
can store the callable in a variable and use it as a decorator later:
```py
from dataclasses import dataclass

dataclass_with_order = dataclass(order=True)

reveal_type(dataclass_with_order)  # revealed: <decorator produced by dataclass-like function>

@dataclass_with_order
class C:
    x: int

C(1) < C(2)  # ok
```
### Using `dataclass` as a function

To do
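As a sketch of the runtime behavior this section will eventually cover (the names `Plain` and
`AsDataclass` are illustrative, and no checker assertions are included): `dataclass` can also be
applied as a plain function call instead of a decorator.

```py
from dataclasses import dataclass

class Plain:
    x: int
    y: str = "default"

# At runtime, calling `dataclass` directly returns the (modified) class with the
# generated methods attached, based on the annotations of `Plain`:
AsDataclass = dataclass(Plain)

d = AsDataclass(1, "a")
```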
## Internals

The `dataclass` decorator returns the class itself. This means that the type of `Person` is
`type`, and attributes like the MRO are unchanged:
```py
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int | None = None

reveal_type(type(Person))  # revealed: Literal[type]
reveal_type(Person.__mro__)  # revealed: tuple[Literal[Person], Literal[object]]
```
The generated methods have the following signatures:
```py
# TODO: `self` is missing here
reveal_type(Person.__init__)  # revealed: (name: str, age: int | None = None) -> None

reveal_type(Person.__repr__)  # revealed: def __repr__(self) -> str
reveal_type(Person.__eq__)  # revealed: def __eq__(self, value: object, /) -> bool
```