ruff/crates/ty_python_semantic/resources/mdtest/dataclasses.md
2025-06-16 17:27:55 +00:00

Dataclasses

Basic

Decorating a class with @dataclass is a convenient way to add special methods such as __init__, __repr__, and __eq__ to a class. The following example shows the basic usage of the @dataclass decorator. By default, only the three mentioned methods are generated.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int | None = None

alice1 = Person("Alice", 30)
alice2 = Person(name="Alice", age=30)
alice3 = Person(age=30, name="Alice")
alice4 = Person("Alice", age=30)

reveal_type(alice1)  # revealed: Person
reveal_type(type(alice1))  # revealed: type[Person]

reveal_type(alice1.name)  # revealed: str
reveal_type(alice1.age)  # revealed: int | None

reveal_type(repr(alice1))  # revealed: str

reveal_type(alice1 == alice2)  # revealed: bool
reveal_type(alice1 == "Alice")  # revealed: bool

bob = Person("Bob")
bob2 = Person("Bob", None)
bob3 = Person(name="Bob")
bob4 = Person(name="Bob", age=None)

The signature of the __init__ method is generated based on the class's attributes. The following calls are not valid:

# error: [missing-argument]
Person()

# error: [too-many-positional-arguments]
Person("Eve", 20, "too many arguments")

# error: [invalid-argument-type]
Person("Eve", "string instead of int")

# error: [invalid-argument-type]
# error: [invalid-argument-type]
Person(20, "Eve")

Signature of __init__

Declarations in the class body are used to generate the signature of the __init__ method. If the attributes are not just declarations, but also bindings, the type inferred from the binding is used as the type of the parameter's default value.

from dataclasses import dataclass

@dataclass
class D:
    x: int
    y: str = "default"
    z: int | None = 1 + 2

reveal_type(D.__init__)  # revealed: (self: D, x: int, y: str = Literal["default"], z: int | None = Literal[3]) -> None

This also works if the declaration and binding are split:

@dataclass
class D:
    x: int | None
    x = None

reveal_type(D.__init__)  # revealed: (self: D, x: int | None = None) -> None

Non-fully static types are handled correctly:

from typing import Any

@dataclass
class C:
    x: Any
    y: int | Any
    z: tuple[int, Any]

reveal_type(C.__init__)  # revealed: (self: C, x: Any, y: int | Any, z: tuple[int, Any]) -> None

Variables without annotations are ignored:

@dataclass
class D:
    x: int
    y = 1

reveal_type(D.__init__)  # revealed: (self: D, x: int) -> None

If attributes without default values are declared after attributes with default values, a TypeError will be raised at runtime. Ideally, we would emit a diagnostic in that case:

@dataclass
class D:
    x: int = 1
    # TODO: this should be an error: field without default defined after field with default
    y: str
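The runtime failure can be reproduced directly. In this sketch the TypeError is raised by the decorator while the class is being created, so the class statement is wrapped in try; the exact message differs between Python versions, so only the field name is relied upon:

```python
from dataclasses import dataclass

error = None
try:
    @dataclass
    class D:
        x: int = 1
        y: str  # no default, declared after a field that has one
except TypeError as exc:
    error = exc

# The wording varies between Python versions, but it names the field 'y'.
print(error)
```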

Pure class attributes (ClassVar) are not included in the signature of __init__:

from typing import ClassVar

@dataclass
class D:
    x: int
    y: ClassVar[str] = "default"
    z: bool

reveal_type(D.__init__)  # revealed: (self: D, x: int, z: bool) -> None

d = D(1, True)
reveal_type(d.x)  # revealed: int
reveal_type(d.y)  # revealed: str
reveal_type(d.z)  # revealed: bool
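At runtime, dataclasses.fields confirms that the ClassVar attribute is not a field. This is why z, which has no default, may follow y without triggering the default-ordering TypeError:

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class D:
    x: int
    y: ClassVar[str] = "default"
    z: bool


# ClassVar attributes are not fields, so they are not __init__ parameters.
print([f.name for f in fields(D)])  # ['x', 'z']

d = D(1, True)
print(d.x, d.y, d.z)  # 1 default True
```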

Function declarations do not affect the signature of __init__:

@dataclass
class D:
    x: int

    def y(self) -> str:
        return ""

reveal_type(D.__init__)  # revealed: (self: D, x: int) -> None

And neither do nested class declarations:

@dataclass
class D:
    x: int

    class Nested:
        y: str

reveal_type(D.__init__)  # revealed: (self: D, x: int) -> None

But if there is a variable annotation with a function or class literal type, the signature of __init__ will include this field:

from ty_extensions import TypeOf

class SomeClass: ...

def some_function() -> None: ...
@dataclass
class D:
    function_literal: TypeOf[some_function]
    class_literal: TypeOf[SomeClass]
    class_subtype_of: type[SomeClass]

# revealed: (self: D, function_literal: def some_function() -> None, class_literal: <class 'SomeClass'>, class_subtype_of: type[SomeClass]) -> None
reveal_type(D.__init__)

More realistically, dataclasses can have Callable attributes:

from typing import Callable

@dataclass
class D:
    c: Callable[[int], str]

reveal_type(D.__init__)  # revealed: (self: D, c: (int, /) -> str) -> None

Implicit instance attributes do not affect the signature of __init__:

@dataclass
class D:
    x: int

    def f(self, y: str) -> None:
        self.y: str = y

reveal_type(D(1).y)  # revealed: str

reveal_type(D.__init__)  # revealed: (self: D, x: int) -> None

An annotation on an expression (rather than on a simple name) does not create an entry in __annotations__ at runtime, so such an attribute is not included in the signature of __init__. This is a case that we currently don't detect:

@dataclass
class D:
    # (x) is an expression, not a "simple name"
    (x): int = 1

# TODO: should ideally not include a `x` parameter
reveal_type(D.__init__)  # revealed: (self: D, x: int = Literal[1]) -> None
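A quick runtime check of this case: the parenthesized target produces no __annotations__ entry, so the dataclass ends up with no fields and an __init__ that takes no field arguments, while x survives only as a plain class attribute:

```python
from dataclasses import dataclass, fields


@dataclass
class D:
    # A parenthesized target is not a simple name, so no __annotations__ entry.
    (x): int = 1


print(fields(D))  # ()
print(D().x)  # 1 (read from the plain class attribute)

try:
    D(1)
except TypeError:
    print("D(1) is rejected: __init__ takes no field arguments")
```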

@dataclass calls with arguments

The @dataclass decorator accepts several arguments that control which methods are generated. The following test makes sure that we still treat the class as a dataclass if arguments (here, their default values) are passed in:

from dataclasses import dataclass

@dataclass(init=True, repr=True, eq=True)
class Person:
    name: str
    age: int | None = None

alice = Person("Alice", 30)
reveal_type(repr(alice))  # revealed: str
reveal_type(alice == alice)  # revealed: bool

If init is set to False, no __init__ method is generated:

from dataclasses import dataclass

@dataclass(init=False)
class C:
    x: int

C()  # Okay

# error: [too-many-positional-arguments]
C(1)

repr(C())

C() == C()

Other dataclass parameters

repr

A custom __repr__ method is generated by default. It can be disabled by passing repr=False, but in that case __repr__ is still available via object.__repr__:

from dataclasses import dataclass

@dataclass(repr=False)
class WithoutRepr:
    x: int

reveal_type(WithoutRepr(1).__repr__)  # revealed: bound method WithoutRepr.__repr__() -> str

eq

The same is true for __eq__. Setting eq=False disables the generated __eq__ method, but __eq__ is still available via object.__eq__:

from dataclasses import dataclass

@dataclass(eq=False)
class WithoutEq:
    x: int

reveal_type(WithoutEq(1) == WithoutEq(2))  # revealed: bool
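At runtime, falling back to object.__eq__ means equality is identity. This sketch shows that two instances with equal field values no longer compare equal:

```python
from dataclasses import dataclass


@dataclass(eq=False)
class WithoutEq:
    x: int


a = WithoutEq(1)

# object.__eq__ compares identity, so equal field values do not make instances equal.
print(a == a, a == WithoutEq(1))  # True False
```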

order

[environment]
python-version = "3.12"

order is set to False by default. If order=True, __lt__, __le__, __gt__, and __ge__ methods will be generated:

from dataclasses import dataclass

@dataclass
class WithoutOrder:
    x: int

WithoutOrder(1) < WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) <= WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) > WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) >= WithoutOrder(2)  # error: [unsupported-operator]

@dataclass(order=True)
class WithOrder:
    x: int

WithOrder(1) < WithOrder(2)
WithOrder(1) <= WithOrder(2)
WithOrder(1) > WithOrder(2)
WithOrder(1) >= WithOrder(2)

Comparisons are only allowed for WithOrder instances:

WithOrder(1) < 2  # error: [unsupported-operator]
WithOrder(1) <= 2  # error: [unsupported-operator]
WithOrder(1) > 2  # error: [unsupported-operator]
WithOrder(1) >= 2  # error: [unsupported-operator]
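At runtime, the generated comparison methods compare instances as tuples of their fields and return NotImplemented for other types, which makes WithOrder(1) < 2 raise TypeError. The second field y is an addition for illustration only, so that the tuple ordering is visible:

```python
from dataclasses import dataclass


@dataclass(order=True)
class WithOrder:
    x: int
    y: str = ""  # extra field added for this illustration


# Instances compare like the tuples of their fields: (1, "b") < (2, "a").
print(WithOrder(1, "b") < WithOrder(2, "a"))  # True
print(WithOrder(1, "a") <= WithOrder(1, "a"))  # True

# Comparing against a different type returns NotImplemented, hence TypeError.
try:
    WithOrder(1) < 2
except TypeError:
    print("WithOrder(1) < 2 raises TypeError")
```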

This also works for generic dataclasses:

from dataclasses import dataclass

@dataclass(order=True)
class GenericWithOrder[T]:
    x: T

GenericWithOrder[int](1) < GenericWithOrder[int](1)

GenericWithOrder[int](1) < GenericWithOrder[str]("a")  # error: [unsupported-operator]

If a class already defines one of the comparison methods, a TypeError is raised at runtime. Ideally, we would emit a diagnostic in that case:

@dataclass(order=True)
class AlreadyHasCustomDunderLt:
    x: int

    # TODO: Ideally, we would emit a diagnostic here
    def __lt__(self, other: object) -> bool:
        return False
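The runtime error can be observed by wrapping the class statement in try. The decorator refuses to overwrite the user-defined __lt__, and the message names the clashing method:

```python
from dataclasses import dataclass

message = ""
try:
    @dataclass(order=True)
    class AlreadyHasCustomDunderLt:
        x: int

        def __lt__(self, other: object) -> bool:
            return False
except TypeError as exc:
    message = str(exc)

# Expected to mention __lt__, the method that cannot be overwritten.
print(message)
```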

unsafe_hash

To do

frozen

If frozen=True (the default is False), assigning to fields will generate a diagnostic.

from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenClass:
    x: int

frozen_instance = MyFrozenClass(1)
frozen_instance.x = 2  # error: [invalid-assignment]
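At runtime, the generated __setattr__ of a frozen dataclass raises dataclasses.FrozenInstanceError, a subclass of AttributeError, and the field keeps its value:

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class MyFrozenClass:
    x: int


frozen_instance = MyFrozenClass(1)

try:
    frozen_instance.x = 2
except dataclasses.FrozenInstanceError as exc:
    # FrozenInstanceError is a subclass of AttributeError.
    print(type(exc).__name__, isinstance(exc, AttributeError))  # FrozenInstanceError True

print(frozen_instance.x)  # 1
```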

If __setattr__() or __delattr__() is defined in a frozen dataclass, a TypeError is raised at runtime, so we should emit a diagnostic.

from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenClass:
    x: int

    # TODO: Emit a diagnostic here
    def __setattr__(self, name: str, value: object) -> None: ...

    # TODO: Emit a diagnostic here
    def __delattr__(self, name: str) -> None: ...
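For reference, this is the runtime behavior those TODOs refer to. The decorator raises TypeError because it cannot install its own __setattr__ on a frozen class that already defines one:

```python
from dataclasses import dataclass

message = ""
try:
    @dataclass(frozen=True)
    class MyFrozenClass:
        x: int

        def __setattr__(self, name: str, value: object) -> None: ...
except TypeError as exc:
    message = str(exc)

# Expected to mention __setattr__, the method that cannot be overwritten.
print(message)
```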

This also works for generic dataclasses:

[environment]
python-version = "3.12"
from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenGeneric[T]:
    x: T

frozen_instance = MyFrozenGeneric[int](1)
frozen_instance.x = 2  # error: [invalid-assignment]

When attempting to mutate an unresolved attribute on a frozen dataclass, only unresolved-attribute is emitted:

from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenClass: ...

frozen = MyFrozenClass()
frozen.x = 2  # error: [unresolved-attribute]

match_args

To do

kw_only

To do

slots

To do

weakref_slot

To do

Inheritance

Normal class inheriting from a dataclass

from dataclasses import dataclass

@dataclass
class Base:
    x: int

class Derived(Base): ...

d = Derived(1)  # OK
reveal_type(d.x)  # revealed: int

Dataclass inheriting from normal class

from dataclasses import dataclass

class Base:
    x: int = 1

@dataclass
class Derived(Base):
    y: str

d = Derived("a")

# error: [too-many-positional-arguments]
# error: [invalid-argument-type]
Derived(1, "a")
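The runtime explanation for these diagnostics: fields are only collected from the class itself and from base classes that are dataclasses. Since Base is a normal class, its x annotation never becomes a field, and x is only reachable as an inherited class attribute:

```python
from dataclasses import dataclass, fields


class Base:
    x: int = 1


@dataclass
class Derived(Base):
    y: str


# Base is not a dataclass, so its annotation does not become a field.
print([f.name for f in fields(Derived)])  # ['y']

d = Derived("a")
# x is just the class attribute inherited from Base.
print(d.x, d.y)  # 1 a
```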

Dataclass inheriting from another dataclass

from dataclasses import dataclass

@dataclass
class Base:
    x: int
    y: str

@dataclass
class Derived(Base):
    z: bool

d = Derived(1, "a", True)  # OK

reveal_type(d.x)  # revealed: int
reveal_type(d.y)  # revealed: str
reveal_type(d.z)  # revealed: bool

# error: [missing-argument]
Derived(1, "a")

# error: [missing-argument]
Derived(True)

Overwriting attributes from base class

The following example comes from the Python documentation. The x attribute appears just once in the __init__ signature, in its original position, and both its type and its default value are taken from the derived class:

from dataclasses import dataclass
from typing import Any

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class C(Base):
    z: int = 10
    x: int = 15

reveal_type(C.__init__)  # revealed: (self: C, x: int = Literal[15], y: int = Literal[0], z: int = Literal[10]) -> None
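The same example at runtime: the field order is x, y, z, and the redefinition in C supplies both the type and the default of x:

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Base:
    x: Any = 15.0
    y: int = 0


@dataclass
class C(Base):
    z: int = 10
    x: int = 15


# x keeps its original position, but its type and default come from C.
print([f.name for f in fields(C)])  # ['x', 'y', 'z']
print(C())  # C(x=15, y=0, z=10)
```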

Generic dataclasses

[environment]
python-version = "3.12"
from dataclasses import dataclass

@dataclass
class DataWithDescription[T]:
    data: T
    description: str

reveal_type(DataWithDescription[int])  # revealed: <class 'DataWithDescription[int]'>

d_int = DataWithDescription[int](1, "description")  # OK
reveal_type(d_int.data)  # revealed: int
reveal_type(d_int.description)  # revealed: str

# error: [invalid-argument-type]
DataWithDescription[int](None, "description")

Descriptor-typed fields

Same type in __get__ and __set__

For the following descriptor, the return type of __get__ and the type of the value parameter in __set__ are the same. The generated __init__ method takes an argument of this type (instead of the type of the descriptor), and the default value is also of this type:

from typing import overload
from dataclasses import dataclass

class UppercaseString:
    _value: str = ""

    def __get__(self, instance: object, owner: None | type) -> str:
        return self._value

    def __set__(self, instance: object, value: str) -> None:
        self._value = value.upper()

@dataclass
class C:
    upper: UppercaseString = UppercaseString()

reveal_type(C.__init__)  # revealed: (self: C, upper: str = str) -> None

c = C("abc")
reveal_type(c.upper)  # revealed: str

# This is also okay:
C()

# error: [invalid-argument-type]
C(1)

# error: [too-many-positional-arguments]
C("a", "b")

Different types in __get__ and __set__

In general, the type of the __init__ parameter is determined by the value parameter type of the __set__ method (str in the example below). However, the default value is generated by calling the descriptor's __get__ method as if it had been called on the class itself, i.e. passing None for the instance argument.

from typing import Literal, overload
from dataclasses import dataclass

class ConvertToLength:
    _len: int = 0

    @overload
    def __get__(self, instance: None, owner: type) -> Literal[""]: ...
    @overload
    def __get__(self, instance: object, owner: type | None) -> int: ...
    def __get__(self, instance: object | None, owner: type | None) -> str | int:
        if instance is None:
            return ""

        return self._len

    def __set__(self, instance, value: str) -> None:
        self._len = len(value)

@dataclass
class C:
    converter: ConvertToLength = ConvertToLength()

reveal_type(C.__init__)  # revealed: (self: C, converter: str = Literal[""]) -> None

c = C("abc")
reveal_type(c.converter)  # revealed: int

# This is also okay:
C()

# error: [invalid-argument-type]
C(1)

# error: [too-many-positional-arguments]
C("a", "b")

With overloaded __set__ method

If the __set__ method is overloaded, we determine the type for the __init__ parameter as the union of all possible value parameter types:

from typing import overload
from dataclasses import dataclass

class AcceptsStrAndInt:
    def __get__(self, instance, owner) -> int:
        return 0

    @overload
    def __set__(self, instance: object, value: str) -> None: ...
    @overload
    def __set__(self, instance: object, value: int) -> None: ...
    def __set__(self, instance: object, value) -> None:
        pass

@dataclass
class C:
    field: AcceptsStrAndInt = AcceptsStrAndInt()

reveal_type(C.__init__)  # revealed: (self: C, field: str | int = int) -> None

dataclasses.field

To do

dataclasses.fields

Dataclasses have a special __dataclass_fields__ class variable member. The DataclassInstance protocol checks for the presence of this attribute. It is used in the dataclasses.fields and dataclasses.asdict functions, for example:

from dataclasses import dataclass, fields, asdict

@dataclass
class Foo:
    x: int

foo = Foo(1)

reveal_type(foo.__dataclass_fields__)  # revealed: dict[str, Field[Any]]
reveal_type(fields(Foo))  # revealed: tuple[Field[Any], ...]
reveal_type(asdict(foo))  # revealed: dict[str, Any]

The class objects themselves also have a __dataclass_fields__ attribute:

reveal_type(Foo.__dataclass_fields__)  # revealed: dict[str, Field[Any]]

They can be passed into fields as well, because it also accepts type[DataclassInstance] arguments:

reveal_type(fields(Foo))  # revealed: tuple[Field[Any], ...]

But calling asdict on the class object is not allowed:

# TODO: this should be an invalid-argument-type error, but we don't properly check the
# types (and more importantly, the `ClassVar` type qualifier) of protocol members yet.
asdict(Foo)
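At runtime the asymmetry is the same: fields accepts both the class and an instance, while asdict checks that its argument is a dataclass instance and raises TypeError for the class object:

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Foo:
    x: int


foo = Foo(1)

# fields() accepts both the class and an instance.
print([f.name for f in fields(Foo)], [f.name for f in fields(foo)])  # ['x'] ['x']
print(asdict(foo))  # {'x': 1}

# asdict() only accepts instances; passing the class raises TypeError.
try:
    asdict(Foo)
except TypeError as exc:
    print(exc)
```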

dataclasses.KW_ONLY

If an attribute is annotated with dataclasses.KW_ONLY, it is not added to the synthesized __init__ of the class. Instead, this special marker annotation causes Python at runtime to ensure that all annotations following it have keyword-only parameters generated for them in the class's synthesized __init__ method.

[environment]
python-version = "3.10"
from dataclasses import dataclass, field, KW_ONLY

@dataclass
class C:
    x: int
    _: KW_ONLY
    y: str

# error: [missing-argument]
# error: [too-many-positional-arguments]
C(3, "")

C(3, y="")

Using KW_ONLY to annotate more than one field in a dataclass causes a TypeError to be raised at runtime:

@dataclass
class Fails:
    a: int
    b: KW_ONLY
    c: str

    # TODO: we should emit an error here
    # (two different names with `KW_ONLY` annotations in the same dataclass means the class fails at runtime)
    d: KW_ONLY

Other special cases

dataclasses.dataclass

We also understand dataclasses if they are decorated with the fully qualified name:

import dataclasses

@dataclasses.dataclass
class C:
    x: str

reveal_type(C.__init__)  # revealed: (self: C, x: str) -> None

Dataclass with custom __init__ method

If a class already defines __init__, it is not replaced by the dataclass decorator.

from dataclasses import dataclass

@dataclass(init=True)
class C:
    x: str

    def __init__(self, x: int) -> None:
        self.x = str(x)

C(1)  # OK

# error: [invalid-argument-type]
C("a")

Similarly, if we set init=False, we still recognize the custom __init__ method:

@dataclass(init=False)
class D:
    def __init__(self, x: int) -> None:
        self.x = str(x)

D(1)  # OK
D()  # error: [missing-argument]
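At runtime, the hand-written __init__ is kept whether init is True or False. The generated __repr__ still reflects the declared field x:

```python
from dataclasses import dataclass


@dataclass(init=True)
class C:
    x: str

    def __init__(self, x: int) -> None:
        self.x = str(x)


@dataclass(init=False)
class D:
    def __init__(self, x: int) -> None:
        self.x = str(x)


# In both cases the hand-written __init__ is the one that runs.
print(repr(C(1).x), repr(D(1).x))  # '1' '1'
print(C(1))  # C(x='1')
```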

Accessing instance attributes on the class itself

Just like for normal classes, accessing instance attributes on the class itself is not allowed:

from dataclasses import dataclass

@dataclass
class C:
    x: int

# error: [unresolved-attribute] "Attribute `x` can only be accessed on instances, not on the class object `<class 'C'>` itself."
C.x

Return type of dataclass(...)

A call like dataclass(order=True) returns a callable itself, which is then used as the decorator. We can store the callable in a variable and later use it as a decorator:

from dataclasses import dataclass

dataclass_with_order = dataclass(order=True)

reveal_type(dataclass_with_order)  # revealed: <decorator produced by dataclass-like function>

@dataclass_with_order
class C:
    x: int

C(1) < C(2)  # ok

Using dataclass as a function

from dataclasses import dataclass

class B:
    x: int

# error: [missing-argument]
dataclass(B)()

# error: [invalid-argument-type]
dataclass(B)("a")

reveal_type(dataclass(B)(3).x)  # revealed: int

Internals

The dataclass decorator returns the class itself. This means that the type of Person is type, and attributes like the MRO are unchanged:

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int | None = None

reveal_type(type(Person))  # revealed: <class 'type'>
reveal_type(Person.__mro__)  # revealed: tuple[<class 'Person'>, <class 'object'>]
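This can be confirmed at runtime by calling the decorator directly. It modifies the class in place and returns the same object, with no new base classes or metaclass. Plain is a hypothetical class used only for this sketch:

```python
from dataclasses import dataclass


class Plain:  # hypothetical example class
    name: str


decorated = dataclass(Plain)

# dataclass() modifies the class in place and returns the very same object.
print(decorated is Plain)  # True
print(type(Plain) is type, Plain.__mro__ == (Plain, object))  # True True
```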

The generated methods have the following signatures:

reveal_type(Person.__init__)  # revealed: (self: Person, name: str, age: int | None = None) -> None

reveal_type(Person.__repr__)  # revealed: def __repr__(self) -> str

reveal_type(Person.__eq__)  # revealed: def __eq__(self, value: object, /) -> bool

Function-like behavior of synthesized methods

Here, we make sure that the synthesized methods of dataclasses behave like proper functions.

[environment]
python-version = "3.12"
from dataclasses import dataclass
from typing import Callable
from types import FunctionType
from ty_extensions import CallableTypeOf, TypeOf, static_assert, is_subtype_of, is_assignable_to

@dataclass
class C:
    x: int

reveal_type(C.__init__)  # revealed: (self: C, x: int) -> None
reveal_type(type(C.__init__))  # revealed: <class 'FunctionType'>

# We can access attributes that are defined on functions:
reveal_type(type(C.__init__).__code__)  # revealed: CodeType
reveal_type(C.__init__.__code__)  # revealed: CodeType

def equivalent_signature(self: C, x: int) -> None:
    pass

type DunderInitType = TypeOf[C.__init__]
type EquivalentPureCallableType = Callable[[C, int], None]
type EquivalentFunctionLikeCallableType = CallableTypeOf[equivalent_signature]

static_assert(is_subtype_of(DunderInitType, EquivalentPureCallableType))
static_assert(is_assignable_to(DunderInitType, EquivalentPureCallableType))

static_assert(not is_subtype_of(EquivalentPureCallableType, DunderInitType))
static_assert(not is_assignable_to(EquivalentPureCallableType, DunderInitType))

static_assert(is_subtype_of(DunderInitType, EquivalentFunctionLikeCallableType))
static_assert(is_assignable_to(DunderInitType, EquivalentFunctionLikeCallableType))

static_assert(not is_subtype_of(EquivalentFunctionLikeCallableType, DunderInitType))
static_assert(not is_assignable_to(EquivalentFunctionLikeCallableType, DunderInitType))

static_assert(is_subtype_of(DunderInitType, FunctionType))