ruff/crates/ty_python_semantic/resources/mdtest/dataclasses.md

731 lines
15 KiB
Markdown

# Dataclasses
## Basic
Decorating a class with `@dataclass` is a convenient way to add special methods such as `__init__`,
`__repr__`, and `__eq__` to a class. The following example shows the basic usage of the `@dataclass`
decorator. By default, only the three mentioned methods are generated.
```py
from dataclasses import dataclass
@dataclass
class Person:
name: str
age: int | None = None
alice1 = Person("Alice", 30)
alice2 = Person(name="Alice", age=30)
alice3 = Person(age=30, name="Alice")
alice4 = Person("Alice", age=30)
reveal_type(alice1) # revealed: Person
reveal_type(type(alice1)) # revealed: type[Person]
reveal_type(alice1.name) # revealed: str
reveal_type(alice1.age) # revealed: int | None
reveal_type(repr(alice1)) # revealed: str
reveal_type(alice1 == alice2) # revealed: bool
reveal_type(alice1 == "Alice") # revealed: bool
bob = Person("Bob")
bob2 = Person("Bob", None)
bob3 = Person(name="Bob")
bob4 = Person(name="Bob", age=None)
```
The signature of the `__init__` method is generated based on the classes attributes. The following
calls are not valid:
```py
# error: [missing-argument]
Person()
# error: [too-many-positional-arguments]
Person("Eve", 20, "too many arguments")
# error: [invalid-argument-type]
Person("Eve", "string instead of int")
# error: [invalid-argument-type]
# error: [invalid-argument-type]
Person(20, "Eve")
```
## Signature of `__init__`
TODO: All of the following tests are missing the `self` argument in the `__init__` signature.
Declarations in the class body are used to generate the signature of the `__init__` method. If the
attributes are not just declarations, but also bindings, the type inferred from bindings is used as
the default value.
```py
from dataclasses import dataclass
@dataclass
class D:
x: int
y: str = "default"
z: int | None = 1 + 2
reveal_type(D.__init__) # revealed: (x: int, y: str = Literal["default"], z: int | None = Literal[3]) -> None
```
This also works if the declaration and binding are split:
```py
@dataclass
class D:
x: int | None
x = None
reveal_type(D.__init__) # revealed: (x: int | None = None) -> None
```
Non-fully static types are handled correctly:
```py
from typing import Any
@dataclass
class C:
x: Any
y: int | Any
z: tuple[int, Any]
reveal_type(C.__init__) # revealed: (x: Any, y: int | Any, z: tuple[int, Any]) -> None
```
Variables without annotations are ignored:
```py
@dataclass
class D:
x: int
y = 1
reveal_type(D.__init__) # revealed: (x: int) -> None
```
If attributes without default values are declared after attributes with default values, a
`TypeError` will be raised at runtime. Ideally, we would emit a diagnostic in that case:
```py
@dataclass
class D:
x: int = 1
# TODO: this should be an error: field without default defined after field with default
y: str
```
Pure class attributes (`ClassVar`) are not included in the signature of `__init__`:
```py
from typing import ClassVar
@dataclass
class D:
x: int
y: ClassVar[str] = "default"
z: bool
reveal_type(D.__init__) # revealed: (x: int, z: bool) -> None
d = D(1, True)
reveal_type(d.x) # revealed: int
reveal_type(d.y) # revealed: str
reveal_type(d.z) # revealed: bool
```
Function declarations do not affect the signature of `__init__`:
```py
@dataclass
class D:
x: int
def y(self) -> str:
return ""
reveal_type(D.__init__) # revealed: (x: int) -> None
```
And neither do nested class declarations:
```py
@dataclass
class D:
x: int
class Nested:
y: str
reveal_type(D.__init__) # revealed: (x: int) -> None
```
But if there is a variable annotation with a function or class literal type, the signature of
`__init__` will include this field:
```py
from ty_extensions import TypeOf
class SomeClass: ...
def some_function() -> None: ...
@dataclass
class D:
function_literal: TypeOf[some_function]
class_literal: TypeOf[SomeClass]
class_subtype_of: type[SomeClass]
# revealed: (function_literal: def some_function() -> None, class_literal: <class 'SomeClass'>, class_subtype_of: type[SomeClass]) -> None
reveal_type(D.__init__)
```
More realistically, dataclasses can have `Callable` attributes:
```py
from typing import Callable
@dataclass
class D:
c: Callable[[int], str]
reveal_type(D.__init__) # revealed: (c: (int, /) -> str) -> None
```
Implicit instance attributes do not affect the signature of `__init__`:
```py
@dataclass
class D:
x: int
def f(self, y: str) -> None:
self.y: str = y
reveal_type(D(1).y) # revealed: str
reveal_type(D.__init__) # revealed: (x: int) -> None
```
Annotating expressions does not lead to an entry in `__annotations__` at runtime, and so it wouldn't
be included in the signature of `__init__`. This is a case that we currently don't detect:
```py
@dataclass
class D:
# (x) is an expression, not a "simple name"
(x): int = 1
# TODO: should ideally not include a `x` parameter
reveal_type(D.__init__) # revealed: (x: int = Literal[1]) -> None
```
## `@dataclass` calls with arguments
The `@dataclass` decorator can take several arguments to customize the existence of the generated
methods. The following test makes sure that we still treat the class as a dataclass if (the default)
arguments are passed in:
```py
from dataclasses import dataclass
@dataclass(init=True, repr=True, eq=True)
class Person:
name: str
age: int | None = None
alice = Person("Alice", 30)
reveal_type(repr(alice)) # revealed: str
reveal_type(alice == alice) # revealed: bool
```
If `init` is set to `False`, no `__init__` method is generated:
```py
from dataclasses import dataclass
@dataclass(init=False)
class C:
x: int
C() # Okay
# error: [too-many-positional-arguments]
C(1)
repr(C())
C() == C()
```
## Other dataclass parameters
### `repr`
A custom `__repr__` method is generated by default. It can be disabled by passing `repr=False`, but
in that case `__repr__` is still available via `object.__repr__`:
```py
from dataclasses import dataclass
@dataclass(repr=False)
class WithoutRepr:
x: int
reveal_type(WithoutRepr(1).__repr__) # revealed: bound method WithoutRepr.__repr__() -> str
```
### `eq`
The same is true for `__eq__`. Setting `eq=False` disables the generated `__eq__` method, but
`__eq__` is still available via `object.__eq__`:
```py
from dataclasses import dataclass
@dataclass(eq=False)
class WithoutEq:
x: int
reveal_type(WithoutEq(1) == WithoutEq(2)) # revealed: bool
```
### `order`
```toml
[environment]
python-version = "3.12"
```
`order` is set to `False` by default. If `order=True`, `__lt__`, `__le__`, `__gt__`, and `__ge__`
methods will be generated:
```py
from dataclasses import dataclass
@dataclass
class WithoutOrder:
x: int
WithoutOrder(1) < WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) <= WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) > WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) >= WithoutOrder(2) # error: [unsupported-operator]
@dataclass(order=True)
class WithOrder:
x: int
WithOrder(1) < WithOrder(2)
WithOrder(1) <= WithOrder(2)
WithOrder(1) > WithOrder(2)
WithOrder(1) >= WithOrder(2)
```
Comparisons are only allowed for `WithOrder` instances:
```py
WithOrder(1) < 2 # error: [unsupported-operator]
WithOrder(1) <= 2 # error: [unsupported-operator]
WithOrder(1) > 2 # error: [unsupported-operator]
WithOrder(1) >= 2 # error: [unsupported-operator]
```
This also works for generic dataclasses:
```py
from dataclasses import dataclass
@dataclass(order=True)
class GenericWithOrder[T]:
x: T
GenericWithOrder[int](1) < GenericWithOrder[int](1)
GenericWithOrder[int](1) < GenericWithOrder[str]("a") # error: [unsupported-operator]
```
If a class already defines one of the comparison methods, a `TypeError` is raised at runtime.
Ideally, we would emit a diagnostic in that case:
```py
@dataclass(order=True)
class AlreadyHasCustomDunderLt:
x: int
# TODO: Ideally, we would emit a diagnostic here
def __lt__(self, other: object) -> bool:
return False
```
### `unsafe_hash`
To do
### `frozen`
To do
### `match_args`
To do
### `kw_only`
To do
### `slots`
To do
### `weakref_slot`
To do
## Inheritance
### Normal class inheriting from a dataclass
```py
from dataclasses import dataclass
@dataclass
class Base:
x: int
class Derived(Base): ...
d = Derived(1) # OK
reveal_type(d.x) # revealed: int
```
### Dataclass inheriting from normal class
```py
from dataclasses import dataclass
class Base:
x: int = 1
@dataclass
class Derived(Base):
y: str
d = Derived("a")
# error: [too-many-positional-arguments]
# error: [invalid-argument-type]
Derived(1, "a")
```
### Dataclass inheriting from another dataclass
```py
from dataclasses import dataclass
@dataclass
class Base:
x: int
y: str
@dataclass
class Derived(Base):
z: bool
d = Derived(1, "a", True) # OK
reveal_type(d.x) # revealed: int
reveal_type(d.y) # revealed: str
reveal_type(d.z) # revealed: bool
# error: [missing-argument]
Derived(1, "a")
# error: [missing-argument]
Derived(True)
```
### Overwriting attributes from base class
The following example comes from the
[Python documentation](https://docs.python.org/3/library/dataclasses.html#inheritance). The `x`
attribute appears just once in the `__init__` signature, and the default value is taken from the
derived class
```py
from dataclasses import dataclass
from typing import Any
@dataclass
class Base:
x: Any = 15.0
y: int = 0
@dataclass
class C(Base):
z: int = 10
x: int = 15
reveal_type(C.__init__) # revealed: (x: int = Literal[15], y: int = Literal[0], z: int = Literal[10]) -> None
```
## Generic dataclasses
```toml
[environment]
python-version = "3.12"
```
```py
from dataclasses import dataclass
@dataclass
class DataWithDescription[T]:
data: T
description: str
reveal_type(DataWithDescription[int]) # revealed: <class 'DataWithDescription[int]'>
d_int = DataWithDescription[int](1, "description") # OK
reveal_type(d_int.data) # revealed: int
reveal_type(d_int.description) # revealed: str
# error: [invalid-argument-type]
DataWithDescription[int](None, "description")
```
## Descriptor-typed fields
### Same type in `__get__` and `__set__`
For the following descriptor, the return type of `__get__` and the type of the `value` parameter in
`__set__` are the same. The generated `__init__` method takes an argument of this type (instead of
the type of the descriptor), and the default value is also of this type:
```py
from typing import overload
from dataclasses import dataclass
class UppercaseString:
_value: str = ""
def __get__(self, instance: object, owner: None | type) -> str:
return self._value
def __set__(self, instance: object, value: str) -> None:
self._value = value.upper()
@dataclass
class C:
upper: UppercaseString = UppercaseString()
reveal_type(C.__init__) # revealed: (upper: str = str) -> None
c = C("abc")
reveal_type(c.upper) # revealed: str
# This is also okay:
C()
# error: [invalid-argument-type]
C(1)
# error: [too-many-positional-arguments]
C("a", "b")
```
### Different types in `__get__` and `__set__`
In general, the type of the `__init__` parameter is determined by the `value` parameter type of the
`__set__` method (`str` in the example below). However, the default value is generated by calling
the descriptor's `__get__` method as if it had been called on the class itself, i.e. passing `None`
for the `instance` argument.
```py
from typing import Literal, overload
from dataclasses import dataclass
class ConvertToLength:
_len: int = 0
@overload
def __get__(self, instance: None, owner: type) -> Literal[""]: ...
@overload
def __get__(self, instance: object, owner: type | None) -> int: ...
def __get__(self, instance: object | None, owner: type | None) -> str | int:
if instance is None:
return ""
return self._len
def __set__(self, instance, value: str) -> None:
self._len = len(value)
@dataclass
class C:
converter: ConvertToLength = ConvertToLength()
reveal_type(C.__init__) # revealed: (converter: str = Literal[""]) -> None
c = C("abc")
reveal_type(c.converter) # revealed: int
# This is also okay:
C()
# error: [invalid-argument-type]
C(1)
# error: [too-many-positional-arguments]
C("a", "b")
```
### With overloaded `__set__` method
If the `__set__` method is overloaded, we determine the type for the `__init__` parameter as the
union of all possible `value` parameter types:
```py
from typing import overload
from dataclasses import dataclass
class AcceptsStrAndInt:
def __get__(self, instance, owner) -> int:
return 0
@overload
def __set__(self, instance: object, value: str) -> None: ...
@overload
def __set__(self, instance: object, value: int) -> None: ...
def __set__(self, instance: object, value) -> None:
pass
@dataclass
class C:
field: AcceptsStrAndInt = AcceptsStrAndInt()
reveal_type(C.__init__) # revealed: (field: str | int = int) -> None
```
## `dataclasses.field`
To do
## Other special cases
### `dataclasses.dataclass`
We also understand dataclasses if they are decorated with the fully qualified name:
```py
import dataclasses
@dataclasses.dataclass
class C:
x: str
reveal_type(C.__init__) # revealed: (x: str) -> None
```
### Dataclass with custom `__init__` method
If a class already defines `__init__`, it is not replaced by the `dataclass` decorator.
```py
from dataclasses import dataclass
@dataclass(init=True)
class C:
x: str
def __init__(self, x: int) -> None:
self.x = str(x)
C(1) # OK
# error: [invalid-argument-type]
C("a")
```
Similarly, if we set `init=False`, we still recognize the custom `__init__` method:
```py
@dataclass(init=False)
class D:
def __init__(self, x: int) -> None:
self.x = str(x)
D(1) # OK
D() # error: [missing-argument]
```
### Accessing instance attributes on the class itself
Just like for normal classes, accessing instance attributes on the class itself is not allowed:
```py
from dataclasses import dataclass
@dataclass
class C:
x: int
# error: [unresolved-attribute] "Attribute `x` can only be accessed on instances, not on the class object `<class 'C'>` itself."
C.x
```
### Return type of `dataclass(...)`
A call like `dataclass(order=True)` returns a callable itself, which is then used as the decorator.
We can store the callable in a variable and later use it as a decorator:
```py
from dataclasses import dataclass
dataclass_with_order = dataclass(order=True)
reveal_type(dataclass_with_order) # revealed: <decorator produced by dataclass-like function>
@dataclass_with_order
class C:
x: int
C(1) < C(2) # ok
```
### Using `dataclass` as a function
To do
## Internals
The `dataclass` decorator returns the class itself. This means that the type of `Person` is `type`,
and attributes like the MRO are unchanged:
```py
from dataclasses import dataclass
@dataclass
class Person:
name: str
age: int | None = None
reveal_type(type(Person)) # revealed: <class 'type'>
reveal_type(Person.__mro__) # revealed: tuple[<class 'Person'>, <class 'object'>]
```
The generated methods have the following signatures:
```py
# TODO: `self` is missing here
reveal_type(Person.__init__) # revealed: (name: str, age: int | None = None) -> None
reveal_type(Person.__repr__) # revealed: def __repr__(self) -> str
reveal_type(Person.__eq__) # revealed: def __eq__(self, value: object, /) -> bool
```