Mirror of <https://github.com/astral-sh/ruff.git>, synced 2025-11-02 21:03:11 +00:00
## Summary
Having a recursive type method to check whether a type is fully static
is inefficient, unnecessary, and makes us overly strict about subtyping
relations.
It's inefficient because we end up re-walking the same types many times
to check for fully-static-ness.
It's unnecessary because we can handle relations involving the dynamic
type appropriately, depending on whether the relation is subtyping or
assignability.
We use the subtyping relation to simplify unions and intersections. We
can usefully consider that `S <: T` for gradual types also, as long as
it remains true that `S | T` is equivalent to `T` and `S & T` is
equivalent to `S`.
One conservative definition (implemented here) that satisfies this
requirement is that we consider `S <: T` if, for every possible pair of
materializations `S'` and `T'`, we have `S' <: T'`. Put differently: the top
materialization of `S` (`S+` -- the union of all possible
materializations of `S`) is a subtype of the bottom materialization of
`T` (`T-` -- the intersection of all possible materializations of `T`).
In the most basic cases we can usefully say that `Any <: object` and
that `Never <: Any`, and we can handle more complex cases inductively
from there.
This definition of subtyping for gradual subtypes is not reflexive
(`Any` is not a subtype of `Any`).
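To make the definition concrete, here is a tiny executable model of it (the type universe and the helper names `static_subtype`, `materializations`, and `gradual_subtype` are illustrative, not ty's actual implementation):

```py
# Toy model of the "top materialization of S is a subtype of the bottom
# materialization of T" definition, over a small universe of static types.
STATIC = {"Never", "bool", "int", "str", "object"}

def static_subtype(s: str, t: str) -> bool:
    # Hand-written fully static subtyping: Never is a subtype of everything,
    # everything is a subtype of object, and bool <: int.
    return s == "Never" or t == "object" or s == t or (s, t) == ("bool", "int")

def materializations(ty: str) -> set[str]:
    # `Any` can materialize to every static type; static types only to themselves.
    return STATIC if ty == "Any" else {ty}

def gradual_subtype(s: str, t: str) -> bool:
    # S <: T iff S' <: T' for every pair of materializations S', T'
    # (equivalently, S+ <: T-).
    return all(
        static_subtype(s2, t2)
        for s2 in materializations(s)
        for t2 in materializations(t)
    )

assert gradual_subtype("Any", "object")   # every materialization is a subtype of object
assert gradual_subtype("Never", "Any")    # Never is a subtype of every materialization
assert not gradual_subtype("Any", "Any")  # not reflexive: e.g. int is not a subtype of str
```

The last assertion is exactly the non-reflexivity noted above.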
As a corollary, we also remove `is_gradual_equivalent_to` --
`is_equivalent_to` now has the meaning that `is_gradual_equivalent_to`
used to have. If necessary, we could restore an
`is_fully_static_equivalent_to` or similar (which would not do an
`is_fully_static` pre-check of the types, but would instead pass a
relation-kind enum down through a recursive equivalence check, similar
to `has_relation_to`), but so far this doesn't appear to be necessary.
Credit to @JelleZijlstra for the observation that `is_fully_static` is
unnecessary and overly restrictive on subtyping.
There is another possible definition of gradual subtyping: instead of
requiring that `S+ <: T-`, we could instead require that `S+ <: T+` and
`S- <: T-`. In other words, instead of requiring all materializations of
`S` to be a subtype of every materialization of `T`, we just require
that every materialization of `S` be a subtype of _some_ materialization
of `T`, and that every materialization of `T` be a supertype of some
materialization of `S`. This definition also preserves the core
invariant that `S <: T` implies that `S | T = T` and `S & T = S`, and it
restores reflexivity: under this definition, `Any` is a subtype of
`Any`, and for any equivalent types `S` and `T`, `S <: T` and `T <: S`.
But unfortunately, this definition breaks transitivity of subtyping,
because nominal subclasses in Python use assignability ("consistent
subtyping") to define acceptable overrides. This means that we may have
a class `A` with `def method(self) -> Any` and a subtype `B(A)` with
`def method(self) -> int`, since `int` is assignable to `Any`. This
means that if we have a protocol `P` with `def method(self) -> Any`, we
would have `B <: A` (from nominal subtyping) and `A <: P` (`Any` is a
subtype of `Any`), but not `B <: P` (`int` is not a subtype of `Any`).
Breaking transitivity of subtyping is not tenable, so we don't use this
definition of subtyping.
## Test Plan
Existing tests (modified in some cases to account for updated
semantics).
Stable property tests pass at a million iterations:
`QUICKCHECK_TESTS=1000000 cargo test -p ty_python_semantic -- --ignored types::property_tests::stable`
### Changes to property test type generation
Since we no longer have a way of categorizing built types as
fully-static or not-fully-static, I had to add a previously-discussed
feature to the property tests so that some tests can build types that
are known by construction to be fully static, because there are still
properties that only apply to fully-static types (for example,
reflexivity of subtyping).
## Changes to handling of `*args, **kwargs` signatures
This PR "discovered" that, once we allow non-fully-static types to
participate in subtyping under the above definitions, `(*args: Any,
**kwargs: Any) -> Any` is now a subtype of `() -> object`. This is true,
if we take a literal interpretation of the former signature: all
materializations of the parameters `*args: Any, **kwargs: Any` can
accept zero arguments, making the former signature a subtype of the
latter. But the spec actually says that `*args: Any, **kwargs: Any`
should be interpreted as equivalent to `...`, and that makes a
difference here: `(...) -> Any` is not a subtype of `() -> object`,
because (unlike a literal reading of `(*args: Any, **kwargs: Any)`),
`...` can materialize to _any_ signature, including a signature with
required positional arguments.
This matters for this PR because it makes the "any two types are both
assignable to their union" property test fail if we don't implement the
equivalence to `...`. `FunctionType.__call__` has the signature
`(*args: Any, **kwargs: Any) -> Any`, and if we take that at face value
it's a subtype of `() -> object`, making `FunctionType` a subtype of
`() -> object` -- but then a function with a required argument is also a
subtype of `FunctionType`, while not being a subtype of `() -> object`.
So I went ahead and implemented the equivalence to `...` in this PR.
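To see why the literal reading is tempting but the `...` reading is broader, consider (function names are illustrative):

```py
from typing import Any

def flexible(*args: Any, **kwargs: Any) -> Any:
    # Read literally, this signature accepts zero arguments...
    return None

def requires_arg(x: int) -> object:
    # ...but `...` can also materialize to signatures like this one,
    # which cannot be called with zero arguments.
    return x

flexible()        # fine: a literal reading is compatible with `() -> object`
flexible(1, a=2)  # also fine
requires_arg(5)   # requires a positional argument; requires_arg() would raise TypeError
```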
## Ecosystem analysis
* Most of the ecosystem report consists of cases of improved
union/intersection simplification. For example, we can now simplify a
union like `bool | (bool & Unknown) | Unknown` to simply `bool |
Unknown`, because we can now observe that every possible materialization
of `bool & Unknown` is still a subtype of `bool` (whereas before we
would set aside `bool & Unknown` as a not-fully-static type). This is
clearly an improvement.
* The `possibly-unresolved-reference` errors in sockeye, pymongo,
ignite, scrapy and others are true positives for conditional imports
that were formerly silenced by bogus conflicting-declarations (which we
currently don't issue a diagnostic for), because we considered two
different declarations of `Unknown` to be conflicting (we used
`is_equivalent_to` not `is_gradual_equivalent_to`). In this PR that
distinction disappears and all equivalence is gradual, so a declaration
of `Unknown` no longer conflicts with a declaration of `Unknown`, which
then results in us surfacing the possibly-unbound error.
* We will now issue "redundant cast" for casting from a typevar with a
gradual bound to the same typevar (the hydra-zen diagnostic). This seems
like an improvement.
* The new diagnostics in bandersnatch are interesting. For some reason
primer in CI seems to be checking bandersnatch on Python 3.10 (not yet
sure why; this doesn't happen when I run it locally). But bandersnatch
uses `enum.StrEnum`, which doesn't exist on 3.10. That makes the `class
SimpleDigest(StrEnum)` a class that inherits from `Unknown` (and
bypasses our current TODO handling for accessing attributes on enum
classes, since we don't recognize it as an enum class at all). This PR
improves our understanding of assignability to classes that inherit from
`Any` / `Unknown`, and we now recognize that a string literal is not
assignable to a class inheriting `Any` or `Unknown`.
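The union simplification in the first bullet can be sketched with a toy rule (the helper names and the tiny type universe are hypothetical, not ty's implementation): drop a union member `S` whenever some other member `T` satisfies `S <: T` under the new gradual-subtyping definition.

```py
# Model a gradual type as the set of its possible materializations over a
# small chain of static types: Never <: bool <: int <: object.
ORDER = {"Never": 0, "bool": 1, "int": 2, "object": 3}

MATERIALIZATIONS = {
    "bool": {"bool"},
    "Unknown": set(ORDER),
    # Every materialization of `bool & Unknown` is some subtype of bool:
    "bool & Unknown": {"Never", "bool"},
}

def gradual_subtype(s: str, t: str) -> bool:
    # S <: T iff every materialization of S is a subtype of every
    # materialization of T.
    return all(
        ORDER[s2] <= ORDER[t2]
        for s2 in MATERIALIZATIONS[s]
        for t2 in MATERIALIZATIONS[t]
    )

def simplify_union(members: list[str]) -> list[str]:
    # Drop any member that is a gradual subtype of another member.
    return [
        s for s in members
        if not any(s != t and gradual_subtype(s, t) for t in members)
    ]

assert simplify_union(["bool", "bool & Unknown", "Unknown"]) == ["bool", "Unknown"]
```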
# Dataclasses

## Basic
Decorating a class with `@dataclass` is a convenient way to add special methods such as `__init__`,
`__repr__`, and `__eq__` to a class. The following example shows the basic usage of the `@dataclass`
decorator. By default, only the three mentioned methods are generated.

```py
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int | None = None

alice1 = Person("Alice", 30)
alice2 = Person(name="Alice", age=30)
alice3 = Person(age=30, name="Alice")
alice4 = Person("Alice", age=30)

reveal_type(alice1) # revealed: Person
reveal_type(type(alice1)) # revealed: type[Person]

reveal_type(alice1.name) # revealed: str
reveal_type(alice1.age) # revealed: int | None

reveal_type(repr(alice1)) # revealed: str

reveal_type(alice1 == alice2) # revealed: bool
reveal_type(alice1 == "Alice") # revealed: bool

bob = Person("Bob")
bob2 = Person("Bob", None)
bob3 = Person(name="Bob")
bob4 = Person(name="Bob", age=None)
```
The signature of the `__init__` method is generated based on the class's attributes. The following
calls are not valid:

```py
# error: [missing-argument]
Person()

# error: [too-many-positional-arguments]
Person("Eve", 20, "too many arguments")

# error: [invalid-argument-type]
Person("Eve", "string instead of int")

# error: [invalid-argument-type]
# error: [invalid-argument-type]
Person(20, "Eve")
```
## Signature of `__init__`

Declarations in the class body are used to generate the signature of the `__init__` method. If the
attributes are not just declarations, but also bindings, the type inferred from bindings is used as
the default value.

```py
from dataclasses import dataclass

@dataclass
class D:
    x: int
    y: str = "default"
    z: int | None = 1 + 2

reveal_type(D.__init__) # revealed: (self: D, x: int, y: str = Literal["default"], z: int | None = Literal[3]) -> None
```

This also works if the declaration and binding are split:

```py
@dataclass
class D:
    x: int | None
    x = None

reveal_type(D.__init__) # revealed: (self: D, x: int | None = None) -> None
```
Non-fully-static types are handled correctly:

```py
from typing import Any

@dataclass
class C:
    w: type[Any]
    x: Any
    y: int | Any
    z: tuple[int, Any]

reveal_type(C.__init__) # revealed: (self: C, w: type[Any], x: Any, y: int | Any, z: tuple[int, Any]) -> None
```

Variables without annotations are ignored:

```py
@dataclass
class D:
    x: int
    y = 1

reveal_type(D.__init__) # revealed: (self: D, x: int) -> None
```

If attributes without default values are declared after attributes with default values, a
`TypeError` will be raised at runtime. Ideally, we would emit a diagnostic in that case:

```py
@dataclass
class D:
    x: int = 1

    # TODO: this should be an error: field without default defined after field with default
    y: str
```
Pure class attributes (`ClassVar`) are not included in the signature of `__init__`:

```py
from typing import ClassVar

@dataclass
class D:
    x: int
    y: ClassVar[str] = "default"
    z: bool

reveal_type(D.__init__) # revealed: (self: D, x: int, z: bool) -> None

d = D(1, True)
reveal_type(d.x) # revealed: int
reveal_type(d.y) # revealed: str
reveal_type(d.z) # revealed: bool
```

Function declarations do not affect the signature of `__init__`:

```py
@dataclass
class D:
    x: int

    def y(self) -> str:
        return ""

reveal_type(D.__init__) # revealed: (self: D, x: int) -> None
```
And neither do nested class declarations:

```py
@dataclass
class D:
    x: int

    class Nested:
        y: str

reveal_type(D.__init__) # revealed: (self: D, x: int) -> None
```

But if there is a variable annotation with a function or class literal type, the signature of
`__init__` will include this field:

```py
from ty_extensions import TypeOf

class SomeClass: ...

def some_function() -> None: ...
@dataclass
class D:
    function_literal: TypeOf[some_function]
    class_literal: TypeOf[SomeClass]
    class_subtype_of: type[SomeClass]

# revealed: (self: D, function_literal: def some_function() -> None, class_literal: <class 'SomeClass'>, class_subtype_of: type[SomeClass]) -> None
reveal_type(D.__init__)
```
More realistically, dataclasses can have `Callable` attributes:

```py
from typing import Callable

@dataclass
class D:
    c: Callable[[int], str]

reveal_type(D.__init__) # revealed: (self: D, c: (int, /) -> str) -> None
```

Implicit instance attributes do not affect the signature of `__init__`:

```py
@dataclass
class D:
    x: int

    def f(self, y: str) -> None:
        self.y: str = y

reveal_type(D(1).y) # revealed: str

reveal_type(D.__init__) # revealed: (self: D, x: int) -> None
```

Annotating expressions does not lead to an entry in `__annotations__` at runtime, and so the field
would not be included in the signature of `__init__`. This is a case that we currently don't detect:

```py
@dataclass
class D:
    # (x) is an expression, not a "simple name"
    (x): int = 1

# TODO: should ideally not include an `x` parameter
reveal_type(D.__init__) # revealed: (self: D, x: int = Literal[1]) -> None
```
## `@dataclass` calls with arguments

The `@dataclass` decorator can take several arguments to customize the existence of the generated
methods. The following test makes sure that we still treat the class as a dataclass if (the default)
arguments are passed in:

```py
from dataclasses import dataclass

@dataclass(init=True, repr=True, eq=True)
class Person:
    name: str
    age: int | None = None

alice = Person("Alice", 30)
reveal_type(repr(alice)) # revealed: str
reveal_type(alice == alice) # revealed: bool
```

If `init` is set to `False`, no `__init__` method is generated:

```py
from dataclasses import dataclass

@dataclass(init=False)
class C:
    x: int

C() # Okay

# error: [too-many-positional-arguments]
C(1)

repr(C())

C() == C()
```
## Other dataclass parameters

### `repr`

A custom `__repr__` method is generated by default. It can be disabled by passing `repr=False`, but
in that case `__repr__` is still available via `object.__repr__`:

```py
from dataclasses import dataclass

@dataclass(repr=False)
class WithoutRepr:
    x: int

reveal_type(WithoutRepr(1).__repr__) # revealed: bound method WithoutRepr.__repr__() -> str
```

### `eq`

The same is true for `__eq__`. Setting `eq=False` disables the generated `__eq__` method, but
`__eq__` is still available via `object.__eq__`:

```py
from dataclasses import dataclass

@dataclass(eq=False)
class WithoutEq:
    x: int

reveal_type(WithoutEq(1) == WithoutEq(2)) # revealed: bool
```
### `order`

```toml
[environment]
python-version = "3.12"
```

`order` is set to `False` by default. If `order=True`, `__lt__`, `__le__`, `__gt__`, and `__ge__`
methods will be generated:

```py
from dataclasses import dataclass

@dataclass
class WithoutOrder:
    x: int

WithoutOrder(1) < WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) <= WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) > WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) >= WithoutOrder(2) # error: [unsupported-operator]

@dataclass(order=True)
class WithOrder:
    x: int

WithOrder(1) < WithOrder(2)
WithOrder(1) <= WithOrder(2)
WithOrder(1) > WithOrder(2)
WithOrder(1) >= WithOrder(2)
```

Comparisons are only allowed for `WithOrder` instances:

```py
WithOrder(1) < 2 # error: [unsupported-operator]
WithOrder(1) <= 2 # error: [unsupported-operator]
WithOrder(1) > 2 # error: [unsupported-operator]
WithOrder(1) >= 2 # error: [unsupported-operator]
```
This also works for generic dataclasses:

```py
from dataclasses import dataclass

@dataclass(order=True)
class GenericWithOrder[T]:
    x: T

GenericWithOrder[int](1) < GenericWithOrder[int](1)

GenericWithOrder[int](1) < GenericWithOrder[str]("a") # error: [unsupported-operator]
```

If a class already defines one of the comparison methods, a `TypeError` is raised at runtime.
Ideally, we would emit a diagnostic in that case:

```py
@dataclass(order=True)
class AlreadyHasCustomDunderLt:
    x: int

    # TODO: Ideally, we would emit a diagnostic here
    def __lt__(self, other: object) -> bool:
        return False
```
### `unsafe_hash`

To do
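As background for the future test (a runtime sketch, not yet an mdtest case): `unsafe_hash=True` makes the decorator synthesize a `__hash__` method from the fields, even though that can be unsafe for mutable instances.

```py
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Point:
    x: int
    y: int

# Equal field values produce equal hashes:
assert hash(Point(1, 2)) == hash(Point(1, 2))
```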
### `frozen`

If `frozen=True` (the default is `False`), assigning to fields generates a diagnostic:

```py
from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenClass:
    x: int

frozen_instance = MyFrozenClass(1)
frozen_instance.x = 2 # error: [invalid-assignment]
```

If `__setattr__()` or `__delattr__()` is defined in the class, we should emit a diagnostic:

```py
from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenClass:
    x: int

    # TODO: Emit a diagnostic here
    def __setattr__(self, name: str, value: object) -> None: ...

    # TODO: Emit a diagnostic here
    def __delattr__(self, name: str) -> None: ...
```

This also works for generic dataclasses:

```toml
[environment]
python-version = "3.12"
```

```py
from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenGeneric[T]:
    x: T

frozen_instance = MyFrozenGeneric[int](1)
frozen_instance.x = 2 # error: [invalid-assignment]
```

When attempting to mutate an unresolved attribute on a frozen dataclass, only `unresolved-attribute`
is emitted:

```py
from dataclasses import dataclass

@dataclass(frozen=True)
class MyFrozenClass: ...

frozen = MyFrozenClass()
frozen.x = 2 # error: [unresolved-attribute]
```
### `match_args`

To do
### `kw_only`

To do
### `slots`

To do
### `weakref_slot`

To do
## Inheritance

### Normal class inheriting from a dataclass

```py
from dataclasses import dataclass

@dataclass
class Base:
    x: int

class Derived(Base): ...

d = Derived(1) # OK
reveal_type(d.x) # revealed: int
```

### Dataclass inheriting from normal class

```py
from dataclasses import dataclass

class Base:
    x: int = 1

@dataclass
class Derived(Base):
    y: str

d = Derived("a")

# error: [too-many-positional-arguments]
# error: [invalid-argument-type]
Derived(1, "a")
```

### Dataclass inheriting from another dataclass

```py
from dataclasses import dataclass

@dataclass
class Base:
    x: int
    y: str

@dataclass
class Derived(Base):
    z: bool

d = Derived(1, "a", True) # OK

reveal_type(d.x) # revealed: int
reveal_type(d.y) # revealed: str
reveal_type(d.z) # revealed: bool

# error: [missing-argument]
Derived(1, "a")

# error: [missing-argument]
Derived(True)
```
|
|
|
|
### Overwriting attributes from base class

The following example comes from the
[Python documentation](https://docs.python.org/3/library/dataclasses.html#inheritance). The `x`
attribute appears just once in the `__init__` signature, and the default value is taken from the
derived class:

```py
from dataclasses import dataclass
from typing import Any

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class C(Base):
    z: int = 10
    x: int = 15

reveal_type(C.__init__) # revealed: (self: C, x: int = Literal[15], y: int = Literal[0], z: int = Literal[10]) -> None
```
## Generic dataclasses

```toml
[environment]
python-version = "3.12"
```

```py
from dataclasses import dataclass

@dataclass
class DataWithDescription[T]:
    data: T
    description: str

reveal_type(DataWithDescription[int]) # revealed: <class 'DataWithDescription[int]'>

d_int = DataWithDescription[int](1, "description") # OK
reveal_type(d_int.data) # revealed: int
reveal_type(d_int.description) # revealed: str

# error: [invalid-argument-type]
DataWithDescription[int](None, "description")
```
## Descriptor-typed fields

### Same type in `__get__` and `__set__`

For the following descriptor, the return type of `__get__` and the type of the `value` parameter in
`__set__` are the same. The generated `__init__` method takes an argument of this type (instead of
the type of the descriptor), and the default value is also of this type:

```py
from typing import overload
from dataclasses import dataclass

class UppercaseString:
    _value: str = ""

    def __get__(self, instance: object, owner: None | type) -> str:
        return self._value

    def __set__(self, instance: object, value: str) -> None:
        self._value = value.upper()

@dataclass
class C:
    upper: UppercaseString = UppercaseString()

reveal_type(C.__init__) # revealed: (self: C, upper: str = str) -> None

c = C("abc")
reveal_type(c.upper) # revealed: str

# This is also okay:
C()

# error: [invalid-argument-type]
C(1)

# error: [too-many-positional-arguments]
C("a", "b")
```
### Different types in `__get__` and `__set__`

In general, the type of the `__init__` parameter is determined by the `value` parameter type of the
`__set__` method (`str` in the example below). However, the default value is generated by calling
the descriptor's `__get__` method as if it had been called on the class itself, i.e. passing `None`
for the `instance` argument.

```py
from typing import Literal, overload
from dataclasses import dataclass

class ConvertToLength:
    _len: int = 0

    @overload
    def __get__(self, instance: None, owner: type) -> Literal[""]: ...
    @overload
    def __get__(self, instance: object, owner: type | None) -> int: ...
    def __get__(self, instance: object | None, owner: type | None) -> str | int:
        if instance is None:
            return ""

        return self._len

    def __set__(self, instance, value: str) -> None:
        self._len = len(value)

@dataclass
class C:
    converter: ConvertToLength = ConvertToLength()

reveal_type(C.__init__) # revealed: (self: C, converter: str = Literal[""]) -> None

c = C("abc")
reveal_type(c.converter) # revealed: int

# This is also okay:
C()

# error: [invalid-argument-type]
C(1)

# error: [too-many-positional-arguments]
C("a", "b")
```
### With overloaded `__set__` method

If the `__set__` method is overloaded, we determine the type for the `__init__` parameter as the
union of all possible `value` parameter types:

```py
from typing import overload
from dataclasses import dataclass

class AcceptsStrAndInt:
    def __get__(self, instance, owner) -> int:
        return 0

    @overload
    def __set__(self, instance: object, value: str) -> None: ...
    @overload
    def __set__(self, instance: object, value: int) -> None: ...
    def __set__(self, instance: object, value) -> None:
        pass

@dataclass
class C:
    field: AcceptsStrAndInt = AcceptsStrAndInt()

reveal_type(C.__init__) # revealed: (self: C, field: str | int = int) -> None
```
## `dataclasses.field`

To do
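As background (a runtime sketch, not yet an mdtest case): `dataclasses.field` customizes individual fields, for example via `default_factory` and `init`.

```py
from dataclasses import dataclass, field

@dataclass
class Config:
    # default_factory is called once per instance to build a fresh default:
    tags: list[str] = field(default_factory=list)
    # init=False excludes the field from the synthesized __init__:
    version: int = field(default=1, init=False)

c = Config()
assert c.tags == []
assert c.version == 1
```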
## `dataclasses.fields`

Dataclasses have a special `__dataclass_fields__` class variable member. The `DataclassInstance`
protocol checks for the presence of this attribute. It is used in the `dataclasses.fields` and
`dataclasses.asdict` functions, for example:

```py
from dataclasses import dataclass, fields, asdict

@dataclass
class Foo:
    x: int

foo = Foo(1)

reveal_type(foo.__dataclass_fields__) # revealed: dict[str, Field[Any]]
reveal_type(fields(Foo)) # revealed: tuple[Field[Any], ...]
reveal_type(asdict(foo)) # revealed: dict[str, Any]
```

The class objects themselves also have a `__dataclass_fields__` attribute:

```py
reveal_type(Foo.__dataclass_fields__) # revealed: dict[str, Field[Any]]
```

They can be passed into `fields` as well, because it also accepts `type[DataclassInstance]`
arguments:

```py
reveal_type(fields(Foo)) # revealed: tuple[Field[Any], ...]
```

But calling `asdict` on the class object is not allowed:

```py
# TODO: this should be an invalid-argument-type error, but we don't properly check the
# types (and more importantly, the `ClassVar` type qualifier) of protocol members yet.
asdict(Foo)
```
## `dataclasses.KW_ONLY`

<!-- snapshot-diagnostics -->

If an attribute is annotated with `dataclasses.KW_ONLY`, it is not added to the synthesized
`__init__` of the class. Instead, this special marker annotation causes Python to make all
parameters generated for the annotations following it keyword-only in the class's synthesized
`__init__` method.

```toml
[environment]
python-version = "3.10"
```

```py
from dataclasses import dataclass, field, KW_ONLY
from typing_extensions import reveal_type

@dataclass
class C:
    x: int
    _: KW_ONLY
    y: str

reveal_type(C.__init__) # revealed: (self: C, x: int, *, y: str) -> None

# error: [missing-argument]
# error: [too-many-positional-arguments]
C(3, "")

C(3, y="")
```

Using `KW_ONLY` to annotate more than one field in a dataclass causes a `TypeError` to be raised at
runtime:

```py
@dataclass
class Fails: # error: [duplicate-kw-only]
    a: int
    b: KW_ONLY
    c: str
    d: KW_ONLY
    e: bytes

reveal_type(Fails.__init__) # revealed: (self: Fails, a: int, *, c: str, e: bytes) -> None
```
## Other special cases

### `dataclasses.dataclass`

We also understand dataclasses if they are decorated with the fully qualified name:

```py
import dataclasses

@dataclasses.dataclass
class C:
    x: str

reveal_type(C.__init__) # revealed: (self: C, x: str) -> None
```
### Dataclass with custom `__init__` method

If a class already defines `__init__`, it is not replaced by the `dataclass` decorator.

```py
from dataclasses import dataclass

@dataclass(init=True)
class C:
    x: str

    def __init__(self, x: int) -> None:
        self.x = str(x)

C(1) # OK

# error: [invalid-argument-type]
C("a")
```

Similarly, if we set `init=False`, we still recognize the custom `__init__` method:

```py
@dataclass(init=False)
class D:
    def __init__(self, x: int) -> None:
        self.x = str(x)

D(1) # OK
D() # error: [missing-argument]
```
### Accessing instance attributes on the class itself

Just like for normal classes, accessing instance attributes on the class itself is not allowed:

```py
from dataclasses import dataclass

@dataclass
class C:
    x: int

# error: [unresolved-attribute] "Attribute `x` can only be accessed on instances, not on the class object `<class 'C'>` itself."
C.x
```
### Return type of `dataclass(...)`

A call like `dataclass(order=True)` returns a callable, which is then used as the actual decorator.
We can store the callable in a variable and later use it as a decorator:

```py
from dataclasses import dataclass

dataclass_with_order = dataclass(order=True)

reveal_type(dataclass_with_order) # revealed: <decorator produced by dataclass-like function>

@dataclass_with_order
class C:
    x: int

C(1) < C(2) # ok
```
### Using `dataclass` as a function

```py
from dataclasses import dataclass

class B:
    x: int

# error: [missing-argument]
dataclass(B)()

# error: [invalid-argument-type]
dataclass(B)("a")

reveal_type(dataclass(B)(3).x) # revealed: int
```
## Internals

The `dataclass` decorator returns the class itself. This means that the type of `Person` is `type`,
and attributes like the MRO are unchanged:

```py
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int | None = None

reveal_type(type(Person)) # revealed: <class 'type'>
reveal_type(Person.__mro__) # revealed: tuple[<class 'Person'>, <class 'object'>]
```

The generated methods have the following signatures:

```py
reveal_type(Person.__init__) # revealed: (self: Person, name: str, age: int | None = None) -> None

reveal_type(Person.__repr__) # revealed: def __repr__(self) -> str

reveal_type(Person.__eq__) # revealed: def __eq__(self, value: object, /) -> bool
```
## Function-like behavior of synthesized methods

Here, we make sure that the synthesized methods of dataclasses behave like proper functions.

```toml
[environment]
python-version = "3.12"
```

```py
from dataclasses import dataclass
from typing import Callable
from types import FunctionType
from ty_extensions import CallableTypeOf, TypeOf, static_assert, is_subtype_of, is_assignable_to

@dataclass
class C:
    x: int

reveal_type(C.__init__) # revealed: (self: C, x: int) -> None
reveal_type(type(C.__init__)) # revealed: <class 'FunctionType'>

# We can access attributes that are defined on functions:
reveal_type(type(C.__init__).__code__) # revealed: CodeType
reveal_type(C.__init__.__code__) # revealed: CodeType

def equivalent_signature(self: C, x: int) -> None:
    pass

type DunderInitType = TypeOf[C.__init__]
type EquivalentPureCallableType = Callable[[C, int], None]
type EquivalentFunctionLikeCallableType = CallableTypeOf[equivalent_signature]

static_assert(is_subtype_of(DunderInitType, EquivalentPureCallableType))
static_assert(is_assignable_to(DunderInitType, EquivalentPureCallableType))

static_assert(not is_subtype_of(EquivalentPureCallableType, DunderInitType))
static_assert(not is_assignable_to(EquivalentPureCallableType, DunderInitType))

static_assert(is_subtype_of(DunderInitType, EquivalentFunctionLikeCallableType))
static_assert(is_assignable_to(DunderInitType, EquivalentFunctionLikeCallableType))

static_assert(not is_subtype_of(EquivalentFunctionLikeCallableType, DunderInitType))
static_assert(not is_assignable_to(EquivalentFunctionLikeCallableType, DunderInitType))

static_assert(is_subtype_of(DunderInitType, FunctionType))
```