Fixes: https://github.com/astral-sh/ty/issues/92 ## Summary We currently get a `invalid-argument-type` error when using `dataclass.fields` on a dataclass, because we do not synthesize the `__dataclass_fields__` member. This PR fixes this diagnostic. Note that we do not yet model the `Field` type correctly. After that is done, we can assign a more precise `tuple[Field, ...]` type to this new member. ## Test Plan New mdtest. --------- Co-authored-by: David Peter <mail@david-peter.de>
16 KiB
Dataclasses
Basic
Decorating a class with @dataclass is a convenient way to add special methods such as __init__,
__repr__, and __eq__ to a class. The following example shows the basic usage of the @dataclass
decorator. By default, only the three mentioned methods are generated.
from dataclasses import dataclass
@dataclass
class Person:
name: str
age: int | None = None
alice1 = Person("Alice", 30)
alice2 = Person(name="Alice", age=30)
alice3 = Person(age=30, name="Alice")
alice4 = Person("Alice", age=30)
reveal_type(alice1) # revealed: Person
reveal_type(type(alice1)) # revealed: type[Person]
reveal_type(alice1.name) # revealed: str
reveal_type(alice1.age) # revealed: int | None
reveal_type(repr(alice1)) # revealed: str
reveal_type(alice1 == alice2) # revealed: bool
reveal_type(alice1 == "Alice") # revealed: bool
bob = Person("Bob")
bob2 = Person("Bob", None)
bob3 = Person(name="Bob")
bob4 = Person(name="Bob", age=None)
The signature of the __init__ method is generated based on the classes attributes. The following
calls are not valid:
# error: [missing-argument]
Person()
# error: [too-many-positional-arguments]
Person("Eve", 20, "too many arguments")
# error: [invalid-argument-type]
Person("Eve", "string instead of int")
# error: [invalid-argument-type]
# error: [invalid-argument-type]
Person(20, "Eve")
Signature of __init__
TODO: All of the following tests are missing the self argument in the __init__ signature.
Declarations in the class body are used to generate the signature of the __init__ method. If the
attributes are not just declarations, but also bindings, the type inferred from bindings is used as
the default value.
from dataclasses import dataclass
@dataclass
class D:
x: int
y: str = "default"
z: int | None = 1 + 2
reveal_type(D.__init__) # revealed: (x: int, y: str = Literal["default"], z: int | None = Literal[3]) -> None
This also works if the declaration and binding are split:
@dataclass
class D:
x: int | None
x = None
reveal_type(D.__init__) # revealed: (x: int | None = None) -> None
Non-fully static types are handled correctly:
from typing import Any
@dataclass
class C:
x: Any
y: int | Any
z: tuple[int, Any]
reveal_type(C.__init__) # revealed: (x: Any, y: int | Any, z: tuple[int, Any]) -> None
Variables without annotations are ignored:
@dataclass
class D:
x: int
y = 1
reveal_type(D.__init__) # revealed: (x: int) -> None
If attributes without default values are declared after attributes with default values, a
TypeError will be raised at runtime. Ideally, we would emit a diagnostic in that case:
@dataclass
class D:
x: int = 1
# TODO: this should be an error: field without default defined after field with default
y: str
Pure class attributes (ClassVar) are not included in the signature of __init__:
from typing import ClassVar
@dataclass
class D:
x: int
y: ClassVar[str] = "default"
z: bool
reveal_type(D.__init__) # revealed: (x: int, z: bool) -> None
d = D(1, True)
reveal_type(d.x) # revealed: int
reveal_type(d.y) # revealed: str
reveal_type(d.z) # revealed: bool
Function declarations do not affect the signature of __init__:
@dataclass
class D:
x: int
def y(self) -> str:
return ""
reveal_type(D.__init__) # revealed: (x: int) -> None
And neither do nested class declarations:
@dataclass
class D:
x: int
class Nested:
y: str
reveal_type(D.__init__) # revealed: (x: int) -> None
But if there is a variable annotation with a function or class literal type, the signature of
__init__ will include this field:
from ty_extensions import TypeOf
class SomeClass: ...
def some_function() -> None: ...
@dataclass
class D:
function_literal: TypeOf[some_function]
class_literal: TypeOf[SomeClass]
class_subtype_of: type[SomeClass]
# revealed: (function_literal: def some_function() -> None, class_literal: <class 'SomeClass'>, class_subtype_of: type[SomeClass]) -> None
reveal_type(D.__init__)
More realistically, dataclasses can have Callable attributes:
from typing import Callable
@dataclass
class D:
c: Callable[[int], str]
reveal_type(D.__init__) # revealed: (c: (int, /) -> str) -> None
Implicit instance attributes do not affect the signature of __init__:
@dataclass
class D:
x: int
def f(self, y: str) -> None:
self.y: str = y
reveal_type(D(1).y) # revealed: str
reveal_type(D.__init__) # revealed: (x: int) -> None
Annotating expressions does not lead to an entry in __annotations__ at runtime, and so it wouldn't
be included in the signature of __init__. This is a case that we currently don't detect:
@dataclass
class D:
# (x) is an expression, not a "simple name"
(x): int = 1
# TODO: should ideally not include a `x` parameter
reveal_type(D.__init__) # revealed: (x: int = Literal[1]) -> None
@dataclass calls with arguments
The @dataclass decorator can take several arguments to customize the existence of the generated
methods. The following test makes sure that we still treat the class as a dataclass if (the default)
arguments are passed in:
from dataclasses import dataclass
@dataclass(init=True, repr=True, eq=True)
class Person:
name: str
age: int | None = None
alice = Person("Alice", 30)
reveal_type(repr(alice)) # revealed: str
reveal_type(alice == alice) # revealed: bool
If init is set to False, no __init__ method is generated:
from dataclasses import dataclass
@dataclass(init=False)
class C:
x: int
C() # Okay
# error: [too-many-positional-arguments]
C(1)
repr(C())
C() == C()
Other dataclass parameters
repr
A custom __repr__ method is generated by default. It can be disabled by passing repr=False, but
in that case __repr__ is still available via object.__repr__:
from dataclasses import dataclass
@dataclass(repr=False)
class WithoutRepr:
x: int
reveal_type(WithoutRepr(1).__repr__) # revealed: bound method WithoutRepr.__repr__() -> str
eq
The same is true for __eq__. Setting eq=False disables the generated __eq__ method, but
__eq__ is still available via object.__eq__:
from dataclasses import dataclass
@dataclass(eq=False)
class WithoutEq:
x: int
reveal_type(WithoutEq(1) == WithoutEq(2)) # revealed: bool
order
[environment]
python-version = "3.12"
order is set to False by default. If order=True, __lt__, __le__, __gt__, and __ge__
methods will be generated:
from dataclasses import dataclass
@dataclass
class WithoutOrder:
x: int
WithoutOrder(1) < WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) <= WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) > WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) >= WithoutOrder(2) # error: [unsupported-operator]
@dataclass(order=True)
class WithOrder:
x: int
WithOrder(1) < WithOrder(2)
WithOrder(1) <= WithOrder(2)
WithOrder(1) > WithOrder(2)
WithOrder(1) >= WithOrder(2)
Comparisons are only allowed for WithOrder instances:
WithOrder(1) < 2 # error: [unsupported-operator]
WithOrder(1) <= 2 # error: [unsupported-operator]
WithOrder(1) > 2 # error: [unsupported-operator]
WithOrder(1) >= 2 # error: [unsupported-operator]
This also works for generic dataclasses:
from dataclasses import dataclass
@dataclass(order=True)
class GenericWithOrder[T]:
x: T
GenericWithOrder[int](1) < GenericWithOrder[int](1)
GenericWithOrder[int](1) < GenericWithOrder[str]("a") # error: [unsupported-operator]
If a class already defines one of the comparison methods, a TypeError is raised at runtime.
Ideally, we would emit a diagnostic in that case:
@dataclass(order=True)
class AlreadyHasCustomDunderLt:
x: int
# TODO: Ideally, we would emit a diagnostic here
def __lt__(self, other: object) -> bool:
return False
unsafe_hash
To do
frozen
To do
match_args
To do
kw_only
To do
slots
To do
weakref_slot
To do
Inheritance
Normal class inheriting from a dataclass
from dataclasses import dataclass
@dataclass
class Base:
x: int
class Derived(Base): ...
d = Derived(1) # OK
reveal_type(d.x) # revealed: int
Dataclass inheriting from normal class
from dataclasses import dataclass
class Base:
x: int = 1
@dataclass
class Derived(Base):
y: str
d = Derived("a")
# error: [too-many-positional-arguments]
# error: [invalid-argument-type]
Derived(1, "a")
Dataclass inheriting from another dataclass
from dataclasses import dataclass
@dataclass
class Base:
x: int
y: str
@dataclass
class Derived(Base):
z: bool
d = Derived(1, "a", True) # OK
reveal_type(d.x) # revealed: int
reveal_type(d.y) # revealed: str
reveal_type(d.z) # revealed: bool
# error: [missing-argument]
Derived(1, "a")
# error: [missing-argument]
Derived(True)
Overwriting attributes from base class
The following example comes from the
Python documentation. The x
attribute appears just once in the __init__ signature, and the default value is taken from the
derived class
from dataclasses import dataclass
from typing import Any
@dataclass
class Base:
x: Any = 15.0
y: int = 0
@dataclass
class C(Base):
z: int = 10
x: int = 15
reveal_type(C.__init__) # revealed: (x: int = Literal[15], y: int = Literal[0], z: int = Literal[10]) -> None
Generic dataclasses
[environment]
python-version = "3.12"
from dataclasses import dataclass
@dataclass
class DataWithDescription[T]:
data: T
description: str
reveal_type(DataWithDescription[int]) # revealed: <class 'DataWithDescription[int]'>
d_int = DataWithDescription[int](1, "description") # OK
reveal_type(d_int.data) # revealed: int
reveal_type(d_int.description) # revealed: str
# error: [invalid-argument-type]
DataWithDescription[int](None, "description")
Descriptor-typed fields
Same type in __get__ and __set__
For the following descriptor, the return type of __get__ and the type of the value parameter in
__set__ are the same. The generated __init__ method takes an argument of this type (instead of
the type of the descriptor), and the default value is also of this type:
from typing import overload
from dataclasses import dataclass
class UppercaseString:
_value: str = ""
def __get__(self, instance: object, owner: None | type) -> str:
return self._value
def __set__(self, instance: object, value: str) -> None:
self._value = value.upper()
@dataclass
class C:
upper: UppercaseString = UppercaseString()
reveal_type(C.__init__) # revealed: (upper: str = str) -> None
c = C("abc")
reveal_type(c.upper) # revealed: str
# This is also okay:
C()
# error: [invalid-argument-type]
C(1)
# error: [too-many-positional-arguments]
C("a", "b")
Different types in __get__ and __set__
In general, the type of the __init__ parameter is determined by the value parameter type of the
__set__ method (str in the example below). However, the default value is generated by calling
the descriptor's __get__ method as if it had been called on the class itself, i.e. passing None
for the instance argument.
from typing import Literal, overload
from dataclasses import dataclass
class ConvertToLength:
_len: int = 0
@overload
def __get__(self, instance: None, owner: type) -> Literal[""]: ...
@overload
def __get__(self, instance: object, owner: type | None) -> int: ...
def __get__(self, instance: object | None, owner: type | None) -> str | int:
if instance is None:
return ""
return self._len
def __set__(self, instance, value: str) -> None:
self._len = len(value)
@dataclass
class C:
converter: ConvertToLength = ConvertToLength()
reveal_type(C.__init__) # revealed: (converter: str = Literal[""]) -> None
c = C("abc")
reveal_type(c.converter) # revealed: int
# This is also okay:
C()
# error: [invalid-argument-type]
C(1)
# error: [too-many-positional-arguments]
C("a", "b")
With overloaded __set__ method
If the __set__ method is overloaded, we determine the type for the __init__ parameter as the
union of all possible value parameter types:
from typing import overload
from dataclasses import dataclass
class AcceptsStrAndInt:
def __get__(self, instance, owner) -> int:
return 0
@overload
def __set__(self, instance: object, value: str) -> None: ...
@overload
def __set__(self, instance: object, value: int) -> None: ...
def __set__(self, instance: object, value) -> None:
pass
@dataclass
class C:
field: AcceptsStrAndInt = AcceptsStrAndInt()
reveal_type(C.__init__) # revealed: (field: str | int = int) -> None
dataclasses.field
To do
dataclass.fields
Dataclasses have __dataclass_fields__ in them, which makes them a subtype of the
DataclassInstance protocol.
Here, we verify that dataclasses can be passed to dataclasses.fields without any errors, and that
the return type of dataclasses.fields is correct.
from dataclasses import dataclass, fields
@dataclass
class Foo:
x: int
reveal_type(Foo.__dataclass_fields__) # revealed: dict[str, Field[Any]]
reveal_type(fields(Foo)) # revealed: tuple[Field[Any], ...]
Other special cases
dataclasses.dataclass
We also understand dataclasses if they are decorated with the fully qualified name:
import dataclasses
@dataclasses.dataclass
class C:
x: str
reveal_type(C.__init__) # revealed: (x: str) -> None
Dataclass with custom __init__ method
If a class already defines __init__, it is not replaced by the dataclass decorator.
from dataclasses import dataclass
@dataclass(init=True)
class C:
x: str
def __init__(self, x: int) -> None:
self.x = str(x)
C(1) # OK
# error: [invalid-argument-type]
C("a")
Similarly, if we set init=False, we still recognize the custom __init__ method:
@dataclass(init=False)
class D:
def __init__(self, x: int) -> None:
self.x = str(x)
D(1) # OK
D() # error: [missing-argument]
Accessing instance attributes on the class itself
Just like for normal classes, accessing instance attributes on the class itself is not allowed:
from dataclasses import dataclass
@dataclass
class C:
x: int
# error: [unresolved-attribute] "Attribute `x` can only be accessed on instances, not on the class object `<class 'C'>` itself."
C.x
Return type of dataclass(...)
A call like dataclass(order=True) returns a callable itself, which is then used as the decorator.
We can store the callable in a variable and later use it as a decorator:
from dataclasses import dataclass
dataclass_with_order = dataclass(order=True)
reveal_type(dataclass_with_order) # revealed: <decorator produced by dataclass-like function>
@dataclass_with_order
class C:
x: int
C(1) < C(2) # ok
Using dataclass as a function
To do
Internals
The dataclass decorator returns the class itself. This means that the type of Person is type,
and attributes like the MRO are unchanged:
from dataclasses import dataclass
@dataclass
class Person:
name: str
age: int | None = None
reveal_type(type(Person)) # revealed: <class 'type'>
reveal_type(Person.__mro__) # revealed: tuple[<class 'Person'>, <class 'object'>]
The generated methods have the following signatures:
# TODO: `self` is missing here
reveal_type(Person.__init__) # revealed: (name: str, age: int | None = None) -> None
reveal_type(Person.__repr__) # revealed: def __repr__(self) -> str
reveal_type(Person.__eq__) # revealed: def __eq__(self, value: object, /) -> bool