
## Summary Part of #15383, this PR adds support for overloaded callables. Typing spec: https://typing.python.org/en/latest/spec/overload.html Specifically, it does the following: 1. Update the `FunctionType::signature` method to return signatures from a possibly overloaded callable using a new `FunctionSignature` enum 2. Update `CallableType` to accommodate overloaded callable by updating the inner type to `Box<[Signature]>` 3. Update the relation methods on `CallableType` with logic specific to overloads 4. Update the display of callable type to display a list of signatures enclosed by parenthesis 5. Update `CallableTypeOf` special form to recognize overloaded callable 6. Update subtyping, assignability and fully static check to account for callables (equivalence is planned to be done as a follow-up) For (2), it is required to be done in this PR because otherwise I'd need to add some workaround for `into_callable_type` and I though it would be best to include it in here. For (2), another possible design would be convert `CallableType` in an enum with two variants `CallableType::Single` and `CallableType::Overload` but I decided to go with `Box<[Signature]>` for now to (a) mirror it to be equivalent to `overload` field on `CallableSignature` and (b) to avoid any refactor in this PR. This could be done in a follow-up to better split the two kind of callables. ### Design There were two main candidates on how to represent the overloaded definition: 1. To include it in the existing infrastructure which is what this PR is doing by recognizing all the signatures within the `FunctionType::signature` method 2. To create a new `Overload` type variant <details><summary>For context, this is what I had in mind with the new type variant:</summary> <p> ```rs pub enum Type { FunctionLiteral(FunctionType), Overload(OverloadType), BoundMethod(BoundMethodType), ... } pub struct OverloadType { // FunctionLiteral or BoundMethod overloads: Box<[Type]>, // FunctionLiteral or BoundMethod implementation: Option<Type> } pub struct BoundMethodType { kind: BoundMethodKind, self_instance: Type, } pub enum BoundMethodKind { Function(FunctionType), Overload(OverloadType), } ``` </p> </details> The main reasons to choose (1) are the simplicity in the implementation, reusing the existing infrastructure, avoiding any complications that the new type variant has specifically around the different variants between function and methods which would require the overload type to use `Type` instead. ### Implementation The core logic is how to collect all the overloaded functions. The way this is done in this PR is by recording a **use** on the `Identifier` node that represents the function name in the use-def map. This is then used to fetch the previous symbol using the same name. This way the signatures are going to be propagated from top to bottom (from first overload to the final overload or the implementation) with each function / method. For example: ```py from typing import overload @overload def foo(x: int) -> int: ... @overload def foo(x: str) -> str: ... def foo(x: int | str) -> int | str: return x ``` Here, each definition of `foo` knows about all the signatures that comes before itself. So, the first overload would only see itself, the second would see the first and itself and so on until the implementation or the final overload. This approach required some updates specifically recognizing `Identifier` node to record the function use because it doesn't use `ExprName`. ## Test Plan Update existing test cases which were limited by the overload support and add test cases for the following cases: * Valid overloads as functions, methods, generics, version specific * Invalid overloads as stated in https://typing.python.org/en/latest/spec/overload.html#invalid-overload-definitions (implementation will be done in a follow-up) * Various relation: fully static, subtyping, and assignability (others in a follow-up) ## Ecosystem changes _WIP_ After going through the ecosystem changes (there are a lot!), here's what I've found: We need assignability check between a callable type and a class literal because a lot of builtins are defined as classes in typeshed whose constructor method is overloaded e.g., `map`, `sorted`, `list.sort`, `max`, `min` with the `key` parameter, `collections.abc.defaultdict`, etc. (https://github.com/astral-sh/ruff/issues/17343). This makes up most of the ecosystem diff **roughly 70 diagnostics**. For example: ```py from collections import defaultdict # red-knot: No overload of bound method `__init__` matches arguments [lint:no-matching-overload] defaultdict(int) # red-knot: No overload of bound method `__init__` matches arguments [lint:no-matching-overload] defaultdict(list) class Foo: def __init__(self, x: int): self.x = x # red-knot: No overload of function `__new__` matches arguments [lint:no-matching-overload] map(Foo, ["a", "b", "c"]) ``` Duplicate diagnostics in unpacking (https://github.com/astral-sh/ruff/issues/16514) has **~16 diagnostics**. Support for the `callable` builtin which requires `TypeIs` support. This is **5 diagnostics**. For example: ```py from typing import Any def _(x: Any | None) -> None: if callable(x): # red-knot: `Any | None` # Pyright: `(...) -> object` # mypy: `Any` # pyrefly: `(...) -> object` reveal_type(x) ``` Narrowing on `assert` which has **11 diagnostics**. This is being worked on in https://github.com/astral-sh/ruff/pull/17345. For example: ```py import re match = re.search("", "") assert match match.group() # error: [possibly-unbound-attribute] ``` Others: * `Self`: 2 * Type aliases: 6 * Generics: 3 * Protocols: 13 * Unpacking in comprehension: 1 (https://github.com/astral-sh/ruff/pull/17396) ## Performance Refer to https://github.com/astral-sh/ruff/pull/17366#issuecomment-2814053046.
15 KiB
Dataclasses
Basic
Decorating a class with @dataclass
is a convenient way to add special methods such as __init__
,
__repr__
, and __eq__
to a class. The following example shows the basic usage of the @dataclass
decorator. By default, only the three mentioned methods are generated.
from dataclasses import dataclass
@dataclass
class Person:
name: str
age: int | None = None
alice1 = Person("Alice", 30)
alice2 = Person(name="Alice", age=30)
alice3 = Person(age=30, name="Alice")
alice4 = Person("Alice", age=30)
reveal_type(alice1) # revealed: Person
reveal_type(type(alice1)) # revealed: type[Person]
reveal_type(alice1.name) # revealed: str
reveal_type(alice1.age) # revealed: int | None
reveal_type(repr(alice1)) # revealed: str
reveal_type(alice1 == alice2) # revealed: bool
reveal_type(alice1 == "Alice") # revealed: bool
bob = Person("Bob")
bob2 = Person("Bob", None)
bob3 = Person(name="Bob")
bob4 = Person(name="Bob", age=None)
The signature of the __init__
method is generated based on the classes attributes. The following
calls are not valid:
# error: [missing-argument]
Person()
# error: [too-many-positional-arguments]
Person("Eve", 20, "too many arguments")
# error: [invalid-argument-type]
Person("Eve", "string instead of int")
# error: [invalid-argument-type]
# error: [invalid-argument-type]
Person(20, "Eve")
Signature of __init__
TODO: All of the following tests are missing the self
argument in the __init__
signature.
Declarations in the class body are used to generate the signature of the __init__
method. If the
attributes are not just declarations, but also bindings, the type inferred from bindings is used as
the default value.
from dataclasses import dataclass
@dataclass
class D:
x: int
y: str = "default"
z: int | None = 1 + 2
reveal_type(D.__init__) # revealed: (x: int, y: str = Literal["default"], z: int | None = Literal[3]) -> None
This also works if the declaration and binding are split:
@dataclass
class D:
x: int | None
x = None
reveal_type(D.__init__) # revealed: (x: int | None = None) -> None
Non-fully static types are handled correctly:
from typing import Any
@dataclass
class C:
x: Any
y: int | Any
z: tuple[int, Any]
reveal_type(C.__init__) # revealed: (x: Any, y: int | Any, z: tuple[int, Any]) -> None
Variables without annotations are ignored:
@dataclass
class D:
x: int
y = 1
reveal_type(D.__init__) # revealed: (x: int) -> None
If attributes without default values are declared after attributes with default values, a
TypeError
will be raised at runtime. Ideally, we would emit a diagnostic in that case:
@dataclass
class D:
x: int = 1
# TODO: this should be an error: field without default defined after field with default
y: str
Pure class attributes (ClassVar
) are not included in the signature of __init__
:
from typing import ClassVar
@dataclass
class D:
x: int
y: ClassVar[str] = "default"
z: bool
reveal_type(D.__init__) # revealed: (x: int, z: bool) -> None
d = D(1, True)
reveal_type(d.x) # revealed: int
reveal_type(d.y) # revealed: str
reveal_type(d.z) # revealed: bool
Function declarations do not affect the signature of __init__
:
@dataclass
class D:
x: int
def y(self) -> str:
return ""
reveal_type(D.__init__) # revealed: (x: int) -> None
And neither do nested class declarations:
@dataclass
class D:
x: int
class Nested:
y: str
reveal_type(D.__init__) # revealed: (x: int) -> None
But if there is a variable annotation with a function or class literal type, the signature of
__init__
will include this field:
from knot_extensions import TypeOf
class SomeClass: ...
def some_function() -> None: ...
@dataclass
class D:
function_literal: TypeOf[some_function]
class_literal: TypeOf[SomeClass]
class_subtype_of: type[SomeClass]
# revealed: (function_literal: def some_function() -> None, class_literal: Literal[SomeClass], class_subtype_of: type[SomeClass]) -> None
reveal_type(D.__init__)
More realistically, dataclasses can have Callable
attributes:
from typing import Callable
@dataclass
class D:
c: Callable[[int], str]
reveal_type(D.__init__) # revealed: (c: (int, /) -> str) -> None
Implicit instance attributes do not affect the signature of __init__
:
@dataclass
class D:
x: int
def f(self, y: str) -> None:
self.y: str = y
reveal_type(D(1).y) # revealed: str
reveal_type(D.__init__) # revealed: (x: int) -> None
Annotating expressions does not lead to an entry in __annotations__
at runtime, and so it wouldn't
be included in the signature of __init__
. This is a case that we currently don't detect:
@dataclass
class D:
# (x) is an expression, not a "simple name"
(x): int = 1
# TODO: should ideally not include a `x` parameter
reveal_type(D.__init__) # revealed: (x: int = Literal[1]) -> None
@dataclass
calls with arguments
The @dataclass
decorator can take several arguments to customize the existence of the generated
methods. The following test makes sure that we still treat the class as a dataclass if (the default)
arguments are passed in:
from dataclasses import dataclass
@dataclass(init=True, repr=True, eq=True)
class Person:
name: str
age: int | None = None
alice = Person("Alice", 30)
reveal_type(repr(alice)) # revealed: str
reveal_type(alice == alice) # revealed: bool
If init
is set to False
, no __init__
method is generated:
from dataclasses import dataclass
@dataclass(init=False)
class C:
x: int
C() # Okay
# error: [too-many-positional-arguments]
C(1)
repr(C())
C() == C()
Other dataclass parameters
repr
A custom __repr__
method is generated by default. It can be disabled by passing repr=False
, but
in that case __repr__
is still available via object.__repr__
:
from dataclasses import dataclass
@dataclass(repr=False)
class WithoutRepr:
x: int
reveal_type(WithoutRepr(1).__repr__) # revealed: bound method WithoutRepr.__repr__() -> str
eq
The same is true for __eq__
. Setting eq=False
disables the generated __eq__
method, but
__eq__
is still available via object.__eq__
:
from dataclasses import dataclass
@dataclass(eq=False)
class WithoutEq:
x: int
reveal_type(WithoutEq(1) == WithoutEq(2)) # revealed: bool
order
[environment]
python-version = "3.12"
order
is set to False
by default. If order=True
, __lt__
, __le__
, __gt__
, and __ge__
methods will be generated:
from dataclasses import dataclass
@dataclass
class WithoutOrder:
x: int
WithoutOrder(1) < WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) <= WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) > WithoutOrder(2) # error: [unsupported-operator]
WithoutOrder(1) >= WithoutOrder(2) # error: [unsupported-operator]
@dataclass(order=True)
class WithOrder:
x: int
WithOrder(1) < WithOrder(2)
WithOrder(1) <= WithOrder(2)
WithOrder(1) > WithOrder(2)
WithOrder(1) >= WithOrder(2)
Comparisons are only allowed for WithOrder
instances:
WithOrder(1) < 2 # error: [unsupported-operator]
WithOrder(1) <= 2 # error: [unsupported-operator]
WithOrder(1) > 2 # error: [unsupported-operator]
WithOrder(1) >= 2 # error: [unsupported-operator]
This also works for generic dataclasses:
from dataclasses import dataclass
@dataclass(order=True)
class GenericWithOrder[T]:
x: T
GenericWithOrder[int](1) < GenericWithOrder[int](1)
GenericWithOrder[int](1) < GenericWithOrder[str]("a") # error: [unsupported-operator]
If a class already defines one of the comparison methods, a TypeError
is raised at runtime.
Ideally, we would emit a diagnostic in that case:
@dataclass(order=True)
class AlreadyHasCustomDunderLt:
x: int
# TODO: Ideally, we would emit a diagnostic here
def __lt__(self, other: object) -> bool:
return False
unsafe_hash
To do
frozen
To do
match_args
To do
kw_only
To do
slots
To do
weakref_slot
To do
Inheritance
Normal class inheriting from a dataclass
from dataclasses import dataclass
@dataclass
class Base:
x: int
class Derived(Base): ...
d = Derived(1) # OK
reveal_type(d.x) # revealed: int
Dataclass inheriting from normal class
from dataclasses import dataclass
class Base:
x: int = 1
@dataclass
class Derived(Base):
y: str
d = Derived("a")
# error: [too-many-positional-arguments]
# error: [invalid-argument-type]
Derived(1, "a")
Dataclass inheriting from another dataclass
from dataclasses import dataclass
@dataclass
class Base:
x: int
y: str
@dataclass
class Derived(Base):
z: bool
d = Derived(1, "a", True) # OK
reveal_type(d.x) # revealed: int
reveal_type(d.y) # revealed: str
reveal_type(d.z) # revealed: bool
# error: [missing-argument]
Derived(1, "a")
# error: [missing-argument]
Derived(True)
Overwriting attributes from base class
The following example comes from the
Python documentation. The x
attribute appears just once in the __init__
signature, and the default value is taken from the
derived class
from dataclasses import dataclass
from typing import Any
@dataclass
class Base:
x: Any = 15.0
y: int = 0
@dataclass
class C(Base):
z: int = 10
x: int = 15
reveal_type(C.__init__) # revealed: (x: int = Literal[15], y: int = Literal[0], z: int = Literal[10]) -> None
Generic dataclasses
[environment]
python-version = "3.12"
from dataclasses import dataclass
@dataclass
class DataWithDescription[T]:
data: T
description: str
reveal_type(DataWithDescription[int]) # revealed: Literal[DataWithDescription[int]]
d_int = DataWithDescription[int](1, "description") # OK
reveal_type(d_int.data) # revealed: int
reveal_type(d_int.description) # revealed: str
# error: [invalid-argument-type]
DataWithDescription[int](None, "description")
Descriptor-typed fields
Same type in __get__
and __set__
For the following descriptor, the return type of __get__
and the type of the value
parameter in
__set__
are the same. The generated __init__
method takes an argument of this type (instead of
the type of the descriptor), and the default value is also of this type:
from typing import overload
from dataclasses import dataclass
class UppercaseString:
_value: str = ""
def __get__(self, instance: object, owner: None | type) -> str:
return self._value
def __set__(self, instance: object, value: str) -> None:
self._value = value.upper()
@dataclass
class C:
upper: UppercaseString = UppercaseString()
reveal_type(C.__init__) # revealed: (upper: str = str) -> None
c = C("abc")
reveal_type(c.upper) # revealed: str
# This is also okay:
C()
# error: [invalid-argument-type]
C(1)
# error: [too-many-positional-arguments]
C("a", "b")
Different types in __get__
and __set__
In general, the type of the __init__
parameter is determined by the value
parameter type of the
__set__
method (str
in the example below). However, the default value is generated by calling
the descriptor's __get__
method as if it had been called on the class itself, i.e. passing None
for the instance
argument.
from typing import Literal, overload
from dataclasses import dataclass
class ConvertToLength:
_len: int = 0
@overload
def __get__(self, instance: None, owner: type) -> Literal[""]: ...
@overload
def __get__(self, instance: object, owner: type | None) -> int: ...
def __get__(self, instance: object | None, owner: type | None) -> str | int:
if instance is None:
return ""
return self._len
def __set__(self, instance, value: str) -> None:
self._len = len(value)
@dataclass
class C:
converter: ConvertToLength = ConvertToLength()
reveal_type(C.__init__) # revealed: (converter: str = Literal[""]) -> None
c = C("abc")
reveal_type(c.converter) # revealed: int
# This is also okay:
C()
# error: [invalid-argument-type]
C(1)
# error: [too-many-positional-arguments]
C("a", "b")
With overloaded __set__
method
If the __set__
method is overloaded, we determine the type for the __init__
parameter as the
union of all possible value
parameter types:
from typing import overload
from dataclasses import dataclass
class AcceptsStrAndInt:
def __get__(self, instance, owner) -> int:
return 0
@overload
def __set__(self, instance: object, value: str) -> None: ...
@overload
def __set__(self, instance: object, value: int) -> None: ...
def __set__(self, instance: object, value) -> None:
pass
@dataclass
class C:
field: AcceptsStrAndInt = AcceptsStrAndInt()
reveal_type(C.__init__) # revealed: (field: str | int = int) -> None
dataclasses.field
To do
Other special cases
dataclasses.dataclass
We also understand dataclasses if they are decorated with the fully qualified name:
import dataclasses
@dataclasses.dataclass
class C:
x: str
reveal_type(C.__init__) # revealed: (x: str) -> None
Dataclass with custom __init__
method
If a class already defines __init__
, it is not replaced by the dataclass
decorator.
from dataclasses import dataclass
@dataclass(init=True)
class C:
x: str
def __init__(self, x: int) -> None:
self.x = str(x)
C(1) # OK
# error: [invalid-argument-type]
C("a")
Similarly, if we set init=False
, we still recognize the custom __init__
method:
@dataclass(init=False)
class D:
def __init__(self, x: int) -> None:
self.x = str(x)
D(1) # OK
D() # error: [missing-argument]
Accessing instance attributes on the class itself
Just like for normal classes, accessing instance attributes on the class itself is not allowed:
from dataclasses import dataclass
@dataclass
class C:
x: int
# error: [unresolved-attribute] "Attribute `x` can only be accessed on instances, not on the class object `Literal[C]` itself."
C.x
Return type of dataclass(...)
A call like dataclass(order=True)
returns a callable itself, which is then used as the decorator.
We can store the callable in a variable and later use it as a decorator:
from dataclasses import dataclass
dataclass_with_order = dataclass(order=True)
reveal_type(dataclass_with_order) # revealed: <decorator produced by dataclasses.dataclass>
@dataclass_with_order
class C:
x: int
C(1) < C(2) # ok
Using dataclass
as a function
To do
Internals
The dataclass
decorator returns the class itself. This means that the type of Person
is type
,
and attributes like the MRO are unchanged:
from dataclasses import dataclass
@dataclass
class Person:
name: str
age: int | None = None
reveal_type(type(Person)) # revealed: Literal[type]
reveal_type(Person.__mro__) # revealed: tuple[Literal[Person], Literal[object]]
The generated methods have the following signatures:
# TODO: `self` is missing here
reveal_type(Person.__init__) # revealed: (name: str, age: int | None = None) -> None
reveal_type(Person.__repr__) # revealed: def __repr__(self) -> str
reveal_type(Person.__eq__) # revealed: def __eq__(self, value: object, /) -> bool