ruff/crates/red_knot_python_semantic/resources/mdtest/dataclasses.md
Dhruv Manilawala 44ad201262
[red-knot] Add support for overloaded functions (#17366)
## Summary

Part of #15383, this PR adds support for overloaded callables.

Typing spec: https://typing.python.org/en/latest/spec/overload.html

Specifically, it does the following:
1. Update the `FunctionType::signature` method to return signatures from
a possibly overloaded callable using a new `FunctionSignature` enum
2. Update `CallableType` to accommodate overloaded callable by updating
the inner type to `Box<[Signature]>`
3. Update the relation methods on `CallableType` with logic specific to
overloads
4. Update the display of callable type to display a list of signatures
enclosed by parenthesis
5. Update `CallableTypeOf` special form to recognize overloaded callable
6. Update subtyping, assignability and fully static check to account for
callables (equivalence is planned to be done as a follow-up)

For (2), it is required to be done in this PR because otherwise I'd need
to add some workaround for `into_callable_type` and I though it would be
best to include it in here.

For (2), another possible design would be convert `CallableType` in an
enum with two variants `CallableType::Single` and
`CallableType::Overload` but I decided to go with `Box<[Signature]>` for
now to (a) mirror it to be equivalent to `overload` field on
`CallableSignature` and (b) to avoid any refactor in this PR. This could
be done in a follow-up to better split the two kind of callables.

### Design

There were two main candidates on how to represent the overloaded
definition:
1. To include it in the existing infrastructure which is what this PR is
doing by recognizing all the signatures within the
`FunctionType::signature` method
2. To create a new `Overload` type variant

<details><summary>For context, this is what I had in mind with the new
type variant:</summary>
<p>

```rs
pub enum Type {
	FunctionLiteral(FunctionType),
    Overload(OverloadType),
    BoundMethod(BoundMethodType),
    ...
}

pub struct OverloadType {
	// FunctionLiteral or BoundMethod
    overloads: Box<[Type]>,
	// FunctionLiteral or BoundMethod
    implementation: Option<Type>
}

pub struct BoundMethodType {
    kind: BoundMethodKind,
    self_instance: Type,
}

pub enum BoundMethodKind {
    Function(FunctionType),
    Overload(OverloadType),
}
```

</p>
</details> 

The main reasons to choose (1) are the simplicity in the implementation,
reusing the existing infrastructure, avoiding any complications that the
new type variant has specifically around the different variants between
function and methods which would require the overload type to use `Type`
instead.

### Implementation

The core logic is how to collect all the overloaded functions. The way
this is done in this PR is by recording a **use** on the `Identifier`
node that represents the function name in the use-def map. This is then
used to fetch the previous symbol using the same name. This way the
signatures are going to be propagated from top to bottom (from first
overload to the final overload or the implementation) with each function
/ method. For example:

```py
from typing import overload

@overload
def foo(x: int) -> int: ...
@overload
def foo(x: str) -> str: ...
def foo(x: int | str) -> int | str:
	return x
```

Here, each definition of `foo` knows about all the signatures that comes
before itself. So, the first overload would only see itself, the second
would see the first and itself and so on until the implementation or the
final overload.

This approach required some updates specifically recognizing
`Identifier` node to record the function use because it doesn't use
`ExprName`.

## Test Plan

Update existing test cases which were limited by the overload support
and add test cases for the following cases:
* Valid overloads as functions, methods, generics, version specific
* Invalid overloads as stated in
https://typing.python.org/en/latest/spec/overload.html#invalid-overload-definitions
(implementation will be done in a follow-up)
* Various relation: fully static, subtyping, and assignability (others
in a follow-up)

## Ecosystem changes

_WIP_

After going through the ecosystem changes (there are a lot!), here's
what I've found:

We need assignability check between a callable type and a class literal
because a lot of builtins are defined as classes in typeshed whose
constructor method is overloaded e.g., `map`, `sorted`, `list.sort`,
`max`, `min` with the `key` parameter, `collections.abc.defaultdict`,
etc. (https://github.com/astral-sh/ruff/issues/17343). This makes up
most of the ecosystem diff **roughly 70 diagnostics**. For example:

```py
from collections import defaultdict

# red-knot: No overload of bound method `__init__` matches arguments [lint:no-matching-overload]
defaultdict(int)
# red-knot: No overload of bound method `__init__` matches arguments [lint:no-matching-overload]
defaultdict(list)

class Foo:
    def __init__(self, x: int):
        self.x = x

# red-knot: No overload of function `__new__` matches arguments [lint:no-matching-overload]
map(Foo, ["a", "b", "c"])
```

Duplicate diagnostics in unpacking
(https://github.com/astral-sh/ruff/issues/16514) has **~16
diagnostics**.

Support for the `callable` builtin which requires `TypeIs` support. This
is **5 diagnostics**. For example:
```py
from typing import Any

def _(x: Any | None) -> None:
    if callable(x):
        # red-knot: `Any | None`
        # Pyright: `(...) -> object`
        # mypy: `Any`
        # pyrefly: `(...) -> object`
        reveal_type(x)
```

Narrowing on `assert` which has **11 diagnostics**. This is being worked
on in https://github.com/astral-sh/ruff/pull/17345. For example:
```py
import re

match = re.search("", "")
assert match
match.group()  # error: [possibly-unbound-attribute]
```

Others:
* `Self`: 2
* Type aliases: 6
* Generics: 3
* Protocols: 13
* Unpacking in comprehension: 1
(https://github.com/astral-sh/ruff/pull/17396)

## Performance

Refer to
https://github.com/astral-sh/ruff/pull/17366#issuecomment-2814053046.
2025-04-18 09:57:40 +05:30

15 KiB

Dataclasses

Basic

Decorating a class with @dataclass is a convenient way to add special methods such as __init__, __repr__, and __eq__ to a class. The following example shows the basic usage of the @dataclass decorator. By default, only the three mentioned methods are generated.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int | None = None

alice1 = Person("Alice", 30)
alice2 = Person(name="Alice", age=30)
alice3 = Person(age=30, name="Alice")
alice4 = Person("Alice", age=30)

reveal_type(alice1)  # revealed: Person
reveal_type(type(alice1))  # revealed: type[Person]

reveal_type(alice1.name)  # revealed: str
reveal_type(alice1.age)  # revealed: int | None

reveal_type(repr(alice1))  # revealed: str

reveal_type(alice1 == alice2)  # revealed: bool
reveal_type(alice1 == "Alice")  # revealed: bool

bob = Person("Bob")
bob2 = Person("Bob", None)
bob3 = Person(name="Bob")
bob4 = Person(name="Bob", age=None)

The signature of the __init__ method is generated based on the classes attributes. The following calls are not valid:

# error: [missing-argument]
Person()

# error: [too-many-positional-arguments]
Person("Eve", 20, "too many arguments")

# error: [invalid-argument-type]
Person("Eve", "string instead of int")

# error: [invalid-argument-type]
# error: [invalid-argument-type]
Person(20, "Eve")

Signature of __init__

TODO: All of the following tests are missing the self argument in the __init__ signature.

Declarations in the class body are used to generate the signature of the __init__ method. If the attributes are not just declarations, but also bindings, the type inferred from bindings is used as the default value.

from dataclasses import dataclass

@dataclass
class D:
    x: int
    y: str = "default"
    z: int | None = 1 + 2

reveal_type(D.__init__)  # revealed: (x: int, y: str = Literal["default"], z: int | None = Literal[3]) -> None

This also works if the declaration and binding are split:

@dataclass
class D:
    x: int | None
    x = None

reveal_type(D.__init__)  # revealed: (x: int | None = None) -> None

Non-fully static types are handled correctly:

from typing import Any

@dataclass
class C:
    x: Any
    y: int | Any
    z: tuple[int, Any]

reveal_type(C.__init__)  # revealed: (x: Any, y: int | Any, z: tuple[int, Any]) -> None

Variables without annotations are ignored:

@dataclass
class D:
    x: int
    y = 1

reveal_type(D.__init__)  # revealed: (x: int) -> None

If attributes without default values are declared after attributes with default values, a TypeError will be raised at runtime. Ideally, we would emit a diagnostic in that case:

@dataclass
class D:
    x: int = 1
    # TODO: this should be an error: field without default defined after field with default
    y: str

Pure class attributes (ClassVar) are not included in the signature of __init__:

from typing import ClassVar

@dataclass
class D:
    x: int
    y: ClassVar[str] = "default"
    z: bool

reveal_type(D.__init__)  # revealed: (x: int, z: bool) -> None

d = D(1, True)
reveal_type(d.x)  # revealed: int
reveal_type(d.y)  # revealed: str
reveal_type(d.z)  # revealed: bool

Function declarations do not affect the signature of __init__:

@dataclass
class D:
    x: int

    def y(self) -> str:
        return ""

reveal_type(D.__init__)  # revealed: (x: int) -> None

And neither do nested class declarations:

@dataclass
class D:
    x: int

    class Nested:
        y: str

reveal_type(D.__init__)  # revealed: (x: int) -> None

But if there is a variable annotation with a function or class literal type, the signature of __init__ will include this field:

from knot_extensions import TypeOf

class SomeClass: ...

def some_function() -> None: ...
@dataclass
class D:
    function_literal: TypeOf[some_function]
    class_literal: TypeOf[SomeClass]
    class_subtype_of: type[SomeClass]

# revealed: (function_literal: def some_function() -> None, class_literal: Literal[SomeClass], class_subtype_of: type[SomeClass]) -> None
reveal_type(D.__init__)

More realistically, dataclasses can have Callable attributes:

from typing import Callable

@dataclass
class D:
    c: Callable[[int], str]

reveal_type(D.__init__)  # revealed: (c: (int, /) -> str) -> None

Implicit instance attributes do not affect the signature of __init__:

@dataclass
class D:
    x: int

    def f(self, y: str) -> None:
        self.y: str = y

reveal_type(D(1).y)  # revealed: str

reveal_type(D.__init__)  # revealed: (x: int) -> None

Annotating expressions does not lead to an entry in __annotations__ at runtime, and so it wouldn't be included in the signature of __init__. This is a case that we currently don't detect:

@dataclass
class D:
    # (x) is an expression, not a "simple name"
    (x): int = 1

# TODO: should ideally not include a `x` parameter
reveal_type(D.__init__)  # revealed: (x: int = Literal[1]) -> None

@dataclass calls with arguments

The @dataclass decorator can take several arguments to customize the existence of the generated methods. The following test makes sure that we still treat the class as a dataclass if (the default) arguments are passed in:

from dataclasses import dataclass

@dataclass(init=True, repr=True, eq=True)
class Person:
    name: str
    age: int | None = None

alice = Person("Alice", 30)
reveal_type(repr(alice))  # revealed: str
reveal_type(alice == alice)  # revealed: bool

If init is set to False, no __init__ method is generated:

from dataclasses import dataclass

@dataclass(init=False)
class C:
    x: int

C()  # Okay

# error: [too-many-positional-arguments]
C(1)

repr(C())

C() == C()

Other dataclass parameters

repr

A custom __repr__ method is generated by default. It can be disabled by passing repr=False, but in that case __repr__ is still available via object.__repr__:

from dataclasses import dataclass

@dataclass(repr=False)
class WithoutRepr:
    x: int

reveal_type(WithoutRepr(1).__repr__)  # revealed: bound method WithoutRepr.__repr__() -> str

eq

The same is true for __eq__. Setting eq=False disables the generated __eq__ method, but __eq__ is still available via object.__eq__:

from dataclasses import dataclass

@dataclass(eq=False)
class WithoutEq:
    x: int

reveal_type(WithoutEq(1) == WithoutEq(2))  # revealed: bool

order

[environment]
python-version = "3.12"

order is set to False by default. If order=True, __lt__, __le__, __gt__, and __ge__ methods will be generated:

from dataclasses import dataclass

@dataclass
class WithoutOrder:
    x: int

WithoutOrder(1) < WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) <= WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) > WithoutOrder(2)  # error: [unsupported-operator]
WithoutOrder(1) >= WithoutOrder(2)  # error: [unsupported-operator]

@dataclass(order=True)
class WithOrder:
    x: int

WithOrder(1) < WithOrder(2)
WithOrder(1) <= WithOrder(2)
WithOrder(1) > WithOrder(2)
WithOrder(1) >= WithOrder(2)

Comparisons are only allowed for WithOrder instances:

WithOrder(1) < 2  # error: [unsupported-operator]
WithOrder(1) <= 2  # error: [unsupported-operator]
WithOrder(1) > 2  # error: [unsupported-operator]
WithOrder(1) >= 2  # error: [unsupported-operator]

This also works for generic dataclasses:

from dataclasses import dataclass

@dataclass(order=True)
class GenericWithOrder[T]:
    x: T

GenericWithOrder[int](1) < GenericWithOrder[int](1)

GenericWithOrder[int](1) < GenericWithOrder[str]("a")  # error: [unsupported-operator]

If a class already defines one of the comparison methods, a TypeError is raised at runtime. Ideally, we would emit a diagnostic in that case:

@dataclass(order=True)
class AlreadyHasCustomDunderLt:
    x: int

    # TODO: Ideally, we would emit a diagnostic here
    def __lt__(self, other: object) -> bool:
        return False

unsafe_hash

To do

frozen

To do

match_args

To do

kw_only

To do

slots

To do

weakref_slot

To do

Inheritance

Normal class inheriting from a dataclass

from dataclasses import dataclass

@dataclass
class Base:
    x: int

class Derived(Base): ...

d = Derived(1)  # OK
reveal_type(d.x)  # revealed: int

Dataclass inheriting from normal class

from dataclasses import dataclass

class Base:
    x: int = 1

@dataclass
class Derived(Base):
    y: str

d = Derived("a")

# error: [too-many-positional-arguments]
# error: [invalid-argument-type]
Derived(1, "a")

Dataclass inheriting from another dataclass

from dataclasses import dataclass

@dataclass
class Base:
    x: int
    y: str

@dataclass
class Derived(Base):
    z: bool

d = Derived(1, "a", True)  # OK

reveal_type(d.x)  # revealed: int
reveal_type(d.y)  # revealed: str
reveal_type(d.z)  # revealed: bool

# error: [missing-argument]
Derived(1, "a")

# error: [missing-argument]
Derived(True)

Overwriting attributes from base class

The following example comes from the Python documentation. The x attribute appears just once in the __init__ signature, and the default value is taken from the derived class

from dataclasses import dataclass
from typing import Any

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class C(Base):
    z: int = 10
    x: int = 15

reveal_type(C.__init__)  # revealed: (x: int = Literal[15], y: int = Literal[0], z: int = Literal[10]) -> None

Generic dataclasses

[environment]
python-version = "3.12"
from dataclasses import dataclass

@dataclass
class DataWithDescription[T]:
    data: T
    description: str

reveal_type(DataWithDescription[int])  # revealed: Literal[DataWithDescription[int]]

d_int = DataWithDescription[int](1, "description")  # OK
reveal_type(d_int.data)  # revealed: int
reveal_type(d_int.description)  # revealed: str

# error: [invalid-argument-type]
DataWithDescription[int](None, "description")

Descriptor-typed fields

Same type in __get__ and __set__

For the following descriptor, the return type of __get__ and the type of the value parameter in __set__ are the same. The generated __init__ method takes an argument of this type (instead of the type of the descriptor), and the default value is also of this type:

from typing import overload
from dataclasses import dataclass

class UppercaseString:
    _value: str = ""

    def __get__(self, instance: object, owner: None | type) -> str:
        return self._value

    def __set__(self, instance: object, value: str) -> None:
        self._value = value.upper()

@dataclass
class C:
    upper: UppercaseString = UppercaseString()

reveal_type(C.__init__)  # revealed: (upper: str = str) -> None

c = C("abc")
reveal_type(c.upper)  # revealed: str

# This is also okay:
C()

# error: [invalid-argument-type]
C(1)

# error: [too-many-positional-arguments]
C("a", "b")

Different types in __get__ and __set__

In general, the type of the __init__ parameter is determined by the value parameter type of the __set__ method (str in the example below). However, the default value is generated by calling the descriptor's __get__ method as if it had been called on the class itself, i.e. passing None for the instance argument.

from typing import Literal, overload
from dataclasses import dataclass

class ConvertToLength:
    _len: int = 0

    @overload
    def __get__(self, instance: None, owner: type) -> Literal[""]: ...
    @overload
    def __get__(self, instance: object, owner: type | None) -> int: ...
    def __get__(self, instance: object | None, owner: type | None) -> str | int:
        if instance is None:
            return ""

        return self._len

    def __set__(self, instance, value: str) -> None:
        self._len = len(value)

@dataclass
class C:
    converter: ConvertToLength = ConvertToLength()

reveal_type(C.__init__)  # revealed: (converter: str = Literal[""]) -> None

c = C("abc")
reveal_type(c.converter)  # revealed: int

# This is also okay:
C()

# error: [invalid-argument-type]
C(1)

# error: [too-many-positional-arguments]
C("a", "b")

With overloaded __set__ method

If the __set__ method is overloaded, we determine the type for the __init__ parameter as the union of all possible value parameter types:

from typing import overload
from dataclasses import dataclass

class AcceptsStrAndInt:
    def __get__(self, instance, owner) -> int:
        return 0

    @overload
    def __set__(self, instance: object, value: str) -> None: ...
    @overload
    def __set__(self, instance: object, value: int) -> None: ...
    def __set__(self, instance: object, value) -> None:
        pass

@dataclass
class C:
    field: AcceptsStrAndInt = AcceptsStrAndInt()

reveal_type(C.__init__)  # revealed: (field: str | int = int) -> None

dataclasses.field

To do

Other special cases

dataclasses.dataclass

We also understand dataclasses if they are decorated with the fully qualified name:

import dataclasses

@dataclasses.dataclass
class C:
    x: str

reveal_type(C.__init__)  # revealed: (x: str) -> None

Dataclass with custom __init__ method

If a class already defines __init__, it is not replaced by the dataclass decorator.

from dataclasses import dataclass

@dataclass(init=True)
class C:
    x: str

    def __init__(self, x: int) -> None:
        self.x = str(x)

C(1)  # OK

# error: [invalid-argument-type]
C("a")

Similarly, if we set init=False, we still recognize the custom __init__ method:

@dataclass(init=False)
class D:
    def __init__(self, x: int) -> None:
        self.x = str(x)

D(1)  # OK
D()  # error: [missing-argument]

Accessing instance attributes on the class itself

Just like for normal classes, accessing instance attributes on the class itself is not allowed:

from dataclasses import dataclass

@dataclass
class C:
    x: int

# error: [unresolved-attribute] "Attribute `x` can only be accessed on instances, not on the class object `Literal[C]` itself."
C.x

Return type of dataclass(...)

A call like dataclass(order=True) returns a callable itself, which is then used as the decorator. We can store the callable in a variable and later use it as a decorator:

from dataclasses import dataclass

dataclass_with_order = dataclass(order=True)

reveal_type(dataclass_with_order)  # revealed: <decorator produced by dataclasses.dataclass>

@dataclass_with_order
class C:
    x: int

C(1) < C(2)  # ok

Using dataclass as a function

To do

Internals

The dataclass decorator returns the class itself. This means that the type of Person is type, and attributes like the MRO are unchanged:

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int | None = None

reveal_type(type(Person))  # revealed: Literal[type]
reveal_type(Person.__mro__)  # revealed: tuple[Literal[Person], Literal[object]]

The generated methods have the following signatures:

# TODO: `self` is missing here
reveal_type(Person.__init__)  # revealed: (name: str, age: int | None = None) -> None

reveal_type(Person.__repr__)  # revealed: def __repr__(self) -> str

reveal_type(Person.__eq__)  # revealed: def __eq__(self, value: object, /) -> bool