Previously, we used a very fine-grained representation for individual
constraints: each constraint was _either_ a range constraint, a
not-equivalent constraint, or an incomparable constraint. These three
pieces are enough to represent all of the "real" constraints we need to
create — range constraints and their negation.
However, it meant that we weren't picking up as many chances to simplify
constraint sets as we could. Our simplification logic depends on being
able to look at _pairs_ of constraints or clauses to see if they
simplify relative to each other. With our fine-grained representation,
we could easily encounter situations that we should have been able to
simplify, but that would require looking at three or more individual
constraints.
For instance, negating a range constraint would produce:
```
¬(Base ≤ T ≤ Super) = ((T ≤ Base) ∧ (T ≠ Base)) ∨ (T ≁ Base) ∨
((Super ≤ T) ∧ (T ≠ Super)) ∨ (T ≁ Super)
```
That is, `T` must be (strictly) less than `Base`, (strictly) greater
than `Super`, or incomparable to either.
If we tried to union those back together, we should get `always`, since
`x ∨ ¬x` should always be true, no matter what `x` is. But instead we
would get:
```
(Base ≤ T ≤ Super) ∨ ((T ≤ Base) ∧ (T ≠ Base)) ∨ (T ≁ Base) ∨ ((Super ≤ T) ∧ (T ≠
Super)) ∨ (T ≁ Super)
```
Nothing would simplify relative to each other, because we'd have to look
at all five union elements to see that together they do in fact combine
to `always`.
The fine-grained representation was nice, because it made it easier to
[work out the math](https://dcreager.net/theory/constraints/) for
intersections and unions of each kind of constraint. But being able to
simplify is more important, since the example above comes up immediately
in #20093 when trying to handle constrained typevars.
The fix in this PR is to go back to a more coarse-grained
representation, where each individual constraint consists of a positive
range (which might be `always` / `Never ≤ T ≤ object`), and zero or more
negative ranges. The intuition is to think of a constraint as a region
of the type space (representable as a range) with zero or more "holes"
removed from it.
With this representation, negating a range constraint produces:
```
¬(Base ≤ T ≤ Super) = (always ∧ ¬(Base ≤ T ≤ Super))
```
(That looks trivial, because it is! We just move the positive range to
the negative side.)
The math is not that much harder than before, because there are only
three combinations to consider (each for intersection and union) —
though the fact that there can be multiple holes in a constraint does
require some nested loops. But the mdtest suite gives me confidence that
this is not introducing any new issues, and it definitely removes a
troublesome TODO.
(As an aside, this change also means that we are back to having each
clause contain no more than one individual constraint for any typevar.
This turned out to be important, because part of our simplification
logic was also depending on that!)
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
This PR adds a new `ty_extensions.ConstraintSet` class, which is used to
expose constraint sets to our mdtest framework. This lets us write a
large collection of unit tests that exercise the invariants and rewrite
rules of our constraint set implementation.
As part of this, `is_assignable_to` and friends are updated to return a
`ConstraintSet` instead of a `bool`, and we implement
`ConstraintSet.__bool__` to return when a constraint set is always
satisfied. That lets us still use
`static_assert(is_assignable_to(...))`, since the assertion will coerce
the constraint set to a bool, and also lets us
`reveal_type(is_assignable_to(...))` to see more detail about
whether/when the two types are assignable. That lets us get rid of
`reveal_when_assignable_to` and friends, since they are now redundant
with the expanded capabilities of `is_assignable_to`.
This PR adds two new `ty_extensions` functions,
`reveal_when_assignable_to` and `reveal_when_subtype_of`. These are
closely related to the existing `is_assignable_to` and `is_subtype_of`,
but instead of returning when the property (always) holds, it produces a
diagnostic that describes _when_ the property holds. (This will let us
construct mdtests that print out constraints that are not always true or
always false — though we don't currently have any instances of those.)
I did not replace _every_ occurrence of the `is_property` variants in
the mdtest suite, instead focusing on the generics-related tests where
it will be important to see the full detail of the constraint sets.
As part of this, I also updated the mdtest harness to accept the shorter
`# revealed:` assertion format for more than just `reveal_type`, and
updated the existing uses of `reveal_protocol_interface` to take
advantage of this.
Part of astral-sh/ty#994
## Summary
Add new special forms to `ty_extensions`, `Top[T]` and `Bottom[T]`.
Remove `ty_extensions.top_materialization` and
`ty_extensions.bottom_materialization`.
## Test Plan
Converted the existing `materialization.md` mdtest to the new syntax.
Added some tests for invalid use of the new special form.
## Summary
We use the `System` abstraction in ty to abstract away the host/system
on which ty runs.
This has a few benefits:
* Tests can run in full isolation using a memory system (that uses an
in-memory file system)
* The LSP has a custom implementation where `read_to_string` returns the
content as seen by the editor (e.g. unsaved changes) instead of always
returning the content as it is stored on disk
* We don't require any file system polyfills for wasm in the browser
However, it does require extra care that we don't accidentally use
`std::fs` or `std::env` (etc.) methods in ty's code base (which is very
easy).
This PR sets up Clippy and disallows the most common methods, instead
pointing users towards the corresponding `System` methods.
The setup is a bit awkward because clippy doesn't support inheriting
configurations. That means, a crate can only override the entire
workspace configuration or not at all.
The approach taken in this PR is:
* Configure the disallowed methods at the workspace level
* Allow `disallowed_methods` at the workspace level
* Enable the lint at the crate level using the warn attribute (in code)
The obvious downside is that it won't work if we ever want to disallow
other methods, but we can figure that out once we reach that point.
What about false positives: Just add an `allow` and move on with your
life :) This isn't something that we have to enforce strictly; the goal
is to catch accidental misuse.
## Test Plan
Clippy found a place where we incorrectly used `std::fs::read_to_string`
## Summary
Adds a way to list all members of an `Enum` and implements almost all of
the mechanisms by which members are distinguished from non-members
([spec](https://typing.python.org/en/latest/spec/enums.html#defining-members)).
This has no effect on actual enums, so far.
## Test Plan
New Markdown tests using `ty_extensions.enum_members`.
## Summary
Extracts the vendored typeshed stubs lazily and caches them on the local
filesystem to support go-to in the LSP.
Resolves https://github.com/astral-sh/ty/issues/77.
## Summary
Having a recursive type method to check whether a type is fully static
is inefficient, unnecessary, and makes us overly strict about subtyping
relations.
It's inefficient because we end up re-walking the same types many times
to check for fully-static-ness.
It's unnecessary because we can check relations involving the dynamic
type appropriately, depending whether the relation is subtyping or
assignability.
We use the subtyping relation to simplify unions and intersections. We
can usefully consider that `S <: T` for gradual types also, as long as
it remains true that `S | T` is equivalent to `T` and `S & T` is
equivalent to `S`.
One conservative definition (implemented here) that satisfies this
requirement is that we consider `S <: T` if, for every possible pair of
materializations `S'` and `T'`, `S' <: T'`. Or put differently the top
materialization of `S` (`S+` -- the union of all possible
materializations of `S`) is a subtype of the bottom materialization of
`T` (`T-` -- the intersection of all possible materializations of `T`).
In the most basic cases we can usefully say that `Any <: object` and
that `Never <: Any`, and we can handle more complex cases inductively
from there.
This definition of subtyping for gradual subtypes is not reflexive
(`Any` is not a subtype of `Any`).
As a corollary, we also remove `is_gradual_equivalent_to` --
`is_equivalent_to` now has the meaning that `is_gradual_equivalent_to`
used to have. If necessary, we could restore an
`is_fully_static_equivalent_to` or similar (which would not do an
`is_fully_static` pre-check of the types, but would instead pass a
relation-kind enum down through a recursive equivalence check, similar
to `has_relation_to`), but so far this doesn't appear to be necessary.
Credit to @JelleZijlstra for the observation that `is_fully_static` is
unnecessary and overly restrictive on subtyping.
There is another possible definition of gradual subtyping: instead of
requiring that `S+ <: T-`, we could instead require that `S+ <: T+` and
`S- <: T-`. In other words, instead of requiring all materializations of
`S` to be a subtype of every materialization of `T`, we just require
that every materialization of `S` be a subtype of _some_ materialization
of `T`, and that every materialization of `T` be a supertype of some
materialization of `S`. This definition also preserves the core
invariant that `S <: T` implies that `S | T = T` and `S & T = S`, and it
restores reflexivity: under this definition, `Any` is a subtype of
`Any`, and for any equivalent types `S` and `T`, `S <: T` and `T <: S`.
But unfortunately, this definition breaks transitivity of subtyping,
because nominal subclasses in Python use assignability ("consistent
subtyping") to define acceptable overrides. This means that we may have
a class `A` with `def method(self) -> Any` and a subtype `B(A)` with
`def method(self) -> int`, since `int` is assignable to `Any`. This
means that if we have a protocol `P` with `def method(self) -> Any`, we
would have `B <: A` (from nominal subtyping) and `A <: P` (`Any` is a
subtype of `Any`), but not `B <: P` (`int` is not a subtype of `Any`).
Breaking transitivity of subtyping is not tenable, so we don't use this
definition of subtyping.
## Test Plan
Existing tests (modified in some cases to account for updated
semantics.)
Stable property tests pass at a million iterations:
`QUICKCHECK_TESTS=1000000 cargo test -p ty_python_semantic -- --ignored
types::property_tests::stable`
### Changes to property test type generation
Since we no longer have a method of categorizing built types as
fully-static or not-fully-static, I had to add a previously-discussed
feature to the property tests so that some tests can build types that
are known by construction to be fully static, because there are still
properties that only apply to fully-static types (for example,
reflexiveness of subtyping.)
## Changes to handling of `*args, **kwargs` signatures
This PR "discovered" that, once we allow non-fully-static types to
participate in subtyping under the above definitions, `(*args: Any,
**kwargs: Any) -> Any` is now a subtype of `() -> object`. This is true,
if we take a literal interpretation of the former signature: all
materializations of the parameters `*args: Any, **kwargs: Any` can
accept zero arguments, making the former signature a subtype of the
latter. But the spec actually says that `*args: Any, **kwargs: Any`
should be interpreted as equivalent to `...`, and that makes a
difference here: `(...) -> Any` is not a subtype of `() -> object`,
because (unlike a literal reading of `(*args: Any, **kwargs: Any)`),
`...` can materialize to _any_ signature, including a signature with
required positional arguments.
This matters for this PR because it makes the "any two types are both
assignable to their union" property test fail if we don't implement the
equivalence to `...`. Because `FunctionType.__call__` has the signature
`(*args: Any, **kwargs: Any) -> Any`, and if we take that at face value
it's a subtype of `() -> object`, making `FunctionType` a subtype of `()
-> object)` -- but then a function with a required argument is also a
subtype of `FunctionType`, but not a subtype of `() -> object`. So I
went ahead and implemented the equivalence to `...` in this PR.
## Ecosystem analysis
* Most of the ecosystem report are cases of improved union/intersection
simplification. For example, we can now simplify a union like `bool |
(bool & Unknown) | Unknown` to simply `bool | Unknown`, because we can
now observe that every possible materialization of `bool & Unknown` is
still a subtype of `bool` (whereas before we would set aside `bool &
Unknown` as a not-fully-static type.) This is clearly an improvement.
* The `possibly-unresolved-reference` errors in sockeye, pymongo,
ignite, scrapy and others are true positives for conditional imports
that were formerly silenced by bogus conflicting-declarations (which we
currently don't issue a diagnostic for), because we considered two
different declarations of `Unknown` to be conflicting (we used
`is_equivalent_to` not `is_gradual_equivalent_to`). In this PR that
distinction disappears and all equivalence is gradual, so a declaration
of `Unknown` no longer conflicts with a declaration of `Unknown`, which
then results in us surfacing the possibly-unbound error.
* We will now issue "redundant cast" for casting from a typevar with a
gradual bound to the same typevar (the hydra-zen diagnostic). This seems
like an improvement.
* The new diagnostics in bandersnatch are interesting. For some reason
primer in CI seems to be checking bandersnatch on Python 3.10 (not yet
sure why; this doesn't happen when I run it locally). But bandersnatch
uses `enum.StrEnum`, which doesn't exist on 3.10. That makes the `class
SimpleDigest(StrEnum)` a class that inherits from `Unknown` (and
bypasses our current TODO handling for accessing attributes on enum
classes, since we don't recognize it as an enum class at all). This PR
improves our understanding of assignability to classes that inherit from
`Any` / `Unknown`, and we now recognize that a string literal is not
assignable to a class inheriting `Any` or `Unknown`.
## Summary
This is to support https://github.com/astral-sh/ruff/pull/18607.
This PR adds support for generating the top materialization (or upper
bound materialization) and the bottom materialization (or lower bound
materialization) of a type. This is the most general and the most
specific form of the type which is fully static, respectively.
More concretely, `T'`, the top materialization of `T`, is the type `T`
with all occurrences
of dynamic type (`Any`, `Unknown`, `@Todo`) replaced as follows:
- In covariant position, it's replaced with `object`
- In contravariant position, it's replaced with `Never`
- In invariant position, it's replaced with an unresolved type variable
(For an invariant position, it should actually be replaced with an
existential type, but this is not currently representable in our type
system, so we use an unresolved type variable for now instead.)
The bottom materialization is implemented in the same way, except we
start out in "contravariant" position.
## Test Plan
Add test cases for various types.
This PR adds initial support for listing all attributes of
an object. It is exposed through a new `all_members`
routine in `ty_extensions`, which is in turn used to test
the functionality.
The purpose of listing all members is for code
completion. That is, given a `object.<CURSOR>`, we
would like to list all available attributes on
`object`.
## Summary
This is something I wrote a few months ago, and continued to update from
time to time. It was mostly written for my own education. I found a few
bugs while writing it at the time (there are still one or two TODOs in
the test assertions that are probably bugs). Our other tests are fairly
comprehensive, but they are usually structured around a certain
functionality or operation (subtyping, assignability, narrowing). The
idea here was to focus on individual *types and their properties*.
closes#197 (added `JustFloat` and `JustComplex` to `ty_extensions`).
## Summary
This PR adds support for the `__all__` module variable.
Reference spec:
https://typing.python.org/en/latest/spec/distributing.html#library-interface-public-and-private-symbols
This PR adds a new `dunder_all_names` query that returns a set of
`Name`s defined in the `__all__` variable of the given `File`. The query
works by implementing the `StatementVisitor` and collects all the names
by recognizing the supported idioms as mentioned in the spec. Any idiom
that's not recognized are ignored.
The current implementation is minimum to what's required for us to
remove all the false positives that this is causing. Refer to the
"Follow-ups" section below to see what we can do next. I'll a open
separate issue to keep track of them.
Closes: astral-sh/ty#106Closes: astral-sh/ty#199
### Follow-ups
* Diagnostics:
* Add warning diagnostics for unrecognized `__all__` idioms, `__all__`
containing non-string element
* Add an error diagnostic for elements that are present in `__all__` but
not defined in the module. This could lead to runtime error
* Maybe we should return `<type>` instead of `Unknown | <type>` for
`module.__all__`. For example:
https://playknot.ruff.rs/2a6fe5d7-4e16-45b1-8ec3-d79f2d4ca894
* Mark a symbol that's mentioned in `__all__` as used otherwise it could
raise (possibly in the future) "unused-name" diagnostic
Supporting diagnostics will require that we update the return type of
the query to be something other than `Option<FxHashSet<Name>>`,
something that behaves like a result and provides a way to check whether
a name exists in `__all__`, loop over elements in `__all__`, loop over
the invalid elements, etc.
## Ecosystem analysis
The following are the maximum amount of diagnostics **removed** in the
ecosystem:
* "Type <module '...'> has no attribute ..."
* `collections.abc` - 14
* `numpy` - 35534
* `numpy.ma` - 296
* `numpy.char` - 37
* `numpy.testing` - 175
* `hashlib` - 311
* `scipy.fft` - 2
* `scipy.stats` - 38
* "Module '...' has no member ..."
* `collections.abc` - 85
* `numpy` - 508
* `numpy.testing` - 741
* `hashlib` - 36
* `scipy.stats` - 68
* `scipy.interpolate` - 7
* `scipy.signal` - 5
The following modules have dynamic `__all__` definition, so `ty` assumes
that `__all__` doesn't exists in that module:
* `scipy.stats`
(95a5d6ea8b/scipy/stats/__init__.py (L665))
* `scipy.interpolate`
(95a5d6ea8b/scipy/interpolate/__init__.py (L221))
* `scipy.signal` (indirectly via
95a5d6ea8b/scipy/signal/_signal_api.py (L30))
* `numpy.testing`
(de784cd6ee/numpy/testing/__init__.py (L16-L18))
~There's this one category of **false positives** that have been added:~
Fixed the false positives by also ignoring `__all__` from a module that
uses unrecognized idioms.
<details><summary>Details about the false postivie:</summary>
<p>
The `scipy.stats` module has dynamic `__all__` and it imports a bunch of
symbols via star imports. Some of those modules have a mix of valid and
invalid `__all__` idioms. For example, in
95a5d6ea8b/scipy/stats/distributions.py (L18-L24),
2 out of 4 `__all__` idioms are invalid but currently `ty` recognizes
two of them and says that the module has a `__all__` with 5 values. This
leads to around **2055** newly added false positives of the form:
```
Type <module 'scipy.stats'> has no attribute ...
```
I think the fix here is to completely ignore `__all__`, not only if
there are invalid elements in it, but also if there are unrecognized
idioms used in the module.
</p>
</details>
## Test Plan
Add a bunch of test cases using the new `ty_extensions.dunder_all_names`
function to extract a module's `__all__` names.
Update various test cases to remove false positives around `*` imports
and re-export convention.
Add new test cases for named import behavior as `*` imports covers all
of it already (thanks Alex!).