These tests don't seem to be flaky in the sense of failing
non-deterministically, but they are flaky in the sense that unrelated
changes can cause test failures when the clauses of a constraint set are
rendered in a different order. This happens because we use Salsa IDs to
determine the order in which typevars appear in a constraint set BDD,
and those IDs are assigned non-deterministically.
The fix is ham-fisted but effective: sort the constraints in each
clause, and the clauses in each set, as part of the rendering process.
Constraint sets are only rendered in our test cases, so we don't need to
over-optimize this.
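A minimal Python sketch of the sorting idea, assuming a constraint set is a list of clauses and each clause is a list of already-rendered constraint strings. The `render_constraint_set` name and the `∧`/`∨` display syntax are illustrative only and are not the actual (Rust) implementation:

```python
def render_constraint_set(clauses):
    """Render clauses in an order that doesn't depend on how they were built."""
    # Sort the constraints within each clause first...
    rendered = [" ∧ ".join(sorted(clause)) for clause in clauses]
    # ...then sort the rendered clauses themselves.
    return " ∨ ".join(f"({clause})" for clause in sorted(rendered))


# The same set, built in two different orders, renders identically.
first = [["T ≤ int", "U ≤ str"], ["T ≤ bool"]]
second = [["T ≤ bool"], ["U ≤ str", "T ≤ int"]]
assert render_constraint_set(first) == render_constraint_set(second)
```

Sorting on every render costs extra work, which is acceptable here only because rendering is limited to test output.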
<!--
Thank you for contributing to Ruff/ty! To help us out with reviewing,
please consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title? (Please prefix
with `[ty]` for ty pull
requests.)
- Does this pull request include references to any relevant issues?
-->
## Summary
<!-- What's the purpose of the change? What does it do, and why? -->
Since we are trying to import both `AutoImport` and `SourceModuleMoved`,
the previous naming was not as descriptive. Renaming it to `Rename`
better reflects the intention.
## Test Plan
<!-- How was it tested? -->
no functionality change
## Summary
This PR uses the new `Diagnostic` type for rendering formatter
diagnostics. This allows the formatter to inherit all of the output
formats already implemented in the linter and ty. For example, here's
the new `full` output format, with the formatting diff displayed using
the same infrastructure as the linter:
<img width="592" height="364" alt="image"
src="https://github.com/user-attachments/assets/6d09817d-3f27-4960-aa8b-41ba47fb4dc0"
/>
<details><summary>Resolved TODOs</summary>
<p>
~~There are several limitations/todos here still, especially around the
`OutputFormat` type~~:
- [x] A few literal `todo!`s for the remaining `OutputFormat`s without
matching `DiagnosticFormat`s
- [x] The default output format is `full` instead of something more
concise like the current output
- [x] Some of the output formats (namely JSON) have information that
doesn't make much sense for these diagnostics
The first of these is definitely resolved, and I think the other two are
as well, based on discussion on the design document. In brief, we're
okay inheriting the default `OutputFormat` and can separate the global
option into `lint.output-format` and `format.output-format` in the
future, if needed; and we're okay including redundant information in the
non-human-readable output formats.
My last major concern is with the performance of the new code, as
discussed in the `Benchmarks` section below.
A smaller question is whether we should use `Diagnostic`s for formatting
errors too. I think the answer to this is yes, in line with changes
we're making in the linter too. I still need to implement that here.
</p>
</details>
<details><summary>Benchmarks</summary>
<p>
The values in the table are from a large benchmark on the CPython 3.10
code base, which involves checking 2011 files, 1872 of which need to be
reformatted. `stable` corresponds to the same code used on `main`, while
`preview-full` and `preview-concise` use the new `Diagnostic` code gated
behind `--preview` for the `full` and `concise` output formats,
respectively. `stable-diff` uses the `--diff` flag to compare the two
diff rendering approaches. See the full hyperfine command below for more
details. For a sense of scale, the `stable` output format produces 1873
lines on stdout, compared to 855,278 for `preview-full` and 857,798 for
`stable-diff`.
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:------------------|--------------:|---------:|---------:|-------------:|
| `stable` | 201.2 ± 6.8 | 192.9 | 220.6 | 1.00 |
| `preview-full` | 9113.2 ± 31.2 | 9076.1 | 9152.0 | 45.29 ± 1.54 |
| `preview-concise` | 214.2 ± 1.4 | 212.0 | 217.6 | 1.06 ± 0.04 |
| `stable-diff` | 3308.6 ± 20.2 | 3278.6 | 3341.8 | 16.44 ± 0.56 |
In summary, the `preview-concise` diagnostics are ~6% slower than the
stable output format, increasing the average runtime from 201.2 ms to
214.2 ms. The `full` preview diagnostics are much more expensive, taking
9113.2 ms on average to complete, which is ~3x more expensive even than
the stable diffs produced by the `--diff` flag.
My main takeaways here are:
1. Rendering `Edit`s is much more expensive than rendering the diffs
from `--diff`
2. Constructing `Edit`s actually isn't too bad
### Constructing `Edit`s
I also took a closer look at `Edit` construction by modifying the code
and repeating the `preview-concise` benchmark and found that the main
issue is constructing a `SourceFile` for use in the `Edit` rendering.
Commenting out the `Edit` construction itself has basically no effect:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:----------|------------:|---------:|---------:|------------:|
| `stable` | 197.5 ± 1.6 | 195.0 | 200.3 | 1.00 |
| `no-edit` | 208.9 ± 2.2 | 204.8 | 212.2 | 1.06 ± 0.01 |
However, also omitting the source text from the `SourceFile`
construction resolves the slowdown compared to `stable`. So it seems
that copying the full source text into a `SourceFile` is the main cause
of the slowdown for non-`full` diagnostics.
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:-----------------|------------:|---------:|---------:|------------:|
| `stable` | 202.4 ± 2.9 | 197.6 | 207.9 | 1.00 |
| `no-source-text` | 202.7 ± 3.3 | 196.3 | 209.1 | 1.00 ± 0.02 |
### Rendering diffs
The main difference between `stable-diff` and `preview-full` seems to be
the diffing strategy we use from `similar`. Both versions use the same
algorithm, but in the existing
[`CodeDiff`](https://github.com/astral-sh/ruff/blob/main/crates/ruff_linter/src/source_kind.rs#L259)
rendering for the `--diff` flag, we only do line-level diffing, whereas
for `Diagnostic`s we use `TextDiff::iter_inline_changes` to highlight
word-level changes too. Skipping the word diff for `Diagnostic`s closes
most of the gap:
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `stable-diff` | 3.323 ± 0.015 | 3.297 | 3.341 | 1.00 |
| `preview-full` | 3.654 ± 0.019 | 3.618 | 3.682 | 1.10 ± 0.01 |
(In some repeated runs, I've seen as small as a ~5% difference, down
from 10% in the table)
This doesn't actually change any of our snapshots, but it would
obviously change the rendered result in a terminal since we wouldn't
highlight the specific words that changed within a line.
Another much smaller change that we can try is removing the deadline
from the `iter_inline_changes` call. It looks like there's a fair amount
of overhead from the default 500 ms deadline for computing these, and
using `iter_inline_changes(op, None)` (`None` for the optional deadline
argument) improves the runtime quite a bit:
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `stable-diff` | 3.322 ± 0.013 | 3.298 | 3.341 | 1.00 |
| `preview-full` | 5.296 ± 0.030 | 5.251 | 5.366 | 1.59 ± 0.01 |
<hr>
<details><summary>hyperfine command</summary>
```shell
cargo build --release --bin ruff && hyperfine --ignore-failure --warmup 10 --export-markdown /tmp/table.md \
-n stable -n preview-full -n preview-concise -n stable-diff \
"./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache" \
"./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache --preview --output-format=full" \
"./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache --preview --output-format=concise" \
"./target/release/ruff format --check ./crates/ruff_linter/resources/test/cpython/ --no-cache --diff"
```
</details>
</p>
</details>
## Test Plan
Some new CLI tests and manual testing
## Summary
Not sure if this was the original intention, but it looks to me like the
previous `Type::literal_promotion_type` was more of an implementation
detail for the actual operation of promoting all literals in a
possibly-nested position of a type.
This is not a pure refactor, as I'm technically changing the behavior
for that protocols diagnostic message suggestion.
## Test Plan
New Markdown test
## Summary
Add two simple tests that we recently discussed with @dcreager. They
demonstrate that the `TypeMapping::MarkTypeVarsInferable` operation
really does need to keep track of the binding context.
## Test Plan
Made sure that those tests fail if we create
`TypeMapping::MarkTypeVarsInferable(None)`s everywhere.
This PR ensures that we always put `./src` before `.` in our list of
first-party search paths. This better emulates the fact that at runtime,
the module name of a file `src/foo.py` would almost certainly be `foo`
rather than `src.foo`.
I wondered if fixing this might fix
https://github.com/astral-sh/ruff/pull/20603#issuecomment-3345317444. It
seems like that's not the case, but it also seems like it leads to
better diagnostics because we report much more intuitive module names to
the user in our error messages -- so, it's probably a good change
anyway.
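To illustrate why the order matters, here is a small sketch, assuming the first search path that contains a file determines its module name. The `module_name` helper and the `/project` layout are hypothetical and only model the idea:

```python
from pathlib import PurePosixPath


def module_name(file, search_paths):
    """Return the dotted module name for `file` using the first matching search path."""
    path = PurePosixPath(file)
    for root in search_paths:
        try:
            relative = path.relative_to(root)
        except ValueError:
            continue  # `file` is not located under this search path
        return ".".join(relative.with_suffix("").parts)
    return None


# With `./src` listed first, `src/foo.py` resolves to `foo`...
assert module_name("/project/src/foo.py", ["/project/src", "/project"]) == "foo"
# ...whereas with `.` listed first it would resolve to `src.foo`.
assert module_name("/project/src/foo.py", ["/project", "/project/src"]) == "src.foo"
```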
## Summary
Modify the (external) signature of instance methods such that the first
parameter uses `Self` unless it is explicitly annotated. This allows us
to correctly type-check more code, and allows us to infer correct return
types for many functions that return `Self`. For example:
```py
from pathlib import Path
from datetime import datetime, timedelta

reveal_type(Path(".config") / ".ty")  # now Path, previously Unknown

def _(dt: datetime, delta: timedelta):
    reveal_type(dt - delta)  # now datetime, previously Unknown
```
part of https://github.com/astral-sh/ty/issues/159
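As a further illustration, here is a sketch with hypothetical `Shape` and `Circle` classes (not taken from the test suite) in which `self` is unannotated and `Self` appears only in the return type. The import is guarded so the snippet also runs on Python versions without `typing.Self`:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from typing import Self  # Python 3.11+; only needed by the type checker


class Shape:
    def __init__(self) -> None:
        self.scale = 1.0

    # `self` is unannotated, and `Self` appears only in the return type.
    def scaled(self, factor: float) -> Self:
        self.scale *= factor
        return self


class Circle(Shape):
    pass


# A type checker is expected to infer `Circle` here rather than `Shape`.
circle = Circle().scaled(2.0)
assert type(circle) is Circle
```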
## Performance
I ran benchmarks locally on `attrs`, `freqtrade` and `colour`, the
projects with the largest regressions on CodSpeed. I see much smaller
effects locally, but can definitely reproduce the regression on `attrs`.
From looking at the profiling results (on CodSpeed), it seems that we
simply do more type inference work, which seems plausible, given that we
now understand many more return types (of many stdlib functions). In
particular, whenever a function uses an implicit `self` and returns
`Self` (without mentioning `Self` anywhere else in its signature), we
will now infer the correct type, whereas we would previously return
`Unknown`. This also means that we need to invoke the generics solver in
more cases. Comparing half a million lines of log output on attrs, I can
see that we do 5% more "work" (number of lines in the log), and have a
lot more `apply_specialization` events (7108 vs 4304). On freqtrade, I
see similar numbers for `apply_specialization` (11360 vs 5138 calls).
Given these results, I'm not sure if it's generally worth doing more
performance work, especially since none of the code modifications
themselves seem to be likely candidates for regressions.
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./ty_main check /home/shark/ecosystem/attrs` | 92.6 ± 3.6 | 85.9 | 102.6 | 1.00 |
| `./ty_self check /home/shark/ecosystem/attrs` | 101.7 ± 3.5 | 96.9 | 113.8 | 1.10 ± 0.06 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./ty_main check /home/shark/ecosystem/freqtrade` | 599.0 ± 20.2 | 568.2 | 627.5 | 1.00 |
| `./ty_self check /home/shark/ecosystem/freqtrade` | 607.9 ± 11.5 | 594.9 | 626.4 | 1.01 ± 0.04 |

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./ty_main check /home/shark/ecosystem/colour` | 423.9 ± 17.9 | 394.6 | 447.4 | 1.00 |
| `./ty_self check /home/shark/ecosystem/colour` | 426.9 ± 24.9 | 373.8 | 456.6 | 1.01 ± 0.07 |
## Test Plan
New Markdown tests
## Ecosystem report
* apprise: ~300 new diagnostics related to problematic stubs in apprise
😩
* attrs: a new true positive, since [this
function](4e2c89c823/tests/test_make.py (L2135))
is missing a `@staticmethod`?
* Some legitimate true positives
* sympy: lots of new `invalid-operator` false positives in [matrix
multiplication](cf9f4b6805/sympy/matrices/matrixbase.py (L3267-L3269))
due to our limited understanding of [generic `Callable[[Callable[[T1,
T2], T3]], Callable[[T1, T2], T3]]` "identity"
types](cf9f4b6805/sympy/core/decorators.py (L83-L84))
of decorators. This is not related to type-of-self.
## Typing conformance results
The changes are all correct, except for
```diff
+generics_self_usage.py:50:5: error[invalid-assignment] Object of type `def foo(self) -> int` is not assignable to `(typing.Self, /) -> int`
```
which is related to an assignability problem involving type variables on
both sides:
```py
from typing import Callable, Self

class CallableAttribute:
    def foo(self) -> int:
        return 0

    bar: Callable[[Self], int] = foo  # <- we currently error on this assignment
```
---------
Co-authored-by: Shaygan Hooshyari <sh.hooshyari@gmail.com>
## Summary
Addresses
https://github.com/astral-sh/ruff/pull/20443#discussion_r2381237640 by
factoring out the `match` on the ruff output format in a way that should
be reusable by the formatter.
I didn't think this was going to work at first, but the fact that the
config holds options that apply only to certain output formats works in
our favor here. We can set up a single config for all of the output
formats and then use `try_from` to convert the `OutputFormat` to a
`DiagnosticFormat` later.
## Test Plan
Existing tests, plus a few new ones to make sure relocating the
`SHOW_FIX_SUMMARY` rendering worked, since that was untested before. I deleted
a bunch of test code along with the `text` module, but I believe all of
it is now well-covered by the `full` and `concise` tests in `ruff_db`.
I also merged this branch into
https://github.com/astral-sh/ruff/pull/20443 locally and made sure that
the API actually helps. `render_diagnostics` dropped in perfectly and
passed the tests there too.
`TypeMapping` is no longer cow-shaped.
Before, `TypeMapping` defined a `to_owned` method, which would make an
owned copy of the type mapping. This let us apply type mappings to
function literals lazily. The primary part of a function that you have
to apply the type mapping to is its signature. The hypothesis was that
doing this lazily would prevent us from constructing the signature of a
function just to apply a type mapping; if you never ended up needing the
updated function signature, that would be extraneous work.
But looking at the CI for this PR, it looks like that hypothesis is
wrong! And this definitely cleans up the code quite a bit. It also means
that over time we can consider replacing all of these `TypeMapping` enum
variants with separate `TypeTransformer` impls.
---------
Co-authored-by: David Peter <mail@david-peter.de>
## Summary
This PR addresses #20570. In the example, the correct usage had a bug:
after logging the exception in the `except` block, the function
implicitly returned `None`, which made linters flag the code. Adding a
bare `raise` solves the issue.
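The pattern looks roughly like the following sketch. The `parse_port` function is a hypothetical stand-in, not the exact snippet from the documentation:

```python
import logging

logger = logging.getLogger(__name__)


def parse_port(raw):
    try:
        return int(raw)
    except ValueError:
        logger.exception("Invalid port: %r", raw)
        raise  # re-raise so the function never falls through and returns None
```

Without the bare `raise`, the `except` branch would end after logging and the function would return `None` on that path.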
## Test Plan
Tested it by building the doc locally.
## Summary
Fixes a bug observed by @AlexWaygood where `C[Any] <: C[object]` should
hold for a class that is covariant in its type parameter (and similar
subtyping relations involving dynamic types for other variance
configurations).
## Test Plan
New and updated Markdown tests
While working on #20093, I kept running into test failures due to
constraint sets not simplifying as much as they could, and therefore not
being easily testable against "always true" and "always false".
This PR updates our constraint set representation to use BDDs. Because
BDDs are reduced and ordered, they are canonical — equivalent boolean
formulas are represented by the same interned BDD node.
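To make the canonicity claim concrete, here is a toy reduced, ordered BDD in Python. It is only a sketch of the general technique: the `Bdd` class, its integer node ids, and its unmemoized `apply` are assumptions made for illustration and say nothing about how ty's Rust implementation is structured.

```python
class Bdd:
    """A tiny reduced, ordered BDD with interned nodes. 0 and 1 are the terminals."""

    def __init__(self):
        self._ids = {}    # (var, low, high) -> node id
        self._nodes = {}  # node id -> (var, low, high)

    def _node(self, var, low, high):
        if low == high:  # reduction: a redundant test is skipped entirely
            return low
        key = (var, low, high)
        if key not in self._ids:  # interning: structurally equal nodes share one id
            node_id = len(self._ids) + 2
            self._ids[key] = node_id
            self._nodes[node_id] = key
        return self._ids[key]

    def var(self, index):
        return self._node(index, 0, 1)

    def _cofactors(self, node, var):
        if node > 1 and self._nodes[node][0] == var:
            return self._nodes[node][1], self._nodes[node][2]
        return node, node  # `node` does not test `var` at its root

    def apply(self, op, a, b):
        if a <= 1 and b <= 1:
            return int(op(bool(a), bool(b)))
        # Ordering: always branch on the smallest variable index first.
        top = min(self._nodes[n][0] for n in (a, b) if n > 1)
        a_low, a_high = self._cofactors(a, top)
        b_low, b_high = self._cofactors(b, top)
        return self._node(top, self.apply(op, a_low, b_low), self.apply(op, a_high, b_high))

    def and_(self, a, b):
        return self.apply(lambda p, q: p and q, a, b)

    def or_(self, a, b):
        return self.apply(lambda p, q: p or q, a, b)

    def not_(self, a):
        return self.apply(lambda p, q: p != q, a, 1)


bdd = Bdd()
x, y = bdd.var(0), bdd.var(1)
# (x ∧ y) ∨ (x ∧ ¬y) is equivalent to x, so it is the very same node.
formula = bdd.or_(bdd.and_(x, y), bdd.and_(x, bdd.not_(y)))
assert formula == x
```

Because equivalent formulas end up as the same node, checking "always true" or "always false" becomes a comparison against the terminal `1` or `0`.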
That said, there is a wrinkle, in that the "variables" that we use in
these BDDs (the individual constraints like `Lower ≤ T ≤ Upper`) are not
always independent of each other.
As an example, given types `A ≤ B ≤ C ≤ D` and a typevar `T`, the
constraints `A ≤ T ≤ C` and `B ≤ T ≤ D` "overlap" — their intersection
is non-empty. So we should be able to simplify
```
(A ≤ T ≤ C) ∧ (B ≤ T ≤ D) == (B ≤ T ≤ C)
```
That's not a simplification that the BDD structure can perform itself,
since those three constraints are modeled as separate BDD variables, and
are therefore "opaque" to the BDD algorithms.
That means we need to perform this kind of simplification ourselves. We
look at pairs of constraints that appear in a BDD and see if they can be
simplified relative to each other, and if so, replace the pair with the
simplification. A large part of the toil of getting this PR to work was
identifying all of those patterns and getting that substitution logic
correct.
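The overlap example above can be sketched in Python under a strong simplifying assumption: the types form a single chain, so each type can be modeled by its position in it. Real subtyping is only a partial order, so an actual implementation would presumably need joins and meets of types rather than `max` and `min`. The `intersect` helper is hypothetical:

```python
# Model the chain A ≤ B ≤ C ≤ D by each type's position in it.
A, B, C, D = range(4)


def intersect(first, second):
    """Intersect two (lower, upper) range constraints on the same typevar."""
    lower = max(first[0], second[0])  # the tighter lower bound
    upper = min(first[1], second[1])  # the tighter upper bound
    if lower > upper:
        return None  # no type can satisfy both constraints
    return (lower, upper)


# (A ≤ T ≤ C) ∧ (B ≤ T ≤ D) == (B ≤ T ≤ C)
assert intersect((A, C), (B, D)) == (B, C)
# Ranges that do not overlap cannot be satisfied together.
assert intersect((A, B), (C, D)) is None
```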
With this new representation, all existing tests pass, as well as some
new ones that represent test failures that were occurring on #20093.
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
Follow up on #20495. The improvement suggested by @AlexWaygood cannot be
applied as-is since the `argument_matches` vector is indexed by argument
number, while the two boolean vectors are indexed by parameter number.
Still, coalescing the latter two saves one allocation.
I guess I missed these in #20007, but I found them today while grepping
for something else. `Option::unwrap` has been const since 1.83, so we
can use it here and avoid some unsafe code.
## Summary
This PR implements
https://docs.astral.sh/ruff/rules/future-feature-not-defined/ (F407) as
a semantic syntax error.
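For context, CPython itself rejects an unknown `__future__` feature at compile time, which is the behavior this rule mirrors. A quick way to observe it (the `fails_to_compile` helper is just for illustration):

```python
def fails_to_compile(source):
    """Return True if compiling `source` raises a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# A real feature compiles fine...
assert not fails_to_compile("from __future__ import annotations\n")
# ...but an undefined one is a syntax error.
assert fails_to_compile("from __future__ import not_a_real_feature\n")
```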
## Test Plan
I have written inline tests as directed in #17412
---------
Signed-off-by: 11happy <soni5happy@gmail.com>
Summary
--
Fixes #20536 by linking between the isort options `case-sensitive` and
`order-by-type`. The latter takes precedence over the former, so it
seems good to clarify this somewhere.
I tweaked the wording slightly, but this is otherwise based on the patch
from @SkylerWittman in
https://github.com/astral-sh/ruff/issues/20536#issuecomment-3326097324
(thank you!)
Test Plan
--
N/a
---------
Co-authored-by: Skyler Wittman <skyler.wittman@gmail.com>
Co-authored-by: Micha Reiser <micha@reiser.io>
Summary
--
This fixes a bug pointed out in #20560 where one of the `pylint`
settings wasn't used in its `Display` implementation.
Test Plan
--
Existing tests with updated snapshots
## Summary
Improve the SIM105 rule message to prevent user confusion about how to
properly use `contextlib.suppress`.
The previous message "Replace with `contextlib.suppress(ValueError)`"
was ambiguous and led users to incorrectly use
`contextlib.suppress(ValueError)` as a statement inside except blocks
instead of replacing the entire try-except-pass block with `with
contextlib.suppress(ValueError):`.
This change makes the message more explicit:
- **Before**: ``"Use `contextlib.suppress({exception})` instead of `try`-`except`-`pass`"``
- **After**: ``"Replace `try`-`except`-`pass` block with `with contextlib.suppress({exception})`"``
The fix title is also updated to be more specific:
- **Before**: ``"Replace with `contextlib.suppress({exception})`"``
- **After**: ``"Replace `try`-`except`-`pass` with `with contextlib.suppress({exception})`"``
Fixes #20462
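For reference, the rewrite the new message describes looks like this. The two `parse_or_none_*` functions are hypothetical examples, not taken from the rule's fixtures:

```python
import contextlib


def parse_or_none_before(raw):
    result = None
    try:
        result = int(raw)
    except ValueError:
        pass
    return result


def parse_or_none_after(raw):
    result = None
    # The whole try-except-pass block becomes a single `with` statement.
    with contextlib.suppress(ValueError):
        result = int(raw)
    return result
```

Both functions behave the same; the point is that `contextlib.suppress` replaces the entire block rather than being placed inside the `except` branch.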
## Test Plan
- ✅ All existing SIM105 tests pass with updated snapshots
- ✅ Cargo clippy passes without warnings
- ✅ Full test suite passes
- ✅ The new messages clearly indicate that the entire try-except-pass
block should be replaced with a `with` statement, preventing the misuse
described in the issue
---------
Co-authored-by: Giovani Moutinho <e@mgiovani.dev>
## Summary
Closes: https://github.com/astral-sh/ty/issues/551
This PR adds support for step 4 of the overload call evaluation
algorithm which states that:
> If the argument list is compatible with two or more overloads,
determine whether one or more of the overloads has a variadic parameter
(either `*args` or `**kwargs`) that maps to a corresponding argument
that supplies an indeterminate number of positional or keyword
arguments. If so, eliminate overloads that do not have a variadic
parameter.
And, with that, the overload call evaluation algorithm has been
implemented completely end to end as stated in the typing spec.
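A sketch of the kind of call this step covers, using a hypothetical `pack` function. As I read the quoted rule, unpacking a `list[int]` supplies an indeterminate number of positional arguments, so only the `*args` overload should survive:

```python
from __future__ import annotations

from typing import overload


@overload
def pack(x: int) -> tuple[int]: ...
@overload
def pack(x: int, y: int) -> tuple[int, int]: ...
@overload
def pack(*args: int) -> tuple[int, ...]: ...
def pack(*args: int) -> tuple[int, ...]:
    return args


def use(values: list[int]) -> tuple[int, ...]:
    # All three overloads accept `*values`, but only the last one has a
    # variadic parameter, so a checker is expected to pick
    # `tuple[int, ...]` as the return type here.
    return pack(*values)


assert use([1, 2, 3]) == (1, 2, 3)
```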
## Test Plan
Expand the overload call test suite.
## Summary
This removes a hack in the protocol satisfiability check that was
previously needed to work around missing assignability-modeling of
inferable type variables. Assignability of type variables is not
implemented fully, but some recent changes allow us to remove that hack
with limited impact on the ecosystem (and the test suite). The change in
the typing conformance test is favorable.
## Test Plan
* Adapted Markdown tests
* Made sure that this change works in combination with
https://github.com/astral-sh/ruff/pull/20517
## Summary
Closes: https://github.com/astral-sh/ty/issues/1236
This PR fixes a bug where the variadic argument wouldn't match against
the variadic parameter in certain scenarios.
This was happening because I didn't realize that the `all_elements`
iterator wouldn't keep on returning the variable element (which is
correct, I just didn't realize it back then).
I don't think we can use the `resize` method here because we don't know
how many parameters this variadic argument is matching against as this
is where the actual parameter matching occurs.
## Test Plan
Expand test cases to consider a few more combinations of arguments and
parameters which are variadic.
## Summary
First contribution, so please let me know if I've made a mistake
anywhere. This aims to fix #19982: it adds the isolation level to
PYI021 in the same style as the PIE790 rule.
fixes: #19982
## Test Plan
I added a case to the PYI021.pyi file where the two rules are present as
there wasn't a case with them both interacting, using the minimal
reproducible example that @ntBre created on the issue (I think I got the
`# ERROR` markings wrong, so please let me know how to fix that if I
did).
---------
Co-authored-by: Brent Westbrook <brentrwestbrook@gmail.com>
## Summary
This PR implements
https://docs.astral.sh/ruff/rules/multiple-starred-expressions/ as a
semantic syntax error.
## Test Plan
I have added inline tests as directed in #17412
---------
Signed-off-by: 11happy <soni5happy@gmail.com>
Co-authored-by: Brent Westbrook <36778786+ntBre@users.noreply.github.com>
## Summary
Fixes https://github.com/astral-sh/ty/issues/1242
From finding references with the LSP, `FileResolver::path` is only
called once, in `UnifiedFile::path`, so I went through those references,
and it looked safe to make this change in every case. Most of the
references are in the various output formats, where we inherited the
absolute vs relative path decision from Ruff. Two other uses are as
fallbacks if converting a relativized path to a string fails. Finally,
we use the path for sorting and in `UnifiedFile::relative_path`.
## Test Plan
Existing tests, with snapshots updated to show absolute paths (in the
`TestDb` this just added a `/` in front of the file names). I also
updated the GitLab CLI test to set the `CI_PROJECT_DIR` environment
variable and ran a test in GitLab CI:
<img width="613" height="114" alt="image"
src="https://github.com/user-attachments/assets/8ab81dba-54fd-4a24-9110-77ef89293cff"
/>
- Adds test cases exercising file selection by extension with
`--preview` enabled and disabled.
- Adds `INCLUDE_PREVIEW` with file patterns including `*.pyw`.
- In global preview mode, default configuration selects patterns from
`INCLUDE_PREVIEW`.
- Manually tested ruff server with local vscode for both formatting and
linting of a `.pyw` file.
Closes https://github.com/astral-sh/ruff/issues/13246
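The selection logic can be pictured with the following sketch. The pattern lists here are assumptions chosen for illustration (the real defaults may contain other entries), and `is_included` is a hypothetical helper rather than Ruff's actual file-discovery code:

```python
from fnmatch import fnmatch

# Hypothetical pattern lists; the real defaults may contain other entries.
INCLUDE = ["*.py", "*.pyi"]
INCLUDE_PREVIEW = INCLUDE + ["*.pyw"]


def is_included(filename, preview):
    """Check `filename` against the default patterns for the given mode."""
    patterns = INCLUDE_PREVIEW if preview else INCLUDE
    return any(fnmatch(filename, pattern) for pattern in patterns)


# `.pyw` files are only picked up when preview mode is enabled.
assert is_included("app.pyw", preview=True)
assert not is_included("app.pyw", preview=False)
assert is_included("app.py", preview=False)
```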
## Summary
This applies the trick that we use for `builtins.open` to similar
functions that have the same problem. The reason is that the problem
would otherwise become even more pronounced once we add understanding of
the implicit type of `self` parameters, because then something like
`(base_path / "test.bin").open("rb")` also leads to a wrong return type
and can result in false positives.
## Test Plan
New Markdown tests
## Summary
I found this bug while working on #20528.
The minimum reproducible code is:
```python
from __future__ import annotations

from typing import NamedTuple
from ty_extensions import is_disjoint_from, static_assert

class Path(NamedTuple):
    prev: Path | None
    key: str

static_assert(not is_disjoint_from(Path, Path))
```
A stack overflow occurs when a nominal instance type inherits from
`NamedTuple` and is defined recursively.
This PR fixes this bug.
## Test Plan
mdtest updated