Commit graph

86 commits

Author SHA1 Message Date
Dhruv Manilawala
68ada05b00
[red-knot] Infer value expr for empty list / tuple target (#15121)
## Summary

This PR resolves
https://github.com/astral-sh/ruff/pull/15058#discussion_r1893868406 by
inferring the value expression even if there are no targets in the list
/ tuple expression.

## Test Plan

Remove TODO from corpus tests, making sure it doesn't panic.
2024-12-23 16:00:35 +05:30
Micha Reiser
2f85749fa0
type: ignore[codes] and knot: ignore (#15078) 2024-12-23 10:52:43 +01:00
Dhruv Manilawala
113c804a62
[red-knot] Add support for unpacking for target (#15058)
## Summary

Related to #13773 

This PR adds support for unpacking `for` statement targets.

This involves updating the `value` field in the `Unpack` target to use
an enum which specifies the "where did the value expression came from?".
This is because for an iterable expression, we need to unpack the
iterator type while for assignment statement we need to unpack the value
type itself. And, this needs to be done in the unpack query.

### Question

One of the ways unpacking works in `for` statement is by looking at the
union of the types because if the iterable expression is a tuple then
the iterator type will be union of all the types in the tuple. This
means that the test cases that will test the unpacking in `for`
statement will also implicitly test the unpacking union logic. I was
wondering if it makes sense to merge these cases and only add the ones
that are specific to the union unpacking or for statement unpacking
logic.

## Test Plan

Add test cases involving iterating over a tuple type. I've intentionally
left out certain cases for now and I'm curious to know any thoughts on
the above query.
2024-12-23 06:13:49 +00:00
David Peter
000948ad3b
[red-knot] Statically known branches (#15019)
## Summary

This changeset adds support for precise type-inference and
boundness-handling of definitions inside control-flow branches with
statically-known conditions, i.e. test-expressions whose truthiness we
can unambiguously infer as *always false* or *always true*.

This branch also includes:
- `sys.platform` support
- statically-known branches handling for Boolean expressions and while
  loops
- new `target-version` requirements in some Markdown tests which were
  now required due to the understanding of `sys.version_info` branches.

closes #12700 
closes #15034 

## Performance

### `tomllib`, -7%, needs to resolve one additional module (sys)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./red_knot_main --project /home/shark/tomllib` | 22.2 ± 1.3 | 19.1 |
25.6 | 1.00 |
| `./red_knot_feature --project /home/shark/tomllib` | 23.8 ± 1.6 | 20.8
| 28.6 | 1.07 ± 0.09 |

### `black`, -6%

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./red_knot_main --project /home/shark/black` | 129.3 ± 5.1 | 119.0 |
137.8 | 1.00 |
| `./red_knot_feature --project /home/shark/black` | 136.5 ± 6.8 | 123.8
| 147.5 | 1.06 ± 0.07 |

## Test Plan

- New Markdown tests for the main feature in
  `statically-known-branches.md`
- New Markdown tests for `sys.platform`
- Adapted tests for `EllipsisType`, `Never`, etc
2024-12-21 11:33:10 +01:00
Micha Reiser
c3b6139f39
Upgrade salsa (#15039)
The only code change is that Salsa now requires the `Db` to implement
`Clone` to create "lightweight" snapshots.
2024-12-17 15:50:33 +00:00
Micha Reiser
f52b1f4a4d
Add tracing support to mdtest (#14935)
## Summary

This PR extends the mdtest configuration with a `log` setting that can
be any of:

* `true`: Enables tracing
* `false`: Disables tracing (default)
* String: An ENV_FILTER similar to `RED_KNOT_LOG`

```toml
log = true
```

Closes https://github.com/astral-sh/ruff/issues/13865

## Test Plan

I changed a test and tried `log=true`, `log=false`, and `log=INFO`
2024-12-13 09:10:01 +00:00
Micha Reiser
c1837e4189
Rename custom-typeshed-dir, target-version and current-directory CLI options (#14930)
## Summary

This PR renames the `--custom-typeshed-dir`, `target-version`, and
`--current-directory` cli options to `--typeshed`,
`--python-version`, and `--project` as discussed in the CLI proposal
document.
I added aliases for `--target-version` (for Ruff compat) and
`--custom-typeshed-dir` (for Alex)

## Test Plan

Long help

```
An extremely fast Python type checker.

Usage: red_knot [OPTIONS] [COMMAND]

Commands:
  server  Start the language server
  help    Print this message or the help of the given subcommand(s)

Options:
      --project <PROJECT>
          Run the command within the given project directory.
          
          All `pyproject.toml` files will be discovered by walking up the directory tree from the project root, as will the project's virtual environment (`.venv`).
          
          Other command-line arguments (such as relative paths) will be resolved relative to the current working directory."#,

      --venv-path <PATH>
          Path to the virtual environment the project uses.
          
          If provided, red-knot will use the `site-packages` directory of this virtual environment to resolve type information for the project's third-party dependencies.

      --typeshed-path <PATH>
          Custom directory to use for stdlib typeshed stubs

      --extra-search-path <PATH>
          Additional path to use as a module-resolution source (can be passed multiple times)

      --python-version <VERSION>
          Python version to assume when resolving types
          
          [possible values: 3.7, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13]

  -v, --verbose...
          Use verbose output (or `-vv` and `-vvv` for more verbose output)

  -W, --watch
          Run in watch mode by re-running whenever files change

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
```

Short help 

```
An extremely fast Python type checker.

Usage: red_knot [OPTIONS] [COMMAND]

Commands:
  server  Start the language server
  help    Print this message or the help of the given subcommand(s)

Options:
      --project <PROJECT>         Run the command within the given project directory
      --venv-path <PATH>          Path to the virtual environment the project uses
      --typeshed-path <PATH>      Custom directory to use for stdlib typeshed stubs
      --extra-search-path <PATH>  Additional path to use as a module-resolution source (can be passed multiple times)
      --python-version <VERSION>  Python version to assume when resolving types [possible values: 3.7, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13]
  -v, --verbose...                Use verbose output (or `-vv` and `-vvv` for more verbose output)
  -W, --watch                     Run in watch mode by re-running whenever files change
  -h, --help                      Print help (see more with '--help')
  -V, --version                   Print version

```

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-12-13 08:21:52 +00:00
Micha Reiser
881375a8d9
[red-knot] Lint registry and rule selection (#14874)
## Summary

This is the third and last PR in this stack that adds support for
toggling lints at a per-rule level.

This PR introduces a new `LintRegistry`, a central index of known lints.
The registry is required because we want to support lint rules from many
different crates but need a way to look them up by name, e.g., when
resolving a lint from a name in the configuration or analyzing a
suppression comment.

Adding a lint now requires two steps:

1. Declare the lint with `declare_lint`
2. Register the lint in the registry inside the `register_lints`
function.

I considered some more involved macros to avoid changes in two places.
Still, I ultimately decided against it because a) it's just two places
and b) I'd expect that registering a type checker lint will differ from
registering a lint that runs as a rule in the linter. I worry that any
more opinionated design could limit our options when working on the
linter, so I kept it simple.

The second part of this PR is the `RuleSelection`. It stores which lints
are enabled and what severity they should use for created diagnostics.
For now, the `RuleSelection` always gets initialized with all known
lints and it uses their default level.

## Linter crates

Each crate that defines lints should export a `register_lints` function
that accepts a `&mut LintRegistryBuilder` to register all its known
lints in the registry. This should make registering all known lints in a
top-level crate easy: Just call `register_lints` of every crate that
defines lint rules.

I considered defining a `LintCollection` trait and even some fancy
macros to accomplish the same but decided to go for this very simplistic
approach for now. We can add more abstraction once needed.

## Lint rules

This is a bit hand-wavy. I don't have a good sense for how our linter
infrastructure will look like, but I expect we'll need a way to register
the rules that should run as part of the red knot linter. One way is to
keep doing what Ruff does by having one massive `checker` and each lint
rule adds a call to itself in the relevant AST visitor methods. An
alternative is that we have a `LintRule` trait that provides common
hooks and implementations will be called at the "right time". Such a
design would need a way to register all known lint implementations,
possibly with the lint. This is where we'd probably want a dedicated
`register_rule` method. A third option is that lint rules are handled
separately from the `LintRegistry` and are specific to the linter crate.

The current design should be flexible enough to support the three
options.


## Documentation generation

The documentation for all known lints can be generated by creating a
factory, registering all lints by calling the `register_lints` methods,
and then querying the registry for the metadata.

## Deserialization and Schema generation

I haven't fully decided what the best approach is when it comes to
deserializing lint rule names:

* Reject invalid names in the deserializer. This gives us error messages
with line and column numbers (by serde)
* Don't validate lint rule names during deserialization; defer the
validation until the configuration is resolved. This gives us more
control over handling the error, e.g. emit a warning diagnostic instead
of aborting when a rule isn't known.

One technical challenge for both deserialization and schema generation
is that the `Deserialize` and `JSONSchema` traits do not allow passing
the `LintRegistry`, which is required to look up the lints by name. I
suggest that we either rely on the salsa db being set for the current
thread (`salsa::Attach`) or build our own thread-local storage for the
`LintRegistry`. It's the caller's responsibility to make the lint
registry available before calling `Deserialize` or `JSONSchema`.


## CLI support

I prefer deferring adding support for enabling and disabling lints from
the CLI for now because I think it will be easier
to add once I've figured out how to handle configurations. 

## Bitset optimization

Ruff tracks the enabled rules using a cheap copyable `Bitset` instead of
a hash map. This helped improve performance by a few percent (see
https://github.com/astral-sh/ruff/pull/3606). However, this approach is
no longer possible because lints have no "cheap" way to compute their
index inside the registry (other than using a hash map).

We could consider doing something similar to Salsa where each
`LintMetadata` stores a `LazyLintIndex`.

```
pub struct LazyLintIndex {
	cached: OnceLock<(Nonce, LintIndex)>
}

impl LazyLintIndex {
	pub fn get(registry: &LintRegistry, lint: &'static LintMetadata) {
	
	let (nonce, index) = self.cached.get_or_init(|| registry.lint_index(lint));

	if registry.nonce() == nonce {
		index
	} else {
		registry.lint_index(lint)
	}
}
```

Each registry keeps a map from `LintId` to `LintIndex` where `LintIndex`
is in the range of `0...registry.len()`. The `LazyLintIndex` is based on
the assumption that every program has exactly **one** registry. This
assumption allows to cache the `LintIndex` directly on the
`LintMetadata`. The implementation falls back to the "slow" path if
there is more than one registry at runtime.

I was very close to implementing this optimization because it's kind of
fun to implement. I ultimately decided against it because it adds
complexity and I don't think it's worth doing in Red Knot today:

* Red Knot only queries the rule selection when deciding whether or not
to emit a diagnostic. It is rarely used to detect if a certain code
block should run. This is different from Ruff where the rule selection
is queried many times for every single AST node to determine which rules
*should* run.
* I'm not sure if a 2-3% performance improvement is worth the complexity

I suggest revisiting this decision when working on the linter where a
fast path for deciding if a rule is enabled might be more important (but
that depends on how lint rules are implemented)


## Test Plan

I removed a lint from the default rule registry, and the MD tests
started failing because the diagnostics were no longer emitted.
2024-12-11 13:25:19 +01:00
Micha Reiser
5f548072d9
[red-knot] Typed diagnostic id (#14869)
## Summary

This PR introduces a structured `DiagnosticId` instead of using a plain
`&'static str`. It is the first of three in a stack that implements a
basic rules infrastructure for Red Knot.

`DiagnosticId` is an enum over all known diagnostic codes. A closed enum
reduces the risk of accidentally introducing two identical diagnostic
codes. It also opens the possibility of generating reference
documentation from the enum in the future (not part of this PR).

The enum isn't *fully closed* because it uses a `&'static str` for lint
names. This is because we want the flexibility to define lints in
different crates, and all names are only known in `red_knot_linter` or
above. Still, lower-level crates must already reference the lint names
to emit diagnostics. We could define all lint-names in `DiagnosticId`
but I decided against it because:

* We probably want to share the `DiagnosticId` type between Ruff and Red
Knot to avoid extra complexity in the diagnostic crate, and both tools
use different lint names.
* Lints require a lot of extra metadata beyond just the name. That's why
I think defining them close to their implementation is important.

In the long term, we may also want to support plugins, which would make
it impossible to know all lint names at compile time. The next PR in the
stack introduces extra syntax for defining lints.

A closed enum does have a few disadvantages:

* rustc can't help us detect unused diagnostic codes because the enum is
public
* Adding a new diagnostic in the workspace crate now requires changes to
at least two crates: It requires changing the workspace crate to add the
diagnostic and the `ruff_db` crate to define the diagnostic ID. I
consider this an acceptable trade. We may want to move `DiagnosticId` to
its own crate or into a shared `red_knot_diagnostic` crate.


## Preventing duplicate diagnostic identifiers

One goal of this PR is to make it harder to introduce ambiguous
diagnostic IDs, which is achieved by defining a closed enum. However,
the enum isn't fully "closed" because it doesn't explicitly list the IDs
for all lint rules. That leaves the possibility that a lint rule and a
diagnostic ID share the same name.

I made the names unambiguous in this PR by separating them into
different namespaces by using `lint/<rule>` for lint rule codes. I don't
mind the `lint` prefix in a *Ruff next* context, but it is a bit weird
for a standalone type checker. I'd like to not overfocus on this for now
because I see a few different options:

* We remove the `lint` prefix and add a unit test in a top-level crate
that iterates over all known lint rules and diagnostic IDs to ensure the
names are non-overlapping.
* We only render `[lint]` as the error code and add a note to the
diagnostic mentioning the lint rule. This is similar to clippy and has
the advantage that the header line remains short
(`lint/some-long-rule-name` is very long ;))
* Any other form of adjusting the diagnostic rendering to make the
distinction clear

I think we can defer this decision for now because the `DiagnosticId`
contains all the relevant information to change the rendering
accordingly.


## Why `Lint` and not `LintRule`

I see three kinds of diagnostics in Red Knot:

* Non-suppressable: Reveal type, IO errors, configuration errors, etc.
(any `DiagnosticId`)
* Lints: code-related diagnostics that are suppressable. 
* Lint rules: The same as lints, but they can be enabled or disabled in
the configuration. The majority of lints in Red Knot and the Ruff
linter.

Our current implementation doesn't distinguish between lints and Lint
rules because we aren't aware of a suppressible code-related lint that
can't be configured in the configuration. The only lint that comes to my
mind is maybe `division-by-zero` if we're 99.99% sure that it is always
right. However, I want to keep the door open to making this distinction
in the future if it proves useful.

Another reason why I chose lint over lint rule (or just rule) is that I
want to leave room for a future lint rule and lint phase concept:

* lint is the *what*: a specific code smell, pattern, or violation 
* the lint rule is the *how*: I could see a future `LintRule` trait in
`red_knot_python_linter` that provides the necessary hooks to run as
part of the linter. A lint rule produces diagnostics for exactly one
lint. A lint rule differs from all lints in `red_knot_python_semantic`
because they don't run as "rules" in the Ruff sense. Instead, they're a
side-product of type inference.
* the lint phase is a different form of *how*: A lint phase can produce
many different lints in a single pass. This is a somewhat common pattern
in Ruff where running one analysis collects the necessary information
for finding many different lints
* diagnostic is the *presentation*: Unlike a lint, the diagnostic isn't
the what, but how a specific lint gets presented. I expect that many
lints can use one generic `LintDiagnostic`, but a few lints might need
more flexibility and implement their custom diagnostic rendering (at
least custom `Diagnostic` implementation).


## Test Plan

`cargo test`
2024-12-10 15:58:07 +00:00
Dimitri Papadopoulos Orfanos
59145098d6
Fix typos found by codespell (#14863)
## Summary

Just fix typos.

## Test Plan

CI tests.

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2024-12-09 09:32:12 +00:00
Micha Reiser
c2e17d0399
Possible fix for flaky file watching test (#14543) 2024-12-03 08:22:42 +01:00
David Peter
5137fcc9c8
[red-knot] Re-enable linter corpus tests (#14736)
## Summary

Seeing the fuzzing results from @dhruvmanila in #13778, I think we can
re-enable these tests. We also had one regression that would have been
caught by these tests, so there is some value in having them enabled.
2024-12-02 20:11:30 +01:00
Micha Reiser
30d80d9746
Sort discovered workspace packages for consistent cross-platform package discovery (#14725) 2024-12-02 07:36:08 +00:00
Micha Reiser
b63c2e126b
Upgrade Rust toolchain to 1.83 (#14677) 2024-11-29 12:05:05 +00:00
David Peter
b94d6cf567
[red-knot] Fix panic related to f-strings in annotations (#14613)
## Summary

Fix panics related to expressions without inferred types in invalid
syntax examples like:
```py
x: f"Literal[{1 + 2}]" = 3
```
where the `1 + 2` expression (and its sub-expressions) inside the
annotation did not have an inferred type.

## Test Plan

Added new corpus test.
2024-11-26 16:35:44 +01:00
David Peter
cd0c97211c
[red-knot] Update KNOWN_FAILURES (#14612)
## Summary

Remove entry that was prevously fixed in
5a30ec0df6.

## Test Plan

```sh
cargo test -p red_knot_workspace -- --ignored linter_af linter_gz
```
2024-11-26 15:56:42 +01:00
David Peter
f6b2cd5588
[red-knot] Semantic index: handle invalid breaks (#14522)
## Summary

This fix addresses panics related to invalid syntax like the following
where a `break` statement is used in a nested definition inside a
loop:

```py
while True:

    def b():
        x: int

        break
```

closes #14342

## Test Plan

* New corpus regression tests.
* New unit test to make sure we handle nested while loops correctly.
This test is passing on `main`, but can easily fail if the
`is_inside_loop` state isn't properly saved/restored.
2024-11-22 13:13:55 +01:00
David Peter
a90e404c3f
[red-knot] PEP 695 type aliases (#14357)
## Summary

Add support for (non-generic) type aliases. The main motivation behind
this was to get rid of panics involving expressions in (generic) type
aliases. But it turned out the best way to fix it was to implement
(partial) support for type aliases.

```py
type IntOrStr = int | str

reveal_type(IntOrStr)  # revealed: typing.TypeAliasType
reveal_type(IntOrStr.__name__)  # revealed: Literal["IntOrStr"]

x: IntOrStr = 1

reveal_type(x)  # revealed: Literal[1]

def f() -> None:
    reveal_type(x)  # revealed: int | str
```

## Test Plan

- Updated corpus test allow list to reflect that we don't panic anymore.
- Added Markdown-based test for type aliases (`type_alias.md`)
2024-11-22 08:47:14 +01:00
David Peter
f684b6fff4
[red-knot] Fix: Infer type for typing.Union[..] tuple expression (#14510)
## Summary

Fixes a panic related to sub-expressions of `typing.Union` where we fail
to store a type for the `int, str` tuple-expression in code like this:
```
x: Union[int, str] = 1
```

relates to [my
comment](https://github.com/astral-sh/ruff/pull/14499#discussion_r1851794467)
on #14499.

## Test Plan

New corpus test
2024-11-21 11:49:20 +01:00
David Peter
f8c20258ae
[red-knot] Do not panic on f-string format spec expressions (#14436)
## Summary

Previously, we panicked on expressions like `f"{v:{f'0.2f'}}"` because
we did not infer types for expressions nested inside format spec
elements.

## Test Plan

```
cargo nextest run -p red_knot_workspace -- --ignored linter_af linter_gz
```
2024-11-19 10:04:51 +01:00
David Peter
d81b6cd334
[red-knot] Types for subexpressions of annotations (#14426)
## Summary

This patches up various missing paths where sub-expressions of type
annotations previously had no type attached. Examples include:
```py
tuple[int, str]
#     ~~~~~~~~

type[MyClass]
#    ~~~~~~~

Literal["foo"]
#       ~~~~~

Literal["foo", Literal[1, 2]]
#              ~~~~~~~~~~~~~

Literal[1, "a", random.illegal(sub[expr + ession])]
#               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

## Test Plan

```
cargo nextest run -p red_knot_workspace -- --ignored linter_af linter_gz
```
2024-11-18 13:03:27 +01:00
Micha Reiser
d99210c049
[red-knot] Default to python 3.9 (#14429) 2024-11-18 11:27:40 +00:00
David Peter
d470f29093
[red-knot] Disable linter-corpus tests (#14391)
## Summary

Disable the no-panic tests for the linter corpus, as there are too many
problems right now, requiring linter-contributors to add their test
files to the allow-list.

We can still run the tests using `cargo test -p red_knot_workspace --
--ignored linter_af linter_gz`. This is also why I left the
`crates/ruff_linter/` entries in the allow list for now, even if they
will get out of sync. But let me know if I should rather remove them.
2024-11-16 23:33:19 +01:00
Simon Brugman
1fbed6c325
[ruff] Implement redundant-bool-literal (RUF038) (#14319)
## Summary

Implements `redundant-bool-literal`

## Test Plan

<!-- How was it tested? -->

`cargo test`

The ecosystem results are all correct, but for `Airflow` the rule is not
relevant due to the use of overloading (and is marked as unsafe
correctly).

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-11-16 21:52:51 +00:00
David Peter
4dcb7ddafe
[red-knot] Remove duplicates from KNOWN_FAILURES (#14386)
## Summary

- Sort the list of `KNOWN_FAILURES`
- Remove accidental duplicates
2024-11-16 20:54:21 +01:00
Micha Reiser
5be90c3a67
Split the corpus tests into smaller tests (#14367)
## Summary

This PR splits the corpus tests into smaller chunks because running all
of them takes 8s on my windows machine and it's by far the longest test
in `red_knot_workspace`.

Splitting the tests has the advantage that they run in parallel. This PR
brings down the wall time from 8s to 4s.

This PR also limits the glob for the linter tests because it's common to
clone cpython into the `ruff_linter/resources/test` folder for
benchmarks (because that's what's written in the contributing guides)

## Test Plan

`cargo test`
2024-11-16 20:29:21 +01:00
Simon Brugman
78210b198b
[flake8-pyi] Implement redundant-none-literal (PYI061) (#14316)
## Summary

`Literal[None]` can be simplified into `None` in type annotations.

Surprising to see that this is not that rare:
-
https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chat_models/base.py#L54
-
https://github.com/sqlalchemy/sqlalchemy/blob/main/lib/sqlalchemy/sql/annotation.py#L69
- https://github.com/jax-ml/jax/blob/main/jax/numpy/__init__.pyi#L961
-
https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_common.py#L179

## Test Plan

`cargo test`

Reviewed all ecosystem results, and they are true positives.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-11-16 18:22:51 +00:00
Micha Reiser
a6a3d3f656
Fix file watcher panic when event has no paths (#14364) 2024-11-16 08:36:57 +01:00
Micha Reiser
81e5830585
Workspace discovery (#14308) 2024-11-15 19:20:15 +01:00
David Peter
9f3235a37f
[red-knot] Expand test corpus (#14360)
## Summary

- Add 383 files from `crates/ruff_python_parser/resources` to the test
corpus
- Add 1296 files from `crates/ruff_linter/resources` to the test corpus
- Use in-memory file system for tests
- Improve test isolation by cleaning the test environment between checks
- Add a mechanism for "known failures". Mark ~80 files as known
failures.
- The corpus test is now a lot slower (6 seconds).

Note:
While `red_knot` as a command line tool can run over all of these
files without panicking, we still have a lot of test failures caused by
explicitly "pulling" all types.

## Test Plan

Run `cargo test -p red_knot_workspace` while making sure that
- Introducing code that is known to lead to a panic fails the test
- Removing code that is known to lead to a panic from
`KNOWN_FAILURES`-files also fails the test
2024-11-15 17:09:15 +01:00
David Peter
77e8da7497
[red-knot] Avoid panics for ipython magic commands (#14326)
## Summary

Avoids panics when encountering Jupyter notebooks with [IPython magic
commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html).

## Test Plan

Added Jupyter notebook to corpus.
2024-11-13 20:58:08 +01:00
David Peter
5e64863895
[red-knot] Handle invalid assignment targets (#14325)
## Summary

This fixes several panics related to invalid assignment targets. All of
these led to some a crash, previously:
```py
(x.y := 1)  # only name-expressions are valid targets of named expressions
([x, y] := [1, 2])  # same
(x, y): tuple[int, int] = (2, 3)  # tuples are not valid targets for annotated assignments
(x, y) += 2  # tuples are not valid targets for augmented assignments
```

closes #14321
closes #14322

## Test Plan

I symlinked four files from `crates/ruff_python_parser/resources` into
the red knot corpus, as they seemed like ideal test files for this exact
scenario. I think eventually, it might be a good idea to simply include *all*
invalid-syntax examples from the parser tests into red knots corpus (I believe
we're actually not too far from that goal). Or expand the scope of the corpus
test to this directory. Then we can get rid of these symlinks again.
2024-11-13 20:50:39 +01:00
David Peter
0eb36e4345
[red-knot] Avoid panic for generic type aliases (#14312)
## Summary

This avoids a panic inside `TypeInferenceBuilder::infer_type_parameters`
when encountering generic type aliases:
```py
type ListOrSet[T] = list[T] | set[T]
```

To fix this properly, we would have to treat type aliases as being their own
annotation scope [1]. The left hand side is a definition for the type parameter
`T` which is being used in the special annotation scope on the right hand side.
Similar to how it works for generic functions and classes.

[1] https://docs.python.org/3/reference/compound_stmts.html#generic-type-aliases


closes #14307

## Test Plan

Added new example to the corpus.
2024-11-13 16:01:15 +01:00
David Peter
b946cfd1f7
[red-knot] Use memory address as AST node key (#14317)
## Summary

Use the memory address to uniquely identify AST nodes, instead of
relying on source range and kind. The latter fails for ASTs resulting
from invalid syntax examples. See #14313 for details.

Also results in a 1-2% speedup
(67349cf55f)

closes #14313 

## Review

Here are the places where we use `NodeKey` directly or indirectly (via
`ExpressionNodeKey` or `DefinitionNodeKey`):

```rs
// semantic_index.rs
pub(crate) struct SemanticIndex<'db> { 
    // [...]
    /// Map expressions to their corresponding scope.
    scopes_by_expression: FxHashMap<ExpressionNodeKey, FileScopeId>,

    /// Map from a node creating a definition to its definition.
    definitions_by_node: FxHashMap<DefinitionNodeKey, Definition<'db>>,

    /// Map from a standalone expression to its [`Expression`] ingredient.
    expressions_by_node: FxHashMap<ExpressionNodeKey, Expression<'db>>,
    // [...]
}

// semantic_index/builder.rs
pub(super) struct SemanticIndexBuilder<'db> {
    // [...]
    scopes_by_expression: FxHashMap<ExpressionNodeKey, FileScopeId>,
    definitions_by_node: FxHashMap<ExpressionNodeKey, Definition<'db>>,
    expressions_by_node: FxHashMap<ExpressionNodeKey, Expression<'db>>,
}

// semantic_index/ast_ids.rs
pub(crate) struct AstIds {
    /// Maps expressions to their expression id.
    expressions_map: FxHashMap<ExpressionNodeKey, ScopedExpressionId>,
    /// Maps expressions which "use" a symbol (that is, [`ast::ExprName`]) to a use id.
    uses_map: FxHashMap<ExpressionNodeKey, ScopedUseId>,
}

pub(super) struct AstIdsBuilder {
    expressions_map: FxHashMap<ExpressionNodeKey, ScopedExpressionId>,
    uses_map: FxHashMap<ExpressionNodeKey, ScopedUseId>,
}
```

## Test Plan

Added two failing examples to the corpus.
2024-11-13 14:35:54 +01:00
David Peter
3e36a7ab81
[red-knot] Fix assertion for invalid match pattern (#14306)
## Summary

Fixes a failing debug assertion that triggers for the following code:
```py
match some_int:
    case x:=2:
        pass
```

closes #14305

## Test Plan

Added problematic code example to corpus.
2024-11-13 10:07:29 +01:00
Carl Meyer
645ce7e5ec
[red-knot] infer types for PEP695 typevars (#14182)
## Summary

Create definitions and infer types for PEP 695 type variables.

This just gives us the type of the type variable itself (the type of `T`
as a runtime object in the body of `def f[T](): ...`), with special
handling for its attributes `__name__`, `__bound__`, `__constraints__`,
and `__default__`. Mostly the support for these attributes exists
because it is easy to implement and allows testing that we are
internally representing the typevar correctly.

This PR doesn't yet have support for interpreting a typevar as a type
annotation, which is of course the primary use of a typevar. But the
information we store in the typevar's type in this PR gives us
everything we need to handle it correctly in a future PR when the
typevar appears in an annotation.

## Test Plan

Added mdtest.
2024-11-08 21:23:05 +00:00
Micha Reiser
59c0dacea0
Introduce Diagnostic trait (#14130) 2024-11-07 13:26:21 +01:00
David Peter
239cbc6f33
[red-knot] Store starred-expression annotation types (#14106)
## Summary

- Store the expression type for annotations that are starred expressions
(see [discussion
here](https://github.com/astral-sh/ruff/pull/14091#discussion_r1828332857))
- Use `self.store_expression_type(…)` consistently throughout, as it
makes sure that no double-insertion errors occur.

closes #14115

## Test Plan

Added an invalid-syntax example to the corpus which leads to a panic on
`main`. Also added a Markdown test with a valid-syntax example that
would lead to a panic once we implement function parameter inference.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-11-05 20:25:45 +01:00
Micha Reiser
6aaf1d9446
[red-knot] Remove lint-phase (#13922)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-10-25 18:40:52 +00:00
David Peter
f335fe4d4a
[red-knot] rename {Class,Module,Function} => {Class,Module,Function}Literal (#13873)
## Summary

* Rename `Type::Class` => `Type::ClassLiteral`
* Rename `Type::Function` => `Type::FunctionLiteral`
* Do not rename `Type::Module`
* Remove `*Literal` suffixes in `display::LiteralTypeKind` variants, as
per clippy suggestion
* Get rid of `Type::is_class()` in favor of `is_subtype_of(…, 'type')`;
modifiy `is_subtype_of` to support this.
* Add new `Type::is_xyz()` methods and use them instead of matching on
`Type` variants.

closes #13863 

## Test Plan

New `is_subtype_of_class_literals` unit test.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-10-22 22:10:53 +02:00
Dhruv Manilawala
b16f665a81
[red-knot] Infer target types for unpacked tuple assignment (#13316)
## Summary

This PR adds support for unpacking tuple expression in an assignment
statement where the target expression can be a tuple or a list (the
allowed sequence targets).

The implementation introduces a new `infer_assignment_target` which can
then be used for other targets like the ones in for loops as well. This
delegates it to the `infer_definition`. The final implementation uses a
recursive function that visits the target expression in source order and
compares the variable node that corresponds to the definition. At the
same time, it keeps track of where it is on the assignment value type.

The logic also accounts for the number of elements on both sides such
that it matches even if there's a gap in between. For example, if
there's a starred expression like `(a, *b, c) = (1, 2, 3)`, then the
type of `a` will be `Literal[1]` and the type of `b` will be
`Literal[2]`.

There are a couple of follow-ups that can be done:
* Use this logic for other target positions like `for` loop
* Add diagnostics for mis-match length between LHS and RHS

## Test Plan

Add various test cases using the new markdown test framework.
Validate that existing test cases pass.

---------

Co-authored-by: Carl Meyer <carl@astral.sh>
2024-10-15 19:07:11 +00:00
Micha Reiser
5f65e842e8
Upgrade salsa (#13757) 2024-10-15 11:06:32 +00:00
Charlie Marsh
c3b40da0d2
Use backticks for code in red-knot messages (#13599)
## Summary

...and remove periods from messages that don't span more than a single
sentence.

This is more consistent with how we present user-facing messages in uv
(which has a defined style guide).
2024-10-02 03:14:28 +00:00
Alex Waygood
82324678cf
Rename the ruff_vendored crate to red_knot_vendored (#13586) 2024-10-01 16:16:59 +01:00
Micha Reiser
653c09001a
Use an empty vendored file system in Ruff (#13436)
## Summary

This PR changes removes the typeshed stubs from the vendored file system
shipped with ruff
and instead ships an empty "typeshed".

Making the typeshed files optional required extracting the typshed files
into a new `ruff_vendored` crate. I do like this even if all our builds
always include typeshed because it means `red_knot_python_semantic`
contains less code that needs compiling.

This also allows us to use deflate because the compression algorithm
doesn't matter for an archive containing a single, empty file.

## Test Plan

`cargo test`

I verified with ` cargo tree -f "{p} {f}" -p <package> ` that:

* red_knot_wasm: enables `deflate` compression
* red_knot: enables `zstd` compression
* `ruff`: uses stored


I'm not quiet sure how to build the binary that maturin builds but
comparing the release artifact size with `strip = true` shows a `1.5MB`
size reduction

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-09-21 16:31:42 +00:00
Carl Meyer
149fb2090e
[red-knot] more efficient UnionBuilder::add (#13411)
Avoid quadratic time in subsumed elements when adding a super-type of
existing union elements.

Reserve space in advance when adding multiple elements (from another
union) to a union.

Make union elements a `Box<[Type]>` instead of an `FxOrderSet`; the set
doesn't buy much since the rules of union uniqueness are defined in
terms of supertype/subtype, not in terms of simple type identity.

Move sealed-boolean handling out of a separate `UnionBuilder::simplify`
method and into `UnionBuilder::add`; now that `add` is iterating
existing elements anyway, this is more efficient.

Remove `UnionType::contains`, since it's now `O(n)` and we shouldn't
really need it, generally we care about subtype/supertype, not type
identity. (Right now it's used for `Type::Unbound`, which shouldn't even
be a type.)

Add support for `is_subtype_of` for the `object` type.

Addresses comments on https://github.com/astral-sh/ruff/pull/13401
2024-09-20 10:49:45 -07:00
Carl Meyer
260c2ecd15
[red-knot] visit with-item vars even if not a Name (#13409)
This fixes the last panic on checking pandas.

(Match statement became an `if let` because clippy decided it wanted
that once I added the additional line in the else case?)

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-09-19 10:37:49 -07:00
Carl Meyer
29c36a56b2
[red-knot] fix scope inference with deferred types (#13204)
Test coverage for #13131 wasn't as good as I thought it was, because
although we infer a lot of types in stubs in typeshed, we don't check
typeshed, and therefore we don't do scope-level inference and pull all
types for a scope. So we didn't really have good test coverage for
scope-level inference in a stub. And because of this, I got the code for
supporting that wrong, meaning that if we did scope-level inference with
deferred types, we'd end up never populating the deferred types in the
scope's `TypeInference`, which causes panics like #13160.

Here I both add test coverage by running the corpus tests both as `.py`
and as `.pyi` (which reveals the panic), and I fix the code to support
deferred types in scope inference.

This also revealed a problem with deferred types in generic functions,
which effectively span two scopes. That problem will require a bit more
thought, and I don't want to block this PR on it, so for now I just
don't defer annotations on generic functions.

Fixes #13160.
2024-09-03 11:20:43 -07:00
Micha Reiser
599103c933
Add a few missing #[return_ref] attributes (#13223) 2024-09-03 09:15:43 +02:00
Micha Reiser
ecab04e338
Basic concurrent checking (#13049) 2024-08-24 09:53:27 +01:00