Commit graph

434 commits

Author SHA1 Message Date
Micha Reiser
e18b4e42d3
[red-knot] Upgrade to the *new* *new* salsa (#12406) 2024-07-29 07:21:24 +00:00
Carl Meyer
4b69271809
[red-knot] resolve int/list/dict/set/tuple to builtin type (#12521)
Now that we have builtins available, resolve some simple cases to the
right builtin type.

We should also adjust the display for types to include their module
name; that's not done yet here.
2024-07-26 08:21:31 -07:00
Carl Meyer
2d3914296d
[red-knot] handle all syntax without panic (#12499)
Extend red-knot type inference to cover all syntax, so that inferring
types for a scope gives all expressions a type. This means we can run
the red-knot semantic lint on all Python code without panics. It also
means we can infer types for `builtins.pyi` without panics.

To keep things simple, this PR intentionally doesn't add any new type
inference capabilities: the expanded coverage is all achieved with
`Type::Unknown`. But this puts the skeleton in place for adding better
inference of all these language features.

I also had to add basic Salsa cycle recovery (with just `Type::Unknown`
for now), because some `builtins.pyi` definitions are cyclic.

To test this, I added a comprehensive corpus of test snippets sourced
from Cinder under [MIT
license](https://github.com/facebookincubator/cinder/blob/cinder/3.10/cinderx/LICENSE),
which matches Ruff's license. I also added to this corpus some
additional snippets for newer language features: all the
`27_func_generic_*` and `73_class_generic_*` files, as well as
`20_lambda_default_arg.py`, and added a test which runs semantic-lint
over all these files. (The test doesn't assert the test-corpus files are
lint-free; just that they are able to lint without a panic.)
2024-07-25 17:38:08 -07:00
Micha Reiser
eac965ecaf
[red-knot] Watch search paths (#12407) 2024-07-24 07:38:50 +00:00
Micha Reiser
40d9324f5a
[red-knot] Improved file watching (#12382) 2024-07-23 08:18:59 +02:00
Carl Meyer
c7b13bb8fc
[red-knot] add cycle-free while-loop control flow (#12413)
Add support for while-loop control flow.

This doesn't yet include general support for terminals and reachability;
that is wider than just while loops and belongs in its own PR.

This also doesn't yet add support for cyclic definitions in loops; that
comes with enough of its own complexity in Salsa that I want to handle
it separately.
2024-07-22 14:27:33 -07:00
Carl Meyer
f22c8ab811
[red-knot] add maybe-undefined lint rule (#12414)
Add a lint rule to detect if a name is definitely or possibly undefined
at a given usage.

If I create the file `undef/main.py` with contents:

```python
x = int
def foo():
    z
    return x
if flag:
    y = x
y
```

And then run `cargo run --bin red_knot -- --current-directory
../ruff-examples/undef`, I get the output:

```
Name 'z' used when not defined.
Name 'flag' used when not defined.
Name 'y' used when possibly not defined.
```

If I modify the file to add `y = 0` at the top, red-knot re-checks it
and I get the new output:

```
Name 'z' used when not defined.
Name 'flag' used when not defined.
```

Note that `int` is not flagged, since it's a builtin, and `return x` in
the function scope is not flagged, since it refers to the global `x`.
2024-07-22 13:53:59 -07:00
Alex Waygood
d8cf8ac2ef
[red-knot] Resolve symbols from builtins.pyi in the stdlib if they cannot be found in other scopes (#12390)
Co-authored-by: Carl Meyer <carl@astral.sh>
2024-07-19 17:44:56 +01:00
Carl Meyer
f82bb67555
[red-knot] trace file when inferring types (#12401)
When poring over traces, the ones that just include a definition or
symbol or expression ID aren't very useful, because you don't know which
file it comes from. This adds that information to the trace.

I guess the downside here is that if calling `.file(db)` on a
scope/definition/expression would execute other traced code, it would be
marked as outside the span? I don't think that's a concern, because I
don't think a simple field access on a tracked struct should ever
execute our code. If I'm wrong and this is a problem, it seems like the
tracing crate has this feature where you can record a field as
`tracing::field::Empty` and then fill in its value later with
`span.record(...)`, but when I tried this it wasn't working for me, not
sure why.

I think there's a lot more we can do to make our tracing output more
useful for debugging (e.g. record an event whenever a
definition/symbol/expression/use id is created with the details of that
definition/symbol/expression/use), this is just dipping my toes in the
water.
2024-07-19 07:13:51 -07:00
Carl Meyer
181e7b3c0d
[red-knot] rename module_global to global (#12385)
Per comments in https://github.com/astral-sh/ruff/pull/12269, "module
global" is kind of long, and arguably redundant.

I tried just using "module" but there were too many cases where I felt
this was ambiguous. I like the way "global" works out better, though it
does require an understanding that in Python "global" generally means
"module global" not "globally global" (though in a sense module globals
are also globally global since modules are singletons).
2024-07-18 13:05:30 -07:00
Carl Meyer
519eca9fe7
[red-knot] support implicit global name lookups (#12374)
Support falling back to a global name lookup if a name isn't defined in
the local scope, in the cases where that is correct according to Python
semantics.

In class scopes, a name lookup checks the local namespace first, and if
the name isn't found there, looks it up in globals.

In function scopes (and type parameter scopes, which are function-like),
if a name has any definitions in the local scope, it is a local, and
accessing it when none of those definitions have executed yet just
results in an `UnboundLocalError`, it does not fall back to a global. If
the name does not have any definitions in the local scope, then it is an
implicit global.

Public symbol type lookups never include such a fall back. For example,
if a name is not defined in a class scope, it is not available as a
member on that class, even if a name lookup within the class scope would
have fallen back to a global lookup.

This PR makes the `@override` lint rule work again.

Not yet included/supported in this PR:

* Support for free variables / closures: a free symbol in a nested
function-like scope referring to a symbol in an outer function-like
scope.
* Support for `global` and `nonlocal` statements, which force a symbol
to be treated as global or nonlocal even if it has definitions in the
local scope.
* Module-global lookups should fall back to builtins if the name isn't
found in the module scope.

I would like to expose nicer APIs for the various kinds of symbols
(explicit global, implicit global, free, etc), but this will also wait
for a later PR, when more kinds of symbols are supported.
2024-07-18 10:50:43 -07:00
Carl Meyer
811f78d94d
[red-knot] small efficiency improvements and bugfixes to use-def map building (#12373)
Adds inference tests sufficient to give full test coverage of the
`UseDefMapBuilder::merge` method.

In the process I realized that we could implement visiting of if
statements in `SemanticBuilder` with fewer `snapshot`, `restore`, and
`merge` operations, so I restructured that visit a bit.

I also found one correctness bug in the `merge` method (it failed to
extend the given snapshot with "unbound" for any missing symbols,
meaning we would just lose the fact that the symbol could be unbound in
the merged-in path), and two efficiency bugs (if one of the ranges to
merge is empty, we can just use the other one, no need for copies, and
if the ranges are overlapping -- which can occur with nested branches --
we can still just merge them with no copies), and fixed all three.
2024-07-18 09:24:58 -07:00
Carl Meyer
b2a49d8140
[red-knot] better docs for use-def maps (#12357)
Add better doc comments and comments, as well as one debug assertion, to
use-def map building.
2024-07-17 17:50:58 -07:00
Carl Meyer
985a999234
[red-knot] better docs for type inference (#12356)
Add some docs for how type inference works.

Also a couple minor code changes to rearrange or rename for better
clarity.
2024-07-17 13:36:58 -07:00
Micha Reiser
91338ae902
[red-knot] Add basic workspace support (#12318) 2024-07-17 11:34:21 +02:00
Carl Meyer
073588b48e
[red-knot] improve semantic index tests (#12355)
Improve semantic index tests with better assertions than just `.len()`,
and re-add use-definition test that was commented out in the switch to
Salsa initially.
2024-07-16 23:46:49 -07:00
Carl Meyer
595b1aa4a1
[red-knot] per-definition inference, use-def maps (#12269)
Implements definition-level type inference, with basic control flow
(only if statements and if expressions so far) in Salsa.

There are a couple key ideas here:

1) We can do type inference queries at any of three region
granularities: an entire scope, a single definition, or a single
expression. These are represented by the `InferenceRegion` enum, and the
entry points are the salsa queries `infer_scope_types`,
`infer_definition_types`, and `infer_expression_types`. Generally
per-scope will be used for scopes that we are directly checking and
per-definition will be used anytime we are looking up symbol types from
another module/scope. Per-expression should be uncommon: used only for
the RHS of an unpacking or multi-target assignment (to avoid
re-inferring the RHS once per symbol defined in the assignment) and for
test nodes in type narrowing (e.g. the `test` of an `If` node). All
three queries return a `TypeInference` with a map of types for all
definitions and expressions within their region. If you do e.g.
scope-level inference, when it hits a definition, or an
independently-inferable expression, it should use the relevant query
(which may already be cached) to get all types within the smaller
region. This avoids double-inferring smaller regions, even though larger
regions encompass smaller ones.

2) Instead of building a control-flow graph and lazily traversing it to
find definitions which reach a use of a name (which is O(n^2) in the
worst case), instead semantic indexing builds a use-def map, where every
use of a name knows which definitions can reach that use. We also no
longer track all definitions of a symbol in the symbol itself; instead
the use-def map also records which defs remain visible at the end of the
scope, and considers these the publicly-visible definitions of the
symbol (see below).

Major items left as TODOs in this PR, to be done in follow-up PRs:

1) Free/global references aren't supported yet (only lookup based on
definitions in current scope), which means the override-check example
doesn't currently work. This is the first thing I'll fix as follow-up to
this PR.

2) Control flow outside of if statements and expressions.

3) Type narrowing.

There are also some smaller relevant changes here:

1) Eliminate `Option` in the return type of member lookups; instead
always return `Type::Unbound` for a name we can't find. Also use
`Type::Unbound` for modules we can't resolve (not 100% sure about this
one yet.)

2) Eliminate the use of the terms "public" and "root" to refer to
module-global scope or symbols. Instead consistently use the term
"module-global". It's longer, but it's the clearest, and the most
consistent with typical Python terminology. In particular I don't like
"public" for this use because it has other implications around author
intent (is an underscore-prefixed module-global symbol "public"?). And
"root" is just not commonly used for this in Python.

3) Eliminate the `PublicSymbol` Salsa ingredient. Many non-module-global
symbols can also be seen from other scopes (e.g. by a free var in a
nested scope, or by class attribute access), and thus need to have a
"public type" (that is, the type not as seen from a particular use in
the control flow of the same scope, but the type as seen from some other
scope.) So all symbols need to have a "public type" (here I want to keep
the use of the term "public", unless someone has a better term to
suggest -- since it's "public type of a symbol" and not "public symbol"
the confusion with e.g. initial underscores is less of an issue.) At
least initially, I would like to try not having special handling for
module-global symbols vs other symbols.

4) Switch to using "definitions that reach end of scope" rather than
"all definitions" in determining the public type of a symbol. I'm
convinced that in general this is the right way to go. We may want to
refine this further in future for some free-variable cases, but it can
be changed purely by making changes to the building of the use-def map
(the `public_definitions` index in it), without affecting any other
code. One consequence of combining this with no control-flow support
(just last-definition-wins) is that some inference tests now give more
wrong-looking results; I left TODO comments on these tests to fix them
when control flow is added.

And some potential areas for consideration in the future:

1) Should `symbol_ty` be a Salsa query? This would require making all
symbols a Salsa ingredient, and tracking even more dependencies. But it
would save some repeated reconstruction of unions, for symbols with
multiple public definitions. For now I'm not making it a query, but open
to changing this in future with actual perf evidence that it's better.
2024-07-16 11:02:30 -07:00
Alex Waygood
5b21922420
[red-knot] Add more stress tests for module resolver invalidation (#12272) 2024-07-10 14:34:06 +00:00
Micha Reiser
ac04380f36
[red-knot] Rename FileSystem to System (#12214) 2024-07-09 07:20:51 +00:00
Alex Waygood
a62a432a48
[red-knot] Respect typeshed's VERSIONS file when resolving stdlib modules (#12141) 2024-07-05 22:43:31 +00:00
Carl Meyer
0e44235981
[red-knot] intern types using Salsa (#12061)
Intern types using Salsa interning instead of in the `TypeInference`
result.

This eliminates the need for `TypingContext`, and also paves the way for
finer-grained type inference queries.
2024-07-05 12:16:37 -07:00
Micha Reiser
4d385b60c8
[red-knot] Migrate CLI to Salsa (#11972) 2024-07-04 07:23:45 +00:00
Micha Reiser
262053f85c
[red-knot]: Implement HasTy for Alias (#11971) 2024-07-04 07:17:10 +00:00
Micha Reiser
3ce8b9fcae
Make Definition a salsa-ingredient (#12151) 2024-07-04 06:46:08 +00:00
Micha Reiser
dcb9523b1e
Address review feedback from 11963 (#12145) 2024-07-02 09:05:55 +02:00
Micha Reiser
25080acb7a
[red-knot] Introduce ExpressionNodeKey to improve typing of expression_map (#12142) 2024-07-01 16:15:53 +02:00
Micha Reiser
228b1c4235
[red-knot] Remove Scope::name (#12137) 2024-07-01 15:55:50 +02:00
Micha Reiser
955138b74a
Refactor ast_ids traits to take ScopeId instead of VfsFile plus FileScopeId. (#12139) 2024-07-01 15:50:07 +02:00
Micha Reiser
37f260b5af
Introduce HasTy trait and SemanticModel facade (#11963) 2024-07-01 14:48:27 +02:00
Micha Reiser
5109b50bb3
Use CompactString for Identifier (#12101) 2024-07-01 10:06:02 +02:00
Alex Waygood
736a4ead14
[red-knot] Move module-resolution logic to its own crate (#11964) 2024-06-21 13:25:44 +00:00
Micha Reiser
927069c12f
[red-knot] Upgrade to Salsa 3.0 (#11952) 2024-06-20 20:19:16 +01:00
Micha Reiser
b456051be8
[red-knot] Add tracing to Salsa queries (#11949) 2024-06-20 13:33:41 +02:00
Micha Reiser
2dfbf118d7
[red-knot] Extract red_knot_python_semantic crate (#11926) 2024-06-20 13:24:24 +02:00