Commit graph

18 commits

Author SHA1 Message Date
Micha Reiser
29927f2b59
Update Rust toolchain to 1.88 and MSRV to 1.86 (#19011) 2025-06-28 20:24:00 +02:00
Ibraheem Ahmed
6f7b1c9bb3
[ty] Add environment variable to dump Salsa memory usage stats (#18928)
## Summary

Setting `TY_MEMORY_REPORT=full` will generate and print a memory usage
report to the CLI after a `ty check` run:

```
=======SALSA STRUCTS=======
`Definition`                                       metadata=7.24MB   fields=17.38MB  count=181062
`Expression`                                       metadata=4.45MB   fields=5.94MB   count=92804
`member_lookup_with_policy_::interned_arguments`   metadata=1.97MB   fields=2.25MB   count=35176
...
=======SALSA QUERIES=======
`File -> ty_python_semantic::semantic_index::SemanticIndex`
    metadata=11.46MB  fields=88.86MB  count=1638
`Definition -> ty_python_semantic::types::infer::TypeInference`
    metadata=24.52MB  fields=86.68MB  count=146018
`File -> ruff_db::parsed::ParsedModule`
    metadata=0.12MB   fields=69.06MB  count=1642
...
=======SALSA SUMMARY=======
TOTAL MEMORY USAGE: 577.61MB
    struct metadata = 29.00MB
    struct fields = 35.68MB
    memo metadata = 103.87MB
    memo fields = 409.06MB
```

Eventually, we should integrate these numbers into CI in some form. The
one limitation currently is that heap allocations in salsa structs (e.g.
interned values) are not tracked, but memoized values should have full
coverage. We may also want a peak memory usage counter (that accounts
for non-salsa memory), but that is relatively simple to profile manually
(e.g. `time -v ty check`) and would require a compile-time option to
avoid runtime overhead.
2025-06-26 21:27:51 +00:00
Micha Reiser
9ae698fe30
Switch to Rust 2024 edition (#18129) 2025-05-16 13:25:28 +02:00
Micha Reiser
fa628018b2
Use #[expect(lint)] over #[allow(lint)] where possible (#17822) 2025-05-03 21:20:31 +02:00
Douglas Creager
ba44e9de13
[red-knot] Don't use separate ID types for each alist (#16415)
Regardless of whether #16408 and #16311 pan out, this part is worth
pulling out as a separate PR.

Before, you had to define a new `IndexVec` index type for each type of
association list you wanted to create. Now there's a single index type
that's internal to the alist implementation, and you use `List<K, V>` to
store a handle to a particular list.

This also adds some property tests for the alist implementation.
2025-02-28 14:55:55 -05:00
Douglas Creager
fa76f6cbb2
[red-knot] Use arena-allocated association lists for narrowing constraints (#16306)
This PR adds an implementation of [association
lists](https://en.wikipedia.org/wiki/Association_list), and uses them to
replace the previous `BitSet`/`SmallVec` representation for narrowing
constraints.

An association list is a linked list of key/value pairs. We additionally
guarantee that the elements of an association list are sorted (by their
keys), and that they do not contain any entries with duplicate keys.

Association lists have fallen out of favor in recent decades, since you
often need operations that are inefficient on them. In particular,
looking up a random element by index is O(n), just like a linked list;
and looking up an element by key is also O(n), since you must do a
linear scan of the list to find the matching element. Luckily we don't
need either of those operations for narrowing constraints!

The typical implementation also suffers from poor cache locality and
high memory allocation overhead, since individual list cells are
typically allocated separately from the heap. We solve that last problem
by storing the cells of an association list in an `IndexVec` arena.

---------

Co-authored-by: Carl Meyer <carl@astral.sh>
2025-02-25 10:58:56 -05:00
Ibraheem Ahmed
69d86d1d69
Transition to salsa coarse-grained tracked structs (#15763)
## Summary

Transition to using coarse-grained tracked structs (depends on
https://github.com/salsa-rs/salsa/pull/657). For now, this PR doesn't
add any `#[tracked]` fields, meaning that any changes cause the entire
struct to be invalidated. It also changes `AstNodeRef` to be
compared/hashed by pointer address, instead of performing a deep AST
comparison.

## Test Plan

This yields a 10-15% improvement on my machine (though weirdly some runs
were 5-10% without being flagged as inconsistent by criterion, is there
some non-determinism involved?). It's possible that some of this is
unrelated, I'll try applying the patch to the current salsa version to
make sure.

---------

Co-authored-by: Micha Reiser <micha@reiser.io>
2025-02-11 11:38:50 +01:00
Carl Meyer
811f78d94d
[red-knot] small efficiency improvements and bugfixes to use-def map building (#12373)
Adds inference tests sufficient to give full test coverage of the
`UseDefMapBuilder::merge` method.

In the process I realized that we could implement visiting of if
statements in `SemanticBuilder` with fewer `snapshot`, `restore`, and
`merge` operations, so I restructured that visit a bit.

I also found one correctness bug in the `merge` method (it failed to
extend the given snapshot with "unbound" for any missing symbols,
meaning we would just lose the fact that the symbol could be unbound in
the merged-in path), and two efficiency bugs (if one of the ranges to
merge is empty, we can just use the other one, no need for copies, and
if the ranges are overlapping -- which can occur with nested branches --
we can still just merge them with no copies), and fixed all three.
2024-07-18 09:24:58 -07:00
Carl Meyer
595b1aa4a1
[red-knot] per-definition inference, use-def maps (#12269)
Implements definition-level type inference, with basic control flow
(only if statements and if expressions so far) in Salsa.

There are a couple key ideas here:

1) We can do type inference queries at any of three region
granularities: an entire scope, a single definition, or a single
expression. These are represented by the `InferenceRegion` enum, and the
entry points are the salsa queries `infer_scope_types`,
`infer_definition_types`, and `infer_expression_types`. Generally
per-scope will be used for scopes that we are directly checking and
per-definition will be used anytime we are looking up symbol types from
another module/scope. Per-expression should be uncommon: used only for
the RHS of an unpacking or multi-target assignment (to avoid
re-inferring the RHS once per symbol defined in the assignment) and for
test nodes in type narrowing (e.g. the `test` of an `If` node). All
three queries return a `TypeInference` with a map of types for all
definitions and expressions within their region. If you do e.g.
scope-level inference, when it hits a definition, or an
independently-inferable expression, it should use the relevant query
(which may already be cached) to get all types within the smaller
region. This avoids double-inferring smaller regions, even though larger
regions encompass smaller ones.

2) Instead of building a control-flow graph and lazily traversing it to
find definitions which reach a use of a name (which is O(n^2) in the
worst case), instead semantic indexing builds a use-def map, where every
use of a name knows which definitions can reach that use. We also no
longer track all definitions of a symbol in the symbol itself; instead
the use-def map also records which defs remain visible at the end of the
scope, and considers these the publicly-visible definitions of the
symbol (see below).

Major items left as TODOs in this PR, to be done in follow-up PRs:

1) Free/global references aren't supported yet (only lookup based on
definitions in current scope), which means the override-check example
doesn't currently work. This is the first thing I'll fix as follow-up to
this PR.

2) Control flow outside of if statements and expressions.

3) Type narrowing.

There are also some smaller relevant changes here:

1) Eliminate `Option` in the return type of member lookups; instead
always return `Type::Unbound` for a name we can't find. Also use
`Type::Unbound` for modules we can't resolve (not 100% sure about this
one yet.)

2) Eliminate the use of the terms "public" and "root" to refer to
module-global scope or symbols. Instead consistently use the term
"module-global". It's longer, but it's the clearest, and the most
consistent with typical Python terminology. In particular I don't like
"public" for this use because it has other implications around author
intent (is an underscore-prefixed module-global symbol "public"?). And
"root" is just not commonly used for this in Python.

3) Eliminate the `PublicSymbol` Salsa ingredient. Many non-module-global
symbols can also be seen from other scopes (e.g. by a free var in a
nested scope, or by class attribute access), and thus need to have a
"public type" (that is, the type not as seen from a particular use in
the control flow of the same scope, but the type as seen from some other
scope.) So all symbols need to have a "public type" (here I want to keep
the use of the term "public", unless someone has a better term to
suggest -- since it's "public type of a symbol" and not "public symbol"
the confusion with e.g. initial underscores is less of an issue.) At
least initially, I would like to try not having special handling for
module-global symbols vs other symbols.

4) Switch to using "definitions that reach end of scope" rather than
"all definitions" in determining the public type of a symbol. I'm
convinced that in general this is the right way to go. We may want to
refine this further in future for some free-variable cases, but it can
be changed purely by making changes to the building of the use-def map
(the `public_definitions` index in it), without affecting any other
code. One consequence of combining this with no control-flow support
(just last-definition-wins) is that some inference tests now give more
wrong-looking results; I left TODO comments on these tests to fix them
when control flow is added.

And some potential areas for consideration in the future:

1) Should `symbol_ty` be a Salsa query? This would require making all
symbols a Salsa ingredient, and tracking even more dependencies. But it
would save some repeated reconstruction of unions, for symbols with
multiple public definitions. For now I'm not making it a query, but open
to changing this in future with actual perf evidence that it's better.
2024-07-16 11:02:30 -07:00
Micha Reiser
b0b4706e2d
Red-knot: Track scopes per expression (#11754) 2024-06-05 17:53:26 +02:00
Charlie Marsh
af60d539ab
Move sub-crates to workspace dependencies (#11407)
## Summary

This matches the setup we use in `uv` and allows for consistency in the
`Cargo.toml` files.
2024-05-13 14:37:50 +00:00
Micha Reiser
341c2698a7
Run doctests as part of CI pipeline (#9939) 2024-02-12 10:18:58 +01:00
Charlie Marsh
9073220887
Make all dependencies workspace dependencies (#9333)
## Summary

This PR modifies our `Cargo.toml` files to use workspace dependencies
for _all_ dependencies, rather than the status quo of sporadically
trying to use workspace dependencies for those dependencies that are
used across multiple crates. I find the current situation more confusing
and harder to manage, since we have a mix of workspace and crate-local
dependencies, whereas this setup consistently uses the same approach for
all dependencies.
2024-01-02 13:41:59 +00:00
konsti
14e65afdc6
Update to Rust 1.74 and use new clippy lints table (#8722)
Update to [Rust
1.74](https://blog.rust-lang.org/2023/11/16/Rust-1.74.0.html) and use
the new clippy lints table.

The update itself introduced a new clippy lint about superfluous hashes
in raw strings, which got removed.

I moved our lint config from `rustflags` to the newly stabilized
[workspace.lints](https://doc.rust-lang.org/stable/cargo/reference/workspaces.html#the-lints-table).
One consequence is that we have to `unsafe_code = "warn"` instead of
"forbid" because the latter now actually bans unsafe code:

```
error[E0453]: allow(unsafe_code) incompatible with previous forbid
  --> crates/ruff_source_file/src/newlines.rs:62:17
   |
62 |         #[allow(unsafe_code)]
   |                 ^^^^^^^^^^^ overruled by previous forbid
   |
   = note: `forbid` lint level was set on command line
```

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2023-11-16 18:12:46 -05:00
Thomas de Zeeuw
0b963ddcfa
Add unreachable code rule (#5384)
Co-authored-by: Thomas de Zeeuw <thomas@astral.sh>
Co-authored-by: Micha Reiser <micha@reiser.io>
2023-07-04 14:27:23 +00:00
Charlie Marsh
716cab2f19
Run rustfmt on nightly to clean up erroneous comments (#5106)
## Summary

This PR runs `rustfmt` with a few nightly options as a one-time fix to
catch some malformatted comments. I ended up just running with:

```toml
condense_wildcard_suffixes = true
edition = "2021"
max_width = 100
normalize_comments = true
normalize_doc_attributes = true
reorder_impl_items = true
unstable_features = true
use_field_init_shorthand = true
```

Since these all seem like reasonable things to fix, so may as well while
I'm here.
2023-06-15 00:19:05 +00:00
Charlie Marsh
68b6d30c46
Use consistent Cargo.toml metadata in all crates (#5015) 2023-06-12 00:02:40 +00:00
Micha Reiser
652c644c2a
Introduce ruff_index crate (#4597) 2023-05-23 17:40:35 +02:00