We have a handy `to_meta_type` that does the right thing for class
instances, and also works for all of the other types that are “instances
of” something. Unless I'm missing something, this should let us get rid
of the catch-all clause in one fell swoop.
cf #14548
## Summary
I'm currently on the fence about landing the #14760 PR because it's
unclear how we'd support tracking used and unused suppression comments
in a performant way:
* Salsa adds an "untracked" dependency to every query reading
accumulated values. This has the effect that the query re-runs on every
revision. For example, a possible future query
`unused_suppression_comments(db, file)` would re-run on every
incremental change and for every file. I don't expect the operation
itself to be expensive, but it all adds up in a project with 100k+ files
* Salsa collects the accumulated values by traversing the entire query
dependency graph. It can skip over sub-graphs if it is known that they
contain no accumulated values. This makes accumulators a great tool for
when they are rare; diagnostics are a good example. Unfortunately,
suppressions are more common, and they often appear in many different
files, making the "skip over subgraphs" optimization less effective.
Because of that, I want to wait to adopt salsa accumulators for type
check diagnostics (we could start using them for other diagnostics)
until we have very specific reasons that justify regressing incremental
check performance.
This PR does a "small" refactor that brings us closer to what I have in
#14760 but without using accumulators. To emit a diagnostic, a method
needs:
* Access to the db
* Access to the currently checked file
This PR introduces a new `InferContext` that holds on to the db, the
current file, and the reported diagnostics. It replaces the
`TypeCheckDiagnosticsBuilder`. We pass the `InferContext` instead of the
`db` to methods that *might* emit diagnostics. This simplifies some of
the `Outcome` methods, which can now be called with a context instead of
a `db` and the diagnostics builder. Having the `db` and the file on a
single type like this would also be useful when using accumulators.
This PR doesn't solve the issue that the `Outcome` types feel somewhat
complicated nor that it can be annoying when you need to report a
`Diagnostic,` but you don't have access to an `InferContext` (or the
file). However, I also believe that accumulators won't solve these
problems because:
* Even with accumulators, it's necessary to have a reference to the file
that's being checked. The struggle would be to get a reference to that
file rather than getting a reference to `InferContext`.
* Users of the `HasTy` trait (e.g., a linter) don't want to bother
getting the `File` when calling `Type::return_ty` because they aren't
interested in the created diagnostics. They just want to know what
calling the current expression would return (and if it even is a
callable). This is what the different methods of `Outcome` enable today.
I can ask for the return type without needing extra data that's only
relevant for emitting a diagnostic.
A shortcoming of this approach is that it is now a bit confusing when to
pass `db` and when an `InferContext`. An option is that we'd make the
`file` on `InferContext` optional (it won't collect any diagnostics if
`None`) and change all methods on `Type` to take `InferContext` as the
first argument instead of a `db`. I'm interested in your opinion on
this.
Accumulators are definitely harder to use incorrectly because they
remove the need to merge the diagnostics explicitly and there's no risk
that we accidentally merge the diagnostics twice, resulting in
duplicated diagnostics. I still value performance more over making our
life slightly easier.
This tweaks the new semantics from #15026 a bit when a symbol could be
interpreted both as an attribute and a submodule of a package. For
`from...import`, we should actually prioritize the attribute, because of
how the statement itself is implemented [1].
> 1. check if the imported module has an attribute by that name
> 2. if not, attempt to import a submodule with that name and then check
the imported module again for that attribute
[1] https://docs.python.org/3/reference/simple_stmts.html#the-import-statement
## Summary
Fixes#14550.
Add `AlwaysTruthy` and `AlwaysFalsy` types, representing the set of objects whose `__bool__` method can only ever return `True` or `False`, respectively, and narrow `if x` and `if not x` accordingly.
## Test Plan
- New Markdown test for truthiness narrowing `narrow/truthiness.md`
- unit tests in `types.rs` and `builders.rs` (`cargo test --package
red_knot_python_semantic --lib -- types`)
## Summary
Fixes https://github.com/astral-sh/ruff/issues/15027
The `MemoryFileSystem::write_file` API automatically creates
non-existing ancestor directoryes
but we failed to update the status of the now created ancestor
directories in the `Files` data structure.
## Test Plan
Tested that the case in https://github.com/astral-sh/ruff/issues/15027
now passes regardless of whether the *Simple* case is commented out or
not
## Summary
This PR updates the logic when raising conflicting declarations
diagnostic to avoid the undeclared path if present.
The conflicting declaration diagnostics is added when there are two or
more declarations in the control flow path of a definition whose type
isn't equivalent to each other. This can be seen in the following
example:
```py
if flag:
x: int
x = 1 # conflicting-declarations: Unknown, int
```
After this PR, we'd avoid considering "Unknown" as part of the
conflicting declarations. This means we'd still flag it for the
following case:
```py
if flag:
x: int
else:
x: str
x = 1 # conflicting-declarations: int, str
```
A solution that's local to the exception control flow was also explored
which required updating the logic for merging the flow snapshot to avoid
considering declarations using a flag. This is preserved here:
https://github.com/astral-sh/ruff/compare/dhruv/control-flow-no-declarations?expand=1.
The main motivation to avoid that is we don't really understand what the
user experience is w.r.t. the Unknown type and the
conflicting-declaration diagnostics. This makes us unsure on what the
right semantics are as to whether that diagnostics should be raised or
not and when to raise them. For now, we've decided to move forward with
this PR and could decide to adopt another solution or remove the
conflicting-declaration diagnostics in the future.
Closes: #13966
## Test Plan
Update the existing mdtest case. Add an additional case specific to
exception control flow to verify that the diagnostic is not being raised
now.
When importing a nested module, we were correctly creating a binding for
the top-most parent, but we were binding that to the nested module, not
to that parent module. Moreover, we weren't treating those submodules as
members of their containing parents. This PR addresses both issues, so
that nested imports work as expected.
As discussed in ~Slack~ whatever chat app I find myself in these days
😄, this requires keeping track of which modules have been imported
within the current file, so that when we resolve member access on a
module reference, we can see if that member has been imported as a
submodule. If so, we return the submodule reference immediately, instead
of checking whether the parent module's definition defines the symbol.
This is currently done in a flow insensitive manner. The `SemanticIndex`
now tracks all of the modules that are imported (via `import`, not via
`from...import`). The member access logic mentioned above currently only
considers module imports in the file containing the attribute
expression.
---------
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
A class is an instance of its metaclass, so `ClassLiteral("ABC")` is not
disjoint from `Instance("ABCMeta")`. However, we erroneously consider
the two types disjoint on the `main` branch. This PR fixes that.
This bug was uncovered by adding some more core types to the property
tests that provide coverage for classes that have custom metaclasses.
The additions to the property tests are included in this PR.
## Test Plan
New unit tests and property tests added. Tested with:
- `cargo test -p red_knot_python_semantic`
- `QUICKCHECK_TESTS=100000 cargo test -p red_knot_python_semantic --
--ignored types::property_tests::stable`
The assignability property test fails on this branch, but that's a known
issue that exists on `main`, due to
https://github.com/astral-sh/ruff/issues/14899.
## Summary
Teach red-knot that `type[...]` is always disjoint from `None` and from
`LiteralString`. Fixes#14925.
This should properly be generalized to "all instances of final types
which are not subclasses of `type`", but until we support finality,
hardcoding `None` (which is known to be final) allows us to fix the
subtype transitivity property test.
## Test Plan
Existing tests pass, added new unit tests for `is_disjoint_from` and
`is_subtype_of`.
`QUICKCHECK_TESTS=100000 cargo test -p red_knot_python_semantic --
--ignored types::property_tests::stable` fails only the "assignability
is reflexive" test, which is known to fail on `main` (#14899).
The same command, with `property_tests.rs` edited to prevent generating
intersection tests (the cause of #14899), passes all quickcheck tests.
## Summary
Resolves#14922.
## Test Plan
Markdown tests.
---------
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Carl Meyer <carl@astral.sh>
## Summary
This is not strictly required yet, but makes these tests future-proof.
They need a `python-version` requirement as they rely on language
features that are not available in 3.9.
## Summary
Add support for `typing.TYPE_CHECKING` and
`typing_extensions.TYPE_CHECKING`.
relates to: https://github.com/astral-sh/ruff/issues/14170
## Test Plan
New Markdown-based tests
## Summary
This PR extends the mdtest configuration with a `log` setting that can
be any of:
* `true`: Enables tracing
* `false`: Disables tracing (default)
* String: An ENV_FILTER similar to `RED_KNOT_LOG`
```toml
log = true
```
Closes https://github.com/astral-sh/ruff/issues/13865
## Test Plan
I changed a test and tried `log=true`, `log=false`, and `log=INFO`
## Summary
This PR renames the `--custom-typeshed-dir`, `target-version`, and
`--current-directory` cli options to `--typeshed`,
`--python-version`, and `--project` as discussed in the CLI proposal
document.
I added aliases for `--target-version` (for Ruff compat) and
`--custom-typeshed-dir` (for Alex)
## Test Plan
Long help
```
An extremely fast Python type checker.
Usage: red_knot [OPTIONS] [COMMAND]
Commands:
server Start the language server
help Print this message or the help of the given subcommand(s)
Options:
--project <PROJECT>
Run the command within the given project directory.
All `pyproject.toml` files will be discovered by walking up the directory tree from the project root, as will the project's virtual environment (`.venv`).
Other command-line arguments (such as relative paths) will be resolved relative to the current working directory."#,
--venv-path <PATH>
Path to the virtual environment the project uses.
If provided, red-knot will use the `site-packages` directory of this virtual environment to resolve type information for the project's third-party dependencies.
--typeshed-path <PATH>
Custom directory to use for stdlib typeshed stubs
--extra-search-path <PATH>
Additional path to use as a module-resolution source (can be passed multiple times)
--python-version <VERSION>
Python version to assume when resolving types
[possible values: 3.7, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13]
-v, --verbose...
Use verbose output (or `-vv` and `-vvv` for more verbose output)
-W, --watch
Run in watch mode by re-running whenever files change
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
```
Short help
```
An extremely fast Python type checker.
Usage: red_knot [OPTIONS] [COMMAND]
Commands:
server Start the language server
help Print this message or the help of the given subcommand(s)
Options:
--project <PROJECT> Run the command within the given project directory
--venv-path <PATH> Path to the virtual environment the project uses
--typeshed-path <PATH> Custom directory to use for stdlib typeshed stubs
--extra-search-path <PATH> Additional path to use as a module-resolution source (can be passed multiple times)
--python-version <VERSION> Python version to assume when resolving types [possible values: 3.7, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13]
-v, --verbose... Use verbose output (or `-vv` and `-vvv` for more verbose output)
-W, --watch Run in watch mode by re-running whenever files change
-h, --help Print help (see more with '--help')
-V, --version Print version
```
---------
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
## Summary
Regression test(s) for something that broken while implementing #14759.
We have similar tests for other control flow elements, but feel free to
let me know if this seems superfluous.
## Test Plan
New mdtests
## Summary
Using `typing.LiteralString` breaks as soon as we understand
`sys.version_info` branches, as it's only available in 3.11 and later.
## Test Plan
Made sure it didn't fail on my #14759 branch anymore.
We support using `typing.Type[]` as a base class (and we have tests for
it), but not yet `builtins.type[]`. At some point we should fix that,
but I don't think it';s worth spending much time on now (and it might be
easier once we've implemented generics?). This PR just adds a failing
test with a TODO.
## Summary
This is the third and last PR in this stack that adds support for
toggling lints at a per-rule level.
This PR introduces a new `LintRegistry`, a central index of known lints.
The registry is required because we want to support lint rules from many
different crates but need a way to look them up by name, e.g., when
resolving a lint from a name in the configuration or analyzing a
suppression comment.
Adding a lint now requires two steps:
1. Declare the lint with `declare_lint`
2. Register the lint in the registry inside the `register_lints`
function.
I considered some more involved macros to avoid changes in two places.
Still, I ultimately decided against it because a) it's just two places
and b) I'd expect that registering a type checker lint will differ from
registering a lint that runs as a rule in the linter. I worry that any
more opinionated design could limit our options when working on the
linter, so I kept it simple.
The second part of this PR is the `RuleSelection`. It stores which lints
are enabled and what severity they should use for created diagnostics.
For now, the `RuleSelection` always gets initialized with all known
lints and it uses their default level.
## Linter crates
Each crate that defines lints should export a `register_lints` function
that accepts a `&mut LintRegistryBuilder` to register all its known
lints in the registry. This should make registering all known lints in a
top-level crate easy: Just call `register_lints` of every crate that
defines lint rules.
I considered defining a `LintCollection` trait and even some fancy
macros to accomplish the same but decided to go for this very simplistic
approach for now. We can add more abstraction once needed.
## Lint rules
This is a bit hand-wavy. I don't have a good sense for how our linter
infrastructure will look like, but I expect we'll need a way to register
the rules that should run as part of the red knot linter. One way is to
keep doing what Ruff does by having one massive `checker` and each lint
rule adds a call to itself in the relevant AST visitor methods. An
alternative is that we have a `LintRule` trait that provides common
hooks and implementations will be called at the "right time". Such a
design would need a way to register all known lint implementations,
possibly with the lint. This is where we'd probably want a dedicated
`register_rule` method. A third option is that lint rules are handled
separately from the `LintRegistry` and are specific to the linter crate.
The current design should be flexible enough to support the three
options.
## Documentation generation
The documentation for all known lints can be generated by creating a
factory, registering all lints by calling the `register_lints` methods,
and then querying the registry for the metadata.
## Deserialization and Schema generation
I haven't fully decided what the best approach is when it comes to
deserializing lint rule names:
* Reject invalid names in the deserializer. This gives us error messages
with line and column numbers (by serde)
* Don't validate lint rule names during deserialization; defer the
validation until the configuration is resolved. This gives us more
control over handling the error, e.g. emit a warning diagnostic instead
of aborting when a rule isn't known.
One technical challenge for both deserialization and schema generation
is that the `Deserialize` and `JSONSchema` traits do not allow passing
the `LintRegistry`, which is required to look up the lints by name. I
suggest that we either rely on the salsa db being set for the current
thread (`salsa::Attach`) or build our own thread-local storage for the
`LintRegistry`. It's the caller's responsibility to make the lint
registry available before calling `Deserialize` or `JSONSchema`.
## CLI support
I prefer deferring adding support for enabling and disabling lints from
the CLI for now because I think it will be easier
to add once I've figured out how to handle configurations.
## Bitset optimization
Ruff tracks the enabled rules using a cheap copyable `Bitset` instead of
a hash map. This helped improve performance by a few percent (see
https://github.com/astral-sh/ruff/pull/3606). However, this approach is
no longer possible because lints have no "cheap" way to compute their
index inside the registry (other than using a hash map).
We could consider doing something similar to Salsa where each
`LintMetadata` stores a `LazyLintIndex`.
```
pub struct LazyLintIndex {
cached: OnceLock<(Nonce, LintIndex)>
}
impl LazyLintIndex {
pub fn get(registry: &LintRegistry, lint: &'static LintMetadata) {
let (nonce, index) = self.cached.get_or_init(|| registry.lint_index(lint));
if registry.nonce() == nonce {
index
} else {
registry.lint_index(lint)
}
}
```
Each registry keeps a map from `LintId` to `LintIndex` where `LintIndex`
is in the range of `0...registry.len()`. The `LazyLintIndex` is based on
the assumption that every program has exactly **one** registry. This
assumption allows to cache the `LintIndex` directly on the
`LintMetadata`. The implementation falls back to the "slow" path if
there is more than one registry at runtime.
I was very close to implementing this optimization because it's kind of
fun to implement. I ultimately decided against it because it adds
complexity and I don't think it's worth doing in Red Knot today:
* Red Knot only queries the rule selection when deciding whether or not
to emit a diagnostic. It is rarely used to detect if a certain code
block should run. This is different from Ruff where the rule selection
is queried many times for every single AST node to determine which rules
*should* run.
* I'm not sure if a 2-3% performance improvement is worth the complexity
I suggest revisiting this decision when working on the linter where a
fast path for deciding if a rule is enabled might be more important (but
that depends on how lint rules are implemented)
## Test Plan
I removed a lint from the default rule registry, and the MD tests
started failing because the diagnostics were no longer emitted.