[red-knot] per-definition inference, use-def maps (#12269)

Implements definition-level type inference, with basic control flow
(only if statements and if expressions so far) in Salsa.

There are a couple of key ideas here:

1) We can do type inference queries at any of three region
granularities: an entire scope, a single definition, or a single
expression. These are represented by the `InferenceRegion` enum, and the
entry points are the salsa queries `infer_scope_types`,
`infer_definition_types`, and `infer_expression_types`. Generally
per-scope will be used for scopes that we are directly checking and
per-definition will be used anytime we are looking up symbol types from
another module/scope. Per-expression should be uncommon: used only for
the RHS of an unpacking or multi-target assignment (to avoid
re-inferring the RHS once per symbol defined in the assignment) and for
test nodes in type narrowing (e.g. the `test` of an `If` node). All
three queries return a `TypeInference` with a map of types for all
definitions and expressions within their region. When e.g. scope-level
inference hits a definition or an independently-inferable expression,
it should use the relevant query (which may already be cached) to get
all types within the smaller region. This avoids double-inferring
smaller regions, even though larger regions encompass smaller ones.

2) Instead of building a control-flow graph and lazily traversing it to
find definitions which reach a use of a name (which is O(n^2) in the
worst case), semantic indexing builds a use-def map, where every
use of a name knows which definitions can reach that use. We also no
longer track all definitions of a symbol in the symbol itself; instead
the use-def map also records which defs remain visible at the end of the
scope, and considers these the publicly-visible definitions of the
symbol (see below).
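
As a rough illustration of the use-def idea in (2), here is a minimal sketch, not the code in this PR. All names in it (`UseDefMapBuilder`, `FlowState`, `snapshot`, `merge`, the integer ids) are hypothetical, and it ignores the "may be unbound" case when a name is bound on only one branch. It shows a single forward pass recording, for each use, the definitions live at that point, and keeping whatever is live at the end of the scope as the public definitions.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical id types, for illustration only.
type DefId = u32;
type UseId = u32;

/// Live definitions of each name at one point in the scope.
type FlowState = HashMap<String, BTreeSet<DefId>>;

/// Result of indexing one scope.
#[derive(Debug, Default)]
struct UseDefMap {
    /// Definitions that can reach each use.
    defs_by_use: HashMap<UseId, BTreeSet<DefId>>,
    /// Definitions still visible at the end of the scope.
    public_defs: FlowState,
}

#[derive(Default)]
struct UseDefMapBuilder {
    live: FlowState,
    defs_by_use: HashMap<UseId, BTreeSet<DefId>>,
}

impl UseDefMapBuilder {
    /// A new binding shadows every earlier definition of the name.
    fn record_definition(&mut self, name: &str, def: DefId) {
        self.live.insert(name.to_string(), BTreeSet::from([def]));
    }

    /// A use sees exactly the definitions that are live right now.
    fn record_use(&mut self, name: &str, use_id: UseId) {
        let reaching = self.live.get(name).cloned().unwrap_or_default();
        self.defs_by_use.insert(use_id, reaching);
    }

    /// Save the state before entering a branch.
    fn snapshot(&self) -> FlowState {
        self.live.clone()
    }

    /// Join another control-flow path into the current state (set union per name).
    fn merge(&mut self, other: FlowState) {
        for (name, defs) in other {
            self.live.entry(name).or_default().extend(defs);
        }
    }

    fn finish(self) -> UseDefMap {
        UseDefMap { defs_by_use: self.defs_by_use, public_defs: self.live }
    }
}

/// Indexes this Python snippet by hand:
///     x = 1        # def 0
///     if flag:
///         x = 2    # def 1
///     print(x)     # use 0
fn demo() -> UseDefMap {
    let mut builder = UseDefMapBuilder::default();
    builder.record_definition("x", 0);
    let before_if = builder.snapshot();
    builder.record_definition("x", 1);
    builder.merge(before_if); // no else branch: join with the pre-if state
    builder.record_use("x", 0);
    builder.finish()
}
```

In this sketch the use of `x` after the `if` is reached by both definitions, and both also count as public definitions of `x`, without any later graph traversal.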

Major items left as TODOs in this PR, to be done in follow-up PRs:

1) Free/global references aren't supported yet (only lookup based on
definitions in current scope), which means the override-check example
doesn't currently work. This is the first thing I'll fix as follow-up to
this PR.

2) Control flow outside of if statements and expressions.

3) Type narrowing.

There are also some smaller relevant changes here:

1) Eliminate `Option` in the return type of member lookups; instead
always return `Type::Unbound` for a name we can't find. Also use
`Type::Unbound` for modules we can't resolve (not 100% sure about this
one yet).

2) Eliminate the use of the terms "public" and "root" to refer to
module-global scope or symbols. Instead consistently use the term
"module-global". It's longer, but it's the clearest, and the most
consistent with typical Python terminology. In particular I don't like
"public" for this use because it has other implications around author
intent (is an underscore-prefixed module-global symbol "public"?). And
"root" is just not commonly used for this in Python.

3) Eliminate the `PublicSymbol` Salsa ingredient. Many non-module-global
symbols can also be seen from other scopes (e.g. by a free var in a
nested scope, or by class attribute access), and thus need to have a
"public type" (that is, the type not as seen from a particular use in
the control flow of the same scope, but the type as seen from some other
scope.) So all symbols need to have a "public type" (here I want to keep
the use of the term "public", unless someone has a better term to
suggest -- since it's "public type of a symbol" and not "public symbol"
the confusion with e.g. initial underscores is less of an issue.) At
least initially, I would like to try not having special handling for
module-global symbols vs other symbols.

4) Switch to using "definitions that reach end of scope" rather than
"all definitions" in determining the public type of a symbol. I'm
convinced that in general this is the right way to go. We may want to
refine this further in future for some free-variable cases, but it can
be changed purely by making changes to the building of the use-def map
(the `public_definitions` index in it), without affecting any other
code. One consequence of combining this with no control-flow support
(just last-definition-wins) is that some inference tests now give more
wrong-looking results; I left TODO comments on these tests to fix them
when control flow is added.
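
To make (1) and (4) concrete, here is a small sketch under stated assumptions; it is not the implementation in this PR. The `Type` enum is a toy stand-in (only the `Unbound` variant name comes from the text above), and `Scope`, `public_ty`, and `demo_scope` are hypothetical. It shows a lookup that always returns a type, yields `Unbound` for unknown names, and builds the public type from only the definitions that reach the end of the scope.

```rust
use std::collections::HashMap;

/// Toy stand-in for the checker's type representation.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Unbound,
    Int,
    Str,
    Union(Vec<Type>),
}

/// Hypothetical per-scope data.
struct Scope {
    /// Inferred type of each definition, indexed by definition id.
    definition_tys: Vec<Type>,
    /// Per symbol, the ids of definitions that reach the end of the scope.
    public_definitions: HashMap<String, Vec<usize>>,
}

impl Scope {
    /// Always returns a `Type`: an unknown name is `Unbound`, never `None`.
    fn public_ty(&self, name: &str) -> Type {
        let defs = match self.public_definitions.get(name) {
            Some(defs) if !defs.is_empty() => defs,
            _ => return Type::Unbound,
        };
        // Collect the distinct types of the public definitions, in order.
        let mut tys: Vec<Type> = Vec::new();
        for &def in defs {
            let ty = self.definition_tys[def].clone();
            if !tys.contains(&ty) {
                tys.push(ty);
            }
        }
        // A single type needs no union wrapper.
        if tys.len() == 1 {
            tys.swap_remove(0)
        } else {
            Type::Union(tys)
        }
    }
}

/// `x` has three definitions (ids 0, 1, 2) but id 0 is shadowed before the
/// end of the scope; `y` has a single definition (id 3).
fn demo_scope() -> Scope {
    Scope {
        definition_tys: vec![Type::Int, Type::Str, Type::Int, Type::Str],
        public_definitions: HashMap::from([
            ("x".to_string(), vec![1, 2]),
            ("y".to_string(), vec![3]),
        ]),
    }
}
```

Changing which definitions count as public only requires changing what ends up in `public_definitions`; the lookup itself stays the same.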

And some potential areas for consideration in the future:

1) Should `symbol_ty` be a Salsa query? This would require making all
symbols a Salsa ingredient, and tracking even more dependencies. But it
would save some repeated reconstruction of unions, for symbols with
multiple public definitions. For now I'm not making it a query, but open
to changing this in future with actual perf evidence that it's better.
Carl Meyer 2024-07-16 11:02:30 -07:00 committed by GitHub
parent 30cef67b45
commit 595b1aa4a1
17 changed files with 1488 additions and 815 deletions

@@ -4,9 +4,8 @@ use ruff_python_ast as ast;
 use ruff_python_ast::{Expr, ExpressionRef, StmtClassDef};
 use crate::semantic_index::ast_ids::HasScopedAstId;
-use crate::semantic_index::symbol::PublicSymbolId;
-use crate::semantic_index::{public_symbol, semantic_index};
-use crate::types::{infer_types, public_symbol_ty, Type};
+use crate::semantic_index::semantic_index;
+use crate::types::{definition_ty, infer_scope_types, module_global_symbol_ty_by_name, Type};
 use crate::Db;
 pub struct SemanticModel<'db> {
@@ -29,12 +28,8 @@ impl<'db> SemanticModel<'db> {
         resolve_module(self.db.upcast(), module_name)
     }
-    pub fn public_symbol(&self, module: &Module, symbol_name: &str) -> Option<PublicSymbolId<'db>> {
-        public_symbol(self.db, module.file(), symbol_name)
-    }
-    pub fn public_symbol_ty(&self, symbol: PublicSymbolId<'db>) -> Type {
-        public_symbol_ty(self.db, symbol)
+    pub fn module_global_symbol_ty(&self, module: &Module, symbol_name: &str) -> Type<'db> {
+        module_global_symbol_ty_by_name(self.db, module.file(), symbol_name)
     }
 }
@@ -53,7 +48,7 @@ impl HasTy for ast::ExpressionRef<'_> {
         let scope = file_scope.to_scope_id(model.db, model.file);
         let expression_id = self.scoped_ast_id(model.db, scope);
-        infer_types(model.db, scope).expression_ty(expression_id)
+        infer_scope_types(model.db, scope).expression_ty(expression_id)
     }
 }
@@ -145,11 +140,7 @@ impl HasTy for ast::StmtFunctionDef {
     fn ty<'db>(&self, model: &SemanticModel<'db>) -> Type<'db> {
         let index = semantic_index(model.db, model.file);
         let definition = index.definition(self);
-        let scope = definition.scope(model.db).to_scope_id(model.db, model.file);
-        let types = infer_types(model.db, scope);
-        types.definition_ty(definition)
+        definition_ty(model.db, definition)
     }
 }
@@ -157,11 +148,7 @@ impl HasTy for StmtClassDef {
     fn ty<'db>(&self, model: &SemanticModel<'db>) -> Type<'db> {
         let index = semantic_index(model.db, model.file);
         let definition = index.definition(self);
-        let scope = definition.scope(model.db).to_scope_id(model.db, model.file);
-        let types = infer_types(model.db, scope);
-        types.definition_ty(definition)
+        definition_ty(model.db, definition)
     }
 }
@@ -169,11 +156,7 @@ impl HasTy for ast::Alias {
     fn ty<'db>(&self, model: &SemanticModel<'db>) -> Type<'db> {
         let index = semantic_index(model.db, model.file);
         let definition = index.definition(self);
-        let scope = definition.scope(model.db).to_scope_id(model.db, model.file);
-        let types = infer_types(model.db, scope);
-        types.definition_ty(definition)
+        definition_ty(model.db, definition)
     }
 }