[red-knot] per-definition inference, use-def maps (#12269)

Implements definition-level type inference, with basic control flow (only if statements and if expressions so far) in Salsa.

There are a couple of key ideas here:

1) We can do type inference queries at any of three region granularities: an entire scope, a single definition, or a single expression. These are represented by the `InferenceRegion` enum, and the entry points are the salsa queries `infer_scope_types`, `infer_definition_types`, and `infer_expression_types`. Generally per-scope will be used for scopes that we are directly checking and per-definition will be used anytime we are looking up symbol types from another module/scope. Per-expression should be uncommon: used only for the RHS of an unpacking or multi-target assignment (to avoid re-inferring the RHS once per symbol defined in the assignment) and for test nodes in type narrowing (e.g. the `test` of an `If` node). All three queries return a `TypeInference` with a map of types for all definitions and expressions within their region. If you do e.g. scope-level inference, when it hits a definition, or an independently-inferable expression, it should use the relevant query (which may already be cached) to get all types within the smaller region. This avoids double-inferring smaller regions, even though larger regions encompass smaller ones.

2) Instead of building a control-flow graph and lazily traversing it to find definitions which reach a use of a name (which is O(n^2) in the worst case), semantic indexing builds a use-def map, where every use of a name knows which definitions can reach that use. We also no longer track all definitions of a symbol in the symbol itself; instead the use-def map also records which defs remain visible at the end of the scope, and considers these the publicly-visible definitions of the symbol (see below).
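
To make the use-def map idea concrete, here is a minimal sketch of how such a map could be built while walking a scope with an `if` statement. It is not the code in this commit: the `UseDefMapBuilder` type, its method names, and the plain `usize` ids are all assumptions made for illustration, and the sketch ignores the "possibly unbound" case for names defined in only one branch.

```rust
use std::collections::HashMap;

// Hypothetical id types for this sketch; the real index has its own ids.
type DefId = usize;
type UseId = usize;
/// Definitions of each name that are visible at one point in the control flow.
type FlowState = HashMap<String, Vec<DefId>>;

#[derive(Default)]
struct UseDefMapBuilder {
    visible: FlowState,
    reaching: Vec<Vec<DefId>>,
    next_def: DefId,
}

struct UseDefMap {
    definitions_by_use: Vec<Vec<DefId>>,
    public_definitions: FlowState,
}

impl UseDefMap {
    /// Definitions that can reach the given use.
    fn definitions_for_use(&self, use_id: UseId) -> &[DefId] {
        &self.definitions_by_use[use_id]
    }

    /// Definitions still visible at the end of the scope.
    fn public_definitions(&self, name: &str) -> &[DefId] {
        self.public_definitions
            .get(name)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}

impl UseDefMapBuilder {
    /// A new definition shadows every earlier visible definition of the name.
    fn record_definition(&mut self, name: &str) -> DefId {
        let def = self.next_def;
        self.next_def += 1;
        self.visible.insert(name.to_string(), vec![def]);
        def
    }

    /// A use is reached by exactly the definitions visible right now.
    fn record_use(&mut self, name: &str) -> UseId {
        let defs = self.visible.get(name).cloned().unwrap_or_default();
        self.reaching.push(defs);
        self.reaching.len() - 1
    }

    fn snapshot(&self) -> FlowState {
        self.visible.clone()
    }

    fn restore(&mut self, state: FlowState) {
        self.visible = state;
    }

    /// Join another branch's end state into the current state.
    fn merge(&mut self, other: FlowState) {
        for (name, defs) in other {
            let entry = self.visible.entry(name).or_default();
            for def in defs {
                if !entry.contains(&def) {
                    entry.push(def);
                }
            }
        }
    }

    fn finish(self) -> UseDefMap {
        UseDefMap {
            definitions_by_use: self.reaching,
            public_definitions: self.visible,
        }
    }
}

/// Models: `x = 1`, `if flag: x = 2`, `y = x`.
fn demo() -> UseDefMap {
    let mut builder = UseDefMapBuilder::default();
    builder.record_definition("x"); // x = 1 (def 0)
    let pre_if = builder.snapshot();
    builder.record_definition("x"); // x = 2 inside the if body (def 1)
    let post_if = builder.snapshot();
    builder.restore(pre_if); // the implicit else branch starts from the pre-if state
    builder.merge(post_if); // join both branches after the if
    builder.record_use("x"); // use 0: the `x` on the RHS of `y = x`
    builder.record_definition("y"); // y = x (def 2)
    builder.finish()
}

fn main() {
    let map = demo();
    println!("{:?}", map.definitions_for_use(0));
    println!("{:?}", map.public_definitions("x"));
}
```

Because every use is resolved at the moment it is recorded, nothing has to walk a control-flow graph afterwards, and the state left in the builder when the scope ends is exactly the set of definitions that reach the end of the scope.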

Major items left as TODOs in this PR, to be done in follow-up PRs:

1) Free/global references aren't supported yet (only lookup based on definitions in current scope), which means the override-check example doesn't currently work. This is the first thing I'll fix as follow-up to this PR.

2) Control flow outside of if statements and expressions.

3) Type narrowing.

There are also some smaller relevant changes here:

1) Eliminate `Option` in the return type of member lookups; instead always return `Type::Unbound` for a name we can't find. Also use `Type::Unbound` for modules we can't resolve (not 100% sure about this one yet.)

2) Eliminate the use of the terms "public" and "root" to refer to module-global scope or symbols. Instead consistently use the term "module-global". It's longer, but it's the clearest, and the most consistent with typical Python terminology. In particular I don't like "public" for this use because it has other implications around author intent (is an underscore-prefixed module-global symbol "public"?). And "root" is just not commonly used for this in Python.

3) Eliminate the `PublicSymbol` Salsa ingredient. Many non-module-global symbols can also be seen from other scopes (e.g. by a free var in a nested scope, or by class attribute access), and thus need to have a "public type" (that is, the type not as seen from a particular use in the control flow of the same scope, but the type as seen from some other scope.) So all symbols need to have a "public type" (here I want to keep the use of the term "public", unless someone has a better term to suggest -- since it's "public type of a symbol" and not "public symbol" the confusion with e.g. initial underscores is less of an issue.) At least initially, I would like to try not having special handling for module-global symbols vs other symbols.

4) Switch to using "definitions that reach end of scope" rather than "all definitions" in determining the public type of a symbol. I'm convinced that in general this is the right way to go. We may want to refine this further in future for some free-variable cases, but it can be changed purely by making changes to the building of the use-def map (the `public_definitions` index in it), without affecting any other code. One consequence of combining this with no control-flow support (just last-definition-wins) is that some inference tests now give more wrong-looking results; I left TODO comments on these tests to fix them when control flow is added.

And some potential areas for consideration in the future:

1) Should `symbol_ty` be a Salsa query? This would require making all symbols a Salsa ingredient, and tracking even more dependencies. But it would save some repeated reconstruction of unions, for symbols with multiple public definitions. For now I'm not making it a query, but open to changing this in future with actual perf evidence that it's better.
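
As a rough illustration of how a symbol's public type could be assembled from its public definitions, the sketch below unions the types of the definitions that reach the end of the scope and falls back to `Type::Unbound` when there are none. Only `Type::Unbound` and the name `symbol_ty` come from this PR; the `IntLiteral` and `Union` variants and the slice-based signature are assumptions chosen to keep the example self-contained.

```rust
/// Simplified stand-in for the checker's type representation.
/// `IntLiteral` and `Union` are hypothetical variants for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Unbound,
    IntLiteral(i64),
    Union(Vec<Type>),
}

/// Builds the public type from the types of a symbol's public definitions,
/// i.e. the definitions that reach the end of the scope.
fn symbol_ty(public_definition_tys: &[Type]) -> Type {
    // Deduplicate while preserving definition order.
    let mut elements: Vec<Type> = Vec::new();
    for ty in public_definition_tys {
        if !elements.contains(ty) {
            elements.push(ty.clone());
        }
    }
    match elements.len() {
        // No visible definition: report `Unbound` rather than returning `None`.
        0 => Type::Unbound,
        // A single definition needs no union wrapper.
        1 => elements.remove(0),
        _ => Type::Union(elements),
    }
}

fn main() {
    println!("{:?}", symbol_ty(&[]));
    println!("{:?}", symbol_ty(&[Type::IntLiteral(1), Type::IntLiteral(2)]));
}
```

The union is rebuilt on every call here, which is the repeated work that making `symbol_ty` a Salsa query would avoid.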
2025-09-30 22:01:47 +00:00 · 2024-07-16 11:02:30 -07:00
commit 595b1aa4a1
parent 30cef67b45
17 changed files with 1488 additions and 815 deletions
--- a/crates/red_knot_python_semantic/src/semantic_model.rs
+++ b/crates/red_knot_python_semantic/src/semantic_model.rs
@@ -4,9 +4,8 @@ use ruff_python_ast as ast;
 use ruff_python_ast::{Expr, ExpressionRef, StmtClassDef};

 use crate::semantic_index::ast_ids::HasScopedAstId;
-use crate::semantic_index::symbol::PublicSymbolId;
-use crate::semantic_index::{public_symbol, semantic_index};
-use crate::types::{infer_types, public_symbol_ty, Type};
+use crate::semantic_index::semantic_index;
+use crate::types::{definition_ty, infer_scope_types, module_global_symbol_ty_by_name, Type};
 use crate::Db;

 pub struct SemanticModel<'db> {
@@ -29,12 +28,8 @@ impl<'db> SemanticModel<'db> {
        resolve_module(self.db.upcast(), module_name)
    }

-    pub fn public_symbol(&self, module: &Module, symbol_name: &str) -> Option<PublicSymbolId<'db>> {
-        public_symbol(self.db, module.file(), symbol_name)
-    }
-
-    pub fn public_symbol_ty(&self, symbol: PublicSymbolId<'db>) -> Type {
-        public_symbol_ty(self.db, symbol)
+    pub fn module_global_symbol_ty(&self, module: &Module, symbol_name: &str) -> Type<'db> {
+        module_global_symbol_ty_by_name(self.db, module.file(), symbol_name)
    }
 }

@@ -53,7 +48,7 @@ impl HasTy for ast::ExpressionRef<'_> {
        let scope = file_scope.to_scope_id(model.db, model.file);

        let expression_id = self.scoped_ast_id(model.db, scope);
-        infer_types(model.db, scope).expression_ty(expression_id)
+        infer_scope_types(model.db, scope).expression_ty(expression_id)
    }
 }

@@ -145,11 +140,7 @@ impl HasTy for ast::StmtFunctionDef {
    fn ty<'db>(&self, model: &SemanticModel<'db>) -> Type<'db> {
        let index = semantic_index(model.db, model.file);
        let definition = index.definition(self);
-
-        let scope = definition.scope(model.db).to_scope_id(model.db, model.file);
-        let types = infer_types(model.db, scope);
-
-        types.definition_ty(definition)
+        definition_ty(model.db, definition)
    }
 }

@@ -157,11 +148,7 @@ impl HasTy for StmtClassDef {
    fn ty<'db>(&self, model: &SemanticModel<'db>) -> Type<'db> {
        let index = semantic_index(model.db, model.file);
        let definition = index.definition(self);
-
-        let scope = definition.scope(model.db).to_scope_id(model.db, model.file);
-        let types = infer_types(model.db, scope);
-
-        types.definition_ty(definition)
+        definition_ty(model.db, definition)
    }
 }

@@ -169,11 +156,7 @@ impl HasTy for ast::Alias {
    fn ty<'db>(&self, model: &SemanticModel<'db>) -> Type<'db> {
        let index = semantic_index(model.db, model.file);
        let definition = index.definition(self);
-
-        let scope = definition.scope(model.db).to_scope_id(model.db, model.file);
-        let types = infer_types(model.db, scope);
-
-        types.definition_ty(definition)
+        definition_ty(model.db, definition)
    }
 }