Bind top-most parent when importing nested module (#14946)

When importing a nested module, we were correctly creating a binding for
the top-most parent, but we were binding that to the nested module, not
to that parent module. Moreover, we weren't treating those submodules as
members of their containing parents. This PR addresses both issues, so
that nested imports work as expected.

As discussed in ~Slack~ whatever chat app I find myself in these days
😄, this requires keeping track of which modules have been imported
within the current file, so that when we resolve member access on a
module reference, we can see if that member has been imported as a
submodule. If so, we return the submodule reference immediately, instead
of checking whether the parent module's definition defines the symbol.

This is currently done in a flow insensitive manner. The `SemanticIndex`
now tracks all of the modules that are imported (via `import`, not via
`from...import`). The member access logic mentioned above currently only
considers module imports in the file containing the attribute
expression.

---------

Co-authored-by: Carl Meyer <carl@astral.sh>
This commit is contained in:
Douglas Creager 2024-12-16 16:15:40 -05:00 committed by GitHub
parent 6d72be2683
commit 4ddf9228f6
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
10 changed files with 308 additions and 28 deletions

View file

@ -25,3 +25,69 @@ reveal_type(D) # revealed: Literal[C]
```py path=b.py
class C: ...
```
## Nested
```py
import a.b
reveal_type(a.b.C) # revealed: Literal[C]
```
```py path=a/__init__.py
```
```py path=a/b.py
class C: ...
```
## Deeply nested
```py
import a.b.c
reveal_type(a.b.c.C) # revealed: Literal[C]
```
```py path=a/__init__.py
```
```py path=a/b/__init__.py
```
```py path=a/b/c.py
class C: ...
```
## Nested with rename
```py
import a.b as b
reveal_type(b.C) # revealed: Literal[C]
```
```py path=a/__init__.py
```
```py path=a/b.py
class C: ...
```
## Deeply nested with rename
```py
import a.b.c as c
reveal_type(c.C) # revealed: Literal[C]
```
```py path=a/__init__.py
```
```py path=a/b/__init__.py
```
```py path=a/b/c.py
class C: ...
```

View file

@ -0,0 +1,100 @@
# Tracking imported modules
These tests depend on how we track which modules have been imported. There are currently two
characteristics of our module tracking that can lead to inaccuracies:
- Imports are tracked on a per-file basis. At runtime, importing a submodule in one file makes that
submodule globally available via any reference to the containing package. We will flag an error
if a file tries to access a submodule without there being an import of that submodule _in that
same file_.
This is a purposeful decision, and not one we plan to change. If a module wants to re-export some
other module that it imports, there are ways to do that (tested below) that are blessed by the
typing spec and that are visible to our file-scoped import tracking.
- Imports are tracked flow-insensitively: submodule accesses are allowed and resolved if that
submodule is imported _anywhere in the file_. This handles the common case where all imports are
grouped at the top of the file, and is easiest to implement. We might revisit this decision and
track submodule imports flow-sensitively, in which case we will have to update the assertions in
some of these tests.
## Import submodule later in file
This test highlights our flow-insensitive analysis, since we access the `a.b` submodule before it
has been imported.
```py
import a
# Would be an error with flow-sensitive tracking
reveal_type(a.b.C) # revealed: Literal[C]
import a.b
```
```py path=a/__init__.py
```
```py path=a/b.py
class C: ...
```
## Rename a re-export
This test highlights how import tracking is local to each file, but specifically to the file where a
containing module is first referenced. This allows the main module to see that `q.a` contains a
submodule `b`, even though `a.b` is never imported in the main module.
```py
from q import a, b
reveal_type(b) # revealed: <module 'a.b'>
reveal_type(b.C) # revealed: Literal[C]
reveal_type(a.b) # revealed: <module 'a.b'>
reveal_type(a.b.C) # revealed: Literal[C]
```
```py path=a/__init__.py
```
```py path=a/b.py
class C: ...
```
```py path=q.py
import a as a
import a.b as b
```
## Attribute overrides submodule
Technically, either a submodule or a non-module attribute could shadow the other, depending on the
ordering of when the submodule is loaded relative to the parent module's `__init__.py` file being
evaluated. We have chosen to always have the submodule take priority. (This matches pyright's
current behavior, and opposite of mypy's current behavior.)
```py
import sub.b
import attr.b
# In the Python interpreter, `attr.b` is Literal[1]
reveal_type(sub.b) # revealed: <module 'sub.b'>
reveal_type(attr.b) # revealed: <module 'attr.b'>
```
```py path=sub/__init__.py
b = 1
```
```py path=sub/b.py
```
```py path=attr/__init__.py
from . import b as _
b = 1
```
```py path=attr/b.py
```

View file

@ -61,10 +61,8 @@ class B: ...
```py path=a/test.py
import a.b
# TODO: no diagnostic
# error: [unresolved-attribute]
def f(c: type[a.b.C]):
reveal_type(c) # revealed: @Todo(unsupported type[X] special form)
reveal_type(c) # revealed: type[C]
```
```py path=a/__init__.py

View file

@ -186,6 +186,26 @@ impl ModuleName {
self.0.push('.');
self.0.push_str(other);
}
/// Returns an iterator of this module name and all of its parent modules.
///
/// # Examples
///
/// ```
/// use red_knot_python_semantic::ModuleName;
///
/// assert_eq!(
/// ModuleName::new_static("foo.bar.baz").unwrap().ancestors().collect::<Vec<_>>(),
/// vec![
/// ModuleName::new_static("foo.bar.baz").unwrap(),
/// ModuleName::new_static("foo.bar").unwrap(),
/// ModuleName::new_static("foo").unwrap(),
/// ],
/// );
/// ```
pub fn ancestors(&self) -> impl Iterator<Item = Self> {
std::iter::successors(Some(self.clone()), Self::parent)
}
}
impl Deref for ModuleName {

View file

@ -7,7 +7,7 @@ use super::path::SearchPath;
use crate::module_name::ModuleName;
/// Representation of a Python module.
#[derive(Clone, PartialEq, Eq)]
#[derive(Clone, PartialEq, Eq, Hash)]
pub struct Module {
inner: Arc<ModuleInner>,
}
@ -61,7 +61,7 @@ impl std::fmt::Debug for Module {
}
}
#[derive(PartialEq, Eq)]
#[derive(PartialEq, Eq, Hash)]
struct ModuleInner {
name: ModuleName,
kind: ModuleKind,

View file

@ -1,13 +1,14 @@
use std::iter::FusedIterator;
use std::sync::Arc;
use rustc_hash::{FxBuildHasher, FxHashMap};
use rustc_hash::{FxBuildHasher, FxHashMap, FxHashSet};
use salsa::plumbing::AsId;
use ruff_db::files::File;
use ruff_db::parsed::parsed_module;
use ruff_index::{IndexSlice, IndexVec};
use crate::module_name::ModuleName;
use crate::semantic_index::ast_ids::node_key::ExpressionNodeKey;
use crate::semantic_index::ast_ids::AstIds;
use crate::semantic_index::builder::SemanticIndexBuilder;
@ -60,6 +61,12 @@ pub(crate) fn symbol_table<'db>(db: &'db dyn Db, scope: ScopeId<'db>) -> Arc<Sym
index.symbol_table(scope.file_scope_id(db))
}
/// Returns the set of modules that are imported anywhere in `file`.
#[salsa::tracked]
pub(crate) fn imported_modules<'db>(db: &'db dyn Db, file: File) -> Arc<FxHashSet<ModuleName>> {
semantic_index(db, file).imported_modules.clone()
}
/// Returns the use-def map for a specific `scope`.
///
/// Using [`use_def_map`] over [`semantic_index`] has the advantage that
@ -116,6 +123,9 @@ pub(crate) struct SemanticIndex<'db> {
/// changing a file invalidates all dependents.
ast_ids: IndexVec<FileScopeId, AstIds>,
/// The set of modules that are imported anywhere within this file.
imported_modules: Arc<FxHashSet<ModuleName>>,
/// Flags about the global scope (code usage impacting inference)
has_future_annotations: bool,
}

View file

@ -1,7 +1,7 @@
use std::sync::Arc;
use except_handlers::TryNodeContextStackManager;
use rustc_hash::FxHashMap;
use rustc_hash::{FxHashMap, FxHashSet};
use ruff_db::files::File;
use ruff_db::parsed::ParsedModule;
@ -12,6 +12,7 @@ use ruff_python_ast::visitor::{walk_expr, walk_pattern, walk_stmt, Visitor};
use ruff_python_ast::{BoolOp, Expr};
use crate::ast_node_ref::AstNodeRef;
use crate::module_name::ModuleName;
use crate::semantic_index::ast_ids::node_key::ExpressionNodeKey;
use crate::semantic_index::ast_ids::AstIdsBuilder;
use crate::semantic_index::definition::{
@ -79,6 +80,7 @@ pub(super) struct SemanticIndexBuilder<'db> {
scopes_by_expression: FxHashMap<ExpressionNodeKey, FileScopeId>,
definitions_by_node: FxHashMap<DefinitionNodeKey, Definition<'db>>,
expressions_by_node: FxHashMap<ExpressionNodeKey, Expression<'db>>,
imported_modules: FxHashSet<ModuleName>,
}
impl<'db> SemanticIndexBuilder<'db> {
@ -105,6 +107,8 @@ impl<'db> SemanticIndexBuilder<'db> {
scopes_by_node: FxHashMap::default(),
definitions_by_node: FxHashMap::default(),
expressions_by_node: FxHashMap::default(),
imported_modules: FxHashSet::default(),
};
builder.push_scope_with_parent(NodeWithScopeRef::Module, None);
@ -558,6 +562,7 @@ impl<'db> SemanticIndexBuilder<'db> {
scopes_by_expression: self.scopes_by_expression,
scopes_by_node: self.scopes_by_node,
use_def_maps,
imported_modules: Arc::new(self.imported_modules),
has_future_annotations: self.has_future_annotations,
}
}
@ -661,6 +666,12 @@ where
}
ast::Stmt::Import(node) => {
for alias in &node.names {
// Mark the imported module, and all of its parents, as being imported in this
// file.
if let Some(module_name) = ModuleName::new(&alias.name) {
self.imported_modules.extend(module_name.ancestors());
}
let symbol_name = if let Some(asname) = &alias.asname {
asname.id.clone()
} else {

View file

@ -14,13 +14,14 @@ pub(crate) use self::infer::{
infer_deferred_types, infer_definition_types, infer_expression_types, infer_scope_types,
};
pub(crate) use self::signatures::Signature;
use crate::module_resolver::file_to_module;
use crate::module_name::ModuleName;
use crate::module_resolver::{file_to_module, resolve_module};
use crate::semantic_index::ast_ids::HasScopedExpressionId;
use crate::semantic_index::definition::Definition;
use crate::semantic_index::symbol::{self as symbol, ScopeId, ScopedSymbolId};
use crate::semantic_index::{
global_scope, semantic_index, symbol_table, use_def_map, BindingWithConstraints,
BindingWithConstraintsIterator, DeclarationsIterator,
global_scope, imported_modules, semantic_index, symbol_table, use_def_map,
BindingWithConstraints, BindingWithConstraintsIterator, DeclarationsIterator,
};
use crate::stdlib::{
builtins_symbol, core_module_symbol, typing_extensions_symbol, CoreStdlibModule,
@ -413,7 +414,7 @@ pub enum Type<'db> {
/// A specific function object
FunctionLiteral(FunctionType<'db>),
/// A specific module object
ModuleLiteral(File),
ModuleLiteral(ModuleLiteralType<'db>),
/// A specific class object
ClassLiteral(ClassLiteralType<'db>),
// The set of all class objects that are subclasses of the given class (C), spelled `type[C]`.
@ -475,15 +476,19 @@ impl<'db> Type<'db> {
matches!(self, Type::ClassLiteral(..))
}
pub const fn into_module_literal(self) -> Option<File> {
pub fn module_literal(db: &'db dyn Db, importing_file: File, submodule: Module) -> Self {
Self::ModuleLiteral(ModuleLiteralType::new(db, importing_file, submodule))
}
pub const fn into_module_literal(self) -> Option<ModuleLiteralType<'db>> {
match self {
Type::ModuleLiteral(file) => Some(file),
Type::ModuleLiteral(module) => Some(module),
_ => None,
}
}
#[track_caller]
pub fn expect_module_literal(self) -> File {
pub fn expect_module_literal(self) -> ModuleLiteralType<'db> {
self.into_module_literal()
.expect("Expected a Type::ModuleLiteral variant")
}
@ -1418,7 +1423,7 @@ impl<'db> Type<'db> {
// TODO: attribute lookup on function type
todo_type!().into()
}
Type::ModuleLiteral(file) => {
Type::ModuleLiteral(module_ref) => {
// `__dict__` is a very special member that is never overridden by module globals;
// we should always look it up directly as an attribute on `types.ModuleType`,
// never in the global scope of the module.
@ -1428,7 +1433,30 @@ impl<'db> Type<'db> {
.member(db, "__dict__");
}
let global_lookup = symbol(db, global_scope(db, *file), name);
// If the file that originally imported the module has also imported a submodule
// named [name], then the result is (usually) that submodule, even if the module
// also defines a (non-module) symbol with that name.
//
// Note that technically, either the submodule or the non-module symbol could take
// priority, depending on the ordering of when the submodule is loaded relative to
// the parent module's `__init__.py` file being evaluated. That said, we have
// chosen to always have the submodule take priority. (This matches pyright's
// current behavior, and opposite of mypy's current behavior.)
if let Some(submodule_name) = ModuleName::new(name) {
let importing_file = module_ref.importing_file(db);
let imported_submodules = imported_modules(db, importing_file);
let mut full_submodule_name = module_ref.module(db).name().clone();
full_submodule_name.extend(&submodule_name);
if imported_submodules.contains(&full_submodule_name) {
if let Some(submodule) = resolve_module(db, &full_submodule_name) {
let submodule_ty = Type::module_literal(db, importing_file, submodule);
return Symbol::Type(submodule_ty, Boundness::Bound);
}
}
}
let global_lookup =
symbol(db, global_scope(db, module_ref.module(db).file()), name);
// If it's unbound, check if it's present as an instance on `types.ModuleType`
// or `builtins.object`.
@ -2860,6 +2888,18 @@ impl KnownFunction {
}
}
#[salsa::interned]
pub struct ModuleLiteralType<'db> {
/// The file in which this module was imported.
///
/// We need this in order to know which submodules should be attached to it as attributes
/// (because the submodules were also imported in this file).
pub importing_file: File,
/// The imported module.
pub module: Module,
}
/// Representation of a runtime class object.
///
/// Does not in itself represent a type,
@ -3501,7 +3541,8 @@ pub(crate) mod tests {
.class,
),
Ty::StdlibModule(module) => {
Type::ModuleLiteral(resolve_module(db, &module.name()).unwrap().file())
let module = resolve_module(db, &module.name()).unwrap();
Type::module_literal(db, module.file(), module)
}
Ty::SliceLiteral(start, stop, step) => Type::SliceLiteral(SliceLiteralType::new(
db,

View file

@ -79,8 +79,8 @@ impl Display for DisplayRepresentation<'_> {
// `[Type::Todo]`'s display should be explicit that is not a valid display of
// any other type
Type::Todo(todo) => write!(f, "@Todo{todo}"),
Type::ModuleLiteral(file) => {
write!(f, "<module '{:?}'>", file.path(self.db))
Type::ModuleLiteral(module) => {
write!(f, "<module '{}'>", module.module(self.db).name())
}
// TODO functions and classes should display using a fully qualified name
Type::ClassLiteral(ClassLiteralType { class }) => f.write_str(class.name(self.db)),

View file

@ -2116,22 +2116,55 @@ impl<'db> TypeInferenceBuilder<'db> {
let ast::Alias {
range: _,
name,
asname: _,
asname,
} = alias;
let module_ty = if let Some(module_name) = ModuleName::new(name) {
if let Some(module) = self.module_ty_from_name(&module_name) {
module
} else {
// The name of the module being imported
let Some(full_module_name) = ModuleName::new(name) else {
tracing::debug!("Failed to resolve import due to invalid syntax");
self.add_declaration_with_binding(
alias.into(),
definition,
Type::Unknown,
Type::Unknown,
);
return;
};
// Resolve the module being imported.
let Some(full_module_ty) = self.module_ty_from_name(&full_module_name) else {
self.diagnostics.add_unresolved_module(alias, 0, Some(name));
self.add_declaration_with_binding(
alias.into(),
definition,
Type::Unknown,
Type::Unknown,
);
return;
};
let binding_ty = if asname.is_some() {
// If we are renaming the imported module via an `as` clause, then we bind the resolved
// module's type to that name, even if that module is nested.
full_module_ty
} else if full_module_name.contains('.') {
// If there's no `as` clause and the imported module is nested, we're not going to bind
// the resolved module itself into the current scope; we're going to bind the top-most
// parent package of that module.
let topmost_parent_name =
ModuleName::new(full_module_name.components().next().unwrap()).unwrap();
if let Some(topmost_parent_ty) = self.module_ty_from_name(&topmost_parent_name) {
topmost_parent_ty
} else {
Type::Unknown
}
} else {
tracing::debug!("Failed to resolve import due to invalid syntax");
Type::Unknown
// If there's no `as` clause and the imported module isn't nested, then the imported
// module _is_ what we bind into the current scope.
full_module_ty
};
self.add_declaration_with_binding(alias.into(), definition, module_ty, module_ty);
self.add_declaration_with_binding(alias.into(), definition, binding_ty, binding_ty);
}
fn infer_import_from_statement(&mut self, import: &ast::StmtImportFrom) {
@ -2316,7 +2349,8 @@ impl<'db> TypeInferenceBuilder<'db> {
}
fn module_ty_from_name(&self, module_name: &ModuleName) -> Option<Type<'db>> {
resolve_module(self.db, module_name).map(|module| Type::ModuleLiteral(module.file()))
resolve_module(self.db, module_name)
.map(|module| Type::module_literal(self.db, self.file, module))
}
fn infer_decorator(&mut self, decorator: &ast::Decorator) -> Type<'db> {