Track "delayed" annotations in the semantic model (#5070)

## Summary

This PR tackles a corner case that we'll need to support local symbol
renaming. It relates to a nuance in how we want handle annotations
(i.e., `AnnAssign` statements with no value, like `x: int` in a function
body).

When we see a statement like:

```python
x: int
```

We create a `BindingKind::Annotation` for `x`. This is a special
`BindingKind` that the resolver isn't allowed to return. For example,
given:

```python
x: int
print(x)
```

The second line will yield an `undefined-name` error.

So why does this `BindingKind` exist at all? In Pyflakes, to support the
`unused-annotation` lint:

```python
def f():
    x: int  # unused-annotation
```

If we don't track `BindingKind::Annotation`, we can't lint for unused
variables that are only "defined" via annotations.

There are a few other wrinkles to `BindingKind::Annotation`. One is
that, if a binding already exists in the scope, we actually just discard
the `BindingKind`. So in this case:

```python
x = 1
x: int
```

When we go to create the `BindingKind::Annotation` for the second
statement, we notice that (1) we're creating an annotation but (2) the
scope already has binding for the name -- so we just drop the binding on
the floor. This has the nice property that annotations aren't considered
to "shadow" another binding, which is important in a bunch of places
(e.g., if we have `import os; os: int`, we still consider `os` to be an
import, as we should). But it also means that these "delayed"
annotations are one of the few remaining references that we don't track
anywhere in the semantic model.

This PR adds explicit support for these via a new `delayed_annotations`
attribute on the semantic model. These should be extremely rare, but we
do need to track them if we want to support local symbol renaming.

### This isn't the right way to model this

This isn't the right way to model this.

Here's an alternative:

- Remove `BindingKind::Annotation`, and treat annotations as their own,
separate concept.
- Instead of storing a map from name to `BindingId` on each `Scope`,
store a map from name to... `SymbolId`.
- Introduce a `Symbol` abstraction, where a symbol can point to a
current binding, and a list of annotations, like:

```rust
pub struct Symbol {
  binding: Option<BindingId>,
  annotations: Vec<AnnotationId>
}
```

If we did this, we could appropriately model the semantics described
above. When we go to resolve a binding, we ignore annotations (always).
When we try to find unused variables, we look through the list of
symbols, and have sufficient information to discriminate between
annotations and bound variables. Etc.

The main downside of this `Symbol`-based approach is that it's going to
take a lot more work to implement, and it'll be less performant (we'll
be storing more data per symbol, and our binding lookups will have an
added layer of indirection).
This commit is contained in:
Charlie Marsh 2023-06-14 13:54:35 -04:00 committed by GitHub
parent c992cfa76e
commit a33bbe6335
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 47 additions and 10 deletions

View file

@ -71,6 +71,29 @@ pub struct SemanticModel<'a> {
/// despite the fact that they're in different scopes.
pub shadowed_bindings: HashMap<BindingId, BindingId, BuildNoHashHasher<BindingId>>,
/// Map from binding index to indexes of bindings that annotate it (in the same scope).
///
/// For example:
/// ```python
/// x = 1
/// x: int
/// ```
///
/// In this case, the binding created by `x = 1` is annotated by the binding created by
/// `x: int`. We don't consider the latter binding to _shadow_ the former, because it doesn't
/// change the value of the binding, and so we don't store in on the scope. But we _do_ want to
/// track the annotation in some form, since it's a reference to `x`.
///
/// Note that, given:
/// ```python
/// x: int
/// ```
///
/// In this case, we _do_ store the binding created by `x: int` directly on the scope, and not
/// as a delayed annotation. Annotations are thus treated as bindings only when they are the
/// first binding in a scope; any annotations that follow are treated as "delayed" annotations.
delayed_annotations: HashMap<BindingId, Vec<BindingId>, BuildNoHashHasher<BindingId>>,
/// Body iteration; used to peek at siblings.
pub body: &'a [Stmt],
pub body_index: usize,
@ -99,6 +122,7 @@ impl<'a> SemanticModel<'a> {
references: References::default(),
globals: GlobalsArena::default(),
shadowed_bindings: IntMap::default(),
delayed_annotations: IntMap::default(),
body: &[],
body_index: 0,
flags: SemanticModelFlags::new(path),
@ -729,6 +753,19 @@ impl<'a> SemanticModel<'a> {
self.bindings[binding_id].references.push(reference_id);
}
/// Add a [`BindingId`] to the list of delayed annotations for the given [`BindingId`].
pub fn add_delayed_annotation(&mut self, binding_id: BindingId, annotation_id: BindingId) {
self.delayed_annotations
.entry(binding_id)
.or_insert_with(Vec::new)
.push(annotation_id);
}
/// Return the list of delayed annotations for the given [`BindingId`].
pub fn delayed_annotations(&self, binding_id: BindingId) -> Option<&[BindingId]> {
self.delayed_annotations.get(&binding_id).map(Vec::as_slice)
}
/// Return the [`ExecutionContext`] of the current scope.
pub const fn execution_context(&self) -> ExecutionContext {
if self.in_type_checking_block()