ruff/crates/ruff_index/src/vec.rs
Carl Meyer 595b1aa4a1
[red-knot] per-definition inference, use-def maps (#12269)
Implements definition-level type inference, with basic control flow
(only if statements and if expressions so far) in Salsa.

There are a couple key ideas here:

1) We can do type inference queries at any of three region
granularities: an entire scope, a single definition, or a single
expression. These are represented by the `InferenceRegion` enum, and the
entry points are the salsa queries `infer_scope_types`,
`infer_definition_types`, and `infer_expression_types`. Generally
per-scope will be used for scopes that we are directly checking and
per-definition will be used anytime we are looking up symbol types from
another module/scope. Per-expression should be uncommon: used only for
the RHS of an unpacking or multi-target assignment (to avoid
re-inferring the RHS once per symbol defined in the assignment) and for
test nodes in type narrowing (e.g. the `test` of an `If` node). All
three queries return a `TypeInference` with a map of types for all
definitions and expressions within their region. If you do e.g.
scope-level inference, when it hits a definition, or an
independently-inferable expression, it should use the relevant query
(which may already be cached) to get all types within the smaller
region. This avoids double-inferring smaller regions, even though larger
regions encompass smaller ones.

2) Instead of building a control-flow graph and lazily traversing it to
find definitions which reach a use of a name (which is O(n^2) in the
worst case), semantic indexing builds a use-def map, where every
use of a name knows which definitions can reach that use. We also no
longer track all definitions of a symbol in the symbol itself; instead
the use-def map also records which defs remain visible at the end of the
scope, and considers these the publicly-visible definitions of the
symbol (see below).
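
As a rough illustration of the use-def idea in (2), here is a minimal, self-contained sketch. It is not the code in this PR: `UseDefMapBuilder`, `FlowState`, `definitions_by_use` and the plain `usize` ids are hypothetical names made up for the example, and it ignores the "possibly unbound" case where a symbol is defined on only one path. It models `x = 1`, then `if flag: x = 2`, then a use of `x`.

```rust
use std::collections::HashMap;

// Hypothetical plain-integer ids, used only for this sketch.
type SymbolId = usize;
type DefId = usize;
type UseId = usize;

/// Definitions of each symbol that are live at one point in the scope.
type FlowState = HashMap<SymbolId, Vec<DefId>>;

#[derive(Default)]
struct UseDefMapBuilder {
    /// Live definitions at the point currently being visited.
    live: FlowState,
    /// For every use (indexed by `UseId`), the definitions that can reach it.
    definitions_by_use: Vec<Vec<DefId>>,
    next_def: DefId,
}

impl UseDefMapBuilder {
    /// A new definition replaces all earlier live definitions of the symbol.
    fn record_definition(&mut self, symbol: SymbolId) -> DefId {
        let def = self.next_def;
        self.next_def += 1;
        self.live.insert(symbol, vec![def]);
        def
    }

    /// A use sees exactly the definitions that are live when it is visited.
    fn record_use(&mut self, symbol: SymbolId) -> UseId {
        let reaching = self.live.get(&symbol).cloned().unwrap_or_default();
        self.definitions_by_use.push(reaching);
        self.definitions_by_use.len() - 1
    }

    /// Capture the current state before visiting a branch.
    fn snapshot(&self) -> FlowState {
        self.live.clone()
    }

    /// Join another path into the current one: live definitions are unioned.
    fn merge(&mut self, other: FlowState) {
        for (symbol, defs) in other {
            let live = self.live.entry(symbol).or_default();
            live.extend(defs);
            live.sort_unstable();
            live.dedup();
        }
    }
}

/// Returns (definitions reaching the use, definitions live at end of scope).
fn example() -> (Vec<DefId>, Vec<DefId>) {
    let x: SymbolId = 0;
    let mut builder = UseDefMapBuilder::default();

    builder.record_definition(x); // x = 1
    let before_if = builder.snapshot(); // state if the `if` body is skipped
    builder.record_definition(x); // if flag: x = 2
    builder.merge(before_if); // join both paths after the `if`
    let use_id = builder.record_use(x); // print(x)

    let reaching = builder.definitions_by_use[use_id].clone();
    let public = builder.live[&x].clone();
    (reaching, public)
}

fn main() {
    let (reaching, public) = example();
    println!("defs reaching the use: {reaching:?}");
    println!("defs visible at end of scope: {public:?}");
}
```

Each use is resolved once, while the scope is being walked, so a later type query for that use only has to look up its recorded definitions. The state left over at the end of the walk is what the text above calls the definitions that remain visible at the end of the scope.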

Major items left as TODOs in this PR, to be done in follow-up PRs:

1) Free/global references aren't supported yet (only lookup based on
definitions in current scope), which means the override-check example
doesn't currently work. This is the first thing I'll fix as follow-up to
this PR.

2) Control flow outside of if statements and expressions.

3) Type narrowing.

There are also some smaller relevant changes here:

1) Eliminate `Option` in the return type of member lookups; instead
always return `Type::Unbound` for a name we can't find. Also use
`Type::Unbound` for modules we can't resolve (not 100% sure about this
one yet).

2) Eliminate the use of the terms "public" and "root" to refer to
module-global scope or symbols. Instead consistently use the term
"module-global". It's longer, but it's the clearest, and the most
consistent with typical Python terminology. In particular I don't like
"public" for this use because it has other implications around author
intent (is an underscore-prefixed module-global symbol "public"?). And
"root" is just not commonly used for this in Python.

3) Eliminate the `PublicSymbol` Salsa ingredient. Many non-module-global
symbols can also be seen from other scopes (e.g. by a free var in a
nested scope, or by class attribute access), and thus need to have a
"public type" (that is, the type not as seen from a particular use in
the control flow of the same scope, but the type as seen from some other
scope.) So all symbols need to have a "public type" (here I want to keep
the use of the term "public", unless someone has a better term to
suggest -- since it's "public type of a symbol" and not "public symbol"
the confusion with e.g. initial underscores is less of an issue.) At
least initially, I would like to try not having special handling for
module-global symbols vs other symbols.

4) Switch to using "definitions that reach end of scope" rather than
"all definitions" in determining the public type of a symbol. I'm
convinced that in general this is the right way to go. We may want to
refine this further in future for some free-variable cases, but it can
be changed purely by making changes to the building of the use-def map
(the `public_definitions` index in it), without affecting any other
code. One consequence of combining this with no control-flow support
(just last-definition-wins) is that some inference tests now give more
wrong-looking results; I left TODO comments on these tests to fix them
when control flow is added.
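
To make (1) and (4) concrete, here is a minimal sketch of computing a public type from the definitions that reach the end of a scope. It is an illustration under assumptions, not the PR's implementation: only `Type::Unbound` comes from the description above, while the `Int`, `Str` and `Union` variants and the `public_ty` function are hypothetical simplifications.

```rust
/// Simplified stand-in for the real type representation.
/// Only `Unbound` is taken from the description above; the rest is assumed.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Unbound,
    Int,
    Str,
    Union(Vec<Type>),
}

/// Hypothetical helper: combine the inferred types of the definitions that
/// are still visible at the end of a scope into one public type.
fn public_ty(public_definition_types: &[Type]) -> Type {
    // Drop duplicates while keeping first-seen order.
    let mut members: Vec<Type> = Vec::new();
    for ty in public_definition_types {
        if !members.contains(ty) {
            members.push(ty.clone());
        }
    }
    match members.len() {
        // No definition reaches the end of the scope: no `Option`, just `Unbound`.
        0 => Type::Unbound,
        // A single (possibly repeated) type needs no union.
        1 => members.remove(0),
        // Several distinct types: the public type is their union.
        _ => Type::Union(members),
    }
}

fn main() {
    println!("{:?}", public_ty(&[]));
    println!("{:?}", public_ty(&[Type::Int, Type::Int]));
    println!("{:?}", public_ty(&[Type::Int, Type::Str]));
}
```

With last-definition-wins, the input slice only ever has zero or one entries, so the union arm would start to matter once more than one definition can reach the end of a scope.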

And some potential areas for consideration in the future:

1) Should `symbol_ty` be a Salsa query? This would require making all
symbols a Salsa ingredient, and tracking even more dependencies. But it
would save some repeated reconstruction of unions, for symbols with
multiple public definitions. For now I'm not making it a query, but open
to changing this in future with actual perf evidence that it's better.
2024-07-16 11:02:30 -07:00


use crate::slice::IndexSlice;
use crate::Idx;
use std::borrow::{Borrow, BorrowMut};
use std::fmt::{Debug, Formatter};
use std::marker::PhantomData;
use std::ops::{Deref, DerefMut, RangeBounds};

/// An owned sequence of `T` indexed by `I`
#[derive(Clone, PartialEq, Eq, Hash)]
#[repr(transparent)]
pub struct IndexVec<I, T> {
    pub raw: Vec<T>,
    index: PhantomData<I>,
}

impl<I: Idx, T> IndexVec<I, T> {
    #[inline]
    pub fn new() -> Self {
        Self {
            raw: Vec::new(),
            index: PhantomData,
        }
    }

    #[inline]
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            raw: Vec::with_capacity(capacity),
            index: PhantomData,
        }
    }

    #[inline]
    pub fn from_raw(raw: Vec<T>) -> Self {
        Self {
            raw,
            index: PhantomData,
        }
    }

    #[inline]
    pub fn drain<R: RangeBounds<usize>>(&mut self, range: R) -> impl Iterator<Item = T> + '_ {
        self.raw.drain(range)
    }

    #[inline]
    pub fn truncate(&mut self, a: usize) {
        self.raw.truncate(a);
    }

    #[inline]
    pub fn as_slice(&self) -> &IndexSlice<I, T> {
        IndexSlice::from_raw(&self.raw)
    }

    #[inline]
    pub fn as_mut_slice(&mut self) -> &mut IndexSlice<I, T> {
        IndexSlice::from_raw_mut(&mut self.raw)
    }

    #[inline]
    pub fn push(&mut self, data: T) -> I {
        let index = self.next_index();
        self.raw.push(data);
        index
    }

    #[inline]
    pub fn next_index(&self) -> I {
        I::new(self.raw.len())
    }

    #[inline]
    pub fn shrink_to_fit(&mut self) {
        self.raw.shrink_to_fit();
    }

    #[inline]
    pub fn resize(&mut self, new_len: usize, value: T)
    where
        T: Clone,
    {
        self.raw.resize(new_len, value);
    }
}

impl<I, T> Debug for IndexVec<I, T>
where
    T: Debug,
{
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        std::fmt::Debug::fmt(&self.raw, f)
    }
}

impl<I: Idx, T> Deref for IndexVec<I, T> {
    type Target = IndexSlice<I, T>;

    fn deref(&self) -> &Self::Target {
        self.as_slice()
    }
}

impl<I: Idx, T> DerefMut for IndexVec<I, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        self.as_mut_slice()
    }
}

impl<I: Idx, T> Borrow<IndexSlice<I, T>> for IndexVec<I, T> {
    fn borrow(&self) -> &IndexSlice<I, T> {
        self
    }
}

impl<I: Idx, T> BorrowMut<IndexSlice<I, T>> for IndexVec<I, T> {
    fn borrow_mut(&mut self) -> &mut IndexSlice<I, T> {
        self
    }
}

impl<I, T> Extend<T> for IndexVec<I, T> {
    #[inline]
    fn extend<Iter: IntoIterator<Item = T>>(&mut self, iter: Iter) {
        self.raw.extend(iter);
    }
}

impl<I: Idx, T> FromIterator<T> for IndexVec<I, T> {
    #[inline]
    fn from_iter<Iter: IntoIterator<Item = T>>(iter: Iter) -> Self {
        Self::from_raw(Vec::from_iter(iter))
    }
}

impl<I: Idx, T> IntoIterator for IndexVec<I, T> {
    type IntoIter = std::vec::IntoIter<T>;
    type Item = T;

    #[inline]
    fn into_iter(self) -> std::vec::IntoIter<T> {
        self.raw.into_iter()
    }
}

impl<'a, I: Idx, T> IntoIterator for &'a IndexVec<I, T> {
    type IntoIter = std::slice::Iter<'a, T>;
    type Item = &'a T;

    #[inline]
    fn into_iter(self) -> std::slice::Iter<'a, T> {
        self.iter()
    }
}

impl<'a, I: Idx, T> IntoIterator for &'a mut IndexVec<I, T> {
    type IntoIter = std::slice::IterMut<'a, T>;
    type Item = &'a mut T;

    #[inline]
    fn into_iter(self) -> std::slice::IterMut<'a, T> {
        self.iter_mut()
    }
}

impl<I: Idx, T> Default for IndexVec<I, T> {
    #[inline]
    fn default() -> Self {
        IndexVec::new()
    }
}

impl<I: Idx, T, const N: usize> From<[T; N]> for IndexVec<I, T> {
    #[inline]
    fn from(array: [T; N]) -> Self {
        IndexVec::from_raw(array.into())
    }
}

// Whether `IndexVec` is `Send` depends only on the data,
// not the phantom data.
#[allow(unsafe_code)]
unsafe impl<I: Idx, T> Send for IndexVec<I, T> where T: Send {}