[ty] AST garbage collection (#18482)

## Summary

Garbage collect ASTs once we are done checking a given file. Queries
with a cross-file dependency on the AST will reparse the file on demand.
This reduces ty's peak memory usage by ~20-30%.

The primary change in this PR is the addition of a `node_index` field to every
AST node, which is assigned by the parser. `ParsedModule` can use this to
create a flat index of AST nodes any time the file is parsed (or
reparsed). This allows `AstNodeRef` to simply index into the current
instance of the `ParsedModule` instead of storing a pointer directly.
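
A minimal sketch of that idea, using simplified stand-ins rather than the actual ty/ruff types: `Node`, `ParsedModule`, and `AstNodeRef` below only illustrate how storing an integer index (instead of a raw pointer) lets a reference be resolved against whichever `ParsedModule` instance is currently alive.

```rust
#[derive(Debug)]
struct Node {
    kind: &'static str,
    node_index: u32,
}

struct ParsedModule {
    // Flat index built by walking the AST after parsing; position == node_index.
    nodes: Vec<Node>,
}

// Stores only the index, so it remains meaningful across reparses as long as
// the same source produces the same node numbering.
#[derive(Clone, Copy)]
struct AstNodeRef {
    index: u32,
}

impl AstNodeRef {
    // Resolve against the currently live `ParsedModule` instance.
    fn node<'a>(self, module: &'a ParsedModule) -> &'a Node {
        &module.nodes[self.index as usize]
    }
}

fn main() {
    let module = ParsedModule {
        nodes: vec![
            Node { kind: "Module", node_index: 0 },
            Node { kind: "FunctionDef", node_index: 1 },
        ],
    };
    let func_ref = AstNodeRef { index: 1 };
    println!("{:?}", func_ref.node(&module));
}
```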

The indices are somewhat hackily assigned (using an atomic integer) by
the `parsed_module` query instead of by the parser directly. Assigning
the indices in source order in the (recursive) parser turns out to be
difficult, and collecting the nodes during semantic indexing is
impossible, as `SemanticIndex` does not hold onto a specific
`ParsedModuleRef`, which the pointers in the flat AST are tied to. This
means we have to do an extra AST traversal to assign and collect
the nodes into a flat index, but the small performance impact (~3% on
cold runs) seems worth it for the memory savings.
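
A rough sketch of what that extra traversal might look like, with a simplified `Node` type and a hand-rolled recursive walk rather than the actual ruff visitor machinery. The point it illustrates is why the index field needs interior mutability: indices are handed out through shared references after parsing, while the nodes are also collected into a flat vector.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

struct AtomicNodeIndex(AtomicU32);

impl AtomicNodeIndex {
    fn set(&self, value: u32) {
        self.0.store(value, Ordering::Relaxed);
    }
}

struct Node {
    node_index: AtomicNodeIndex,
    children: Vec<Node>,
}

// Walk the AST once, handing out indices in traversal order and remembering
// each node by position so later lookups are a plain slice index.
fn index_nodes<'a>(node: &'a Node, counter: &mut u32, flat: &mut Vec<&'a Node>) {
    node.node_index.set(*counter);
    *counter += 1;
    flat.push(node);
    for child in &node.children {
        index_nodes(child, counter, flat);
    }
}

fn main() {
    let root = Node {
        node_index: AtomicNodeIndex(AtomicU32::new(u32::MAX)),
        children: vec![Node {
            node_index: AtomicNodeIndex(AtomicU32::new(u32::MAX)),
            children: Vec::new(),
        }],
    };

    let mut counter = 0;
    let mut flat = Vec::new();
    index_nodes(&root, &mut counter, &mut flat);
    assert_eq!(flat.len(), 2);
}
```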

Part of https://github.com/astral-sh/ty/issues/214.
Ibraheem Ahmed 2025-06-13 08:40:11 -04:00 committed by GitHub
parent 76d9009a6e
commit c9dff5c7d5
824 changed files with 25243 additions and 804 deletions


@@ -0,0 +1,98 @@
use std::sync::atomic::{AtomicU32, Ordering};

/// An AST node that has an index.
pub trait HasNodeIndex {
    /// Returns the [`AtomicNodeIndex`] for this node.
    fn node_index(&self) -> &AtomicNodeIndex;
}

impl<T> HasNodeIndex for &T
where
    T: HasNodeIndex,
{
    fn node_index(&self) -> &AtomicNodeIndex {
        T::node_index(*self)
    }
}

/// A unique index for a node within an AST.
///
/// This type is interiorly mutable to allow assigning node indices
/// on-demand after parsing.
#[derive(Default)]
pub struct AtomicNodeIndex(AtomicU32);

impl AtomicNodeIndex {
    /// Returns a placeholder `AtomicNodeIndex`.
    pub fn dummy() -> AtomicNodeIndex {
        AtomicNodeIndex(AtomicU32::from(u32::MAX))
    }

    /// Load the current value of the `AtomicNodeIndex`.
    pub fn load(&self) -> NodeIndex {
        NodeIndex(self.0.load(Ordering::Relaxed))
    }

    /// Set the value of the `AtomicNodeIndex`.
    pub fn set(&self, value: u32) {
        self.0.store(value, Ordering::Relaxed);
    }
}

/// A unique index for a node within an AST.
#[derive(PartialEq, Eq, Debug, PartialOrd, Ord, Clone, Copy, Hash)]
pub struct NodeIndex(u32);

impl NodeIndex {
    pub fn as_usize(self) -> usize {
        self.0 as _
    }
}

impl From<u32> for AtomicNodeIndex {
    fn from(value: u32) -> Self {
        AtomicNodeIndex(AtomicU32::from(value))
    }
}

impl std::fmt::Debug for AtomicNodeIndex {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        if *self == AtomicNodeIndex::dummy() {
            f.debug_tuple("AtomicNodeIndex").finish_non_exhaustive()
        } else {
            f.debug_tuple("AtomicNodeIndex").field(&self.0).finish()
        }
    }
}

impl std::hash::Hash for AtomicNodeIndex {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        self.load().hash(state);
    }
}

impl PartialOrd for AtomicNodeIndex {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for AtomicNodeIndex {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        self.load().cmp(&other.load())
    }
}

impl Eq for AtomicNodeIndex {}

impl PartialEq for AtomicNodeIndex {
    fn eq(&self, other: &Self) -> bool {
        self.load() == other.load()
    }
}

impl Clone for AtomicNodeIndex {
    fn clone(&self) -> Self {
        Self(AtomicU32::from(self.0.load(Ordering::Relaxed)))
    }
}
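
For reference, a small usage sketch (not part of the diff) of the API above: a node starts out with the dummy placeholder value, and the post-parse traversal later assigns its real index through a shared reference, which is why the field needs interior mutability.

```rust
fn example() {
    // A freshly created index carries the placeholder (u32::MAX) value.
    let index = AtomicNodeIndex::dummy();
    assert_eq!(index.load().as_usize(), u32::MAX as usize);

    // A post-parse traversal assigns the real source-order index through a
    // shared reference; subsequent loads see the assigned value.
    index.set(42);
    assert_eq!(index.load().as_usize(), 42);
}
```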