[ty] AST garbage collection (#18482)

## Summary

Garbage collect ASTs once we are done checking a given file. Queries
with a cross-file dependency on the AST will reparse the file on demand.
This reduces ty's peak memory usage by ~20-30%.

The primary change in this PR is the addition of a `node_index` field to every
AST node, which is assigned by the parser. `ParsedModule` can use this to
create a flat index of AST nodes any time the file is parsed (or
reparsed). This allows `AstNodeRef` to simply index into the current
instance of the `ParsedModule` instead of storing a pointer directly.
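
A minimal sketch of that idea, using simplified stand-ins rather than the actual ty/ruff types: `Node`, `ParsedModule`, and `AstNodeRef` below only illustrate how storing an integer index (instead of a raw pointer) lets a reference be resolved against whichever `ParsedModule` instance is currently alive.

```rust
#[derive(Debug)]
struct Node {
    kind: &'static str,
    node_index: u32,
}

struct ParsedModule {
    // Flat index built by walking the AST after parsing; position == node_index.
    nodes: Vec<Node>,
}

// Stores only the index, so it remains meaningful across reparses as long as
// the same source produces the same node numbering.
#[derive(Clone, Copy)]
struct AstNodeRef {
    index: u32,
}

impl AstNodeRef {
    // Resolve against the currently live `ParsedModule` instance.
    fn node<'a>(self, module: &'a ParsedModule) -> &'a Node {
        &module.nodes[self.index as usize]
    }
}

fn main() {
    let module = ParsedModule {
        nodes: vec![
            Node { kind: "Module", node_index: 0 },
            Node { kind: "FunctionDef", node_index: 1 },
        ],
    };
    let func_ref = AstNodeRef { index: 1 };
    println!("{:?}", func_ref.node(&module));
}
```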

The indices are somewhat hackily assigned (using an atomic integer) by
the `parsed_module` query instead of by the parser directly. Assigning
the indices in source order in the (recursive) parser turns out to be
difficult, and collecting the nodes during semantic indexing is
impossible, as `SemanticIndex` does not hold onto a specific
`ParsedModuleRef`, which the pointers in the flat AST are tied to. This
means we have to do an extra AST traversal to assign and collect
the nodes into a flat index, but the small performance impact (~3% on
cold runs) seems worth it for the memory savings.
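
A rough sketch of what that extra traversal might look like, with a simplified `Node` type and a hand-rolled recursive walk rather than the actual ruff visitor machinery. The point it illustrates is why the index field needs interior mutability: indices are handed out through shared references after parsing, while the nodes are also collected into a flat vector.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

struct AtomicNodeIndex(AtomicU32);

impl AtomicNodeIndex {
    fn set(&self, value: u32) {
        self.0.store(value, Ordering::Relaxed);
    }
}

struct Node {
    node_index: AtomicNodeIndex,
    children: Vec<Node>,
}

// Walk the AST once, handing out indices in traversal order and remembering
// each node by position so later lookups are a plain slice index.
fn index_nodes<'a>(node: &'a Node, counter: &mut u32, flat: &mut Vec<&'a Node>) {
    node.node_index.set(*counter);
    *counter += 1;
    flat.push(node);
    for child in &node.children {
        index_nodes(child, counter, flat);
    }
}

fn main() {
    let root = Node {
        node_index: AtomicNodeIndex(AtomicU32::new(u32::MAX)),
        children: vec![Node {
            node_index: AtomicNodeIndex(AtomicU32::new(u32::MAX)),
            children: Vec::new(),
        }],
    };

    let mut counter = 0;
    let mut flat = Vec::new();
    index_nodes(&root, &mut counter, &mut flat);
    assert_eq!(flat.len(), 2);
}
```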

Part of https://github.com/astral-sh/ty/issues/214.
Ibraheem Ahmed 2025-06-13 08:40:11 -04:00 committed by GitHub
parent 76d9009a6e
commit c9dff5c7d5
824 changed files with 25243 additions and 804 deletions


@@ -0,0 +1,98 @@
use std::sync::atomic::{AtomicU32, Ordering};

/// An AST node that has an index.
pub trait HasNodeIndex {
    /// Returns the [`AtomicNodeIndex`] for this node.
    fn node_index(&self) -> &AtomicNodeIndex;
}

impl<T> HasNodeIndex for &T
where
    T: HasNodeIndex,
{
    fn node_index(&self) -> &AtomicNodeIndex {
        T::node_index(*self)
    }
}

/// A unique index for a node within an AST.
///
/// This type is interiorly mutable to allow assigning node indices
/// on-demand after parsing.
#[derive(Default)]
pub struct AtomicNodeIndex(AtomicU32);

impl AtomicNodeIndex {
    /// Returns a placeholder `AtomicNodeIndex`.
    pub fn dummy() -> AtomicNodeIndex {
        AtomicNodeIndex(AtomicU32::from(u32::MAX))
    }

    /// Load the current value of the `AtomicNodeIndex`.
    pub fn load(&self) -> NodeIndex {
        NodeIndex(self.0.load(Ordering::Relaxed))
    }

    /// Set the value of the `AtomicNodeIndex`.
    pub fn set(&self, value: u32) {
        self.0.store(value, Ordering::Relaxed);
    }
}

/// A unique index for a node within an AST.
#[derive(PartialEq, Eq, Debug, PartialOrd, Ord, Clone, Copy, Hash)]
pub struct NodeIndex(u32);

impl NodeIndex {
    pub fn as_usize(self) -> usize {
        self.0 as _
    }
}

impl From<u32> for AtomicNodeIndex {
    fn from(value: u32) -> Self {
        AtomicNodeIndex(AtomicU32::from(value))
    }
}

impl std::fmt::Debug for AtomicNodeIndex {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        if *self == AtomicNodeIndex::dummy() {
            f.debug_tuple("AtomicNodeIndex").finish_non_exhaustive()
        } else {
            f.debug_tuple("AtomicNodeIndex").field(&self.0).finish()
        }
    }
}

impl std::hash::Hash for AtomicNodeIndex {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        self.load().hash(state);
    }
}

impl PartialOrd for AtomicNodeIndex {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for AtomicNodeIndex {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        self.load().cmp(&other.load())
    }
}

impl Eq for AtomicNodeIndex {}

impl PartialEq for AtomicNodeIndex {
    fn eq(&self, other: &Self) -> bool {
        self.load() == other.load()
    }
}

impl Clone for AtomicNodeIndex {
    fn clone(&self) -> Self {
        Self(AtomicU32::from(self.0.load(Ordering::Relaxed)))
    }
}
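
For reference, a small usage sketch (not part of the diff) of the API above: a node starts out with the dummy placeholder value, and the post-parse traversal later assigns its real index through a shared reference, which is why the field needs interior mutability.

```rust
fn example() {
    // A freshly created index carries the placeholder (u32::MAX) value.
    let index = AtomicNodeIndex::dummy();
    assert_eq!(index.load().as_usize(), u32::MAX as usize);

    // A post-parse traversal assigns the real source-order index through a
    // shared reference; subsequent loads see the assigned value.
    index.set(42);
    assert_eq!(index.load().as_usize(), 42);
}
```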