uv-resolver: document some of our intermediate data structures

There are some key invariants that I had to re-learn by reading the
code. This hopefully makes those invariants easier to discover by future
me (and others).
This commit is contained in:
Andrew Gallant 2024-06-15 09:38:30 -04:00 committed by Andrew Gallant
parent 147aec4d37
commit 07db2b167f

View file

@ -1854,17 +1854,30 @@ enum Response {
},
}
/// An enum used by [`DependencyProvider`] that holds information about package dependencies.
/// For each [Package] there is a set of versions allowed as a dependency.
/// Information about the dependencies for a particular package.
///
/// This effectively distills the dependency metadata of a package down into
/// its pubgrub specific constituent parts: each dependency package has a range
/// of possible versions.
#[derive(Clone)]
enum Dependencies {
/// Package dependencies are not available.
Unavailable(UnavailableVersion),
/// Container for all available package versions.
///
/// Note that in universal mode, it is possible and allowed for multiple
/// `PubGrubPackage` values in this list to have the same package name.
/// These conflicts are resolved via `Dependencies::fork`.
Available(Vec<(PubGrubPackage, Range<Version>)>),
}
impl Dependencies {
/// Turn this flat list of dependencies into a potential set of forked
/// groups of dependencies.
///
/// A fork *only* occurs when there are multiple dependencies with the same
/// name *and* those dependency specifications have corresponding marker
/// expressions that are completely disjoint with one another.
fn fork(self) -> ForkedDependencies {
use std::collections::hash_map::Entry;
@ -1991,11 +2004,19 @@ impl Dependencies {
}
}
/// Information about the (possibly forked) dependencies for a particular
/// package.
///
/// This is like `Dependencies` but with an extra variant that only occurs when
/// a `Dependencies` list has multiple dependency specifications with the same
/// name and non-overlapping marker expressions (i.e., a fork occurs).
#[derive(Debug)]
enum ForkedDependencies {
/// Package dependencies are not available.
Unavailable(UnavailableVersion),
/// No forking occurred.
///
/// This is the same as `Dependencies::Available`.
Unforked(Vec<(PubGrubPackage, Range<Version>)>),
/// Forked containers for all available package versions.
///
@ -2005,14 +2026,48 @@ enum ForkedDependencies {
Forked(Vec<Fork>),
}
/// A single fork in a list of dependencies.
///
/// A fork corresponds to the full list of dependencies for a package,
/// but with any conflicting dependency specifications omitted. For
/// example, if we have `a<2 ; sys_platform == 'foo'` and `a>=2 ;
/// sys_platform == 'bar'`, then because the dependency specifications
/// have the same name and because the marker expressions are disjoint,
/// a fork occurs. One fork will contain `a<2` but not `a>=2`, while
/// the other fork will contain `a>=2` but not `a<2`.
#[derive(Clone, Debug)]
struct Fork {
/// The list of dependencies for this fork, guaranteed to be conflict
/// free. (i.e., There are no two packages with the same name with
/// non-overlapping marker expressions.)
dependencies: Vec<(PubGrubPackage, Range<Version>)>,
}
/// Intermediate state that represents a *possible* grouping of forks
/// for one package name.
///
/// This accumulates state while examining a `Dependencies` list. In
/// particular, it accumulates conflicting dependency specifications and marker
/// expressions. As soon as a fork can be ruled out, this state is switched to
/// `NoForkPossible`. If, at the end of visiting all `Dependencies`, we still
/// have `PossibleForks::PossiblyForking`, then a fork exists if and only if
/// one of its groups has length bigger than `1`.
///
/// One common way for a fork to be known to be impossible is if there exists
/// conflicting dependency specifications where at least one is unconditional.
/// For example, `a<2` and `a>=2 ; sys_platform == 'foo'`. In this case, `a<2`
/// has a marker expression that is always true and thus never disjoint with
/// any other marker expression. Therefore, there can be no fork for `a`.
///
/// Note that we use indices into a `Dependencies` list to represent packages.
/// This avoids excessive cloning.
#[derive(Debug)]
enum PossibleForks<'a> {
/// A group of dependencies (all with the same package name) where it is
/// known that no forks exist.
NoForkPossible(Vec<usize>),
/// A group of groups dependencies (all with the same package name) where
/// it is possible for each group to correspond to a fork.
PossiblyForking(PossibleForkGroups<'a>),
}
@ -2064,8 +2119,11 @@ impl<'a> PossibleForks<'a> {
}
}
/// A list of groups of dependencies (all with the same package name), where
/// each group may correspond to a fork.
#[derive(Debug)]
struct PossibleForkGroups<'a> {
/// The list of forks.
forks: Vec<PossibleFork<'a>>,
}
@ -2083,6 +2141,20 @@ impl<'a> PossibleForkGroups<'a> {
}
}
/// Intermediate state representing a single possible fork.
///
/// The key invariant here is that, beyond a singleton fork, for all packages
/// in this fork, its marker expression must be overlapping with at least one
/// other package's marker expression. That is, when considering whether a
/// dependency specification with a conflicting package name provokes a fork
/// or not, one must look at the existing possible groups of forks. If any of
/// those groups have a package with an overlapping marker expression, then
/// the conflicting package name cannot possibly introduce a new fork. But if
/// there is no existing fork with an overlapping marker expression, then the
/// conflict provokes a new fork.
///
/// As with other intermediate data types, we use indices into a list of
/// `Dependencies` to represent packages to avoid excessive cloning.
#[derive(Debug)]
struct PossibleFork<'a> {
packages: Vec<(usize, &'a MarkerTree)>,