uv/crates/pep440-rs
Ibraheem Ahmed ffd18cc75d
Implement marker trees using algebraic decision diagrams (#5898)
## Summary

This PR rewrites the `MarkerTree` type to use algebraic decision
diagrams (ADD). This has many benefits:
- The diagram is canonical for a given marker function. It is impossible
to create two functionally equivalent marker trees that don't refer to
the same underlying ADD. This also means that any trivially true or
unsatisfiable markers are represented by the same constants.
- The diagram can handle complex operations (conjunction/disjunction) in
polynomial time, as well as constant-time negation.
- The diagram can be converted to a simplified DNF form for user-facing
output.

The new representation gives us a lot more confidence in our marker
operations and simplification, which is proving to be very important
(see https://github.com/astral-sh/uv/pull/5733 and
https://github.com/astral-sh/uv/pull/5163).

Unfortunately, it is not easy to split this PR into multiple commits
because it is a large rewrite of the `marker` module. I'd suggest
reading through the `marker/algebra.rs`, `marker/simplify.rs`, and
`marker/tree.rs` files for the new implementation, as well as the
updated snapshots to verify how the new simplification rules work in
practice. However, a few other things were changed:
- [We now use release-only comparisons for `python_full_version`, where
we previously only did for
`python_version`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/algebra.rs#L522).
I'm unsure how marker operations should work in the presence of
pre-release versions if we decide that this is incorrect.
- [Meaningless marker expressions are now
ignored](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/parse.rs#L502).
This means that a marker such as `'x' == 'x'` will always evaluate to
`true` (as if the expression did not exist), whereas we previously
treated this as always `false`. It's negation however, remains `false`.
- [Unsatisfiable markers are written as `python_version <
'0'`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/tree.rs#L1329).
- The `PubGrubSpecifier` type has been moved to the new `uv-pubgrub`
crate, shared by `pep508-rs` and `uv-resolver`. `pep508-rs` also depends
on the `pubgrub` crate for the `Range` type, we probably want to move
`pubgrub::Range` into a separate crate to break this, but I don't think
that should block this PR (cc @konstin).

There is still some remaining work here that I decided to leave for now
for the sake of unblocking some of the related work on the resolver.
- We still use `Option<MarkerTree>` throughout uv, which is unnecessary
now that `MarkerTree::TRUE` is canonical.
- The `MarkerTree` type is now interned globally and can potentially
implement `Copy`. However, it's unclear if we want to add more
information to marker trees that would make it `!Copy`. For example, we
may wish to attach extra and requires-python environment information to
avoid simplifying after construction.
- We don't currently combine `python_full_version` and `python_version`
markers.
- I also have not spent too much time investigating performance and
there is probably some low-hanging fruit. Many of the test cases I did
run actually saw large performance improvements due to the markers being
simplified internally, reducing the stress on the old `normalize`
routine, especially for the extremely large markers seen in
`transformers` and other projects.

Resolves https://github.com/astral-sh/uv/issues/5660,
https://github.com/astral-sh/uv/issues/5179.
2024-08-09 13:40:02 -04:00
..
python Use prettier to format the documentation (#5708) 2024-08-02 08:58:31 -05:00
src Implement marker trees using algebraic decision diagrams (#5898) 2024-08-09 13:40:02 -04:00
test Extend Ruff configuration to sort imports (#5528) 2024-07-28 21:49:28 +00:00
Cargo.lock Copy over pep440-rs crate (#30) 2023-10-06 20:11:52 -04:00
Cargo.toml Upgrade to Rust 1.80.0 (#5472) 2024-07-27 01:49:47 +00:00
CHANGELOG.md Use prettier to format the documentation (#5708) 2024-08-02 08:58:31 -05:00
License-Apache Copy over pep440-rs crate (#30) 2023-10-06 20:11:52 -04:00
License-BSD Copy over pep440-rs crate (#30) 2023-10-06 20:11:52 -04:00
Readme.md Use prettier to format the documentation (#5708) 2024-08-02 08:58:31 -05:00

PEP440 in rust

Crates.io PyPI

A library for python version numbers and specifiers, implementing PEP 440. See Reimplementing PEP 440 for some background.

Higher level bindings to the requirements syntax are available in pep508_rs.

use std::str::FromStr;
use pep440_rs::{parse_version_specifiers, Version, VersionSpecifier};

let version = Version::from_str("1.19").unwrap();
let version_specifier = VersionSpecifier::from_str("==1.*").unwrap();
assert!(version_specifier.contains(&version));
let version_specifiers = parse_version_specifiers(">=1.16, <2.0").unwrap();
assert!(version_specifiers.contains(&version));

In python (pip install pep440_rs):

from pep440_rs import Version, VersionSpecifier

assert Version("1.1a1").any_prerelease()
assert Version("1.1.dev2").any_prerelease()
assert not Version("1.1").any_prerelease()
assert VersionSpecifier(">=1.0").contains(Version("1.1a1"))
assert not VersionSpecifier(">=1.1").contains(Version("1.1a1"))
# Note that python comparisons are the version ordering, not the version specifiers operators
assert Version("1.1") >= Version("1.1a1")
assert Version("2.0") in VersionSpecifier("==2")

PEP 440 has a lot of unintuitive features, including:

  • An epoch that you can prefix the version which, e.g. 1!1.2.3. Lower epoch always means lower version (1.0 <=2!0.1)
  • post versions, which can be attached to both stable releases and pre-releases
  • dev versions, which can be attached to sbpth table releases and pre-releases. When attached to a pre-release the dev version is ordered just below the normal pre-release, however when attached to a stable version, the dev version is sorted before a pre-releases
  • pre-release handling is a mess: "Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers, unless they are already present on the system, explicitly requested by the user, or if the only available version that satisfies the version specifier is a pre-release.". This means that we can't say whether a specifier matches without also looking at the environment
  • pre-release vs. pre-release incl. dev is fuzzy
  • local versions on top of all the others, which are added with a + and have implicitly typed string and number segments
  • no semver-caret (^), but a pseudo-semver tilde (~=)
  • ordering contradicts matching: We have e.g. 1.0+local > 1.0 when sorting, but ==1.0 matches 1.0+local. While the ordering of versions itself is a total order the version matching needs to catch all sorts of special cases