uv/crates/pep440-rs
Andrew Gallant 6c98ae9d77
pep440: rewrite the parser and make version comparisons cheaper (#789)
This PR builds on #780 by making both version parsing faster, and
perhaps more importantly, making version comparisons much faster.
Overall, these changes result in a considerable improvement for the
`boto3.in` workload. Here's the status quo:

```
$ time puffin pip-compile --no-build --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/requirements/boto3.in
Resolved 31 packages in 34.56s

real    34.579
user    34.004
sys     0.413
maxmem  2867 MB
faults  0
```

And now with this PR:

```
$ time puffin pip-compile --no-build --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/requirements/boto3.in
Resolved 31 packages in 9.20s

real    9.218
user    8.919
sys     0.165
maxmem  463 MB
faults  0
```

This particular workload gets stuck in pubgrub doing resolution, and
thus benefits mightily from a faster `Version::cmp` routine. With that
said, this change does also help a fair bit with "normal" runs:

```
$ hyperfine -w10 \
    "puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in" \
    "puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in"
Benchmark 1: puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in
  Time (mean ± σ):     337.5 ms ±   3.9 ms    [User: 310.5 ms, System: 73.2 ms]
  Range (min … max):   333.6 ms … 343.4 ms    10 runs

Benchmark 2: puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in
  Time (mean ± σ):     189.8 ms ±   3.0 ms    [User: 168.1 ms, System: 78.4 ms]
  Range (min … max):   185.0 ms … 196.2 ms    15 runs

Summary
  puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in ran
    1.78 ± 0.03 times faster than puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in
```

There is perhaps some future work here (detailed in the commit
messages), but I suspect it would be more fruitful to explore ways of
making resolution itself and/or deserialization faster.

Fixes #373, Closes #396
2024-01-05 11:57:32 -05:00
..
python Unify python interpreter abstractions (#178) 2023-10-25 20:11:36 +00:00
src pep440: rewrite the parser and make version comparisons cheaper (#789) 2024-01-05 11:57:32 -05:00
test Copy over pep440-rs crate (#30) 2023-10-06 20:11:52 -04:00
Cargo.lock Copy over pep440-rs crate (#30) 2023-10-06 20:11:52 -04:00
Cargo.toml Speed up version parsing for a 1.27±0.03 speedup in transformers-extras with conservative changes (#660) 2023-12-15 14:03:35 -05:00
CHANGELOG.md Enable release builds via cargo-dist (#79) 2023-10-09 20:48:55 +00:00
License-Apache Copy over pep440-rs crate (#30) 2023-10-06 20:11:52 -04:00
License-BSD Copy over pep440-rs crate (#30) 2023-10-06 20:11:52 -04:00
README.md Enable release builds via cargo-dist (#79) 2023-10-09 20:48:55 +00:00

PEP440 in rust

Crates.io PyPI

A library for python version numbers and specifiers, implementing PEP 440. See Reimplementing PEP 440 for some background.

Higher level bindings to the requirements syntax are available in pep508_rs.

use std::str::FromStr;
use pep440_rs::{parse_version_specifiers, Version, VersionSpecifier};

let version = Version::from_str("1.19").unwrap();
let version_specifier = VersionSpecifier::from_str("==1.*").unwrap();
assert!(version_specifier.contains(&version));
let version_specifiers = parse_version_specifiers(">=1.16, <2.0").unwrap();
assert!(version_specifiers.iter().all(|specifier| specifier.contains(&version)));

In python (pip install pep440_rs):

from pep440_rs import Version, VersionSpecifier

assert Version("1.1a1").any_prerelease()
assert Version("1.1.dev2").any_prerelease()
assert not Version("1.1").any_prerelease()
assert VersionSpecifier(">=1.0").contains(Version("1.1a1"))
assert not VersionSpecifier(">=1.1").contains(Version("1.1a1"))
# Note that python comparisons are the version ordering, not the version specifiers operators
assert Version("1.1") >= Version("1.1a1")
assert Version("2.0") in VersionSpecifier("==2")

PEP 440 has a lot of unintuitive features, including:

  • An epoch that you can prefix the version which, e.g. 1!1.2.3. Lower epoch always means lower version (1.0 <=2!0.1)
  • post versions, which can be attached to both stable releases and prereleases
  • dev versions, which can be attached to sbpth table releases and prereleases. When attached to a prerelease the dev version is ordered just below the normal prerelease, however when attached to a stable version, the dev version is sorted before a prereleases
  • prerelease handling is a mess: "Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers, unless they are already present on the system, explicitly requested by the user, or if the only available version that satisfies the version specifier is a pre-release.". This means that we can't say whether a specifier matches without also looking at the environment
  • prelease vs. prerelease incl. dev is fuzzy
  • local versions on top of all the others, which are added with a + and have implicitly typed string and number segments
  • no semver-caret (^), but a pseudo-semver tilde (~=)
  • ordering contradicts matching: We have e.g. 1.0+local > 1.0 when sorting, but ==1.0 matches 1.0+local. While the ordering of versions itself is a total order the version matching needs to catch all sorts of special cases