pep440: rewrite the parser and make version comparisons cheaper (#789)

This PR builds on #780 by making version parsing faster and,
perhaps more importantly, making version comparisons much faster.
Overall, these changes result in a considerable improvement for the
`boto3.in` workload. Here's the status quo:

```
$ time puffin pip-compile --no-build --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/requirements/boto3.in
Resolved 31 packages in 34.56s

real    34.579
user    34.004
sys     0.413
maxmem  2867 MB
faults  0
```

And now with this PR:

```
$ time puffin pip-compile --no-build --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/requirements/boto3.in
Resolved 31 packages in 9.20s

real    9.218
user    8.919
sys     0.165
maxmem  463 MB
faults  0
```

This particular workload gets stuck in pubgrub doing resolution, and
thus benefits mightily from a faster `Version::cmp` routine. With that
said, this change does also help a fair bit with "normal" runs:

```
$ hyperfine -w10 \
    "puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in" \
    "puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in"
Benchmark 1: puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in
  Time (mean ± σ):     337.5 ms ±   3.9 ms    [User: 310.5 ms, System: 73.2 ms]
  Range (min … max):   333.6 ms … 343.4 ms    10 runs

Benchmark 2: puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in
  Time (mean ± σ):     189.8 ms ±   3.0 ms    [User: 168.1 ms, System: 78.4 ms]
  Range (min … max):   185.0 ms … 196.2 ms    15 runs

Summary
  puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in ran
    1.78 ± 0.03 times faster than puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in
```
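To illustrate why a cheaper `Version::cmp` can matter so much when a resolver compares versions in a hot loop, here is a minimal sketch of one possible technique: packing short release numbers into a single integer so that the common case is one integer comparison. This is an assumption made for illustration only, not a description of what this PR implements; the `Version` enum, its `Small`/`Full` variants, and the `cmp_padded` helper are all hypothetical names, and the sketch ignores epochs and pre/post/dev/local segments.

```rust
use std::cmp::Ordering;

/// Hypothetical version type (illustration only, not the type from this PR).
#[derive(Debug, Clone)]
enum Version {
    /// Up to four release segments, each below 2^16, packed into one
    /// integer with the first segment in the most significant bits.
    Small(u64),
    /// Fallback for anything that does not fit the packed form.
    Full(Vec<u64>),
}

impl Version {
    fn new(release: &[u64]) -> Version {
        let fits = release.len() <= 4
            && release.iter().all(|&n| n <= u64::from(u16::MAX));
        if !fits {
            return Version::Full(release.to_vec());
        }
        let mut packed = 0u64;
        for i in 0..4 {
            // Missing segments are padded with zero, so 1.0 and 1.0.0 pack equally.
            let segment = release.get(i).copied().unwrap_or(0);
            packed = (packed << 16) | segment;
        }
        Version::Small(packed)
    }

    /// Unpacks the release segments; only needed on the slow path.
    fn release(&self) -> Vec<u64> {
        match self {
            Version::Small(p) => (0..4u32)
                .rev()
                .map(|i| (*p >> (16 * i)) & 0xFFFF)
                .collect(),
            Version::Full(v) => v.clone(),
        }
    }
}

/// Segment-by-segment comparison, treating missing segments as zero.
fn cmp_padded(a: &[u64], b: &[u64]) -> Ordering {
    for i in 0..a.len().max(b.len()) {
        let x = a.get(i).copied().unwrap_or(0);
        let y = b.get(i).copied().unwrap_or(0);
        match x.cmp(&y) {
            Ordering::Equal => continue,
            unequal => return unequal,
        }
    }
    Ordering::Equal
}

impl Ord for Version {
    fn cmp(&self, other: &Self) -> Ordering {
        match (self, other) {
            // Fast path: a single integer comparison, no allocation.
            (Version::Small(a), Version::Small(b)) => a.cmp(b),
            // Slow path: fall back to comparing the segments one by one.
            _ => cmp_padded(&self.release(), &other.release()),
        }
    }
}

impl PartialOrd for Version {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Version {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Version {}
```

With this layout, comparing `Version::new(&[1, 2, 3])` against `Version::new(&[1, 10])` takes the fast path, while a version with a segment such as `70000` falls back to the segment-wise comparison.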

There is perhaps some future work here (detailed in the commit
messages), but I suspect it would be more fruitful to explore ways of
making resolution itself and/or deserialization faster.

Fixes #373, Closes #396
Andrew Gallant 2024-01-05 11:57:32 -05:00 committed by GitHub
parent 74777c01ea
commit 6c98ae9d77
10 changed files with 2103 additions and 487 deletions

@@ -701,7 +701,7 @@ fn compile_python_invalid_version() -> Result<()> {
 ----- stdout -----
 ----- stderr -----
-error: invalid value '3.7.x' for '--python-version <PYTHON_VERSION>': Version `3.7.x` doesn't match PEP 440 rules
+error: invalid value '3.7.x' for '--python-version <PYTHON_VERSION>': after parsing 3.7, found ".x" after it, which is not part of a valid version
 For more information, try '--help'.
 "###);

@@ -49,7 +49,7 @@ fn invalid_requirement() -> Result<()> {
 ----- stderr -----
 error: Failed to parse `flask==1.0.x`
-Caused by: Version `1.0.x` doesn't match PEP 440 rules
+Caused by: after parsing 1.0, found ".x" after it, which is not part of a valid version
 flask==1.0.x
 ^^^^^^^
 "###);
@@ -96,7 +96,7 @@ fn invalid_requirements_txt_requirement() -> Result<()> {
 ----- stderr -----
 error: Couldn't parse requirement in requirements.txt position 0 to 12
-Caused by: Version `1.0.x` doesn't match PEP 440 rules
+Caused by: after parsing 1.0, found ".x" after it, which is not part of a valid version
 flask==1.0.x
 ^^^^^^^
 "###);
@@ -210,7 +210,7 @@ dependencies = ["flask==1.0.x"]
 |
 3 | dependencies = ["flask==1.0.x"]
 | ^^^^^^^^^^^^^^^^
-Version `1.0.x` doesn't match PEP 440 rules
+after parsing 1.0, found ".x" after it, which is not part of a valid version
 flask==1.0.x
 ^^^^^^^
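
The new error text in the snapshots above reports how much of the input was accepted and what was left over. As a rough sketch of how a hand-written parser could produce a message of that shape, the following handles only dot-separated release numbers (no epochs, pre/post/dev releases or local segments). The `ParseError` type and the `parse_release` function are hypothetical names chosen for this example and are not taken from the PR.

```rust
use std::fmt;

/// Hypothetical error type (illustration only).
#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    /// The input did not start with a number.
    NoLeadingNumber,
    /// A release segment did not fit in a `u64`.
    NumberTooBig,
    /// A valid prefix was parsed, but unparsed input remained.
    TrailingInput { parsed: String, remaining: String },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::NoLeadingNumber => write!(f, "expected version to start with a number"),
            ParseError::NumberTooBig => write!(f, "version number is too big"),
            ParseError::TrailingInput { parsed, remaining } => write!(
                f,
                "after parsing {parsed}, found \"{remaining}\" after it, \
                 which is not part of a valid version"
            ),
        }
    }
}

/// Parses dot-separated release numbers such as `1.0` or `3.7.12`.
fn parse_release(input: &str) -> Result<Vec<u64>, ParseError> {
    let bytes = input.as_bytes();
    let mut release = Vec::new();
    // Byte offset just past the last fully parsed segment.
    let mut pos = 0;
    loop {
        // Every segment after the first must be preceded by a dot.
        let start = if release.is_empty() {
            pos
        } else if bytes.get(pos) == Some(&b'.') {
            pos + 1
        } else {
            break;
        };
        let mut end = start;
        while end < bytes.len() && bytes[end].is_ascii_digit() {
            end += 1;
        }
        if end == start {
            // No digits here: leave `pos` before the dot so it is reported as leftover.
            break;
        }
        let number: u64 = input[start..end]
            .parse()
            .map_err(|_| ParseError::NumberTooBig)?;
        release.push(number);
        pos = end;
    }
    if release.is_empty() {
        return Err(ParseError::NoLeadingNumber);
    }
    if pos < input.len() {
        return Err(ParseError::TrailingInput {
            parsed: input[..pos].to_string(),
            remaining: input[pos..].to_string(),
        });
    }
    Ok(release)
}
```

Calling `parse_release("1.0.x")` in this sketch yields an error whose `Display` output is `after parsing 1.0, found ".x" after it, which is not part of a valid version`, because the parser stops before the dot when no digits follow it.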