uv/crates/pep508-rs
Charlie Marsh a0420114c3
Avoid storing absolute URLs for files (#944)
## Summary

It turns out that storing an absolute URL for every file caused a
significant performance regression. This PR attempts to address the
regression with two changes.

The first is that we now store the raw string if the URL is an absolute
URL. If the URL is relative, we store the base URL alongside the raw
relative string. As such, we avoid serializing and deserializing URLs
until we need them (later on), except for the base URL.

The second is that we now use the internal `Url` crate methods for
serializing and deserializing. If you look inside `Url`, its standard
serializer and deserialization actually convert it to a string, then
parse the string. But the crate exposes some other methods for faster
serialization and deserialization (with fewer guarantees). I think this
is totally fine since the cache is entirely internal.

If we _just_ change the `Url` serialization (and no other code -- so
continue to store URLs for every file), then the regression goes down to
about 5%:

```shell
❯ python -m scripts.bench \
        --puffin-path ./target/release/main \
        --puffin-path ./target/release/relative --puffin-path ./target/release/puffin \
        scripts/requirements/home-assistant.in --benchmark resolve-warm
Benchmark 1: ./target/release/main (resolve-warm)
  Time (mean ± σ):     496.3 ms ±   4.3 ms    [User: 452.4 ms, System: 175.5 ms]
  Range (min … max):   487.3 ms … 502.4 ms    10 runs

Benchmark 2: ./target/release/relative (resolve-warm)
  Time (mean ± σ):     284.8 ms ±   2.1 ms    [User: 245.8 ms, System: 165.6 ms]
  Range (min … max):   280.3 ms … 288.0 ms    10 runs

Benchmark 3: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     300.4 ms ±   3.2 ms    [User: 255.5 ms, System: 178.1 ms]
  Range (min … max):   295.4 ms … 305.1 ms    10 runs

Summary
  './target/release/relative (resolve-warm)' ran
    1.05 ± 0.01 times faster than './target/release/puffin (resolve-warm)'
    1.74 ± 0.02 times faster than './target/release/main (resolve-warm)'
```

So I considered _just_ making that change. But 5% is kind of
borderline...

With both of these changes, the regression is down to 1-2%:

```
Benchmark 1: ./target/release/relative (resolve-warm)
  Time (mean ± σ):     282.6 ms ±   7.4 ms    [User: 244.6 ms, System: 181.3 ms]
  Range (min … max):   275.1 ms … 318.5 ms    30 runs

Benchmark 2: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     286.8 ms ±   2.2 ms    [User: 247.0 ms, System: 169.1 ms]
  Range (min … max):   282.3 ms … 290.7 ms    30 runs

Summary
  './target/release/relative (resolve-warm)' ran
    1.01 ± 0.03 times faster than './target/release/puffin (resolve-warm)'
```

It's consistently ~2%-ish, but at this point it's unclear if that's due
to the URL change or something other change between now and then.

Closes #943.
2024-01-17 09:15:21 -05:00
..
src Avoid storing absolute URLs for files (#944) 2024-01-17 09:15:21 -05:00
Cargo.lock Copy over pep508-rs crate (#31) 2023-10-06 20:12:19 -04:00
Cargo.toml Update dependencies (#794) 2024-01-05 11:40:12 -05:00
License-Apache Copy over pep508-rs crate (#31) 2023-10-06 20:12:19 -04:00
License-BSD Copy over pep508-rs crate (#31) 2023-10-06 20:12:19 -04:00
Readme.md Copy over pep508-rs crate (#31) 2023-10-06 20:12:19 -04:00

Dependency specifiers (PEP 508) in Rust

Crates.io PyPI

A library for python dependency specifiers, better known as PEP 508.

Usage

In Rust

use std::str::FromStr;
use pep508_rs::Requirement;

let marker = r#"requests [security,tests] >= 2.8.1, == 2.8.* ; python_version > "3.8""#;
let dependency_specification = Requirement::from_str(marker).unwrap();
assert_eq!(dependency_specification.name, "requests");
assert_eq!(dependency_specification.extras, Some(vec!["security".to_string(), "tests".to_string()]));

In Python

from pep508_rs import Requirement

requests = Requirement(
    'requests [security,tests] >= 2.8.1, == 2.8.* ; python_version > "3.8"'
)
assert requests.name == "requests"
assert requests.extras == ["security", "tests"]
assert [str(i) for i in requests.version_or_url] == [">= 2.8.1", "== 2.8.*"]

Python bindings are built with maturin, but you can also use the normal pip install .

Version and VersionSpecifier from pep440_rs are reexported to avoid type mismatches.

Markers

Markers allow you to install dependencies only in specific environments (python version, operating system, architecture, etc.) or when a specific feature is activated. E.g. you can say importlib-metadata ; python_version < "3.8" or itsdangerous (>=1.1.0) ; extra == 'security'. Unfortunately, the marker grammar has some oversights (e.g. https://github.com/pypa/packaging.python.org/pull/1181) and the design of comparisons (PEP 440 comparisons with lexicographic fallback) leads to confusing outcomes. This implementation tries to carefully validate everything and emit warnings whenever bogus comparisons with unintended semantics are made.

In python, warnings are by default sent to the normal python logging infrastructure:

from pep508_rs import Requirement, MarkerEnvironment

env = MarkerEnvironment.current()
assert not Requirement("numpy; extra == 'science'").evaluate_markers(env, [])
assert Requirement("numpy; extra == 'science'").evaluate_markers(env, ["science"])
assert not Requirement(
    "numpy; extra == 'science' and extra == 'arrays'"
).evaluate_markers(env, ["science"])
assert Requirement(
    "numpy; extra == 'science' or extra == 'arrays'"
).evaluate_markers(env, ["science"])
from pep508_rs import Requirement, MarkerEnvironment

env = MarkerEnvironment.current()
Requirement("numpy; python_version >= '3.9.'").evaluate_markers(env, [])
# This will log: 
# "Expected PEP 440 version to compare with python_version, found '3.9.', "
# "evaluating to false: Version `3.9.` doesn't match PEP 440 rules"