## Summary
It turns out that storing an absolute URL for every file caused a
significant performance regression. This PR attempts to address the
regression with two changes.
The first is that we now store the raw string if the URL is an absolute
URL. If the URL is relative, we store the base URL alongside the raw
relative string. As such, we avoid serializing and deserializing URLs
until we need them (later on), except for the base URL.
The second is that we now use the internal `Url` crate methods for
serializing and deserializing. If you look inside `Url`, its standard
serializer and deserialization actually convert it to a string, then
parse the string. But the crate exposes some other methods for faster
serialization and deserialization (with fewer guarantees). I think this
is totally fine since the cache is entirely internal.
If we _just_ change the `Url` serialization (and no other code -- so
continue to store URLs for every file), then the regression goes down to
about 5%:
```shell
❯ python -m scripts.bench \
--puffin-path ./target/release/main \
--puffin-path ./target/release/relative --puffin-path ./target/release/puffin \
scripts/requirements/home-assistant.in --benchmark resolve-warm
Benchmark 1: ./target/release/main (resolve-warm)
Time (mean ± σ): 496.3 ms ± 4.3 ms [User: 452.4 ms, System: 175.5 ms]
Range (min … max): 487.3 ms … 502.4 ms 10 runs
Benchmark 2: ./target/release/relative (resolve-warm)
Time (mean ± σ): 284.8 ms ± 2.1 ms [User: 245.8 ms, System: 165.6 ms]
Range (min … max): 280.3 ms … 288.0 ms 10 runs
Benchmark 3: ./target/release/puffin (resolve-warm)
Time (mean ± σ): 300.4 ms ± 3.2 ms [User: 255.5 ms, System: 178.1 ms]
Range (min … max): 295.4 ms … 305.1 ms 10 runs
Summary
'./target/release/relative (resolve-warm)' ran
1.05 ± 0.01 times faster than './target/release/puffin (resolve-warm)'
1.74 ± 0.02 times faster than './target/release/main (resolve-warm)'
```
So I considered _just_ making that change. But 5% is kind of
borderline...
With both of these changes, the regression is down to 1-2%:
```
Benchmark 1: ./target/release/relative (resolve-warm)
Time (mean ± σ): 282.6 ms ± 7.4 ms [User: 244.6 ms, System: 181.3 ms]
Range (min … max): 275.1 ms … 318.5 ms 30 runs
Benchmark 2: ./target/release/puffin (resolve-warm)
Time (mean ± σ): 286.8 ms ± 2.2 ms [User: 247.0 ms, System: 169.1 ms]
Range (min … max): 282.3 ms … 290.7 ms 30 runs
Summary
'./target/release/relative (resolve-warm)' ran
1.01 ± 0.03 times faster than './target/release/puffin (resolve-warm)'
```
It's consistently ~2%-ish, but at this point it's unclear if that's due
to the URL change or something other change between now and then.
Closes #943.
|
||
|---|---|---|
| .. | ||
| bench | ||
| cache-key | ||
| distribution-filename | ||
| distribution-types | ||
| gourgeist | ||
| install-wheel-rs | ||
| once-map | ||
| pep440-rs | ||
| pep508-rs | ||
| platform-host | ||
| platform-tags | ||
| puffin-build | ||
| puffin-cache | ||
| puffin-cli | ||
| puffin-client | ||
| puffin-dev | ||
| puffin-dispatch | ||
| puffin-distribution | ||
| puffin-extract | ||
| puffin-fs | ||
| puffin-git | ||
| puffin-installer | ||
| puffin-interpreter | ||
| puffin-normalize | ||
| puffin-resolver | ||
| puffin-traits | ||
| puffin-warnings | ||
| puffin-workspace | ||
| pypi-types | ||
| requirements-txt | ||
| README.md | ||
Crates
bench
Functionality for benchmarking Puffin.
cache-key
Generic functionality for caching paths, URLs, and other resources across platforms.
distribution-filename
Parse built distribution (wheel) and source distribution (sdist) filenames to extract structured metadata.
distribution-types
Abstractions for representing built distributions (wheels) and source distributions (sdists), and the sources from which they can be downloaded.
gourgeist
A venv replacement to create virtual environments in Rust.
install-wheel-rs
Install built distributions (wheels) into a virtual environment.]
once-map
A waitmap-like concurrent hash map for executing tasks
exactly once.
pep440-rs
Utilities for interacting with Python version numbers and specifiers.
pep508-rs
Utilities for interacting with PEP 508 dependency specifiers.
platform-host
Functionality for detecting the current platform (operating system, architecture, etc.).
platform-tags
Functionality for parsing and inferring Python platform tags as per PEP 425.
puffin-build
A PEP 517-compatible build frontend for Puffin.
puffin-cache
Functionality for caching Python packages and associated metadata.
puffin-cli
Command-line interface for the Puffin package manager.
puffin-client
Client for interacting with PyPI-compatible HTTP APIs.
puffin-dev
Development utilities for Puffin.
puffin-dispatch
A centralized struct for resolving and building source distributions in isolated environments.
Implements the traits defined in puffin-traits.
puffin-distribution
Client for interacting with built distributions (wheels) and source distributions (sdists). Capable of fetching metadata, distribution contents, etc.
puffin-extract
Utilities for extracting files from archives.
puffin-fs
Utilities for interacting with the filesystem.
puffin-git
Functionality for interacting with Git repositories.
puffin-installer
Functionality for installing Python packages into a virtual environment.
puffin-interpreter
Functionality for detecting and leveraging the current Python interpreter.
puffin-normalize
Normalize package and extra names as per Python specifications.
puffin-package
Types and functionality for working with Python packages, e.g., parsing wheel files.
puffin-resolver
Functionality for resolving Python packages and their dependencies.
puffin-traits
Shared traits for Puffin, to avoid circular dependencies.
pypi-types
General-purpose type definitions for types used in PyPI-compatible APIs.
puffin-warnings
User-facing warnings for Puffin.
requirements-txt
Functionality for parsing requirements.txt files.