uv/crates
Charlie Marsh a0420114c3
Avoid storing absolute URLs for files (#944)
## Summary

It turns out that storing an absolute URL for every file caused a
significant performance regression. This PR attempts to address the
regression with two changes.

The first is that we now store the raw string if the URL is absolute. If
the URL is relative, we store the base URL alongside the raw relative
string. As such, we avoid serializing and deserializing URLs (other than
the base URL) until we actually need them later on.
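A minimal sketch of that storage scheme, assuming a two-variant enum. `FileLocation`, `new`, and `to_absolute` are hypothetical names, and the base URL is kept as a plain `String` (rather than a parsed URL) only so the sketch needs nothing outside `std`:

```rust
/// Hypothetical storage for a file's location: raw strings are kept as-is
/// and only turned into a full URL when a caller needs one.
#[derive(Debug, Clone, PartialEq)]
pub enum FileLocation {
    /// A relative URL: (base URL of the index page, raw relative string).
    RelativeUrl(String, String),
    /// An absolute URL, stored as the raw string from the index response.
    AbsoluteUrl(String),
}

impl FileLocation {
    /// Classify a raw `href` without parsing it (naive "has a scheme" check).
    pub fn new(base: &str, raw: &str) -> Self {
        if raw.contains("://") {
            FileLocation::AbsoluteUrl(raw.to_string())
        } else {
            FileLocation::RelativeUrl(base.to_string(), raw.to_string())
        }
    }

    /// Build the absolute URL lazily. Naive join: drops anything after the
    /// last '/' of `base` and appends the relative string; it ignores `..`,
    /// '/'-rooted paths, queries, and fragments (a real version would use
    /// `url::Url::join`).
    pub fn to_absolute(&self) -> String {
        match self {
            FileLocation::AbsoluteUrl(url) => url.clone(),
            FileLocation::RelativeUrl(base, relative) => {
                let dir = match base.rfind('/') {
                    Some(idx) => &base[..=idx],
                    None => base.as_str(),
                };
                format!("{dir}{relative}")
            }
        }
    }
}
```

Nothing is parsed when an index page is read; the cost of building a URL is only paid for files that are actually fetched.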

The second is that we now use the `url` crate's internal methods for
serializing and deserializing `Url`. If you look inside `Url`, its
standard serializer actually converts it to a string, and its
deserializer then re-parses that string. But the crate exposes some
other methods for faster serialization and deserialization (with fewer
guarantees). I think this is totally fine, since the cache is entirely
internal.
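To illustrate the trade-off, here is a toy model. It is not the `url` crate's actual types or methods; `ParsedUrl` and its methods are hypothetical. The URL caches the offsets of its components. The standard-style encoding writes only the string, so reading it back requires a fresh parse. The internal-style encoding also writes the offsets, so reading it back just trusts them:

```rust
/// Hypothetical URL type: the serialized string plus cached component offsets.
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedUrl {
    serialization: String,
    scheme_end: usize,
    path_start: usize,
}

impl ParsedUrl {
    /// "Slow" path: derive the offsets by scanning the string (stand-in for a full parse).
    pub fn parse(s: &str) -> Option<Self> {
        let scheme_end = s.find("://")?;
        let after = scheme_end + 3;
        let path_start = s[after..].find('/').map_or(s.len(), |i| after + i);
        Some(Self { serialization: s.to_string(), scheme_end, path_start })
    }

    /// Standard-style encoding: only the string, so the reader must re-parse.
    pub fn to_string_form(&self) -> String {
        self.serialization.clone()
    }

    /// Internal-style encoding: the string plus the offsets.
    pub fn to_internal_form(&self) -> (String, usize, usize) {
        (self.serialization.clone(), self.scheme_end, self.path_start)
    }

    /// Trusts the stored offsets instead of re-parsing (fewer guarantees);
    /// only reasonable for data we wrote ourselves, such as an internal cache.
    pub fn from_internal_form((serialization, scheme_end, path_start): (String, usize, usize)) -> Self {
        debug_assert!(scheme_end <= path_start && path_start <= serialization.len());
        Self { serialization, scheme_end, path_start }
    }

    pub fn scheme(&self) -> &str {
        &self.serialization[..self.scheme_end]
    }

    pub fn path(&self) -> &str {
        &self.serialization[self.path_start..]
    }
}
```

Both round-trips produce the same value; the internal form simply skips the scan on the way back in, which is why it only makes sense when the writer is trusted.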

If we _just_ change the `Url` serialization (and no other code, so we
continue to store URLs for every file), then the regression drops to
about 5%:

```shell
❯ python -m scripts.bench \
        --puffin-path ./target/release/main \
        --puffin-path ./target/release/relative --puffin-path ./target/release/puffin \
        scripts/requirements/home-assistant.in --benchmark resolve-warm
Benchmark 1: ./target/release/main (resolve-warm)
  Time (mean ± σ):     496.3 ms ±   4.3 ms    [User: 452.4 ms, System: 175.5 ms]
  Range (min … max):   487.3 ms … 502.4 ms    10 runs

Benchmark 2: ./target/release/relative (resolve-warm)
  Time (mean ± σ):     284.8 ms ±   2.1 ms    [User: 245.8 ms, System: 165.6 ms]
  Range (min … max):   280.3 ms … 288.0 ms    10 runs

Benchmark 3: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     300.4 ms ±   3.2 ms    [User: 255.5 ms, System: 178.1 ms]
  Range (min … max):   295.4 ms … 305.1 ms    10 runs

Summary
  './target/release/relative (resolve-warm)' ran
    1.05 ± 0.01 times faster than './target/release/puffin (resolve-warm)'
    1.74 ± 0.02 times faster than './target/release/main (resolve-warm)'
```

So I considered _just_ making that change. But 5% is kind of
borderline...

With both of these changes, the regression is down to 1-2%:

```
Benchmark 1: ./target/release/relative (resolve-warm)
  Time (mean ± σ):     282.6 ms ±   7.4 ms    [User: 244.6 ms, System: 181.3 ms]
  Range (min … max):   275.1 ms … 318.5 ms    30 runs

Benchmark 2: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     286.8 ms ±   2.2 ms    [User: 247.0 ms, System: 169.1 ms]
  Range (min … max):   282.3 ms … 290.7 ms    30 runs

Summary
  './target/release/relative (resolve-warm)' ran
    1.01 ± 0.03 times faster than './target/release/puffin (resolve-warm)'
```
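The regression figures quoted here are the ratio of the mean `puffin` time to the mean `relative` time in each run: the `Url` serialization change alone (left) versus both changes (right):

```math
\frac{300.4\,\text{ms}}{284.8\,\text{ms}} \approx 1.055
\qquad
\frac{286.8\,\text{ms}}{282.6\,\text{ms}} \approx 1.015
```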

It's consistently around 2%, but at this point it's unclear whether that's
due to the URL change or to some other change between now and then.

Closes #943.
2024-01-17 09:15:21 -05:00
bench Use Clippy lint table over Cargo config (#490) 2023-11-22 15:10:27 +00:00
cache-key Avoid storing absolute URLs for files (#944) 2024-01-17 09:15:21 -05:00
distribution-filename Implement --find-links as flat indexes (directories in pip-compile) (#912) 2024-01-15 02:04:10 +00:00
distribution-types Avoid storing absolute URLs for files (#944) 2024-01-17 09:15:21 -05:00
gourgeist Cleanup deps and docs (#882) 2024-01-11 10:43:40 +00:00
install-wheel-rs Use tempfile to prevent install io race crashes (#929) 2024-01-16 21:07:39 +00:00
once-map Move OnceMap into its own crate (#946) 2024-01-17 04:09:15 +00:00
pep440-rs Remove PubGrubVersion (#924) 2024-01-15 08:51:12 -05:00
pep508-rs Avoid storing absolute URLs for files (#944) 2024-01-17 09:15:21 -05:00
platform-host Error when ldd is not in path (#506) 2023-11-28 05:55:04 +00:00
platform-tags Cache Tags on Interpreter (#726) 2023-12-25 13:41:10 +00:00
puffin-build Default to PEP 517-based builds (#843) 2024-01-10 01:27:06 +00:00
puffin-cache Move Puffin subcommands to a pip namespace (#921) 2024-01-15 16:36:45 +00:00
puffin-cli Share a single Index across resolutions (#906) 2024-01-16 05:37:15 +00:00
puffin-client Avoid storing absolute URLs for files (#944) 2024-01-17 09:15:21 -05:00
puffin-dev Share a single Index across resolutions (#906) 2024-01-16 05:37:15 +00:00
puffin-dispatch Share a single Index across resolutions (#906) 2024-01-16 05:37:15 +00:00
puffin-distribution Avoid storing absolute URLs for files (#944) 2024-01-17 09:15:21 -05:00
puffin-extract Use fs_err in more places (#926) 2024-01-15 09:39:33 +00:00
puffin-fs Show resource and lockfile when waiting (#715) 2023-12-21 00:05:49 +01:00
puffin-git Split puffin-cache into Puffin-specific and generic utilities (#728) 2023-12-25 14:38:56 +00:00
puffin-installer Share in-flight map across resolutions (#932) 2024-01-15 13:11:22 -05:00
puffin-interpreter Adjust markers to match target Python version (#909) 2024-01-14 15:39:15 +00:00
puffin-normalize Avoid some additional clones for PackageName (#896) 2024-01-12 17:54:40 +00:00
puffin-resolver Move OnceMap into its own crate (#946) 2024-01-17 04:09:15 +00:00
puffin-traits Move OnceMap into its own crate (#946) 2024-01-17 04:09:15 +00:00
puffin-warnings Migrate back to owo-colors (#824) 2024-01-08 08:54:57 +00:00
puffin-workspace Use Clippy lint table over Cargo config (#490) 2023-11-22 15:10:27 +00:00
pypi-types Avoid storing absolute URLs for files (#944) 2024-01-17 09:15:21 -05:00
requirements-txt Update dependencies (#794) 2024-01-05 11:40:12 -05:00
README.md Move OnceMap into its own crate (#946) 2024-01-17 04:09:15 +00:00

Crates

bench

Functionality for benchmarking Puffin.

cache-key

Generic functionality for caching paths, URLs, and other resources across platforms.

distribution-filename

Parse built distribution (wheel) and source distribution (sdist) filenames to extract structured metadata.
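As an illustration of the wheel half of this, a wheel filename has the shape `{name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` (PEP 427), and the wheel format escapes `-` inside components, so splitting on `-` is enough. The sketch below is hypothetical and is not the `distribution-filename` API; sdist filenames are not covered:

```rust
/// Components of a wheel filename.
#[derive(Debug, PartialEq)]
pub struct WheelFilename {
    pub name: String,
    pub version: String,
    pub build_tag: Option<String>,
    pub python_tag: String,
    pub abi_tag: String,
    pub platform_tag: String,
}

/// Hypothetical helper: split a wheel filename into its components.
pub fn parse_wheel_filename(filename: &str) -> Option<WheelFilename> {
    let stem = filename.strip_suffix(".whl")?;
    let parts: Vec<&str> = stem.split('-').collect();
    // Five parts without a build tag, six with one; anything else is invalid.
    let (name, version, build_tag, python_tag, abi_tag, platform_tag) = match parts.as_slice() {
        [n, v, py, abi, plat] => (*n, *v, None, *py, *abi, *plat),
        [n, v, b, py, abi, plat] => (*n, *v, Some(*b), *py, *abi, *plat),
        _ => return None,
    };
    // Reject empty components, e.g. from `foo--py3-none-any.whl`.
    if [name, version, python_tag, abi_tag, platform_tag].iter().any(|s| s.is_empty()) {
        return None;
    }
    // PEP 427: a build tag must start with a digit.
    if let Some(b) = build_tag {
        if !b.starts_with(|c: char| c.is_ascii_digit()) {
            return None;
        }
    }
    Some(WheelFilename {
        name: name.to_string(),
        version: version.to_string(),
        build_tag: build_tag.map(str::to_string),
        python_tag: python_tag.to_string(),
        abi_tag: abi_tag.to_string(),
        platform_tag: platform_tag.to_string(),
    })
}
```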

distribution-types

Abstractions for representing built distributions (wheels) and source distributions (sdists), and the sources from which they can be downloaded.

gourgeist

A `venv` replacement for creating virtual environments, written in Rust.

install-wheel-rs

Install built distributions (wheels) into a virtual environment.

once-map

A waitmap-like concurrent hash map for executing tasks exactly once.
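The idea can be sketched with `std` alone: the first caller to register a key runs the task, and every later caller for that key blocks until the result is published. This is a hypothetical, blocking illustration (`OnceMap`, `register`, `done`, `wait`, and `get_or_run` are assumed names, not necessarily the `once-map` crate's API); an async resolver would await rather than block a thread:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Condvar, Mutex};

/// Hypothetical "run each task once" map.
pub struct OnceMap<K, V> {
    // `None` = registered and in flight; `Some` = finished.
    entries: Mutex<HashMap<K, Option<V>>>,
    ready: Condvar,
}

impl<K: Eq + Hash + Clone, V: Clone> OnceMap<K, V> {
    pub fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()), ready: Condvar::new() }
    }

    /// Returns `true` if the caller is first to claim `key` and must run the task.
    pub fn register(&self, key: K) -> bool {
        let mut entries = self.entries.lock().unwrap();
        if entries.contains_key(&key) {
            false
        } else {
            entries.insert(key, None);
            true
        }
    }

    /// Publish the result for `key` and wake every waiter.
    pub fn done(&self, key: K, value: V) {
        self.entries.lock().unwrap().insert(key, Some(value));
        self.ready.notify_all();
    }

    /// Block until `key` has a result. Waits forever if `key` was never
    /// registered or if the registering task panicked.
    pub fn wait(&self, key: &K) -> V {
        let mut entries = self.entries.lock().unwrap();
        loop {
            if let Some(Some(value)) = entries.get(key) {
                return value.clone();
            }
            // Releases the lock while sleeping, so `done` can make progress.
            entries = self.ready.wait(entries).unwrap();
        }
    }

    /// Run `task` only if nobody else has claimed `key`; otherwise wait for their result.
    pub fn get_or_run(&self, key: K, task: impl FnOnce() -> V) -> V {
        if self.register(key.clone()) {
            let value = task();
            self.done(key, value.clone());
            value
        } else {
            self.wait(&key)
        }
    }
}
```

Because the "is it done?" check and the wait happen under the same mutex, a result published between the two can't be missed.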

pep440-rs

Utilities for interacting with Python version numbers and specifiers.
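PEP 440 versions can't be compared as plain strings: `1.10` is newer than `1.9`, and `1.0` equals `1.0.0` because shorter release segments are padded with zeros. A minimal sketch that compares only the release segment, ignoring epochs, pre/post/dev releases, and local versions, might look like this (`compare_release` is a hypothetical name, not the `pep440-rs` API):

```rust
use std::cmp::Ordering;

/// Hypothetical helper: compare two PEP 440 release segments such as
/// `1.10` and `1.9`. Anything other than dotted integers returns `None`.
pub fn compare_release(a: &str, b: &str) -> Option<Ordering> {
    // Parse "1.2.3" into [1, 2, 3]; any non-numeric part fails the whole parse.
    let parse = |s: &str| -> Option<Vec<u64>> {
        s.split('.').map(|part| part.parse::<u64>().ok()).collect()
    };
    let (a, b) = (parse(a)?, parse(b)?);
    // Compare component-wise, padding the shorter release with zeros.
    for i in 0..a.len().max(b.len()) {
        let x = a.get(i).copied().unwrap_or(0);
        let y = b.get(i).copied().unwrap_or(0);
        if x != y {
            return Some(x.cmp(&y));
        }
    }
    Some(Ordering::Equal)
}
```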

pep508-rs

Utilities for interacting with PEP 508 dependency specifiers.
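For a sense of the pieces a PEP 508 specifier carries (name, optional extras, version specifiers, optional environment marker), here is a deliberately simplified splitter. It is hypothetical and not the `pep508-rs` API: it does not handle `name @ url` requirements, validate the marker grammar, or enforce PEP 508's rules on where punctuation may appear in a name:

```rust
/// Hypothetical, simplified view of a PEP 508 requirement.
#[derive(Debug, PartialEq)]
pub struct Requirement {
    pub name: String,
    pub extras: Vec<String>,
    pub specifiers: String,
    pub marker: Option<String>,
}

pub fn parse_requirement(input: &str) -> Option<Requirement> {
    // Split off the environment marker, if any.
    let (spec, marker) = match input.split_once(';') {
        Some((spec, marker)) => (spec, Some(marker.trim().to_string())),
        None => (input, None),
    };
    let spec = spec.trim();
    // The name is the leading run of ASCII letters, digits, '-', '_' and '.'.
    let name_end = spec
        .find(|c: char| !(c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.')))
        .unwrap_or(spec.len());
    if name_end == 0 {
        return None;
    }
    let (name, mut rest) = spec.split_at(name_end);
    rest = rest.trim_start();
    // Optional comma-separated extras in square brackets.
    let mut extras = Vec::new();
    if let Some(after) = rest.strip_prefix('[') {
        let (inner, tail) = after.split_once(']')?;
        extras = inner.split(',').map(str::trim).filter(|e| !e.is_empty()).map(String::from).collect();
        rest = tail;
    }
    Some(Requirement {
        name: name.to_string(),
        extras,
        specifiers: rest.trim().to_string(),
        marker,
    })
}
```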

platform-host

Functionality for detecting the current platform (operating system, architecture, etc.).

platform-tags

Functionality for parsing and inferring Python platform tags as per PEP 425.
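PEP 425 allows compressed tag sets, where each of the three components may hold several dot-separated values (e.g., `py2.py3-none-any`), and the set stands for every combination. A minimal expansion sketch (`expand_tags` is a hypothetical name, not the `platform-tags` API):

```rust
/// Expand a compressed PEP 425 tag triple into every
/// (python, abi, platform) combination it represents.
pub fn expand_tags(tag: &str) -> Option<Vec<(String, String, String)>> {
    let mut parts = tag.split('-');
    let (python, abi, platform) = (parts.next()?, parts.next()?, parts.next()?);
    // Exactly three components are allowed.
    if parts.next().is_some() {
        return None;
    }
    let mut out = Vec::new();
    for py in python.split('.') {
        for a in abi.split('.') {
            for plat in platform.split('.') {
                out.push((py.to_string(), a.to_string(), plat.to_string()));
            }
        }
    }
    Some(out)
}
```

A wheel is then installable if any of its expanded tags appears in the interpreter's list of supported tags.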

puffin-build

A PEP 517-compatible build frontend for Puffin.

puffin-cache

Functionality for caching Python packages and associated metadata.

puffin-cli

Command-line interface for the Puffin package manager.

puffin-client

Client for interacting with PyPI-compatible HTTP APIs.

puffin-dev

Development utilities for Puffin.

puffin-dispatch

A centralized struct for resolving and building source distributions in isolated environments. Implements the traits defined in puffin-traits.

puffin-distribution

Client for interacting with built distributions (wheels) and source distributions (sdists). Capable of fetching metadata, distribution contents, etc.

puffin-extract

Utilities for extracting files from archives.

puffin-fs

Utilities for interacting with the filesystem.

puffin-git

Functionality for interacting with Git repositories.

puffin-installer

Functionality for installing Python packages into a virtual environment.

puffin-interpreter

Functionality for detecting and leveraging the current Python interpreter.

puffin-normalize

Normalize package and extra names as per Python specifications.
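The PEP 503 rule (which PEP 685 also adopts for extras) lowercases the name and collapses every run of `-`, `_`, and `.` into a single `-`. A std-only sketch (`normalize_name` is a hypothetical name, not the `puffin-normalize` API):

```rust
/// Normalize a package or extra name per PEP 503:
/// lowercase, with runs of '-', '_' and '.' collapsed to one '-'.
pub fn normalize_name(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut last_was_sep = false;
    for c in name.chars() {
        if matches!(c, '-' | '_' | '.') {
            // Emit a single '-' for the whole run of separators.
            if !last_was_sep {
                out.push('-');
            }
            last_was_sep = true;
        } else {
            out.extend(c.to_lowercase());
            last_was_sep = false;
        }
    }
    out
}
```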

puffin-package

Types and functionality for working with Python packages, e.g., parsing wheel files.

puffin-resolver

Functionality for resolving Python packages and their dependencies.

puffin-traits

Shared traits for Puffin, to avoid circular dependencies.
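The circularity presumably arises because resolving may require building a source distribution, while the crate that builds it (`puffin-dispatch`) in turn drives the resolver. Putting the trait in its own crate breaks the cycle: the resolver depends only on the trait, and the dispatcher implements it. A single-file sketch with modules standing in for crates; `BuildContext`, `build_source`, `resolve_sdist`, and `Dispatch` are hypothetical names and signatures:

```rust
/// Stand-in for the shared traits crate: it depends on nothing else.
pub mod traits {
    pub trait BuildContext {
        /// Build a source distribution, returning the produced wheel's filename.
        fn build_source(&self, sdist: &str) -> Result<String, String>;
    }
}

/// Stand-in for a resolver crate: it only sees the trait, never the implementation.
pub mod resolver {
    use super::traits::BuildContext;

    pub fn resolve_sdist(ctx: &impl BuildContext, sdist: &str) -> Result<String, String> {
        let wheel = ctx.build_source(sdist)?;
        Ok(format!("resolved via {wheel}"))
    }
}

/// Stand-in for a dispatch crate: it implements the trait and is free to
/// depend on the resolver, so the crate graph stays acyclic.
pub mod dispatch {
    use super::traits::BuildContext;

    pub struct Dispatch;

    impl BuildContext for Dispatch {
        fn build_source(&self, sdist: &str) -> Result<String, String> {
            // Placeholder "build": pretend every sdist yields a pure-Python wheel.
            let stem = sdist
                .strip_suffix(".tar.gz")
                .ok_or_else(|| format!("not an sdist: {sdist}"))?;
            Ok(format!("{stem}-py3-none-any.whl"))
        }
    }
}
```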

pypi-types

General-purpose type definitions for types used in PyPI-compatible APIs.

puffin-warnings

User-facing warnings for Puffin.

requirements-txt

Functionality for parsing requirements.txt files.
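Before individual requirements can be parsed, a requirements file has to be split into logical lines. A heavily simplified sketch (`requirement_lines` and its helpers are hypothetical, not the `requirements-txt` API) joins backslash continuations, drops blank lines, and strips comments. Following pip's convention, a `#` counts as a comment only at the start of a line or after whitespace, so URL fragments like `#egg=` survive. Options such as `-r` are passed through verbatim:

```rust
/// Split requirements.txt contents into logical, comment-free lines.
pub fn requirement_lines(contents: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut pending = String::new();
    for raw in contents.lines() {
        if let Some(head) = raw.strip_suffix('\\') {
            // Line continuation: keep accumulating.
            pending.push_str(head);
            continue;
        }
        pending.push_str(raw);
        flush(&mut pending, &mut out);
    }
    // A trailing backslash on the final line still yields a line.
    flush(&mut pending, &mut out);
    out
}

/// Emit the accumulated logical line (if non-empty) and reset the buffer.
fn flush(pending: &mut String, out: &mut Vec<String>) {
    let line = strip_comment(pending.as_str()).trim();
    if !line.is_empty() {
        out.push(line.to_string());
    }
    pending.clear();
}

/// Cut the line at the first '#' that starts the line or follows whitespace.
fn strip_comment(line: &str) -> &str {
    let bytes = line.as_bytes();
    for (i, b) in bytes.iter().enumerate() {
        if *b == b'#' && (i == 0 || bytes[i - 1].is_ascii_whitespace()) {
            return &line[..i];
        }
    }
    line
}
```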