The normalized name abstractions were not consistently, this PR uses
them where they were previously missing:
* `WheelFilename::distribution`
* `Requirement::name`
* `Requirement::extras`
* `Metadata21::name`
* `Metadata21::provides_dist`
With `puffin-package` depending on `pep508_rs` this would be cyclical
crate dependency, so `puffin-normalize` gets split out from
`puffin-package`.
`DistInfoName` has the same task and semantics as `PackageName`, so it's
merged into the latter.
`PackageName` and `ExtraName` documentation is moved onto the type and
their constructors are called `new` instead of `normalize`. We now use
these constructors rarely enough the implicit allocation by
`to_string()` shouldn't matter anymore, while more actual cloning
becomes visible.
Hard linking might not be supported but we (afaik) can't detect this
ahead of time, so we'll try hard linking the first file, if this
succeeds we'll know later hard linking errors are not due to lack of
os/fs support, if it fails we'll switch to copying for the rest of the
install. Follow up to
https://github.com/astral-sh/puffin/pull/237#discussion_r1376705137
Previously, we had two python interpreter metadata structs, one in
gourgeist and one in puffin. Both would spawn a subprocess to query
overlapping metadata and both would appear in the cli crate, if you
weren't careful you could even have to different base interpreters at
once. This change unifies this to one set of metadata, queried and
cached once.
Another effect of this crate is proper separation of python interpreter
and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv
and conda installed python) has a set of metadata. A venv has a root and
inherits the base python metadata except for `sys.prefix`, which unlike
`sys.base_prefix`, gets set to the venv root. From the root and the
interpreter info we can compute the paths inside the venv. We can reuse
the interpreter info of the base interpreter when creating a venv
without having to query the newly created `python`.
This is isn't ready, but it can resolve
`meine_stadt_transparent==0.2.14`.
The source distributions are currently being built serially one after
the other, i don't know if that is incidentally due to the resolution
order, because sdist building is blocking or because of something in the
resolver that could be improved.
It's a bit annoying that the thing that was supposed to do http requests
now suddenly also has to a whole download/unpack/resolve/install/build
routine, it messes up the type hierarchy. The much bigger problem though
is avoid recursive crate dependencies, it's the reason for the callback
and for splitting the builder into two crates (badly named atm)
Allows the user to select between clone, hardlink, and copy semantics
for installs. (The pnpm documentation has a decent description of what
these mean: https://pnpm.io/npmrc#package-import-method.)
Closes#159.
The need for this became clear when working on the source distribution
integration into the resolver.
While at it i also switch the `WheelFilename` version to the parsed
`pep440_rs` version now that we have this crate.
Gets rid of the custom `DistInfo` struct in the site-packages
abstraction in favor of a new kind of distribution
(`InstalledDistribution`). No change in behavior.
It looks like using _either_ async Rust with a `JoinSet` _or_
parallelizing a fixed threadpool with Rayon provide about a ~5% speed-up
over our current serial approach:
```console
❯ hyperfine --runs 30 --warmup 5 --prepare "./target/release/puffin venv .venv" \
"./target/release/rayon sync ./scripts/benchmarks/requirements-large.txt" \
"./target/release/async sync ./scripts/benchmarks/requirements-large.txt" \
"./target/release/main sync ./scripts/benchmarks/requirements-large.txt"
Benchmark 1: ./target/release/rayon sync ./scripts/benchmarks/requirements-large.txt
Time (mean ± σ): 295.7 ms ± 16.9 ms [User: 28.6 ms, System: 263.3 ms]
Range (min … max): 249.2 ms … 315.9 ms 30 runs
Benchmark 2: ./target/release/async sync ./scripts/benchmarks/requirements-large.txt
Time (mean ± σ): 296.2 ms ± 20.2 ms [User: 36.1 ms, System: 340.1 ms]
Range (min … max): 258.0 ms … 359.4 ms 30 runs
Benchmark 3: ./target/release/main sync ./scripts/benchmarks/requirements-large.txt
Time (mean ± σ): 306.6 ms ± 19.5 ms [User: 25.3 ms, System: 220.5 ms]
Range (min … max): 269.6 ms … 332.2 ms 30 runs
Summary
'./target/release/rayon sync ./scripts/benchmarks/requirements-large.txt' ran
1.00 ± 0.09 times faster than './target/release/async sync ./scripts/benchmarks/requirements-large.txt'
1.04 ± 0.09 times faster than './target/release/main sync ./scripts/benchmarks/requirements-large.txt'
```
It's much easier to just parallelize with Rayon and avoid async in the
underlying wheel code, so this PR takes that approach for now.
When we go to install a locked `requirements.txt`, if a wheel is already
available in the local cache, and matches the version specifiers, we can
just use it directly without fetching the package metadata. This speeds
up the no-op case by about 33%.
Closes https://github.com/astral-sh/puffin/issues/48.
I think this isn't necessary to support in this generic crate. If we
choose to adopt Monotrail-style concepts, we'll likely need to rework
them anyway.
It turns out that on macOS, you can pass `clonefile` a directory to
recursively copy an entire directory. This speeds up wheel installation
dramatically, by about 3x.
This PR massively speeds up the case in which you need to install wheels
that already exist in the global cache.
The new strategy is as follows:
- Download the wheel into the content-addressed cache.
- Unzip the wheel into the cache, but ignore content-addressing. It
turns out that writing to `cacache` for every file in the zip added a
ton of overhead, and I don't see any actual advantages to doing so.
Instead, we just unzip the contents into a directory at, e.g.,
`~/.cache/puffin/django-4.1.5`.
- (The unzip itself is now parallelized with Rayon.)
- When installing the wheel, we now support unzipping from a directory
instead of a zip archive. This required duplicating and tweaking a few
functions.
- When installing the wheel, we now use reflinks (or copy-on-write
links). These have a few fantastic properties: (1) they're extremely
cheap to create (on macOS, they are allegedly faster than hard links);
(2) they minimize disk space, since we avoid copying files entirely in
the vast majority of cases; and (3) if the user then edits a file
locally, the cache doesn't get polluted. Orogene, Bun, and soon pnpm all
use reflinks.
Puffin is now ~15x faster than `pip` for the common case of installing
cached data into a fresh environment.
Closes https://github.com/astral-sh/puffin/issues/21.
Closes https://github.com/astral-sh/puffin/issues/39.
This PR modifies the `install-wheel-rs` (and a few other crates) to get
everything playing nicely. Specifically, CI should pass, and all these
crates now use workspace dependencies between one another.
As part of this change, I split out the wheel name parsing into its own
`wheel-filename` crate, and the compatibility tag parsing into its own
`platform-tags` crate.
This PR copies over the `install-wheel-rs` crate at commit
`10730ea1a84c58af6b35fb74c89ed0578ab042b6` with no modifications.
It won't pass CI, but modifications will intentionally be confined to
later PRs.