Missing piece for the release.
## Test Plan
Built the image locally:
```shell
❯ docker run 99956098e1f8f04e209dcfc4a0afcee67df1fe8a726c164884e67f035b1a0f42
Usage: puffin [OPTIONS] <COMMAND>
Commands:
pip Resolve and install Python packages
venv Create a virtual environment
clean Clear the cache
help Print this message or the help of the given subcommand(s)
Options:
-q, --quiet Do not print any output
-v, --verbose Use verbose output
-n, --no-cache Avoid reading from or writing to the cache
--cache-dir <CACHE_DIR> Path to the cache directory [env: PUFFIN_CACHE_DIR=]
-h, --help Print help
-V, --version Print version
```
## Summary
This PR adds a release workflow powered by `cargo-dist`. It's similar to
the version that's PR'd in Ruff
(https://github.com/astral-sh/ruff/pull/9559), with the exception that
it doesn't include the Docker build or the "update dependents" step for
pre-commit.
## Summary
This PR is like #957, but for validating the virtual environment, rather
than the cache. So, if you have a local wheel, and you rebuild it, we'll
now correctly uninstall and reinstall it in the virtual environment.
## Summary
- This was inherited from
d719988323/src/metadata.rs (LL78C2-L91C26)
- ...which introduced this code here:
9cd1d43f7c
- ...with the originating issue here:
https://github.com/PyO3/maturin/issues/612
- ...and the upstream issue here:
https://github.com/staktrace/mailparse/issues/50
It seems like the goal was to support Unicode in certain header fields,
but I don't think this is necessary for us. We only use
`get_first_value` for `Requires-Python`, which has to be ASCII, doesn't
it?
In my testing, it seems like the `charset` hack can also be removed. The
tests I copied over actually work without it, which makes me a bit
skeptical.
The main benefit here is that we get to a remove a _big_ dependency
stack, including Chumsky and Stacker and psm which have limited
cross-platform support.
## Summary
This is a small correctness improvement that ensures that we avoid using
stale cache entries for local dependencies in the install plan. We
already have some logic like this in the source distribution builder,
but it didn't apply in the install plan, and so we'd end up using stale
wheels.
Specifically, now, if you create a new local wheel, and run `pip sync`,
we'll mark the cache entries as stale and make sure we unzip it and
install it. (If the wheel is _already_ installed, we won't reinstall it
though, which will be a separate change. This is just about reading from
the cache, not the environment.)
Fixes#965
We have to canonicalize the interpreter path, otherwise the home is set
to the venv dir instead of the real root. This would make
python-build-standalone fail with the encodings module not being found
because its home is wrong.
The `InstallPlan` does a lot of work in the constructor, which I tend to
feel is an anti-pattern. With cache refresh, it's also going to need to
be made `async`, so it really feels like it should be a clearer method
rather than an async, fallible constructor that does a bunch of IO. This
PR splits into a `Planner` (with a `build` method) and a `Plan`.
## Summary
It turns out that storing an absolute URL for every file caused a
significant performance regression. This PR attempts to address the
regression with two changes.
The first is that we now store the raw string if the URL is an absolute
URL. If the URL is relative, we store the base URL alongside the raw
relative string. As such, we avoid serializing and deserializing URLs
until we need them (later on), except for the base URL.
The second is that we now use the internal `Url` crate methods for
serializing and deserializing. If you look inside `Url`, its standard
serializer and deserialization actually convert it to a string, then
parse the string. But the crate exposes some other methods for faster
serialization and deserialization (with fewer guarantees). I think this
is totally fine since the cache is entirely internal.
If we _just_ change the `Url` serialization (and no other code -- so
continue to store URLs for every file), then the regression goes down to
about 5%:
```shell
❯ python -m scripts.bench \
--puffin-path ./target/release/main \
--puffin-path ./target/release/relative --puffin-path ./target/release/puffin \
scripts/requirements/home-assistant.in --benchmark resolve-warm
Benchmark 1: ./target/release/main (resolve-warm)
Time (mean ± σ): 496.3 ms ± 4.3 ms [User: 452.4 ms, System: 175.5 ms]
Range (min … max): 487.3 ms … 502.4 ms 10 runs
Benchmark 2: ./target/release/relative (resolve-warm)
Time (mean ± σ): 284.8 ms ± 2.1 ms [User: 245.8 ms, System: 165.6 ms]
Range (min … max): 280.3 ms … 288.0 ms 10 runs
Benchmark 3: ./target/release/puffin (resolve-warm)
Time (mean ± σ): 300.4 ms ± 3.2 ms [User: 255.5 ms, System: 178.1 ms]
Range (min … max): 295.4 ms … 305.1 ms 10 runs
Summary
'./target/release/relative (resolve-warm)' ran
1.05 ± 0.01 times faster than './target/release/puffin (resolve-warm)'
1.74 ± 0.02 times faster than './target/release/main (resolve-warm)'
```
So I considered _just_ making that change. But 5% is kind of
borderline...
With both of these changes, the regression is down to 1-2%:
```
Benchmark 1: ./target/release/relative (resolve-warm)
Time (mean ± σ): 282.6 ms ± 7.4 ms [User: 244.6 ms, System: 181.3 ms]
Range (min … max): 275.1 ms … 318.5 ms 30 runs
Benchmark 2: ./target/release/puffin (resolve-warm)
Time (mean ± σ): 286.8 ms ± 2.2 ms [User: 247.0 ms, System: 169.1 ms]
Range (min … max): 282.3 ms … 290.7 ms 30 runs
Summary
'./target/release/relative (resolve-warm)' ran
1.01 ± 0.03 times faster than './target/release/puffin (resolve-warm)'
```
It's consistently ~2%-ish, but at this point it's unclear if that's due
to the URL change or something other change between now and then.
Closes#943.
On ubuntu and python 3.10,
```
cargo run -q -- pip-install --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html "jax[cuda12_pip]==0.4.23"
```
non-deterministically but for me consistently fails to install with
messages such as
```
error: Failed to install: nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (nvidia-nccl-cu12==2.19.3)
Caused by: failed to remove file `/home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py`
Caused by: No such file or directory (os error 2)
```
```
error: Failed to install: nvidia_cublas_cu12-12.3.4.1-py3-none-manylinux1_x86_64.whl (nvidia-cublas-cu12==12.3.4.1)
Caused by: Replacing an existing file or directory failed
```
```
error: Failed to install: nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64.whl (nvidia-cuda-nvcc-cu12==12.3.107)
Caused by: failed to hardlink file from /home/konsti/.cache/puffin/wheels-v0/pypi/nvidia-cuda-nvcc-cu12/nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64/nvidia/__init__.py to /home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py
Caused by: File exists (os error 17)
```
We install a lot of nvidia package, that all contain
`nvidia/__init__.py`, since they all install themselves into the
`nvidia` module:
```
nvidia-cublas-cu12==12.3.4.1
nvidia-cuda-cupti-cu12==12.3.101
nvidia-cuda-nvcc-cu12==12.3.107
nvidia-cuda-nvrtc-cu12==12.3.107
nvidia-cuda-runtime-cu12==12.3.101
nvidia-cudnn-cu12==8.9.7.29
nvidia-cufft-cu12==11.0.12.1
nvidia-cusolver-cu12==11.5.4.101
nvidia-cusparse-cu12==12.2.0.103
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
```
```
$ tree -L 1 .venv/lib/python3.10/site-packages/nvidia
.venv/lib/python3.10/site-packages/nvidia
├── cublas
├── cuda_cupti
├── cuda_nvcc
├── cuda_nvrtc
├── cuda_runtime
├── cudnn
├── cufft
├── cusolver
├── cusparse
├── __init__.py
├── nccl
└── nvjitlink
```
When installing we get a race condition, each package installation is
its own thread:
* Installer Thread 1 creates `nvidia/__init__.py`
* Installer Thread 2 sees an existing `nvidia/__init__.py`
* Installer Thread 3 sees an existing `nvidia/__init__.py`
* Installer Thread 2 removes `nvidia/__init__.py`
* Installer Thread 3 tries to remove `nvidia/__init__.py`, it doesn't
exist anymore -> failure.
We switch to a new strategy: When the target files exists, we don't
remove it, but instead hardlink the source file to a tempfile first,
then renaming the tempfile to the target file. Renaming is considered an
atomic operation.
I've put the logging on debug level because they cases indicate a
conflict between two packages while being rare.
Closes#925
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
This PR uses a single `Index` that's shared between the top-level
resolver and any sub-resolutions happen in the course of that top-level
resolution (namely, to resolve build dependencies for any source
distributions).
In theory it's an optimization, since (e.g.) if we have two packages
that both need the `flit-core` build system, and we attempt to build
them both at once, we'll only fetch its metadata _once_, and share it
across the two resolutions. In practice, I haven't been able to get this
to show up in benchmarks. I suspect you'd need a _lot_ of source
distributions for it to matter... Though it may still be worth doing, it
strikes me as a cleaner design.
Closes#200.
Closes#541.
## Summary
This fixes an extremely subtle bug in `pip install --reinstall`, whereby
if you depend on `setuptools` at the top level, we end up uninstalling
it after resolving, which breaks some cached state. If we have
`--reinstall`, we need to reset that cached state between resolving and
installing.
## Test Plan
Running `pip install --reinstall` with:
```txt
setuptools
devpi @ e334eb4dc9/devpi-2.2.0.tar.gz
```
Fails on `main`, but passes.
## Summary
This PR fixes a subtle bug in `pip install` when using `--reinstall`. If
a package depends on a build system directly (e.g., `waitress` depends
on `setuptools`), and then you have other packages that also need the
build system to build a source distribution, right now, we don't share
the `OnceMap` between those cases.
This lifts the `InFlight` tracking up a level, so that it's initialized
once per command, then shared everywhere.
## Test Plan
I'm having trouble coming up with an identical test-case and hesitant to
add this slow test to the suite... But if you run `pip install
--reinstall` with:
```
waitress @ git+https://github.com/zanieb/waitress
devpi-server @ git+https://github.com/zanieb/devpi#subdirectory=server
```
It fails consistently on `main` and passes here.
## Summary
This makes the separation clearer between the legacy `pip` API and the
API we'll add in the future for the package manager itself. It also
enables seamless `puffin pip` aliasing for those that want it.
Closes#918.
## Summary
This PR restructures the flat index fetching in a few ways:
1. It now lives in its own `FlatIndexClient`, since it felt a bit
awkward (in my opinion) for it to live in `RegistryClient`.
2. We now fetch the `FlatIndex` outside of the resolver. This has a few
benefits: (1) the resolver construct is no longer `async` and no longer
returns `Result`, which feels better for a resolver; and (2) we can
share the `FlatIndex` across resolutions rather than re-fetching it for
every source distribution build.
## Summary
`FlatIndex` is now the thing that's keyed on `PackageName`, while
`FlatDistributions` is what used to be called `FlatIndex` (a map from
version to `PrioritizedDistribution`, for a single package). I find this
a bit clearer, since we can also remove the `from_files` that doesn't
return `Self`, which I had trouble following.
## Summary
I'm running into some annoyances converting `&Version` to
`&PubGrubVersion` (which is just a wrapper type around `Version`), and I
realized... We don't even need `PubGrubVersion`?
The reason we "need" it today is due to the orphan trait rule: `Version`
is defined in `pep440_rs`, but we want to `impl
pubgrub::version::Version for Version` in the resolver crate.
Instead of introducing a new type here, which leads to a lot of
awkwardness around conversion and API isolation, what if we instead just
implement `pubgrub::version::Version` in `pep440_rs` via a feature? That
way, we can just use `Version` everywhere without any confusion and
conversion for the wrapper type.
Add directory `--find-links` support for local paths to pip-compile.
It seems that pip joins all sources and then picks the best package. We
explicitly give find links packages precedence if the same exists on an
index and locally by prefilling the `VersionMap`, otherwise they are
added as another index and the existing rules of precedence apply.
Internally, the feature is called _flat index_, which is more meaningful
than _find links_: We're not looking for links, we're picking up local
directories, and (TBD) support another index format that's just a flat
list of files instead of a nested index.
`RegistryBuiltDist` and `RegistrySourceDist` now use `WheelFilename` and
`SourceDistFilename` respectively. The `File` inside `RegistryBuiltDist`
and `RegistrySourceDist` gained the ability to represent both a url and
a path so that `--find-links` with a url and with a path works the same,
both being locked as `<package_name>@<version>` instead of
`<package_name> @ <url>`. (This is more of a detail, this PR in general
still work if we strip that and have directory find links represented as
`<package_name> @ file:///path/to/file.ext`)
`PrioritizedDistribution` and `FlatIndex` have been moved to locations
where we can use them in the upstack PR.
I added a `scripts/wheels` directory with stripped down wheels to use
for testing.
We're lacking tests for correct tag priority precedence with flat
indexes, i only confirmed this manually since it is not covered in the
pip-compile or pip-sync output.
Closes#876
There is no guarantee that indexes provide hashes at all or the sha256
we support specifically. [PEP
503](https://peps.python.org/pep-0503/#specification):
> The URL SHOULD include a hash in the form of a URL fragment with the
following syntax: #<hashname>=<hashvalue>, where <hashname> is the
lowercase name of the hash function (such as sha256) and <hashvalue> is
the hex encoded digest.
We instead use the url as input to generate a hash when caching.
## Summary
We always normalize extra names in our requirements (e.g., `cuda12_pip`
to `cuda12-pip`), but we weren't normalizing within PEP 508 markers,
which meant we ended up comparing `cuda12-pip` (normalized) against
`cuda12_pip` (unnormalized).
Closes https://github.com/astral-sh/puffin/issues/911.
Previously, the url on file could either be a relative or an absolute
url, depending on the index, and we would finalize it lazily. Now we
finalize the url when converting `pypi_types::File` to
`distribution_types::File`. This change is required to make the hashes
on `File` optional (https://github.com/astral-sh/puffin/pull/910), which
are currently the only unique field usable for caching.
## Summary
This PR ensures that when the user passes in `--python-version`, we
adjust the _markers_ to match the target version, thus forcing us to
select compatible wheels for the `--python-version`, rather than the
installed version.
## Context
Let's call Python 3.10 the "installed" environment and Python 3.12 the
"target" environment. For each version, we have _both_ a Python version
(to match against `Requires-Python`) and a set of tags (to match against
wheels).
The rules for resolution are as follows...
- For each package, for each version, we try to find the "best
candidate" for resolution and installation.
- We first look for a wheel that's compatible with the _target_
environment. This requires testing against both the `Requires-Python`
and the markers. (We won't have to build or run this code, so the
_installed_ version is irrelevant.) **(This PR corrects _this_ bullet --
previously, we validated against the _installed_ markers, rather than
the target markers.)**
- If we can't find a compatible wheel, we accept any _incompatible_
wheel as long as there's a source distribution. The source distribution
_must_ be compatible with the target environment. (We won't have to
build or run this code, so the _installed_ version is irrelevant.)
- If there are no wheels, then the source distribution must be
compatible with _both_ the installed and target environments, since we
need to build it.
This is all true for the top-level resolution. When we perform a
sub-resolution (when resolving the build dependencies of a source
distribution), we should _only_ use the installed environment, and
ignore the target environment, since we assume that the dependencies
will be the same in both environments once built -- so our goal is
"just" to build the distribution, without concern for which build
dependencies it uses.
Closes https://github.com/astral-sh/puffin/issues/883.
Remove a test case from the `install_editable` that slows it down from
3.6s to 6.5s while providing low test coverage. It also seems to block
other tests sometimes, `cargo nextest run -E "test(editable)"
--all-features` has more consistent and lower runtimes. Surprisingly
this seems to have bigger effect than switching from pyo3 to cffi.
Used test commands:
```
rm -rf scripts/editable-installs/maturin_editable/target/ && time cargo nextest run -E "test(=install_editable)" --all-features
rm -rf scripts/editable-installs/maturin_editable/target/ && time cargo nextest run -E "test(editable)" --all-features
```
Part of #878