## Summary
We need to pass in the distribution with the "precise" URL to avoid
refetching.
## Test Plan
Ran `cargo run -p puffin-cli -- pip-compile requirements.in --verbose`
with `flask @ git+https://github.com/pallets/flask.git` and verified
that we only checked out Flask once.
## Summary and motivation
For a given source dist, we store the metadata of each wheel built
through it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel. If there is one we use that metadata, if there isn't, we
build one. If the source is stale, we build a wheel and override
`metadata.json` with that single wheel. This PR thereby ties the local
built wheel metadata cache to the freshness of the remote source dist.
This functionality is available through `SourceDistCachedBuilder`.
`puffin_installer::Builder`, `puffin_installer::Downloader` and
`Fetcher` are removed, instead there are now `FetchAndBuild` which calls
into the also new `SourceDistCachedBuilder`. `FetchAndBuild` is the new
main high-level abstraction: It spawns parallel fetching/building, for
wheel metadata it calls into the registry client, for wheel files it
fetches them, for source dists it calls `SourceDistCachedBuilder`. It
handles locks around builds, and newly added also inter-process file
locking for git operations.
Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.
In a follow-up PR, I'll also clear built wheels when they've become
stale.
Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.
Closes#473
## Source dist cache structure
Entries by supported sources:
* `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`
But the url filename does not need to be a valid source dist filename
(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following and we have to take any string as
filename:
* `<build wheel metadata
cache>/url/<sha256(url)>/master.zip/metadata.json`
Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│ └── 5c56bc1c58c34c11
│ └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│ └── metadata.json
├── pypi
│ └── django-allauth-0.51.0.tar.gz
│ └── metadata.json
└── url
└── 6781bd6440ae72c2
└── werkzeug-3.0.1.tar.gz
└── metadata.json
```
The inside of a `metadata.json`:
```json
{
"data": {
"django_allauth-0.51.0-py3-none-any.whl": {
"metadata-version": "2.1",
"name": "django-allauth",
"version": "0.51.0",
...
}
}
}
```
Preparing for #235, some refactoring to `puffin_interpreter`.
* Added a dedicated error type instead of anyhow
* `InterpreterInfo` -> `Interpreter`
* `detect_virtual_env` now returns an option so it can be chained for
#235
## Summary
This PR adds support for local path dependencies. The approach mostly
just falls out of our existing approach and infrastructure for Git and
URL dependencies.
Closes https://github.com/astral-sh/puffin/issues/436. (We'll open a
separate issue for editable installs.)
## Test Plan
Added `pip-compile` tests that pre-download a wheel or source
distribution, then install it via local path.
## Summary
A variety of small refactors to the distribution types crate to (1)
return `Result` if we find an invalid wheel, rather than treating it as
a source distribution with a `.whl` suffix, and (2) DRY up some repeated
code around URLs.
A consistent cache structure for remote wheel metadata:
* `<wheel metadata cache>/pypi/foo-1.0.0-py3-none-any.json`
* `<wheel metadata
cache>/<digest(index-url)>/foo-1.0.0-py3-none-any.json`
* `<wheel metadata cache>/url/<digest(url)>/foo-1.0.0-py3-none-any.json`
The source dist caching will use a similar structure (#468).
Deduplicate lenient parsing code between version specifiers and
Requirement. Use `warn_once!` since the warnings did show up multiple
times in my code. Fix the macro hygiene in `warn_once!`.
This PR modifies the `Fetcher` to cache remote wheels that we _already_
store to-disk. We might read these again in the future, so we might as
well store them in the cache for consistency (rather than using a
temporary directory).
## Summary
This PR unifies the behavior that lived in the resolver's `distribution`
crates with the behaviors that were spread between the various structs
in the installer crate into a single `Fetcher` struct that is intended
to manage all interactions with distributions. Specifically, the
interface of this struct is such that it can access distribution
metadata, download distributions, return those downloads, etc., all with
a common cache.
Overall, this is mostly just DRYing up code that was repeated between
the two crates, and putting it behind a reasonable shared interface.
## Summary
This crate only contains types, and I want to introduce a new crate for
all _operations_ on distributions, so this feels like a more natural
name given we also have `pypi-types`.
Previously, git requirements would fail when setting `--cache-dir`:
```console
$ cargo run --bin puffin -- pip-compile --cache-dir cache-all-kinds scripts/benchmarks/requirements/all-kinds.in
error: Failed to build distribution from URL: git+https://github.com/pydantic/pydantic-extra-types.git
Caused by: Invalid path URL: cache-all-kinds/git-v0/db/b49ffcfeb6c2e9d8
```
The cause is using a relative and not an absolute path, which `Url` needs, the solution is to turn the cache dir into an absolute path.
This never showed up in the tests since the tests use absolute temp dirs for everything.
```
error[E0432]: unresolved imports `puffin_cache::CacheArgs`, `puffin_cache::CacheDir`
--> crates/puffin-cli/src/main.rs:11:20
|
11 | use puffin_cache::{CacheArgs, CacheDir};
| ^^^^^^^^^ ^^^^^^^^ no `CacheDir` in the root
| |
| no `CacheArgs` in the root
|
note: found an item that was configured out
--> /Users/mz/eng/src/astral-sh/puffin/crates/puffin-cache/src/lib.rs:7:15
|
7 | pub use cli::{CacheArgs, CacheDir};
| ^^^^^^^^^
= note: the item is gated behind the `clap` feature
note: found an item that was configured out
--> /Users/mz/eng/src/astral-sh/puffin/crates/puffin-cache/src/lib.rs:7:26
|
7 | pub use cli::{CacheArgs, CacheDir};
| ^^^^^^^^
= note: the item is gated behind the `clap` feature
For more information about this error, try `rustc --explain E0432`.
error: could not compile `puffin-cli` (bin "puffin") due to previous error
```
## Summary
This is a refactor to address a TODO in the build context whereby we
aren't respecting the resolution options in recursive resolutions. Now,
the options are split out from the resolution _manifest_, and shared
across the build context tree.
This script can compare different requirements between pip(-compile) and
puffin across python versions, with debug and release builds.
Examples:
```shell
scripts/compare_with_pip/compare_with_pip.py
scripts/compare_with_pip/compare_with_pip.py -p 3.10
scripts/compare_with_pip/compare_with_pip.py --release -p 3.9 --target 'transformers[deepspeed-testing,dev-tensorflow]'
```
It found a bunch of fixed bugs, e.g. the lack of yanked package handling
and source dist handling, as well as #423, which is currently most of
the output.
Example output:
https://gist.github.com/konstin/9ccf8dc7c2dcca737bf705429ced4892#443 should be merged first
Extends #424 with support for URL dependency incompatibilities.
Requires changes to `miette` to prevent URLs from being word wrapped;
accepted upstream in https://github.com/zkat/miette/pull/321
We're willing to use platform-incompatible wheels during resolution, to
quicken access to metadata... But we should avoid choosing an
incompatible wheel if the package lacks a source distribution since, in
that case, we definitely won't be able to install it.
Closes https://github.com/astral-sh/puffin/issues/439.
Always¹ clear the temporary directories we create.
* Clear source dist downloads: Previously, the temporary directories
would remain in the cache dir, now they are cleared properly
* Clear wheel file downloads: Delete the `.whl` file, we only need to
cache the unpacked wheel
* Consistent handling of cache arguments: Abstract the handling for CLI
cache args away, again making sure we remove the `--no-cache` temp dir.
There are no more `into_path()` calls that persist `TempDir`s that i
could find.
¹Assuming drop is run, and deleting the directory doesn't silently
error.
Addresses
https://github.com/astral-sh/puffin/issues/309#issuecomment-1792648969
Similar to #338 this throws an error when merging versions results in an
empty set. Instead of propagating that error, we capture it and return a
new dependency type of `Unusable`. Unusable dependencies are a new
incompatibility kind which includes an arbitrary "reason" string that we
present to the user. Adding a new incompatibility kind requires changes
to the vendored pubgrub crate.
We could use this same incompatibility kind for conflicting urls as in
#284 which should allow the solver to backtrack to another valid version
instead of failing (see #425).
Unlike #383 this does not require changes to PubGrub's package mapping
model. I think in the long run we'll want PubGrub to accept multiple
versions per package to solve this specific issue, but we're interested
in it being merged upstream first. This pull request is just using the
issue as a simple case to explore adding a new incompatibility type.
We may or may not be able convince them to add this new incompatibility
type upstream. As discussed in
https://github.com/pubgrub-rs/pubgrub/issues/152, we may want a more
general incompatibility kind instead which can be used for arbitrary
problems. An upstream pull request has been opened for discussion at
https://github.com/pubgrub-rs/pubgrub/pull/153.
Related to:
- https://github.com/pubgrub-rs/pubgrub/issues/152
- #338
- #383
---------
Co-authored-by: konsti <konstin@mailbox.org>
A fork will let us stay up to date with the upstream while replaying our
work on top of it.
I expect a similar workflow to the RustPython-Parser fork we maintained,
except that I wrote an automation to create tags for each commit on the
fork (https://github.com/zanieb/pubgrub/pull/2) so we do not need to
manually tag and document each commit.
To update with the upstream:
- Rebase our fork's `main` branch on top of the latest changes in
upstream's `dev` branch
- Force push, overwriting our `main` branch history
- Change the commit hash here to the last commit on `main` in our fork
Since we automatically tag each commit on the fork, we should never lose
the commits that are dropped from `main` during rebase.
This works by filtering out files with a more recent upload time, so if
the index you use does not provide upload times, the results might be
inaccurate. pypi provides upload times for all files. This is, the field
is non-nullable in the warehouse schema, but the simple API PEP does not
know this field.
If you have only pypi dependencies, this means deterministic,
reproducible(!) resolution. We could try doing the same for git repos
but it doesn't seem worth the effort, i'd recommend pinning commits
since git histories are arbitrarily malleable and also if you care about
reproducibility and such you such not use git dependencies but a custom
index.
Timestamps are given either as RFC 3339 timestamps such as
`2006-12-02T02:07:43Z` or as UTC dates in the same format such as
`2006-12-02`. Dates are interpreted as including this day, i.e. until
midnight UTC that day. Date only is required to make this ergonomic and
midnight seems like an ergonomic choice.
In action for `pandas`:
```console
$ target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
Resolved 6 packages in 679ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
numpy==1.26.2
# via pandas
pandas==2.1.3
python-dateutil==2.8.2
# via pandas
pytz==2023.3.post1
# via pandas
six==1.16.0
# via python-dateutil
tzdata==2023.3
# via pandas
$ target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
Resolved 5 packages in 655ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
numpy==1.23.4
# via pandas
pandas==1.5.1
python-dateutil==2.8.2
# via pandas
pytz==2022.6
# via pandas
six==1.16.0
# via python-dateutil
$ target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
Resolved 5 packages in 594ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
numpy==1.21.4
# via pandas
pandas==1.3.4
python-dateutil==2.8.2
# via pandas
pytz==2021.3
# via pandas
six==1.16.0
# via python-dateutil
```
[PEP 691](https://peps.python.org/pep-0691/#project-detail) has slightly
different, more relaxed rules around file metadata. These changes are
now reflected in the `File` struct. This will make it easier to support
alternative indices.
I had expected that i need to introduce a separate type for that, so i'm
happy it's two `Option`s more and an alias.
Part of #412
Previously, we were assuming that `which <python>` return the path to
the python executable. This is not true when using pyenv shims, which
are bash scripts. Instead, we have to use `sys.executable`. Luckily,
we're already querying the python interpreter and can do it in that
pass.
We are also not allowed to cache the execution of the python interpreter
through the shim because pyenv might change the target. As a heuristic,
we check whether `sys.executable`, the real binary, is the same our
canonicalized `which` result.
---------
Co-authored-by: Zanie Blue <contact@zanie.dev>
## Summary
This PR implements logic to sort wheels by priority, where priority is
defined as preferring more "specific" wheels over less "specific"
wheels. For example, in the case of Black, my machine now selects
`black-23.11.0-cp311-cp311-macosx_11_0_arm64.whl`, whereas sorting by
lowest priority instead gives me `black-23.11.0-py3-none-any.whl`.
As part of this change, I've also modified the resolver to fallback to
using incompatible wheels when determining package metadata, if no
compatible wheels are available.
The `VersionMap` was also moved out of `resolver.rs` and into its own
file with a wrapper type, for clarity.
Closes https://github.com/astral-sh/puffin/issues/380.
Closes https://github.com/astral-sh/puffin/issues/421.