This is a pure refactor following up on #690, to separate the metadata that
we know upfront about distributions (like the version, for
registry-based distributions) from the metadata that requires building
(like the version, for URL-based distributions).
We now show the fully-resolved URL, rather than the URL as given by the
user, _everywhere_ except for the output resolution file (which should
retain relative paths, unexpanded environment variables, etc.).
Closes https://github.com/astral-sh/puffin/issues/687.
Per the title: adds support for `-e` installs to `puffin pip-install`.
There were some challenges here around threading the editable installs
through to the right places. Namely, we want to build _once_, then reuse
the editable installs from the resolution. Previously, we were losing
the `editable: true` flag on the `Dist` that came back through the
resolution, so this required some changes to the resolver.
Closes https://github.com/astral-sh/puffin/issues/672.
## Summary
This PR ensures that we re-use the resolution to install the build
dependencies when building a source distribution. Currently, we only
pass along the list of requirements, and then use the `Finder` to map
each requirement to a distribution. But we already determine the correct
distribution when resolving!
Closes https://github.com/astral-sh/puffin/issues/655.
## Summary
This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout
the resolution and installation pipeline. In short, alongside the parsed
`Url`, we also keep the URL as written by the user. This enables us to
display the URL exactly as written by the user, rather than the
serialized path that we use internally.
This will be especially useful once we start expanding environment
variables since, at that point, we'll be able to write the version of
the URL that includes the _unexpanded_ environment variable to the
output file.
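For illustration, a minimal sketch of what such a struct might look like (the field names and the `Display` behavior here are assumptions, not the actual definition):
```rust
use url::Url;

#[derive(Debug, Clone)]
pub struct VerbatimUrl {
    /// The parsed, normalized URL used internally.
    url: Url,
    /// The URL exactly as the user wrote it (e.g., a relative path or a URL
    /// with unexpanded environment variables), if it differs from `url`.
    given: Option<String>,
}

impl std::fmt::Display for VerbatimUrl {
    /// Display the verbatim form when we have it, falling back to the
    /// parsed URL.
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match &self.given {
            Some(given) => f.write_str(given),
            None => std::fmt::Display::fmt(&self.url, f),
        }
    }
}
```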
This allows us to enforce type safety within the resolver. For example,
in the index, we can remove `String` as a key type and enforce that
callers _must_ present us with a `PackageId`. (This actually caught one
bug, where we were using the SHA rather than the package ID. That bug
shouldn't have had any effect given where it was, since those are 1:1,
but it's still problematic.)
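As a hypothetical illustration of why the newtype key helps:
```rust
use std::collections::HashMap;

// Keying the index by `PackageId` instead of `String` makes the
// SHA-vs-package-ID mix-up a compile error rather than a silent bug.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct PackageId(String);

fn main() {
    let mut index: HashMap<PackageId, &'static str> = HashMap::new();
    index.insert(PackageId("flask-3.0.0".to_string()), "metadata");

    // A plain string (e.g., a SHA) no longer type-checks as a key:
    // index.insert("0d4d4c0...".to_string(), "metadata"); // compile error
}
```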
## Summary
When resolving `transformers[tensorboard]`, the `[tensorboard]` extra
doesn't exist. Previously, we returned "unknown" dependencies for this
variant, which leads the resolution to try all versions, then fail. This
PR instead warns, but returns the base dependencies for the package,
which matches `pip`. (Poetry doesn't even warn, it just proceeds as
normal.)
Arguably, it would be better to return a custom incompatibility here and
then propagate... But this PR is better than the status quo, and I don't
know if we have support for that behavior yet...? (cc @zanieb)
Closes #386.
Closes https://github.com/astral-sh/puffin/issues/423.
## Summary
This PR enables overrides to be passed to `pip-compile` and
`pip-install` via a new `--overrides` flag.
When overrides are provided, we effectively replace any overridden
requirements with the versions specified in the overrides. This is
applied at all depths of the tree.
The merge semantics are such that we replace _all_ requirements of a
package with _all_ requirements from the overrides files. So, for
example, if a package declares:
```
foo >= 1.0; python_version < '3.11'
foo < 1.0; python_version >= '3.11'
```
And the user provides an override like:
```
foo >= 2.0
```
Then _both_ of the `foo` requirements in the package will be replaced
with the override.
If instead, the user provided an override like:
```
foo >= 2.0; python_version < '3.11'
foo < 3.0; python_version >= '3.11'
```
Then we'd replace _both_ of the original `foo` requirements with both of
these overrides. (In technical terms, for each package in the
requirements file, we flat-map over its overrides.)
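For illustration, a minimal sketch of these merge semantics under assumed types (the real implementation operates on parsed PEP 508 requirements, not strings):
```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
struct Requirement {
    name: String,
    rest: String, // specifier and markers, e.g. ">= 2.0; python_version < '3.11'"
}

fn apply_overrides(
    requirements: Vec<Requirement>,
    overrides: &HashMap<String, Vec<Requirement>>,
) -> Vec<Requirement> {
    requirements
        .into_iter()
        .flat_map(|requirement| match overrides.get(&requirement.name) {
            // Every requirement on an overridden package is replaced by
            // _all_ of the override entries for that package.
            Some(replacements) => replacements.clone(),
            // Requirements without an override pass through unchanged.
            None => vec![requirement],
        })
        .collect()
}
```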
Closes https://github.com/astral-sh/puffin/issues/511.
## Summary
Now, `puffin_warnings::warn_once` and `puffin_warnings::warn` will go to
`stderr`, as long as the user isn't running under `--quiet`. Previously,
these went through `tracing`, and so were only visible when running
under `--verbose`.
Uses https://github.com/pubgrub-rs/pubgrub/pull/156 to consolidate
version ranges in error reports using the actual available versions for
each package.
Alternative to https://github.com/zanieb/pubgrub/pull/8 which implements
this behavior as a method in the `Reporter` — here it's implemented in
our custom report formatter (#521) instead which requires no upstream
changes.
Requires https://github.com/zanieb/pubgrub/pull/11 to only retrieve the
versions for packages that will be used in the report.
This is a work in progress. Some things to do:
- ~We may want to allow lazy retrieval of the version maps from the
formatter~
- [x] We should probably create a separate error type for no solution
instead of mixing them with other resolve errors
- ~We can probably do something smarter than creating vectors to hold
the versions~
- [x] This degrades error messages when a single version is not
available, we'll need to special case that
- [x] It seems safer to coerce the error type in `resolve` instead of
`solve` if feasible
Make `prepare_metadata_for_build_wheel` accessible across the puffin
codebase by splitting the build call into a setup call, a metadata call,
and a wheel call. This does not actually use the hook yet, but it's the
refactoring required for it.
Part of #599.
I don't know why, but this seems to resolve
https://github.com/astral-sh/puffin/issues/619. The Tokio docs also say
that using Tokio's Mutex is _not_ recommended unless you need to hold
the Mutex across an `.await`, which we don't.
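A hedged sketch of the shape of the change (the surrounding types here are invented for illustration):
```rust
// When the guard is never held across an `.await`, the Tokio docs recommend
// `std::sync::Mutex` over `tokio::sync::Mutex`.
use std::sync::Mutex; // previously: use tokio::sync::Mutex;

struct State {
    in_flight: Mutex<Vec<String>>,
}

impl State {
    fn record(&self, name: &str) {
        // The guard is acquired and dropped synchronously, with no `.await`
        // in between, so the cheaper blocking mutex is the right tool.
        self.in_flight.lock().unwrap().push(name.to_string());
    }
}
```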
Since this is a non-deterministic failure, I just ran it a bunch of
times and ensured it didn't hang (whereas it did hang occasionally prior
to this PR).
Closes https://github.com/astral-sh/puffin/issues/619
## Summary
Now, after running `pip-install`, we validate that the set of installed
packages is consistent -- that is, that we don't have any packages that
are missing dependencies, or that have incompatible versions of their
installed dependencies.
## Summary
At present, when performing a `pip-install`, we first do a resolution,
then take the set of requirements and basically run them through our
`pip-sync`, which itself includes re-resolving the dependencies to get a
specific `Dist` for each package. (E.g., the set of requirements might
say `flask==3.0.0`, but the installer needs a specific _wheel_ or source
distribution to install.)
This PR removes this second resolution by exposing the set of pinned
packages from the resolution. The main challenge here is that we have an
optimization in the resolver such that we let the resolver read metadata
from an incompatible wheel as long as a source distribution exists for a
given package. This lets us avoid building source distributions in the
resolver under the assumption that we'll be able to install the package
later on, if needed. As such, the resolver now needs to track the
resolution and installation filenames separately.
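To illustrate that bookkeeping, a hypothetical sketch (the names here are assumptions, not the actual resolver types):
```rust
// The resolver may have read metadata from a file it can't install, so the
// two filenames are tracked separately.
struct ResolvedDist {
    /// The file whose metadata was used during resolution (possibly a wheel
    /// that's incompatible with the target platform).
    resolution_filename: String,
    /// The file to actually install: a compatible wheel, or a source
    /// distribution to build.
    installation_filename: String,
}
```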
## Summary
At present, we have two separate phases within the installation pipeline
related to populating wheels into the cache. The first phase downloads
the distribution, and then builds any source distributions into wheels;
the second phase unzips all the built wheels into the cache.
This PR merges those two phases into one, such that we seamlessly
download, build, and unzip wheels in one pass. This is more efficient,
since we can start unzipping while we build. It also ensures that if the
install _fails_ partway through, we don't end up with a bunch of
downloaded wheels that we never had a chance to unzip. The code is also
much simpler.
The main downside is that the user-facing feedback isn't as granular,
since we only have one phase and one progress bar for what was
originally three distinct phases.
Closes https://github.com/astral-sh/puffin/issues/571.
## Test Plan
I ran the benchmark script on two separate requirements files, and saw a
7% and 31% speedup respectively:
```text
+ TARGET=./scripts/benchmarks/requirements.txt
+ hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache'
Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache
  Time (mean ± σ):     269.4 ms ±  33.0 ms    [User: 42.4 ms, System: 117.5 ms]
  Range (min … max):   221.7 ms … 446.7 ms    100 runs

Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache
  Time (mean ± σ):     250.6 ms ±  28.3 ms    [User: 41.5 ms, System: 127.4 ms]
  Range (min … max):   207.6 ms … 336.4 ms    100 runs

Summary
  './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' ran
    1.07 ± 0.18 times faster than './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache'
```
```text
+ TARGET=./scripts/benchmarks/requirements-large.txt
+ hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache'
Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache
  Time (mean ± σ):      5.053 s ±  0.354 s    [User: 1.413 s, System: 6.710 s]
  Range (min … max):    4.584 s …  6.333 s    100 runs

Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache
  Time (mean ± σ):      3.845 s ±  0.225 s    [User: 1.364 s, System: 6.970 s]
  Range (min … max):    3.482 s …  4.715 s    100 runs

Summary
  './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' ran
```
Currently, `dbg!` output is hard to read: versions are verbose, showing
all optional fields, and we have a lot of versions. Changing the `Debug`
formatting to display the version number (which can be losslessly
converted to the struct and back) makes this much more readable.
See e.g.
https://gist.github.com/konstin/38c0f32b109dffa73b3aa0ab86b9662b
**Before**
```text
version: Version {
    epoch: 0,
    release: [
        1,
        2,
        3,
    ],
    pre: None,
    post: None,
    dev: None,
    local: None,
},
```
**After**
```text
version: "1.2.3",
```
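For illustration, a minimal sketch of delegating `Debug` to `Display`, using a stand-in for the real PEP 440 `Version` type (only two fields, for brevity):
```rust
use std::fmt;

struct Version {
    epoch: u64,
    release: Vec<u64>,
}

impl fmt::Display for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.epoch != 0 {
            write!(f, "{}!", self.epoch)?;
        }
        let release = self
            .release
            .iter()
            .map(u64::to_string)
            .collect::<Vec<_>>()
            .join(".");
        f.write_str(&release)
    }
}

// Delegate `Debug` to `Display`, quoting the result to match the "After"
// output above. Lossless because the string parses back into the struct.
impl fmt::Debug for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "\"{self}\"")
    }
}
```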
I saw warnings when we were, e.g., unzipping `wheel` and `setuptools` in
two tasks at the same time. We now keep track of in-flight unzips.
This introduces a `OnceMap` abstraction which we also use in the
resolver.
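To illustrate the idea, a sketch built on `tokio::sync::OnceCell` (not the actual implementation): per-key once-only work, so two tasks unzipping the same wheel share a single unzip instead of racing.
```rust
use std::collections::HashMap;
use std::future::Future;
use std::hash::Hash;
use std::sync::{Arc, Mutex};

use tokio::sync::OnceCell;

struct OnceMap<K, V> {
    map: Mutex<HashMap<K, Arc<OnceCell<V>>>>,
}

impl<K: Eq + Hash, V: Clone> OnceMap<K, V> {
    fn new() -> Self {
        Self {
            map: Mutex::new(HashMap::new()),
        }
    }

    /// Run `init` at most once per key; concurrent callers for the same key
    /// wait for the first caller's result.
    async fn get_or_init<F, Fut>(&self, key: K, init: F) -> V
    where
        F: FnOnce() -> Fut,
        Fut: Future<Output = V>,
    {
        // Grab (or create) the per-key cell, releasing the outer lock before
        // awaiting so that work on other keys can make progress.
        let cell = self
            .map
            .lock()
            .unwrap()
            .entry(key)
            .or_insert_with(|| Arc::new(OnceCell::new()))
            .clone();
        cell.get_or_init(init).await.clone()
    }
}
```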
Extends #517 with a suggestion from @konstin to parse the `SimpleJson`
into an intermediate type `SimpleMetadata(BTreeMap<Version,
VersionFiles>)` before converting to a `VersionMap`. This reduces the
number of times we need to parse the response. Additionally, we now
cache the parsed response instead of `SimpleJson`.
`VersionFiles` stores two vectors with
`WheelFilename`/`SourceDistFilename` and `File` tuples. These can be
iterated over together or separately. A new enum `DistFilename` was
added to capture the `SourceDistFilename` and `WheelFilename` variants
allowing iteration over both vectors.
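For illustration, a sketch of the shapes described above, with stand-ins for the real version and filename types (the exact definitions are assumptions):
```rust
use std::collections::BTreeMap;

// Stand-ins for the real `pep440`/`distribution-filename` types.
type Version = String;
#[derive(Clone)]
struct File;
#[derive(Clone)]
struct WheelFilename(String);
#[derive(Clone)]
struct SourceDistFilename(String);

/// The intermediate type: files grouped by version, cheap to cache and to
/// convert into a `VersionMap`.
struct SimpleMetadata(BTreeMap<Version, VersionFiles>);

/// Two parallel collections of filename/`File` pairs that can be iterated
/// together or separately.
struct VersionFiles {
    wheels: Vec<(WheelFilename, File)>,
    source_dists: Vec<(SourceDistFilename, File)>,
}

/// Unifies the two filename types so both vectors can be iterated as one.
enum DistFilename {
    SourceDistFilename(SourceDistFilename),
    WheelFilename(WheelFilename),
}

impl VersionFiles {
    fn all(self) -> impl Iterator<Item = (DistFilename, File)> {
        let wheels = self
            .wheels
            .into_iter()
            .map(|(name, file)| (DistFilename::WheelFilename(name), file));
        let source_dists = self
            .source_dists
            .into_iter()
            .map(|(name, file)| (DistFilename::SourceDistFilename(name), file));
        wheels.chain(source_dists)
    }
}
```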
I'm working off of @konstin's commit here to implement arbitrary unsat
test cases for the resolver.
The entirety of the resolver's IO consists of two functions: get the
version map for a package (PEP 440 version -> distribution) and get the
metadata for a distribution. A new trait `ResolverProvider` abstracts
these two away and allows replacing the real network requests, e.g.,
with stored responses
(https://github.com/pradyunsg/pip-resolver-benchmarks/blob/main/scenarios/pyrax_198.json).
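A hedged sketch of the shape of such a trait (the stand-in types and exact signatures are assumptions, not the actual puffin API):
```rust
use std::collections::BTreeMap;

// Stand-in types; the real ones live in the resolver crates.
type Version = String;
struct Dist;
struct Metadata;
struct Error;
type VersionMap = BTreeMap<Version, Dist>;

// The resolver's two IO points behind one interface, so tests can swap in
// stored responses for the real network requests.
trait ResolverProvider {
    /// The available distributions for a package, keyed by PEP 440 version.
    async fn get_version_map(&self, package_name: &str) -> Result<VersionMap, Error>;

    /// The metadata for a specific distribution.
    async fn get_metadata(&self, dist: &Dist) -> Result<Metadata, Error>;
}
```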
---------
Co-authored-by: konsti <konstin@mailbox.org>
## Summary
This enables users to rely on yanked versions via explicit `==` markers,
which is necessary in some projects (and, in my opinion, reasonable).
Closes #551.
It turns out that it's not uncommon to use timestamps as patch versions
(e.g., `20230628214621`). I believe this is the ISO 8601 "basic format".
These can't be represented by a `u32` (which maxes out at
4,294,967,295), so I think it makes sense to just bump to `u64` to
remove this limitation.
## Summary
This PR modifies the behavior of our `--python-version` override in two
ways:
1. First, we always use the "real" interpreter in the source
distribution builder. I think this is correct. We don't need to use the
fake markers for recursive builds, because all we care about is the
top-level resolution, and we already assume that a single source
distribution will always return the same metadata regardless of its
build environment.
2. Second, we require that source distributions are compatible with
_both_ the "real" interpreter version and the marker environment. This
ensures that we don't try to build source distributions that are
compatible with our interpreter, but incompatible with the target
version.
Closes https://github.com/astral-sh/puffin/issues/407.
This removes the last usage of cacache by replacing it with a custom,
flat JSON cache keyed by the digest of the executable path.

A step towards #478. I've made `CachedByTimestamp<T>` generic over `T`
but intentionally not moved it to `puffin-cache` yet.
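For illustration, a hedged sketch of what such a timestamp-guarded entry might look like (the field names are assumptions):
```rust
use std::time::SystemTime;

use serde::{Deserialize, Serialize};

// The cached data is only trusted if the stored timestamp still matches the
// source's modification time.
#[derive(Serialize, Deserialize)]
struct CachedByTimestamp<T> {
    /// Modification time of the source (e.g., the interpreter executable)
    /// at the time the entry was written.
    timestamp: SystemTime,
    /// The cached payload itself.
    data: T,
}
```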
This is mostly a mechanical refactor that moves 80% of our code to the
same cache abstraction.
It introduces `Cache`, which abstracts away the cache path and the temp
dir drop, and is passed throughout the codebase. To get a specific cache
bucket, you request your `CacheBucket` from the `Cache`. `CacheBucket`
centralizes the names of all cache buckets, moving them away from the
string constants spread throughout the crates.
Specifically for working with the `CachedClient`, there is a
`CacheEntry`. I'm not sure yet whether that is a strict improvement over
`cache_dir: PathBuf, cache_file: String`; I may have to revisit that
later.
The interpreter cache moved into `interpreter-v0`.
The `CacheBucket` docs are a natural place to document the cache
structure of each bucket:
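For illustration, a hedged sketch of what such an enum might look like; only the `interpreter-v0` and `built-wheel-metadata-v0` names appear in these PRs, everything else is a placeholder:
```rust
enum CacheBucket {
    /// Interpreter metadata, keyed by the digest of the executable path.
    Interpreter,
    /// Metadata of wheels built from source distributions.
    BuiltWheels,
}

impl CacheBucket {
    /// The directory name of the bucket under the cache root.
    fn to_str(self) -> &'static str {
        match self {
            Self::Interpreter => "interpreter-v0",
            Self::BuiltWheels => "built-wheel-metadata-v0",
        }
    }
}
```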

## Summary and motivation
For a given source dist, we store the metadata of each wheel built
from it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel. If there is one, we use that metadata; if there isn't,
we build one. If the source dist is stale, we build a wheel and
overwrite `metadata.json` with that single wheel. This PR thereby ties
the local built wheel metadata cache to the freshness of the remote
source dist.
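A hedged sketch of that lookup, with stubs standing in for the actual types and helpers (none of these names are the real puffin API):
```rust
struct SourceDist;
#[derive(Clone)]
struct Metadata;

fn is_fresh(_dist: &SourceDist) -> bool {
    true
}
fn cached_wheel_metadata(_dist: &SourceDist) -> Option<Metadata> {
    None
}
fn build_wheel(_dist: &SourceDist) -> Metadata {
    Metadata
}
fn append_to_metadata_json(_dist: &SourceDist, _metadata: &Metadata) {}
fn overwrite_metadata_json(_dist: &SourceDist, _metadata: &Metadata) {}

fn wheel_metadata(dist: &SourceDist) -> Metadata {
    if is_fresh(dist) {
        // Fresh source dist: reuse the metadata of a matching built wheel.
        if let Some(metadata) = cached_wheel_metadata(dist) {
            return metadata;
        }
        let metadata = build_wheel(dist);
        append_to_metadata_json(dist, &metadata);
        metadata
    } else {
        // Stale source dist: rebuild, and replace `metadata.json` with the
        // single new wheel.
        let metadata = build_wheel(dist);
        overwrite_metadata_json(dist, &metadata);
        metadata
    }
}
```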
This functionality is available through `SourceDistCachedBuilder`.
`puffin_installer::Builder`, `puffin_installer::Downloader`, and
`Fetcher` are removed; instead, there is now `FetchAndBuild`, which
calls into the also-new `SourceDistCachedBuilder`. `FetchAndBuild` is
the new main high-level abstraction: it spawns parallel fetching and
building; for wheel metadata, it calls into the registry client; for
wheel files, it fetches them; for source dists, it calls into
`SourceDistCachedBuilder`. It handles locks around builds, along with
newly added inter-process file locking for Git operations.
Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.
In a follow-up PR, I'll also clear built wheels when they've become
stale.
Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.
Closes #473
## Source dist cache structure
Entries by supported sources:
* `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`

But the URL filename does not need to be a valid source dist filename
(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following, meaning we have to accept any string
as a filename:
* `<build wheel metadata cache>/url/<sha256(url)>/master.zip/metadata.json`
Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│   └── 5c56bc1c58c34c11
│       └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│           └── metadata.json
├── pypi
│   └── django-allauth-0.51.0.tar.gz
│       └── metadata.json
└── url
    └── 6781bd6440ae72c2
        └── werkzeug-3.0.1.tar.gz
            └── metadata.json
```
The inside of a `metadata.json`:
```json
{
  "data": {
    "django_allauth-0.51.0-py3-none-any.whl": {
      "metadata-version": "2.1",
      "name": "django-allauth",
      "version": "0.51.0",
      ...
    }
  }
}
```
Preparing for #235, some refactoring to `puffin_interpreter`:
* Added a dedicated error type instead of `anyhow`
* Renamed `InterpreterInfo` to `Interpreter`
* `detect_virtual_env` now returns an `Option` so it can be chained for
#235
## Summary
This PR adds support for local path dependencies. The approach mostly
just falls out of our existing approach and infrastructure for Git and
URL dependencies.
Closes https://github.com/astral-sh/puffin/issues/436. (We'll open a
separate issue for editable installs.)
## Test Plan
Added `pip-compile` tests that pre-download a wheel or source
distribution, then install it via local path.
## Summary
A variety of small refactors to the distribution types crate to (1)
return `Result` if we find an invalid wheel, rather than treating it as
a source distribution with a `.whl` suffix, and (2) DRY up some repeated
code around URLs.
A consistent cache structure for remote wheel metadata:
* `<wheel metadata cache>/pypi/foo-1.0.0-py3-none-any.json`
* `<wheel metadata cache>/<digest(index-url)>/foo-1.0.0-py3-none-any.json`
* `<wheel metadata cache>/url/<digest(url)>/foo-1.0.0-py3-none-any.json`
The source dist caching will use a similar structure (#468).
## Summary
This PR unifies the behavior that lived in the resolver's `distribution`
crates with the behaviors that were spread between the various structs
in the installer crate into a single `Fetcher` struct that is intended
to manage all interactions with distributions. Specifically, the
interface of this struct is such that it can access distribution
metadata, download distributions, return those downloads, etc., all with
a common cache.
Overall, this is mostly just DRYing up code that was repeated between
the two crates, and putting it behind a reasonable shared interface.
## Summary
This crate only contains types, and I want to introduce a new crate for
all _operations_ on distributions, so this feels like a more natural
name given we also have `pypi-types`.
## Summary
This is a refactor to address a TODO in the build context whereby we
aren't respecting the resolution options in recursive resolutions. Now,
the options are split out from the resolution _manifest_, and shared
across the build context tree.
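For illustration, a hedged sketch of the split (names and fields are assumptions, not the actual definitions): the *inputs* to a resolution stay in the manifest, while the *options* are extracted so they can be shared across the build context tree.
```rust
#[derive(Debug, Clone)]
struct Manifest {
    requirements: Vec<String>,
    constraints: Vec<String>,
}

#[derive(Debug, Clone, Copy, Default)]
struct Options {
    resolution_mode: ResolutionMode,
    prerelease_mode: PreReleaseMode,
}

#[derive(Debug, Clone, Copy, Default)]
enum ResolutionMode {
    #[default]
    Highest,
    Lowest,
}

#[derive(Debug, Clone, Copy, Default)]
enum PreReleaseMode {
    #[default]
    IfNecessary,
    Allow,
}
```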