## Summary and motivation
For a given source dist, we store the metadata of each wheel built
through it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel. If there is one we use that metadata, if there isn't, we
build one. If the source is stale, we build a wheel and override
`metadata.json` with that single wheel. This PR thereby ties the local
built wheel metadata cache to the freshness of the remote source dist.
This functionality is available through `SourceDistCachedBuilder`.
`puffin_installer::Builder`, `puffin_installer::Downloader` and
`Fetcher` are removed, instead there are now `FetchAndBuild` which calls
into the also new `SourceDistCachedBuilder`. `FetchAndBuild` is the new
main high-level abstraction: It spawns parallel fetching/building, for
wheel metadata it calls into the registry client, for wheel files it
fetches them, for source dists it calls `SourceDistCachedBuilder`. It
handles locks around builds, and newly added also inter-process file
locking for git operations.
Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.
In a follow-up PR, I'll also clear built wheels when they've become
stale.
Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.
Closes#473
## Source dist cache structure
Entries by supported sources:
* `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`
But the url filename does not need to be a valid source dist filename
(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following and we have to take any string as
filename:
* `<build wheel metadata
cache>/url/<sha256(url)>/master.zip/metadata.json`
Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│ └── 5c56bc1c58c34c11
│ └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│ └── metadata.json
├── pypi
│ └── django-allauth-0.51.0.tar.gz
│ └── metadata.json
└── url
└── 6781bd6440ae72c2
└── werkzeug-3.0.1.tar.gz
└── metadata.json
```
The inside of a `metadata.json`:
```json
{
"data": {
"django_allauth-0.51.0-py3-none-any.whl": {
"metadata-version": "2.1",
"name": "django-allauth",
"version": "0.51.0",
...
}
}
}
```
## Summary
This PR adds support for local path dependencies. The approach mostly
just falls out of our existing approach and infrastructure for Git and
URL dependencies.
Closes https://github.com/astral-sh/puffin/issues/436. (We'll open a
separate issue for editable installs.)
## Test Plan
Added `pip-compile` tests that pre-download a wheel or source
distribution, then install it via local path.
## Summary
A variety of small refactors to the distribution types crate to (1)
return `Result` if we find an invalid wheel, rather than treating it as
a source distribution with a `.whl` suffix, and (2) DRY up some repeated
code around URLs.
A consistent cache structure for remote wheel metadata:
* `<wheel metadata cache>/pypi/foo-1.0.0-py3-none-any.json`
* `<wheel metadata
cache>/<digest(index-url)>/foo-1.0.0-py3-none-any.json`
* `<wheel metadata cache>/url/<digest(url)>/foo-1.0.0-py3-none-any.json`
The source dist caching will use a similar structure (#468).
This PR modifies the `Fetcher` to cache remote wheels that we _already_
store to-disk. We might read these again in the future, so we might as
well store them in the cache for consistency (rather than using a
temporary directory).
## Summary
This PR unifies the behavior that lived in the resolver's `distribution`
crates with the behaviors that were spread between the various structs
in the installer crate into a single `Fetcher` struct that is intended
to manage all interactions with distributions. Specifically, the
interface of this struct is such that it can access distribution
metadata, download distributions, return those downloads, etc., all with
a common cache.
Overall, this is mostly just DRYing up code that was repeated between
the two crates, and putting it behind a reasonable shared interface.
## Summary
This crate only contains types, and I want to introduce a new crate for
all _operations_ on distributions, so this feels like a more natural
name given we also have `pypi-types`.
[PEP 691](https://peps.python.org/pep-0691/#project-detail) has slightly
different, more relaxed rules around file metadata. These changes are
now reflected in the `File` struct. This will make it easier to support
alternative indices.
I had expected that i need to introduce a separate type for that, so i'm
happy it's two `Option`s more and an alias.
Part of #412
Implement two behaviors for yanked versions:
* During `pip-compile`, yanked versions are filtered out entirely, we
currently treat them is if they don't exist. This is leads to confusing
error messages because a version that does exist seems to have suddenly
disappeared.
* During `pip-sync`, we warn when we fetch a remote distribution and it
has been yanked. We currently don't warn on cached or installed
distributions that have been yanked.
Filter out source dists and wheels whose `requires-python` from the
simple api is incompatible with the current python version.
This change showed an important problem: When we use a fake python
version for resolving, building source distributions breaks down because
we can only build with versions we actually have.
This change became surprisingly big. The tests now require python 3.7 to
be installed, but changing that would mean an even bigger change.
Fixes#388
## Summary
This PR refactors our `RemoteDistribution` type such that it now follows
a clear hierarchy that matches the actual variants, and encodes the
differences between source and built distributions:
```rust
pub enum Distribution {
Built(BuiltDistribution),
Source(SourceDistribution),
}
pub enum BuiltDistribution {
Registry(RegistryBuiltDistribution),
DirectUrl(DirectUrlBuiltDistribution),
}
pub enum SourceDistribution {
Registry(RegistrySourceDistribution),
DirectUrl(DirectUrlSourceDistribution),
Git(GitSourceDistribution),
}
/// A built distribution (wheel) that exists in a registry, like `PyPI`.
pub struct RegistryBuiltDistribution {
pub name: PackageName,
pub version: Version,
pub file: File,
}
/// A built distribution (wheel) that exists at an arbitrary URL.
pub struct DirectUrlBuiltDistribution {
pub name: PackageName,
pub url: Url,
}
/// A source distribution that exists in a registry, like `PyPI`.
pub struct RegistrySourceDistribution {
pub name: PackageName,
pub version: Version,
pub file: File,
}
/// A source distribution that exists at an arbitrary URL.
pub struct DirectUrlSourceDistribution {
pub name: PackageName,
pub url: Url,
}
/// A source distribution that exists in a Git repository.
pub struct GitSourceDistribution {
pub name: PackageName,
pub url: Url,
}
```
Most of the PR just stems downstream from this change. There are no
behavioral changes, so I'm largely relying on lint, tests, and the
compiler for correctness.
We now write the `direct_url.json` when installing, and _skip_
installing if we find a package installed via the direct URL that the
user is requesting.
A lot of TODOs, especially around cleaning up the `Source` abstraction
and its relationship to `DirectUrl`. I'm gonna keep working on these
today, but this works and makes the requirements clear.
Closes#332.
## Summary
This is a first-pass at adding source distribution support to the
installer.
The previous installation flow was:
1. Come up with a plan.
1. Find a distribution (specific file) for every package that we'll need
to download.
1. Download those distributions.
1. Unzip them (since we assumed they were all wheels).
1. Install them into the virtual environment.
Now, Step (3) downloads both wheels and source distributions, and we
insert a step between Steps (3) and (4) to build any source
distributions into zipped wheels.
There are a bunch of TODOs, the most important (IMO) is that we
basically have two implementations of downloading and building, between
the stuff in `puffin_installer` and `puffin_resolver` (namely in
`crates/puffin-resolver/src/distribution`). I didn't attempt to clean
that up here -- it's already a problem, and it's related to the overall
problem we need to solve around unified caching and resource management.
Closes#243.
`PackageName` and `ExtraName` can now only be constructed from valid
names. They share the same rules, so i gave them the same
implementation. Constructors are split between `new` (owned) and
`from_str` (borrowed), with the owned version avoiding allocations.
Closes#279
---------
Co-authored-by: Zanie <contact@zanie.dev>
The normalized name abstractions were not consistently, this PR uses
them where they were previously missing:
* `WheelFilename::distribution`
* `Requirement::name`
* `Requirement::extras`
* `Metadata21::name`
* `Metadata21::provides_dist`
With `puffin-package` depending on `pep508_rs` this would be cyclical
crate dependency, so `puffin-normalize` gets split out from
`puffin-package`.
`DistInfoName` has the same task and semantics as `PackageName`, so it's
merged into the latter.
`PackageName` and `ExtraName` documentation is moved onto the type and
their constructors are called `new` instead of `normalize`. We now use
these constructors rarely enough the implicit allocation by
`to_string()` shouldn't matter anymore, while more actual cloning
becomes visible.
## Summary
This PR adds support for resolving and installing dependencies via
direct URLs, like:
```
werkzeug @ 960bb4017c/Werkzeug-2.0.0-py3-none-any.whl
```
These are fairly common (e.g., with `torch`), but you most often see
them as Git dependencies.
Broadly, structs like `RemoteDistribution` and friends are now enums
that can represent either registry-based dependencies or URL-based
dependencies:
```rust
/// A built distribution (wheel) that exists as a remote file (e.g., on `PyPI`).
#[derive(Debug, Clone)]
#[allow(clippy::large_enum_variant)]
pub enum RemoteDistribution {
/// The distribution exists in a registry, like `PyPI`.
Registry(PackageName, Version, File),
/// The distribution exists at an arbitrary URL.
Url(PackageName, Url),
}
```
In the resolver, we now allow packages to take on an extra, optional
`Url` field:
```rust
#[derive(Debug, Clone, Eq, Derivative)]
#[derivative(PartialEq, Hash)]
pub enum PubGrubPackage {
Root,
Package(
PackageName,
Option<DistInfoName>,
#[derivative(PartialEq = "ignore")]
#[derivative(PartialOrd = "ignore")]
#[derivative(Hash = "ignore")]
Option<Url>,
),
}
```
However, for the purpose of version satisfaction, we ignore the URL.
This allows for the URL dependency to satisfy the transitive request in
cases like:
```
flask==3.0.0
werkzeug @ 254c3e9b5f/werkzeug-3.0.1-py3-none-any.whl
```
There are a couple limitations in the current approach:
- The caching for remote URLs is done separately in the resolver vs. the
installer. I decided not to sweat this too much... We need to figure out
caching holistically.
- We don't support any sort of time-based cache for remote URLs -- they
just exist forever. This will be a problem for URL dependencies, where
we need some way to evict and refresh them. But I've deferred it for
now.
- I think I need to redo how this is modeled in the resolver, because
right now, we don't detect a variety of invalid cases, e.g., providing
two different URLs for a dependency, asking for a URL dependency and a
_different version_ of the same dependency in the list of first-party
dependencies, etc.
- (We don't yet support VCS dependencies.)
This also allows us to get rid of `PinnedPackage` _and_ to remove some
`Result<...>` types due to needless conversions between
otherwise-identical types.