## Summary
For PEP 517 builds, the current working directory needs to be set to the
directory of the source distribution. It turns out that on Windows, if
you use a UNC path for the working directory, then relative paths are
interpreted relative to the root of the current drive
([source](https://www.fileside.app/blog/2023-03-17_windows-file-paths/#paths-relative-to-the-root-of-the-current-drive)).
So, when builds attempted to resolve relative paths, they always
failed.
This PR strips the UNC prefix when setting the current working
directory.
Closes #1238.
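
A minimal sketch of the prefix stripping, assuming the prefix in question is the `\\?\` form that `std::fs::canonicalize` returns on Windows. The helper name `strip_verbatim_prefix` is hypothetical and the PR may do this differently. Verbatim network paths (`\\?\UNC\...`) are left untouched here, since dropping their prefix would not yield a valid path.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: drop a leading `\\?\` from a drive path such as
/// `\\?\C:\src\pkg`, and return every other path unchanged.
fn strip_verbatim_prefix(path: &Path) -> PathBuf {
    const VERBATIM: &str = r"\\?\";
    const VERBATIM_UNC: &str = r"\\?\UNC\";

    match path.to_str() {
        // Only rewrite UTF-8 paths that start with the prefix; verbatim
        // network paths are left alone.
        Some(s) if s.starts_with(VERBATIM) && !s.starts_with(VERBATIM_UNC) => {
            PathBuf::from(&s[VERBATIM.len()..])
        }
        _ => path.to_path_buf(),
    }
}
```

The result could then be passed to `std::process::Command::current_dir` when spawning the build.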
## Test Plan
I tested this on my Windows machine by installing `ujson` with
`--no-binary ujson`. (I don't want to add that specific test, since it's
really slow to build.)
## Summary
This ensures that (like Cargo) we don't suffer from
https://github.com/advisories/GHSA-r5w3-xm58-jv6j, by way of checking
known hosts when fetching via `libgit2`.
The implementation is taken from Cargo itself, modified to remove all
configuration, since we don't yet support configuration for known hosts,
etc.
Closes #285.
## Summary
This PR adds a `NormalizedDisplay` trait that we can use for user-facing
paths, to strip the UNC prefix on Windows.
On other platforms, the implementation is a no-op (it behaves exactly
like `Display`).
I audited all usages of `.display()`, and changed any that were
user-facing, either via `println!` or `eprintln!`, or by way of being
included in error messages. I did _not_ change uses that were only in
tests or only went to tracing.
Closes https://github.com/astral-sh/puffin/issues/1084.
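
As a rough illustration of the trait's shape (not the PR's actual code): the method name `normalized_display` and the `String` return type are assumptions, and only the `\\?\` drive form is shortened.

```rust
use std::path::Path;

/// Sketch of a display trait for user-facing paths.
trait NormalizedDisplay {
    /// Render the path for users, without a leading `\\?\` on Windows.
    fn normalized_display(&self) -> String;
}

impl NormalizedDisplay for Path {
    fn normalized_display(&self) -> String {
        let rendered = self.display().to_string();
        if cfg!(windows) {
            // Assumption: verbatim network paths (`\\?\UNC\...`) are kept
            // as-is, everything else loses the `\\?\` prefix.
            if let Some(rest) = rendered.strip_prefix(r"\\?\") {
                if !rest.starts_with(r"UNC\") {
                    return rest.to_string();
                }
            }
        }
        // On other platforms this is the same as `Display`.
        rendered
    }
}
```

Because `PathBuf` dereferences to `Path`, a call site would change from `path.display()` to `path.normalized_display()` in user-facing messages only.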
This crate started off as generic caching utilities, but we started
adding a lot of Puffin-specific stuff (like the cache buckets
abstraction that knows about Git vs. direct URL vs. indexes and so on).
This PR moves the generic stuff into a new `cache-key` crate.
## Summary
This PR modifies the Git wheel cache to: (1) use a shorter version of
the SHA, to save space; and (2) include the package name, for
consistency with all other buckets.
I considered removing the URL hash entirely, and _just_ using the SHA,
which would be even _more_ consistent with other buckets. But if we
remove the URL, then we won't have separate directories for
subdirectories (which are part of the URL).
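
A sketch of how such a directory could be assembled. The component order, the function name, and the truncation length are all assumptions; the PR text only says the SHA is shortened, the URL hash is kept, and the package name is included.

```rust
use std::path::PathBuf;

/// Hypothetical layout: `git/<url digest>/<short sha>/<package name>`.
fn git_wheel_dir(url_digest: &str, sha: &str, package: &str, sha_len: usize) -> PathBuf {
    // Keep only the leading characters of the commit SHA to save space.
    // A hex SHA is ASCII, so slicing by byte index is safe here.
    let short_sha = &sha[..sha_len.min(sha.len())];
    ["git", url_digest, short_sha, package].iter().collect()
}
```

Keeping the URL digest as its own component is what preserves separate directories for different subdirectories of the same repository.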
Before:
<img width="1035" alt="Screen Shot 2023-12-07 at 7 23 42 PM"
src="86afce67-682f-464f-9ba1-0b60d5b7f19f">
After:
<img width="1232" alt="Screen Shot 2023-12-07 at 8 09 23 PM"
src="eda42a19-974f-47fe-8c83-54a602ddfd2d">
## Summary
We need to pass in the distribution with the "precise" URL to avoid
refetching.
## Test Plan
Ran `cargo run -p puffin-cli -- pip-compile requirements.in --verbose`
with `flask @ git+https://github.com/pallets/flask.git` and verified
that we only checked out Flask once.
## Summary and motivation
For a given source dist, we store the metadata of each wheel built
from it in
`built-wheel-metadata-v0/pypi/<source dist filename>/metadata.json`.
During resolution, we check the cache status of the source dist. If it
is fresh, we check `metadata.json` for a matching wheel: if there is
one, we use that metadata; if there isn't, we build one. If the source
dist is stale, we build a wheel and overwrite `metadata.json` with that
single wheel. This PR thereby ties the local built wheel metadata cache
to the freshness of the remote source dist.
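
The lookup rule above can be sketched as follows. This is an in-memory model, not the PR's code: `Metadata`, `MetadataFile`, and `wheel_metadata` are hypothetical names, and the sketch assumes that a wheel built for a fresh source dist is added next to the existing entries.

```rust
use std::collections::HashMap;

/// Stand-in for parsed wheel metadata (the real type is richer).
#[derive(Clone, Debug, PartialEq)]
struct Metadata {
    name: String,
    version: String,
}

/// In-memory stand-in for one source dist's `metadata.json`:
/// wheel filename -> metadata.
type MetadataFile = HashMap<String, Metadata>;

/// Return metadata for `wheel`, calling `build` only when the cache
/// cannot answer.
fn wheel_metadata(
    cached: &mut MetadataFile,
    source_is_fresh: bool,
    wheel: &str,
    build: impl FnOnce() -> Metadata,
) -> Metadata {
    if source_is_fresh {
        if let Some(hit) = cached.get(wheel) {
            // Fresh source dist and a matching wheel: reuse the metadata.
            return hit.clone();
        }
    } else {
        // Stale source dist: discard everything recorded for it, so the
        // file ends up holding only the wheel built below.
        cached.clear();
    }
    let built = build();
    cached.insert(wheel.to_string(), built.clone());
    built
}
```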
This functionality is available through `SourceDistCachedBuilder`.
`puffin_installer::Builder`, `puffin_installer::Downloader` and
`Fetcher` are removed; in their place is `FetchAndBuild`, which calls
into the new `SourceDistCachedBuilder`. `FetchAndBuild` is the new
main high-level abstraction: it spawns parallel fetching and building.
For wheel metadata it calls into the registry client, for wheel files
it fetches them, and for source dists it calls
`SourceDistCachedBuilder`. It handles locks around builds and, newly
added, inter-process file locking for Git operations.
Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.
In a follow-up PR, I'll also clear built wheels when they've become
stale.
Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.
Closes #473
## Source dist cache structure
Entries by supported sources:
* `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`
However, the filename in a URL does not need to be a valid source dist filename
(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following, and we have to accept any string as
the filename:
* `<build wheel metadata cache>/url/<sha256(url)>/master.zip/metadata.json`
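
A sketch of how these entries map to paths. The `Source` enum and `metadata_path` function are hypothetical, and the digests are passed in precomputed because the hash function is outside the scope of this illustration.

```rust
use std::path::PathBuf;

/// Where a source dist came from (hypothetical stand-in type).
enum Source<'a> {
    Pypi,
    Index { index_digest: &'a str },
    Url { url_digest: &'a str },
}

/// Build the path to `metadata.json` for a source dist under `root`.
fn metadata_path(root: &str, source: &Source<'_>, filename: &str) -> PathBuf {
    let mut path = PathBuf::from(root);
    match source {
        Source::Pypi => path.push("pypi"),
        Source::Index { index_digest } => path.push(index_digest),
        Source::Url { url_digest } => {
            path.push("url");
            path.push(url_digest);
        }
    }
    // The filename is used verbatim: for URLs it may be something like
    // `master.zip` rather than a valid source dist filename.
    path.push(filename);
    path.push("metadata.json");
    path
}
```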
Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│   └── 5c56bc1c58c34c11
│       └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│           └── metadata.json
├── pypi
│   └── django-allauth-0.51.0.tar.gz
│       └── metadata.json
└── url
    └── 6781bd6440ae72c2
        └── werkzeug-3.0.1.tar.gz
            └── metadata.json
```
The inside of a `metadata.json`:
```json
{
  "data": {
    "django_allauth-0.51.0-py3-none-any.whl": {
      "metadata-version": "2.1",
      "name": "django-allauth",
      "version": "0.51.0",
      ...
    }
  }
}
```
## Summary
This PR refactors our `RemoteDistribution` type such that it now follows
a clear hierarchy that matches the actual variants, and encodes the
differences between source and built distributions:
```rust
pub enum Distribution {
    Built(BuiltDistribution),
    Source(SourceDistribution),
}

pub enum BuiltDistribution {
    Registry(RegistryBuiltDistribution),
    DirectUrl(DirectUrlBuiltDistribution),
}

pub enum SourceDistribution {
    Registry(RegistrySourceDistribution),
    DirectUrl(DirectUrlSourceDistribution),
    Git(GitSourceDistribution),
}

/// A built distribution (wheel) that exists in a registry, like `PyPI`.
pub struct RegistryBuiltDistribution {
    pub name: PackageName,
    pub version: Version,
    pub file: File,
}

/// A built distribution (wheel) that exists at an arbitrary URL.
pub struct DirectUrlBuiltDistribution {
    pub name: PackageName,
    pub url: Url,
}

/// A source distribution that exists in a registry, like `PyPI`.
pub struct RegistrySourceDistribution {
    pub name: PackageName,
    pub version: Version,
    pub file: File,
}

/// A source distribution that exists at an arbitrary URL.
pub struct DirectUrlSourceDistribution {
    pub name: PackageName,
    pub url: Url,
}

/// A source distribution that exists in a Git repository.
pub struct GitSourceDistribution {
    pub name: PackageName,
    pub url: Url,
}
```
Most of the PR follows mechanically from this change. There are no
behavioral changes, so I'm largely relying on lint, tests, and the
compiler for correctness.
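
To illustrate how code can consume this hierarchy, here is a sketch with simplified stand-ins: every struct keeps only a `name` field typed as `String`, and the `name` and `needs_build` methods are hypothetical, not taken from the PR.

```rust
pub struct RegistryBuiltDistribution { pub name: String }
pub struct DirectUrlBuiltDistribution { pub name: String }
pub struct RegistrySourceDistribution { pub name: String }
pub struct DirectUrlSourceDistribution { pub name: String }
pub struct GitSourceDistribution { pub name: String }

pub enum BuiltDistribution {
    Registry(RegistryBuiltDistribution),
    DirectUrl(DirectUrlBuiltDistribution),
}

pub enum SourceDistribution {
    Registry(RegistrySourceDistribution),
    DirectUrl(DirectUrlSourceDistribution),
    Git(GitSourceDistribution),
}

pub enum Distribution {
    Built(BuiltDistribution),
    Source(SourceDistribution),
}

impl Distribution {
    /// Hypothetical accessor: the match is exhaustive, so adding a new
    /// variant becomes a compile error until it is handled here.
    pub fn name(&self) -> &str {
        match self {
            Self::Built(BuiltDistribution::Registry(d)) => &d.name,
            Self::Built(BuiltDistribution::DirectUrl(d)) => &d.name,
            Self::Source(SourceDistribution::Registry(d)) => &d.name,
            Self::Source(SourceDistribution::DirectUrl(d)) => &d.name,
            Self::Source(SourceDistribution::Git(d)) => &d.name,
        }
    }

    /// Only source distributions need a build step before installing.
    pub fn needs_build(&self) -> bool {
        matches!(self, Self::Source(_))
    }
}
```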
It looks like Cargo. Notice the bold green lines at the top, which
appear during resolution to indicate Git fetches and source
distribution builds:
<img width="868" alt="Screen Shot 2023-11-06 at 11 28 47 PM"
src="9647a480-7be7-41e9-b1d3-69faefd054ae">
<img width="868" alt="Screen Shot 2023-11-06 at 11 28 51 PM"
src="6bc491aa-5b51-4b37-9ee1-257f1bc1c049">
Closes https://github.com/astral-sh/puffin/issues/287, although we can
do a lot more here.
We now write the `direct_url.json` when installing, and _skip_
installing if we find a package installed via the direct URL that the
user is requesting.
A lot of TODOs, especially around cleaning up the `Source` abstraction
and its relationship to `DirectUrl`. I'm gonna keep working on these
today, but this works and makes the requirements clear.
Closes #332.
This PR adds a mechanism by which we can ensure that we _always_ try to
refresh Git dependencies when resolving; further, we now write the fully
resolved SHA to the "lockfile". However, nothing in the code _assumes_
we do this, so the installer will remain agnostic to this behavior.
The specific approach taken here is minimally invasive. Specifically,
when we try to fetch a source distribution, we check if it's a Git
dependency; if it is, we fetch, and return the exact SHA, which we then
map back to a new URL. In the resolver, we keep track of URL
"redirects", and then we use the redirect (1) for the actual source
distribution building, and (2) when writing back out to the lockfile. As
such, none of the types outside of the resolver change at all, since
we're just mapping `RemoteDistribution` to `RemoteDistribution`, but
swapping out the internal URLs.
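
The redirect bookkeeping can be sketched like this. `Redirects` and `precise_url` are hypothetical names, URLs are plain strings for brevity, and the `@<sha>` suffix assumes the requested URL carries no revision of its own.

```rust
use std::collections::HashMap;

/// Maps a URL as the user wrote it to the "precise" URL pinned to a commit.
#[derive(Default)]
struct Redirects(HashMap<String, String>);

impl Redirects {
    /// Record that `requested` resolved to `precise`.
    fn insert(&mut self, requested: &str, precise: &str) {
        self.0.insert(requested.to_string(), precise.to_string());
    }

    /// Return the precise URL if one was recorded, else the URL unchanged.
    fn resolve<'a>(&'a self, url: &'a str) -> &'a str {
        self.0.get(url).map(String::as_str).unwrap_or(url)
    }
}

/// Hypothetical: pin a Git URL to the SHA returned by the fetch.
fn precise_url(url: &str, sha: &str) -> String {
    format!("{}@{}", url, sha)
}
```

Both the source distribution build and the lockfile writer would then call `resolve` on the distribution's URL, so nothing outside the resolver needs to know a redirect took place.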
There are some inefficiencies here since, e.g., we do the Git fetch,
send back the "precise" URL, then a moment later, do a Git checkout of
that URL (which will be _mostly_ a no-op -- since we have a full SHA, we
don't have to fetch anything, but we _do_ check back on disk to see if
the SHA is still checked out). A more efficient approach would be to
return the path to the checked-out revision when we do this conversion
to a "precise" URL, since we'd then only interact with the Git repo
exactly once. But this runs the risk that the checked-out SHA changes
between the time we make the "precise" URL and the time we build the
source distribution.
Closes #286.
## Summary
This PR adds support for Git dependencies, like:
```
flask @ git+https://github.com/pallets/flask.git
```
Right now, they're only supported in the resolver (and not the
installer), since the installer doesn't yet support source distributions
at all.
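
For illustration only, a requirement like the one above could be recognized with a helper along these lines. `parse_git_requirement` is a hypothetical name, and the sketch ignores extras, markers, and revisions; it is not how the resolver actually parses requirements.

```rust
/// Split `name @ git+<url>` into the package name and repository URL.
/// Returns `None` if the requirement is not a `git+` direct reference.
fn parse_git_requirement(requirement: &str) -> Option<(&str, &str)> {
    // The first `@` separates the name from the URL.
    let (name, url) = requirement.split_once('@')?;
    // Only URLs with the `git+` scheme prefix count as Git dependencies.
    let url = url.trim().strip_prefix("git+")?;
    Some((name.trim(), url))
}
```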
The general approach here is based on Cargo's Git implementation.
Specifically, I adapted Cargo's
[`git`](23eb492cf9/src/cargo/sources/git/mod.rs)
module to perform the cloning, which is based on `libgit2`.
As compared to Cargo's implementation, I made the following changes:
- Removed any unnecessary code.
- Fixed any Clippy errors for our stricter ruleset.
- Removed the dependency on `curl`, in favor of `reqwest` which we use
elsewhere.
- Removed the ability to use `gix`. Cargo allows the use of `gix` as an
experimental flag, but it only supports a small subset of the
operations. When Cargo fully adopts `gix`, we should plan to do the
same.
- Removed Cargo's host key checking. We need to re-add this! I'll do it
shortly.
- Removed Cargo's progress bars. We should re-add these too, but we use
`indicatif` and Cargo had their own thing.
There are a few follow-ups to consider:
- Adding support in the installer.
- When we lock, we should write out the Git URL that includes the exact
SHA. This lets us cache in perpetuity and avoids dependencies changing
without re-locking.
- When we resolve, we should _always_ try to refresh Git dependencies.
(Right now, we skip if the wheel was already built.)
I'll work on the latter two in follow-up PRs.
Closes #202.