Source dist metadata refactor (#468)

## Summary and motivation

For a given source dist, we store the metadata of each wheel built
from it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel: if there is one, we use its metadata; if there isn't, we
build one. If the source dist is stale, we build a wheel and overwrite
`metadata.json` with that single wheel. This PR thereby ties the local
built wheel metadata cache to the freshness of the remote source dist.
This functionality is available through `SourceDistCachedBuilder`.
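
The fresh/stale decision described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in `SourceDistCachedBuilder`: `WheelMetadata`, `CachedMetadata`, `Freshness`, `resolve_metadata` and the `is_compatible`/`build` callbacks are all hypothetical names, and the real implementation reads and writes `metadata.json` on disk.

```rust
use std::collections::HashMap;

/// Metadata of one built wheel, trimmed to two fields for illustration.
#[derive(Debug, Clone, PartialEq)]
struct WheelMetadata {
    name: String,
    version: String,
}

/// In-memory stand-in for one `metadata.json`: wheel filename -> metadata.
#[derive(Debug, Default)]
struct CachedMetadata {
    data: HashMap<String, WheelMetadata>,
}

/// Hypothetical result of checking the remote source dist against the cache.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Freshness {
    Fresh,
    Stale,
}

/// Return metadata for a matching wheel, building only when the cache
/// cannot answer. `build` returns the filename and metadata of the new wheel.
fn resolve_metadata(
    cache: &mut CachedMetadata,
    freshness: Freshness,
    is_compatible: impl Fn(&str) -> bool,
    build: impl FnOnce() -> (String, WheelMetadata),
) -> WheelMetadata {
    match freshness {
        Freshness::Fresh => {
            // Fresh source dist: reuse a matching wheel if one is recorded.
            let hit = cache
                .data
                .iter()
                .find(|(filename, _)| is_compatible(filename.as_str()))
                .map(|(_, metadata)| metadata.clone());
            if let Some(metadata) = hit {
                return metadata;
            }
            // No matching wheel yet: build one and add it to the entry.
            let (filename, metadata) = build();
            cache.data.insert(filename, metadata.clone());
            metadata
        }
        Freshness::Stale => {
            // Stale source dist: everything recorded so far is invalid, so
            // the entry is overwritten with the single new wheel.
            let (filename, metadata) = build();
            cache.data.clear();
            cache.data.insert(filename, metadata.clone());
            metadata
        }
    }
}
```

In this sketch, a fresh source dist with a matching wheel never invokes `build`, which is the case that makes a fully cached resolution cheap.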

`puffin_installer::Builder`, `puffin_installer::Downloader` and
`Fetcher` are removed. They are replaced by the new `FetchAndBuild`,
which calls into the also new `SourceDistCachedBuilder`. `FetchAndBuild`
is the new main high-level abstraction: it spawns parallel
fetching/building; for wheel metadata it calls into the registry client,
for wheel files it fetches them, and for source dists it calls
`SourceDistCachedBuilder`. It handles locks around builds and, newly
added, inter-process file locking for git operations.
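
As a rough illustration of the routing and the per-build locks, here is a sketch under heavy assumptions: `Task`, `Route`, `route` and `BuildLocks` are hypothetical names, the locks are plain in-process `std::sync` mutexes, and neither the async machinery nor the inter-process file locking for git is modelled.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// The three kinds of work named above; the payload is a display name.
#[derive(Debug, Clone, PartialEq)]
enum Task {
    WheelMetadata(String),
    WheelFile(String),
    SourceDist(String),
}

/// Which component handles a task in this sketch.
#[derive(Debug, PartialEq)]
enum Route {
    RegistryClient,
    Download,
    CachedBuilder,
}

/// Wheel metadata goes to the registry client, wheel files are downloaded,
/// and source dists go through the cached builder.
fn route(task: &Task) -> Route {
    match task {
        Task::WheelMetadata(_) => Route::RegistryClient,
        Task::WheelFile(_) => Route::Download,
        Task::SourceDist(_) => Route::CachedBuilder,
    }
}

/// One lock per source dist id, so that two parallel tasks never build
/// the same source dist at the same time.
#[derive(Default)]
struct BuildLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl BuildLocks {
    /// Return the lock for `id`, creating it on first use. The caller
    /// holds the returned mutex for the duration of the build.
    fn acquire(&self, id: &str) -> Arc<Mutex<()>> {
        let mut locks = self.locks.lock().unwrap();
        Arc::clone(locks.entry(id.to_string()).or_default())
    }
}
```

Two tasks asking for the same id receive the same `Arc<Mutex<()>>` and therefore serialize, while builds of different source dists proceed in parallel.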

Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.

In a follow-up PR, I'll also clear built wheels when they've become
stale.

Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.

Closes #473

## Source dist cache structure 

Entries by supported sources:
* `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`

But the URL filename does not need to be a valid source dist filename
(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following, and we have to accept any string as
the filename:
* `<build wheel metadata cache>/url/<sha256(url)>/master.zip/metadata.json`

Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│   └── 5c56bc1c58c34c11
│       └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│           └── metadata.json
├── pypi
│   └── django-allauth-0.51.0.tar.gz
│       └── metadata.json
└── url
    └── 6781bd6440ae72c2
        └── werkzeug-3.0.1.tar.gz
            └── metadata.json
```
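
The layout above can be expressed as a small path-building function. This is a sketch under assumptions: `Source` and `metadata_path` are hypothetical names, the digests are passed in precomputed so the sketch does not commit to a particular hash function or digest length, and the meaning of the two git path components (repository digest, then commit) is inferred from the example tree.

```rust
use std::path::{Path, PathBuf};

/// Where a source dist came from. Filenames are taken as arbitrary
/// strings, since a URL filename such as `master.zip` need not be a valid
/// source dist filename.
enum Source<'a> {
    Pypi { filename: &'a str },
    Index { index_digest: &'a str, filename: &'a str },
    Url { url_digest: &'a str, filename: &'a str },
    Git { repo_digest: &'a str, commit: &'a str },
}

/// Build the path of the `metadata.json` for one source dist.
fn metadata_path(cache_root: &Path, source: &Source) -> PathBuf {
    let dir = match source {
        Source::Pypi { filename } => cache_root.join("pypi").join(filename),
        Source::Index { index_digest, filename } => {
            cache_root.join(index_digest).join(filename)
        }
        Source::Url { url_digest, filename } => {
            cache_root.join("url").join(url_digest).join(filename)
        }
        Source::Git { repo_digest, commit } => {
            cache_root.join("git").join(repo_digest).join(commit)
        }
    };
    dir.join("metadata.json")
}
```

For the example, `Source::Url { url_digest: "6781bd6440ae72c2", filename: "werkzeug-3.0.1.tar.gz" }` under the root `built-wheel-metadata-v0` maps to the `url` branch of the tree shown.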

The inside of a `metadata.json`:
```json
{
  "data": {
    "django_allauth-0.51.0-py3-none-any.whl": {
      "metadata-version": "2.1",
      "name": "django-allauth",
      "version": "0.51.0",
      ...
    }
  }
}
```
konsti 2023-11-24 18:47:58 +01:00 committed by GitHub
parent 8d247fe95b
commit d54e780843
49 changed files with 1712 additions and 1142 deletions

@@ -5,8 +5,9 @@ use pubgrub::report::Reporter;
 use thiserror::Error;
 use url::Url;
-use distribution_types::{BuiltDist, Dist, SourceDist};
+use distribution_types::{BuiltDist, SourceDist};
 use pep508_rs::Requirement;
+use puffin_distribution::DistributionDatabaseError;
 use puffin_normalize::PackageName;
 use crate::pubgrub::{PubGrubPackage, PubGrubVersion};
@@ -53,37 +54,11 @@ pub enum ResolveError {
     #[error(transparent)]
     DistributionType(#[from] distribution_types::Error),
-    #[error("Failed to fetch wheel metadata from: {filename}")]
-    RegistryBuiltDist {
-        filename: String,
-        // TODO(konstin): Gives this a proper error type
-        #[source]
-        err: anyhow::Error,
-    },
+    #[error("Failed to download {0}")]
+    Fetch(Box<BuiltDist>, #[source] DistributionDatabaseError),
-    #[error("Failed to fetch wheel metadata from: {url}")]
-    UrlBuiltDist {
-        url: Url,
-        // TODO(konstin): Gives this a proper error type
-        #[source]
-        err: anyhow::Error,
-    },
-    #[error("Failed to build distribution: {filename}")]
-    RegistrySourceDist {
-        filename: String,
-        // TODO(konstin): Gives this a proper error type
-        #[source]
-        err: anyhow::Error,
-    },
-    #[error("Failed to build distribution from URL: {url}")]
-    UrlSourceDist {
-        url: Url,
-        // TODO(konstin): Gives this a proper error type
-        #[source]
-        err: anyhow::Error,
-    },
+    #[error("Failed to download and build {0}")]
+    FetchAndBuild(Box<SourceDist>, #[source] DistributionDatabaseError),
 }
 impl<T> From<futures::channel::mpsc::TrySendError<T>> for ResolveError {
@@ -116,38 +91,3 @@ impl From<pubgrub::error::PubGrubError<PubGrubPackage, Range<PubGrubVersion>>> f
         ResolveError::PubGrub(RichPubGrubError { source: value })
     }
 }
-impl ResolveError {
-    pub fn from_dist(dist: Dist, err: anyhow::Error) -> Self {
-        match dist {
-            Dist::Built(BuiltDist::Registry(wheel)) => Self::RegistryBuiltDist {
-                filename: wheel.file.filename.clone(),
-                err,
-            },
-            Dist::Built(BuiltDist::DirectUrl(wheel)) => Self::UrlBuiltDist {
-                url: wheel.url.clone(),
-                err,
-            },
-            Dist::Built(BuiltDist::Path(wheel)) => Self::UrlBuiltDist {
-                url: wheel.url.clone(),
-                err,
-            },
-            Dist::Source(SourceDist::Registry(sdist)) => Self::RegistrySourceDist {
-                filename: sdist.file.filename.clone(),
-                err,
-            },
-            Dist::Source(SourceDist::DirectUrl(sdist)) => Self::UrlSourceDist {
-                url: sdist.url.clone(),
-                err,
-            },
-            Dist::Source(SourceDist::Git(sdist)) => Self::UrlSourceDist {
-                url: sdist.url.clone(),
-                err,
-            },
-            Dist::Source(SourceDist::Path(sdist)) => Self::UrlBuiltDist {
-                url: sdist.url.clone(),
-                err,
-            },
-        }
-    }
-}
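
The diff above collapses four `anyhow`-based variants and the `from_dist` helper into two typed variants. The following std-only sketch shows roughly what those two variants amount to once the `thiserror` derive is written out by hand. It is an illustration under assumptions: `DatabaseError` is a stand-in for `DistributionDatabaseError`, and a plain `String` stands in for the boxed `BuiltDist` and `SourceDist`.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for `DistributionDatabaseError` (hypothetical, message only).
#[derive(Debug)]
struct DatabaseError(String);

impl fmt::Display for DatabaseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for DatabaseError {}

/// The two new variants, with a `String` in place of the distribution.
#[derive(Debug)]
enum ResolveError {
    Fetch(Box<String>, DatabaseError),
    FetchAndBuild(Box<String>, DatabaseError),
}

impl fmt::Display for ResolveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Fetch(dist, _) => write!(f, "Failed to download {dist}"),
            Self::FetchAndBuild(dist, _) => {
                write!(f, "Failed to download and build {dist}")
            }
        }
    }
}

impl Error for ResolveError {
    /// Expose the underlying error, as `#[source]` does in the derive.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::Fetch(_, err) | Self::FetchAndBuild(_, err) => Some(err),
        }
    }
}
```

With this shape, a call site can wrap a failure directly, for example with `map_err(|err| ResolveError::FetchAndBuild(Box::new(dist), err))`, instead of matching on every distribution kind as `from_dist` did.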