Mirror of https://github.com/astral-sh/uv.git (synced 2025-11-20 03:49:54 +00:00)
Add hash-checking support to install and sync (#2945)
## Summary

This PR adds support for hash-checking mode in `pip install` and `pip sync`. It's a large change, both in terms of the size of the diff and the modifications in behavior, but it's also one that's hard to merge in pieces (at least, with any test coverage), since it needs to work end-to-end to be useful and testable.

Here are some of the most important highlights:

- We store hashes in the cache. Where we previously stored pointers to unzipped wheels in the `archives` directory, we now store pointers with a set of known hashes. So every pointer to an unzipped wheel also includes its known hashes.
- By default, we don't compute any hashes. If the user runs with `--require-hashes` and the cache doesn't contain those hashes, we invalidate the cache, redownload the wheel, and compute the hashes as we go. For users that don't run with `--require-hashes`, there will be no change in performance. For users that _do_, the only change will be if they don't run with `--generate-hashes` -- then they may see some repeated work between resolution and installation, if they use `pip compile` then `pip sync` (see the example workflow after this summary).
- Many of the distribution types now include a `hashes` field, like `CachedDist` and `LocalWheel`.
- Our behavior is similar to pip, in that we enforce hashes when pulling any remote distributions, and when pulling from our own cache. Like pip, though, we _don't_ enforce hashes if a distribution is _already_ installed.
- Hash validity is enforced in a few different places:
  1. During resolution, we enforce hash validity based on the hashes reported by the registry. If we need to access a source distribution, though, we then enforce hash validity at that point too, prior to running any untrusted code. (This is enforced in the distribution database.)
  2. In the install plan, we _only_ add cached distributions that have matching hashes. If a cached distribution is missing any hashes, or the hashes don't match, we don't return it from the install plan.
  3. In the downloader, we _only_ return distributions with matching hashes.
  4. The final combination of "things we install" is: (1) the wheels from the cache, and (2) the downloaded wheels. So this ensures that we never install any mismatching distributions.
- Like pip, if `--require-hashes` is provided, we require that _all_ distributions are pinned with either `==` or a direct URL. We also require that _all_ distributions have hashes.

There are a few notable TODOs:

- We don't support hash-checking mode for unnamed requirements. These should be _somewhat_ rare, though, since `pip compile` never outputs unnamed requirements. I can fix this; it's just some additional work.
- We don't automatically enable `--require-hashes` when a hash exists in the requirements file. We require `--require-hashes` to be passed explicitly.

Closes #474.

## Test Plan

I'd like to add some tests for registries that report incorrect hashes, but otherwise: `cargo test`
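For readers who want to try this end-to-end, here is an illustrative workflow based on the flags described above. The file names are placeholders, and exact flag spellings should be checked against `uv pip compile --help` and `uv pip sync --help`:

```console
# Resolve and pin, emitting a hash for every locked distribution.
uv pip compile requirements.in --generate-hashes -o requirements.txt

# Install the pinned set; every requirement must be pinned (== or a direct URL)
# and carry a hash, and every download is verified against it.
uv pip sync requirements.txt --require-hashes
```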
parent 715a309dd5
commit 1f3b5bb093
56 changed files with 3186 additions and 333 deletions
crates/uv-distribution/src/archive.rs (new file, 36 lines)
@@ -0,0 +1,36 @@
use std::path::PathBuf;

use distribution_types::Hashed;
use pypi_types::HashDigest;

/// An archive (unzipped wheel) that exists in the local cache.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct Archive {
    /// The path to the archive entry in the wheel's archive bucket.
    pub path: PathBuf,
    /// The computed hashes of the archive.
    pub hashes: Vec<HashDigest>,
}

impl Archive {
    /// Create a new [`Archive`] with the given path and hashes.
    pub(crate) fn new(path: PathBuf, hashes: Vec<HashDigest>) -> Self {
        Self { path, hashes }
    }

    /// Return the path to the archive entry in the wheel's archive bucket.
    pub fn path(&self) -> &PathBuf {
        &self.path
    }

    /// Return the computed hashes of the archive.
    pub fn hashes(&self) -> &[HashDigest] {
        &self.hashes
    }
}

impl Hashed for Archive {
    fn hashes(&self) -> &[HashDigest] {
        &self.hashes
    }
}
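For orientation, the `Hashed` trait implemented above is defined in `distribution_types` and its definition is not part of this diff. A minimal sketch of the shape implied by the calls that appear later (`hashes`, `has_digests`, `satisfies`) might look like the following; the default-method bodies are assumptions, not the actual implementation:

```rust
use pypi_types::HashDigest;

/// A type that carries the hash digests computed for an artifact.
pub trait Hashed {
    /// Return the computed hashes.
    fn hashes(&self) -> &[HashDigest];

    /// Sketch: true if a digest is present for every required hash
    /// (the real method may compare by algorithm rather than by value).
    fn has_digests(&self, required: &[HashDigest]) -> bool {
        required.iter().all(|digest| self.hashes().contains(digest))
    }

    /// Sketch: an empty requirement is trivially satisfied; otherwise at
    /// least one computed digest must match a required digest.
    fn satisfies(&self, required: &[HashDigest]) -> bool {
        required.is_empty() || self.hashes().iter().any(|digest| required.contains(digest))
    }
}
```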
crates/uv-distribution/src/distribution_database.rs

@@ -11,16 +11,19 @@ use url::Url;
use distribution_filename::WheelFilename;
use distribution_types::{
    BuildableSource, BuiltDist, Dist, FileLocation, IndexLocations, LocalEditable, Name, SourceDist,
    BuildableSource, BuiltDist, Dist, FileLocation, Hashed, IndexLocations, LocalEditable, Name,
    SourceDist,
};
use platform_tags::Tags;
use pypi_types::Metadata23;
use pypi_types::{HashDigest, Metadata23};
use uv_cache::{ArchiveTimestamp, CacheBucket, CacheEntry, CachedByTimestamp, WheelCache};
use uv_client::{CacheControl, CachedClientError, Connectivity, RegistryClient};
use uv_configuration::{NoBinary, NoBuild};
use uv_extract::hash::Hasher;
use uv_fs::write_atomic;
use uv_types::BuildContext;

use crate::archive::Archive;
use crate::locks::Locks;
use crate::{Error, LocalWheel, Reporter, SourceDistributionBuilder};
@@ -79,28 +82,38 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
    /// Either fetch the wheel or fetch and build the source distribution
    ///
    /// If `no_remote_wheel` is set, the wheel will be built from a source distribution
    /// even if compatible pre-built wheels are available.
    /// Returns a wheel that's compliant with the given platform tags.
    ///
    /// While hashes will be generated in some cases, hash-checking is only enforced for source
    /// distributions, and should be enforced by the caller for wheels.
    #[instrument(skip_all, fields(%dist))]
    pub async fn get_or_build_wheel(&self, dist: &Dist, tags: &Tags) -> Result<LocalWheel, Error> {
    pub async fn get_or_build_wheel(
        &self,
        dist: &Dist,
        tags: &Tags,
        hashes: &[HashDigest],
    ) -> Result<LocalWheel, Error> {
        match dist {
            Dist::Built(built) => self.get_wheel(built).await,
            Dist::Source(source) => self.build_wheel(source, tags).await,
            Dist::Built(built) => self.get_wheel(built, hashes).await,
            Dist::Source(source) => self.build_wheel(source, tags, hashes).await,
        }
    }

    /// Either fetch the only wheel metadata (directly from the index or with range requests) or
    /// fetch and build the source distribution.
    ///
    /// Returns the [`Metadata23`], along with a "precise" URL for the source distribution, if
    /// possible. For example, given a Git dependency with a reference to a branch or tag, return a
    /// URL with a precise reference to the current commit of that branch or tag.
    /// While hashes will be generated in some cases, hash-checking is only enforced for source
    /// distributions, and should be enforced by the caller for wheels.
    #[instrument(skip_all, fields(%dist))]
    pub async fn get_or_build_wheel_metadata(&self, dist: &Dist) -> Result<Metadata23, Error> {
    pub async fn get_or_build_wheel_metadata(
        &self,
        dist: &Dist,
        hashes: &[HashDigest],
    ) -> Result<Metadata23, Error> {
        match dist {
            Dist::Built(built) => self.get_wheel_metadata(built).await,
            Dist::Built(built) => self.get_wheel_metadata(built, hashes).await,
            Dist::Source(source) => {
                self.build_wheel_metadata(&BuildableSource::Dist(source))
                self.build_wheel_metadata(&BuildableSource::Dist(source), hashes)
                    .await
            }
        }
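A hedged sketch of a call site for the new signature; the `database` and `required_hashes` bindings are hypothetical, and an empty slice means no hash-checking is requested:

```rust
// Hypothetical call site: look up the hashes pinned for this package, if any,
// and thread them through; `&[]` disables hash verification entirely.
let required: &[HashDigest] = required_hashes.get(dist.name()).unwrap_or_default();
let wheel = database.get_or_build_wheel(&dist, tags, required).await?;
```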
@ -118,7 +131,7 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
.build_editable(editable, editable_wheel_dir)
|
||||
.await?;
|
||||
|
||||
// Unzip.
|
||||
// Unzip into the editable wheel directory.
|
||||
let path = editable_wheel_dir.join(&disk_filename);
|
||||
let target = editable_wheel_dir.join(cache_key::digest(&editable.path));
|
||||
let archive = self.unzip_wheel(&path, &target).await?;
|
||||
|
|
@ -126,13 +139,21 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
dist,
|
||||
filename,
|
||||
archive,
|
||||
hashes: vec![],
|
||||
};
|
||||
|
||||
Ok((wheel, metadata))
|
||||
}
|
||||
|
||||
/// Fetch a wheel from the cache or download it from the index.
|
||||
async fn get_wheel(&self, dist: &BuiltDist) -> Result<LocalWheel, Error> {
|
||||
///
|
||||
/// While hashes will be generated in some cases, hash-checking is _not_ enforced and should
|
||||
/// instead be enforced by the caller.
|
||||
async fn get_wheel(
|
||||
&self,
|
||||
dist: &BuiltDist,
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<LocalWheel, Error> {
|
||||
let no_binary = match self.build_context.no_binary() {
|
||||
NoBinary::None => false,
|
||||
NoBinary::All => true,
|
||||
|
|
@ -157,8 +178,9 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
WheelCache::Index(&wheel.index).wheel_dir(wheel.name().as_ref()),
|
||||
wheel.filename.stem(),
|
||||
);
|
||||
|
||||
return self
|
||||
.load_wheel(path, &wheel.filename, cache_entry, dist)
|
||||
.load_wheel(path, &wheel.filename, cache_entry, dist, hashes)
|
||||
.await;
|
||||
}
|
||||
};
|
||||
|
|
@ -172,12 +194,13 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
|
||||
// Download and unzip.
|
||||
match self
|
||||
.stream_wheel(url.clone(), &wheel.filename, &wheel_entry, dist)
|
||||
.stream_wheel(url.clone(), &wheel.filename, &wheel_entry, dist, hashes)
|
||||
.await
|
||||
{
|
||||
Ok(archive) => Ok(LocalWheel {
|
||||
dist: Dist::Built(dist.clone()),
|
||||
archive,
|
||||
archive: archive.path,
|
||||
hashes: archive.hashes,
|
||||
filename: wheel.filename.clone(),
|
||||
}),
|
||||
Err(Error::Extract(err)) if err.is_http_streaming_unsupported() => {
|
||||
|
|
@ -188,11 +211,12 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
// If the request failed because streaming is unsupported, download the
|
||||
// wheel directly.
|
||||
let archive = self
|
||||
.download_wheel(url, &wheel.filename, &wheel_entry, dist)
|
||||
.download_wheel(url, &wheel.filename, &wheel_entry, dist, hashes)
|
||||
.await?;
|
||||
Ok(LocalWheel {
|
||||
dist: Dist::Built(dist.clone()),
|
||||
archive,
|
||||
archive: archive.path,
|
||||
hashes: archive.hashes,
|
||||
filename: wheel.filename.clone(),
|
||||
})
|
||||
}
|
||||
|
|
@ -210,12 +234,19 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
|
||||
// Download and unzip.
|
||||
match self
|
||||
.stream_wheel(wheel.url.raw().clone(), &wheel.filename, &wheel_entry, dist)
|
||||
.stream_wheel(
|
||||
wheel.url.raw().clone(),
|
||||
&wheel.filename,
|
||||
&wheel_entry,
|
||||
dist,
|
||||
hashes,
|
||||
)
|
||||
.await
|
||||
{
|
||||
Ok(archive) => Ok(LocalWheel {
|
||||
dist: Dist::Built(dist.clone()),
|
||||
archive,
|
||||
archive: archive.path,
|
||||
hashes: archive.hashes,
|
||||
filename: wheel.filename.clone(),
|
||||
}),
|
||||
Err(Error::Client(err)) if err.is_http_streaming_unsupported() => {
|
||||
|
|
@ -231,11 +262,13 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
&wheel.filename,
|
||||
&wheel_entry,
|
||||
dist,
|
||||
hashes,
|
||||
)
|
||||
.await?;
|
||||
Ok(LocalWheel {
|
||||
dist: Dist::Built(dist.clone()),
|
||||
archive,
|
||||
archive: archive.path,
|
||||
hashes: archive.hashes,
|
||||
filename: wheel.filename.clone(),
|
||||
})
|
||||
}
|
||||
|
|
@ -249,7 +282,8 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
WheelCache::Url(&wheel.url).wheel_dir(wheel.name().as_ref()),
|
||||
wheel.filename.stem(),
|
||||
);
|
||||
self.load_wheel(&wheel.path, &wheel.filename, cache_entry, dist)
|
||||
|
||||
self.load_wheel(&wheel.path, &wheel.filename, cache_entry, dist, hashes)
|
||||
.await
|
||||
}
|
||||
}
|
||||
|
|
@ -257,24 +291,33 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
|
||||
/// Convert a source distribution into a wheel, fetching it from the cache or building it if
|
||||
/// necessary.
|
||||
async fn build_wheel(&self, dist: &SourceDist, tags: &Tags) -> Result<LocalWheel, Error> {
|
||||
///
|
||||
/// The returned wheel is guaranteed to come from a distribution with a matching hash, and
|
||||
/// no build processes will be executed for distributions with mismatched hashes.
|
||||
async fn build_wheel(
|
||||
&self,
|
||||
dist: &SourceDist,
|
||||
tags: &Tags,
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<LocalWheel, Error> {
|
||||
let lock = self.locks.acquire(&Dist::Source(dist.clone())).await;
|
||||
let _guard = lock.lock().await;
|
||||
|
||||
let built_wheel = self
|
||||
.builder
|
||||
.download_and_build(&BuildableSource::Dist(dist), tags)
|
||||
.download_and_build(&BuildableSource::Dist(dist), tags, hashes)
|
||||
.boxed()
|
||||
.await?;
|
||||
|
||||
// If the wheel was unzipped previously, respect it. Source distributions are
|
||||
// cached under a unique build ID, so unzipped directories are never stale.
|
||||
// cached under a unique revision ID, so unzipped directories are never stale.
|
||||
match built_wheel.target.canonicalize() {
|
||||
Ok(archive) => {
|
||||
return Ok(LocalWheel {
|
||||
dist: Dist::Source(dist.clone()),
|
||||
archive,
|
||||
filename: built_wheel.filename,
|
||||
hashes: built_wheel.hashes,
|
||||
});
|
||||
}
|
||||
Err(err) if err.kind() == io::ErrorKind::NotFound => {}
|
||||
|
|
@ -287,12 +330,20 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
archive: self
|
||||
.unzip_wheel(&built_wheel.path, &built_wheel.target)
|
||||
.await?,
|
||||
hashes: built_wheel.hashes,
|
||||
filename: built_wheel.filename,
|
||||
})
|
||||
}
|
||||
|
||||
/// Fetch the wheel metadata from the index, or from the cache if possible.
|
||||
pub async fn get_wheel_metadata(&self, dist: &BuiltDist) -> Result<Metadata23, Error> {
|
||||
///
|
||||
/// While hashes will be generated in some cases, hash-checking is _not_ enforced and should
|
||||
/// instead be enforced by the caller.
|
||||
pub async fn get_wheel_metadata(
|
||||
&self,
|
||||
dist: &BuiltDist,
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<Metadata23, Error> {
|
||||
match self.client.wheel_metadata(dist).boxed().await {
|
||||
Ok(metadata) => Ok(metadata),
|
||||
Err(err) if err.is_http_streaming_unsupported() => {
|
||||
|
|
@ -300,7 +351,7 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
|
||||
// If the request failed due to an error that could be resolved by
|
||||
// downloading the wheel directly, try that.
|
||||
let wheel = self.get_wheel(dist).await?;
|
||||
let wheel = self.get_wheel(dist, hashes).await?;
|
||||
Ok(wheel.metadata()?)
|
||||
}
|
||||
Err(err) => Err(err.into()),
|
||||
|
|
@ -308,9 +359,13 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
}
|
||||
|
||||
/// Build the wheel metadata for a source distribution, or fetch it from the cache if possible.
|
||||
///
|
||||
/// The returned metadata is guaranteed to come from a distribution with a matching hash, and
|
||||
/// no build processes will be executed for distributions with mismatched hashes.
|
||||
pub async fn build_wheel_metadata(
|
||||
&self,
|
||||
source: &BuildableSource<'_>,
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<Metadata23, Error> {
|
||||
let no_build = match self.build_context.no_build() {
|
||||
NoBuild::All => true,
|
||||
|
|
@ -330,7 +385,7 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
|
||||
let metadata = self
|
||||
.builder
|
||||
.download_and_build_metadata(source)
|
||||
.download_and_build_metadata(source, hashes)
|
||||
.boxed()
|
||||
.await?;
|
||||
Ok(metadata)
|
||||
|
|
@@ -343,7 +398,8 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
        filename: &WheelFilename,
        wheel_entry: &CacheEntry,
        dist: &BuiltDist,
    ) -> Result<PathBuf, Error> {
        hashes: &[HashDigest],
    ) -> Result<Archive, Error> {
        // Create an entry for the HTTP cache.
        let http_entry = wheel_entry.with_file(format!("{}.http", filename.stem()));

@@ -354,23 +410,42 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
                .map_err(|err| self.handle_response_errors(err))
                .into_async_read();

            // Create a hasher for each hash algorithm.
            let algorithms = {
                let mut hash = hashes.iter().map(HashDigest::algorithm).collect::<Vec<_>>();
                hash.sort();
                hash.dedup();
                hash
            };
            let mut hashers = algorithms.into_iter().map(Hasher::from).collect::<Vec<_>>();
            let mut hasher = uv_extract::hash::HashReader::new(reader.compat(), &mut hashers);

            // Download and unzip the wheel to a temporary directory.
            let temp_dir = tempfile::tempdir_in(self.build_context.cache().root())
                .map_err(Error::CacheWrite)?;
            uv_extract::stream::unzip(reader.compat(), temp_dir.path()).await?;
            uv_extract::stream::unzip(&mut hasher, temp_dir.path()).await?;

            // If necessary, exhaust the reader to compute the hash.
            if !hashes.is_empty() {
                hasher.finish().await.map_err(Error::HashExhaustion)?;
            }

            // Persist the temporary directory to the directory store.
            let archive = self
            let path = self
                .build_context
                .cache()
                .persist(temp_dir.into_path(), wheel_entry.path())
                .await
                .map_err(Error::CacheRead)?;
            Ok(archive)
            Ok(Archive::new(
                path,
                hashers.into_iter().map(HashDigest::from).collect(),
            ))
        }
        .instrument(info_span!("wheel", wheel = %dist))
        };

        // Fetch the archive from the cache, or download it if necessary.
        let req = self.request(url.clone())?;
        let cache_control = match self.client.connectivity() {
            Connectivity::Online => CacheControl::from(
@@ -391,6 +466,20 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
                CachedClientError::Client(err) => Error::Client(err),
            })?;

        // If the archive is missing the required hashes, force a refresh.
        let archive = if archive.has_digests(hashes) {
            archive
        } else {
            self.client
                .cached_client()
                .skip_cache(self.request(url)?, &http_entry, download)
                .await
                .map_err(|err| match err {
                    CachedClientError::Callback(err) => err,
                    CachedClientError::Client(err) => Error::Client(err),
                })?
        };

        Ok(archive)
    }
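The `Hasher` and `HashReader` types used above come from `uv_extract::hash` and are not shown in this diff. As a standalone illustration of the underlying idea (hash the bytes as they stream past, so the archive is only read once), here is a simplified sketch using the `sha2` crate directly rather than uv's types:

```rust
use sha2::{Digest, Sha256};
use std::io::{self, Read};

/// Wrap a reader and update a SHA-256 hasher with every byte that passes through it.
struct HashingReader<R> {
    inner: R,
    hasher: Sha256,
}

impl<R: Read> HashingReader<R> {
    fn new(inner: R) -> Self {
        Self {
            inner,
            hasher: Sha256::new(),
        }
    }

    /// Drain any bytes the consumer skipped, so the digest covers the whole
    /// input, then return the finished digest.
    fn finish(mut self) -> io::Result<Vec<u8>> {
        io::copy(&mut self, &mut io::sink())?;
        Ok(self.hasher.finalize().to_vec())
    }
}

impl<R: Read> Read for HashingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.hasher.update(&buf[..n]);
        Ok(n)
    }
}
```

The same single-pass idea is what lets the code above compute digests for several algorithms at once while the zip is being extracted, and `hasher.finish()` drains whatever the extractor didn't read so the digest covers the entire download.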
@ -401,7 +490,8 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
filename: &WheelFilename,
|
||||
wheel_entry: &CacheEntry,
|
||||
dist: &BuiltDist,
|
||||
) -> Result<PathBuf, Error> {
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<Archive, Error> {
|
||||
// Create an entry for the HTTP cache.
|
||||
let http_entry = wheel_entry.with_file(format!("{}.http", filename.stem()));
|
||||
|
||||
|
|
@ -427,16 +517,48 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
file.seek(io::SeekFrom::Start(0))
|
||||
.await
|
||||
.map_err(Error::CacheWrite)?;
|
||||
uv_extract::seek::unzip(file, temp_dir.path()).await?;
|
||||
|
||||
// If no hashes are required, parallelize the unzip operation.
|
||||
let hashes = if hashes.is_empty() {
|
||||
let file = file.into_std().await;
|
||||
tokio::task::spawn_blocking({
|
||||
let target = temp_dir.path().to_owned();
|
||||
move || -> Result<(), uv_extract::Error> {
|
||||
// Unzip the wheel into a temporary directory.
|
||||
uv_extract::unzip(file, &target)?;
|
||||
Ok(())
|
||||
}
|
||||
})
|
||||
.await??;
|
||||
|
||||
vec![]
|
||||
} else {
|
||||
// Create a hasher for each hash algorithm.
|
||||
let algorithms = {
|
||||
let mut hash = hashes.iter().map(HashDigest::algorithm).collect::<Vec<_>>();
|
||||
hash.sort();
|
||||
hash.dedup();
|
||||
hash
|
||||
};
|
||||
let mut hashers = algorithms.into_iter().map(Hasher::from).collect::<Vec<_>>();
|
||||
let mut hasher = uv_extract::hash::HashReader::new(file, &mut hashers);
|
||||
uv_extract::stream::unzip(&mut hasher, temp_dir.path()).await?;
|
||||
|
||||
// If necessary, exhaust the reader to compute the hash.
|
||||
hasher.finish().await.map_err(Error::HashExhaustion)?;
|
||||
|
||||
hashers.into_iter().map(HashDigest::from).collect()
|
||||
};
|
||||
|
||||
// Persist the temporary directory to the directory store.
|
||||
let archive = self
|
||||
let path = self
|
||||
.build_context
|
||||
.cache()
|
||||
.persist(temp_dir.into_path(), wheel_entry.path())
|
||||
.await
|
||||
.map_err(Error::CacheRead)?;
|
||||
Ok(archive)
|
||||
|
||||
Ok(Archive::new(path, hashes))
|
||||
}
|
||||
.instrument(info_span!("wheel", wheel = %dist))
|
||||
};
|
||||
|
|
@ -451,7 +573,6 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
),
|
||||
Connectivity::Offline => CacheControl::AllowStale,
|
||||
};
|
||||
|
||||
let archive = self
|
||||
.client
|
||||
.cached_client()
|
||||
|
|
@ -462,6 +583,20 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
CachedClientError::Client(err) => Error::Client(err),
|
||||
})?;
|
||||
|
||||
// If the archive is missing the required hashes, force a refresh.
|
||||
let archive = if archive.has_digests(hashes) {
|
||||
archive
|
||||
} else {
|
||||
self.client
|
||||
.cached_client()
|
||||
.skip_cache(self.request(url)?, &http_entry, download)
|
||||
.await
|
||||
.map_err(|err| match err {
|
||||
CachedClientError::Callback(err) => err,
|
||||
CachedClientError::Client(err) => Error::Client(err),
|
||||
})?
|
||||
};
|
||||
|
||||
Ok(archive)
|
||||
}
|
||||
|
||||
|
|
@ -472,6 +607,7 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
|
|||
filename: &WheelFilename,
|
||||
wheel_entry: CacheEntry,
|
||||
dist: &BuiltDist,
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<LocalWheel, Error> {
|
||||
// Determine the last-modified time of the wheel.
|
||||
let modified = ArchiveTimestamp::from_file(path).map_err(Error::CacheRead)?;
|
||||
|
|
@@ -481,20 +617,66 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
        let archive = read_timestamped_archive(&archive_entry, modified)?;

        // If the file is already unzipped, and the cache is up-to-date, return it.
        if let Some(archive) = archive {
        if let Some(archive) = archive.filter(|archive| archive.has_digests(hashes)) {
            Ok(LocalWheel {
                dist: Dist::Built(dist.clone()),
                archive,
                archive: archive.path,
                hashes: archive.hashes,
                filename: filename.clone(),
            })
        } else {
        } else if hashes.is_empty() {
            // Otherwise, unzip the wheel.
            let archive = self.unzip_wheel(path, wheel_entry.path()).await?;
            let archive = Archive::new(self.unzip_wheel(path, wheel_entry.path()).await?, vec![]);
            write_timestamped_archive(&archive_entry, archive.clone(), modified).await?;

            Ok(LocalWheel {
                dist: Dist::Built(dist.clone()),
                archive,
                archive: archive.path,
                hashes: archive.hashes,
                filename: filename.clone(),
            })
        } else {
            // If necessary, compute the hashes of the wheel.
            let file = fs_err::tokio::File::open(path)
                .await
                .map_err(Error::CacheRead)?;
            let temp_dir = tempfile::tempdir_in(self.build_context.cache().root())
                .map_err(Error::CacheWrite)?;

            // Create a hasher for each hash algorithm.
            let algorithms = {
                let mut hash = hashes.iter().map(HashDigest::algorithm).collect::<Vec<_>>();
                hash.sort();
                hash.dedup();
                hash
            };
            let mut hashers = algorithms.into_iter().map(Hasher::from).collect::<Vec<_>>();
            let mut hasher = uv_extract::hash::HashReader::new(file, &mut hashers);

            // Unzip the wheel to a temporary directory.
            uv_extract::stream::unzip(&mut hasher, temp_dir.path()).await?;

            // Exhaust the reader to compute the hash.
            hasher.finish().await.map_err(Error::HashExhaustion)?;

            // Persist the temporary directory to the directory store.
            let archive = self
                .build_context
                .cache()
                .persist(temp_dir.into_path(), wheel_entry.path())
                .await
                .map_err(Error::CacheWrite)?;

            let hashes = hashers.into_iter().map(HashDigest::from).collect();

            // Write the archive pointer to the cache.
            let archive = Archive::new(archive, hashes);
            write_timestamped_archive(&archive_entry, archive.clone(), modified).await?;

            Ok(LocalWheel {
                dist: Dist::Built(dist.clone()),
                archive: archive.path,
                hashes: archive.hashes,
                filename: filename.clone(),
            })
        }
@@ -549,7 +731,7 @@ impl<'a, Context: BuildContext + Send + Sync> DistributionDatabase<'a, Context>
/// Write a timestamped archive path to the cache.
async fn write_timestamped_archive(
    cache_entry: &CacheEntry,
    data: PathBuf,
    data: Archive,
    modified: ArchiveTimestamp,
) -> Result<(), Error> {
    write_atomic(

@@ -564,13 +746,13 @@ async fn write_timestamped_archive(
}

/// Read an existing timestamped archive path, if it exists and is up-to-date.
fn read_timestamped_archive(
pub fn read_timestamped_archive(
    cache_entry: &CacheEntry,
    modified: ArchiveTimestamp,
) -> Result<Option<PathBuf>, Error> {
) -> Result<Option<Archive>, Error> {
    match fs_err::read(cache_entry.path()) {
        Ok(cached) => {
            let cached = rmp_serde::from_slice::<CachedByTimestamp<PathBuf>>(&cached)?;
            let cached = rmp_serde::from_slice::<CachedByTimestamp<Archive>>(&cached)?;
            if cached.timestamp == modified.timestamp() {
                return Ok(Some(cached.data));
            }
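`CachedByTimestamp` comes from `uv_cache` and is only used here, not defined. A rough sketch of the wrapper implied by the fields accessed above (`timestamp`, `data`); the concrete timestamp type is an assumption:

```rust
/// A cache payload tagged with the modification time it was derived from,
/// so it can be invalidated when the underlying file changes.
#[derive(Debug, serde::Serialize, serde::Deserialize)]
pub struct CachedByTimestamp<T> {
    /// Assumed type; uv uses its own `ArchiveTimestamp`-compatible representation.
    pub timestamp: std::time::SystemTime,
    pub data: T,
}
```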
@ -1,8 +1,8 @@
|
|||
use std::path::{Path, PathBuf};
|
||||
|
||||
use distribution_filename::WheelFilename;
|
||||
use distribution_types::{CachedDist, Dist};
|
||||
use pypi_types::Metadata23;
|
||||
use distribution_types::{CachedDist, Dist, Hashed};
|
||||
use pypi_types::{HashDigest, Metadata23};
|
||||
|
||||
use crate::Error;
|
||||
|
||||
|
|
@ -16,6 +16,8 @@ pub struct LocalWheel {
|
|||
/// The canonicalized path in the cache directory to which the wheel was downloaded.
|
||||
/// Typically, a directory within the archive bucket.
|
||||
pub(crate) archive: PathBuf,
|
||||
/// The computed hashes of the wheel.
|
||||
pub(crate) hashes: Vec<HashDigest>,
|
||||
}
|
||||
|
||||
impl LocalWheel {
|
||||
|
|
@ -40,10 +42,16 @@ impl LocalWheel {
|
|||
}
|
||||
}
|
||||
|
||||
impl Hashed for LocalWheel {
|
||||
fn hashes(&self) -> &[HashDigest] {
|
||||
&self.hashes
|
||||
}
|
||||
}
|
||||
|
||||
/// Convert a [`LocalWheel`] into a [`CachedDist`].
|
||||
impl From<LocalWheel> for CachedDist {
|
||||
fn from(wheel: LocalWheel) -> CachedDist {
|
||||
CachedDist::from_remote(wheel.dist, wheel.filename, wheel.archive)
|
||||
CachedDist::from_remote(wheel.dist, wheel.filename, wheel.hashes, wheel.archive)
|
||||
}
|
||||
}
|
||||
|
||||
|
|
crates/uv-distribution/src/error.rs

@@ -4,6 +4,7 @@ use zip::result::ZipError;
use distribution_filename::WheelFilenameError;
use pep440_rs::Version;
use pypi_types::HashDigest;
use uv_client::BetterReqwestError;
use uv_normalize::PackageName;

@@ -81,6 +82,23 @@ pub enum Error {
    /// Should not occur; only seen when another task panicked.
    #[error("The task executor is broken, did some other task panic?")]
    Join(#[from] JoinError),

    /// An I/O error that occurs while exhausting a reader to compute a hash.
    #[error("Failed to hash distribution")]
    HashExhaustion(#[source] std::io::Error),

    #[error("Hash mismatch for {distribution}\n\nExpected:\n{expected}\n\nComputed:\n{actual}")]
    HashMismatch {
        distribution: String,
        expected: String,
        actual: String,
    },

    #[error("Hash-checking is not supported for local directories: {0}")]
    HashesNotSupportedSourceTree(String),

    #[error("Hash-checking is not supported for Git repositories: {0}")]
    HashesNotSupportedGit(String),
}

impl From<reqwest::Error> for Error {

@@ -99,3 +117,30 @@ impl From<reqwest_middleware::Error> for Error {
        }
    }
}

impl Error {
    /// Construct a hash mismatch error.
    pub fn hash_mismatch(
        distribution: String,
        expected: &[HashDigest],
        actual: &[HashDigest],
    ) -> Error {
        let expected = expected
            .iter()
            .map(|hash| format!(" {hash}"))
            .collect::<Vec<_>>()
            .join("\n");

        let actual = actual
            .iter()
            .map(|hash| format!(" {hash}"))
            .collect::<Vec<_>>()
            .join("\n");

        Self::HashMismatch {
            distribution,
            expected,
            actual,
        }
    }
}
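Given the `HashMismatch` format string above, a failure would render along these lines; the distribution, digest values, and their exact display form are made up for illustration:

```text
Hash mismatch for anyio==4.0.0

Expected:
 sha256:0000000000000000000000000000000000000000000000000000000000000000

Computed:
 sha256:f7ed51751b2c2add651e5747c891b47e26d2a21be5d32d9311dfe9692f3e5d7a
```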
@ -1,7 +1,10 @@
|
|||
use distribution_types::{git_reference, DirectUrlSourceDist, GitSourceDist, PathSourceDist};
|
||||
use distribution_types::{
|
||||
git_reference, DirectUrlSourceDist, GitSourceDist, Hashed, PathSourceDist,
|
||||
};
|
||||
use platform_tags::Tags;
|
||||
use uv_cache::{ArchiveTimestamp, Cache, CacheBucket, CacheShard, WheelCache};
|
||||
use uv_fs::symlinks;
|
||||
use uv_types::RequiredHashes;
|
||||
|
||||
use crate::index::cached_wheel::CachedWheel;
|
||||
use crate::source::{read_http_revision, read_timestamped_revision, REVISION};
|
||||
|
|
@ -12,12 +15,17 @@ use crate::Error;
|
|||
pub struct BuiltWheelIndex<'a> {
|
||||
cache: &'a Cache,
|
||||
tags: &'a Tags,
|
||||
hashes: &'a RequiredHashes,
|
||||
}
|
||||
|
||||
impl<'a> BuiltWheelIndex<'a> {
|
||||
/// Initialize an index of built distributions.
|
||||
pub fn new(cache: &'a Cache, tags: &'a Tags) -> Self {
|
||||
Self { cache, tags }
|
||||
pub fn new(cache: &'a Cache, tags: &'a Tags, hashes: &'a RequiredHashes) -> Self {
|
||||
Self {
|
||||
cache,
|
||||
tags,
|
||||
hashes,
|
||||
}
|
||||
}
|
||||
|
||||
/// Return the most compatible [`CachedWheel`] for a given source distribution at a direct URL.
|
||||
|
|
@ -31,13 +39,19 @@ impl<'a> BuiltWheelIndex<'a> {
|
|||
WheelCache::Url(source_dist.url.raw()).root(),
|
||||
);
|
||||
|
||||
// Read the revision from the cache. There's no need to enforce freshness, since we
|
||||
// enforce freshness on the entries.
|
||||
// Read the revision from the cache.
|
||||
let revision_entry = cache_shard.entry(REVISION);
|
||||
let Some(revision) = read_http_revision(&revision_entry)? else {
|
||||
return Ok(None);
|
||||
};
|
||||
|
||||
// Enforce hash-checking by omitting any wheels that don't satisfy the required hashes.
|
||||
if let Some(hashes) = self.hashes.get(&source_dist.name) {
|
||||
if !revision.satisfies(hashes) {
|
||||
return Ok(None);
|
||||
}
|
||||
}
|
||||
|
||||
Ok(self.find(&cache_shard.shard(revision.id())))
|
||||
}
|
||||
|
||||
|
|
@ -55,18 +69,29 @@ impl<'a> BuiltWheelIndex<'a> {
|
|||
return Err(Error::DirWithoutEntrypoint);
|
||||
};
|
||||
|
||||
// Read the revision from the cache. There's no need to enforce freshness, since we
|
||||
// enforce freshness on the entries.
|
||||
// Read the revision from the cache.
|
||||
let revision_entry = cache_shard.entry(REVISION);
|
||||
let Some(revision) = read_timestamped_revision(&revision_entry, modified)? else {
|
||||
return Ok(None);
|
||||
};
|
||||
|
||||
// Enforce hash-checking by omitting any wheels that don't satisfy the required hashes.
|
||||
if let Some(hashes) = self.hashes.get(&source_dist.name) {
|
||||
if !revision.satisfies(hashes) {
|
||||
return Ok(None);
|
||||
}
|
||||
}
|
||||
|
||||
Ok(self.find(&cache_shard.shard(revision.id())))
|
||||
}
|
||||
|
||||
/// Return the most compatible [`CachedWheel`] for a given source distribution at a git URL.
|
||||
pub fn git(&self, source_dist: &GitSourceDist) -> Option<CachedWheel> {
|
||||
// Enforce hash-checking, which isn't supported for Git distributions.
|
||||
if self.hashes.get(&source_dist.name).is_some() {
|
||||
return None;
|
||||
}
|
||||
|
||||
let Ok(Some(git_sha)) = git_reference(&source_dist.url) else {
|
||||
return None;
|
||||
};
|
||||
|
|
@ -100,7 +125,7 @@ impl<'a> BuiltWheelIndex<'a> {
|
|||
|
||||
// Unzipped wheels are stored as symlinks into the archive directory.
|
||||
for subdir in symlinks(shard) {
|
||||
match CachedWheel::from_path(&subdir) {
|
||||
match CachedWheel::from_built_source(&subdir) {
|
||||
None => {}
|
||||
Some(dist_info) => {
|
||||
// Pick the wheel with the highest priority
|
||||
|
|
|
|||
|
|
@ -1,9 +1,13 @@
|
|||
use std::path::Path;
|
||||
|
||||
use distribution_filename::WheelFilename;
|
||||
use distribution_types::{CachedDirectUrlDist, CachedRegistryDist};
|
||||
use distribution_types::{CachedDirectUrlDist, CachedRegistryDist, Hashed};
|
||||
use pep508_rs::VerbatimUrl;
|
||||
use uv_cache::CacheEntry;
|
||||
use pypi_types::HashDigest;
|
||||
use uv_cache::{CacheEntry, CachedByTimestamp};
|
||||
use uv_client::DataWithCachePolicy;
|
||||
|
||||
use crate::archive::Archive;
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct CachedWheel {
|
||||
|
|
@ -11,16 +15,23 @@ pub struct CachedWheel {
|
|||
pub filename: WheelFilename,
|
||||
/// The [`CacheEntry`] for the wheel.
|
||||
pub entry: CacheEntry,
|
||||
/// The [`HashDigest`]s for the wheel.
|
||||
pub hashes: Vec<HashDigest>,
|
||||
}
|
||||
|
||||
impl CachedWheel {
|
||||
/// Try to parse a distribution from a cached directory name (like `typing-extensions-4.8.0-py3-none-any`).
|
||||
pub fn from_path(path: &Path) -> Option<Self> {
|
||||
pub fn from_built_source(path: &Path) -> Option<Self> {
|
||||
let filename = path.file_name()?.to_str()?;
|
||||
let filename = WheelFilename::from_stem(filename).ok()?;
|
||||
let archive = path.canonicalize().ok()?;
|
||||
let entry = CacheEntry::from_path(archive);
|
||||
Some(Self { filename, entry })
|
||||
let hashes = Vec::new();
|
||||
Some(Self {
|
||||
filename,
|
||||
entry,
|
||||
hashes,
|
||||
})
|
||||
}
|
||||
|
||||
/// Convert a [`CachedWheel`] into a [`CachedRegistryDist`].
|
||||
|
|
@ -28,6 +39,7 @@ impl CachedWheel {
|
|||
CachedRegistryDist {
|
||||
filename: self.filename,
|
||||
path: self.entry.into_path_buf(),
|
||||
hashes: self.hashes,
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@@ -38,6 +50,56 @@ impl CachedWheel {
            url,
            path: self.entry.into_path_buf(),
            editable: false,
            hashes: self.hashes,
        }
    }

    /// Read a cached wheel from a `.http` pointer (e.g., `anyio-4.0.0-py3-none-any.http`).
    pub fn from_http_pointer(path: &Path) -> Option<Self> {
        // Determine the wheel filename.
        let filename = path.file_name()?.to_str()?;
        let filename = WheelFilename::from_stem(filename).ok()?;

        // Read the pointer.
        let file = fs_err::File::open(path).ok()?;
        let data = DataWithCachePolicy::from_reader(file).ok()?.data;
        let archive = rmp_serde::from_slice::<Archive>(&data).ok()?;

        // Convert to a cached wheel.
        let entry = CacheEntry::from_path(archive.path);
        let hashes = archive.hashes;
        Some(Self {
            filename,
            entry,
            hashes,
        })
    }

    /// Read a cached wheel from a `.rev` pointer (e.g., `anyio-4.0.0-py3-none-any.rev`).
    pub fn from_revision_pointer(path: &Path) -> Option<Self> {
        // Determine the wheel filename.
        let filename = path.file_name()?.to_str()?;
        let filename = WheelFilename::from_stem(filename).ok()?;

        // Read the pointer.
        let cached = fs_err::read(path).ok()?;
        let archive = rmp_serde::from_slice::<CachedByTimestamp<Archive>>(&cached)
            .ok()?
            .data;

        // Convert to a cached wheel.
        let entry = CacheEntry::from_path(archive.path);
        let hashes = archive.hashes;
        Some(Self {
            filename,
            entry,
            hashes,
        })
    }
}

impl Hashed for CachedWheel {
    fn hashes(&self) -> &[HashDigest] {
        &self.hashes
    }
}
|
||||
|
|
|
|||
|
|
@ -1,16 +1,16 @@
|
|||
use std::collections::hash_map::Entry;
|
||||
use std::collections::BTreeMap;
|
||||
use std::path::Path;
|
||||
|
||||
use rustc_hash::FxHashMap;
|
||||
|
||||
use distribution_types::{CachedRegistryDist, FlatIndexLocation, IndexLocations, IndexUrl};
|
||||
use distribution_types::{CachedRegistryDist, FlatIndexLocation, Hashed, IndexLocations, IndexUrl};
|
||||
use pep440_rs::Version;
|
||||
use pep508_rs::VerbatimUrl;
|
||||
use platform_tags::Tags;
|
||||
use uv_cache::{Cache, CacheBucket, WheelCache};
|
||||
use uv_fs::{directories, symlinks};
|
||||
use uv_fs::{directories, files, symlinks};
|
||||
use uv_normalize::PackageName;
|
||||
use uv_types::RequiredHashes;
|
||||
|
||||
use crate::index::cached_wheel::CachedWheel;
|
||||
use crate::source::{read_http_revision, REVISION};
|
||||
|
|
@ -21,16 +21,23 @@ pub struct RegistryWheelIndex<'a> {
|
|||
cache: &'a Cache,
|
||||
tags: &'a Tags,
|
||||
index_locations: &'a IndexLocations,
|
||||
hashes: &'a RequiredHashes,
|
||||
index: FxHashMap<&'a PackageName, BTreeMap<Version, CachedRegistryDist>>,
|
||||
}
|
||||
|
||||
impl<'a> RegistryWheelIndex<'a> {
|
||||
/// Initialize an index of registry distributions.
|
||||
pub fn new(cache: &'a Cache, tags: &'a Tags, index_locations: &'a IndexLocations) -> Self {
|
||||
pub fn new(
|
||||
cache: &'a Cache,
|
||||
tags: &'a Tags,
|
||||
index_locations: &'a IndexLocations,
|
||||
hashes: &'a RequiredHashes,
|
||||
) -> Self {
|
||||
Self {
|
||||
cache,
|
||||
tags,
|
||||
index_locations,
|
||||
hashes,
|
||||
index: FxHashMap::default(),
|
||||
}
|
||||
}
|
||||
|
|
@ -65,6 +72,7 @@ impl<'a> RegistryWheelIndex<'a> {
|
|||
self.cache,
|
||||
self.tags,
|
||||
self.index_locations,
|
||||
self.hashes,
|
||||
)),
|
||||
};
|
||||
versions
|
||||
|
|
@ -76,8 +84,10 @@ impl<'a> RegistryWheelIndex<'a> {
|
|||
cache: &Cache,
|
||||
tags: &Tags,
|
||||
index_locations: &IndexLocations,
|
||||
hashes: &RequiredHashes,
|
||||
) -> BTreeMap<Version, CachedRegistryDist> {
|
||||
let mut versions = BTreeMap::new();
|
||||
let hashes = hashes.get(package).unwrap_or_default();
|
||||
|
||||
// Collect into owned `IndexUrl`
|
||||
let flat_index_urls: Vec<IndexUrl> = index_locations
|
||||
|
|
@@ -100,7 +110,34 @@ impl<'a> RegistryWheelIndex<'a> {
                WheelCache::Index(index_url).wheel_dir(package.to_string()),
            );

            Self::add_directory(&wheel_dir, tags, &mut versions);
            // For registry wheels, the cache structure is: `<index>/<package-name>/<wheel>.http`
            // or `<index>/<package-name>/<version>/<wheel>.rev`.
            for file in files(&wheel_dir) {
                if file
                    .extension()
                    .is_some_and(|ext| ext.eq_ignore_ascii_case("http"))
                {
                    if let Some(wheel) = CachedWheel::from_http_pointer(&wheel_dir.join(&file)) {
                        // Enforce hash-checking based on the built distribution.
                        if wheel.satisfies(hashes) {
                            Self::add_wheel(wheel, tags, &mut versions);
                        }
                    }
                }

                if file
                    .extension()
                    .is_some_and(|ext| ext.eq_ignore_ascii_case("rev"))
                {
                    if let Some(wheel) = CachedWheel::from_revision_pointer(&wheel_dir.join(&file))
                    {
                        // Enforce hash-checking based on the built distribution.
                        if wheel.satisfies(hashes) {
                            Self::add_wheel(wheel, tags, &mut versions);
                        }
                    }
                }
            }

            // Index all the built wheels, created by downloading and building source distributions
            // from the registry.

@@ -115,7 +152,14 @@ impl<'a> RegistryWheelIndex<'a> {
                let cache_shard = cache_shard.shard(shard);
                let revision_entry = cache_shard.entry(REVISION);
                if let Ok(Some(revision)) = read_http_revision(&revision_entry) {
                    Self::add_directory(cache_shard.join(revision.id()), tags, &mut versions);
                    // Enforce hash-checking based on the source distribution.
                    if revision.satisfies(hashes) {
                        for wheel_dir in symlinks(cache_shard.join(revision.id())) {
                            if let Some(wheel) = CachedWheel::from_built_source(&wheel_dir) {
                                Self::add_wheel(wheel, tags, &mut versions);
                            }
                        }
                    }
                };
            }
        }
|
||||
|
|
@ -123,33 +167,23 @@ impl<'a> RegistryWheelIndex<'a> {
|
|||
versions
|
||||
}
|
||||
|
||||
/// Add the wheels in a given directory to the index.
|
||||
///
|
||||
/// Each subdirectory in the given path is expected to be that of an unzipped wheel.
|
||||
fn add_directory(
|
||||
path: impl AsRef<Path>,
|
||||
/// Add the [`CachedWheel`] to the index.
|
||||
fn add_wheel(
|
||||
wheel: CachedWheel,
|
||||
tags: &Tags,
|
||||
versions: &mut BTreeMap<Version, CachedRegistryDist>,
|
||||
) {
|
||||
// Unzipped wheels are stored as symlinks into the archive directory.
|
||||
for wheel_dir in symlinks(path.as_ref()) {
|
||||
match CachedWheel::from_path(&wheel_dir) {
|
||||
None => {}
|
||||
Some(dist_info) => {
|
||||
let dist_info = dist_info.into_registry_dist();
|
||||
let dist_info = wheel.into_registry_dist();
|
||||
|
||||
// Pick the wheel with the highest priority
|
||||
let compatibility = dist_info.filename.compatibility(tags);
|
||||
if let Some(existing) = versions.get_mut(&dist_info.filename.version) {
|
||||
// Override if we have better compatibility
|
||||
if compatibility > existing.filename.compatibility(tags) {
|
||||
*existing = dist_info;
|
||||
}
|
||||
} else if compatibility.is_compatible() {
|
||||
versions.insert(dist_info.filename.version.clone(), dist_info);
|
||||
}
|
||||
}
|
||||
// Pick the wheel with the highest priority
|
||||
let compatibility = dist_info.filename.compatibility(tags);
|
||||
if let Some(existing) = versions.get_mut(&dist_info.filename.version) {
|
||||
// Override if we have better compatibility
|
||||
if compatibility > existing.filename.compatibility(tags) {
|
||||
*existing = dist_info;
|
||||
}
|
||||
} else if compatibility.is_compatible() {
|
||||
versions.insert(dist_info.filename.version.clone(), dist_info);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1,4 +1,5 @@
|
|||
pub use distribution_database::DistributionDatabase;
|
||||
pub use archive::Archive;
|
||||
pub use distribution_database::{read_timestamped_archive, DistributionDatabase};
|
||||
pub use download::LocalWheel;
|
||||
pub use error::Error;
|
||||
pub use git::{is_same_reference, to_precise};
|
||||
|
|
@ -6,6 +7,7 @@ pub use index::{BuiltWheelIndex, RegistryWheelIndex};
|
|||
pub use reporter::Reporter;
|
||||
pub use source::SourceDistributionBuilder;
|
||||
|
||||
mod archive;
|
||||
mod distribution_database;
|
||||
mod download;
|
||||
mod error;
|
||||
|
|
|
|||
|
|
@ -2,19 +2,23 @@ use std::path::PathBuf;
|
|||
use std::str::FromStr;
|
||||
|
||||
use distribution_filename::WheelFilename;
|
||||
use distribution_types::Hashed;
|
||||
use platform_tags::Tags;
|
||||
use pypi_types::HashDigest;
|
||||
use uv_cache::CacheShard;
|
||||
use uv_fs::files;
|
||||
|
||||
/// The information about the wheel we either just built or got from the cache.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct BuiltWheelMetadata {
|
||||
pub(crate) struct BuiltWheelMetadata {
|
||||
/// The path to the built wheel.
|
||||
pub(crate) path: PathBuf,
|
||||
/// The expected path to the downloaded wheel's entry in the cache.
|
||||
pub(crate) target: PathBuf,
|
||||
/// The parsed filename.
|
||||
pub(crate) filename: WheelFilename,
|
||||
/// The computed hashes of the source distribution from which the wheel was built.
|
||||
pub(crate) hashes: Vec<HashDigest>,
|
||||
}
|
||||
|
||||
impl BuiltWheelMetadata {
|
||||
|
|
@ -39,6 +43,20 @@ impl BuiltWheelMetadata {
|
|||
target: cache_shard.join(filename.stem()),
|
||||
path,
|
||||
filename,
|
||||
hashes: vec![],
|
||||
})
|
||||
}
|
||||
|
||||
/// Set the computed hashes of the wheel.
|
||||
#[must_use]
|
||||
pub(crate) fn with_hashes(mut self, hashes: Vec<HashDigest>) -> Self {
|
||||
self.hashes = hashes;
|
||||
self
|
||||
}
|
||||
}
|
||||
|
||||
impl Hashed for BuiltWheelMetadata {
|
||||
fn hashes(&self) -> &[HashDigest] {
|
||||
&self.hashes
|
||||
}
|
||||
}
|
||||
|
|
|
|||
|
|
@ -16,12 +16,12 @@ use zip::ZipArchive;
|
|||
|
||||
use distribution_filename::WheelFilename;
|
||||
use distribution_types::{
|
||||
BuildableSource, DirectArchiveUrl, Dist, FileLocation, GitSourceUrl, LocalEditable,
|
||||
BuildableSource, DirectArchiveUrl, Dist, FileLocation, GitSourceUrl, Hashed, LocalEditable,
|
||||
PathSourceDist, PathSourceUrl, RemoteSource, SourceDist, SourceUrl,
|
||||
};
|
||||
use install_wheel_rs::metadata::read_archive_metadata;
|
||||
use platform_tags::Tags;
|
||||
use pypi_types::Metadata23;
|
||||
use pypi_types::{HashDigest, Metadata23};
|
||||
use uv_cache::{
|
||||
ArchiveTimestamp, CacheBucket, CacheEntry, CacheShard, CachedByTimestamp, Freshness, WheelCache,
|
||||
};
|
||||
|
|
@ -29,6 +29,7 @@ use uv_client::{
|
|||
CacheControl, CachedClientError, Connectivity, DataWithCachePolicy, RegistryClient,
|
||||
};
|
||||
use uv_configuration::{BuildKind, NoBuild};
|
||||
use uv_extract::hash::Hasher;
|
||||
use uv_fs::write_atomic;
|
||||
use uv_types::{BuildContext, SourceBuildTrait};
|
||||
|
||||
|
|
@ -49,9 +50,7 @@ pub struct SourceDistributionBuilder<'a, T: BuildContext> {
|
|||
}
|
||||
|
||||
/// The name of the file that contains the revision ID, encoded via `MsgPack`.
|
||||
///
|
||||
/// TODO(charlie): Update the filename whenever we bump the cache version.
|
||||
pub(crate) const REVISION: &str = "manifest.msgpack";
|
||||
pub(crate) const REVISION: &str = "revision.msgpack";
|
||||
|
||||
/// The name of the file that contains the cached distribution metadata, encoded via `MsgPack`.
|
||||
pub(crate) const METADATA: &str = "metadata.msgpack";
|
||||
|
|
@ -76,10 +75,11 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
}
|
||||
|
||||
/// Download and build a [`SourceDist`].
|
||||
pub async fn download_and_build(
|
||||
pub(super) async fn download_and_build(
|
||||
&self,
|
||||
source: &BuildableSource<'_>,
|
||||
tags: &Tags,
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<BuiltWheelMetadata, Error> {
|
||||
let built_wheel_metadata = match &source {
|
||||
BuildableSource::Dist(SourceDist::Registry(dist)) => {
|
||||
|
|
@ -100,6 +100,7 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
path: Cow::Borrowed(path),
|
||||
},
|
||||
tags,
|
||||
hashes,
|
||||
)
|
||||
.boxed()
|
||||
.await;
|
||||
|
|
@ -115,9 +116,17 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
.join(dist.filename.version.to_string()),
|
||||
);
|
||||
|
||||
self.url(source, &dist.file.filename, &url, &cache_shard, None, tags)
|
||||
.boxed()
|
||||
.await?
|
||||
self.url(
|
||||
source,
|
||||
&dist.file.filename,
|
||||
&url,
|
||||
&cache_shard,
|
||||
None,
|
||||
tags,
|
||||
hashes,
|
||||
)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
BuildableSource::Dist(SourceDist::DirectUrl(dist)) => {
|
||||
let filename = dist.filename().expect("Distribution must have a filename");
|
||||
|
|
@ -136,22 +145,23 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
&cache_shard,
|
||||
subdirectory.as_deref(),
|
||||
tags,
|
||||
hashes,
|
||||
)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
BuildableSource::Dist(SourceDist::Git(dist)) => {
|
||||
self.git(source, &GitSourceUrl::from(dist), tags)
|
||||
self.git(source, &GitSourceUrl::from(dist), tags, hashes)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
BuildableSource::Dist(SourceDist::Path(dist)) => {
|
||||
if dist.path.is_dir() {
|
||||
self.source_tree(source, &PathSourceUrl::from(dist), tags)
|
||||
self.source_tree(source, &PathSourceUrl::from(dist), tags, hashes)
|
||||
.boxed()
|
||||
.await?
|
||||
} else {
|
||||
self.archive(source, &PathSourceUrl::from(dist), tags)
|
||||
self.archive(source, &PathSourceUrl::from(dist), tags, hashes)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
|
|
@ -176,18 +186,21 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
&cache_shard,
|
||||
subdirectory.as_deref(),
|
||||
tags,
|
||||
hashes,
|
||||
)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
BuildableSource::Url(SourceUrl::Git(resource)) => {
|
||||
self.git(source, resource, tags).boxed().await?
|
||||
self.git(source, resource, tags, hashes).boxed().await?
|
||||
}
|
||||
BuildableSource::Url(SourceUrl::Path(resource)) => {
|
||||
if resource.path.is_dir() {
|
||||
self.source_tree(source, resource, tags).boxed().await?
|
||||
self.source_tree(source, resource, tags, hashes)
|
||||
.boxed()
|
||||
.await?
|
||||
} else {
|
||||
self.archive(source, resource, tags).boxed().await?
|
||||
self.archive(source, resource, tags, hashes).boxed().await?
|
||||
}
|
||||
}
|
||||
};
|
||||
|
|
@ -198,9 +211,10 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
/// Download a [`SourceDist`] and determine its metadata. This typically involves building the
|
||||
/// source distribution into a wheel; however, some build backends support determining the
|
||||
/// metadata without building the source distribution.
|
||||
pub async fn download_and_build_metadata(
|
||||
pub(super) async fn download_and_build_metadata(
|
||||
&self,
|
||||
source: &BuildableSource<'_>,
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<Metadata23, Error> {
|
||||
let metadata = match &source {
|
||||
BuildableSource::Dist(SourceDist::Registry(dist)) => {
|
||||
|
|
@ -220,6 +234,7 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
url: &url,
|
||||
path: Cow::Borrowed(path),
|
||||
},
|
||||
hashes,
|
||||
)
|
||||
.boxed()
|
||||
.await;
|
||||
|
|
@ -234,9 +249,16 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
.join(dist.filename.version.to_string()),
|
||||
);
|
||||
|
||||
self.url_metadata(source, &dist.file.filename, &url, &cache_shard, None)
|
||||
.boxed()
|
||||
.await?
|
||||
self.url_metadata(
|
||||
source,
|
||||
&dist.file.filename,
|
||||
&url,
|
||||
&cache_shard,
|
||||
None,
|
||||
hashes,
|
||||
)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
BuildableSource::Dist(SourceDist::DirectUrl(dist)) => {
|
||||
let filename = dist.filename().expect("Distribution must have a filename");
|
||||
|
|
@ -254,22 +276,23 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
&url,
|
||||
&cache_shard,
|
||||
subdirectory.as_deref(),
|
||||
hashes,
|
||||
)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
BuildableSource::Dist(SourceDist::Git(dist)) => {
|
||||
self.git_metadata(source, &GitSourceUrl::from(dist))
|
||||
self.git_metadata(source, &GitSourceUrl::from(dist), hashes)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
BuildableSource::Dist(SourceDist::Path(dist)) => {
|
||||
if dist.path.is_dir() {
|
||||
self.source_tree_metadata(source, &PathSourceUrl::from(dist))
|
||||
self.source_tree_metadata(source, &PathSourceUrl::from(dist), hashes)
|
||||
.boxed()
|
||||
.await?
|
||||
} else {
|
||||
self.archive_metadata(source, &PathSourceUrl::from(dist))
|
||||
self.archive_metadata(source, &PathSourceUrl::from(dist), hashes)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
|
|
@ -293,18 +316,23 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
&url,
|
||||
&cache_shard,
|
||||
subdirectory.as_deref(),
|
||||
hashes,
|
||||
)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
BuildableSource::Url(SourceUrl::Git(resource)) => {
|
||||
self.git_metadata(source, resource).boxed().await?
|
||||
self.git_metadata(source, resource, hashes).boxed().await?
|
||||
}
|
||||
BuildableSource::Url(SourceUrl::Path(resource)) => {
|
||||
if resource.path.is_dir() {
|
||||
self.source_tree_metadata(source, resource).boxed().await?
|
||||
self.source_tree_metadata(source, resource, hashes)
|
||||
.boxed()
|
||||
.await?
|
||||
} else {
|
||||
self.archive_metadata(source, resource).boxed().await?
|
||||
self.archive_metadata(source, resource, hashes)
|
||||
.boxed()
|
||||
.await?
|
||||
}
|
||||
}
|
||||
};
|
||||
|
|
@ -322,19 +350,29 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
cache_shard: &CacheShard,
|
||||
subdirectory: Option<&'data Path>,
|
||||
tags: &Tags,
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<BuiltWheelMetadata, Error> {
|
||||
// Fetch the revision for the source distribution.
|
||||
let revision = self
|
||||
.url_revision(source, filename, url, cache_shard)
|
||||
.url_revision(source, filename, url, cache_shard, hashes)
|
||||
.await?;
|
||||
|
||||
// Before running the build, check that the hashes match.
|
||||
if !revision.satisfies(hashes) {
|
||||
return Err(Error::hash_mismatch(
|
||||
source.to_string(),
|
||||
hashes,
|
||||
revision.hashes(),
|
||||
));
|
||||
}
|
||||
|
||||
// Scope all operations to the revision. Within the revision, there's no need to check for
|
||||
// freshness, since entries have to be fresher than the revision itself.
|
||||
let cache_shard = cache_shard.shard(revision.id());
|
||||
|
||||
// If the cache contains a compatible wheel, return it.
|
||||
if let Some(built_wheel) = BuiltWheelMetadata::find_in_cache(tags, &cache_shard) {
|
||||
return Ok(built_wheel);
|
||||
return Ok(built_wheel.with_hashes(revision.into_hashes()));
|
||||
}
|
||||
|
||||
let task = self
|
||||
|
|
@ -364,6 +402,7 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
path: cache_shard.join(&disk_filename),
|
||||
target: cache_shard.join(wheel_filename.stem()),
|
||||
filename: wheel_filename,
|
||||
hashes: revision.into_hashes(),
|
||||
})
|
||||
}
|
||||
|
||||
|
|
@ -379,12 +418,22 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
url: &'data Url,
|
||||
cache_shard: &CacheShard,
|
||||
subdirectory: Option<&'data Path>,
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<Metadata23, Error> {
|
||||
// Fetch the revision for the source distribution.
|
||||
let revision = self
|
||||
.url_revision(source, filename, url, cache_shard)
|
||||
.url_revision(source, filename, url, cache_shard, hashes)
|
||||
.await?;
|
||||
|
||||
// Before running the build, check that the hashes match.
|
||||
if !revision.satisfies(hashes) {
|
||||
return Err(Error::hash_mismatch(
|
||||
source.to_string(),
|
||||
hashes,
|
||||
revision.hashes(),
|
||||
));
|
||||
}
|
||||
|
||||
// Scope all operations to the revision. Within the revision, there's no need to check for
|
||||
// freshness, since entries have to be fresher than the revision itself.
|
||||
let cache_shard = cache_shard.shard(revision.id());
|
||||
|
|
@ -449,6 +498,7 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
|
|||
filename: &str,
|
||||
url: &Url,
|
||||
cache_shard: &CacheShard,
|
||||
hashes: &[HashDigest],
|
||||
) -> Result<Revision, Error> {
|
||||
let cache_entry = cache_shard.entry(REVISION);
|
||||
let cache_control = match self.client.connectivity() {
|
||||
|
|
@@ -469,24 +519,40 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {

            // Download the source distribution.
            debug!("Downloading source distribution: {source}");
-           let source_dist_entry = cache_shard.shard(revision.id()).entry(filename);
-           self.persist_url(response, source, filename, &source_dist_entry)
+           let entry = cache_shard.shard(revision.id()).entry(filename);
+           let hashes = self
+               .download_archive(response, source, filename, entry.path(), hashes)
                .await?;

-           Ok(revision)
+           Ok(revision.with_hashes(hashes))
        }
        .boxed()
        .instrument(info_span!("download", source_dist = %source))
        };
        let req = self.request(url.clone())?;
-       self.client
+       let revision = self
+           .client
            .cached_client()
            .get_serde(req, &cache_entry, cache_control, download)
            .await
            .map_err(|err| match err {
                CachedClientError::Callback(err) => err,
                CachedClientError::Client(err) => Error::Client(err),
-           })
+           })?;
+
+       // If the archive is missing the required hashes, force a refresh.
+       if revision.has_digests(hashes) {
+           Ok(revision)
+       } else {
+           self.client
+               .cached_client()
+               .skip_cache(self.request(url.clone())?, &cache_entry, download)
+               .await
+               .map_err(|err| match err {
+                   CachedClientError::Callback(err) => err,
+                   CachedClientError::Client(err) => Error::Client(err),
+               })
+       }
    }

    /// Build a source distribution from a local archive (e.g., `.tar.gz` or `.zip`).
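The `has_digests` branch above decides between trusting the cached revision and calling `skip_cache` to re-download and re-hash. A simplified, synchronous sketch of that decision, assuming a plain map in place of `CachedClient` and string digests in place of `HashDigest` (both are illustrative stand-ins, not uv's types):

```rust
use std::collections::HashMap;

// Simplified stand-in: the real code stores `Revision` values behind
// `CachedClient::get_serde` and falls back to `skip_cache` on a miss.
#[derive(Debug, Clone)]
struct CachedRevision {
    digests: Vec<String>, // e.g. "sha256:<hex>"
}

/// True if the cached revision already carries a digest for every required algorithm.
fn has_digests(revision: &CachedRevision, required: &[&str]) -> bool {
    required
        .iter()
        .all(|algorithm| revision.digests.iter().any(|d| d.starts_with(algorithm)))
}

/// Return the cached revision if it can be reused; `None` means the caller
/// must re-download the archive and compute the missing digests.
fn reuse_or_refresh<'a>(
    cache: &'a HashMap<String, CachedRevision>,
    url: &str,
    required: &[&str],
) -> Option<&'a CachedRevision> {
    cache.get(url).filter(|revision| has_digests(revision, required))
}

fn main() {
    let url = "https://example.invalid/pkg-1.0.tar.gz".to_string();
    let mut cache = HashMap::new();
    cache.insert(url.clone(), CachedRevision { digests: vec!["sha256:deadbeef".into()] });

    // SHA-256 is already recorded, so the cached revision is reused.
    assert!(reuse_or_refresh(&cache, &url, &["sha256"]).is_some());
    // No SHA-512 was ever computed, so the cache is bypassed and the download re-run.
    assert!(reuse_or_refresh(&cache, &url, &["sha512"]).is_none());
}
```

The key property mirrored here: a cache hit that lacks the required algorithms behaves like a miss.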
@@ -495,6 +561,7 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        source: &BuildableSource<'_>,
        resource: &PathSourceUrl<'_>,
        tags: &Tags,
+       hashes: &[HashDigest],
    ) -> Result<BuiltWheelMetadata, Error> {
        let cache_shard = self.build_context.cache().shard(
            CacheBucket::BuiltWheels,
@@ -503,9 +570,18 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {

        // Fetch the revision for the source distribution.
        let revision = self
-           .archive_revision(source, resource, &cache_shard)
+           .archive_revision(source, resource, &cache_shard, hashes)
            .await?;

+       // Before running the build, check that the hashes match.
+       if !revision.satisfies(hashes) {
+           return Err(Error::hash_mismatch(
+               source.to_string(),
+               hashes,
+               revision.hashes(),
+           ));
+       }

        // Scope all operations to the revision. Within the revision, there's no need to check for
        // freshness, since entries have to be fresher than the revision itself.
        let cache_shard = cache_shard.shard(revision.id());
@@ -543,6 +619,7 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
            path: cache_shard.join(&disk_filename),
            target: cache_shard.join(filename.stem()),
            filename,
+           hashes: revision.into_hashes(),
        })
    }
@@ -554,6 +631,7 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        &self,
        source: &BuildableSource<'_>,
        resource: &PathSourceUrl<'_>,
+       hashes: &[HashDigest],
    ) -> Result<Metadata23, Error> {
        let cache_shard = self.build_context.cache().shard(
            CacheBucket::BuiltWheels,
@@ -562,9 +640,18 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {

        // Fetch the revision for the source distribution.
        let revision = self
-           .archive_revision(source, resource, &cache_shard)
+           .archive_revision(source, resource, &cache_shard, hashes)
            .await?;

+       // Before running the build, check that the hashes match.
+       if !revision.satisfies(hashes) {
+           return Err(Error::hash_mismatch(
+               source.to_string(),
+               hashes,
+               revision.hashes(),
+           ));
+       }

        // Scope all operations to the revision. Within the revision, there's no need to check for
        // freshness, since entries have to be fresher than the revision itself.
        let cache_shard = cache_shard.shard(revision.id());
@@ -627,6 +714,7 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        source: &BuildableSource<'_>,
        resource: &PathSourceUrl<'_>,
        cache_shard: &CacheShard,
+       hashes: &[HashDigest],
    ) -> Result<Revision, Error> {
        // Determine the last-modified time of the source distribution.
        let modified = ArchiveTimestamp::from_file(&resource.path).map_err(Error::CacheRead)?;
@@ -637,7 +725,9 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        // If the revision already exists, return it. There's no need to check for freshness, since
        // we use an exact timestamp.
        if let Some(revision) = read_timestamped_revision(&revision_entry, modified)? {
-           return Ok(revision);
+           if revision.has_digests(hashes) {
+               return Ok(revision);
+           }
        }

        // Otherwise, we need to create a new revision.
@@ -646,7 +736,10 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        // Unzip the archive to a temporary directory.
        debug!("Unpacking source distribution: {source}");
        let entry = cache_shard.shard(revision.id()).entry("source");
-       self.persist_archive(&resource.path, source, &entry).await?;
+       let hashes = self
+           .persist_archive(&resource.path, entry.path(), hashes)
+           .await?;
+       let revision = revision.with_hashes(hashes);

        // Persist the revision.
        write_atomic(
@@ -668,7 +761,13 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        source: &BuildableSource<'_>,
        resource: &PathSourceUrl<'_>,
        tags: &Tags,
+       hashes: &[HashDigest],
    ) -> Result<BuiltWheelMetadata, Error> {
+       // Before running the build, check that the hashes match.
+       if !hashes.is_empty() {
+           return Err(Error::HashesNotSupportedSourceTree(source.to_string()));
+       }
+
        let cache_shard = self.build_context.cache().shard(
            CacheBucket::BuiltWheels,
            WheelCache::Path(resource.url).root(),
@@ -714,6 +813,7 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
            path: cache_shard.join(&disk_filename),
            target: cache_shard.join(filename.stem()),
            filename,
+           hashes: vec![],
        })
    }
@@ -725,7 +825,13 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        &self,
        source: &BuildableSource<'_>,
        resource: &PathSourceUrl<'_>,
+       hashes: &[HashDigest],
    ) -> Result<Metadata23, Error> {
+       // Before running the build, check that the hashes match.
+       if !hashes.is_empty() {
+           return Err(Error::HashesNotSupportedSourceTree(source.to_string()));
+       }
+
        let cache_shard = self.build_context.cache().shard(
            CacheBucket::BuiltWheels,
            WheelCache::Path(resource.url).root(),
@@ -742,16 +848,9 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {

        // If the cache contains compatible metadata, return it.
        let metadata_entry = cache_shard.entry(METADATA);
-       if self
-           .build_context
-           .cache()
-           .freshness(&metadata_entry, source.name())
-           .is_ok_and(Freshness::is_fresh)
-       {
-           if let Some(metadata) = read_cached_metadata(&metadata_entry).await? {
-               debug!("Using cached metadata for: {source}");
-               return Ok(metadata);
-           }
-       }
+       if let Some(metadata) = read_cached_metadata(&metadata_entry).await? {
+           debug!("Using cached metadata for: {source}");
+           return Ok(metadata);
+       }

        // If the backend supports `prepare_metadata_for_build_wheel`, use it.
@@ -828,7 +927,13 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        source: &BuildableSource<'_>,
        resource: &GitSourceUrl<'_>,
        tags: &Tags,
+       hashes: &[HashDigest],
    ) -> Result<BuiltWheelMetadata, Error> {
+       // Before running the build, check that the hashes match.
+       if !hashes.is_empty() {
+           return Err(Error::HashesNotSupportedGit(source.to_string()));
+       }
+
        // Resolve to a precise Git SHA.
        let url = if let Some(url) = resolve_precise(
            resource.url,
@@ -882,6 +987,7 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
            path: cache_shard.join(&disk_filename),
            target: cache_shard.join(filename.stem()),
            filename,
+           hashes: vec![],
        })
    }
@@ -893,7 +999,13 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        &self,
        source: &BuildableSource<'_>,
        resource: &GitSourceUrl<'_>,
+       hashes: &[HashDigest],
    ) -> Result<Metadata23, Error> {
+       // Before running the build, check that the hashes match.
+       if !hashes.is_empty() {
+           return Err(Error::HashesNotSupportedGit(source.to_string()));
+       }
+
        // Resolve to a precise Git SHA.
        let url = if let Some(url) = resolve_precise(
            resource.url,
@@ -975,21 +1087,14 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
    }

    /// Download and unzip a source distribution into the cache from an HTTP response.
-   async fn persist_url(
+   async fn download_archive(
        &self,
        response: Response,
        source: &BuildableSource<'_>,
        filename: &str,
-       cache_entry: &CacheEntry,
-   ) -> Result<(), Error> {
-       let cache_path = cache_entry.path();
-       if cache_path.is_dir() {
-           debug!("Distribution is already cached: {source}");
-           return Ok(());
-       }
-
-       // Download and unzip the source distribution into a temporary directory.
-       let span = info_span!("persist_url", filename = filename, source_dist = %source);
+       target: &Path,
+       hashes: &[HashDigest],
+   ) -> Result<Vec<HashDigest>, Error> {
        let temp_dir =
            tempfile::tempdir_in(self.build_context.cache().bucket(CacheBucket::BuiltWheels))
                .map_err(Error::CacheWrite)?;
@@ -997,9 +1102,29 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
            .bytes_stream()
            .map_err(|err| std::io::Error::new(std::io::ErrorKind::Other, err))
            .into_async_read();
-       uv_extract::stream::archive(reader.compat(), filename, temp_dir.path()).await?;
+
+       // Create a hasher for each hash algorithm.
+       let algorithms = {
+           let mut hash = hashes.iter().map(HashDigest::algorithm).collect::<Vec<_>>();
+           hash.sort();
+           hash.dedup();
+           hash
+       };
+       let mut hashers = algorithms.into_iter().map(Hasher::from).collect::<Vec<_>>();
+       let mut hasher = uv_extract::hash::HashReader::new(reader.compat(), &mut hashers);
+
+       // Download and unzip the source distribution into a temporary directory.
+       let span = info_span!("download_source_dist", filename = filename, source_dist = %source);
+       uv_extract::stream::archive(&mut hasher, filename, temp_dir.path()).await?;
        drop(span);
+
+       // If necessary, exhaust the reader to compute the hash.
+       if !hashes.is_empty() {
+           hasher.finish().await.map_err(Error::HashExhaustion)?;
+       }
+
+       let hashes = hashers.into_iter().map(HashDigest::from).collect();

        // Extract the top-level directory.
        let extracted = match uv_extract::strip_component(temp_dir.path()) {
            Ok(top_level) => top_level,
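The hashing path above wraps the download stream so that every requested algorithm sees the same bytes in a single pass, then exhausts the reader so that any trailing bytes still count toward the digests. A standalone, synchronous sketch of that single-pass idea, assuming the `sha2` and `hex` crates with a fixed pair of algorithms (the real code wraps an async reader in `uv_extract::hash::HashReader` and only instantiates the algorithms that were actually requested):

```rust
// Cargo dependencies assumed for this sketch: sha2 = "0.10", hex = "0.4".
use sha2::{Digest, Sha256, Sha512};
use std::io::Read;

/// Feed one stream through several hashers in a single pass, then report the
/// hex digests. Mirrors the "wrap the reader, then exhaust it" shape above.
fn hash_stream<R: Read>(mut reader: R) -> std::io::Result<Vec<(&'static str, String)>> {
    let mut sha256 = Sha256::new();
    let mut sha512 = Sha512::new();
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // the reader is exhausted, so the digests cover every byte
        }
        sha256.update(&buf[..n]);
        sha512.update(&buf[..n]);
    }
    Ok(vec![
        ("sha256", hex::encode(sha256.finalize())),
        ("sha512", hex::encode(sha512.finalize())),
    ])
}

fn main() -> std::io::Result<()> {
    // Any `Read` works; a real caller would pass the archive download stream.
    let digests = hash_stream(&b"example archive bytes"[..])?;
    for (algorithm, hex) in digests {
        println!("{algorithm}: {hex}");
    }
    Ok(())
}
```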
@@ -1008,39 +1133,51 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        };

        // Persist it to the cache.
-       fs_err::tokio::create_dir_all(cache_path.parent().expect("Cache entry to have parent"))
+       fs_err::tokio::create_dir_all(target.parent().expect("Cache entry to have parent"))
            .await
            .map_err(Error::CacheWrite)?;
-       fs_err::tokio::rename(extracted, &cache_path)
+       fs_err::tokio::rename(extracted, target)
            .await
            .map_err(Error::CacheWrite)?;

-       Ok(())
+       Ok(hashes)
    }

    /// Extract a local archive, and store it at the given [`CacheEntry`].
    async fn persist_archive(
        &self,
        path: &Path,
-       source: &BuildableSource<'_>,
-       cache_entry: &CacheEntry,
-   ) -> Result<(), Error> {
-       let cache_path = cache_entry.path();
-       if cache_path.is_dir() {
-           debug!("Distribution is already cached: {source}");
-           return Ok(());
-       }
-
+       target: &Path,
+       hashes: &[HashDigest],
+   ) -> Result<Vec<HashDigest>, Error> {
        debug!("Unpacking for build: {}", path.display());

-       // Unzip the archive into a temporary directory.
        let temp_dir =
            tempfile::tempdir_in(self.build_context.cache().bucket(CacheBucket::BuiltWheels))
                .map_err(Error::CacheWrite)?;
        let reader = fs_err::tokio::File::open(&path)
            .await
            .map_err(Error::CacheRead)?;
-       uv_extract::seek::archive(reader, path, &temp_dir.path()).await?;

+       // Create a hasher for each hash algorithm.
+       let algorithms = {
+           let mut hash = hashes.iter().map(HashDigest::algorithm).collect::<Vec<_>>();
+           hash.sort();
+           hash.dedup();
+           hash
+       };
+       let mut hashers = algorithms.into_iter().map(Hasher::from).collect::<Vec<_>>();
+       let mut hasher = uv_extract::hash::HashReader::new(reader, &mut hashers);
+
+       // Unzip the archive into a temporary directory.
+       uv_extract::stream::archive(&mut hasher, path, &temp_dir.path()).await?;
+
+       // If necessary, exhaust the reader to compute the hash.
+       if !hashes.is_empty() {
+           hasher.finish().await.map_err(Error::HashExhaustion)?;
+       }
+
+       let hashes = hashers.into_iter().map(HashDigest::from).collect();
+
        // Extract the top-level directory from the archive.
        let extracted = match uv_extract::strip_component(temp_dir.path()) {
@@ -1050,14 +1187,14 @@ impl<'a, T: BuildContext> SourceDistributionBuilder<'a, T> {
        };

        // Persist it to the cache.
-       fs_err::tokio::create_dir_all(cache_path.parent().expect("Cache entry to have parent"))
+       fs_err::tokio::create_dir_all(target.parent().expect("Cache entry to have parent"))
            .await
            .map_err(Error::CacheWrite)?;
-       fs_err::tokio::rename(extracted, &cache_path)
+       fs_err::tokio::rename(extracted, &target)
            .await
            .map_err(Error::CacheWrite)?;

-       Ok(())
+       Ok(hashes)
    }

    /// Build a source distribution, storing the built wheel in the cache.
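Both `download_archive` and `persist_archive` end the same way: unpack into a temporary directory inside the cache bucket, then rename the finished tree onto its `target` path so readers never observe a half-written cache entry. A small std-only sketch of that stage-then-rename pattern (paths and the staged contents here are purely illustrative, not uv's layout):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Move a fully unpacked directory into its final cache location. Staging and
/// target should live on the same filesystem so the rename is cheap and a
/// partially written entry is never visible at `target`.
fn persist_to_cache(staging: &Path, target: &Path) -> io::Result<()> {
    if let Some(parent) = target.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::rename(staging, target)
}

fn main() -> io::Result<()> {
    // Illustrative paths only; uv derives these from its cache buckets and shards.
    let staging = std::env::temp_dir().join("stage-pkg-1.0");
    let target = std::env::temp_dir().join("cache").join("pkg-1.0").join("source");

    fs::create_dir_all(&staging)?;
    fs::write(staging.join("PKG-INFO"), "Metadata-Version: 2.1\n")?;

    let _ = fs::remove_dir_all(&target); // a fresh revision ID avoids this in the real code
    persist_to_cache(&staging, &target)?;
    assert!(target.join("PKG-INFO").exists());
    Ok(())
}
```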
@@ -1,5 +1,8 @@
+use distribution_types::Hashed;
use serde::{Deserialize, Serialize};

+use pypi_types::HashDigest;
+
/// The [`Revision`] is a thin wrapper around a unique identifier for the source distribution.
///
/// A revision represents a unique version of a source distribution, at a level more granular than
@@ -7,16 +10,45 @@ use serde::{Deserialize, Serialize};
/// at a URL or a local file path may have multiple revisions, each representing a unique state of
/// the distribution, despite the reported version number remaining the same.
#[derive(Debug, Clone, Serialize, Deserialize)]
-pub(crate) struct Revision(String);
+pub(crate) struct Revision {
+    id: String,
+    hashes: Vec<HashDigest>,
+}

impl Revision {
    /// Initialize a new [`Revision`] with a random UUID.
    pub(crate) fn new() -> Self {
-       Self(nanoid::nanoid!())
+       Self {
+           id: nanoid::nanoid!(),
+           hashes: vec![],
+       }
    }

-   /// Return the unique ID of the revision.
+   /// Return the unique ID of the manifest.
    pub(crate) fn id(&self) -> &str {
-       &self.0
+       &self.id
    }

+   /// Return the computed hashes of the archive.
+   pub(crate) fn hashes(&self) -> &[HashDigest] {
+       &self.hashes
+   }
+
+   /// Return the computed hashes of the archive.
+   pub(crate) fn into_hashes(self) -> Vec<HashDigest> {
+       self.hashes
+   }
+
+   /// Set the computed hashes of the archive.
+   #[must_use]
+   pub(crate) fn with_hashes(mut self, hashes: Vec<HashDigest>) -> Self {
+       self.hashes = hashes;
+       self
+   }
}

+impl Hashed for Revision {
+   fn hashes(&self) -> &[HashDigest] {
+       &self.hashes
+   }
+}
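To see how the new `Revision` fields fit together, here is a compact mirror of the type with plain strings standing in for `HashDigest` (illustrative only; the real type uses `nanoid` IDs and implements the `Hashed` trait): a revision starts with no digests, the digests computed while unpacking are attached via `with_hashes`, and a cached revision is only reused when it already carries the required algorithms.

```rust
/// Simplified mirror of `Revision`, for illustration only.
#[derive(Debug, Clone)]
struct Revision {
    id: String,
    hashes: Vec<String>, // e.g. "sha256:<hex>"
}

impl Revision {
    fn new(id: impl Into<String>) -> Self {
        Self { id: id.into(), hashes: vec![] }
    }

    /// Attach the digests computed while unpacking the archive.
    #[must_use]
    fn with_hashes(mut self, hashes: Vec<String>) -> Self {
        self.hashes = hashes;
        self
    }

    /// A revision with no recorded digest for an algorithm cannot satisfy it.
    fn has_digests(&self, required: &[&str]) -> bool {
        required
            .iter()
            .all(|algorithm| self.hashes.iter().any(|h| h.starts_with(algorithm)))
    }
}

fn main() {
    let revision = Revision::new("9f2b1c").with_hashes(vec!["sha256:abc123".to_string()]);
    assert_eq!(revision.id, "9f2b1c");
    assert!(revision.has_digests(&["sha256"]));
    assert!(!revision.has_digests(&["sha512"])); // would force a re-download and re-hash
}
```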