Commit graph

68 commits

Author SHA1 Message Date
konsti
4a49ff4372
Rename ancillary direct url types to parsed url (#3211)
Followup to #3187. Renaming only
2024-04-23 14:51:23 +00:00
Charlie Marsh
14f05f27b3
Add ticks around error messages more consistently (#3004)
## Summary

I found some of these too bare (e.g., when they _just_ show a package
name with no other information). For me, this makes it easier to
differentiate error message copy from data. But open to other opinions.
Take a look at the fixture changes and LMK!
2024-04-22 23:58:36 +00:00
konsti
f29c991e21
Dedicated error type for direct url parsing (#3181)
Add a dedicated error type for direct url parsing. This change is broken
out from the new uv requirement type, which uses direct url parsing
internally.
2024-04-22 11:57:36 +00:00
Charlie Marsh
0d62e62fb7
Make hash-checking failure mode stricter and safer (#2997)
## Summary

If there are no hashes for a given package, we now return
`Validate(&[])` so that the policy is impossible to satisfy. Previously,
we returned `None`, which is always satisfied.

We don't really ever expect to hit this, because we detect this case in
the resolver and raise a different error. But if we have a bug
somewhere, it's better to fail with an error than silently let the
package through.
2024-04-11 17:53:34 +00:00
Charlie Marsh
96c3c2e774
Support unnamed requirements in --require-hashes (#2993)
## Summary

This PR enables `--require-hashes` with unnamed requirements. The key
change is that `PackageId` becomes `VersionId` (since it refers to a
package at a specific version), and the new `PackageId` consists of
_either_ a package name _or_ a URL. The hashes are keyed by `PackageId`,
so we can generate the `RequiredHashes` before we have names for all
packages, and enforce them throughout.

Closes #2979.
2024-04-11 11:26:50 -04:00
Charlie Marsh
3dd673677a
Add --find-links source distributions to the registry cache (#2986)
## Summary

Source distributions in `--find-links` are now properly picked up in the
cache.

Closes https://github.com/astral-sh/uv/issues/2978.
2024-04-11 01:25:58 +00:00
Charlie Marsh
32f129c245
Store IDs rather than paths in the cache (#2985)
## Summary

Similar to `Revision`, we now store IDs in the `Archive` entires rather
than absolute paths. This makes the cache robust to moves, etc.

Closes https://github.com/astral-sh/uv/issues/2908.
2024-04-10 21:07:51 -04:00
Charlie Marsh
5583b90c30
Create dedicated abstractions for .rev and .http pointers (#2977)
## Summary

This PR formalizes some of the concepts we use in the cache for
"pointers to things".

In the wheel cache, we have files like
`annotated_types-0.6.0-py3-none-any.http`. This represents an unzipped
wheel, cached alongside an HTTP caching policy. We now have a struct for
this to encapsulate the logic: `HttpArchivePointer`.

Similarly, we have files like `annotated_types-0.6.0-py3-none-any.rev`.
This represents an unzipped local wheel, alongside with a timestamp. We
now have a struct for this to encapsulate the logic:
`LocalArchivePointer`.

We have similar structs for source distributions too.
2024-04-10 17:30:27 -04:00
Charlie Marsh
006379c50c
Add support for URL requirements in --generate-hashes (#2952)
## Summary

This PR enables hash generation for URL requirements when the user
provides `--generate-hashes` to `pip compile`. While we include the
hashes from the registry already, today, we omit hashes for URLs.

To power hash generation, we introduce a `HashPolicy` abstraction:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HashPolicy<'a> {
    /// No hash policy is specified.
    None,
    /// Hashes should be generated (specifically, a SHA-256 hash), but not validated.
    Generate,
    /// Hashes should be validated against a pre-defined list of hashes. If necessary, hashes should
    /// be generated so as to ensure that the archive is valid.
    Validate(&'a [HashDigest]),
}
```

All of the methods on the distribution database now accept this policy,
instead of accepting `&'a [HashDigest]`.

Closes #2378.
2024-04-10 20:02:45 +00:00
Charlie Marsh
8513d603b4
Return computed hashes from metadata requests (#2951)
## Summary

This PR modifies the distribution database to return both the
`Metadata23` and the computed hashes when clients request metadata.

No behavior changes, but this will be necessary to power
`--generate-hashes`.
2024-04-10 19:31:41 +00:00
Charlie Marsh
1f3b5bb093
Add hash-checking support to install and sync (#2945)
## Summary

This PR adds support for hash-checking mode in `pip install` and `pip
sync`. It's a large change, both in terms of the size of the diff and
the modifications in behavior, but it's also one that's hard to merge in
pieces (at least, with any test coverage) since it needs to work
end-to-end to be useful and testable.

Here are some of the most important highlights:

- We store hashes in the cache. Where we previously stored pointers to
unzipped wheels in the `archives` directory, we now store pointers with
a set of known hashes. So every pointer to an unzipped wheel also
includes its known hashes.
- By default, we don't compute any hashes. If the user runs with
`--require-hashes`, and the cache doesn't contain those hashes, we
invalidate the cache, redownload the wheel, and compute the hashes as we
go. For users that don't run with `--require-hashes`, there will be no
change in performance. For users that _do_, the only change will be if
they don't run with `--generate-hashes` -- then they may see some
repeated work between resolution and installation, if they use `pip
compile` then `pip sync`.
- Many of the distribution types now include a `hashes` field, like
`CachedDist` and `LocalWheel`.
- Our behavior is similar to pip, in that we enforce hashes when pulling
any remote distributions, and when pulling from our own cache. Like pip,
though, we _don't_ enforce hashes if a distribution is _already_
installed.
- Hash validity is enforced in a few different places:
1. During resolution, we enforce hash validity based on the hashes
reported by the registry. If we need to access a source distribution,
though, we then enforce hash validity at that point too, prior to
running any untrusted code. (This is enforced in the distribution
database.)
2. In the install plan, we _only_ add cached distributions that have
matching hashes. If a cached distribution is missing any hashes, or the
hashes don't match, we don't return them from the install plan.
3. In the downloader, we _only_ return distributions with matching
hashes.
4. The final combination of "things we install" are: (1) the wheels from
the cache, and (2) the downloaded wheels. So this ensures that we never
install any mismatching distributions.
- Like pip, if `--require-hashes` is provided, we require that _all_
distributions are pinned with either `==` or a direct URL. We also
require that _all_ distributions have hashes.

There are a few notable TODOs:

- We don't support hash-checking mode for unnamed requirements. These
should be _somewhat_ rare, though? Since `pip compile` never outputs
unnamed requirements. I can fix this, it's just some additional work.
- We don't automatically enable `--require-hashes` with a hash exists in
the requirements file. We require `--require-hashes`.

Closes #474.

## Test Plan

I'd like to add some tests for registries that report incorrect hashes,
but otherwise: `cargo test`
2024-04-10 19:09:03 +00:00
konsti
15f0be8f04
Allow profiling tests with tracing instrumentation (#2957)
To get more insights into test performance, allow instrumenting tests
with tracing-durations-export.

Usage:

```shell
# A single test
TRACING_DURATIONS_TEST_ROOT=$(pwd)/target/test-traces cargo test --features tracing-durations-export --test pip_install_scenarios no_binary -- --exact
# All tests
TRACING_DURATIONS_TEST_ROOT=$(pwd)/target/test-traces cargo nextest run --features tracing-durations-export
```

Then we can e.g. look at
`target/test-traces/pip_install_scenarios::no_binary.svg` and see the
builds it performs:


![image](40b4e094-debc-4b22-8aa3-9471998674af)
2024-04-10 10:15:27 +00:00
Charlie Marsh
c4472ebbb9
Enforce and backtrack on invalid versions in source metadata (#2954)
## Summary

If we build a source distribution from the registry, and the version
doesn't match that of the filename, we should error, just as we do for
mismatched package names. However, we should also backtrack here, which
we didn't previously.

Closes https://github.com/astral-sh/uv/issues/2953.

## Test Plan

Verified that `cargo run pip install docutils --verbose --no-cache
--reinstall` installs `docutils==0.21` instead of the invalid
`docutils==0.21.post1`.

In the logs, I see:

```
WARN Unable to extract metadata for docutils: Package metadata version `0.21` does not match given version `0.21.post1`
```
2024-04-10 05:13:33 +00:00
Charlie Marsh
83e2297633
Store common fields on BuiltWheelIndex struct (#2939)
## Summary

This mirrors the structure of the `RegistryWheelIndex`. It will be
useful once these indexes check hashes too.
2024-04-09 13:30:02 -04:00
Zanie Blue
1512e07a2e
Split configuration options out of uv-types (#2924)
Needed to prevent circular dependencies in my toolchain work (#2931). I
think this is probably a reasonable change as we move towards persistent
configuration too?

Unfortunately `BuildIsolation` needs to be in `uv-types` to avoid
circular dependencies still. We might be able to resolve that in the
future.
2024-04-09 11:35:53 -05:00
Charlie Marsh
07e3694c3c
Separate local archive vs. local source tree paths in source database (#2922)
## Summary

When you specify a source distribution via a path, it can either be a
path to an archive (like a `.tar.gz` file), or a source tree (a
directory). Right now, we handle both paths through the same methods in
the source database. This PR splits them up into separate handlers.

This will make hash generation a little easier, since we need to
generate hashes for archives, but _can't_ generate hashes for source
trees.

It also means that we can now store the unzipped source distribution in
the cache (in the case of archives), and avoid unzipping the source
distribution needlessly on every invocation; and, overall, let's un
enforce clearer expectations between the two routes (e.g., what errors
are possible vs. not), at the cost of duplicating some code.

Closes #2760 (incidentally -- not exactly the motivation for the change,
but it did accomplish it).
2024-04-09 01:12:33 +00:00
Charlie Marsh
06e96a8f58
DRY up source distribution fetching between wheel and metadata routes (#2921)
These will get more involved with hash-checking, so easiest to extract
them now.

No functional changes.
2024-04-09 00:14:42 +00:00
Charlie Marsh
4f14e2a764
Rebrand Manifest as Revision in wheel database (#2920)
## Summary

I think this is a much clearer name for this concept: the set of
"versions" of a given wheel or source distribution. We also use
"Manifest" elsewhere to refer to the set of requirements, constraints,
etc., so this was overloaded.
2024-04-08 20:00:57 -04:00
Charlie Marsh
1ab471d167
Reduce visibility of some methods in source database (#2919) 2024-04-08 23:49:23 +00:00
Charlie Marsh
c46772eec5
Add a layer of indirection to the local path-based wheel cache (#2909)
## Summary

Right now, the path-based wheel cache just looks at the symlink to the
archives directory, checks the timestamp on it, and continues with that
symlink as long as the timestamp is up-to-date.

The HTTP-based wheel meanwhile, uses an intermediary `.http` file, which
includes the HTTP caching information. The `.http` file's payload is
just a path pointing to an entry in the archives directory.

This PR modifies the path-based codepaths to use a similar cache file,
which stores a timestamp along with a path to the archives directory.
The main advantage here is that we can add other data to this cache file
(namely, hashes in the future).

## Test Plan

Beyond existing tests, I also verified that this doesn't require a
version bump:

```
git checkout main 
cargo run pip install ~/Downloads/zeal-0.0.1-py3-none-any.whl --cache-dir baz --reinstall
git checkout charlie/manifest
cargo run pip install ~/Downloads/zeal-0.0.1-py3-none-any.whl --cache-dir baz --reinstall
cargo run pip install ~/Downloads/zeal-0.0.1-py3-none-any.whl --cache-dir baz --reinstall --refresh
```
2024-04-08 19:32:59 +00:00
Charlie Marsh
134810c547
Respect cached local --find-links in install plan (#2907)
## Summary

I think this is kind of just an oversight. If a wheel is available via
`--find-links`, and the index is "local", we never find it in the cache.

## Test Plan

`cargo test`
2024-04-08 18:58:33 +00:00
Charlie Marsh
31a67f539f
Remove unused local wheel types (#2906)
## Summary

No behavior changes. Just removing unused code.
2024-04-08 18:15:20 +00:00
Charlie Marsh
1daa35176f
Always return unzipped wheels from the distribution database (#2905)
## Summary

In all cases, we unzip these immediately after returning. By moving the
unzipping into the database, we can remove a bunch of code (coming in a
separate PR), and pave the way for hash-checking, since hash generation
will _also_ happen in the database, and splitting the caching layers
across the database and the unzipper creates complications.

Closes #2863.
2024-04-08 14:07:17 -04:00
Charlie Marsh
10dfd43af9
DRY up HTTP request builder in source database (#2902) 2024-04-08 14:45:26 +00:00
Charlie Marsh
f11a5e2208
DRY up local wheel path in distribution database (#2901) 2024-04-08 10:40:17 -04:00
Charlie Marsh
d4a258422b
DRY up request builer in wheel download (#2857) 2024-04-07 02:12:03 +00:00
Charlie Marsh
a0b8d1a994
Clean up Error enum in Metadata23 (#2835)
## Summary

Rename, and remove a bunch of unused variants.
2024-04-05 14:40:33 +00:00
Charlie Marsh
dc2c289dff
Upgrade rs-async-zip to support data descriptors (#2809)
## Summary

Upgrading `rs-async-zip` enables us to support data descriptors in
streaming. This both greatly improves performance for indexes that use
data descriptors _and_ ensures that we support them in a few other
places (e.g., zipped source distributions created in Finder).

Closes #2808.
2024-04-04 01:31:40 +00:00
Charlie Marsh
189d0d41d0
Remove redirects from the resolver (#2792)
## Summary

Rather than storing the `redirects` on the resolver, this PR just
re-uses the "convert this URL to precise" logic when we convert to a
`Resolution` after-the-fact. I think this is a lot simpler: it removes
state from the resolver, and simplifies a lot of the hooks around
distribution fetching (e.g., `get_or_build_wheel_metadata` no longer
returns `(Metadata23, Option<Url>)`).
2024-04-03 02:43:57 +00:00
Charlie Marsh
ffd4b6fcac
Remove unused download_and_extract_archive method (#2793)
## Summary

This was introduced at some point but then removed in favor of accessing
the database directly.
2024-04-03 02:34:14 +00:00
Charlie Marsh
684f790d5d
Preserve .git suffixes and casing in Git dependencies (#2789)
## Summary

I noticed in #2769 that I was now stripping `.git` suffixes from Git
URLs after resolving to a precise commit. This PR cleans up the internal
caching to use a better canonical representation: a `RepositoryUrl`
along with a `GitReference`, instead of a `GitUrl` which can contain
non-canonical data. This gives us both better fidelity (preserving the
`.git`, along with any casing that the user provided when defining the
URL) and is overall cleaner and more robust.
2024-04-03 00:24:29 +00:00
Charlie Marsh
c30a65ee0c
Allow conflicting Git URLs that refer to the same commit SHA (#2769)
## Summary

This PR leverages our lookahead direct URL resolution to significantly
improve the range of Git URLs that we can accept (e.g., if a user
provides the same requirement, once as a direct dependency, and once as
a tag). We did some of this in #2285, but the solution here is more
general and works for arbitrary transitive URLs.

Closes https://github.com/astral-sh/uv/issues/2614.
2024-04-02 23:36:35 +00:00
Charlie Marsh
dfdcce68fd
Split get_or_build_wheel_metadata into distinct methods (#2765)
## Summary

Internal refactor which makes `DistributionDatabase` for unnamed
requirements (or, at least, source distributions).
2024-04-01 23:52:21 +00:00
Charlie Marsh
e68cdb1049
Rename to SourceDistributionBuilder (#2750)
## Summary

This is more consistent with `DistributionDatabase`. The order of the
arguments is also now consistent between the two structs.
2024-04-01 02:37:43 +00:00
Charlie Marsh
8596ff3470
Remove Cache argument from DistributionDatabase (#2749)
## Summary

We can access cache from `BuildContext`. This mirrors
`SourceDistCachedBuilder`, which doesn't accept `Cache` as an argument
and always accesses it through `BuildContext`.
2024-03-31 22:22:25 -04:00
Charlie Marsh
ff6aea3f5c
Respect subdirectories when reading static metadata (#2728)
## Summary

This was just an oversight (and a lack of test coverage).

Closes https://github.com/astral-sh/uv/issues/2727.
2024-03-29 22:36:10 -04:00
Charlie Marsh
f8f7f848f5
Remove Tags from tracing (#2704)
Oops, these show in `tracing` now, and they're huge.
2024-03-28 02:18:54 +00:00
Charlie Marsh
b6ab919945
Make tags non-required for fetching wheel metadata (#2700)
## Summary

This looks like a big change but it really isn't. Rather, I just split
`get_or_build_wheel` into separate `get_wheel` and `build_wheel` methods
internally, which made `get_or_build_wheel_metadata` capable of _not_
relying on `Tags`, which in turn makes it easier for us to use the
`DistributionDatabase` in various places without having it coupled to an
interpreter or environment (something we already did for
`SourceDistributionBuilder`).
2024-03-28 00:06:25 +00:00
Charlie Marsh
365c292525
Read package metadata from pyproject.toml when statically defined (#2676)
## Summary

Now that we're resolving metadata more aggressively for local sources,
it's worth doing this. We now pull metadata from the `pyproject.toml`
directly if it's statically-defined.

Closes https://github.com/astral-sh/uv/issues/2629.
2024-03-27 14:34:18 +00:00
Charlie Marsh
ffd78d0821
Add an in-memory cache for Git references (#2682)
## Summary

Ensures that, even if we try to resolve the same Git reference twice
within an invocation, it always returns a (cached) consistent result.

Closes https://github.com/astral-sh/uv/issues/2673.

## Test Plan

```
❯ cargo run pip install git+https://github.com/pallets/flask.git --reinstall --no-cache
   Compiling uv-distribution v0.0.1 (/Users/crmarsh/workspace/uv/crates/uv-distribution)
   Compiling uv-resolver v0.0.1 (/Users/crmarsh/workspace/uv/crates/uv-resolver)
   Compiling uv-installer v0.0.1 (/Users/crmarsh/workspace/uv/crates/uv-installer)
   Compiling uv-dispatch v0.0.1 (/Users/crmarsh/workspace/uv/crates/uv-dispatch)
   Compiling uv-requirements v0.1.0 (/Users/crmarsh/workspace/uv/crates/uv-requirements)
   Compiling uv v0.1.24 (/Users/crmarsh/workspace/uv/crates/uv)
    Finished dev [unoptimized + debuginfo] target(s) in 3.95s
     Running `target/debug/uv pip install 'git+https://github.com/pallets/flask.git' --reinstall --no-cache`
 Updated https://github.com/pallets/flask.git (b90a4f1)
Resolved 7 packages in 280ms
   Built flask @ git+https://github.com/pallets/flask.git@b90a4f1f4a370e92054b9cc9db0efcb864f87ebe                                                                                                                                            Downloaded 7 packages in 212ms
Installed 7 packages in 9ms
```
2024-03-27 01:39:01 +00:00
Zanie Blue
0b08ba1e67
Rename uv-traits and split into separate modules (#2674)
This is driving me a little crazy and is becoming a larger problem in
#2596 where I need to move more types (like `Upgrade` and `Reinstall`)
into this crate. Anything that's shared across our core resolver,
install, and build crates needs to be defined in this crate to avoid
cyclic dependencies. We've outgrown it being a single file with some
shared traits.

There are no behavioral changes here.
2024-03-26 15:39:43 -05:00
Charlie Marsh
8587c440fe
Support file://localhost/ schemes (#2657)
## Summary

`uv` was failing to install requirements defined like:

```
file://localhost/Users/crmarsh/Downloads/iniconfig-2.0.0-py3-none-any.whl
```

Closes https://github.com/astral-sh/uv/issues/2652.
2024-03-25 19:23:26 +00:00
Charlie Marsh
31743f2bd8
Use PEP 517 build hooks to resolve unnamed requirements (#2604) 2024-03-22 03:20:40 +00:00
Charlie Marsh
5d7d7dce24
Enable PEP 517 builds for unnamed requirements (#2600)
## Summary

This PR enables the source distribution database to be used with unnamed
requirements (i.e., URLs without a package name). The (significant)
upside here is that we can now use PEP 517 hooks to resolve unnamed
requirement metadata _and_ reuse any computation in the cache.

The changes to `crates/uv-distribution/src/source/mod.rs` are quite
extensive, but mostly mechanical. The core idea is that we introduce a
new `BuildableSource` abstraction, which can either be a distribution,
or an unnamed URL:

```rust
/// A reference to a source that can be built into a built distribution.
///
/// This can either be a distribution (e.g., a package on a registry) or a direct URL.
///
/// Distributions can _also_ point to URLs in lieu of a registry; however, the primary distinction
/// here is that a distribution will always include a package name, while a URL will not.
#[derive(Debug, Clone, Copy)]
pub enum BuildableSource<'a> {
    Dist(&'a SourceDist),
    Url(SourceUrl<'a>),
}
```

All the methods on the source distribution database now accept
`BuildableSource`. `BuildableSource` has a `name()` method, but it
returns `Option<&PackageName>`, and everything is required to work with
and without a package name.

The main drawback of this approach (which isn't a terrible one) is that
we can no longer include the package name in the cache. (We do continue
to use the package name for registry-based distributions, since those
always have a name.). The package name was included in the cache route
for two reasons: (1) it's nice for debugging; and (2) we use it to power
`uv cache clean flask`, to identify the entries that are relevant for
Flask.

To solve this, I changed the `uv cache clean` code to look one level
deeper. So, when we want to determine whether to remove the cache entry
for a given URL, we now look into the directory to see if there are any
wheels that match the package name. This isn't as nice, but it does work
(and we have test coverage for it -- all passing).

I also considered removing the package name from the cache routes for
non-registry _wheels_, for consistency... But, it would require a cache
bump, and it didn't feel important enough to merit that.
2024-03-21 22:46:39 -04:00
Charlie Marsh
2979918320
Add support for unnamed Git and HTTP requirements (#2578)
## Summary

Enables, e.g., `uv pip install
git+https://github.com/pallets/flask.git`.

Part of: https://github.com/astral-sh/uv/issues/313.
2024-03-21 13:44:54 +00:00
konsti
70e0967dbd
Avoid repeating paths of workspace packages (#2573)
Scott schafer got me the idea: We can avoid repeating the path for
workspaces dependencies everywhere if we declare them in the virtual
package once and treat them as workspace dependencies from there on.
2024-03-20 16:16:02 -04:00
Zanie Blue
9c27f92203
Introduce a BaseClient for construction of canonical configured client (#2431)
In preparation for support of
https://github.com/astral-sh/uv/issues/2357 (see
https://github.com/astral-sh/uv/pull/2434)
2024-03-15 12:07:38 -05:00
Charlie Marsh
17732246df
Update packse to pull in additional local version tests (#2462)
Precursor to #2430.
2024-03-14 20:13:47 +00:00
Charlie Marsh
d9b160b405
Add backoff for transient Windows failures (#2419)
## Summary

This may be required elsewhere, but all the traces in that issue are
related to persisting the temporary directory to our persistent cache,
so lets start there.

See: https://github.com/astral-sh/uv/issues/1491.
2024-03-13 13:16:26 -04:00
Charlie Marsh
a267a501b6
Add Seek fallback for zip files (#2320)
## Summary

Some zip files can't be streamed; in particular, `rs-async-zip` doesn't
support data descriptors right now (though it may in the future). This
PR adds a fallback path for such zips that downloads the entire zip file
to disk, then unzips it from disk (which gives us `Seek`).

Closes https://github.com/astral-sh/uv/issues/2216.

## Test Plan

`cargo run pip install --extra-index-url https://buf.build/gen/python
hashb_foxglove_protocolbuffers_python==25.3.0.1.20240226043130+465630478360
--force-reinstall -n`
2024-03-10 11:39:28 -04:00