Commit graph

92 commits

Author SHA1 Message Date
Charlie Marsh
b7d77c04cc
Add Git resolver in lieu of static hash map (#3954)
## Summary

This PR replaces the static resolver map:

```rust
static RESOLVED_GIT_REFS: Lazy<Mutex<FxHashMap<RepositoryReference, GitSha>>> =
    Lazy::new(Mutex::default);
```

with a `GitResolver` struct that we now pass around on the
`BuildContext`. There should be no behavior changes here; it's purely an
internal refactor with an eye towards making it cleaner for us to
"pre-populate" the list of resolved SHAs.
2024-05-31 22:44:42 -04:00
konsti
72b1642232
Move metadata into its own file (#3939)
Move `Metadata`, `MetadataLoweringError` and `ArchiveMetadata` into
their own file `metadata.rs` in `uv-distribution`, moving it out from
`lib.rs`. No functional changes.
2024-05-31 15:24:10 +02:00
konsti
3c074142f5
Re-add lowering unit tests (#3935)
Re-add the lowering unit tests removed in #3904. This also adds a
`stop_discovery_at` feature to avoid running actual workspace discovery.
2024-05-31 12:17:49 +00:00
konsti
081f20c53e
Add support for tool.uv into distribution building (#3904)
With this change, we remove the special casing of workspace dependencies
and resolve `tool.uv` for all git and directory distributions. This
gives us support for non-editable workspace dependencies and path
dependencies in other workspaces. It removes a lot of special casing
around workspaces. These changes are the groundwork for supporting
`tool.uv` with dynamic metadata.

The basis for this change is moving `Requirement` from
`distribution-types` to `pypi-types` and the lowering logic from
`uv-requirements` to `uv-distribution`. These changes should be split out
into separate PRs.

I've included an example workspace `albatross-root-workspace2` where
`bird-feeder` depends on `a` from another workspace `ab`. There are a
bunch of failing tests and regressed error messages that still need
fixing. It does fix the audited package count for the workspace tests.
2024-05-31 02:42:03 +00:00
Charlie Marsh
09f55482a0
Remove some unused pub use exports (#3930) 2024-05-30 22:26:52 -04:00
Ibraheem Ahmed
261aa2c70a
Port all git functionality to use git CLI (#3833)
## Summary

We currently rely on libgit2 for most git-related functionality.
However, libgit2 has long-standing performance issues and lags
significantly behind git in terms of new features. For these reasons we
now use the git CLI by default for fetching repositories
(https://github.com/astral-sh/uv/pull/1781). This PR completely drops
libgit2 in favor of the git CLI for all git-related functionality, which
should allow us to use features such as partial clones and sparse
checkouts in the future for performance.

There is also a lot of technical debt in the current git code as it's
mostly taken from Cargo. Switching to the git CLI *vastly* simplifies
the `uv-git` codebase.

Eventually we might want to look into switching to
[`gitoxide`](https://github.com/Byron/gitoxide), but it's currently too
immature for our use case.
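
As an illustration of the general approach (not the actual `uv-git` code), fetching a reference by shelling out to the `git` CLI might look roughly like this:

```rust
use std::path::Path;
use std::process::Command;

/// Fetch a single reference into a local clone by invoking the `git` CLI,
/// rather than going through libgit2 bindings. Error handling is simplified.
fn fetch_ref(repo_dir: &Path, remote_url: &str, reference: &str) -> std::io::Result<()> {
    let status = Command::new("git")
        .arg("fetch")
        .arg("--force")
        .arg(remote_url)
        .arg(reference)
        .current_dir(repo_dir)
        .status()?;
    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("`git fetch` exited with {status}"),
        ));
    }
    Ok(())
}
```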
2024-05-30 15:28:48 -04:00
Charlie Marsh
1fc6a59707
Remove special-casing for editable requirements (#3869)
## Summary

There are a few behavior changes in here:

- We now enforce `--require-hashes` for editables, like pip. So if you
use `--require-hashes` with an editable requirement, we'll reject it. I
could change this if it seems off.
- We now treat source tree requirements, editable or not (e.g., both `-e
./black` and `./black`) as if `--refresh` is always enabled. This
doesn't mean that we _always_ rebuild them; but if you pass
`--reinstall`, then yes, we always rebuild them. I think this is an
improvement and is close to how editables work today.

Closes #3844.

Closes #2695.
2024-05-28 15:49:34 +00:00
Ibraheem Ahmed
7dc322665c
Concurrent progress bars (#3252)
## Summary

Implements concurrent progress bars. Resolves
https://github.com/astral-sh/uv/issues/1209.
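
The general shape of the approach, sketched with `indicatif`'s `MultiProgress` (an assumption for illustration; this is not the PR's actual code):

```rust
use std::thread;
use std::time::Duration;

use indicatif::{MultiProgress, ProgressBar};

fn main() {
    // One parent `MultiProgress` manages a bar per concurrent task, so output
    // from parallel workers renders as separate bars instead of interleaving.
    let multi = MultiProgress::new();

    let handles: Vec<_> = ["resolve foo", "build bar", "download baz"]
        .into_iter()
        .map(|name| {
            let bar = multi.add(ProgressBar::new(100));
            bar.set_message(name);
            thread::spawn(move || {
                for _ in 0..100 {
                    bar.inc(1);
                    thread::sleep(Duration::from_millis(5));
                }
                bar.finish_with_message(format!("{name} done"));
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```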

2024-05-27 01:21:07 +00:00
konsti
4db468e27f
Use VerbatimParsedUrl in pep508_rs (#3758)
When parsing requirements from any source, directly parse the url parts
(and reject unsupported urls) instead of parsing url parts at a later
stage. This removes a bunch of error branches and concludes the work of
parsing url parts once and passing them around everywhere.

Many usages of the assembled `VerbatimUrl` remain, but these can be
removed incrementally.

Please review commit-by-commit.
2024-05-23 19:52:47 +00:00
Charlie Marsh
558f628ef1
Propagate URL errors in verbatim parsing (#3720)
## Summary

Closes https://github.com/astral-sh/uv/issues/3715.

## Test Plan

```
❯ echo "/../test" | cargo run pip compile -
error: Couldn't parse requirement in `-` at position 0
  Caused by: path could not be normalized: /../test
/../test
^^^^^^^^

❯ echo "-e /../test" | cargo run pip compile -
error: Invalid URL in `-`: `/../test`
  Caused by: path could not be normalized: /../test
  Caused by: cannot normalize a relative path beyond the base directory
```
2024-05-21 19:58:59 +00:00
konsti
2c4088a518
Improve DirWithoutEntrypoint error message (#3690)
Ran into this and couldn't tell what the problem was without the path
attached.
2024-05-21 11:21:20 +00:00
Charlie Marsh
657eebd50b
Remove SourceDistFilename from RegistrySourceDist (#3650)
## Summary

Uncertain about this, but we don't actually need the full
`SourceDistFilename`, only the name and version -- and we often have
that information already (as in the lockfile routines). So by flattening
the fields onto `RegistrySourceDist`, we can avoid re-parsing for
information we already have.
2024-05-20 09:25:23 -04:00
Charlie Marsh
0718705c21
Track parsed Git URL components in GitSourceUrl (#3656)
## Summary

Closes https://github.com/astral-sh/uv/issues/3571.
2024-05-20 00:43:30 +00:00
Andrew Gallant
018a7150d6
uv-distribution: include all wheels in distribution types (#3595)
Our current flow of data from "simple registry package" to "final
resolved distribution" goes through a number of types:

* `SimpleMetadata` is the API response from a registry that includes all
published versions for a package. Each version has an assortment of
metadata
associated with it.
* `VersionFiles` is the aforementioned metadata. It is split in two: a
group of files for source distributions and a group of files for wheels.
* `PrioritizedDist` collects a subset of the files from `VersionFiles`
to form a selection of the "best" sdist and the "best" wheel for the
current environment.
* `CompatibleDist` is created from a borrowed `PrioritizedDist` that,
perhaps among other things, encapsulates the decision of whether to pick
an sdist or a wheel. (This decision depends both on compatibility and
the action being performed. e.g., When doing installation, a
`CompatibleDist` will sometimes select an sdist over a wheel.)
* `ResolvedDistRef` is like a `ResolvedDist`, but borrows a `Dist`.
* `ResolvedDist` is the almost-final-form of a distribution in a
resolution and is created from a `ResolvedDistRef`.
* `AnnotatedResolvedDist` is a new data type that is the actual final
form of a distribution that a universal lock file cares about. It
bundles a `ResolvedDist` with some metadata needed to generate a lock
file.

One of the requirements of a universal lock file is that we include all
wheels (and maybe all source distributions? but at least one if it's
present) associated with a distribution. But the above flow of data (in
the step from `VersionFiles` to `PrioritizedDist`) drops all wheels
except for the best one.

To remedy this, in this PR, we rejigger `PrioritizedDist`,
`CompatibleDist` and `ResolvedDistRef` so that all wheel data is
preserved. And when a `ResolvedDistRef` is finally turned into a
`ResolvedDist`, we copy all of the wheel data. And finally, we adjust
the `Lock` constructor to read this new data and include it in the lock
file. To make this work, we also modify `RegistryBuiltDist` so that it
can contain one or more wheels instead of just one.
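
A rough sketch of the shape this change implies (field and type names here are illustrative, not copied from the uv source):

```rust
/// Illustrative stand-in for a single registry wheel file.
struct RegistryBuiltWheel {
    filename: String,
    url: String,
}

/// After this change, a registry-built distribution carries *all* of its
/// wheels, plus an index of the "best" one for the current environment,
/// so a lock file can record every wheel while installation still picks
/// the preferred one.
struct RegistryBuiltDist {
    wheels: Vec<RegistryBuiltWheel>,
    best_wheel_index: usize,
}

impl RegistryBuiltDist {
    fn best_wheel(&self) -> &RegistryBuiltWheel {
        &self.wheels[self.best_wheel_index]
    }
}
```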

One shortcoming here (called out in the code as a FIXME) is that if a
source distribution is selected as the "best" thing to use (perhaps
there are no compatible wheels), then the wheels won't end up in the
lock file. I plan to fix this in a follow-up PR.

We also aren't totally consistent on source distribution naming.
Sometimes we use `sdist`. Sometimes `source`. Sometimes `source_dist`.
I think it'd be nice to just use `sdist` everywhere, but I do prefer
the type names to be `SourceDist`. And sometimes you want function
names to match the type names (i.e., `from_source_dist`), which in turn
leads to an appearance of inconsistency. I'm open to ideas.

Closes #3351
2024-05-15 15:07:28 -04:00
konsti
025368965e
Reduce GitSourceDist URL usage (#3458)
Refactor the easy-to-remove usage of the `VerbatimUrl` on
`GitSourceDist`
2024-05-14 02:03:55 +00:00
konsti
0010954ca7
Add parsed URL to PubGrubPackage (#3426)
Avoid reparsing urls by storing the parsed parts across resolution on
`PubGrubPackage`.

Part 1 of #3408
2024-05-14 00:55:21 +00:00
Charlie Marsh
9a92a3ad37
Apply advisory locks when building source distributions (#3525)
## Summary

I don't love this, but it turns out that setuptools is not robust to
parallel builds: https://github.com/pypa/setuptools/issues/3119. As a
result, if you run uv from multiple processes, and they each attempt to
build the same source distribution, you can hit failures.

This PR applies an advisory lock to the source distribution directory.
We apply it unconditionally, even if we ultimately find something in the
cache and _don't_ do a build, which helps ensure that we only build the
distribution once (and wait for that build to complete) rather than
kicking off builds from each thread.
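
The general pattern, sketched with the `fs2` crate (an illustration of advisory file locking, not uv's actual locking code):

```rust
use std::fs::OpenOptions;
use std::path::Path;

use fs2::FileExt; // advisory file locks via flock / LockFileEx

/// Hold an exclusive advisory lock on a `.lock` file inside the source
/// distribution's cache directory for the duration of `build`.
fn with_build_lock<T>(cache_dir: &Path, build: impl FnOnce() -> T) -> std::io::Result<T> {
    let lock_path = cache_dir.join(".lock");
    let lock_file = OpenOptions::new()
        .create(true)
        .write(true)
        .open(&lock_path)?;
    // Blocks until any other process building the same sdist releases the lock.
    lock_file.lock_exclusive()?;
    let result = build();
    lock_file.unlock()?;
    Ok(result)
}
```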

Closes https://github.com/astral-sh/uv/issues/3512.

## Test Plan

Ran:

```sh
#!/bin/bash
make_venv(){
    target/debug/uv venv $1
    source $1/bin/activate
    target/debug/uv pip install opentracing --no-deps --verbose
}

for i in {1..8}
do
   make_venv ./$1/$i &
done
```
2024-05-13 10:42:20 -04:00
Charlie Marsh
42c3bfa351
Make Directory its own distribution kind (#3519)
## Summary

I think this is overall a good change because it explicitly encodes (in
the type system) something that was previously implicit. I'm not a huge
fan of the names here, open to input.

It covers some of https://github.com/astral-sh/uv/issues/3506 but I
don't think it _closes_ it.
2024-05-13 10:03:14 -04:00
Ibraheem Ahmed
783df8f657
Consolidate concurrency limits (#3493)
## Summary

This PR consolidates the concurrency limits used throughout `uv` and
exposes two limits, `UV_CONCURRENT_DOWNLOADS` and
`UV_CONCURRENT_BUILDS`, as environment variables.

Currently, `uv` has a number of concurrent streams that it buffers using
relatively arbitrary limits for backpressure. However, many of these
limits are conflated. We run a relatively small number of tasks overall
and should start most things as soon as possible. What we really want to
limit are three separate operations:
- File I/O. This is managed by tokio's blocking pool and we should not
really have to worry about it.
- Network I/O.
- Python build processes.

Because the current limits span a broad range of tasks, it's possible
that a limit meant for network I/O is occupied by tasks performing
builds, reading from the file system, or even waiting on a `OnceMap`. We
also don't limit build processes that end up being required to perform a
download. While this may not pose a performance problem because our
limits are relatively high, it does mean that the limits do not do what
we want, making it tricky to expose them to users
(https://github.com/astral-sh/uv/issues/1205,
https://github.com/astral-sh/uv/issues/3311).

After this change, the limits on network I/O and build processes are
centralized and managed by semaphores. All other tasks are unbuffered
(note that these tasks are still bounded, so backpressure should not be
a problem).
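
A minimal sketch of the semaphore-based approach (the struct layout and default limits here are assumptions for illustration):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

/// Centralized concurrency limits, configurable via environment variables.
struct Concurrency {
    downloads: Arc<Semaphore>,
    builds: Arc<Semaphore>,
}

impl Concurrency {
    fn from_env() -> Self {
        let limit = |var: &str, default: usize| {
            std::env::var(var)
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(default)
        };
        Concurrency {
            downloads: Arc::new(Semaphore::new(limit("UV_CONCURRENT_DOWNLOADS", 50))),
            builds: Arc::new(Semaphore::new(limit("UV_CONCURRENT_BUILDS", 16))),
        }
    }
}

async fn download(concurrency: &Concurrency, url: &str) {
    // Each network task acquires a permit; builds draw from a separate pool,
    // so a burst of builds can no longer starve downloads (or vice versa).
    let _permit = concurrency.downloads.acquire().await.expect("semaphore closed");
    // ... perform the actual download of `url` here ...
    let _ = url;
}
```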
2024-05-10 12:43:08 -04:00
Ibraheem Ahmed
94cf604574
Remove unnecessary uses of DashMap and Arc (#3413)
## Summary

All of the resolver code is run on the main thread, so a lot of the
`Send` bounds and uses of `DashMap` and `Arc` are unnecessary. We could
also switch to using single-threaded versions of `Mutex` and `Notify` in
some places, but there isn't really a crate that provides those I would
be comfortable with using.

The `Arc` in `OnceMap` can't easily be removed because of the uv-auth
code which uses the
[reqwest-middleware](https://docs.rs/reqwest-middleware/latest/reqwest_middleware/trait.Middleware.html)
crate, which seems to add unnecessary `Send` bounds because of
`async-trait`. We could duplicate the code and create a `OnceMapLocal`
variant, but I don't feel that's worth it.
2024-05-06 22:30:43 -04:00
konsti
4f87edbe66
Add basic tool.uv.sources support (#3263)
## Introduction

PEP 621 is limited. Specifically, it lacks
* Relative path support
* Editable support
* Workspace support
* Index pinning or any sort of index specification

The semantics of urls are a custom extension: PEP 440 does not specify
how to use git references or subdirectories; instead, pip has a custom
stringly-typed format. We need to somehow support these while staying
compatible with PEP 621.

## `tool.uv.sources`

Drawing inspiration from cargo, poetry and rye, we add `tool.uv.sources`
or (for now stub only) `tool.uv.workspace`:

```toml
[project]
name = "albatross"
version = "0.1.0"
dependencies = [
  "tqdm >=4.66.2,<5",
  "torch ==2.2.2",
  "transformers[torch] >=4.39.3,<5",
  "importlib_metadata >=7.1.0,<8; python_version < '3.10'",
  "mollymawk ==0.1.0"
]

[tool.uv.sources]
tqdm = { git = "https://github.com/tqdm/tqdm", rev = "cc372d09dcd5a5eabdc6ed4cf365bdb0be004d44" }
importlib_metadata = { url = "https://github.com/python/importlib_metadata/archive/refs/tags/v7.1.0.zip" }
torch = { index = "torch-cu118" }
mollymawk = { workspace = true }

[tool.uv.workspace]
include = [
  "packages/mollymawk"
]

[tool.uv.indexes]
torch-cu118 = "https://download.pytorch.org/whl/cu118"
```

See `docs/specifying_dependencies.md` for a detailed explanation of the
format. The basic gist is that `project.dependencies` is what ends up on
pypi, while `tool.uv.sources` are your non-published additions. We do
support the full range of PEP 508; we just hide it in the docs and
prefer the exploded table for easier readability and to avoid confusion
with actual url parts.

This format should eventually be able to subsume requirements.txt's
current use cases. While we will continue to support the legacy `uv pip`
interface, this is a piece of uv's own top-level interface. Together
with `uv run` and a lockfile format, you should only need to write
`pyproject.toml` and do `uv run`, which generates/uses/updates your
lockfile behind the scenes, no more pip-style requirements involved. It
also lays the groundwork for implementing index pinning.

## Changes

This PR implements:
* Reading and lowering `project.dependencies`,
`project.optional-dependencies` and `tool.uv.sources` into a new
requirements format, including:
  * Git dependencies
  * Url dependencies
  * Path dependencies, including relative and editable
* `pip install` integration
* Error reporting for invalid `tool.uv.sources`
* Json schema integration (works in pycharm, see below)
* Draft user-level docs (see `docs/specifying_dependencies.md`)

It does not implement:
* `pip compile` testing (deprioritized in favor of our own lockfile)
* Index pinning (stub definitions only)
* Development dependencies
* Workspace support (stub definitions only)
* Overrides in pyproject.toml
* Patching/replacing dependencies

One technically breaking change is that we now require a user-provided
pyproject.toml to be valid with respect to PEP 621. Included files still
fall back to PEP 517. That means `pip install -r requirements.txt`
requires it to be valid, while `pip install -r requirements.txt` with
`-e .` as content falls back to PEP 517 as before.

## Implementation

The `pep508` requirement is replaced by a new `UvRequirement` (name up
for bikeshedding, not particularly attached to the uv prefix). The still
existing `pep508_rs::Requirement` type is a url format copied from pip's
requirements.txt and doesn't appropriately capture all features we
want/need to support. The bulk of the diff is changing the requirement
type throughout the codebase.
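
To make the shape of the new requirement type concrete, here is a hedged sketch (names and fields are illustrative; the real type is richer):

```rust
use std::path::PathBuf;

/// Illustrative stand-ins for the crate's real types.
type PackageName = String;
type VersionSpecifiers = String;
type MarkerTree = String;
type Url = String;

/// Where a requirement comes from: a registry version, or a parsed,
/// decomposed url (git, direct archive url, or local path).
enum RequirementSource {
    Registry { specifier: VersionSpecifiers },
    Git { repository: Url, reference: Option<String>, subdirectory: Option<PathBuf> },
    Url { url: Url, subdirectory: Option<PathBuf> },
    Path { path: PathBuf, editable: bool },
}

/// A requirement with its url parts already parsed, replacing the
/// string-oriented `pep508_rs::Requirement` throughout the codebase.
struct UvRequirement {
    name: PackageName,
    extras: Vec<String>,
    marker: Option<MarkerTree>,
    source: RequirementSource,
}
```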

We still use `VerbatimUrl` in many places, where we would expect a
parsed/decomposed url type, specifically:
* Reading core metadata, except for top-level pyproject.toml files; we
fail a step later instead if the url isn't supported.
* Allowed `Urls`.
* `PackageId` with a custom `CanonicalUrl` comparison, instead of
canonicalizing urls eagerly.
* `PubGrubPackage`: We eventually convert the `VerbatimUrl` back to a
`Dist` (`Dist::from_url`), instead of remembering the url.
* Source dist types: We use verbatim url even though we know and require
that these are supported urls we can and have parsed.

I tried to improve the situation by replacing `VerbatimUrl`, but doing
so would require massive, invasive changes (see e.g.
https://github.com/astral-sh/uv/pull/3253). A main problem is the ref
`VersionOrUrl` and applying overrides, which assume the same
requirement/url type everywhere. In its current form, this PR increases
this tech debt.

I've tried to split off PRs and commits, but the main refactoring is
still a single monolith commit to make it compile and the tests pass.

## Demo

Adding
d1ae3b85d5/pyproject.json
as json schema (v7) to pycharm for `pyproject.toml`, you can try the IDE
support already:


![pycharm](599082c7-6be5-41c1-a3cd-516092382f8d)


[dove.webm](c293c272-c80b-459d-8c95-8c46a8d198a1)
2024-05-03 21:10:50 +00:00
Andrew Gallant
1089abda3f
require serde and rkyv everywhere; remove optional serde and rkyv features (#3345)
In *some* places in our crates, `serde` (and `rkyv`) are optional
dependencies. I believe this was done for reasons of "good sense,"
that is, it follows a Rust ecosystem pattern where serde integration
tends to be an opt-in crate feature. (And similarly for `rkyv`.)

However, ultimately, `uv` itself requires `serde` and `rkyv` to
function. Since our crates are strictly internal, there are limited
consumers for our crates without `serde` (and `rkyv`) enabled. I think
one possibility is that optional `serde` (and `rkyv`) integration means
that someone can do this:

    cargo test -p pep440_rs

And this will run tests _without_ `serde` or `rkyv` enabled. That in
turn could lead to faster iteration time by reducing compile times. But,
I'm not sure this is worth supporting. The iterative compilation times
of
individual crates are probably fast enough in debug mode, even with
`serde` and `rkyv` enabled. Namely, `serde` and `rkyv` themselves
shouldn't need to be re-compiled in most cases. On `main`:

```
from-scratch: `cargo test -p pep440_rs --lib` 0.685s
incremental: `cargo test -p pep440_rs --lib` 0.278s
from-scratch: `cargo test -p pep440_rs --features serde,rkyv --lib` 3.948s
incremental: `cargo test -p pep440_rs --features serde,rkyv --lib` 0.321s
```

So while a from-scratch build does take significantly longer, an
incremental build is about the same.

The benefit of doing this change is two-fold:

1. It brings our crates into alignment with "reality." In particular,
   some crates were _implicitly_ relying on `serde` being enabled
   without explicitly declaring it. This technically means that our
   `Cargo.toml`s were wrong in some cases, but it is hard to observe it
   because of feature unification in a Cargo workspace.
2. We no longer need to deal with the cognitive burden of writing
   `#[cfg_attr(feature = "serde", ...)]` everywhere.
2024-05-03 10:21:03 -04:00
konsti
80ce32b3e9
Fix typos (#3323)
Split out from https://github.com/astral-sh/uv/pull/3263
2024-04-30 14:36:36 +00:00
konsti
3783292c43
Remove unused dependencies (#3236)
`cargo shear --fix` and some manual fixing for tokio and flate2.

I wanted to prepare my branch and realized main also needs this.
2024-04-24 11:18:24 +00:00
konsti
4a49ff4372
Rename ancillary direct url types to parsed url (#3211)
Followup to #3187. Renaming only
2024-04-23 14:51:23 +00:00
Charlie Marsh
14f05f27b3
Add ticks around error messages more consistently (#3004)
## Summary

I found some of these too bare (e.g., when they _just_ show a package
name with no other information). For me, this makes it easier to
differentiate error message copy from data. But open to other opinions.
Take a look at the fixture changes and LMK!
2024-04-22 23:58:36 +00:00
konsti
f29c991e21
Dedicated error type for direct url parsing (#3181)
Add a dedicated error type for direct url parsing. This change is broken
out from the new uv requirement type, which uses direct url parsing
internally.
2024-04-22 11:57:36 +00:00
Charlie Marsh
0d62e62fb7
Make hash-checking failure mode stricter and safer (#2997)
## Summary

If there are no hashes for a given package, we now return
`Validate(&[])` so that the policy is impossible to satisfy. Previously,
we returned `None`, which is always satisfied.

We don't really ever expect to hit this, because we detect this case in
the resolver and raise a different error. But if we have a bug
somewhere, it's better to fail with an error than silently let the
package through.
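
To illustrate why an empty `Validate` list fails closed while `None` does not (a sketch with simplified types, not the actual uv code):

```rust
/// Simplified stand-in for illustration.
type HashDigest = String;

enum HashPolicy<'a> {
    /// No hashes known: nothing to check against, so everything passes.
    None,
    /// A fixed allow-list of hashes; the archive must match at least one.
    Validate(&'a [HashDigest]),
}

fn is_satisfied(policy: &HashPolicy, computed: &[HashDigest]) -> bool {
    match policy {
        HashPolicy::None => true,
        // `any` over an empty slice is `false`, so `Validate(&[])` can
        // never be satisfied: an unexpected missing-hash case fails closed.
        HashPolicy::Validate(allowed) => computed.iter().any(|h| allowed.contains(h)),
    }
}
```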
2024-04-11 17:53:34 +00:00
Charlie Marsh
96c3c2e774
Support unnamed requirements in --require-hashes (#2993)
## Summary

This PR enables `--require-hashes` with unnamed requirements. The key
change is that `PackageId` becomes `VersionId` (since it refers to a
package at a specific version), and the new `PackageId` consists of
_either_ a package name _or_ a URL. The hashes are keyed by `PackageId`,
so we can generate the `RequiredHashes` before we have names for all
packages, and enforce them throughout.
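
A small sketch of the distinction (simplified; the real types carry more information):

```rust
/// Illustrative stand-ins.
type PackageName = String;
type VerbatimUrl = String;
type Version = String;

/// Identifies a package in the resolution: either by name (from the
/// registry) or, for unnamed requirements, by its url.
#[derive(Clone, PartialEq, Eq, Hash)]
enum PackageId {
    Name(PackageName),
    Url(VerbatimUrl),
}

/// Identifies a package *at a specific version* (the old `PackageId`).
#[derive(Clone, PartialEq, Eq, Hash)]
struct VersionId {
    name: PackageName,
    version: Version,
}
```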

Closes #2979.
2024-04-11 11:26:50 -04:00
Charlie Marsh
3dd673677a
Add --find-links source distributions to the registry cache (#2986)
## Summary

Source distributions in `--find-links` are now properly picked up in the
cache.

Closes https://github.com/astral-sh/uv/issues/2978.
2024-04-11 01:25:58 +00:00
Charlie Marsh
32f129c245
Store IDs rather than paths in the cache (#2985)
## Summary

Similar to `Revision`, we now store IDs in the `Archive` entries rather
than absolute paths. This makes the cache robust to moves, etc.

Closes https://github.com/astral-sh/uv/issues/2908.
2024-04-10 21:07:51 -04:00
Charlie Marsh
5583b90c30
Create dedicated abstractions for .rev and .http pointers (#2977)
## Summary

This PR formalizes some of the concepts we use in the cache for
"pointers to things".

In the wheel cache, we have files like
`annotated_types-0.6.0-py3-none-any.http`. This represents an unzipped
wheel, cached alongside an HTTP caching policy. We now have a struct for
this to encapsulate the logic: `HttpArchivePointer`.

Similarly, we have files like `annotated_types-0.6.0-py3-none-any.rev`.
This represents an unzipped local wheel, alongside with a timestamp. We
now have a struct for this to encapsulate the logic:
`LocalArchivePointer`.

We have similar structs for source distributions too.
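
Sketching the two pointer shapes (field names and contents here are assumptions for illustration, not the actual cache format):

```rust
use std::path::PathBuf;
use std::time::SystemTime;

/// Illustrative stand-in for the HTTP cache policy stored alongside a wheel.
struct CachePolicy;

/// Points from a `*.http` cache entry to an unzipped wheel in the archives
/// directory, together with the HTTP caching policy used to revalidate it.
struct HttpArchivePointer {
    policy: CachePolicy,
    archive: PathBuf,
}

/// Points from a `*.rev` cache entry to an unzipped local wheel, together
/// with the timestamp of the source it was built from.
struct LocalArchivePointer {
    timestamp: SystemTime,
    archive: PathBuf,
}
```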
2024-04-10 17:30:27 -04:00
Charlie Marsh
006379c50c
Add support for URL requirements in --generate-hashes (#2952)
## Summary

This PR enables hash generation for URL requirements when the user
provides `--generate-hashes` to `pip compile`. While we include the
hashes from the registry already, today, we omit hashes for URLs.

To power hash generation, we introduce a `HashPolicy` abstraction:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HashPolicy<'a> {
    /// No hash policy is specified.
    None,
    /// Hashes should be generated (specifically, a SHA-256 hash), but not validated.
    Generate,
    /// Hashes should be validated against a pre-defined list of hashes. If necessary, hashes should
    /// be generated so as to ensure that the archive is valid.
    Validate(&'a [HashDigest]),
}
```

All of the methods on the distribution database now accept this policy,
instead of accepting `&'a [HashDigest]`.

Closes #2378.
2024-04-10 20:02:45 +00:00
Charlie Marsh
8513d603b4
Return computed hashes from metadata requests (#2951)
## Summary

This PR modifies the distribution database to return both the
`Metadata23` and the computed hashes when clients request metadata.

No behavior changes, but this will be necessary to power
`--generate-hashes`.
2024-04-10 19:31:41 +00:00
Charlie Marsh
1f3b5bb093
Add hash-checking support to install and sync (#2945)
## Summary

This PR adds support for hash-checking mode in `pip install` and `pip
sync`. It's a large change, both in terms of the size of the diff and
the modifications in behavior, but it's also one that's hard to merge in
pieces (at least, with any test coverage) since it needs to work
end-to-end to be useful and testable.

Here are some of the most important highlights:

- We store hashes in the cache. Where we previously stored pointers to
unzipped wheels in the `archives` directory, we now store pointers with
a set of known hashes. So every pointer to an unzipped wheel also
includes its known hashes.
- By default, we don't compute any hashes. If the user runs with
`--require-hashes`, and the cache doesn't contain those hashes, we
invalidate the cache, redownload the wheel, and compute the hashes as we
go. For users that don't run with `--require-hashes`, there will be no
change in performance. For users that _do_, the only change will be if
they don't run with `--generate-hashes` -- then they may see some
repeated work between resolution and installation, if they use `pip
compile` then `pip sync`.
- Many of the distribution types now include a `hashes` field, like
`CachedDist` and `LocalWheel`.
- Our behavior is similar to pip, in that we enforce hashes when pulling
any remote distributions, and when pulling from our own cache. Like pip,
though, we _don't_ enforce hashes if a distribution is _already_
installed.
- Hash validity is enforced in a few different places:
1. During resolution, we enforce hash validity based on the hashes
reported by the registry. If we need to access a source distribution,
though, we then enforce hash validity at that point too, prior to
running any untrusted code. (This is enforced in the distribution
database.)
2. In the install plan, we _only_ add cached distributions that have
matching hashes. If a cached distribution is missing any hashes, or the
hashes don't match, we don't return them from the install plan.
3. In the downloader, we _only_ return distributions with matching
hashes.
4. The final combination of "things we install" are: (1) the wheels from
the cache, and (2) the downloaded wheels. So this ensures that we never
install any mismatching distributions.
- Like pip, if `--require-hashes` is provided, we require that _all_
distributions are pinned with either `==` or a direct URL. We also
require that _all_ distributions have hashes.

There are a few notable TODOs:

- We don't support hash-checking mode for unnamed requirements. These
should be _somewhat_ rare, though? Since `pip compile` never outputs
unnamed requirements. I can fix this, it's just some additional work.
- We don't automatically enable `--require-hashes` when a hash exists in
the requirements file; `--require-hashes` must be passed explicitly.

Closes #474.

## Test Plan

I'd like to add some tests for registries that report incorrect hashes,
but otherwise: `cargo test`
2024-04-10 19:09:03 +00:00
konsti
15f0be8f04
Allow profiling tests with tracing instrumentation (#2957)
To get more insights into test performance, allow instrumenting tests
with tracing-durations-export.

Usage:

```shell
# A single test
TRACING_DURATIONS_TEST_ROOT=$(pwd)/target/test-traces cargo test --features tracing-durations-export --test pip_install_scenarios no_binary -- --exact
# All tests
TRACING_DURATIONS_TEST_ROOT=$(pwd)/target/test-traces cargo nextest run --features tracing-durations-export
```

Then we can e.g. look at
`target/test-traces/pip_install_scenarios::no_binary.svg` and see the
builds it performs:


![image](40b4e094-debc-4b22-8aa3-9471998674af)
2024-04-10 10:15:27 +00:00
Charlie Marsh
c4472ebbb9
Enforce and backtrack on invalid versions in source metadata (#2954)
## Summary

If we build a source distribution from the registry, and the version
doesn't match that of the filename, we should error, just as we do for
mismatched package names. However, we should also backtrack here, which
we didn't previously.

Closes https://github.com/astral-sh/uv/issues/2953.

## Test Plan

Verified that `cargo run pip install docutils --verbose --no-cache
--reinstall` installs `docutils==0.21` instead of the invalid
`docutils==0.21.post1`.

In the logs, I see:

```
WARN Unable to extract metadata for docutils: Package metadata version `0.21` does not match given version `0.21.post1`
```
2024-04-10 05:13:33 +00:00
Charlie Marsh
83e2297633
Store common fields on BuiltWheelIndex struct (#2939)
## Summary

This mirrors the structure of the `RegistryWheelIndex`. It will be
useful once these indexes check hashes too.
2024-04-09 13:30:02 -04:00
Zanie Blue
1512e07a2e
Split configuration options out of uv-types (#2924)
Needed to prevent circular dependencies in my toolchain work (#2931). I
think this is probably a reasonable change as we move towards persistent
configuration too?

Unfortunately `BuildIsolation` needs to be in `uv-types` to avoid
circular dependencies still. We might be able to resolve that in the
future.
2024-04-09 11:35:53 -05:00
Charlie Marsh
07e3694c3c
Separate local archive vs. local source tree paths in source database (#2922)
## Summary

When you specify a source distribution via a path, it can either be a
path to an archive (like a `.tar.gz` file), or a source tree (a
directory). Right now, we handle both paths through the same methods in
the source database. This PR splits them up into separate handlers.

This will make hash generation a little easier, since we need to
generate hashes for archives, but _can't_ generate hashes for source
trees.

It also means that we can now store the unzipped source distribution in
the cache (in the case of archives), and avoid unzipping the source
distribution needlessly on every invocation; and, overall, it lets us
enforce clearer expectations between the two routes (e.g., what errors
are possible vs. not), at the cost of duplicating some code.
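
The split can be pictured roughly like this (a simplified sketch, not the actual source database API):

```rust
use std::path::PathBuf;

/// A path-based source distribution is either an archive file or a
/// source tree directory; routing them to separate handlers makes the
/// different capabilities explicit (e.g., only archives can be hashed).
enum PathSourceDist {
    /// A `.tar.gz` / `.zip` archive on disk.
    Archive(PathBuf),
    /// A directory containing a `pyproject.toml` / `setup.py`.
    SourceTree(PathBuf),
}

fn build(dist: &PathSourceDist) {
    match dist {
        PathSourceDist::Archive(path) => {
            // Unzip into the cache, compute hashes, then build.
            let _ = path;
        }
        PathSourceDist::SourceTree(path) => {
            // Build directly from the directory; hashes are not applicable.
            let _ = path;
        }
    }
}
```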

Closes #2760 (incidentally -- not exactly the motivation for the change,
but it did accomplish it).
2024-04-09 01:12:33 +00:00
Charlie Marsh
06e96a8f58
DRY up source distribution fetching between wheel and metadata routes (#2921)
These will get more involved with hash-checking, so easiest to extract
them now.

No functional changes.
2024-04-09 00:14:42 +00:00
Charlie Marsh
4f14e2a764
Rebrand Manifest as Revision in wheel database (#2920)
## Summary

I think this is a much clearer name for this concept: the set of
"versions" of a given wheel or source distribution. We also use
"Manifest" elsewhere to refer to the set of requirements, constraints,
etc., so this was overloaded.
2024-04-08 20:00:57 -04:00
Charlie Marsh
1ab471d167
Reduce visibility of some methods in source database (#2919) 2024-04-08 23:49:23 +00:00
Charlie Marsh
c46772eec5
Add a layer of indirection to the local path-based wheel cache (#2909)
## Summary

Right now, the path-based wheel cache just looks at the symlink to the
archives directory, checks the timestamp on it, and continues with that
symlink as long as the timestamp is up-to-date.

The HTTP-based wheel cache, meanwhile, uses an intermediary `.http` file, which
includes the HTTP caching information. The `.http` file's payload is
just a path pointing to an entry in the archives directory.

This PR modifies the path-based codepaths to use a similar cache file,
which stores a timestamp along with a path to the archives directory.
The main advantage here is that we can add other data to this cache file
(namely, hashes in the future).

## Test Plan

Beyond existing tests, I also verified that this doesn't require a
version bump:

```
git checkout main 
cargo run pip install ~/Downloads/zeal-0.0.1-py3-none-any.whl --cache-dir baz --reinstall
git checkout charlie/manifest
cargo run pip install ~/Downloads/zeal-0.0.1-py3-none-any.whl --cache-dir baz --reinstall
cargo run pip install ~/Downloads/zeal-0.0.1-py3-none-any.whl --cache-dir baz --reinstall --refresh
```
2024-04-08 19:32:59 +00:00
Charlie Marsh
134810c547
Respect cached local --find-links in install plan (#2907)
## Summary

I think this is kind of just an oversight. If a wheel is available via
`--find-links`, and the index is "local", we never find it in the cache.

## Test Plan

`cargo test`
2024-04-08 18:58:33 +00:00
Charlie Marsh
31a67f539f
Remove unused local wheel types (#2906)
## Summary

No behavior changes. Just removing unused code.
2024-04-08 18:15:20 +00:00
Charlie Marsh
1daa35176f
Always return unzipped wheels from the distribution database (#2905)
## Summary

In all cases, we unzip these immediately after returning. By moving the
unzipping into the database, we can remove a bunch of code (coming in a
separate PR), and pave the way for hash-checking, since hash generation
will _also_ happen in the database, and splitting the caching layers
across the database and the unzipper creates complications.

Closes #2863.
2024-04-08 14:07:17 -04:00
Charlie Marsh
10dfd43af9
DRY up HTTP request builder in source database (#2902) 2024-04-08 14:45:26 +00:00
Charlie Marsh
f11a5e2208
DRY up local wheel path in distribution database (#2901) 2024-04-08 10:40:17 -04:00
Charlie Marsh
d4a258422b
DRY up request builder in wheel download (#2857) 2024-04-07 02:12:03 +00:00