See https://github.com/astral-sh/uv/issues/2617
Note this also includes:
- #2918
- #2931 (pending)
A first step towards Python toolchain management in Rust.
First, we add a new crate to manage Python download metadata:
- Adds a new `uv-toolchain` crate
- Adds Rust structs for Python version download metadata
- Duplicates the script which downloads Python version metadata
- Adds a script to generate Rust code from the JSON metadata
- Adds a utility to download and extract the Python version
I explored some alternatives, like a build script using things like
`serde` and `uneval` to automatically construct the code from our
structs, but deemed it too heavy. Unlike Rye, I don't generate the Rust
directly from the web requests; instead, there's an intermediate JSON
layer to speed up iteration on the Rust types.
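For illustration, the generated code might look something like this
(struct and field names here are hypothetical, not the actual
`uv-toolchain` API):
```rust
/// Hypothetical shape of a generated Python download entry; the real
/// `uv-toolchain` structs may differ.
#[derive(Debug, Clone)]
pub struct PythonDownloadMetadata {
    pub major: u8,
    pub minor: u8,
    pub patch: u8,
    pub os: &'static str,
    pub arch: &'static str,
    pub url: &'static str,
    pub sha256: Option<&'static str>,
}

/// The codegen script would emit a static table like this from the
/// intermediate JSON (the URL below is a placeholder).
pub const PYTHON_DOWNLOADS: &[PythonDownloadMetadata] = &[
    PythonDownloadMetadata {
        major: 3,
        minor: 12,
        patch: 2,
        os: "linux",
        arch: "x86_64",
        url: "https://example.invalid/cpython-3.12.2.tar.zst",
        sha256: None,
    },
];
```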
Next, we add a `uv-dev` command `fetch-python` to download Python
versions per the bootstrapping script.
- Downloads a requested version or reads from `.python-versions`
- Extracts to `UV_BOOTSTRAP_DIR`
- Links executables for path extension
This command is not really intended to be user facing, but it's a good
PoC for the `uv-toolchain` API. Hash checking (via the sha256) isn't
implemented yet, we can do that in a follow-up.
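In outline, the per-version flow looks roughly like this (the helper
functions are hypothetical stand-ins for the `uv-toolchain` API):
```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-ins for the real `uv-toolchain` helpers.
fn lookup_download_url(version: &str) -> String {
    format!("https://example.invalid/cpython-{version}.tar.zst")
}
fn download_and_extract(_url: &str, dest: &Path) -> std::io::Result<()> {
    std::fs::create_dir_all(dest)
}
fn link_executable(_install_dir: &Path) -> std::io::Result<()> {
    Ok(())
}

/// Illustrative outline of what `fetch-python` does for one version:
/// look up the download for this version (and the current OS/arch),
/// extract it into the bootstrap directory, then link the executable.
fn fetch_python(version: &str, bootstrap_dir: &Path) -> std::io::Result<PathBuf> {
    let url = lookup_download_url(version);
    let install_dir = bootstrap_dir.join(version);
    download_and_extract(&url, &install_dir)?;
    link_executable(&install_dir)?;
    Ok(install_dir)
}
```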
Finally, we remove the `scripts/bootstrap` directory, update CI to use
the new command, and update the CONTRIBUTING docs.
## Summary
If the user runs with `--generate-hashes`, and the lockfile doesn't
contain _any_ hashes for a package (despite being pinned), we should add
new hashes. This mirrors running `uv pip compile --generate-hashes` for
the first time with an existing lockfile.
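The rule amounts to something like this sketch (hypothetical names, not
the actual resolver types):
```rust
/// Sketch of the backfill rule: only generate hashes for a pinned
/// package whose lockfile entry has no hashes at all.
fn should_generate_hashes(
    generate_hashes: bool,
    is_pinned: bool,
    existing_hashes: &[String],
) -> bool {
    generate_hashes && is_pinned && existing_hashes.is_empty()
}
```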
Closes #2962.
To get more insights into test performance, allow instrumenting tests
with tracing-durations-export.
Usage:
```shell
# A single test
TRACING_DURATIONS_TEST_ROOT=$(pwd)/target/test-traces cargo test --features tracing-durations-export --test pip_install_scenarios no_binary -- --exact
# All tests
TRACING_DURATIONS_TEST_ROOT=$(pwd)/target/test-traces cargo nextest run --features tracing-durations-export
```
Then we can, e.g., look at
`target/test-traces/pip_install_scenarios::no_binary.svg` and see the
builds it performs.
## Summary
If we build a source distribution from the registry, and the version
doesn't match that of the filename, we should error, just as we do for
mismatched package names. However, we should also backtrack here, which
we didn't previously.
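Conceptually, the check looks like this (illustrative names; the real
code lives in the metadata extraction path):
```rust
#[derive(Debug)]
enum MetadataError {
    /// The version in the built metadata doesn't match the version
    /// parsed from the distribution's filename.
    VersionMismatch { metadata: String, filename: String },
}

/// Sketch: after extracting metadata for a source distribution,
/// verify that its version matches the filename's version. The
/// resolver treats the error as an incompatibility and backtracks to
/// another candidate rather than failing the whole resolution.
fn check_version(metadata_version: &str, filename_version: &str) -> Result<(), MetadataError> {
    if metadata_version != filename_version {
        return Err(MetadataError::VersionMismatch {
            metadata: metadata_version.to_string(),
            filename: filename_version.to_string(),
        });
    }
    Ok(())
}
```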
Closes https://github.com/astral-sh/uv/issues/2953.
## Test Plan
Verified that `cargo run pip install docutils --verbose --no-cache
--reinstall` installs `docutils==0.21` instead of the invalid
`docutils==0.21.post1`.
In the logs, I see:
```
WARN Unable to extract metadata for docutils: Package metadata version `0.21` does not match given version `0.21.post1`
```
## Summary
This updates to the version of axoupdater used in cargo-dist 0.13.0's
own `selfupdate` command, with all the relevant platform fixes. It also
tentatively introduces a mildly dangerous self-test that runs `uv
self update` and checks that the binary is installed and executable.
I *believe* some adjustments need to be made to your CI to have this new
test run, because it requires the `self-update` feature to be enabled,
and I didn't want to just start messing with how you do feature coverage
in your CI. **As a result I haven't yet had a chance to actually fully
run this in CI**, though I've locally tested it on windows (with the
guard disabled).
## Test Plan
Most of the machinery here is provided by axoupdater itself (cargo-dist
also includes a variant of these tests in its codebase). This initial
implementation has a couple major limitations:
* This is For Reals modifying the system that runs the test (so it's off
unless it detects it's running in CI, and if you want variations on this
test they'll need to be [run in
serial](5e7826f7b0/cargo-dist/tests/cli-tests.rs (L235))).
Since many of the testing issues were surrounding precise details of
Actual Deployed Executions, this seemed worth the tradeoff.
* The actual installer *script* it's ultimately invoking is the one you
last published, and *not* the one that cargo-dist will make when you
next publish.
We're already working on implementing some logic for "get cargo-dist to
generate a fresh installer script too", which is in fact the basis of a
huge amount of cargo-dist's own test suite. Now that we're dogfooding
this stuff, it should be quite hard for it to break without
cargo-dist's own codebase noticing first.
Reproduced https://github.com/astral-sh/uv/issues/2941 and confirmed
fix.
We probably ought to have some ecosystem test coverage — this seems like
a good starting point we can extend to other projects in the future.
## Summary
The prefetcher tallies the number of times we tried a given package, and
then once we hit a threshold, grabs the version map, assuming it's
already been fetched. For direct URL distributions, though, we don't
have a version map! And there's no need to prefetch.
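The fix amounts to an early return for URL-based candidates, roughly
(illustrative types, not uv's actual ones):
```rust
/// Illustrative candidate source; uv's real types differ.
enum CandidateSource {
    /// Came from a registry index, so a version map exists to
    /// prefetch further candidates from.
    Registry,
    /// Pinned to a direct URL: there is exactly one candidate, so
    /// there's nothing to prefetch.
    DirectUrl,
}

fn maybe_prefetch(source: &CandidateSource, tries: usize, threshold: usize) {
    // Direct URL distributions have no version map: skip entirely.
    if matches!(source, CandidateSource::DirectUrl) {
        return;
    }
    if tries >= threshold {
        // ...grab the version map and prefetch further candidates...
    }
}
```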
Closes https://github.com/astral-sh/uv/issues/2941.
## Summary
Right now, we have a `Hashes` representation that looks like:
```rust
/// A dictionary mapping a hash name to a hex encoded digest of the file.
///
/// PEP 691 says multiple hashes can be included and the interpretation is left to the client.
#[derive(Debug, Clone, Eq, PartialEq, Default, Deserialize)]
pub struct Hashes {
    pub md5: Option<Box<str>>,
    pub sha256: Option<Box<str>>,
    pub sha384: Option<Box<str>>,
    pub sha512: Option<Box<str>>,
}
```
It stems from the PyPI API, which returns a dictionary of hashes.
We tend to pass these around as a `Vec<Hashes>`. But that's a bit
strange, because each entry in the vector could itself contain multiple
hashes. And it makes it difficult to ask questions like "Is
`sha256:ab21378ca980a8` in the set of hashes?"
This PR instead treats `Hashes` as the PyPI-internal type, and uses a
new `Vec<HashDigest>` everywhere in our own APIs.
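For example, flattening might look like this sketch (the exact shape of
`HashDigest` here is illustrative):
```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HashAlgorithm {
    Md5,
    Sha256,
    Sha384,
    Sha512,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct HashDigest {
    algorithm: HashAlgorithm,
    digest: Box<str>,
}

struct Hashes {
    md5: Option<Box<str>>,
    sha256: Option<Box<str>>,
    sha384: Option<Box<str>>,
    sha512: Option<Box<str>>,
}

impl Hashes {
    /// Flatten the PyPI-style dictionary into a list of digests, which
    /// makes membership checks ("is this sha256 in the set?") direct.
    fn into_digests(self) -> Vec<HashDigest> {
        let mut digests = Vec::new();
        if let Some(digest) = self.md5 {
            digests.push(HashDigest { algorithm: HashAlgorithm::Md5, digest });
        }
        if let Some(digest) = self.sha256 {
            digests.push(HashDigest { algorithm: HashAlgorithm::Sha256, digest });
        }
        if let Some(digest) = self.sha384 {
            digests.push(HashDigest { algorithm: HashAlgorithm::Sha384, digest });
        }
        if let Some(digest) = self.sha512 {
            digests.push(HashDigest { algorithm: HashAlgorithm::Sha512, digest });
        }
        digests
    }
}
```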
Needed to prevent circular dependencies in my toolchain work (#2931). I
think this is probably a reasonable change as we move towards persistent
configuration too?
Unfortunately `BuildIsolation` needs to be in `uv-types` to avoid
circular dependencies still. We might be able to resolve that in the
future.
Elides Python patch versions from the test suite unless the test
specifically requests a patch version.
This reduces some toil when not using our bootstrapped Python versions.
Partially addresses https://github.com/astral-sh/uv/issues/2165 though
we'll need changes to the scenario tests to really support their case.
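As an illustration of the idea (not the suite's actual filter code), a
regex-based redaction might look like:
```rust
use regex::Regex;

/// Sketch: redact Python patch versions in test output so snapshots
/// don't churn across interpreter patch releases, e.g. "3.12.2" and
/// "3.12.3" both become "3.12.[X]".
fn redact_patch_version(output: &str) -> String {
    // NOTE: illustrative pattern; the real filter would need to be
    // anchored more carefully so it doesn't hit unrelated versions.
    let re = Regex::new(r"\b(\d+\.\d+)\.\d+\b").unwrap();
    re.replace_all(output, "$1.[X]").to_string()
}
```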
## Summary
When you specify a source distribution via a path, it can either be a
path to an archive (like a `.tar.gz` file), or a source tree (a
directory). Right now, we handle both paths through the same methods in
the source database. This PR splits them up into separate handlers.
This will make hash generation a little easier, since we need to
generate hashes for archives, but _can't_ generate hashes for source
trees.
It also means that we can now store the unzipped source distribution in
the cache (in the case of archives), and avoid unzipping the source
distribution needlessly on every invocation; and, overall, it lets us
enforce clearer expectations between the two routes (e.g., what errors
are possible vs. not), at the cost of duplicating some code.
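Schematically, the split looks something like this (an illustrative
enum, not the actual source database types):
```rust
use std::path::PathBuf;

/// Illustrative: a path-based source distribution is either an archive
/// file or a source tree, each with its own handler.
enum PathSourceDist {
    /// e.g. a local `.tar.gz` -- hashable, and the unzipped contents
    /// can now be cached across invocations.
    Archive(PathBuf),
    /// A directory -- built in place; no hash can be generated for it.
    SourceTree(PathBuf),
}

fn classify(path: PathBuf) -> PathSourceDist {
    if path.is_dir() {
        PathSourceDist::SourceTree(path)
    } else {
        PathSourceDist::Archive(path)
    }
}
```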
Closes #2760 (incidentally -- not exactly the motivation for the change,
but it did accomplish it).
## Summary
I think this is a much clearer name for this concept: the set of
"versions" of a given wheel or source distribution. We also use
"Manifest" elsewhere to refer to the set of requirements, constraints,
etc., so this was overloaded.
## Summary
We have a heuristic in `File` that attempts to detect whether a URL is
absolute or relative. However, `contains("://")` is prone to false
positives. In the linked issues, the URLs look like:
```
/packages/5a/d8/4d75d1e4287ad9d051aab793c68f902c9c55c4397636b5ee540ebd15aedf/pytz-2005k.tar.bz2?hash=597b596dc1c2c130cd0a57a043459c3bd6477c640c07ac34ca3ce8eed7e6f30c&remote=4d75d1e428/pytz-2005k.tar.bz2 (sha256)=597b596dc1c2c130cd0a57a043459c3bd6477c640c07ac34ca3ce8eed7e6f30c
```
Which is relative, but includes `://`.
Instead, we should determine whether the URL has a _scheme_, matching
what the `Url` crate does internally.
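A scheme check that follows the RFC 3986 grammar the `Url` crate uses
(an ASCII letter, then letters, digits, `+`, `-`, or `.`, then a `:`)
looks roughly like:
```rust
/// Returns true if `s` starts with a valid URL scheme per RFC 3986,
/// i.e. `ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"`.
fn has_scheme(s: &str) -> bool {
    let Some(colon) = s.find(':') else { return false };
    let scheme = &s[..colon];
    let mut chars = scheme.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic())
        && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
}

// The relative URL from the issue has no scheme before its first `:`,
// so it's correctly treated as relative, even though it contains "://":
// has_scheme("/packages/5a/d8/...") == false
// has_scheme("https://pypi.org/simple/") == true
```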
Closes https://github.com/astral-sh/uv/issues/2899.
## Summary
Right now, the path-based wheel cache just looks at the symlink to the
archives directory, checks the timestamp on it, and continues with that
symlink as long as the timestamp is up-to-date.
The HTTP-based wheel cache, meanwhile, uses an intermediary `.http` file, which
includes the HTTP caching information. The `.http` file's payload is
just a path pointing to an entry in the archives directory.
This PR modifies the path-based codepaths to use a similar cache file,
which stores a timestamp along with a path to the archives directory.
The main advantage here is that we can add other data to this cache file
(namely, hashes in the future).
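For instance, the cache file's payload might be shaped like this
(hypothetical field names; in practice it would be serialized like the
`.http` file's payload):
```rust
use std::path::PathBuf;
use std::time::SystemTime;

/// Sketch of the path-based cache file: a freshness timestamp plus a
/// pointer into the archives directory. Extra fields (e.g., hashes)
/// can be added later without changing the lookup scheme.
struct LocalArchivePointer {
    /// Last-modified time of the source wheel when it was unpacked.
    timestamp: SystemTime,
    /// Entry in the cache's archives directory.
    archive: PathBuf,
}

impl LocalArchivePointer {
    /// The cached entry is fresh iff the source file hasn't been
    /// modified since we unpacked it.
    fn is_fresh(&self, source_modified: SystemTime) -> bool {
        source_modified <= self.timestamp
    }
}
```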
## Test Plan
Beyond existing tests, I also verified that this doesn't require a
version bump:
```
git checkout main
cargo run pip install ~/Downloads/zeal-0.0.1-py3-none-any.whl --cache-dir baz --reinstall
git checkout charlie/manifest
cargo run pip install ~/Downloads/zeal-0.0.1-py3-none-any.whl --cache-dir baz --reinstall
cargo run pip install ~/Downloads/zeal-0.0.1-py3-none-any.whl --cache-dir baz --reinstall --refresh
```
## Summary
I think this is kind of just an oversight. If a wheel is available via
`--find-links`, and the index is "local", we never find it in the cache.
## Test Plan
`cargo test`
## Summary
In all cases, we unzip these immediately after returning. By moving the
unzipping into the database, we can remove a bunch of code (coming in a
separate PR), and pave the way for hash-checking, since hash generation
will _also_ happen in the database, and splitting the caching layers
across the database and the unzipper creates complications.
Closes #2863.
With pubgrub being fast for complex ranges, we can now compute the next
n candidates without taking a performance hit. This speeds up the cold-cache
resolution of `urllib3<1.25.4` `boto3` from maybe 40s-50s to ~2s. See the
docstrings for details on the heuristics.
**Before**/**After**: profiling screenshots omitted.
---
We need two parts to the prefetching: first looking for compatible
versions, then falling back to the flat next versions. After we've
selected a boto3 version, there is only one compatible botocore version
remaining, so we won't find other compatible candidates for
prefetching. We see this as a pattern where we only prefetch boto3
(stacked bars), but not botocore (sequential requests between the
stacked bars).

The risk is that we're completely wrong with the guess and cause a lot
of useless network requests. I think this is acceptable, since this
mechanism only triggers when we're already on the bad path, and we
should simply have fetched all versions after a few seconds (assuming a
fast index like PyPI).
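In outline, the two-phase heuristic is roughly (illustrative; see the
docstrings in the PR for the real logic):
```rust
/// Sketch of the two-phase prefetch heuristic: once a package has been
/// tried `threshold` times, prefetch up to `n` further candidates.
fn prefetch_candidates(
    tries: usize,
    threshold: usize,
    versions: &[u32],
    compatible: impl Fn(u32) -> bool,
    n: usize,
) -> Vec<u32> {
    if tries < threshold {
        return Vec::new();
    }
    // Phase 1: prefetch the next candidates that still satisfy the
    // current range.
    let picks: Vec<u32> = versions.iter().copied().filter(|&v| compatible(v)).take(n).collect();
    if !picks.is_empty() {
        return picks;
    }
    // Phase 2: no compatible candidates remain (e.g., only one
    // botocore matches the chosen boto3); fall back to the flat next
    // versions.
    versions.iter().copied().take(n).collect()
}
```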
---
It would be even better if the pubgrub state were copy-on-write, so we
could simulate more progress than we actually have; currently, we're
guessing what the next version is, which could be completely wrong, but
I think this is still a valuable heuristic.
Fixes #170.