## Summary
This should reduce the number of filesystem operations fairly
dramatically:
- Only query actual symlinks.
- Don't recurse into package bodies (huge).
- Only traverse once (rather than twice).
Closes https://github.com/astral-sh/uv/issues/13667.
Prior to this PR, there were numerous places where uv would leak
credentials in logs. We had a way to mask credentials by calling methods
or a recently-added `redact_url` function, but this was not secure by
default. There were a number of other types (like `GitUrl`) that would
leak credentials on display.
This PR adds a `DisplaySafeUrl` newtype to prevent leaking credentials
when logging by default. It takes a maximalist approach, replacing the
use of `Url` almost everywhere. This includes when first parsing config
files, when storing URLs in types like `GitUrl`, and also when storing
URLs in types that in practice will never contain credentials (like
`DirectorySourceUrl`). The idea is to make it easy for developers to do
the right thing and for the compiler to support this (and to minimize
ever having to manually convert back and forth). Displaying credentials
now requires an active step. Note that despite this maximalist approach,
the use of the newtype should be zero cost.
One conspicuous place this PR does not use `DisplaySafeUrl` is in the
`uv-auth` crate. That would require new clones since there are calls to
`request.url()` that return a `&Url`. One option would have been to make
`DisplaySafeUrl` wrap a `Cow`, but this would lead to lifetime
annotations all over the codebase. I've created a separate PR based on
this one (#13576) that updates `uv-auth` to use `DisplaySafeUrl` with
one new clone. We can discuss the tradeoffs there.
Most of this PR just replaces `Url` with `DisplaySafeUrl`. The core is
`uv_redacted/lib.rs`, where the newtype is implemented. To make it
easier to review the rest, here are some points of note:
* `DisplaySafeUrl` has a `Display` implementation that masks
credentials. Currently, it will still display the username when there is
both a username and password. If we think is the wrong choice, it can
now be changed in one place.
* `DisplaySafeUrl` has a `remove_credentials()` method and also a
`.to_string_with_credentials()` method. This allows us to use it in a
variety of scenarios.
* `IndexUrl::redacted()` was renamed to
`IndexUrl::removed_credentials()` to make it clearer that we are not
masking.
* We convert from a `DisplaySafeUrl` to a `Url` when calling `reqwest`
methods like `.get()` and `.head()`.
* We convert from a `DisplaySafeUrl` to a `Url` when creating a
`uv_auth::Index`. That is because, as mentioned above, I will be
updating the `uv_auth` crate to use this newtype in a separate PR.
* A number of tests (e.g., in `pip_install.rs`) that formerly used
filters to mask tokens in the test output no longer need those filters
since tokens in URLs are now masked automatically.
* The one place we are still knowingly writing credentials to
`pyproject.toml` is when a URL with credentials is passed to `uv add`
with `--raw`. Since displaying credentials is no longer automatic, I
have added a `to_string_with_credentials()` method to the `Pep508Url`
trait. This is used when `--raw` is passed. Adding it to that trait is a
bit weird, but it's the simplest way to achieve the goal. I'm open to
suggestions on how to improve this, but note that because of the way
we're using generic bounds, it's not as simple as just creating a
separate trait for that method.
## Summary
If a set of wheel tags includes a dot, this code is treating the part
_after_ the dot as an extension, and thereby failing to detect that the
entry is a symlink to an archive (and thereby removing the archive).
This is all an optimization, so this code just makes it a little
targeted: we skip specific known extensions, rather than anything with
any extension.
Closes https://github.com/astral-sh/uv/issues/13270.
In #13302, there was an IO error without context. This error seems to be
caused by a symlink error. Switching as symlinking to `fs_err` ensures
these errors will carry context in the future.
Part of #11834
Currently, all Python installation are a streaming download-and-extract.
With this PR, we add the `UV_PYTHON_CACHE_DIR` variable. When set, the
installation is split into downloading the interpreter into
`UV_PYTHON_CACHE_DIR` and extracting it there from a second step. If the
archive is already present in `UV_PYTHON_CACHE_DIR`, we skip the
download.
The feature can be used to speed up tests and CI. Locally for me, `cargo
test -p uv -- python_install` goes from 43s to 7s (1,7s in release mode)
when setting `UV_PYTHON_CACHE_DIR`. It can also be used for offline
installation of Python interpreter, by copying the archives to a
directory in the offline machine, while the path rewriting is still
performed on the target machine on installation.
## Summary
I don't know if I actually want to commit this, but I did it on the
plane last time and just polished it off (got it to compile) while
waiting to board.
## Summary
This ended up being more involved than expected. The gist is that we
setup all the packages we want to reinstall upfront (they're passed in
on the command-line); but at that point, we don't have names for all the
packages that the user has specified. (Consider, e.g., `uv pip install
.` -- we don't have a name for `.`, so we can't add it to the list of
`Reinstall` packages.)
Now, `Reinstall` also accepts paths, so we can augment `Reinstall` based
on the user-provided paths.
Closes#12038.
<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
Closes#2410
<!-- What's the purpose of the change? What does it do, and why? -->
This changes the name of files in `wheels` bucket to use a hash instead
of the wheel name as to not exceed maximum file length limit on various
systems.
This only addresses the primary concern of #2410. It still does _not_
address:
- Path limit of 260 on windows:
https://github.com/astral-sh/uv/issues/2410#issuecomment-2062020882
To solve this we need to opt-in to longer path limits on windows
([ref](https://github.com/astral-sh/uv/issues/2410#issuecomment-2150532658)),
but I think that is a separate issue and should be a separate MR.
- Exceeding filename limit while building a wheel from source
distribution
As per my understanding, this is out of uv's control. Name of the output
wheel will be decided by build-backend used by the project. For wheels
built from source distribution, pip also uses the wheel names in cache.
So I have not touched `sdists` cache.
I have added a `filename: WheelFileName` field in `Archive`, so we can
use it while indexing instead of relying on the filename on disk.
Another way to do this was to read `.dist-info/WHEEL` and
`.dist-info/METADATA` and build `WheelFileName` but that seems less
robust and will be slower.
## Test Plan
<!-- How was it tested? -->
Tested by installing `yt-dlp`, `httpie` and `sqlalchemy` and verifying
that cache files in `wheels` bucket use hash.
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
* Upgrade the rust toolchain to 1.85.0. This does not increase the MSRV.
* Update windows trampoline to 1.86 nightly beta (previously in 1.85
nightly beta).
## Test Plan
Existing tests
Revert #11601 for now
We run Python interpreter discovery with `-I` (#2500) which means these
environments variables are ignored when determining `sys.path`. Unless
we decide to remove the `-I` flag from the `sys.path` query, we
shouldn't release these changes to interpreter discovery caching.
We want to use `sys.path` for package discovery (#2500, #9849). For
that, we need to know the correct value of `sys.path`. `sys.path` is a
runtime-changeable value, which gets influenced from a lot of different
sources: Environment variables, CLI arguments, `.pth` files with
scripting, `sys.path.append()` at runtime, a distributor patching
Python, etc. We cannot capture them all accurately, especially since
it's possible to change `sys.path` mid-execution. Instead, we do a best
effort attempt at matching the user's expectation.
The assumption is that package installation generally happens in venv
site-packages, system/user site-packages (including pypy shipping
packages with std), and `PYTHONPATH`. Specifically, we reuse
`PYTHONPATH` as dedicated way for users to tell uv to include specific
directories in package discovery.
A common way to influence `sys.path` that is not using venvs is setting
`PYTHONPATH`. To support this we're capturing `PYTHONPATH` as part of
the cache invalidation, i.e. we refresh the interpreter metadata if it
changed. For completeness, we're also capturing other environment
variables documented as influencing `sys.path` or other fields in the
interpreter info.
This PR does not include reading registry values for `sys.path`
additions on Windows as documented in
https://docs.python.org/3.11/using/windows.html#finding-modules. It
notably also does not include parsing of python CLI arguments, we only
consider their environment variable versions for package installation
and listing. We could try parsing CLI flags in `uv run python`, but we'd
still miss them when Python is launched indirectly through a script, and
it's more consistent to only consider uv's own arguments and environment
variables, similar to uv's behavior in other places.
Instead of using junctions, we can just write files that contain (as the
file contents) the target path. This requires a little more finesse in
that, as readers, we need to know where to expect these. But it also
means we get to avoid junctions, which have led to a variety of
confusing behaviors. Further, `replace_symlink` should now be on atomic
on Windows.
Closes#11263.
We've never bumped the version of this bucket, and we may never do so...
But it's still incorrect for us to omit it from these serialized structs
in the cache. Specifically, these structs include a pointer into the
archive bucket (namely, the ID). But we don't include the bucket
version! So, in theory, we could end up pointing to archives that don't
match the current bucket version expected in the code.
## Summary
Today, scripts use `CachedEnvironment`, which results in a different
virtual environment path every time the interpreter changes _or_ the
project requirements change. This makes it impossible to provide users
with a stable path to the script that they can use for (e.g.) directing
their editor.
This PR modifies `uv run` to use a stable path for local scripts (we
continue to use `CachedEnvironment` for remote scripts and scripts from
`stdin`). The logic now looks a lot more like it does for projects: we
`get_or_init` an environment, etc.
For now, the path to the script is like:
`environments-v1/4485801245a4732f`, where `4485801245a4732f` is a SHA of
the absolute path to the script. But I'm not picky on that :)
## Summary
If we fail to deserialize cached metadata in the cache, we should just
ignore it, rather than failing.
Ideally, this never happens. If it does, it means we missed a cache
version bump. But if it does happen, it should still be non-fatal.
Closes https://github.com/astral-sh/uv/issues/11043.
Closes https://github.com/astral-sh/uv/issues/11101.
## Test Plan
Prior to this PR, the following would fail:
- `uvx uv@0.5.25 venv --python 3.12 --cache-dir foo`
- `uvx uv@0.5.25 pip install ./scripts/packages/hatchling_dynamic
--no-deps --python 3.12 --cache-dir foo`
- `uvx uv@0.5.18 venv --python 3.12 --cache-dir foo`
- `uvx uv@0.5.18 pip install ./scripts/packages/hatchling_dynamic
--no-deps --python 3.12 --cache-dir foo`
We can't go back and fix 0.5.18, but this will prevent such regressions
in the future.
## Summary
On Windows, we have a lot of issues with atomic replacement and such.
There are a bunch of different failure modes, but they generally
involve: trying to persist a fail to a path at which the file already
exists, trying to replace or remove a file while someone else is reading
it, etc.
This PR adds locks to all of the relevant database paths. We already use
these advisory locks when building source distributions; now we use them
when unzipping wheels, storing metadata, etc.
Closes#11002.
## Test Plan
I ran the following script:
```shell
# Define the cache directory path
$cacheDir = "C:\Users\crmar\workspace\uv\cache"
# Clear the cache directory if it exists
if (Test-Path $cacheDir) {
Remove-Item -Recurse -Force $cacheDir
}
# Create the cache directory again
New-Item -ItemType Directory -Force -Path $cacheDir
# Define the command to run with --cache-dir flag
$command = {
param ($venvPath)
# Create a virtual environment in the specified path with --python
uv venv $venvPath
# Run the pip install command with --cache-dir flag
C:\Users\crmar\workspace\uv\target\profiling\uv.exe pip install flask==1.0.4 --no-binary flask --cache-dir C:\Users\crmar\workspace\uv\cache -v --python $venvPath
}
# Define the paths for the different virtual environments
$venv1 = "C:\Users\crmar\workspace\uv\venv1"
$venv2 = "C:\Users\crmar\workspace\uv\venv2"
$venv3 = "C:\Users\crmar\workspace\uv\venv3"
$venv4 = "C:\Users\crmar\workspace\uv\venv4"
$venv5 = "C:\Users\crmar\workspace\uv\venv5"
# Start the command in parallel five times using Start-Job, each with a different venv
$job1 = Start-Job -ScriptBlock $command -ArgumentList $venv1
$job2 = Start-Job -ScriptBlock $command -ArgumentList $venv2
$job3 = Start-Job -ScriptBlock $command -ArgumentList $venv3
$job4 = Start-Job -ScriptBlock $command -ArgumentList $venv4
$job5 = Start-Job -ScriptBlock $command -ArgumentList $venv5
# Wait for all jobs to complete
$jobs = @($job1, $job2, $job3, $job4, $job5)
$jobs | ForEach-Object { Wait-Job $_ }
# Retrieve the results (optional)
$jobs | ForEach-Object { Receive-Job -Job $_ }
# Clean up the jobs
$jobs | ForEach-Object { Remove-Job -Job $_ }
```
And ensured it succeeded in five straight invocations (whereas on
`main`, it consistently fails with a variety of different traces).
## Summary
This PR extends the thinking in #10525 to platform tags, and then uses
the structured tag enums everywhere, rather than passing around strings.
I think this is a big improvement! It means we're no longer doing ad hoc
tag parsing all over the place.
## Summary
A lot of good new lints, and most importantly, error stabilizations. I
tried to find a few usages of the new stabilizations, but I'm sure there
are more.
IIUC, this _does_ require bumping our MSRV.
Refs:
- #8626
## Summary
Current documentation incorrectly suggests that the macOS cache
directory location is `$HOME/Library/Caches/uv`, but that changed in:
- #5806
Updates docs to say this instead:
> <p>Defaults to <code>$HOME/.cache/uv</code> on macOS,
<code>$XDG_CACHE_HOME/uv</code> or <code>$HOME/.cache/uv</code> on
Linux, and <code>%LOCALAPPDATA%\uv\cache</code> on Windows. The <code>uv
cache dir</code> command will show the location of the cache
directory.</p>
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
Going forward, we're going to provide better versioning guarantees
around using the same cache across multiple uv versions, so this PR
updates the docs to reflect that. It also bumps the `sdists-` version to
fix the inconvenience demonstrated in
https://github.com/astral-sh/uv/issues/8367.
Closes https://github.com/astral-sh/uv/issues/8367.
When building a source distribution to a wheels, we perform the build
inside a temporary directory inside the output directory. By default,
the output directory is `dist/` in the repository root. This temp dir
placement allows us to move the final wheel to the output directory
instead of copying it (a temp dir might be on another device, which
means we need to copy instead of moving).
Some build backends such as hatchling traverse upwards from the current
directory (the source dist build location) looking for gitignore files
to consider. By adding a gitignore in `dist/` with `*`, we caused
hatchling to ignore all files in our temporary build directory below it,
causing empty wheels. To prevent this, we add a `.git` file as a phony
git root. We are already using this trick successfully in the cache.
Hatchling sees this `.git` file, considers it a boundary and does not
traverse up to `dist/.gitignore`.
Fixes#8200
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
## Summary
This PR declares and documents all environment variables that are used
in one way or another in `uv`, either internally, or externally, or
transitively under a common struct.
I think over time as uv has grown there's been many environment
variables introduced. Its harder to know which ones exists, which ones
are missing, what they're used for, or where are they used across the
code. The docs only documents a handful of them, for others you'd have
to dive into the code and inspect across crates to know which crates
they're used on or where they're relevant.
This PR is a starting attempt to unify them, make it easier to discover
which ones we have, and maybe unlock future posibilities in automating
generating documentation for them.
I think we can split out into multiple structs later to better organize,
but given the high influx of PR's and possibly new environment variables
introduced/re-used, it would be hard to try to organize them all now
into their proper namespaced struct while this is all happening given
merge conflicts and/or keeping up to date.
I don't think this has any impact on performance as they all should
still be inlined, although it may affect local build times on changes to
the environment vars as more crates would likely need a rebuild. Lastly,
some of them are declared but not used in the code, for example those in
`build.rs`. I left them declared because I still think it's useful to at
least have a reference.
Did I miss any? Are their initial docs cohesive?
Note, `uv-static` is a terrible name for a new crate, thoughts? Others
considered `uv-vars`, `uv-consts`.
## Test Plan
Existing tests
As per
https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html
Before that, there were 91 separate integration tests binary.
(As discussed on Discord — I've done the `uv` crate, there's still a few
more commits coming before this is mergeable, and I want to see how it
performs in CI and locally).
## Summary
Random, but I noticed that we can remove a ton of serialize and
deserialize derives by using `rkyv` for the flat-index caches. (We
already use `rkyv` for these same structs in the registry cache.)
## Summary
`uv cache prune --ci` will remove the source distribution directory. If
we then need to build a _different_ wheel (e.g., you're building a
package that has Python minor version-specific wheels), we fail, because
we expect the source to be there.
Now, if the source is missing, we re-download it. It would be slightly
easier to just _ignore_ that revision, but that would mean we'd also
lose the already-built wheels -- so if you ran against many Python
versions, we'd continuously lose the cached data.
Closes https://github.com/astral-sh/uv/issues/7543.
## Test Plan
We can add tests, but they _need_ to build non-pure Python wheels, which
tends to be expensive...
For reference:
```console
$ cargo run venv --python 3.12
$ cargo run pip install mercurial==6.8.1 --verbose
$ cargo run cache prune --ci
$ cargo run venv --python 3.11
$ cargo run pip install mercurial==6.8.1 --verbose
```
I also did this with a local `.tar.gz` that I downloaded from PyPI.
## Summary
Both of these can contain rkyv data in their HTTP cache envelopes. As
such, the entries aren't readable by earlier versions of uv, and `uv
cache prune` can break. I should make `uv cache prune` robust to this,
but this feels safest.
Recently, rkyv 0.8 was released. Its API is a fair bit simpler now for
higher level uses (like for us in `uv`) and results in us being able to
delete a fair bit of code. This also removes our last dependency on `syn
1.0`, and thus drops that dependency.
Performance (via testing on the `transformers` example) seems to remain
about the same, which is what was expected:
```
$ hyperfine -w5 -r100 'uv lock' 'uv-ag-rkyv-update lock'
Benchmark 1: uv lock
Time (mean ± σ): 55.6 ms ± 6.4 ms [User: 30.4 ms, System: 35.1 ms]
Range (min … max): 43.0 ms … 73.1 ms 100 runs
Benchmark 2: uv-ag-rkyv-update lock
Time (mean ± σ): 56.5 ms ± 7.2 ms [User: 30.5 ms, System: 36.3 ms]
Range (min … max): 39.1 ms … 71.5 ms 100 runs
Summary
uv lock ran
1.02 ± 0.18 times faster than uv-ag-rkyv-update lock
```
Closes#7415
## Summary
It's very unlikely that retaining these is beneficial, since you tend to
partition the cache by platform anyway.
Closes https://github.com/astral-sh/uv/issues/7394.
## Summary
This PR adds a more flexible cache invalidation abstraction for uv, and
uses that new abstraction to improve support for dynamic metadata.
Specifically, instead of relying solely on a timestamp, we now pass
around a `CacheInfo` struct which (as of now) contains
`Option<Timestamp>` and `Option<Commit>`. The `CacheInfo` is saved in
`dist-info` as `uv_cache.json`, so we can test already-installed
distributions for cache validity (along with testing _cached_
distributions for cache validity).
Beyond the defaults (`pyproject.toml`, `setup.py`, and `setup.cfg`
changes), users can also specify additional cache keys, and it's easy
for us to extend support in the future. Right now, cache keys can either
be instructions to include the current commit (for `setuptools_scm` and
similar) or file paths (for `hatch-requirements-txt` and similar):
```toml
[tool.uv]
cache-keys = [{ file = "requirements.txt" }, { git = true }]
```
This change should be fully backwards compatible.
Closes https://github.com/astral-sh/uv/issues/6964.
Closes https://github.com/astral-sh/uv/issues/6255.
Closes https://github.com/astral-sh/uv/issues/6860.
## Summary
This has bothered me for a while and should be fairly impactful for
users. It requires a weird implementation, since the
distribution-building crate depends on the cache, and so the prune
operation can't live in the cache, since it needs to access internals of
the distribution-building crate.
Closes https://github.com/astral-sh/uv/issues/7096.