Commit graph

11 commits

Author SHA1 Message Date
Charlie Marsh
de40f798b9
Cache tool environments in uv tool run (#4784)
## Summary

The basic strategy:

- When the user does `uv tool run`, we resolve the `from` and `with`
requirements (always).
- After resolving, we generate a hash of the requirements. For now, I'm
just converting to a lockfile and hashing _that_, but that's an
implementation detail.
- Once we have a hash, we _also_ hash the interpreter.
- We then store environments in
`${CACHE_DIR}/${INTERPRETER_HASH}/${RESOLUTION_HASH}`.
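
As a rough illustration of the layout described above, here is a minimal sketch that derives an environment directory from an interpreter path and a serialized lockfile. The names `hex_digest` and `tool_env_dir` are hypothetical, and `DefaultHasher` is only a stand-in for whatever hash the real implementation uses (its algorithm is not guaranteed to be stable across Rust releases, so it would be a poor choice for an on-disk cache).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hash any `Hash` value to a fixed-width hex string.
/// `DefaultHasher` is a placeholder for a stable hash function.
fn hex_digest<T: Hash + ?Sized>(value: &T) -> String {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Build `${CACHE_DIR}/${INTERPRETER_HASH}/${RESOLUTION_HASH}`.
/// The interpreter is identified by its path here (an assumption);
/// the resolution is identified by the serialized lockfile text.
fn tool_env_dir(cache_dir: &Path, interpreter: &Path, lockfile: &str) -> PathBuf {
    cache_dir
        .join(hex_digest(interpreter))
        .join(hex_digest(lockfile))
}

fn main() {
    let cache = Path::new("/tmp/cache");
    let dir = tool_env_dir(cache, Path::new("/usr/bin/python3.12"), "ruff==0.5.0\n");
    // Same interpreter and same resolution always map to the same directory.
    println!("{}", dir.display());
}
```

With this shape, requesting a different interpreter or resolving to a different lockfile lands in a different directory, which is what makes the cache safe to reuse.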

Some consequences:

- We cache based on the interpreter, so if you request a different
Python, we'll create a new environment (even if they're compatible).
This has the nice side-effect of ensuring that we don't use environments
for interpreters that were later deleted.
- We cache the `from` and `with` together. In practice, we may want to
cache them separately, then layer them? But this is also an
implementation detail that we could change later.
- Because we use the lockfile as the cache key, we will invalidate the
cache when the format changes. That seems ok, but we could improve it in
the future by generating a stable hash from a lockfile that's
independent of the schema.

Closes https://github.com/astral-sh/uv/issues/4752.
2024-07-03 19:25:39 -04:00
Charlie Marsh
09f55482a0
Remove some unused pub use exports (#3930)
2024-05-30 22:26:52 -04:00
konsti
cedb8259f7
Avoid panic for file url (#3306)
When using find links with a file url, we shouldn't panic because we
can't remove username/password for a host-less url.

See #3262
2024-04-29 16:39:16 +02:00
Jos van de Wolfshaar
3103180ce5
Avoid cache invalidation on credentials renewal (#3010)
# Avoid cache invalidation on credentials renewal

Addresses

- https://github.com/astral-sh/uv/issues/3009#issue-2239221126

## Summary

Some private package registries (e.g. AWS CodeArtifact) use short-lived
credentials. Since they are short-lived, the exact URL that is assigned
to `UV_INDEX_URL` changes frequently and with that the cache key /
hashes of these URLs. This causes the cache to be missed on token
renewal.

This PR attempts to fix this by hashing URLs for cache keys without
their user credentials.
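
To make the idea concrete, here is a minimal sketch using only the standard library. The helpers `strip_credentials` and `cache_key` are hypothetical names, the string-based parsing is a simplification (the actual change would presumably operate on a parsed URL type), and `example.com` is a placeholder host.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Return `url` with any `user:password@` userinfo removed from the authority.
/// Simplified string handling; a real implementation would use a URL parser.
fn strip_credentials(url: &str) -> String {
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let authority_start = scheme_end + 3;
    let rest = &url[authority_start..];
    // The authority ends at the first `/`, `?`, or `#`.
    let authority_len = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    // Userinfo, if present, ends at the last `@` inside the authority.
    match rest[..authority_len].rfind('@') {
        Some(at) => format!("{}{}", &url[..authority_start], &rest[at + 1..]),
        None => url.to_string(),
    }
}

/// Hash a URL for use as a cache key, ignoring user credentials.
fn cache_key(url: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    strip_credentials(url).hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let old = cache_key("https://aws:token-1@example.com/simple/");
    let new = cache_key("https://aws:token-2@example.com/simple/");
    // A renewed token no longer changes the cache key.
    assert_eq!(old, new);
}
```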

## Test Plan

I added a test that validates that:
1. Changing user credentials returns the same hash
2. Setting no user credentials yields the same hash as setting some user
credentials

## Question
I'm not sure if we should also change the `hash` implementation of
`CanonicalUrl` / `RepositoryUrl`. They also run `.hash` within.

PS. this is the first time I'm writing `rust` so if I'm wasting your
precious time, let me know and I'll try to up my skills before I ask
again. Anyway, I figured it's good to get this issue on your radar :)
2024-04-13 23:38:24 +00:00
Charlie Marsh
684f790d5d
Preserve .git suffixes and casing in Git dependencies (#2789)
## Summary

I noticed in #2769 that I was now stripping `.git` suffixes from Git
URLs after resolving to a precise commit. This PR cleans up the internal
caching to use a better canonical representation: a `RepositoryUrl`
along with a `GitReference`, instead of a `GitUrl` which can contain
non-canonical data. This gives us both better fidelity (preserving the
`.git`, along with any casing that the user provided when defining the
URL) and is overall cleaner and more robust.
2024-04-03 00:24:29 +00:00
Charlie Marsh
c30a65ee0c
Allow conflicting Git URLs that refer to the same commit SHA (#2769)
## Summary

This PR leverages our lookahead direct URL resolution to significantly
improve the range of Git URLs that we can accept (e.g., if a user
provides the same requirement, once as a direct dependency, and once as
a tag). We did some of this in #2285, but the solution here is more
general and works for arbitrary transitive URLs.

Closes https://github.com/astral-sh/uv/issues/2614.
2024-04-02 23:36:35 +00:00
Charlie Marsh
2fb8df3769
Avoid panicking on cannot-be-a-base URLs (#2461)
`path_segments_mut` returns an `Err` for cannot-be-a-base URLs. These
won't be valid when we try to fetch them anyway, but we need to avoid a
panic.

Closes https://github.com/astral-sh/uv/issues/2460.
2024-03-14 17:47:16 +00:00
danieleades
8d721830db
Clippy pedantic (#1963)
Address a few pedantic lints

The lints are split into separate commits so they can be reviewed
individually.

I've not added enforcement for any of these lints, but that could be
added if desirable.
2024-02-25 14:04:05 -05:00
Zanie Blue
d07b587f3f
Retain passwords in Git URLs (#1717)
Fixes handling of GitHub PATs in HTTPS URLs, which were previously
dropped. We now support the following authentication schemes:

```
git+https://<user>:<token>/...
git+https://<token>/...
```

On Windows, the username is required. We can consider adding a
special-case for this in the future, but this just matches libgit2's
behavior.

I tested with fine-grained tokens, OAuth tokens, and "classic" tokens.
There's test coverage for fine-grained tokens in CI where we use a real
private repository and PAT. Yes, the PAT is committed to make this test
usable by anyone. It has read-only permissions to the single repository,
expires Feb 1 2025, and is in an isolated organization and GitHub
account.

Does not yet address SSH authentication.

Related:
- https://github.com/astral-sh/uv/issues/1514
- https://github.com/astral-sh/uv/issues/1452
2024-02-21 00:12:56 +00:00
Charlie Marsh
a0420114c3
Avoid storing absolute URLs for files (#944)
## Summary

It turns out that storing an absolute URL for every file caused a
significant performance regression. This PR attempts to address the
regression with two changes.

The first is that we now store the raw string if the URL is an absolute
URL. If the URL is relative, we store the base URL alongside the raw
relative string. As such, we avoid serializing and deserializing URLs
until we need them (later on), except for the base URL.
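
A minimal sketch of that storage scheme, assuming hypothetical names (`FileLocation`, `new`, `to_url_string`) and a deliberately naive join in place of full relative-URL resolution:

```rust
/// Where a distribution file lives, kept as raw strings until needed.
#[derive(Debug, Clone, PartialEq, Eq)]
enum FileLocation {
    /// The index gave us an absolute URL: store the raw string as-is.
    AbsoluteUrl(String),
    /// The index gave us a relative URL: store the base next to the raw path.
    RelativeUrl { base: String, path: String },
}

impl FileLocation {
    /// Classify `href` without parsing it into a URL type.
    /// Treating anything containing `://` as absolute is a simplification.
    fn new(base: &str, href: &str) -> Self {
        if href.contains("://") {
            Self::AbsoluteUrl(href.to_string())
        } else {
            Self::RelativeUrl {
                base: base.to_string(),
                path: href.to_string(),
            }
        }
    }

    /// Produce the full URL string only when it is actually needed.
    /// The join is naive string concatenation, not full URL resolution.
    fn to_url_string(&self) -> String {
        match self {
            Self::AbsoluteUrl(url) => url.clone(),
            Self::RelativeUrl { base, path } => format!(
                "{}/{}",
                base.trim_end_matches('/'),
                path.trim_start_matches('/')
            ),
        }
    }
}

fn main() {
    let base = "https://example.com/simple/pkg/";
    let file = FileLocation::new(base, "pkg-1.0.tar.gz");
    println!("{}", file.to_url_string());
}
```

The point of the design is that constructing and caching a `FileLocation` costs only string copies; parsing is deferred to the few files that are actually fetched.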

The second is that we now use the internal `Url` crate methods for
serializing and deserializing. If you look inside `Url`, its standard
serializer and deserializer actually convert it to a string, then
parse the string. But the crate exposes some other methods for faster
serialization and deserialization (with fewer guarantees). I think this
is totally fine since the cache is entirely internal.

If we _just_ change the `Url` serialization (and no other code -- so
continue to store URLs for every file), then the regression goes down to
about 5%:

```shell
❯ python -m scripts.bench \
        --puffin-path ./target/release/main \
        --puffin-path ./target/release/relative --puffin-path ./target/release/puffin \
        scripts/requirements/home-assistant.in --benchmark resolve-warm
Benchmark 1: ./target/release/main (resolve-warm)
  Time (mean ± σ):     496.3 ms ±   4.3 ms    [User: 452.4 ms, System: 175.5 ms]
  Range (min … max):   487.3 ms … 502.4 ms    10 runs

Benchmark 2: ./target/release/relative (resolve-warm)
  Time (mean ± σ):     284.8 ms ±   2.1 ms    [User: 245.8 ms, System: 165.6 ms]
  Range (min … max):   280.3 ms … 288.0 ms    10 runs

Benchmark 3: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     300.4 ms ±   3.2 ms    [User: 255.5 ms, System: 178.1 ms]
  Range (min … max):   295.4 ms … 305.1 ms    10 runs

Summary
  './target/release/relative (resolve-warm)' ran
    1.05 ± 0.01 times faster than './target/release/puffin (resolve-warm)'
    1.74 ± 0.02 times faster than './target/release/main (resolve-warm)'
```

So I considered _just_ making that change. But 5% is kind of
borderline...

With both of these changes, the regression is down to 1-2%:

```
Benchmark 1: ./target/release/relative (resolve-warm)
  Time (mean ± σ):     282.6 ms ±   7.4 ms    [User: 244.6 ms, System: 181.3 ms]
  Range (min … max):   275.1 ms … 318.5 ms    30 runs

Benchmark 2: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     286.8 ms ±   2.2 ms    [User: 247.0 ms, System: 169.1 ms]
  Range (min … max):   282.3 ms … 290.7 ms    30 runs

Summary
  './target/release/relative (resolve-warm)' ran
    1.01 ± 0.03 times faster than './target/release/puffin (resolve-warm)'
```

It's consistently ~2%-ish, but at this point it's unclear if that's due
to the URL change or some other change between now and then.

Closes #943.
2024-01-17 09:15:21 -05:00
Charlie Marsh
6ff21374dc
Split puffin-cache into Puffin-specific and generic utilities (#728)
This crate started off as generic caching utilities, but we started
adding a lot of Puffin-specific stuff (like the cache buckets
abstraction that knows about Git vs. direct URL vs. indexes and so on).
This PR moves the generic stuff into a new `cache-key` crate.
2023-12-25 14:38:56 +00:00