Avoid cache invalidation on credentials renewal (#3010)

# Avoid cache invalidation on credentials renewal

Addresses

- https://github.com/astral-sh/uv/issues/3009#issue-2239221126

## Summary

Some private package registries (e.g. AWS CodeArtifact) use short-lived
credentials. Since they are short-lived, the exact URL that is assigned
to `UV_INDEX_URL` changes frequently and with that the cache key /
hashes of these URLs. This causes the cache to be missed on token
renewal.

This PR attempts to fix this by hashing URLs for cache keys without
their user credentials.

## Test Plan

I added a test that validates that:
1. Changing user credentials returns the same hash
2. Setting no user credentials yields the same as some user credentials

## Question
I'm not sure if we should also change the `hash` implementation of
`CanonicalUrl` / `RepositoryUrl`. They also run `.hash` within.

PS. this is the first time I'm writing `rust` so if I'm wasting your
precious time, let me know and I'll try to up my skills before I ask
again. Anyway, I figured it's good to get this issue on your radar :)
This commit is contained in:
Jos van de Wolfshaar 2024-04-14 01:38:24 +02:00 committed by GitHub
parent 8f483c048f
commit 3103180ce5
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 60 additions and 1 deletions

View file

@ -26,6 +26,15 @@ impl CanonicalUrl {
return Self(url);
}
// If the URL has no host, then it's not a valid URL anyway.
if !url.has_host() {
return Self(url);
}
// Strip credentials.
url.set_password(None).unwrap();
url.set_username("").unwrap();
// Strip a trailing slash.
if url.path().ends_with('/') {
url.path_segments_mut().unwrap().pop_if_empty();
@ -175,6 +184,56 @@ impl std::fmt::Display for RepositoryUrl {
mod tests {
use super::*;
#[test]
fn user_credential_does_not_affect_cache_key() -> Result<(), url::ParseError> {
let mut hasher = CacheKeyHasher::new();
CanonicalUrl::parse("https://example.com/pypa/sample-namespace-packages.git@2.0.0")?
.cache_key(&mut hasher);
let hash_without_creds = hasher.finish();
let mut hasher = CacheKeyHasher::new();
CanonicalUrl::parse(
"https://user:foo@example.com/pypa/sample-namespace-packages.git@2.0.0",
)?
.cache_key(&mut hasher);
let hash_with_creds = hasher.finish();
assert_eq!(
hash_without_creds, hash_with_creds,
"URLs with no user credentials should hash the same as URLs with different user credentials",
);
let mut hasher = CacheKeyHasher::new();
CanonicalUrl::parse(
"https://user:bar@example.com/pypa/sample-namespace-packages.git@2.0.0",
)?
.cache_key(&mut hasher);
let hash_with_creds = hasher.finish();
assert_eq!(
hash_without_creds, hash_with_creds,
"URLs with different user credentials should hash the same",
);
let mut hasher = CacheKeyHasher::new();
CanonicalUrl::parse("https://:bar@example.com/pypa/sample-namespace-packages.git@2.0.0")?
.cache_key(&mut hasher);
let hash_with_creds = hasher.finish();
assert_eq!(
hash_without_creds, hash_with_creds,
"URLs with no username, though with a password, should hash the same as URLs with different user credentials",
);
let mut hasher = CacheKeyHasher::new();
CanonicalUrl::parse("https://user:@example.com/pypa/sample-namespace-packages.git@2.0.0")?
.cache_key(&mut hasher);
let hash_with_creds = hasher.finish();
assert_eq!(
hash_without_creds, hash_with_creds,
"URLs with no password, though with a username, should hash the same as URLs with different user credentials",
);
Ok(())
}
#[test]
fn canonical_url() -> Result<(), url::ParseError> {
// Two URLs should be considered equal regardless of the `.git` suffix.