uv/crates/uv-cache-info
Ivan Smirnov 567468ce72
Some checks are pending
CI / Determine changes (push) Waiting to run
CI / lint (push) Waiting to run
CI / cargo clippy | ubuntu (push) Blocked by required conditions
CI / cargo clippy | windows (push) Blocked by required conditions
CI / cargo dev generate-all (push) Blocked by required conditions
CI / cargo shear (push) Waiting to run
CI / cargo test | ubuntu (push) Blocked by required conditions
CI / cargo test | macos (push) Blocked by required conditions
CI / cargo test | windows (push) Blocked by required conditions
CI / check windows trampoline | aarch64 (push) Blocked by required conditions
CI / smoke test | linux aarch64 (push) Blocked by required conditions
CI / check windows trampoline | i686 (push) Blocked by required conditions
CI / check windows trampoline | x86_64 (push) Blocked by required conditions
CI / test windows trampoline | i686 (push) Blocked by required conditions
CI / test windows trampoline | x86_64 (push) Blocked by required conditions
CI / typos (push) Waiting to run
CI / check system | alpine (push) Blocked by required conditions
CI / mkdocs (push) Waiting to run
CI / build binary | linux libc (push) Blocked by required conditions
CI / build binary | linux aarch64 (push) Blocked by required conditions
CI / build binary | linux musl (push) Blocked by required conditions
CI / build binary | macos aarch64 (push) Blocked by required conditions
CI / build binary | macos x86_64 (push) Blocked by required conditions
CI / build binary | windows x86_64 (push) Blocked by required conditions
CI / build binary | windows aarch64 (push) Blocked by required conditions
CI / cargo build (msrv) (push) Blocked by required conditions
CI / integration test | pypy on windows (push) Blocked by required conditions
CI / integration test | graalpy on ubuntu (push) Blocked by required conditions
CI / integration test | graalpy on windows (push) Blocked by required conditions
CI / integration test | pyodide on ubuntu (push) Blocked by required conditions
CI / integration test | github actions (push) Blocked by required conditions
CI / integration test | free-threaded python on github actions (push) Blocked by required conditions
CI / integration test | determine publish changes (push) Blocked by required conditions
CI / integration test | registries (push) Blocked by required conditions
CI / integration test | uv publish (push) Blocked by required conditions
CI / integration test | uv_build (push) Blocked by required conditions
CI / check cache | ubuntu (push) Blocked by required conditions
CI / check cache | macos aarch64 (push) Blocked by required conditions
CI / check system | python on debian (push) Blocked by required conditions
CI / check system | python on fedora (push) Blocked by required conditions
CI / build binary | freebsd (push) Blocked by required conditions
CI / ecosystem test | pydantic/pydantic-core (push) Blocked by required conditions
CI / ecosystem test | prefecthq/prefect (push) Blocked by required conditions
CI / ecosystem test | pallets/flask (push) Blocked by required conditions
CI / smoke test | linux (push) Blocked by required conditions
CI / smoke test | macos (push) Blocked by required conditions
CI / smoke test | windows x86_64 (push) Blocked by required conditions
CI / smoke test | windows aarch64 (push) Blocked by required conditions
CI / integration test | conda on ubuntu (push) Blocked by required conditions
CI / integration test | deadsnakes python3.9 on ubuntu (push) Blocked by required conditions
CI / integration test | free-threaded on windows (push) Blocked by required conditions
CI / integration test | aarch64 windows implicit (push) Blocked by required conditions
CI / integration test | aarch64 windows explicit (push) Blocked by required conditions
CI / integration test | pypy on ubuntu (push) Blocked by required conditions
CI / check system | python3.13 (push) Blocked by required conditions
CI / check system | python on ubuntu (push) Blocked by required conditions
CI / check system | python on rocky linux 8 (push) Blocked by required conditions
CI / check system | python on rocky linux 9 (push) Blocked by required conditions
CI / check system | graalpy on ubuntu (push) Blocked by required conditions
CI / check system | pypy on ubuntu (push) Blocked by required conditions
CI / check system | pyston (push) Blocked by required conditions
CI / check system | python on macos aarch64 (push) Blocked by required conditions
CI / check system | homebrew python on macos aarch64 (push) Blocked by required conditions
CI / check system | python on macos x86-64 (push) Blocked by required conditions
CI / check system | python3.10 on windows x86-64 (push) Blocked by required conditions
CI / check system | python3.10 on windows x86 (push) Blocked by required conditions
CI / check system | python3.13 on windows x86-64 (push) Blocked by required conditions
CI / check system | x86-64 python3.13 on windows aarch64 (push) Blocked by required conditions
CI / check system | aarch64 python3.13 on windows aarch64 (push) Blocked by required conditions
CI / check system | windows registry (push) Blocked by required conditions
CI / check system | python3.12 via chocolatey (push) Blocked by required conditions
CI / check system | python3.9 via pyenv (push) Blocked by required conditions
CI / check system | conda3.11 on macos aarch64 (push) Blocked by required conditions
CI / check system | conda3.8 on macos aarch64 (push) Blocked by required conditions
CI / check system | conda3.11 on linux x86-64 (push) Blocked by required conditions
CI / check system | conda3.8 on linux x86-64 (push) Blocked by required conditions
CI / check system | conda3.11 on windows x86-64 (push) Blocked by required conditions
CI / check system | conda3.8 on windows x86-64 (push) Blocked by required conditions
CI / check system | amazonlinux (push) Blocked by required conditions
CI / check system | embedded python3.10 on windows x86-64 (push) Blocked by required conditions
CI / benchmarks | walltime aarch64 linux (push) Blocked by required conditions
CI / benchmarks | instrumented (push) Blocked by required conditions
More efficient cache-key globbing + support parent paths in globs (#13469)
## Summary

(Related PR: #13438 - would be nice to have it merged as well since it
touches on the same globwalker code)

There's a few issues with `cache-key` globs, which this PR attempts to
address:

- As of the current state, parent or absolute paths are not allowed,
which is not obvious and is not documented. E.g., cache-key paths of the
form `{file = "../dep/**"}` will be essentially ignored.
- Absolute glob patterns also don't work (funnily enough, there's logic
in `globwalk` itself that attempts to address it in
[`globwalk::glob_builder()`](8973fa2bc5/src/lib.rs (L415)),
which serves as inspiration to some parts of this PR).
- The reason for parent paths being ignored is the way globwalker is
currently being triggered in `uv-cache-info`: the base directory is
being walked over completely and each entry is then being matched to one
of the provided match patterns.
- This may also end up being very inefficient if you have a huge root
folder with thousands of files: if your match patterns are `a/b/*.rs`
and `a/c/*.py` then instead of walking over the root directory, you can
just walk over `a/b` and `a/c` and match the relevant patterns there.
- Why supporting parent paths may be important to the point of being a
blocker: in large codebases with python projects depending on other
local non-python projects (e.g. rust crates), cache-keys can be very
useful to track dependency on the source code of the latter (e.g.
`cache-keys = [{ file = "../../crates/some-dep/**" }]`.
- TLDR: parent/absolute cache-key globs don't work, glob walk can be
slow.

## Solution

- In this PR, user-provided glob patterns are first clustered
(LCP-style) into pattern groups with longest common path prefix; each of
these groups can then be walked over separately.
- Pattern groups do not overlap, so we would never walk over the same
directory twice (unless there's symlinks pointing to same folders).
- Paths are not canonicalized nor virtually normalized (which is
impossible on Unix without FS access), so the method is symlink-safe
(i.e. we don't treat `a/b/..` as `a`) and should work fine with #13438.
- Because of LCP logic, the minimal amount of directory space will be
traversed to cover all patterns.
- Absolute glob patterns will now work.
- Parent-relative glob patterns will now work.
- Glob walking will be more efficient in some cases.

## Possible improvements

- Efficiency can be further greatly improved if we limit max depth for
globwalk. Currently, a simple ".toml" will deep-traverse the whole
folder. Essentially, max depth can be always set to either N or
infinity. If a pattern at a pivot node contains `**`, we collect all
children nodes from the subtree into the same group and don't limit max
depth; otherwise, we set max depth to the length of the glob pattern.
This wouldn't change correctness though and can we done separately if
needed.
- If this is considered important enough, docs can be updated to
indicate that parent and absolute globs are supported (and symlinks are
resolved, if the relevant PR is algo merged in).

## Test Plan

- Glob splitting and clustering tests are included in the PR.
- Relative and absolute glob cache-keys were tested in an actual
codebase.
2025-07-11 10:01:54 -05:00
..
src More efficient cache-key globbing + support parent paths in globs (#13469) 2025-07-11 10:01:54 -05:00
Cargo.toml Fix git-tag cache-key reader in case of slashes (#10467) (#10500) 2025-01-11 21:30:46 -05:00