mirrors/uv

mirror of https://github.com/astral-sh/uv.git synced 2025-10-28 02:40:11 +00:00

Author	SHA1	Message	Date
Zanie Blue	2586f655bb	Rename to `uv` (#1302 ) First, replace all usages in files in-place. I used my editor for this. If someone wants to add a one-liner, that'd be fun. Then, update directory and file names: ``` # Run twice for nested directories find . -type d -print0 | xargs -0 rename s/puffin/uv/g find . -type d -print0 | xargs -0 rename s/puffin/uv/g # Update files find . -type f -print0 | xargs -0 rename s/puffin/uv/g ``` Then, add all the files again: ``` # Add all the files again git add crates git add python/uv # This one needs a force-add git add -f crates/uv-trampoline ```	2024-02-15 11:19:46 -06:00
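The directory rename above is run twice, presumably because Perl's `rename` rewrites the whole path it is given. A nested path therefore cannot be renamed until its parent has been. One single-pass alternative, assuming the paths are collected first, is to rewrite only the last component and process the deepest paths first. `rename_plan` is a hypothetical helper. It only computes the plan and does not touch the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of uv): turn paths found by a directory walk
/// into (old, new) rename pairs, deepest paths first. Only the last component
/// is rewritten, so each rename's parent still exists under its old name.
pub fn rename_plan(paths: &[PathBuf], from: &str, to: &str) -> Vec<(PathBuf, PathBuf)> {
    let mut sorted: Vec<&PathBuf> = paths.iter().collect();
    // More components means deeper in the tree; sort those to the front.
    // `sort_by` is stable, so paths at equal depth keep their input order.
    sorted.sort_by(|a, b| b.components().count().cmp(&a.components().count()));
    sorted
        .into_iter()
        .filter_map(|path| {
            let name = path.file_name()?.to_str()?;
            if !name.contains(from) {
                return None;
            }
            let parent = path.parent().unwrap_or_else(|| Path::new(""));
            Some((path.clone(), parent.join(name.replace(from, to))))
        })
        .collect()
}
```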
Charlie Marsh	16bb80132f	Add an `--offline` mode (#1270 ) ## Summary This PR adds an `--offline` flag to Puffin that disables network requests (implemented as a Reqwest middleware on our registry client). When `--offline` is provided, we also allow the HTTP cache to return stale data. Closes #942.	2024-02-13 03:35:23 +00:00
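The `--offline` behavior described in this commit can be read as a small decision table over the state of a cached response. This sketch is a hypothetical model: `CacheState`, `Action` and `decide` are illustrative names, not uv's types. It does not show the Reqwest middleware that actually blocks the requests.

```rust
/// State of a cached HTTP response, as judged by normal cache semantics.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheState {
    Missing,
    Fresh,
    Stale,
}

/// What the client should do for one request.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Serve the cached response without touching the network.
    UseCache,
    /// Fetch or revalidate over the network.
    Network,
    /// Offline and nothing usable is cached: fail instead of connecting.
    OfflineError,
}

/// Hypothetical decision table: offline, the network is never used, and a
/// stale entry is served instead of being revalidated.
pub fn decide(state: CacheState, offline: bool) -> Action {
    match (state, offline) {
        (CacheState::Fresh, _) => Action::UseCache,
        (CacheState::Stale, true) => Action::UseCache,
        (CacheState::Stale, false) => Action::Network,
        (CacheState::Missing, true) => Action::OfflineError,
        (CacheState::Missing, false) => Action::Network,
    }
}
```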
Zanie Blue	a37b08808e	Implement pip compatible `--no-binary` and `--only-binary` options (#1268 ) Updates our `--no-binary` option and adds an `--only-binary` option for compatibility with `pip`, which uses `:all:`, `:none:` and `<name>` for specifying packages. This required adding support for `--only-binary <name>` to our resolver; previously it was only a boolean toggle. Retains `--no-build`, which is equivalent to `--only-binary :all:`. This is common enough for safety that I would prefer it be available without pip's awkward `:all:` syntax. --------- Co-authored-by: konsti <konstin@mailbox.org>	2024-02-11 19:31:41 -06:00
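This sketch shows how pip-style values could be folded into a per-flag selection. It loosely follows pip's rules: `:all:` selects everything, `:none:` clears the selection, and names accumulate. `PackageSelection` is a hypothetical type. It does not normalize package names, and it does not model the interaction between `--no-binary` and `--only-binary`. pip handles both of those.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of one pip-style `--no-binary` / `--only-binary` flag.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PackageSelection {
    /// No packages (the default, or after `:none:`).
    None,
    /// Every package (`:all:`).
    All,
    /// Only the named packages.
    Packages(BTreeSet<String>),
}

impl PackageSelection {
    /// Fold repeated flag values left to right; each value may itself be
    /// comma-separated, e.g. `--no-binary foo,bar`.
    pub fn from_args<'a>(args: impl IntoIterator<Item = &'a str>) -> Self {
        let mut state = PackageSelection::None;
        for arg in args {
            for value in arg.split(',').map(str::trim).filter(|v| !v.is_empty()) {
                state = match (value, state) {
                    (":all:", _) => PackageSelection::All,
                    (":none:", _) => PackageSelection::None,
                    // A name after `:all:` is already covered by it.
                    (_, PackageSelection::All) => PackageSelection::All,
                    (name, PackageSelection::Packages(mut names)) => {
                        names.insert(name.to_string());
                        PackageSelection::Packages(names)
                    }
                    (name, PackageSelection::None) => {
                        PackageSelection::Packages(BTreeSet::from([name.to_string()]))
                    }
                };
            }
        }
        state
    }

    /// Does this selection apply to `package`?
    pub fn contains(&self, package: &str) -> bool {
        match self {
            PackageSelection::None => false,
            PackageSelection::All => true,
            PackageSelection::Packages(names) => names.contains(package),
        }
    }
}
```

Under this model, `--no-build` corresponds to an `--only-binary` selection of `PackageSelection::All`.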
konsti	ab45485eb5	Reduce stack sizes further and ignore remaining tests (#1261 ) This PR reduces stack sizes on Windows a little further, using the stack traces from stack overflows combined with looking at the type sizes. Ultimately, it ignores the three remaining tests that fail in debug builds on Windows due to stack overflows, to unblock `cargo test` for Windows on CI. 444 tests run: 444 passed (39 slow), 1 skipped	2024-02-06 23:08:18 +01:00
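Windows has a default stack size of 1MB, so stack overflows there can be hard to reproduce elsewhere. One way to approximate that limit on other platforms is to run the code on a thread with an explicit stack size, using `std::thread::Builder::stack_size`. This is only a sketch. It is not necessarily how the test suite or #941 does it, and `run_with_stack` is a hypothetical helper.

```rust
use std::thread;

/// Run `f` on a fresh thread whose stack is at least `stack_bytes`,
/// approximating a 1 MB Windows stack on other platforms.
/// Returns `None` if the closure panicked.
pub fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> Option<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .ok()
}
```

The requested size is a minimum, because platforms may round it up. A genuine stack overflow aborts the whole process; it does not come back as `None`.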
Andrew Gallant	d4b4c21133	initial implementation of zero-copy deserialization for SimpleMetadata (#1249 ) (Please review this PR commit by commit.) This PR closes an initial loop on zero-copy deserialization. That is, it provides a way to get an `Archived<SimpleMetadata>` (spelled `OwnedArchive<SimpleMetadata>` in the code) from a `CachedClient`. The main benefit of zero-copy deserialization is that we can read bytes from a file, cast those bytes to a structured representation without cost, and then start using that type like any other Rust type. The "catch" is that the structured representation is not the actual type you started with, but the "archived" version of it. In order to make all this work, we ended up needing to shave a rather large yak: we had to re-implement HTTP cache semantics. Previously, we were using the `http-cache-semantics` crate. While it does support Serde, it doesn't support `rkyv`. Moreover, even simple support for `rkyv` wouldn't be enough. What we actually want is for the HTTP cache semantics to be implemented on the archived type so that we can decide whether our cached response is stale or not without needing to do a full deserialization into the unarchived type. This is why, in this PR, you'll see `impl ArchivedCachePolicy { ... }` instead of `impl CachePolicy { ... }`. (The `derive(rkyv::Archive)` macro automatically introduces the `ArchivedCachePolicy` type into the current namespace.) Unfortunately, this PR does not fully realize the dream that is zero-copy deserialization. Namely, while a `CachedClient` can now provide an `OwnedArchive<SimpleMetadata>`, the rest of our code doesn't really make use of it. Indeed, as soon as we go to build a `VersionMap`, we eagerly convert our archived metadata into an owned `SimpleMetadata` via deserialization (that isn't zero-copy). After this change, a lot of the work now shifts to `rkyv` deserialization and `VersionMap` construction.
More precisely, the main things we drop here are `CachePolicy` deserialization (which is now truly zero-copy) and the parsing of the MessagePack format for `SimpleMetadata`. But we are still paying for deserialization. We're just paying for it in a different place. This PR does seem to bring a speed-up, but it is somewhat underwhelming. My measurements have been pretty noisy, but I get a 1.1x speedup fairly often: ``` $ hyperfine -w5 "puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" "puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" ; A kang Benchmark 1: puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null Time (mean ± σ): 164.4 ms ± 18.8 ms [User: 427.1 ms, System: 348.6 ms] Range (min … max): 131.1 ms … 190.5 ms 18 runs Benchmark 2: puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null Time (mean ± σ): 148.3 ms ± 10.2 ms [User: 357.1 ms, System: 319.4 ms] Range (min … max): 136.8 ms … 184.4 ms 19 runs Summary puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran 1.11 ± 0.15 times faster than puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ``` One downside is that this does increase cache size (`rkyv`'s serialization format is not as compact as MessagePack). On-disk size increases by about 1.8x for our `simple-v0` cache.
``` $ sort-filesize cache-main 4.0K cache-main/CACHEDIR.TAG 4.0K cache-main/.gitignore 8.0K cache-main/interpreter-v0 8.7M cache-main/wheels-v0 18M cache-main/archive-v0 59M cache-main/simple-v0 109M cache-main/built-wheels-v0 193M cache-main 193M total $ sort-filesize cache-test 4.0K cache-test/CACHEDIR.TAG 4.0K cache-test/.gitignore 8.0K cache-test/interpreter-v0 8.7M cache-test/wheels-v0 18M cache-test/archive-v0 107M cache-test/simple-v0 109M cache-test/built-wheels-v0 242M cache-test 242M total ``` Also, while I initially intended to do a simplistic implementation of HTTP cache semantics, I found that everything was somewhat interconnected. I could have written code that _specifically_ only worked with the present behavior of PyPI, but then it would need to be special-cased and everything else would need to continue to use `http-cache-semantics`. By implementing what we need based on what Puffin actually is (which is still less than what `http-cache-semantics` does), we can avoid special casing and use zero-copy deserialization for our cache policy in _all_ cases.	2024-02-05 16:47:53 -05:00
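The central idea of this commit is to make cache decisions on the archived representation rather than on a deserialized `CachePolicy`. That idea can be illustrated without rkyv. In this hypothetical sketch, a freshness check reads just the two fields it needs straight out of a borrowed byte buffer. The 16-byte layout and `ArchivedPolicyView` are assumptions made for illustration. rkyv's real archived types are generated by `derive(rkyv::Archive)`, and real HTTP freshness rules (RFC 9111) involve more inputs than a max-age.

```rust
use std::convert::TryInto;

/// Hypothetical fixed layout standing in for an archived cache policy
/// (NOT rkyv's format): bytes 0..8 hold the response time and bytes 8..16
/// the max-age, both as little-endian `u64` seconds.
pub struct ArchivedPolicyView<'a> {
    bytes: &'a [u8],
}

impl<'a> ArchivedPolicyView<'a> {
    /// Check the length once up front (a stand-in for archive validation);
    /// no owned policy object is ever built.
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() >= 16 {
            Some(Self { bytes })
        } else {
            None
        }
    }

    fn read_u64(&self, offset: usize) -> u64 {
        // In bounds because `new` checked the length.
        u64::from_le_bytes(self.bytes[offset..offset + 8].try_into().unwrap())
    }

    /// Simplified rule: fresh while the response's age is below its max-age
    /// (ignoring `Age` headers, heuristics and revalidation).
    pub fn is_stale(&self, now: u64) -> bool {
        let age = now.saturating_sub(self.read_u64(0));
        age >= self.read_u64(8)
    }
}
```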
Charlie Marsh	8cbe1d220c	Remove double-download for source distributions (#1218 ) ## Summary Oops -- this was using a different cache key than the route above (this is the wheel _metadata_ route vs. the wheel build route), so we were saving and building source distributions twice in `pip install`.	2024-02-01 04:41:29 +00:00
Charlie Marsh	d88ce76979	Stream unpacking of source distribution downloads (#1157 ) This PR migrates our source distribution downloads to unzip as we stream, similar to our approach for wheels. In my testing, this showed a consistent speedup (e.g., 6% here for a few representative source distributions): ```text ❯ python -m scripts.bench --puffin-path ./target/release/main --puffin-path ./target/release/puffin --benchmark install-cold requirements.in Benchmark 1: ./target/release/main (install-cold) Time (mean ± σ): 1.503 s ± 0.039 s [User: 1.479 s, System: 0.537 s] Range (min … max): 1.466 s … 1.605 s 10 runs Benchmark 2: ./target/release/puffin (install-cold) Time (mean ± σ): 1.421 s ± 0.024 s [User: 1.505 s, System: 0.593 s] Range (min … max): 1.381 s … 1.454 s 10 runs Summary './target/release/puffin (install-cold)' ran 1.06 ± 0.03 times faster than './target/release/main (install-cold)' ```	2024-01-28 20:09:24 -05:00
Andrew Gallant	5219d37250	add initial rkyv support (#1135 ) This PR adds initial support for [rkyv] to puffin. In particular, the main aim here is to make it possible to deserialize puffin-client's `SimpleMetadata` type from a `&[u8]` without doing any copies. This PR stops short of actually doing that zero-copy deserialization. Instead, this PR is about adding the necessary trait impls to a variety of types, along with a smattering of small refactorings to make rkyv possible to use. For those unfamiliar, rkyv works via the interplay of three traits: `Archive`, `Serialize` and `Deserialize`. The usual flow of things is this: * Make a type `T` implement `Archive`, `Serialize` and `Deserialize`. rkyv helpfully provides `derive` macros to make this pretty painless in most cases. * The process of implementing `Archive` for `T` usually creates an entirely new distinct type within the same namespace. One can refer to this type without naming it explicitly via `Archived<T>` (where `Archived` is a clever type alias defined by rkyv). * Serialization happens from `T` to (conceptually) a `Vec<u8>`. The serialization format is specifically designed to reflect the in-memory layout of `Archived<T>`. Notably, not `T`. But `Archived<T>`. * One can then get an `Archived<T>` with no copying (albeit we will likely need to incur some cost for validation) from the previously created `&[u8]`. This is quite literally [implemented as a pointer cast][rkyv-ptr-cast]. * The problem with an `Archived<T>` is that it isn't your `T`. It's something else. And while there is limited interoperability between a `T` and an `Archived<T>`, the main issue is that the surrounding code generally demands a `T` and not an `Archived<T>`. This is at the heart of the tension for introducing zero-copy deserialization, and this is mostly a problem intrinsic to the technique and not an rkyv-specific issue. For this reason, given an `Archived<T>`, one can get a `T` back via an explicit deserialization step.
This step is like any other kind of deserialization, although generally faster since no real "parsing" is required. But it will allocate and create all necessary objects. This PR largely proceeds by deriving the three aforementioned traits for `SimpleMetadata`. And, of course, all of its type dependencies. But we stop there for now. The main issue with carrying this work forward so that rkyv is actually used to deserialize a `SimpleMetadata` is figuring out how to deal with `DataWithCachePolicy` inside of the cached client. Ideally, this type would itself have rkyv support, but adding it is difficult. The main difficulty lies in the fact that its `CachePolicy` type is opaque, not easily constructable and is internally the tip of the iceberg of a rat's nest of types found in other crates such as `http`. While one "dumb"-but-annoying approach would be to fork both of those crates and add rkyv trait impls to all necessary types, it is my belief that this is the wrong approach. What we'd like to do is not just use rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to get an `Archived<DataWithCachePolicy>` and make actual decisions using the archived type directly. Doing that will require some work to make `Archived<DataWithCachePolicy>` directly useful. My suspicion is that, after doing the above, we may want to push forward with a similar approach for `SimpleMetadata`. That is, we want `Archived<SimpleMetadata>` to be as useful as possible. But right now, the structure of the code demands an eager conversion (and thus deserialization) into a `SimpleMetadata` and then into a `VersionMap`. Getting rid of that eagerness is, I think, the next step after dealing with `DataWithCachePolicy` to unlock bigger wins here. There are many commits in this PR, but most are tiny. I still encourage review to happen commit-by-commit. [rkyv]: https://rkyv.org/ [rkyv-ptr-cast]: https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68	2024-01-28 12:14:59 -05:00
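The `T` versus `Archived<T>` split described in this commit can be imitated by hand for a list of names. The sketch is hypothetical. `Names`, `ArchivedNames` and the length-prefixed format are not rkyv's; rkyv derives the archived type and uses relative pointers. The shape is the same, though: reads borrow from the byte buffer, and getting an owned `T` back is an explicit, allocating step.

```rust
use std::convert::TryInto;

/// The owned type `T`.
#[derive(Debug, PartialEq)]
pub struct Names(pub Vec<String>);

impl Names {
    /// "Serialize": u32 count, then per name a u32 length and UTF-8 bytes.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&(self.0.len() as u32).to_le_bytes());
        for name in &self.0 {
            out.extend_from_slice(&(name.len() as u32).to_le_bytes());
            out.extend_from_slice(name.as_bytes());
        }
        out
    }
}

fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let field: [u8; 4] = bytes.get(at..at + 4)?.try_into().ok()?;
    Some(u32::from_le_bytes(field))
}

/// The "archived" view: a different type that borrows the bytes.
pub struct ArchivedNames<'a> {
    bytes: &'a [u8],
}

impl<'a> ArchivedNames<'a> {
    pub fn from_bytes(bytes: &'a [u8]) -> Self {
        Self { bytes }
    }

    pub fn len(&self) -> usize {
        read_u32(self.bytes, 0).map_or(0, |n| n as usize)
    }

    /// Borrow the `index`-th name directly from the buffer (no allocation).
    pub fn get(&self, index: usize) -> Option<&'a str> {
        let bytes: &'a [u8] = self.bytes;
        let mut at = 4;
        for i in 0..self.len() {
            let len = read_u32(bytes, at)? as usize;
            let start = at + 4;
            let end = start.checked_add(len)?;
            if i == index {
                return std::str::from_utf8(bytes.get(start..end)?).ok();
            }
            at = end;
        }
        None
    }

    /// The explicit "deserialize back to `T`" step; this one allocates.
    /// (Quadratic, since `get` rescans from the start; fine for a sketch.)
    pub fn deserialize(&self) -> Option<Names> {
        (0..self.len())
            .map(|i| self.get(i).map(str::to_owned))
            .collect::<Option<Vec<String>>>()
            .map(Names)
    }
}
```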
Charlie Marsh	a25a1f2958	Avoid re-creating directories in async unzip (#1155 ) This PR extends the optimizations from #1154 to other unzip paths.	2024-01-28 14:30:38 +00:00
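Here is one plausible shape for this kind of optimization. It assumes the unzip loop calls `create_dir_all` for each entry's parent. The idea is to remember which directories, and their ancestors, have already been created, and to skip the call for later entries. `dirs_to_create` is a hypothetical helper that returns the calls that would be made, without touching the filesystem. It is not the code from #1154.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical sketch: for archive entries in the order they are unpacked,
/// return the directories that actually need a `create_dir_all` call.
/// `create_dir_all` also creates every ancestor, so those are remembered too.
pub fn dirs_to_create<'a>(entries: impl IntoIterator<Item = &'a str>) -> Vec<PathBuf> {
    let mut created: HashSet<PathBuf> = HashSet::new();
    let mut calls = Vec::new();
    for entry in entries {
        let parent = match Path::new(entry).parent() {
            Some(parent) if !parent.as_os_str().is_empty() => parent,
            _ => continue, // top-level file: nothing to create
        };
        if created.contains(parent) {
            continue;
        }
        calls.push(parent.to_path_buf());
        for ancestor in parent.ancestors() {
            if ancestor.as_os_str().is_empty() {
                break;
            }
            created.insert(ancestor.to_path_buf());
        }
    }
    calls
}
```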
Charlie Marsh	67b41427cc	Store source distribution directly in the cache (#1116 ) I want to move towards using the archive bucket exclusively for wheels. We never overwrite source distributions, so there's no need to symlink them.	2024-01-25 20:52:31 -05:00
Charlie Marsh	f36c167982	Use a consolidated error for distribution failures (#1104 ) ## Summary Use a single error type in `puffin_distribution`, rather than two confusingly similar types between `DistributionDatabase` and the source distribution module. Also removes the `#[from]` for IO errors and replaces with explicit wrapping, which is verbose but removes a bunch of incorrect error messages.	2024-01-25 14:49:11 -05:00
Andrew Gallant	067acfe79e	puffin-client: rejigger error type (#1102 ) This PR changes the error type to be boxed internally so that it uses less size on the stack. This makes functions returning `Result<T, Error>`, in particular, return something much smaller. The specific thing that motivated this was Clippy lints firing when I tried to refactor code in this crate. I chose to achieve boxing by splitting the enum out into a separate type, and then wiring up the necessary `From` impl to make error conversions easy, and then making `Error` itself opaque. We could expose the `Box`, but there isn't a ton of benefit in doing so because one cannot pattern match through a `Box`. This required using more explicit error conversions in several places. And as a result, I was able to remove all `#[from]` attributes on non-transparent error variants.	2024-01-25 13:13:21 -05:00
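This is a minimal sketch of the boxed-error pattern described in this commit, using hypothetical `ErrorKind` and `Error` types. The large enum lives behind a `Box`, so the error case of any `Result<T, Error>` holds a single pointer, however big `ErrorKind` grows. Conversions go through an explicit `From<ErrorKind>` impl rather than `#[from]` attributes.

```rust
use std::fmt;

/// The full set of failure cases; some variants are large.
#[derive(Debug)]
pub enum ErrorKind {
    Io(std::io::Error),
    /// Deliberately large, standing in for variants with bulky context.
    BadResponse { url: String, excerpt: [u8; 256] },
}

/// Hypothetical opaque error: one pointer wide, however large `ErrorKind` is.
#[derive(Debug)]
pub struct Error(Box<ErrorKind>);

impl Error {
    /// Callers inspect the kind through a reference; they cannot pattern
    /// match through the `Box` itself anyway.
    pub fn kind(&self) -> &ErrorKind {
        &self.0
    }
}

impl From<ErrorKind> for Error {
    fn from(kind: ErrorKind) -> Self {
        Error(Box::new(kind))
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.kind() {
            ErrorKind::Io(err) => write!(f, "I/O error: {}", err),
            ErrorKind::BadResponse { url, .. } => write!(f, "bad response from {}", url),
        }
    }
}

impl std::error::Error for Error {}
```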
Charlie Marsh	904db967af	Use junctions instead of symlinks on Windows (#1087 ) ## Summary When we unzip wheels in the cache, we write the directories out to an `archive-v0` bucket, and then symlink into that bucket from the `wheels-v0` and `built-wheels-v0` buckets. On Windows, symlinks are not well supported. Specifically, they need to be explicitly enabled by the user. So, instead of symlinks, we now use junctions, which are well-supported on Windows, and allow you to (effectively) symlink a directory to another directory. This PR implements said junction support, which gets the core installer working on Windows. In the past, we also used symlinks to implement another primitive: we wanted to be able to replace a directory "atomically" (I put "atomically" in quotes because I don't know if it's actually a guaranteed atomic operation), in case someone was trying to use the directory while we were replacing it (as opposed to deleting the directory, then moving it into place). On Windows, it doesn't appear to be possible to atomically replace a junction. So instead, I'm using a new design, whereby the cache always returns canonicalized paths. We know these canonicalized paths are unique and won't be replaced, so they're safe for writers to rely on. In general, when we write new data to the cache, we now return the canonicalized path. When we read from the cache, and try to identify (e.g.) the set of wheels available to us, we canonicalize the links immediately and consider them non-existent if that operation fails. Closes #1085. --------- Co-authored-by: konstin <konstin@mailbox.org>	2024-01-25 10:06:38 +01:00
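The read-side rule described in this commit is to canonicalize immediately and treat failure as absence. It is small enough to sketch with `std::fs::canonicalize`. The function names are hypothetical, and the sketch covers only reading. On Windows, `canonicalize` returns verbatim (`\\?\`-prefixed) paths, which callers may need to account for.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical read-side rule: resolve a cache link (a symlink on Unix, a
/// junction on Windows) to its canonical target right away, and treat a
/// link that cannot be resolved, such as a dangling one, as absent.
pub fn resolve_cache_entry(link: &Path) -> Option<PathBuf> {
    fs::canonicalize(link).ok()
}

/// Resolved targets of every entry in a cache bucket; unreadable or
/// dangling entries are skipped rather than reported as errors.
pub fn resolved_entries(bucket: &Path) -> Vec<PathBuf> {
    match fs::read_dir(bucket) {
        Ok(entries) => entries
            .filter_map(Result::ok)
            .filter_map(|entry| resolve_cache_entry(&entry.path()))
            .collect(),
        Err(_) => Vec::new(),
    }
}
```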
Charlie Marsh	cedd2e0b3f	Use a buffered reader for wheel metadata (#1082 ) ## Summary It turns out this is significantly faster when reading (e.g.) _just_ the `METADATA` file from a zipped wheel. I audited other `File::open` usages, and everything else seems to be using a buffered reader already (directly, or in whatever third-party crate it's passed to) _or_ is read immediately in full. See the criterion benchmark: ``` file_reader/numpy-1.26.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl time: [6.9618 ms 6.9664 ms 6.9713 ms] Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild file_reader/flask-3.0.1-py3-none-any.whl time: [237.50 µs 238.25 µs 239.13 µs] Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe buffered_reader/numpy-1.26.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl time: [648.92 µs 653.85 µs 660.09 µs] Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe buffered_reader/flask-3.0.1-py3-none-any.whl time: [39.578 µs 39.712 µs 39.869 µs] Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe ```	2024-01-24 15:22:55 -05:00
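This sketch shows the difference using only the standard library. The commit does not say why the buffered reader is faster. A plausible explanation is that reading zip metadata issues many small reads, and each one becomes a system call when there is no buffer. `count_bytes_in_small_reads` is a hypothetical helper that exaggerates that pattern with 16-byte reads.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};
use std::path::Path;

/// Read a whole file in tiny 16-byte reads and return the byte count.
/// Unbuffered, every `read` goes to the OS; with `BufReader`, most reads
/// are served from an in-memory buffer (8 KiB by default on common
/// platforms), so far fewer system calls are made.
pub fn count_bytes_in_small_reads(path: &Path, buffered: bool) -> io::Result<usize> {
    let file = File::open(path)?;
    let mut reader: Box<dyn Read> = if buffered {
        Box::new(BufReader::new(file))
    } else {
        Box::new(file)
    };
    let mut chunk = [0u8; 16];
    let mut total = 0;
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        total += n;
    }
    Ok(total)
}
```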
Charlie Marsh	738e8341e2	Use a consistent `Timestamp` struct (#1081 ) ## Summary This PR uses `ctime` consistently on Unix as a more conservative approach to change detection. It also ensures that our timestamp abstraction is entirely internal, so we can change the representation and logic easily across the codebase in the future.	2024-01-24 14:21:31 -05:00
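This is a minimal sketch of an opaque timestamp along the lines described here and in #1067 below. It uses `ctime` (with nanoseconds) on Unix and `last_write_time` on Windows, and exposes the result only as a comparable value. `Timestamp` and its representation are assumptions, not uv's actual type, and targets other than Unix and Windows are not handled.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical opaque timestamp: callers can only compare values, so the
/// source (ctime vs. mtime) and precision can change in one place.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Timestamp(i128);

impl Timestamp {
    pub fn from_path(path: &Path) -> io::Result<Self> {
        let metadata = fs::metadata(path)?;
        Ok(Self::from_metadata(&metadata))
    }

    #[cfg(unix)]
    pub fn from_metadata(metadata: &fs::Metadata) -> Self {
        use std::os::unix::fs::MetadataExt;
        // Unlike mtime, ctime cannot be set to an arbitrary value by user
        // programs (e.g. extractors that restore mtimes), so it is the more
        // conservative signal for "this file changed".
        Timestamp(i128::from(metadata.ctime()) * 1_000_000_000 + i128::from(metadata.ctime_nsec()))
    }

    #[cfg(windows)]
    pub fn from_metadata(metadata: &fs::Metadata) -> Self {
        use std::os::windows::fs::MetadataExt;
        // Last write time, in 100 ns intervals since 1601-01-01.
        Timestamp(i128::from(metadata.last_write_time()))
    }
}
```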
Charlie Marsh	63f3434b21	Use nanoid instead of uuid (#1074 ) ## Summary Gives us equivalent randomness with ~half as many characters.	2024-01-24 05:05:14 +00:00
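Here is a rough check of the "equivalent randomness" claim. It assumes nanoid's default of 21 characters from a 64-symbol URL-safe alphabet; whether uv uses the default length is itself an assumption. It compares this with a random (version 4) UUID in its 36-character hyphenated form, where 6 of the 128 bits are fixed by the version and variant fields:

```latex
\begin{aligned}
\text{nanoid (default):}\quad & 21 \cdot \log_2 64 = 21 \cdot 6 = 126 \text{ random bits in 21 characters} \\
\text{UUID v4:}\quad & 128 - 6 = 122 \text{ random bits in 36 characters} \\
\text{length ratio:}\quad & 21 / 36 \approx 0.58
\end{aligned}
```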
Charlie Marsh	1b3a3f4e80	Add `--refresh` behavior to the cache (#1057 ) ## Summary This PR is an alternative approach to #949, which should be much safer. As in #949, we add a `Refresh` policy to the cache. However, instead of deleting entries from the cache the first time we read them, we now check if the entry is sufficiently new (created after the start of the command) if the refresh policy applies. If the entry is stale, then we avoid reading it and continue onward, relying on the cache to appropriately overwrite based on "new" data. (This relies on the preceding PRs, which ensure the cache is append-only, and ensure that we can atomically overwrite.) Unfortunately, there are just a lot of paths through the cache, and different data is handled with different policies, so I really had to go through and consider the "right" behavior for each case. For example, the HTTP requests can use `max-age=0, must-revalidate`. But for the routes that are based on filesystem modification, we need to do something slightly different. Closes #945.	2024-01-23 18:30:26 -05:00
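The refresh rule in this commit can be sketched as a check on an entry's write time relative to the command's start time. `Refresh` and `Freshness` are hypothetical names. The sketch covers only the decision, not the cache writes that later overwrite stale entries.

```rust
use std::collections::HashSet;
use std::time::SystemTime;

/// Hypothetical `--refresh` policy, captured once when the command starts.
pub enum Refresh {
    /// No refresh requested.
    None,
    /// Refresh everything.
    All { started: SystemTime },
    /// Refresh only the named packages.
    Packages { names: HashSet<String>, started: SystemTime },
}

#[derive(Debug, PartialEq, Eq)]
pub enum Freshness {
    /// The entry may be read (normal cache rules still apply).
    Fresh,
    /// Skip the entry and let the cache overwrite it with new data.
    Stale,
}

impl Refresh {
    /// Entries covered by the policy are trusted only if they were written
    /// after the command started; nothing is ever deleted.
    pub fn check(&self, package: &str, written: SystemTime) -> Freshness {
        let started = match self {
            Refresh::None => return Freshness::Fresh,
            Refresh::All { started } => started,
            Refresh::Packages { names, started } => {
                if !names.contains(package) {
                    return Freshness::Fresh;
                }
                started
            }
        };
        if written >= *started {
            Freshness::Fresh
        } else {
            Freshness::Stale
        }
    }
}
```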
Charlie Marsh	5621c414cf	Use symlinks for directory entries in cache (#1037 ) ## Summary One problem we have in the cache today is that we can't overwrite entries atomically, because we store unzipped _directories_ in the cache (which makes installation _much_ faster than storing zipped directories). So, if you ignore the existing contents of the cache when writing, you might run into an error, because you might attempt to write a directory where a directory already exists. This is especially annoying for cache refresh, because in order to refresh the cache, we have to purge it (i.e., delete a bunch of stuff), which is also highly unsafe if Puffin is running across multiple threads or multiple processes. The solution I'm proposing here is that whenever we persist a _directory_ to the cache, we persist it to a special "archive" bucket. Then, within the other buckets, directory entries are actually symlinks into that "archive" bucket. With symlinks, we can atomically replace, which means we can easily overwrite cache entries without having to delete from the cache. The main downside is that we'll now accumulate dangling entries in the "archive" bucket, and so we'll need to implement some form of garbage collection to ensure that we remove entries with no symlinks. Another downside is that cache reads and writes will be a bit slower, since we need to deal with creating and resolving these symlinks. As an example... after this change, the cache entry for this unzipped wheel is actually a symlink: [Screenshot 2024-01-22 at 11 56 18 AM] Then, within the archive directory, we actually have two unique entries (since I intentionally ran the command twice to ensure overwrites were safe): [Screenshot 2024-01-22 at 11 56 22 AM]	2024-01-23 19:52:37 +00:00
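The atomic-replace step that motivates symlinks here can be sketched on Unix with the standard library. First create the new link under a temporary name, then `rename` it over the old one; POSIX `rename` performs that replacement atomically. `replace_symlink` and the `.tmp-<pid>` naming are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical sketch (Unix only): make `link` point at `target`, replacing
/// any existing link without a moment where `link` is missing.
#[cfg(unix)]
pub fn replace_symlink(target: &Path, link: &Path) -> io::Result<()> {
    let file_name = link
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "link has no file name"))?;
    // Build the new link under a temporary sibling name first...
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".tmp-{}", std::process::id()));
    let tmp = link.with_file_name(tmp_name);
    // ...clearing any leftover from an earlier interrupted attempt...
    let _ = fs::remove_file(&tmp);
    std::os::unix::fs::symlink(target, &tmp)?;
    // ...then rename it over the old link. rename(2) replaces the
    // destination atomically, so readers see either the old or new target.
    fs::rename(&tmp, link)
}
```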
Charlie Marsh	556080225d	Use ctime for interpreter timestamps (#1067 ) Per https://apenwarr.ca/log/20181113, `ctime` should be a lot more conservative, and should detect things like the issue we see with the python-build-standalone builds, where the `mtime` is identical across builds. On Windows, I'm just using `last_write_time`. But we should probably add `volume_serial_number` and other attributes via [`winapi_util`](https://docs.rs/winapi-util/latest/winapi_util/index.html).	2024-01-23 19:52:20 +00:00
Charlie Marsh	6561617c56	Store source distribution builds under a unique manifest ID (#1051 ) ## Summary This is a refactor of the source distribution cache that again aims to make the cache purely additive. Instead of deleting all built wheels when the cache gets invalidated (e.g., because the source distribution changed on PyPI or something), we now treat each invalidation as its own cache directory. The manifest inside of the source distribution directory now becomes a pointer to the "latest" version of the source distribution cache. Here's a visual example: [Screenshot 2024-01-22 at 5 35 41 PM] With this change, we avoid deleting built distributions that might be relied on elsewhere and maintain our invariant that the cache is purely additive. The cost is that we now preserve stale wheels, but we should add a garbage collection mechanism to deal with that.	2024-01-23 19:49:11 +00:00
Charlie Marsh	e32027e384	Avoid persisting manifest data in standalone file (#1044 ) ## Summary This PR gets rid of the manifest that we store for source distributions. Historically, that manifest included the source distribution metadata, plus a list of built wheels. The problem with the manifest is that it duplicates state, since we now have to look at both the manifest and the filesystem to understand the cache state. Instead, I think we should treat the cache as the source of truth, and get rid of the duplicated state in the manifest. Now, we store the manifest (which is merely used to check for cache freshness -- in future PRs, I will repurpose it though, so I left it around), then the distribution metadata as its own file, then any distributions in the same directory. When we want to see if there are any valid distributions, we `readdir` on the directory. This is also much more consistent with how the install plan works.	2024-01-23 19:46:48 +00:00
Charlie Marsh	81401a17e5	Use `archive_mtime` in another call site (#1056 ) _Not_ using this was an oversight.	2024-01-23 04:51:18 +00:00
Charlie Marsh	c8941d4799	Rename metadata.msgpack to manifest.msgpack (#1043 ) We store the `Manifest` at this path, so this name feels more appropriate.	2024-01-22 15:00:41 -05:00
Charlie Marsh	d9cc9dbf88	Improve error message when editable requirement doesn't exist (#1024 ) Making these a lot clearer in the common case by reducing the depth of the error.	2024-01-20 12:59:18 -05:00
konsti	47fc90d1b3	Reduce stack usage by boxing `File` in `Dist`, `CachePolicy` and large futures (#1004 ) This is https://github.com/astral-sh/puffin/pull/947 again, but this time merging into main instead of downstack; sorry for the noise. --- Windows has a default stack size of 1MB, which often makes puffin fail with stack overflows. The PR reduces stack size with three changes: * Boxing `File` in `Dist`, reducing the size from 496 to 240 bytes. * Boxing the largest futures. * Boxing `CachePolicy` ## Method Debugging happened on Linux using https://github.com/astral-sh/puffin/pull/941 to limit the stack size to 1MB. I ran the command below. ``` RUSTFLAGS=-Zprint-type-sizes cargo +nightly build -p puffin-cli -j 1 > type-sizes.txt && top-type-sizes -w -s -h 10 < type-sizes.txt > sizes.txt ``` The main drawback is that top-type-sizes doesn't say what the `__awaitee` is, so it requires manually looking up a future with a matching size. When the `brotli` feature on `reqwest` is active, a lot of brotli types show up. Toggling this feature, however, seems to have no effect. I assume they are false positives, since the `brotli` crate has elaborate control over allocation. The sizes are therefore shown with the feature off. ## Results The largest future goes from 12208B to 6416B, the largest type (`PrioritizedDistribution`, see also #948) from 17448B to 9264B. Full diff: https://gist.github.com/konstin/62635c0d12110a616a1b2bfcde21304f For the second commit, I iteratively boxed the largest file until the tests passed, then with an 800KB stack limit looked through the backtrace of a failing test and added some more boxing.
Quick benchmarking showed no difference: ```console $ hyperfine --warmup 2 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent" Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent Time (mean ± σ): 49.2 ms ± 3.0 ms [User: 39.8 ms, System: 24.0 ms] Range (min … max): 46.6 ms … 63.0 ms 55 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent Time (mean ± σ): 47.4 ms ± 3.2 ms [User: 41.3 ms, System: 20.6 ms] Range (min … max): 44.6 ms … 60.5 ms 62 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Summary target/profiling/puffin-dev resolve meine_stadt_transparent ran 1.04 ± 0.09 times faster than target/profiling/main-dev resolve meine_stadt_transparent ```	2024-01-19 09:38:36 +00:00
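Boxing shrinks these types because an enum is as large as its largest variant, so one large inline payload inflates every value. The illustration below is hypothetical and uses made-up sizes; the real `File`, `Dist` and their 496-to-240-byte change are not reproduced here. The same reasoning applies to futures: `Box::pin(fut)` moves a large state machine to the heap, so the awaiting future only holds a pointer.

```rust
/// Hypothetical stand-in for a large record stored inline in a variant.
pub struct File {
    pub payload: [u8; 480],
}

/// Every value is at least as large as the largest variant,
/// even when it holds the small one.
pub enum DistInline {
    Small(u64),
    Large(File),
}

/// Boxing moves the payload to the heap, so the enum only needs room
/// for a pointer (or a u64) plus a discriminant.
pub enum DistBoxed {
    Small(u64),
    Large(Box<File>),
}
```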
Charlie Marsh	9b24fcd306	Remove verbatim URL from path file location (#998 ) ## Summary I got confused by why `VerbatimUrl` was on `Path`. Since it's directly computed from it, I think we should just compute it as-needed. I think it's also possibly-buggy because the URL is the URL of the _directory_, not the artifact itself, which differs from other distributions.	2024-01-18 22:40:48 -05:00
Charlie Marsh	f17bad0a75	Mark path-based cache entries as stale during install plan (#957 ) ## Summary This is a small correctness improvement that ensures that we avoid using stale cache entries for local dependencies in the install plan. We already have some logic like this in the source distribution builder, but it didn't apply in the install plan, and so we'd end up using stale wheels. Specifically, now, if you create a new local wheel, and run `pip sync`, we'll mark the cache entries as stale and make sure we unzip it and install it. (If the wheel is _already_ installed, we won't reinstall it though, which will be a separate change. This is just about reading from the cache, not the environment.)	2024-01-18 19:13:29 +00:00
Charlie Marsh	a0420114c3	Avoid storing absolute URLs for files (#944 ) ## Summary It turns out that storing an absolute URL for every file caused a significant performance regression. This PR attempts to address the regression with two changes. The first is that we now store the raw string if the URL is an absolute URL. If the URL is relative, we store the base URL alongside the raw relative string. As such, we avoid serializing and deserializing URLs until we need them (later on), except for the base URL. The second is that we now use the internal `Url` crate methods for serializing and deserializing. If you look inside `Url`, its standard serializer and deserializer actually convert it to a string, then parse the string. But the crate exposes some other methods for faster serialization and deserialization (with fewer guarantees). I think this is totally fine since the cache is entirely internal. If we _just_ change the `Url` serialization (and no other code -- so continue to store URLs for every file), then the regression goes down to about 5%: ```shell ❯ python -m scripts.bench \ --puffin-path ./target/release/main \ --puffin-path ./target/release/relative --puffin-path ./target/release/puffin \ scripts/requirements/home-assistant.in --benchmark resolve-warm Benchmark 1: ./target/release/main (resolve-warm) Time (mean ± σ): 496.3 ms ± 4.3 ms [User: 452.4 ms, System: 175.5 ms] Range (min … max): 487.3 ms … 502.4 ms 10 runs Benchmark 2: ./target/release/relative (resolve-warm) Time (mean ± σ): 284.8 ms ± 2.1 ms [User: 245.8 ms, System: 165.6 ms] Range (min … max): 280.3 ms … 288.0 ms 10 runs Benchmark 3: ./target/release/puffin (resolve-warm) Time (mean ± σ): 300.4 ms ± 3.2 ms [User: 255.5 ms, System: 178.1 ms] Range (min … max): 295.4 ms … 305.1 ms 10 runs Summary './target/release/relative (resolve-warm)' ran 1.05 ± 0.01 times faster than './target/release/puffin (resolve-warm)' 1.74 ± 0.02 times faster than './target/release/main (resolve-warm)' ``` So I
considered _just_ making that change. But 5% is kind of borderline... With both of these changes, the regression is down to 1-2%: ``` Benchmark 1: ./target/release/relative (resolve-warm) Time (mean ± σ): 282.6 ms ± 7.4 ms [User: 244.6 ms, System: 181.3 ms] Range (min … max): 275.1 ms … 318.5 ms 30 runs Benchmark 2: ./target/release/puffin (resolve-warm) Time (mean ± σ): 286.8 ms ± 2.2 ms [User: 247.0 ms, System: 169.1 ms] Range (min … max): 282.3 ms … 290.7 ms 30 runs Summary './target/release/relative (resolve-warm)' ran 1.01 ± 0.03 times faster than './target/release/puffin (resolve-warm)' ``` It's consistently ~2%-ish, but at this point it's unclear if that's due to the URL change or some other change between now and then. Closes #943.	2024-01-17 09:15:21 -05:00
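The storage scheme in this commit can be sketched as an enum that keeps the raw string from the index and only builds a full URL on demand. `FileUrl` is hypothetical. Its `resolve` method handles only the simple case of a file name relative to the page's directory. Full resolution per RFC 3986, as the `url` crate's `Url::join` implements it, also handles `../`, absolute paths, queries and fragments.

```rust
use std::sync::Arc;

/// Hypothetical storage for a file URL as found on an index page.
#[derive(Debug, Clone, PartialEq)]
pub enum FileUrl {
    /// The index gave an absolute URL; keep the raw string as-is.
    Absolute(String),
    /// The index gave a relative URL; keep it next to the shared base URL.
    Relative { base: Arc<str>, path: String },
}

impl FileUrl {
    pub fn new(raw: &str, base: &Arc<str>) -> Self {
        // Simplified absolute-URL check; a real implementation would parse.
        if raw.contains("://") {
            FileUrl::Absolute(raw.to_string())
        } else {
            FileUrl::Relative { base: Arc::clone(base), path: raw.to_string() }
        }
    }

    /// Build the full URL only when it is actually needed.
    pub fn resolve(&self) -> String {
        match self {
            FileUrl::Absolute(url) => url.clone(),
            FileUrl::Relative { base, path } => {
                let base: &str = base;
                // Keep the base up to and including its last '/'.
                let dir = match base.rfind('/') {
                    Some(i) => &base[..=i],
                    None => base,
                };
                format!("{}{}", dir, path)
            }
        }
    }
}
```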
konsti	95f3cca28d	Use fs_err in more places (#926 ) Before: ``` error: Failed to download distributions Caused by: Failed to fetch wheel: jaxlib==0.4.23+cuda12.cudnn89 Caused by: Directory not empty (os error 39) ``` After: ``` error: Failed to download distributions Caused by: Failed to fetch wheel: jaxlib==0.4.23+cuda12.cudnn89 Caused by: failed to rename file from /home/konsti/.cache/puffin/.tmpcG7tVP/jaxlib-0.4.23+cuda12.cudnn89-cp310-cp310-manylinux2014_x86_64.whl to /home/konsti/.cache/puffin/wheels-v0/index/9ff50b883297fa9d/jaxlib/jaxlib-0.4.23+cuda12.cudnn89-cp310-cp310-manylinux2014_x86_64 Caused by: Directory not empty (os error 39) ```	2024-01-15 09:39:33 +00:00
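The improvement `fs_err` provides can be approximated with the standard library. Wrap the `io::Error` in a type whose message names the paths and whose `source()` is the original OS error, and keep the original `ErrorKind`. This is a hypothetical sketch, not `fs_err`'s implementation. `rename_with_context` and `RenameError` are illustrative names.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Context-carrying error: the message names both paths, and the
/// underlying OS error stays reachable as the source ("Caused by").
#[derive(Debug)]
struct RenameError {
    from: PathBuf,
    to: PathBuf,
    source: io::Error,
}

impl fmt::Display for RenameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to rename file from {} to {}", self.from.display(), self.to.display())
    }
}

impl Error for RenameError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Hypothetical wrapper around `std::fs::rename` that keeps the original
/// `io::ErrorKind` but attaches the paths to the message.
pub fn rename_with_context(from: &Path, to: &Path) -> io::Result<()> {
    fs::rename(from, to).map_err(|source| {
        let kind = source.kind();
        io::Error::new(kind, RenameError { from: from.to_path_buf(), to: to.to_path_buf(), source })
    })
}
```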
konsti	e9b6b6fa36	Implement `--find-links` as flat indexes (directories in pip-compile) (#912 ) Add directory `--find-links` support for local paths to pip-compile. It seems that pip joins all sources and then picks the best package. We explicitly give find-links packages precedence when the same package exists both on an index and locally by prefilling the `VersionMap`; otherwise, they are added as another index and the existing rules of precedence apply. Internally, the feature is called _flat index_, which is more meaningful than _find links_: We're not looking for links, we're picking up local directories, and (TBD) support another index format that's just a flat list of files instead of a nested index. `RegistryBuiltDist` and `RegistrySourceDist` now use `WheelFilename` and `SourceDistFilename` respectively. The `File` inside `RegistryBuiltDist` and `RegistrySourceDist` gained the ability to represent both a url and a path so that `--find-links` with a url and with a path works the same, both being locked as `<package_name>@<version>` instead of `<package_name> @ <url>`. (This is more of a detail; this PR would still work in general if we stripped that and had directory find links represented as `<package_name> @ file:///path/to/file.ext`.) `PrioritizedDistribution` and `FlatIndex` have been moved to locations where we can use them in the upstack PR. I added a `scripts/wheels` directory with stripped-down wheels to use for testing. We're lacking tests for correct tag priority precedence with flat indexes; I only confirmed this manually, since it is not covered in the pip-compile or pip-sync output. Closes #876	2024-01-15 02:04:10 +00:00
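The precedence rule can be sketched with `BTreeMap::entry`: prefill the version map with flat-index entries, so that index entries only fill gaps. This is hypothetical. Versions are plain strings here, so their ordering is lexicographic rather than PEP 440, and `build_version_map` is not uv's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: (version, source) pairs from `--find-links` are
/// inserted first; registry entries only fill versions still missing.
pub fn build_version_map<'a>(
    flat_index: &[(&'a str, &'a str)],
    registry: &[(&'a str, &'a str)],
) -> BTreeMap<&'a str, &'a str> {
    let mut map = BTreeMap::new();
    for &(version, source) in flat_index {
        map.insert(version, source);
    }
    for &(version, source) in registry {
        // `or_insert` leaves an existing (flat-index) entry untouched.
        map.entry(version).or_insert(source);
    }
    map
}
```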
konsti	5ffbfadf66	Make hashes optional (#910 ) There is no guarantee that indexes provide hashes at all or the sha256 we support specifically. [PEP 503](https://peps.python.org/pep-0503/#specification): > The URL SHOULD include a hash in the form of a URL fragment with the following syntax: #<hashname>=<hashvalue>, where <hashname> is the lowercase name of the hash function (such as sha256) and <hashvalue> is the hex encoded digest. We instead use the url as input to generate a hash when caching.	2024-01-14 16:32:55 -05:00
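This sketch shows the fallback described in this commit: use the index's sha256 when it is present, and derive a key from the URL otherwise. `cache_key` is hypothetical. `DefaultHasher` is used only to keep the sketch dependency-free. Its output is not guaranteed to be stable across Rust releases, so a persistent cache would want a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical cache key: prefer the index-provided digest, and fall back
/// to a hash of the file's URL when the index gives none.
pub fn cache_key(sha256: Option<&str>, url: &str) -> String {
    match sha256 {
        Some(digest) => digest.to_string(),
        None => {
            let mut hasher = DefaultHasher::new();
            url.hash(&mut hasher);
            // 64-bit hash rendered as exactly 16 hex digits.
            format!("{:016x}", hasher.finish())
        }
    }
}
```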
konsti	a99e5e00f2	Use absolute urls in `distribution_type::File` (#917 ) Previously, the url on file could either be a relative or an absolute url, depending on the index, and we would finalize it lazily. Now we finalize the url when converting `pypi_types::File` to `distribution_types::File`. This change is required to make the hashes on `File` optional (https://github.com/astral-sh/puffin/pull/910), which are currently the only unique field usable for caching.	2024-01-14 17:15:24 +00:00
Charlie Marsh	e26dc8e33d	Add support for `prepare_metadata_for_build_wheel` (#842 ) ## Summary This PR adds support for `prepare_metadata_for_build_wheel`, which allows us to determine source distribution metadata without building the source distribution. This represents an optimization for the resolver, as we can skip the expensive build phase for build backends that support it. For reference, `prepare_metadata_for_build_wheel` seems to be supported by: - `hatchling` (as of [1.9.0](https://hatch.pypa.io/latest/history/hatchling/#hatchling-v1.9.0)). - `flit` - `setuptools` In fact, it seems to work for every backend _except_ those using legacy `setup.py`. Closes #599.	2024-01-10 00:07:37 +00:00
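This sketch shows the resolver-side logic, modeled on PEP 517. PEP 517 makes `prepare_metadata_for_build_wheel` optional and tells frontends to call `build_wheel` and inspect the result when the hook is missing. The trait and the two example backends are hypothetical. Real hooks write a `.dist-info` directory rather than returning a string.

```rust
/// Hypothetical view of a PEP 517 build backend from the resolver's side.
pub trait BuildBackend {
    /// Optional hook; `None` models a backend that does not define
    /// `prepare_metadata_for_build_wheel`.
    fn prepare_metadata_for_build_wheel(&self) -> Option<String>;
    /// Mandatory path, modeled as "build the wheel, read its METADATA".
    fn build_wheel_metadata(&self) -> String;
}

/// Returns the metadata and whether a full wheel build was needed.
pub fn resolve_metadata(backend: &dyn BuildBackend) -> (String, bool) {
    match backend.prepare_metadata_for_build_wheel() {
        Some(metadata) => (metadata, false),
        // Without the hook, fall back to the expensive build.
        None => (backend.build_wheel_metadata(), true),
    }
}

/// Example backend that implements the optional hook.
pub struct WithHook;
/// Example backend that only supports building.
pub struct WithoutHook;

impl BuildBackend for WithHook {
    fn prepare_metadata_for_build_wheel(&self) -> Option<String> {
        Some("Name: example\nVersion: 1.0".to_string())
    }
    fn build_wheel_metadata(&self) -> String {
        "Name: example\nVersion: 1.0".to_string()
    }
}

impl BuildBackend for WithoutHook {
    fn prepare_metadata_for_build_wheel(&self) -> Option<String> {
        None
    }
    fn build_wheel_metadata(&self) -> String {
        "Name: example\nVersion: 1.0".to_string()
    }
}
```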
Charlie Marsh	19c6d655b5	Avoid duplicated source distribution handling in url (#841 ) ## Summary Right now, both the callback _and_ the "We have no compatible wheel" paths have a lot of repeated code. This PR changes the callback to _just_ remove all the wheels and handle the download, and the rest of the method following the callback is responsible for finding and building any wheels.	2024-01-08 16:19:54 -05:00
Charlie Marsh	cc9140643e	Rename `metadata` to `built_wheel` in `source/mod.rs` (#840 )	2024-01-08 19:20:20 +00:00
Charlie Marsh	df254087d9	Break `source_dist.rs` into a module (#839 ) ## Summary Finding this file hard to edit and work in since it's gotten quite large.	2024-01-08 19:14:45 +00:00

36 commits