## Summary
It turns out that storing an absolute URL for every file caused a
significant performance regression. This PR attempts to address the
regression with two changes.
The first is that we now store the raw string if the URL is an absolute
URL. If the URL is relative, we store the base URL alongside the raw
relative string. As such, we avoid serializing and deserializing URLs
until we need them (later on), except for the base URL.
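Concretely, the stored type looks roughly like the sketch below (hypothetical names, not necessarily the exact representation in this PR):
```rust
use url::{ParseError, Url};

/// Sketch only: the raw string is kept verbatim, and only the base URL is
/// eagerly parsed.
enum FileLocation {
    /// An absolute URL, stored as its raw string.
    Absolute(String),
    /// A relative URL, stored alongside its already-parsed base.
    Relative { base: Url, path: String },
}

impl FileLocation {
    /// Parse lazily, only once the URL is actually needed.
    fn to_url(&self) -> Result<Url, ParseError> {
        match self {
            FileLocation::Absolute(raw) => Url::parse(raw),
            FileLocation::Relative { base, path } => base.join(path),
        }
    }
}
```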
The second is that we now use the internal `Url` crate methods for
serializing and deserializing. If you look inside `Url`, its standard
serializer and deserializer actually convert it to a string, then
parse the string. But the crate exposes some other methods for faster
serialization and deserialization (with fewer guarantees). I think this
is totally fine since the cache is entirely internal.
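A sketch of wiring that up, assuming the `url` crate's doc(hidden) `serialize_internal`/`deserialize_internal` methods (available with its `serde` feature); the wrapper type is illustrative:
```rust
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use url::Url;

/// Hypothetical newtype routing serde through the faster internal path,
/// which skips re-parsing on deserialization.
struct CacheUrl(Url);

impl Serialize for CacheUrl {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        Url::serialize_internal(&self.0, serializer)
    }
}

impl<'de> Deserialize<'de> for CacheUrl {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        Url::deserialize_internal(deserializer).map(CacheUrl)
    }
}
```
This trades validation for speed, which seems acceptable given that the cache is only ever written and read by the same binary.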
If we _just_ change the `Url` serialization (and no other code -- so
continue to store URLs for every file), then the regression goes down to
about 5%:
```shell
❯ python -m scripts.bench \
--puffin-path ./target/release/main \
--puffin-path ./target/release/relative --puffin-path ./target/release/puffin \
scripts/requirements/home-assistant.in --benchmark resolve-warm
Benchmark 1: ./target/release/main (resolve-warm)
Time (mean ± σ): 496.3 ms ± 4.3 ms [User: 452.4 ms, System: 175.5 ms]
Range (min … max): 487.3 ms … 502.4 ms 10 runs
Benchmark 2: ./target/release/relative (resolve-warm)
Time (mean ± σ): 284.8 ms ± 2.1 ms [User: 245.8 ms, System: 165.6 ms]
Range (min … max): 280.3 ms … 288.0 ms 10 runs
Benchmark 3: ./target/release/puffin (resolve-warm)
Time (mean ± σ): 300.4 ms ± 3.2 ms [User: 255.5 ms, System: 178.1 ms]
Range (min … max): 295.4 ms … 305.1 ms 10 runs
Summary
'./target/release/relative (resolve-warm)' ran
1.05 ± 0.01 times faster than './target/release/puffin (resolve-warm)'
1.74 ± 0.02 times faster than './target/release/main (resolve-warm)'
```
So I considered _just_ making that change. But 5% is kind of
borderline...
With both of these changes, the regression is down to 1-2%:
```shell
Benchmark 1: ./target/release/relative (resolve-warm)
Time (mean ± σ): 282.6 ms ± 7.4 ms [User: 244.6 ms, System: 181.3 ms]
Range (min … max): 275.1 ms … 318.5 ms 30 runs
Benchmark 2: ./target/release/puffin (resolve-warm)
Time (mean ± σ): 286.8 ms ± 2.2 ms [User: 247.0 ms, System: 169.1 ms]
Range (min … max): 282.3 ms … 290.7 ms 30 runs
Summary
'./target/release/relative (resolve-warm)' ran
1.01 ± 0.03 times faster than './target/release/puffin (resolve-warm)'
```
It's consistently ~2%, but at this point it's unclear whether that's due
to the URL change or to some other change that landed in the interim.
Closes #943.
There is no guarantee that indexes provide hashes at all, or
specifically the sha256 we support. [PEP
503](https://peps.python.org/pep-0503/#specification):
> The URL SHOULD include a hash in the form of a URL fragment with the
> following syntax: `#<hashname>=<hashvalue>`, where `<hashname>` is the
> lowercase name of the hash function (such as sha256) and `<hashvalue>`
> is the hex encoded digest.

We instead use the URL itself as the input to generate a hash when caching.
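A minimal sketch of that idea, assuming a SHA-256 digest (the concrete hash function and the function name are illustrative):
```rust
use sha2::{Digest, Sha256};

/// Derive a stable cache key from the file URL itself, rather than relying
/// on the index to provide a hash fragment.
fn cache_key_for(url: &url::Url) -> String {
    let digest = Sha256::digest(url.as_str().as_bytes());
    // Hex-encode the digest so it can be used as a file name.
    digest.iter().map(|byte| format!("{byte:02x}")).collect()
}
```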
Changes `File::size` from a `usize` to a `u64`.
The motivation: with tensorflow wheels already at 475 MB
(https://pypi.org/project/tensorflow/2.15.0.post1/#files), we're only
one order of magnitude away from the ~4 GiB limit of a 32-bit `usize`,
and a fixed-width `u64` avoids such target-dependent failures.
Instead of trying to fix up _all_ the invalid version specifiers on PyPI
and elsewhere, this filters out distributions whose
`requires-python` version specifiers even
`LenientVersionSpecifiers` couldn't parse, as opposed to failing
entirely, which we currently do.
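A sketch of the filtering; `File` and `parse_lenient` here are stand-ins for the real types, not their actual signatures:
```rust
/// Illustrative placeholder for the distribution's metadata.
struct File {
    requires_python: Option<String>,
}

/// Placeholder for `LenientVersionSpecifiers` parsing.
fn parse_lenient(spec: &str) -> Result<(), ()> {
    if spec.contains(",,") { Err(()) } else { Ok(()) }
}

/// Keep only distributions whose `requires-python` is absent or parseable,
/// instead of failing the whole resolution on the first bad specifier.
fn usable_files(files: Vec<File>) -> Vec<File> {
    files
        .into_iter()
        .filter(|file| {
            file.requires_python
                .as_deref()
                .map_or(true, |spec| parse_lenient(spec).is_ok())
        })
        .collect()
}
```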
It would be nicer to model this through an invalid-distribution pubgrub
type, together with, e.g., source dists with an unknown extension, so
that the version itself still shows up in the error trace.
At the same time, we reduce the log level for fixups from warning to
trace, as they are not actionable for the user.
This PR builds on #780 by making both version parsing faster, and
perhaps more importantly, making version comparisons much faster.
Overall, these changes result in a considerable improvement for the
`boto3.in` workload. Here's the status quo:
```
$ time puffin pip-compile --no-build --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/requirements/boto3.in
Resolved 31 packages in 34.56s
real 34.579
user 34.004
sys 0.413
maxmem 2867 MB
faults 0
```
And now with this PR:
```
$ time puffin pip-compile --no-build --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/requirements/boto3.in
Resolved 31 packages in 9.20s
real 9.218
user 8.919
sys 0.165
maxmem 463 MB
faults 0
```
This particular workload gets stuck in pubgrub doing resolution, and
thus benefits mightily from a faster `Version::cmp` routine. With that
said, this change does also help a fair bit with "normal" runs:
```
$ hyperfine -w10 \
"puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in" \
"puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in"
Benchmark 1: puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in
Time (mean ± σ): 337.5 ms ± 3.9 ms [User: 310.5 ms, System: 73.2 ms]
Range (min … max): 333.6 ms … 343.4 ms 10 runs
Benchmark 2: puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in
Time (mean ± σ): 189.8 ms ± 3.0 ms [User: 168.1 ms, System: 78.4 ms]
Range (min … max): 185.0 ms … 196.2 ms 15 runs
Summary
puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in ran
1.78 ± 0.03 times faster than puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in
```
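For intuition on why `Version::cmp` can get so much faster, here's a sketch of the general trick (the layout is illustrative, not this PR's actual encoding): pack the common case of a small release like `1.2.3` into one integer, so comparison becomes a single `u64` compare.
```rust
/// Sketch only: handles plain releases of up to four segments, each < 2^16.
/// Pre/post/dev releases and larger versions would fall back to a general
/// representation.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct PackedVersion(u64);

impl PackedVersion {
    fn new(release: &[u64]) -> Option<Self> {
        if release.len() > 4 || release.iter().any(|&n| n >= 1 << 16) {
            return None;
        }
        let mut packed = 0u64;
        for (i, &n) in release.iter().enumerate() {
            // The most significant 16 bits hold the first segment, so the
            // derived integer ordering matches segment-wise comparison, and
            // missing trailing segments are zero, matching PEP 440
            // (`1.2` == `1.2.0`).
            packed |= n << (16 * (3 - i));
        }
        Some(PackedVersion(packed))
    }
}
```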
There is perhaps some future work here (detailed in the commit
messages), but I suspect it would be more fruitful to explore ways of
making resolution itself and/or deserialization faster.
Fixes #373, Closes #396.
This PR does a bit of refactoring to the pep440 crate, and in
particular around the error types. This PR is meant to be a precursor
to another PR that does some surgery (both in parsing and in `Version`
representation) that benefits somewhat from this refactoring.
As usual, please review commit-by-commit.
## Summary
This PR adds support for relative URLs in the simple JSON responses. We
already support relative URLs for HTML responses, but the handling has
been consolidated between the two. Similar to index URLs, we now store
the base alongside the metadata, and use the base when resolving the
URL.
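Resolution itself is then just a join against the stored base; roughly (function name illustrative):
```rust
use url::Url;

/// `Url::join` resolves a relative reference against the base, and passes
/// absolute URLs through unchanged, so both cases take the same path.
fn resolve(base: &Url, file_url: &str) -> Result<Url, url::ParseError> {
    base.join(file_url)
}
```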
Closes #455.
## Test Plan
`cargo test` (to test HTML indexes). Separately, I also ran `cargo run
-p puffin-cli -- pip-compile requirements.in -n
--index-url=http://localhost:3141/packages/pypi/+simple` on the
`zb/relative` branch with `packse` running, and forced both HTML and
JSON by limiting the `accept` header.
## Summary
This PR makes the `pypi_types::File` a response-only type (i.e., a type
that's only used when deserializing over the wire), and adds a separate
internal `File` type. Right now, the representations are similar, but
already, we can avoid the "lenient" deserialization on our internal
`File` type, and avoid the special-casing of the property names that's
required in the JSON. Over time, we can evolve this representation
entirely separately from the representation we receive from PyPI and
other indexes.
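As an illustrative sketch (the field sets are made up, not the actual structs):
```rust
use serde::Deserialize;

/// The wire type carries the lenient serde attributes that the index JSON
/// needs, e.g. the renamed property...
#[derive(Deserialize)]
struct WireFile {
    filename: String,
    url: String,
    #[serde(rename = "requires-python")]
    requires_python: Option<String>,
}

/// ...while the internal type is plain Rust and free to evolve on its own.
struct File {
    filename: String,
    url: String,
    requires_python: Option<String>,
}

impl From<WireFile> for File {
    fn from(wire: WireFile) -> Self {
        Self {
            filename: wire.filename,
            url: wire.url,
            requires_python: wire.requires_python,
        }
    }
}
```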
## Summary
Now, `puffin_warnings::warn_once` and `puffin_warnings::warn` will go to
`stderr`, as long as the user isn't running under `--quiet`. Previously,
these went through `tracing`, and so were only visible when running
under `--verbose`.
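For illustration, the deduplication half of `warn_once` could look something like this minimal sketch (the real macro also checks `--quiet` before printing):
```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

static SEEN: OnceLock<Mutex<HashSet<String>>> = OnceLock::new();

fn warn_once(message: &str) {
    let seen = SEEN.get_or_init(|| Mutex::new(HashSet::new()));
    // Only print the first occurrence of any given message, on stderr.
    if seen.lock().unwrap().insert(message.to_string()) {
        eprintln!("warning: {message}");
    }
}
```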
After this change, two wheel caches remain: `built-wheels-v0` and
`wheels-v0`; docs screenshots below. Each contains the wheel
metadata, the cache policy, and the zipped or unzipped wheel under the
same name.
The zipped/unzipped strategy is as follows: in `pip-compile`, when we
build a wheel, we store it zipped. When `pip-sync` or a source dist
build in `pip-compile` needs to install the wheel, we unzip it, remove
the zip file, and replace it with the unzipped wheel.
This removes `WheelCache` and `UrlIndex` in favor of `Cache` plus
`WheelCache`. The non-built wheel cache now considers index URLs and,
for URL wheels, the URL itself.
I'm unsure whether we need the `Unzipper` type; it could just be a
function.
I moved `no_index` into `IndexUrls` and started using `IndexUrl` up
through the clap level.
I left a number of TODOs in the code, namely performing the actual
invalidation of unzipped wheels and making the `InstallPlan` understand
cache invalidation (i.e., uninstalling wheels when their remote has
changed).

## Summary
This PR adds support for local path dependencies. The approach mostly
just falls out of our existing approach and infrastructure for Git and
URL dependencies.
Closes https://github.com/astral-sh/puffin/issues/436. (We'll open a
separate issue for editable installs.)
## Test Plan
Added `pip-compile` tests that pre-download a wheel or source
distribution, then install it via local path.
A consistent cache structure for remote wheel metadata:
* `<wheel metadata cache>/pypi/foo-1.0.0-py3-none-any.json`
* `<wheel metadata cache>/<digest(index-url)>/foo-1.0.0-py3-none-any.json`
* `<wheel metadata cache>/url/<digest(url)>/foo-1.0.0-py3-none-any.json`
The source dist caching will use a similar structure (#468).
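A sketch of computing those paths; the `digest` values stand in for whatever stable hash the cache uses, and all names are illustrative:
```rust
use std::path::{Path, PathBuf};

/// The three buckets from the layout above.
enum MetadataSource<'a> {
    Pypi,
    Index { index_url_digest: &'a str },
    Url { url_digest: &'a str },
}

fn metadata_path(root: &Path, source: MetadataSource, filename: &str) -> PathBuf {
    let bucket = match source {
        MetadataSource::Pypi => root.join("pypi"),
        MetadataSource::Index { index_url_digest } => root.join(index_url_digest),
        MetadataSource::Url { url_digest } => root.join("url").join(url_digest),
    };
    bucket.join(format!("{filename}.json"))
}
```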
Deduplicate the lenient parsing code between version specifiers and
`Requirement`. Use `warn_once!`, since the warnings showed up multiple
times for me. Fix the macro hygiene in `warn_once!`.
This works by filtering out files with a more recent upload time, so if
the index you use does not provide upload times, the results might be
inaccurate. PyPI provides upload times for all files; that is, the
field is non-nullable in the warehouse schema, even though the simple
API PEP does not know about this field.
If you have only PyPI dependencies, this means deterministic,
reproducible(!) resolution. We could try doing the same for Git repos,
but it doesn't seem worth the effort; I'd recommend pinning commits,
since Git histories are arbitrarily malleable, and if you care about
reproducibility you should not use Git dependencies but a custom
index.
Timestamps are given either as RFC 3339 timestamps such as
`2006-12-02T02:07:43Z` or as UTC dates in the same format such as
`2006-12-02`. Dates are interpreted as including that entire day, i.e.,
until midnight UTC at its end. Supporting a date-only form is required
to make this ergonomic, and midnight seems like a sensible cutoff.
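A sketch of that parsing rule using `chrono` (function name illustrative):
```rust
use chrono::{DateTime, NaiveDate, NaiveTime, Utc};

/// Accept a full RFC 3339 timestamp, or a bare date meaning "include that
/// whole day", i.e. a cutoff at the following midnight UTC.
fn parse_exclude_newer(input: &str) -> Option<DateTime<Utc>> {
    if let Ok(timestamp) = DateTime::parse_from_rfc3339(input) {
        return Some(timestamp.with_timezone(&Utc));
    }
    let date = NaiveDate::parse_from_str(input, "%Y-%m-%d").ok()?;
    Some(date.succ_opt()?.and_time(NaiveTime::MIN).and_utc())
}
```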
In action for `pandas`:
```console
$ target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
Resolved 6 packages in 679ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
numpy==1.26.2
# via pandas
pandas==2.1.3
python-dateutil==2.8.2
# via pandas
pytz==2023.3.post1
# via pandas
six==1.16.0
# via python-dateutil
tzdata==2023.3
# via pandas
$ target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
Resolved 5 packages in 655ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
numpy==1.23.4
# via pandas
pandas==1.5.1
python-dateutil==2.8.2
# via pandas
pytz==2022.6
# via pandas
six==1.16.0
# via python-dateutil
$ target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
Resolved 5 packages in 594ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
numpy==1.21.4
# via pandas
pandas==1.3.4
python-dateutil==2.8.2
# via pandas
pytz==2021.3
# via pandas
six==1.16.0
# via python-dateutil
```
[PEP 691](https://peps.python.org/pep-0691/#project-detail) has slightly
different, more relaxed rules around file metadata. These changes are
now reflected in the `File` struct. This will make it easier to support
alternative indices.
I had expected that I'd need to introduce a separate type for this, so
I'm happy it's just two more `Option`s and an alias.
Part of #412
Implement two behaviors for yanked versions:
* During `pip-compile`, yanked versions are filtered out entirely; we
currently treat them as if they don't exist. This leads to confusing
error messages, because a version that does exist seems to have suddenly
disappeared.
* During `pip-sync`, we warn when we fetch a remote distribution that
has been yanked. We currently don't warn on cached or installed
distributions that have been yanked.
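For reference, PEP 592 defines `yanked` as either a boolean or a string giving the reason, where any non-`false` value means the file is yanked; a sketch of modeling that (not this PR's actual type):
```rust
use serde::Deserialize;

#[derive(Deserialize)]
#[serde(untagged)]
enum Yanked {
    Bool(bool),
    Reason(String),
}

impl Yanked {
    /// A bare `true` or any reason string both count as yanked.
    fn is_yanked(&self) -> bool {
        match self {
            Yanked::Bool(yanked) => *yanked,
            Yanked::Reason(_) => true,
        }
    }
}
```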
Filter out source dists and wheels whose `requires-python` from the
simple API is incompatible with the current Python version.
This change surfaced an important problem: when we use a fake Python
version for resolving, building source distributions breaks down,
because we can only build with versions we actually have.
This change became surprisingly big. The tests now require Python 3.7
to be installed, but changing that would mean an even bigger change.
Fixes #388.
We now write `direct_url.json` when installing, and _skip_
installing if we find a package installed via the direct URL that the
user is requesting.
There are a lot of TODOs, especially around cleaning up the `Source`
abstraction and its relationship to `DirectUrl`. I'm gonna keep working
on these today, but this works and makes the requirements clear.
Closes #332.