Commit graph

58 commits

Author SHA1 Message Date
Charlie Marsh
d9cc9dbf88
Improve error message when editable requirement doesn't exist (#1024)
Making these a lot clearer in the common case by reducing the depth of
the error.
2024-01-20 12:59:18 -05:00
Charlie Marsh
5adb08a304
Allow relative paths and environment variables in all editable representations (#1000)
## Summary

I don't know if this is actually a good change, but it tries to make the
editable install experience more consistent. Specifically, we now
support...

```
# Use a relative path with a `file://` prefix.
# Prior to this PR, we supported `file:../foo`, but not `file://../foo`, which felt inconsistent.
-e file://../foo

# Use environment variables with paths, not just URLs.
# Prior to this PR, we supported `file://${PROJECT_ROOT}/../foo`, but not the below.
-e ${PROJECT_ROOT}/../foo
```

Importantly, `-e file://../foo` is actually not supported by pip... `-e
file:../foo` _is_ supported though. We support both, as of this PR. Open
to feedback.
2024-01-19 09:00:37 -05:00
konsti
cd2fb6fd60
Box PrioritizedDistribution (#948)
On top of https://github.com/astral-sh/puffin/pull/947, we can also box
`PrioritizedDistribution`.

In a simple benchmark, this seems to slightly improve performance when
comparing only this commit to main, even though the benchmark is too
noisy to establish significance:

```
$ hyperfine --warmup 30 --runs 300 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent"
  Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
    Time (mean ± σ):      83.6 ms ±   2.0 ms    [User: 77.7 ms, System: 20.0 ms]
    Range (min … max):    81.4 ms …  98.2 ms    300 runs

    Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

  Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
    Time (mean ± σ):      80.8 ms ±   2.2 ms    [User: 75.4 ms, System: 19.5 ms]
    Range (min … max):    78.6 ms …  98.6 ms    300 runs

    Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

  Summary
    target/profiling/puffin-dev resolve meine_stadt_transparent ran
      1.03 ± 0.04 times faster than target/profiling/main-dev resolve meine_stadt_transparent
```

The effect on type sizes however is considerable ([downstack
PR](https://gist.github.com/konstin/38e6c774db541db46d61f1d4ea6b498f)
vs. [this
PR](https://gist.github.com/konstin/003a77fe7d7d246b0d535e3fc843cb36)):

```patch
--- branch.txt  2024-01-17 14:26:01.826085176 +0100
+++ boxed-prioritized-dist.txt  2024-01-17 14:25:57.101900963 +0100
@@ -1,19 +1,3 @@
-9264 alloc::collections::btree::node::InternalNode<pep440_rs::version::Version, distribution_types::PrioritizedDistribution> align=8
-   9168 data
-     96 edges
-
-9264 alloc::collections::btree::node::InternalNode<pep440_rs::Version, distribution_types::PrioritizedDistribution> align=8
-   9168 data
-     96 edges
-
-9168 alloc::collections::btree::node::LeafNode<pep440_rs::version::Version, distribution_types::PrioritizedDistribution> align=8
-   9064 vals
-     88 keys
-
-9168 alloc::collections::btree::node::LeafNode<pep440_rs::Version, distribution_types::PrioritizedDistribution> align=8
-   9064 vals
-     88 keys
-
 8992 tokio::sync::mpsc::block::Block<hyper::client::dispatch::Envelope<http::request::Request<reqwest::async_impl::body::ImplStream>, http::response::Response<hyper::body::body::Body>>> align=8
    8960 values
      32 header
@@ -74,10 +58,23 @@
          40 __tracing_attr_span
      64 variant Unresumed, Returned, Panicked

+5648 {async fn body@crates/puffin-client/src/registry_client.rs:224:5: 224:30} align=8
+   5647 variant Suspend0
+       5576 __awaitee align=8
+         40 __tracing_attr_span
```
2024-01-19 10:44:41 +01:00
konsti
47fc90d1b3
Reduce stack usage by boxing File in Dist, CachePolicy and large futures (#1004)
This is https://github.com/astral-sh/puffin/pull/947 again but this time
merging into main instead of downstack, sorry for the noise.

---

Windows has a default stack size of 1MB, which makes puffin often fail
with stack overflows. The PR reduces stack size by three changes:

* Boxing `File` in `Dist`, reducing the size from 496 to 240.
* Boxing the largest futures.
* Boxing `CachePolicy`

## Method

Debugging happened on linux using
https://github.com/astral-sh/puffin/pull/941 to limit the stack size to
1MB. Used ran the command below.

```
RUSTFLAGS=-Zprint-type-sizes cargo +nightly build -p puffin-cli -j 1 > type-sizes.txt && top-type-sizes -w -s -h 10 < type-sizes.txt > sizes.txt
```

The main drawback is top-type-sizes not saying what the `__awaitee` is,
so it requires manually looking up with a future with matching size.

When the `brotli` features on `reqwest` is active, a lot of brotli types
show up. Toggling this feature however seems to have no effect. I assume
they are false positives since the `brotli` crate has elaborate control
about allocation. The sizes are therefore shown with the feature off.

## Results

The largest future goes from 12208B to 6416B, the largest type
(`PrioritizedDistribution`, see also #948) from 17448B to 9264B. Full
diff: https://gist.github.com/konstin/62635c0d12110a616a1b2bfcde21304f

For the second commit, i iteratively boxed the largest file until the
tests passed, then with an 800KB stack limit looked through the
backtrace of a failing test and added some more boxing.

Quick benchmarking showed no difference:

```console
$ hyperfine --warmup 2 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent" 
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      49.2 ms ±   3.0 ms    [User: 39.8 ms, System: 24.0 ms]
  Range (min … max):    46.6 ms …  63.0 ms    55 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
  Time (mean ± σ):      47.4 ms ±   3.2 ms    [User: 41.3 ms, System: 20.6 ms]
  Range (min … max):    44.6 ms …  60.5 ms    62 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  target/profiling/puffin-dev resolve meine_stadt_transparent ran
    1.04 ± 0.09 times faster than target/profiling/main-dev resolve meine_stadt_transparent
```
2024-01-19 09:38:36 +00:00
Charlie Marsh
9b24fcd306
Remove verbatim URL from path file location (#998)
## Summary

I got confused by why `VerbatimUrl` was on `Path`. Since it's directly
computed from it, I think we should just compute it as-needed. I think
it's also possibly-buggy because the URL is the URL of the _directory_,
not the artifact itself, which differs from other distributions.
2024-01-18 22:40:48 -05:00
Charlie Marsh
f9154e8297
Add release workflow (#961)
## Summary

This PR adds a release workflow powered by `cargo-dist`. It's similar to
the version that's PR'd in Ruff
(https://github.com/astral-sh/ruff/pull/9559), with the exception that
it doesn't include the Docker build or the "update dependents" step for
pre-commit.
2024-01-18 15:44:11 -05:00
Charlie Marsh
f17bad0a75
Mark path-based cache entries as stale during install plan (#957)
## Summary

This is a small correctness improvement that ensures that we avoid using
stale cache entries for local dependencies in the install plan. We
already have some logic like this in the source distribution builder,
but it didn't apply in the install plan, and so we'd end up using stale
wheels.

Specifically, now, if you create a new local wheel, and run `pip sync`,
we'll mark the cache entries as stale and make sure we unzip it and
install it. (If the wheel is _already_ installed, we won't reinstall it
though, which will be a separate change. This is just about reading from
the cache, not the environment.)
2024-01-18 19:13:29 +00:00
Charlie Marsh
a0420114c3
Avoid storing absolute URLs for files (#944)
## Summary

It turns out that storing an absolute URL for every file caused a
significant performance regression. This PR attempts to address the
regression with two changes.

The first is that we now store the raw string if the URL is an absolute
URL. If the URL is relative, we store the base URL alongside the raw
relative string. As such, we avoid serializing and deserializing URLs
until we need them (later on), except for the base URL.

The second is that we now use the internal `Url` crate methods for
serializing and deserializing. If you look inside `Url`, its standard
serializer and deserialization actually convert it to a string, then
parse the string. But the crate exposes some other methods for faster
serialization and deserialization (with fewer guarantees). I think this
is totally fine since the cache is entirely internal.

If we _just_ change the `Url` serialization (and no other code -- so
continue to store URLs for every file), then the regression goes down to
about 5%:

```shell
❯ python -m scripts.bench \
        --puffin-path ./target/release/main \
        --puffin-path ./target/release/relative --puffin-path ./target/release/puffin \
        scripts/requirements/home-assistant.in --benchmark resolve-warm
Benchmark 1: ./target/release/main (resolve-warm)
  Time (mean ± σ):     496.3 ms ±   4.3 ms    [User: 452.4 ms, System: 175.5 ms]
  Range (min … max):   487.3 ms … 502.4 ms    10 runs

Benchmark 2: ./target/release/relative (resolve-warm)
  Time (mean ± σ):     284.8 ms ±   2.1 ms    [User: 245.8 ms, System: 165.6 ms]
  Range (min … max):   280.3 ms … 288.0 ms    10 runs

Benchmark 3: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     300.4 ms ±   3.2 ms    [User: 255.5 ms, System: 178.1 ms]
  Range (min … max):   295.4 ms … 305.1 ms    10 runs

Summary
  './target/release/relative (resolve-warm)' ran
    1.05 ± 0.01 times faster than './target/release/puffin (resolve-warm)'
    1.74 ± 0.02 times faster than './target/release/main (resolve-warm)'
```

So I considered _just_ making that change. But 5% is kind of
borderline...

With both of these changes, the regression is down to 1-2%:

```
Benchmark 1: ./target/release/relative (resolve-warm)
  Time (mean ± σ):     282.6 ms ±   7.4 ms    [User: 244.6 ms, System: 181.3 ms]
  Range (min … max):   275.1 ms … 318.5 ms    30 runs

Benchmark 2: ./target/release/puffin (resolve-warm)
  Time (mean ± σ):     286.8 ms ±   2.2 ms    [User: 247.0 ms, System: 169.1 ms]
  Range (min … max):   282.3 ms … 290.7 ms    30 runs

Summary
  './target/release/relative (resolve-warm)' ran
    1.01 ± 0.03 times faster than './target/release/puffin (resolve-warm)'
```

It's consistently ~2%-ish, but at this point it's unclear if that's due
to the URL change or something other change between now and then.

Closes #943.
2024-01-17 09:15:21 -05:00
Charlie Marsh
249ca10765
Move Puffin subcommands to a pip namespace (#921)
## Summary

This makes the separation clearer between the legacy `pip` API and the
API we'll add in the future for the package manager itself. It also
enables seamless `puffin pip` aliasing for those that want it.

Closes #918.
2024-01-15 16:36:45 +00:00
Charlie Marsh
e54fdea93f
Continue to respect --find-links with --no-index (#931)
Like `pip`, we should allow `--find-links` with `--no-index`.
2024-01-15 16:19:27 +00:00
Charlie Marsh
42888a9609
Share flat index across resolutions (#930)
## Summary

This PR restructures the flat index fetching in a few ways:

1. It now lives in its own `FlatIndexClient`, since it felt a bit
awkward (in my opinion) for it to live in `RegistryClient`.
2. We now fetch the `FlatIndex` outside of the resolver. This has a few
benefits: (1) the resolver construct is no longer `async` and no longer
returns `Result`, which feels better for a resolver; and (2) we can
share the `FlatIndex` across resolutions rather than re-fetching it for
every source distribution build.
2024-01-15 11:02:02 -05:00
konsti
82ff136a74
Add find links supports to pip-sync (#914)
Closes #877
2024-01-15 03:04:55 +00:00
konsti
f63776b894
Support HTML indexes in --find-links (#913)
The simple html format parser luckily seems to work for find links too,
at least it can parse
https://storage.googleapis.com/jax-releases/jax_cuda_releases.html.
2024-01-15 02:54:34 +00:00
konsti
e9b6b6fa36
Implement --find-links as flat indexes (directories in pip-compile) (#912)
Add directory `--find-links` support for local paths to pip-compile.

It seems that pip joins all sources and then picks the best package. We
explicitly give find links packages precedence if the same exists on an
index and locally by prefilling the `VersionMap`, otherwise they are
added as another index and the existing rules of precedence apply.

Internally, the feature is called _flat index_, which is more meaningful
than _find links_: We're not looking for links, we're picking up local
directories, and (TBD) support another index format that's just a flat
list of files instead of a nested index.

`RegistryBuiltDist` and `RegistrySourceDist` now use `WheelFilename` and
`SourceDistFilename` respectively. The `File` inside `RegistryBuiltDist`
and `RegistrySourceDist` gained the ability to represent both a url and
a path so that `--find-links` with a url and with a path works the same,
both being locked as `<package_name>@<version>` instead of
`<package_name> @ <url>`. (This is more of a detail, this PR in general
still work if we strip that and have directory find links represented as
`<package_name> @ file:///path/to/file.ext`)

`PrioritizedDistribution` and `FlatIndex` have been moved to locations
where we can use them in the upstack PR.

I added a `scripts/wheels` directory with stripped down wheels to use
for testing.

We're lacking tests for correct tag priority precedence with flat
indexes, i only confirmed this manually since it is not covered in the
pip-compile or pip-sync output.

Closes #876
2024-01-15 02:04:10 +00:00
konsti
5ffbfadf66
Make hashes optional (#910)
There is no guarantee that indexes provide hashes at all or the sha256
we support specifically. [PEP
503](https://peps.python.org/pep-0503/#specification):

> The URL SHOULD include a hash in the form of a URL fragment with the
following syntax: #<hashname>=<hashvalue>, where <hashname> is the
lowercase name of the hash function (such as sha256) and <hashvalue> is
the hex encoded digest.

We instead use the url as input to generate a hash when caching.
2024-01-14 16:32:55 -05:00
konsti
a53bdeba4c
Remove base from RegistryBuiltDist and RegistrySourceDist (#919)
Follow-up to https://github.com/astral-sh/puffin/pull/917 i found
rebasing the find-links PRs, this field became unused through the
absolute URLs.
2024-01-14 17:46:16 +00:00
konsti
a99e5e00f2
Use absolute urls in distribution_type::File (#917)
Previously, the url on file could either be a relative or an absolute
url, depending on the index, and we would finalize it lazily. Now we
finalize the url when converting `pypi_types::File` to
`distribution_types::File`. This change is required to make the hashes
on `File` optional (https://github.com/astral-sh/puffin/pull/910), which
are currently the only unique field usable for caching.
2024-01-14 17:15:24 +00:00
Charlie Marsh
d3f65c317d
Avoid some additional clones for PackageName (#896) 2024-01-12 17:54:40 +00:00
konsti
8c2b7d55af
Cleanup deps and docs (#882)
Fix warnings from `cargo +nightly udeps` and `cargo doc`.

Removes all mentions of regex from pep440_rs.
2024-01-11 10:43:40 +00:00
konsti
ee6d809b60
Remove unused Result (#849)
Remove some dead code, seems to be a refactoring oversight
2024-01-09 16:35:10 +00:00
konsti
5b0b072e3c
Allow files >4GB on 32-bit platforms (#847)
Changes `File::size` from a `usize` to a `u64`.

The motivations are that with tensorflow wheels being 475 MB
(https://pypi.org/project/tensorflow/2.15.0.post1/#files), we're already
only one order of magnitude away and to avoid target dependent failures.
2024-01-09 17:31:49 +01:00
konsti
b1edecdf1f
Filter out files with invalid requires python specifiers (#775)
Instead of trying to fixup _all_ the invalid version specifiers on pypi
and elsewhere, this filters out distributions with invalid
`requires-python` version specifiers that even
`LenientVersionSpecifiers` couldn't parse, as opposed to failing
entirely, which we currently do.

I would be nicer to model through an invalid distribution pubgrub type,
together with e.g. source dists with an unknown extension, so that the
version itself still shows up in the error trace.

At the same time, we reduce the log level for fixups from warning to
trace, as they are not actionable for the user.
2024-01-09 02:46:27 +00:00
konsti
0c5ca1cdd8
Delete unused file (#772)
This is a duplicate that's not used anymore, probably a refactoring
artifact.
2024-01-04 11:32:12 +00:00
konsti
2d4cb1ebf2
Rust 1.75 (#736)
The `async fn` and return-position `impl Trait` in traits improve
`BuildContext` ergonomics. The traits use `impl Future` over `async fn`
to make the send bound explicit
(https://blog.rust-lang.org/2023/12/21/async-fn-rpit-in-traits.html).

The remaining changes are due to clippy.
2023-12-28 16:08:35 -04:00
konsti
0ebff943e4
Finish install-many with pypi 10k most dependents (#732)
This PR combines three small changes to finish up the install-many
testing.

* Download pypi_10k_most_dependents.txt in script I'd like to have the
setup process of the large scale checks automated.
* Some install-many dev script improvements 
* Fix mkl_fft-1.3.6-58-cp310-cp310-manylinux2014_x86_64.whl:
mkl_fft-1.3.6-58-cp310-cp310-manylinux2014_x86_64.whl has multiple
Wheel-Version entries, we have to ignore that like pip

Apart from the mkl-fft fix the only other errors i've seen showing up
are
https://github.com/astral-sh/puffin/issues/520#issuecomment-1869625642.
2023-12-27 09:42:51 -05:00
Charlie Marsh
007f52bb4e
Add support for relative URLs in simple metadata responses (#721)
## Summary

This PR adds support for relative URLs in the simple JSON responses. We
already support relative URLs for HTML responses, but the handling has
been consolidated between the two. Similar to index URLs, we now store
the base alongside the metadata, and use the base when resolving the
URL.

Closes #455.

## Test Plan

`cargo test` (to test HTML indexes). Separately, I also ran `cargo run
-p puffin-cli -- pip-compile requirements.in -n
--index-url=http://localhost:3141/packages/pypi/+simple` on the
`zb/relative` branch with `packse` running, and forced both HTML and
JSON by limiting the `accept` header.
2023-12-27 08:53:21 -05:00
Charlie Marsh
188ab75769
Split File into internal and external type (#729)
## Summary

This PR makes the `pypi_types::File` a response-only type (i.e., a type
that's only used when deserializing over the wire), and adds a separate
internal `File` type. Right now, the representations are similar, but
already, we can avoid the "lenient" deserialization on our internal
`File` type, and avoid the special-casing of the property names that's
required in the JSON. Over time, we can evolve this representation
entirely separately from the representation we receive from PyPI and
other indexes.
2023-12-25 15:42:28 -05:00
Charlie Marsh
6ff21374dc
Split puffin-cache into Puffin-specific and generic utilities (#728)
This crate started off as generic caching utilities, but we started
adding a lot of Puffin-specific stuff (like the cache buckets
abstraction that knows about Git vs. direct URL vs. indexes and so on).
This PR moves the generic stuff into a new `cache-key` crate.
2023-12-25 14:38:56 +00:00
Charlie Marsh
ad34bb02a9
Modify some inconsistent exports (#724) 2023-12-24 22:30:03 +00:00
Charlie Marsh
3660d8a08e
Introduce separate traits for ahead-of-time and installed metadata (#692)
This is a pure refactor to follow-up #690, to separate the metadata that
we know upfront about distributions (like the version, for
registry-based distributions) vs. the metadata that requires building
(like the version, for URL-based distributions).
2023-12-18 22:37:45 +00:00
Charlie Marsh
31afb39a10
Show URLs and version together for installed, URL-based dependencies (#690)
The snapshot test changes will give you a sense for the impact of the
change and the output formatting.

Closes https://github.com/astral-sh/puffin/issues/686.
2023-12-18 22:21:37 +00:00
Charlie Marsh
365c860e27
Show fully-resolved URLs in non-resolution contexts (#689)
We now show the fully-resolved URL, rather than the URL as given by the
user, _everywhere_ except for the output resolution file (which should
retain relative paths, unexpanded environment variables, etc.).

Closes https://github.com/astral-sh/puffin/issues/687.
2023-12-18 22:10:24 +00:00
Charlie Marsh
c400ab7d07
Add support for file:// URLs in editable requirements (#680) 2023-12-18 14:55:37 +00:00
konsti
f4f67ebde0
Rebase: Uninstall existing non-editable versions when installing editable requirements bug (#682)
Separate branch for rebasing #677 onto main because i don't trust the
rebase enough to force push.

Closes #677.

---

If you install `black` from PyPI, then `-e ../black`, we need to
uninstall the existing `black`. This sounds simple, but that in turn
requires that we _know_ `-e ../black` maps to the package `black`, so
that we can mark it for uninstallation in the install plan. This, in
turn, means that we need to build editable dependencies prior to the
install plan.

This is just a bunch of reorganization to fix that specific bug
(installing multiple versions of `black` if you run through the above
workflow): we now run through the list of editables upfront, mark those
that are already installed, build those that aren't, and then ensure
that `InstallPlan` correctly removes those that need to be removed, etc.

Closes #676.

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2023-12-18 09:28:14 +00:00
Charlie Marsh
0bb2c92246
Add editable install support to pip-install (#675)
Per the title: adds support for `-e` installs to `puffin pip-install`.
There were some challenges here around threading the editable installs
to the right places. Namely, we want to build _once_, then reuse the
editable installs from the resolution. At present, we were losing the
`editable: true` flag on the `Dist` that came back through the
resolution, so it required some changes to the resolver.

Closes https://github.com/astral-sh/puffin/issues/672.
2023-12-18 09:52:32 +01:00
Charlie Marsh
00e1c33af4
Add an editable index to the site-packages registry (#671)
This PR modifies `SitePackages` to store all distributions in a flat
vector, and maintain two indexes (hash maps) from "per-element data for
an element in the vector" to "index of that element". This enables us to
maintain a map on both package name and editable URL.
2023-12-17 03:44:36 +00:00
konsti
f059c6e6a6
Support editable in pip-sync and pip-compile (#587)
Support `-e path/do/dir` in pip-sync and and pip-compile.
2023-12-16 22:37:34 +00:00
Charlie Marsh
9470c20e7a
Avoid double resolution during source builds (#656)
## Summary

This PR ensures that we re-use the resolution to install the build
dependencies when building a source distribution. Currently, we only
pass along the list of requirements, and then use the `Finder` to map
each requirement to a distribution. But we already determine the correct
distribution when resolving!

Closes https://github.com/astral-sh/puffin/issues/655.
2023-12-15 17:27:16 +00:00
Charlie Marsh
ed8dfbfcf7
Preserve verbatim URLs (#639)
## Summary

This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout
the resolution and installation pipeline. In short, alongside the parsed
`Url`, we also keep the URL as written by the user. This enables us to
display the URL exactly as written by the user, rather than the
serialized path that we use internally.

This will be especially useful once we start expanding environment
variables since, at that point, we'll be able to write the version of
the URL that includes the _unexpected_ environment variable to the
output file.
2023-12-14 15:03:39 +00:00
Charlie Marsh
eef9612719
Allow reporters to take dyn Metadata (#645) 2023-12-14 12:36:28 +01:00
Charlie Marsh
4fd69c74b6
Use URL rather than String in direct URL types (#643) 2023-12-14 01:01:27 +00:00
Charlie Marsh
8071a23863
Add dedicated ID types to avoid opaque strings (#642)
This allows us to enforce type safety within the resolver. For example,
in the index, we can remove `String` as a key type and enforce that
callers _must_ present us with a `PackageId`. (This actually caught one
bug, where we were using the SHA rather than the package ID. That bug
shouldn't have had any effect given where it was, since those are 1:1,
but it's still problematic.)
2023-12-14 00:53:33 +00:00
Charlie Marsh
4fb2e0955e
Add a fast-path to skip resolution when installation is complete (#613)
For a very large resolution (a few hundred packages), I see 13ms vs.
400ms for a no-op. It's worth optimizing this case, in my opinion.
2023-12-12 17:43:12 +00:00
Charlie Marsh
5ae3a8b1cb
Restructure Git cache to include package name (#588)
## Summary

This PR modifies the Git wheel cache to: (1) use a shorter version of
the SHA, to save space; and (2) include the package name, for
consistency with all other buckets.

I considered removing the URL hash entirely, and _just_ using the SHA,
which would be even _more_ consistent with other buckets. But if we
remove the URL, then we won't have separate directories for
subdirectories (which are part of the URL).

Before:

<img width="1035" alt="Screen Shot 2023-12-07 at 7 23 42 PM"
src="86afce67-682f-464f-9ba1-0b60d5b7f19f">

After:

<img width="1232" alt="Screen Shot 2023-12-07 at 8 09 23 PM"
src="eda42a19-974f-47fe-8c83-54a602ddfd2d">
2023-12-07 20:17:41 -05:00
Charlie Marsh
aa065f5c97
Modify install plan to support all distribution types (#581)
This PR adds caching support for built wheels in the installer.
Specifically, the `RegistryWheelIndex` now indexes both downloaded and
built wheels (from registries), and we have a new `BuiltWheelIndex` that
takes a subdirectory and returns the "best-matching" compatible wheel.

Closes #570.
2023-12-07 04:43:34 +00:00
Charlie Marsh
5370484307
Remove .whl extension for cached, unzipped wheels (#574)
## Summary

This PR uses the wheel stem (e.g., `foo-1.2.3-py3-none-any`) instead of
the wheel name (e.g., `foo-1.2.3-py3-none-any.whl`) when storing
unzipped wheels in the cache, which removes a class of confusing issues
around overwrites and directory-vs.-file collisions.

For now, we retain _both_ the zipped and unzipped wheels in the cache,
though we can easily change this by storing the zipped wheels in a
temporary directory.

Closes https://github.com/astral-sh/puffin/issues/573.

## Test Plan

Some examples from my local cache:

<img width="835" alt="Screen Shot 2023-12-05 at 4 09 55 PM"
src="784146aa-b080-416e-9767-40c843fe5d6a">
<img width="847" alt="Screen Shot 2023-12-05 at 4 12 14 PM"
src="4bc7f30f-bef3-47f1-b4e8-da9cabf87f28">
<img width="637" alt="Screen Shot 2023-12-05 at 4 09 50 PM"
src="25ca4944-4a06-4a08-ac85-c6f7d8b5c8ea">
2023-12-05 22:41:22 +00:00
Charlie Marsh
5fddcc362e
Improve error messages for 'file not found' case (#550)
Right now, if you specify a wheel that doesn't exist, you get: `no such
file or directory` with no additional context. Oops!
2023-12-04 22:01:51 +00:00
konsti
1142a14f4d
Check compatibility for cached unzipped wheels (#501)
**Motivation** Previously, we would install any wheel with the correct
package name and version from the cache, even if it doesn't match the
current python interpreter.

**Summary** The unzipped wheel cache for registries now uses the entire
wheel filename over the name-version (`editables-0.5-py3-none-any.whl`
over `editables-0.5`).

Built wheels are not stored in the `wheels-v0` unzipped wheels cache
anymore. For each source distribution, there can be multiple built
wheels (with different compatibility tags), so i argue that we need a
different cache structure for them (follow up PR).

For `all-kinds.in` with

```bash
rm -rf cache-all-kinds
virtualenv --clear -p 3.12 .venv
cargo run --bin puffin -- pip-sync --cache-dir cache-all-kinds target/all-kinds.txt
```

we get:

**Before**
```
cache-all-kinds/wheels-v0/
├── registry
│   ├── annotated_types-0.6.0
│   ├── asgiref-3.7.2
│   ├── blinker-1.7.0
│   ├── certifi-2023.11.17
│   ├── cffi-1.16.0
│   ├── [...]
│   ├── tzdata-2023.3
│   ├── urllib3-2.1.0
│   └── wheel-0.42.0
└── url
    ├── 4b8be67c801a7ecb
    │   ├── flask
    │   └── flask-3.0.0.dist-info
    ├── 6781bd6440ae72c2
    │   ├── werkzeug
    │   └── werkzeug-3.0.1.dist-info
    └── a67db8ed076e3814
        ├── pydantic_extra_types
        └── pydantic_extra_types-2.1.0.dist-info

48 directories, 0 files
```

**After**

```
cache-all-kinds/wheels-v0/
├── registry
│   ├── annotated_types-0.6.0-py3-none-any.whl
│   ├── asgiref-3.7.2-py3-none-any.whl
│   ├── blinker-1.7.0-py3-none-any.whl
│   ├── certifi-2023.11.17-py3-none-any.whl
│   ├── cffi-1.16.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
│   ├── [...]
│   ├── tzdata-2023.3-py2.py3-none-any.whl
│   ├── urllib3-2.1.0-py3-none-any.whl
│   └── wheel-0.42.0-py3-none-any.whl
└── url
    └── 4b8be67c801a7ecb
        └── flask-3.0.0-py3-none-any.whl

39 directories, 0 files
```

**Outlook** Part of #477 "Fix wheel caching". Further tasks:
* Replace the `CacheShard` with `WheelMetadataCache` which handles urls
properly.
* Delete unzipped wheels when their remote wheel changed
* Store built wheels next to the `metadata.json` in the source dist
directory; delete built wheels when their source dist changed (different
cache bucket, but it's the same problem of fixing wheel caching) I'll
make stacked PRs for those
2023-11-27 16:03:58 -08:00
Charlie Marsh
3eb0a43995
Perform a single Git fetch when building source distributions (#499)
## Summary

We need to pass in the distribution with the "precise" URL to avoid
refetching.

## Test Plan

Ran `cargo run -p puffin-cli -- pip-compile requirements.in --verbose`
with `flask @ git+https://github.com/pallets/flask.git` and verified
that we only checked out Flask once.
2023-11-25 23:29:41 +00:00
konsti
d54e780843
Source dist metadata refactor (#468)
## Summary and motivation

For a given source dist, we store the metadata of each wheel built
through it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel. If there is one we use that metadata, if there isn't, we
build one. If the source is stale, we build a wheel and override
`metadata.json` with that single wheel. This PR thereby ties the local
built wheel metadata cache to the freshness of the remote source dist.
This functionality is available through `SourceDistCachedBuilder`.

`puffin_installer::Builder`, `puffin_installer::Downloader` and
`Fetcher` are removed, instead there are now `FetchAndBuild` which calls
into the also new `SourceDistCachedBuilder`. `FetchAndBuild` is the new
main high-level abstraction: It spawns parallel fetching/building, for
wheel metadata it calls into the registry client, for wheel files it
fetches them, for source dists it calls `SourceDistCachedBuilder`. It
handles locks around builds, and newly added also inter-process file
locking for git operations.

Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.

In a follow-up PR, I'll also clear built wheels when they've become
stale.

Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.

Closes #473

## Source dist cache structure 

Entries by supported sources:
 * `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`
But the url filename does not need to be a valid source dist filename

(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following and we have to take any string as
filename:
* `<build wheel metadata
cache>/url/<sha256(url)>/master.zip/metadata.json`

Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│   └── 5c56bc1c58c34c11
│       └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│           └── metadata.json
├── pypi
│   └── django-allauth-0.51.0.tar.gz
│       └── metadata.json
└── url
    └── 6781bd6440ae72c2
        └── werkzeug-3.0.1.tar.gz
            └── metadata.json
```

The inside of a `metadata.json`:
```json
{
  "data": {
    "django_allauth-0.51.0-py3-none-any.whl": {
      "metadata-version": "2.1",
      "name": "django-allauth",
      "version": "0.51.0",
      ...
    }
  }
}
```
2023-11-24 17:47:58 +00:00