Commit graph

1139 commits

Author SHA1 Message Date
konsti
ea0bfc565d
Refactor pip scenario tests (#1212)
Mostly a mechanical refactor to use the `puffin_snapshot!` and
`TestContext` infrastructure in the pip compile and pip install
scenarios, in preparation for adding programmatic windows testing
filters.
2024-02-01 10:31:40 +01:00
Charlie Marsh
0757862a7a
Accommodate minute-level filters in Insta (#1219)
I don't know why `compile_editable` took over a minute in this case, but
seems like it did? Hard to test this fix.


2108933895
2024-02-01 09:43:09 +01:00
Charlie Marsh
631ab51d6e
Use publicly available Apple Silicon runners (#1216) 2024-01-31 23:41:51 -05:00
Charlie Marsh
8cbe1d220c
Remove double-download for source distributions (#1218)
## Summary

Oops -- this was using a different cache key than the route above (this
is the wheel _metadata_ route vs. the wheel build route), so we were
saving and building source distributions twice in `pip install`.
2024-02-01 04:41:29 +00:00
Charlie Marsh
51e8609ee8
Use Python 3.12 in benchmarks (#1215)
I originally used Python 3.10, since 3.10 and 3.11 are by far the most
common (at least for [Ruff](https://pypistats.org/packages/ruff)). But
3.12 should give Python tools the most favorable benchmarks.
2024-01-31 15:51:13 -05:00
Charlie Marsh
ee69fb51ea
Add PDM to benchmark script (#1214)
## Summary

Overall, similar to Poetry, with some simplifications (e.g., we don't
need to translate to Poetry's dependency syntax), and the need to adjust
how we manage the cache and virtual environment.
2024-01-31 20:31:45 +00:00
Charlie Marsh
c4bfb6efee
Add a BENCHMARKS.md with rendered benchmarks (#1211)
As a precursor to the release, I want to include a structured document
with detailed benchmarks.

Closes https://github.com/astral-sh/puffin/issues/1210.
2024-01-31 20:11:52 +00:00
Andrew Gallant
b9d89e7624
puffin-client: generalize SimpleMetadaRaw into OwnedArchive<A> (#1208)
It turns out that the pattern I coded up for SimpleMetadataRaw is
generally useful when working with rkyv. This commit makes it generic by
supporting any type that implements rkyv's traits, and makes a few
simplifying assumptions by picking a concrete serializer, validator and
deserializer. In effect, this lets use own any archived value.

We also rejigger the API a little bit and double-down on
`OwnedArchive<A>` just being a owned wrapper for `Archived<A>`. Namely,
we implement `Deref` and turn its inherent methods into methods that
require fully qualified syntax. (As is standard for things that
implement `Deref` to avoid ambiguity with the deref target's methods.)

(This PR also makes a couple small simplifications to our custom rkyv
serializer since we no longer need to use it directly. We do still need
to name the type in trait bounds, so it has to be public.)
2024-01-31 11:56:34 -05:00
konsti
234e8d0bb7
Abstract away test duplication in pip-compile (#1187)
In preparation for the new windows handling, i want to introduce a
`TestContext` and `puffin_snapshot!` abstraction. This PR applies those
changes for pip-compile. My plan is to use those for all venv-based
integration tests and build the custom windows filters on top of
`puffin_snapshot!`.
2024-01-31 16:11:10 +00:00
Charlie Marsh
01258c1bb3
Report number of bytes deleted when clearing cache (#1203)
## Summary

This is based on Cargo's `clean` implementation, with modifications
based on some of my own preferences, and to better adhere to patterns we
use in our codebase:

![Screenshot 2024-01-31 at 1 31
10 AM](38704798-b17f-4972-ab67-00484ce63d62)
2024-01-31 10:48:28 -05:00
Charlie Marsh
ec816a3322
Update Python discovery documentation (#1194)
Closes https://github.com/astral-sh/puffin/issues/1109.
2024-01-31 15:42:32 +00:00
Charlie Marsh
35113c1d06
Enable macOS checks on CI (#1193)
## Summary

Enables tests for macOS in CI, using the M1 runners (which are free in
public, but count against our quota in private
repos). For now, I'm just running them on `main` to save quota.

I did the math, and the M1 runners are the best value:

![Screenshot 2024-01-30 at 9 33
36 PM](bd5a14b6-740c-487f-bcad-81c0fce5b62e)

Closes #1053.
2024-01-31 15:27:04 +00:00
Charlie Marsh
462fec1968
Remove readarray from install.sh (#1198)
## Summary

This isn't available on macOS (see, e.g.,
https://stackoverflow.com/questions/23842261/alternative-to-readarray-because-it-does-not-work-on-mac-os-x),
but this version works both on macOS and Linux.

Closes https://github.com/astral-sh/puffin/issues/1196. (Verified
locally and on CI.)
2024-01-31 10:22:13 -05:00
Charlie Marsh
8f9258fae3
Invert default feature for testing (#1200)
## Summary

We have some flags in Puffin that enable us to opt-in to certain tests.
To date, they've been opt-in, so we've run our tests with
`--all-features`. This PR makes them opt-out, and we now run tests with
default features.

The main motivation here is that I want to get tests working for macOS
on CI, but for unknown reasons, macOS can't compile the PyO3 features at
the same time as everything else due to strange linker issues. By
avoiding `--all-features` for tests, we thus avoid unnecessarily
including features that we don't actually use in Puffin.

I verified that the exact same number of tests (439) are run before and
after this change. For users, the primary difference is that you now
need to specify `--no-default-features --features pypi --features
python` to avoid (e.g.) including the Git tests.
2024-01-31 09:44:26 -05:00
Charlie Marsh
b2f1bbaa63
Add a Ctrl+C handler to the confirm workflow (#1202)
Fixes an issue whereby exiting the confirmation prompt can lead to your
cursor disappearing: https://github.com/console-rs/dialoguer/issues/294.

See:
b839a2c5b7/rye/src/main.rs (L36-L48).
2024-01-31 02:08:27 +00:00
Charlie Marsh
262f29b558
Add missing --exclude-newer to executable tests (#1201)
A new version of `platformdirs` came out, which broke these.
2024-01-30 20:26:11 -05:00
Charlie Marsh
b88b9e1f3d
Remove dedicated flate2 features from Puffin (#1199)
We should be able to enable and disable these without crate-internal
features.
2024-01-30 19:41:08 -05:00
Andrew Gallant
b47f70917f
puffin-client: simplify use of http-cache-semantics (#1197)
The `http-cache-semantics` crate is polymorphic on the types of requests
and responses it accepts. We had previously been explicitly converting
between `http` and `reqwest` types, but this isn't necessary. We can
provide impls of the traits in `http-cache-semantics` for `reqwest`'s
types (via a wrapper). This saves us from the awkward request/response
type conversions.

While this does clone the request, this is:

1. Not new. We were previously cloning the request to do the conversion.
2. An artifact (I believe) of http-cache-semantics API. (It kind of
   seems like an API bug to me?)

There is also a little bit of messiness around inter-operating between
http::uri::Uri and url::Url. But overall shouldn't be a big deal.
2024-01-30 18:20:44 -05:00
Charlie Marsh
7ae9d3c631
Remove Windows limitation from README (#1195) 2024-01-30 21:39:15 +00:00
Charlie Marsh
3f5e7306a5
Remove WaitMap dependency (#1183)
## Summary

This is an attempt to https://github.com/astral-sh/puffin/pull/1163 by
removing the `WaitMap` and gaining more granular control over the values
that we hold over `await` boundaries.
2024-01-30 15:25:22 -05:00
Charlie Marsh
c129717b41
Add support for --no-deps to pip install (#1191)
## Summary

Closes https://github.com/astral-sh/puffin/issues/1188.
2024-01-30 19:54:57 +00:00
Charlie Marsh
8305acc584
Add a builder for resolution options (#1192) 2024-01-30 19:50:16 +00:00
Charlie Marsh
aa3b79ec63
Prompt user for missing -r and -e flags in pip install (#1180)
## Summary

If the user runs a command like `pip install requirements.txt`, we now
prompt them to ask if they meant to include the `-r` flag:

![Screenshot 2024-01-29 at 8 38
29 PM](82b9f7a2-2526-4144-b200-a5015e5b8a4b)

![Screenshot 2024-01-29 at 8 38
33 PM](bd8ebb51-2537-4540-a0e0-718e66a1c69c)

The specific logic is: if the requirement ends in `.txt` or `.in`, and
the file exists locally, prompt the user for `-r`. If the requirement
contains a directory separator, and the directory exists locally, prompt
the user for `-e`.

Closes #1166.
2024-01-30 18:58:45 +00:00
Charlie Marsh
7a937e0f60
Error when parsing requirements.txt-like packages in requirements.txt file (#1179)
## Summary

Like https://github.com/astral-sh/puffin/pull/1180, this PR adds logic
for `requirements.txt` parsing whereby if a requirement _looks like_ a
local requirements file or an editable directory, we prompt the user to
correct the error (typically, by adding `-r`).
2024-01-30 18:55:11 +00:00
konsti
4ad0dc8b9e
Add windows aarch64 trampolines (#1190)
Lacking windows compatible aarch64 hardware, i cross compiled the
trampoline from x86_64 linux to aarch64-pc-windows-msvc; I added the
instructions to the puffin-trampoline readme. With some testing on an
aarch64 windows machine, this should be sufficient to build working
win_arm64 tagged wheels.

i686-pc-windows-msvc is failing with an error:

```
error: linking with `lld-link` failed: exit status: 1
  = note: lld-link: error: undefined symbol: __aulldiv
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced 4 more times

          lld-link: error: undefined symbol: __aullrem
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
          >>> referenced 4 more times
```
2024-01-30 17:51:27 +00:00
konsti
614bb0cf52
Update async_http_range_reader to 0.5.0 (#1189)
Removes a git dep and removes itertools 0.11
2024-01-30 16:32:53 +00:00
Charlie Marsh
c479c26cab
Add compatibility arguments for pip sync (#1185)
## Summary

As with `pip compile`, we can provide useful error messages and warnings
when people pass `pip sync` arguments.

Closes https://github.com/astral-sh/puffin/issues/1184.
2024-01-30 08:48:55 -05:00
konsti
ab27913f68
Instrument the main function and add jupyter.in (#1186)
Instrument the main function as anchor span for checking overhead and
update tracing-durations-export to 0.2.0 for differentiating
blocking/non-blocking tasks.

Add a `jupyter.in` requirement since `pip install jupyter` is a common
operation. I tried `jupyterlab` too but there is no difference in
performance (1.00 ± 0.07).
2024-01-30 11:03:24 +00:00
konsti
a6c4cbfe55
Cleanup puffin interpreter errors (#1169)
Use `virtualenv` consistently, remove unused error variants and hint the
user towards installing missing python versions.

I didn't touch the Readme but i replaced `virtualenv environment` with
`virtualenv` in the strings i found.

Fixes https://github.com/astral-sh/puffin/issues/1167
2024-01-30 10:52:46 +01:00
Charlie Marsh
bd934207e4
Accept relative file paths in CLI requirements (#1182)
## Summary

See: https://github.com/astral-sh/puffin/issues/1181.

## Test Plan

```
❯ cargo run -- pip install packse@../../zanieb/packse
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/puffin pip install 'packse@../../zanieb/packse'`
error: Distribution not found at: file:///Users/crmarsh/zanieb/packse
```
2024-01-30 03:31:24 +00:00
Charlie Marsh
61a3060383
Run cargo update (#1178) 2024-01-29 21:01:37 -05:00
Charlie Marsh
fa3c9afdc1
Deduplicate pep440_rs in dependency tree (#1177)
## Summary

Closes https://github.com/astral-sh/puffin/issues/1176.

## Test Plan

`cargo tree -p puffin -i pep440_rs` runs without error. Previously, it
errored due to multiple versions.
2024-01-29 16:11:42 -05:00
konsti
d4ed5ea858
Fix the compile_python_37 test with python 3.7 installed (#1172)
Make the test `compile_python_37` pass whether python 3.7 is installed
or not by muting the warning for a missing 3.7. The resolution error is
independent of whether 3.7 is installed or not.
2024-01-29 18:59:28 +01:00
Charlie Marsh
67a09649f2
Support parsing --find-links, --index-url, and --extra-index-url in requirements.txt (#1146)
## Summary

This PR adds support for `--find-links`, `--index-url`, and
`--extra-index-url` arguments when specified in a `requirements.txt`.

It's a mostly-straightforward change. The only uncertain piece is what
to do when multiple files include these flags, and/or when we include
them on the CLI and in other files.

In general:

- If _anything_ specifies `--no-index`, we respect it.
- We combine all `--extra-index-url` and `--find-links` across all
sources, since those are just vectors.
- If we see multiple `--index-url` in requirements files, we error.
- We respect the `--index-url` from the command line over any provided
in a requirements file.

(`pip-compile` seems to just pick one semi-arbitrarily when multiple are
provided.)

Closes https://github.com/astral-sh/puffin/issues/1143.
2024-01-29 15:06:40 +00:00
Charlie Marsh
4b9daf9604
Use tokio_tar instead of async_tar (#1170)
## Summary

`tokio_tar` is a fork of `async_tar` that uses Tokio instead of
`async-std`. Using it removes a significant dependency from our tree.

(There is an open PR
(https://github.com/dignifiedquire/async-tar/pull/41) in `async-tar` to
add Tokio support, but it's over a year old.)

See:
https://github.com/astral-sh/puffin/pull/1157#discussion_r1469190249.
2024-01-29 10:00:30 -05:00
Andrew Gallant
a42b385e9b
puffin-client: add SimpleMetadataRaw (#1150)
This adds what is effectively an owned wrapper around
`Archived<SimpleMetadata>`. Normally, an `Archived<SimpleMetadata>`
has to be used behind a pointer (since it has a lifetime
attached to its underlying byte buffer), but we create a
wrapper around it that owns the underlying buffer and provides
free access to the archived type.

This in effect creates an anchor point for the archived type
and lets us pass it around easily. (There has to be an anchor
point for it somewhere.)

An alternative to this approach would be to store it as a file
backed memory map. But in practice, we're dealing with small
files, and just reading them on to the heap is likely to be
faster. (Memory maps also have wildly different perf characteristics
across platforms.)

Note that this commit just defines the type. It isn't actually
used anywhere yet.
2024-01-29 09:37:06 -05:00
Charlie Marsh
d94cf0e763
Remove specific MUSL mention from README (#1171)
See:
https://github.com/astral-sh/puffin/pull/1158#discussion_r1469603073.
2024-01-29 13:50:23 +00:00
konsti
be48200642
Small instrumentation improvements (#1164)
Less verbose span fields for `Dist`s by using the display impl and no
more min length in the tracing durations plot config for comparability
(we lose spans due to a speedup otherwise). Both wait points in the
solver loop are now instrumented so we can inspect what we're waiting
for to progress in the solver.
2024-01-29 10:55:19 +00:00
konsti
8bfc3c1b37
Trim get_cached_with_callback and send_cached down some more. (#1128)
I noticed that `get_cached_with_callback` and `send_cached` are large
both in terms of llvm lines and in terms of types (and large types can
cause buffer overflows on windows). `get_cached_with_callback`
specifically is large because it's monomorphized for each callback. I've
split both functions into smaller units and boxed the callback.

llvm lines, before:

```
  Lines                 Copies               Function name
  -----                 ------               -------------
  909511                21625                (TOTAL)
   36026 (4.0%,  4.0%)     33 (0.2%,  0.2%)  <&mut rmp_serde::decode::Deserializer<R,C> as serde:🇩🇪:Deserializer>::deserialize_any
   14688 (1.6%,  5.6%)      8 (0.0%,  0.2%)  puffin_client::cached_client::CachedClient::get_cached_with_callback::{{closure}}::{{closure}}
   13748 (1.5%,  7.1%)      5 (0.0%,  0.2%)  puffin_client::cached_client::CachedClient::send_cached::{{closure}}
   12460 (1.4%,  8.5%)     35 (0.2%,  0.4%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
   10731 (1.2%,  9.6%)    122 (0.6%,  0.9%)  <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
    8952 (1.0%, 10.6%)      9 (0.0%,  1.0%)  core::slice::sort::partition_in_blocks
    8216 (0.9%, 11.5%)    323 (1.5%,  2.5%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    7745 (0.9%, 12.4%)    205 (0.9%,  3.4%)  core::result::Result<T,E>::map_err
    6862 (0.8%, 13.1%)     54 (0.2%,  3.7%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    6720 (0.7%, 13.9%)    133 (0.6%,  4.3%)  std::panicking::try
    6600 (0.7%, 14.6%)     45 (0.2%,  4.5%)  <alloc::sync::Weak<T,A> as core::ops::drop::Drop>::drop
    5899 (0.6%, 15.2%)     33 (0.2%,  4.6%)  rmp_serde::decode::Deserializer<R,C>::read_str_data
    5610 (0.6%, 15.9%)     33 (0.2%,  4.8%)  alloc::raw_vec::RawVec<T,A>::allocate_in
    5187 (0.6%, 16.4%)    133 (0.6%,  5.4%)  std::panicking::try::do_catch
    4740 (0.5%, 17.0%)    268 (1.2%,  6.7%)  core::ops::function::FnOnce::call_once
    4670 (0.5%, 17.5%)     40 (0.2%,  6.8%)  puffin_client::cached_client::CachedClient::get_cached_with_callback::{{closure}}::{{closure}}::{{closure}}
    4527 (0.5%, 18.0%)     54 (0.2%,  7.1%)  core::iter::traits::iterator::Iterator::try_fold
```

after:

```
  Lines                 Copies               Function name
  -----                 ------               -------------
  910275                21712                (TOTAL)
   36026 (4.0%,  4.0%)     33 (0.2%,  0.2%)  <&mut rmp_serde::decode::Deserializer<R,C> as serde:🇩🇪:Deserializer>::deserialize_any
   12460 (1.4%,  5.3%)     35 (0.2%,  0.3%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
   10935 (1.2%,  6.5%)    124 (0.6%,  0.9%)  <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
    8952 (1.0%,  7.5%)      9 (0.0%,  0.9%)  core::slice::sort::partition_in_blocks
    8714 (1.0%,  8.5%)      5 (0.0%,  0.9%)  puffin_client::cached_client::CachedClient::send_cached_handle_stale::{{closure}}
    8216 (0.9%,  9.4%)    323 (1.5%,  2.4%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    8192 (0.9%, 10.3%)      8 (0.0%,  2.5%)  puffin_client::cached_client::CachedClient::get_cached_with_callback::{{closure}}::{{closure}}
    7745 (0.9%, 11.1%)    205 (0.9%,  3.4%)  core::result::Result<T,E>::map_err
    6862 (0.8%, 11.9%)     54 (0.2%,  3.7%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    6778 (0.7%, 12.6%)      5 (0.0%,  3.7%)  puffin_client::cached_client::CachedClient::send_cached::{{closure}}
    6720 (0.7%, 13.4%)    133 (0.6%,  4.3%)  std::panicking::try
    6600 (0.7%, 14.1%)     45 (0.2%,  4.5%)  <alloc::sync::Weak<T,A> as core::ops::drop::Drop>::drop
    5899 (0.6%, 14.7%)     33 (0.2%,  4.7%)  rmp_serde::decode::Deserializer<R,C>::read_str_data
    5610 (0.6%, 15.3%)     33 (0.2%,  4.8%)  alloc::raw_vec::RawVec<T,A>::allocate_in
    5187 (0.6%, 15.9%)    133 (0.6%,  5.4%)  std::panicking::try::do_catch
    4740 (0.5%, 16.4%)    268 (1.2%,  6.7%)  core::ops::function::FnOnce::call_once
    4527 (0.5%, 16.9%)     54 (0.2%,  6.9%)  core::iter::traits::iterator::Iterator::try_fold
```

Stack sizes diff:
https://gist.github.com/konstin/a3f38276aacf1170038a756c8c49793c
2024-01-29 08:31:27 +00:00
Zanie Blue
ebd8cd425d
Use large Windows runner (#1134) 2024-01-29 08:34:40 +01:00
Charlie Marsh
fe03e74669
Add platform support to the README (#1158)
Closes https://github.com/astral-sh/puffin/issues/1149.
2024-01-28 22:52:25 -05:00
Charlie Marsh
fa3f0d7a55
Remove cache purge methods to clean (#1159)
This is more consistent with the public interface.
2024-01-28 21:15:11 -05:00
Charlie Marsh
d88ce76979
Stream unpacking of source distribution downloads (#1157)
This PR migrates our source distribution downloads to unzip as we
stream, similar to our approach for wheels.

In my testing, this showed a consistent speedup (e.g., 6% here for a few
representative source distributions):

```text
❯ python -m scripts.bench --puffin-path ./target/release/main --puffin-path ./target/release/puffin --benchmark install-cold requirements.in
Benchmark 1: ./target/release/main (install-cold)
  Time (mean ± σ):      1.503 s ±  0.039 s    [User: 1.479 s, System: 0.537 s]
  Range (min … max):    1.466 s …  1.605 s    10 runs

Benchmark 2: ./target/release/puffin (install-cold)
  Time (mean ± σ):      1.421 s ±  0.024 s    [User: 1.505 s, System: 0.593 s]
  Range (min … max):    1.381 s …  1.454 s    10 runs

Summary
  './target/release/puffin (install-cold)' ran
    1.06 ± 0.03 times faster than './target/release/main (install-cold)'
```
2024-01-28 20:09:24 -05:00
Andrew Gallant
5219d37250
add initial rkyv support (#1135)
This PR adds initial support for [rkyv] to puffin. In particular,
the main aim here is to make puffin-client's `SimpleMetadata` type
possible to deserialize from a `&[u8]` without doing any copies. This
PR **stops short of actuallying doing that zero-copy deserialization**.
Instead, this PR is about adding the necessary trait impls to a variety
of types, along with a smattering of small refactorings to make rkyv
possible to use.

For those unfamiliar, rkyv works via the interplay of three traits:
`Archive`, `Serialize` and `Deserialize`. The usual flow of things is
this:

* Make a type `T` implement `Archive`, `Serialize` and `Deserialize`.
rkyv
helpfully provides `derive` macros to make this pretty painless in most
  cases.
* The process of implementing `Archive` for `T` *usually* creates an
entirely
new distinct type within the same namespace. One can refer to this type
without naming it explicitly via `Archived<T>` (where `Archived` is a
clever
  type alias defined by rkyv).
* Serialization happens from `T` to (conceptually) a `Vec<u8>`. The
serialization format is specifically designed to reflect the in-memory
layout
  of `Archived<T>`. Notably, *not* `T`. But `Archived<T>`.
* One can then get an `Archived<T>` with no copying (albeit, we will
likely
need to incur some cost for validation) from the previously created
`&[u8]`.
This is quite literally [implemented as a pointer cast][rkyv-ptr-cast].
* The problem with an `Archived<T>` is that it isn't your `T`. It's
something
  else. And while there is limited interoperability between a `T` and an
`Archived<T>`, the main issue is that the surrounding code generally
demands
a `T` and not an `Archived<T>`. **This is at the heart of the tension
for
  introducing zero-copy deserialization, and this is mostly an intrinsic
problem to the technique and not an rkyv-specific issue.** For this
reason,
  given an `Archived<T>`, one can get a `T` back via an explicit
deserialization step. This step is like any other kind of
deserialization,
although generally faster since no real "parsing" is required. But it
will
  allocate and create all necessary objects.

This PR largely proceeds by deriving the three aforementioned traits
for `SimpleMetadata`. And, of course, all of its type dependencies. But
we stop there for now.

The main issue with carrying this work forward so that rkyv is actually
used to deserialize a `SimpleMetadata` is figuring out how to deal
with `DataWithCachePolicy` inside of the cached client. Ideally, this
type would itself have rkyv support, but adding it is difficult. The
main difficulty lay in the fact that its `CachePolicy` type is opaque,
not easily constructable and is internally the tip of the iceberg of
a rat's nest of types found in more crates such as `http`. While one
"dumb"-but-annoying approach would be to fork both of those crates
and add rkyv trait impls to all necessary types, it is my belief that
this is the wrong approach. What we'd *like* to do is not just use
rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to
get an `Archived<DataWithCachePolicy>` and make actual decisions used
the archived type directly. Doing that will require some work to make
`Archived<DataWithCachePolicy>` directly useful.

My suspicion is that, after doing the above, we may want to mush
forward with a similar approach for `SimpleMetadata`. That is, we want
`Archived<SimpleMetadata>` to be as useful as possible. But right
now, the structure of the code demands an eager conversion (and thus
deserialization) into a `SimpleMetadata` and then into a `VersionMap`.
Getting rid of that eagerness is, I think, the next step after dealing
with `DataWithCachePolicy` to unlock bigger wins here.

There are many commits in this PR, but most are tiny. I still encourage
review to happen commit-by-commit.

[rkyv]: https://rkyv.org/
[rkyv-ptr-cast]:
https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68
2024-01-28 12:14:59 -05:00
Zanie Blue
c0e7668dfa
Add bootstrapped installation in Python for Windows (#1130)
A 1:1 port of the Bash script to Python for use on Windows.

Pulls some parts of #1068 but much more minimal. Avoids an additional
dependency on `requests`. Because we require `zstandard` to unzip the
distributions we unfortunately cannot be dependency free and cannot have
`bootstrap.sh` download the Python version needed to run this script
without it doing a non-trivial amount of work.

Retains the Bash script for now so you can bootstrap without Python
available. I may drop it in the future?
2024-01-28 10:24:49 -06:00
Charlie Marsh
a25a1f2958
Avoid re-creating directories in async unzip (#1155)
This PR extends the optimizations from #1154 to other unzip paths.
2024-01-28 14:30:38 +00:00
Charlie Marsh
3d10f344f3
Only include visited packages in error message derivation (#1144)
## Summary

This is my guess as to the source of the resolver flake, based on
information and extensive debugging from @zanieb. In short, if we rely
on `self.index.packages` as a source of truth during error reporting, we
open ourselves up to a source of non-determinism, because we fetch
package metadata asynchronously in the background while we solve -- so
packages _could_ be included in or excluded from the index depending on
the order in which those requests are returned.

So, instead, we now track the set of packages that _were_ visited by the
solver. Visiting a package _requires_ that we wait for its metadata to
be available. By limiting analysis to those packages that were visited
during solving, we are faithfully representing the state of the solver
at the time of failure.

Closes #863
2024-01-28 09:27:22 -05:00
Charlie Marsh
6f2c235d21
Avoid re-creating directories during unzip (#1154)
## Summary

We have this optimization in `wheel.rs`, in the installer, but it makes
a huge difference for zips with many small files:

```
Benchmarking file_reader/Django-5.0.1-py3-none-any.whl: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 74.2s, or reduce sample count to 10.
file_reader/Django-5.0.1-py3-none-any.whl
                        time:   [751.63 ms 757.78 ms 764.27 ms]
                        change: [-1.0290% +0.0841% +1.2289%] (p = 0.88 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Benchmarking buffered_reader/Django-5.0.1-py3-none-any.whl: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 53.4s, or reduce sample count to 10.
buffered_reader/Django-5.0.1-py3-none-any.whl
                        time:   [529.86 ms 536.44 ms 543.35 ms]
                        change: [+0.0293% +1.5543% +3.1426%] (p = 0.05 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
```

That's almost 30% faster...
2024-01-28 00:07:54 -05:00
Charlie Marsh
888a9e6f53
Remove an unnecessary Path clone (#1153) 2024-01-28 03:16:51 +00:00
Charlie Marsh
d243250dec
Avoid unnecessary permissions changes for copy paths (#1152)
In Rust, `fs::copy` automatically preserves permissions (see:
https://doc.rust-lang.org/std/fs/fn.copy.html).

Elsewhere, when copying from the zip archive out to the cache, we can
set permissions during file creation, rather than as a separate call.

Both of these should be slightly more efficient.
2024-01-27 22:11:55 -05:00