4903 commits

Author SHA1 Message Date
Andrew Gallant
8102980192 puffin-resolver: make VersionMap construction lazy
That is, a `PrioritizedDistribution` for a specific version of a
package is not actually materialized in memory until a corresponding
`VersionMap::get` call is made for that version. Similarly, iteration
lazily materializes distributions as it moves through the map. It
specifically does not materialize everything first.

The main reason why this is effective is that an
`OwnedArchive<SimpleMetadata>` represents a zero-copy (other than
reading the source file) version of `SimpleMetadata` that is really just
a `Vec<u8>` internally. The problem with `VersionMap` construction
previously is that it had to eagerly materialize a `SimpleMetadata` in
memory before anything else, which defeats a large part of the purpose
of zero-copy deserialization. By making more of `VersionMap`
construction itself lazy, we permit doing some parts of resolution
without necessarily fully deserializing a `SimpleMetadata` into memory.
Indeed, with this commit, in the warm cached case, a `SimpleMetadata` is
itself never materialized fully in memory.

This does not fully realize the benefits of
zero-copy deserialization. For example, we are likely still building
lots of distributions in memory that we don't actually need in some
cases. Perhaps in cases where no resolution exists, or when one needs to
iterate over large portions of the total versions published for a
package.
2024-02-15 08:10:32 -05:00
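A minimal sketch of the lazy materialization described in 8102980192, not the actual puffin code. `RawEntry`, `Dist`, `Slot` and `LazyVersionMap` are hypothetical stand-ins, and versions are simplified to integer triples. Each entry keeps its raw input and only builds the distribution on the first `get` or when iteration reaches it.

```rust
use std::collections::BTreeMap;
use std::sync::OnceLock;

/// Simplified version key (hypothetical; real versions are richer).
pub type Version = (u64, u64, u64);

/// Stand-in for the not-yet-decoded data of one version.
#[derive(Debug, Clone, PartialEq)]
pub struct RawEntry {
    pub filenames: Vec<String>,
}

/// Stand-in for a fully materialized distribution.
#[derive(Debug, PartialEq)]
pub struct Dist {
    pub filenames: Vec<String>,
}

/// One map entry: the raw input plus a slot that is filled on first access.
struct Slot {
    raw: RawEntry,
    dist: OnceLock<Dist>,
}

impl Slot {
    /// Build the distribution the first time it is requested, then reuse it.
    fn dist(&self) -> &Dist {
        self.dist.get_or_init(|| Dist {
            filenames: self.raw.filenames.clone(),
        })
    }
}

/// A version map whose values are only built when asked for.
pub struct LazyVersionMap {
    slots: BTreeMap<Version, Slot>,
}

impl LazyVersionMap {
    /// Construction stores raw entries only; nothing is materialized here.
    pub fn new(raw: Vec<(Version, RawEntry)>) -> Self {
        let slots = raw
            .into_iter()
            .map(|(version, raw)| (version, Slot { raw, dist: OnceLock::new() }))
            .collect();
        Self { slots }
    }

    /// Materializes only the requested version.
    pub fn get(&self, version: &Version) -> Option<&Dist> {
        self.slots.get(version).map(Slot::dist)
    }

    /// Materializes entries one at a time as the iterator advances.
    pub fn iter(&self) -> impl Iterator<Item = (&Version, &Dist)> + '_ {
        self.slots.iter().map(|(version, slot)| (version, slot.dist()))
    }

    /// Number of entries that have been built so far.
    pub fn materialized(&self) -> usize {
        self.slots.values().filter(|slot| slot.dist.get().is_some()).count()
    }
}
```

Stopping an iteration early leaves the remaining entries unbuilt, which is the property the commit message relies on.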
Andrew Gallant
e2f3ad0e28 puffin-resolver: add some trace calls
This commit adds some logging to candidate selection during
resolution. The idea with these logs is to get a signal on
how much "exploring" the resolver does in specific examples.

For example, these logs helped me realize that at least in
some cases, candidate selection was looking through a long list
of versions even when its range consisted of exactly one
version. We'll use this fact in a later commit.
2024-02-15 08:10:32 -05:00
Andrew Gallant
1cff7c3774 platform-tags: make Tags use an Arc internally
This makes cloning and thus sharing across multiple threads much
cheaper. Since Tags is conceptually immutable once it is constructed,
this doesn't pose an issue and shouldn't introduce any additional
costs.
2024-02-15 08:10:32 -05:00
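A rough sketch of the pattern in 1cff7c3774, assuming a hypothetical `TagsInner` holding (python, abi, platform) triples; the real `Tags` layout may differ. The data lives behind an `Arc`, so `clone` only bumps a reference count and the value can be handed to other threads.

```rust
use std::sync::Arc;

/// Hypothetical payload: the list of supported (python, abi, platform) tags.
#[derive(Debug)]
struct TagsInner {
    triples: Vec<(String, String, String)>,
}

/// Immutable after construction, so sharing one allocation is safe.
#[derive(Debug, Clone)]
pub struct Tags {
    inner: Arc<TagsInner>,
}

impl Tags {
    pub fn new(triples: Vec<(String, String, String)>) -> Self {
        Self { inner: Arc::new(TagsInner { triples }) }
    }

    /// Read-only query; no mutation API is exposed.
    pub fn is_compatible(&self, python: &str, abi: &str, platform: &str) -> bool {
        self.inner
            .triples
            .iter()
            .any(|(p, a, pl)| p.as_str() == python && a.as_str() == abi && pl.as_str() == platform)
    }

    /// True when both handles point at the same allocation.
    pub fn shares_storage_with(&self, other: &Tags) -> bool {
        Arc::ptr_eq(&self.inner, &other.inner)
    }
}
```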
Andrew Gallant
bdb491baf6 deps: bump pubgrub
This brings in a [PR] that makes `Range::as_singleton` return a
borrow.

[PR]: https://github.com/zanieb/pubgrub/pull/23
2024-02-15 08:10:32 -05:00
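To illustrate why a borrowing `as_singleton` is convenient, here is a sketch with a hypothetical, heavily simplified `Range` enum (pubgrub's real `Range` is not an enum like this). It also shows one way the single-version observation from e2f3ad0e28 could be exploited: a singleton range becomes a direct lookup instead of a scan. That use is an assumption, not something these commits confirm.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a version range.
#[derive(Debug, Clone, PartialEq)]
pub enum Range<V> {
    /// Exactly one version.
    Singleton(V),
    /// Inclusive lower bound, exclusive upper bound.
    Between { low: V, high: V },
}

impl<V: Ord> Range<V> {
    /// Returns a borrow of the single version, so callers need not clone it.
    pub fn as_singleton(&self) -> Option<&V> {
        match self {
            Range::Singleton(version) => Some(version),
            Range::Between { .. } => None,
        }
    }

    pub fn contains(&self, version: &V) -> bool {
        match self {
            Range::Singleton(only) => only == version,
            Range::Between { low, high } => low <= version && version < high,
        }
    }
}

/// Pick the highest available version inside `range`.
pub fn select_candidate<'a, V: Ord>(range: &Range<V>, available: &'a BTreeSet<V>) -> Option<&'a V> {
    // Fast path: a singleton range is a single set lookup.
    if let Some(version) = range.as_singleton() {
        return available.get(version);
    }
    // Slow path: walk versions from newest to oldest.
    available.iter().rev().find(|version| range.contains(version))
}
```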
Charlie Marsh
e4fffc15f5
Remove Cargo-specific error messages (#1306)
We're leveraging Cargo's git implementation, but we left in some
Cargo-specific error messages for features we don't yet support.
2024-02-15 06:04:22 +00:00
Zanie Blue
9808c6b500
Reset all of the snapshots for consistent indentation (#1300)
This is really annoying, but the snapshots keep changing indentation
when updated.

I could not get insta to update them. So I added a print statement to
`main` and updated the snapshots, then removed the statement and updated
the snapshots again to force them all to refresh.
2024-02-14 12:50:28 -06:00
Charlie Marsh
40b74fb0fb
Replace MarkupSafe for no-binary tests (#1296) 2024-02-14 04:44:07 +00:00
Zanie Blue
36783743ba
Include slow tests in CI summary (#1295)
Show me the slow tests! ref
https://github.com/astral-sh/puffin/issues/878
2024-02-13 13:52:56 -06:00
Charlie Marsh
ea13d94c57
Fix dependency overrides link in README (#1297) 2024-02-13 17:09:18 +00:00
Zanie Blue
7fec2a311a
Refactor storage of distribution metadata needed in resolver (#1291)
Follows #1290 and https://github.com/astral-sh/puffin/pull/912 with some
minor clean-up.
2024-02-13 04:19:21 +00:00
Zanie Blue
3bff8d5f79
Add scenario coverage for wheels with incompatible ABI and Python tags (#1285)
We use

- An arbitrary ABI hash: `MMMMMM` (six base64 characters)
- An unlikely Jython27 Python tag

For cases that are valid but are never going to be available during
tests.

See https://github.com/zanieb/packse/pull/109
2024-02-12 22:14:38 -06:00
Zanie Blue
b5dd8b7de2
Track yanked versions as incompatibilities (#1290)
Moves yanked version filtering from `VersionMap::from_metadata` to the
resolver and tracks it as a PubGrub unavailable incompatibility so
yanked versions are reflected in error messages.

e.g. before
```
╰─▶ Because only albatross<=0.1.0 is available and you require albatross>0.1.0, 
       we can conclude that the requirements are unsatisfiable.
```

after

```
╰─▶ Because only the following versions of albatross are available:
            albatross<=0.1.0
            albatross==1.0.0
      and albatross==1.0.0 is unusable because it was yanked, we can conclude that albatross>0.1.0 cannot be used.
      And because you require albatross>0.1.0, we can conclude that the requirements are unsatisfiable.
```
2024-02-12 22:01:17 -06:00
Charlie Marsh
d8619f668a
Surface errors for offline --find-links URLs (#1271)
## Summary

Ensures that if the user passes `--no-index` with `--find-links`, and
we're unable to access the HTML page, we show an appropriate hint.
2024-02-13 03:41:00 +00:00
Charlie Marsh
16bb80132f
Add an --offline mode (#1270)
## Summary

This PR adds an `--offline` flag to Puffin that disables network
requests (implemented as a Reqwest middleware on our registry client).
When `--offline` is provided, we also allow the HTTP cache to return
stale data.

Closes #942.
2024-02-13 03:35:23 +00:00
Zanie Blue
942e353f65
Change ordering of highlights in readme (#1289)
Also, shorten some more items
2024-02-12 18:45:26 -06:00
Zanie Blue
2a3e817d53
Shorten the novel features highlight in README (#1265) 2024-02-12 20:04:50 +00:00
Charlie Marsh
929715f4e2
Swap out Discord icon (#1287) 2024-02-12 19:29:22 +00:00
Zanie Blue
0cd6b7be8c
Fix incompatible wheel test scenarios (#1284)
I had specified the tags incorrectly
https://github.com/zanieb/packse/pull/105
2024-02-12 18:51:49 +00:00
Zanie Blue
6d24d998e0
Add scenarios for yanked packages (#1283) 2024-02-12 12:44:59 -06:00
Zanie Blue
336d12556c
Add scenario tests for --only-binary and --no-binary (#1279) 2024-02-12 11:21:14 -06:00
Charlie Marsh
c75eef28b5
Upgrade to miette v6.0.0 (#1272) 2024-02-11 23:23:27 -05:00
Charlie Marsh
b386590b3c
Add some compatibility arguments to puffin venv (#1282)
See: https://github.com/astral-sh/puffin/issues/1276.
2024-02-12 03:19:55 +00:00
Charlie Marsh
93b7a1140f
Allow virtualenv creation at existing, empty directories (#1281)
## Summary

If the directory exists but is empty, we should allow `puffin venv`
without erroring.

Also adds test cases for a variety of error cases.
2024-02-12 03:13:13 +00:00
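The check described in 93b7a1140f could look roughly like the following. This is a sketch using only the standard library; `Target`, `classify_target` and `may_create_venv` are hypothetical names, not puffin's API.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// What currently exists at the requested virtualenv location.
#[derive(Debug, PartialEq)]
pub enum Target {
    Missing,
    EmptyDir,
    NonEmptyDir,
    NotADirectory,
}

/// Inspect the path without modifying it.
pub fn classify_target(path: &Path) -> io::Result<Target> {
    match fs::metadata(path) {
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(Target::Missing),
        Err(err) => Err(err),
        Ok(meta) if !meta.is_dir() => Ok(Target::NotADirectory),
        Ok(_) => {
            // A directory is empty when its listing yields no entries.
            if fs::read_dir(path)?.next().is_none() {
                Ok(Target::EmptyDir)
            } else {
                Ok(Target::NonEmptyDir)
            }
        }
    }
}

/// Creation is allowed when nothing exists yet or the directory is empty.
pub fn may_create_venv(path: &Path) -> io::Result<bool> {
    Ok(matches!(classify_target(path)?, Target::Missing | Target::EmptyDir))
}
```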
Charlie Marsh
b7e3933fe7
Place editable requirements before non-editable requirements (#1278)
## Summary

`pip-compile` puts the editable requirements first.

Closes https://github.com/astral-sh/puffin/issues/1275.
2024-02-12 02:26:40 +00:00
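One way to get the ordering from b7e3933fe7 is a sort keyed on the editable flag. The `Requirement` struct is hypothetical, and sorting by name within each group is an assumption made only to keep the output deterministic.

```rust
/// Hypothetical, minimal requirement record.
#[derive(Debug, Clone, PartialEq)]
pub struct Requirement {
    pub name: String,
    pub editable: bool,
}

/// Editable requirements first, then the rest; each group sorted by name.
pub fn order_for_output(mut reqs: Vec<Requirement>) -> Vec<Requirement> {
    // `false < true`, so negating the flag puts editables at the front.
    reqs.sort_by(|a, b| (!a.editable, &a.name).cmp(&(!b.editable, &b.name)));
    reqs
}
```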
Charlie Marsh
a16ec45d1f
Set an exclude cutoff for virtualenv tests (#1280)
## Summary

This test is failing since a new version of one of the seed packages was
uploaded.
2024-02-12 02:21:05 +00:00
Zanie Blue
a37b08808e
Implement pip compatible --no-binary and --only-binary options (#1268)
Updates our `--no-binary` option and adds a `--only-binary` option for
compatibility with `pip` which uses `:all:`, `:none:` and `<name>` for
specifying packages.

This required adding support for `--only-binary <name>` into our
resolver, previously it was only a boolean toggle.

Retains `--no-build` which is equivalent to `--only-binary :all:`. This
is common enough for safety that I would prefer it is available without
pip's awkward `:all:` syntax.

---------

Co-authored-by: konsti <konstin@mailbox.org>
2024-02-11 19:31:41 -06:00
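A sketch of how the `:all:`, `:none:` and `<name>` values from a37b08808e might be folded into one selection. The `BinarySelection` type and `parse_selection` function are hypothetical, and the accumulation rules are a simplification of pip's behavior: values are processed in order, `:none:` resets, and names are ignored once `:all:` is set.

```rust
use std::collections::BTreeSet;

/// Which packages an option such as `--no-binary` applies to.
#[derive(Debug, Clone, PartialEq)]
pub enum BinarySelection {
    NoPackages,
    AllPackages,
    Named(BTreeSet<String>),
}

impl BinarySelection {
    pub fn applies_to(&self, package: &str) -> bool {
        match self {
            BinarySelection::NoPackages => false,
            BinarySelection::AllPackages => true,
            BinarySelection::Named(names) => names.contains(package),
        }
    }
}

/// Fold repeated option values (each possibly comma-separated) left to right.
pub fn parse_selection(values: &[&str]) -> BinarySelection {
    let mut selection = BinarySelection::NoPackages;
    for value in values {
        for item in value.split(',').map(str::trim).filter(|item| !item.is_empty()) {
            selection = match (item, selection) {
                (":all:", _) => BinarySelection::AllPackages,
                (":none:", _) => BinarySelection::NoPackages,
                // Simplification: once everything is selected, names add nothing.
                (_, BinarySelection::AllPackages) => BinarySelection::AllPackages,
                (name, BinarySelection::NoPackages) => {
                    BinarySelection::Named(BTreeSet::from([name.to_string()]))
                }
                (name, BinarySelection::Named(mut names)) => {
                    names.insert(name.to_string());
                    BinarySelection::Named(names)
                }
            };
        }
    }
    selection
}
```

Under this model, `--no-build` would correspond to `parse_selection(&[":all:"])` applied to the only-binary option.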
Charlie Marsh
d98b3c3070
Strip UNC prefix when setting working directory (#1277)
## Summary

For PEP 517 builds, the current working directory needs to be set to the
directory of the source distribution. It turns out that on Windows, if
you use a UNC path for the working directory, then relative paths are
interpreted relative to the root of the current drive
([source](https://www.fileside.app/blog/2023-03-17_windows-file-paths/#paths-relative-to-the-root-of-the-current-drive)).
So, when builds attempted to resolve relative paths, they always
errored...

This PR ensures that we remove the UNC prefix when setting the current
working directory.

Closes #1238.

## Test Plan

I tested this on my Windows machine by installing `ujson` with
`--no-binary ujson`. (I don't want to add that specific test, since it's
really slow to build.)
2024-02-12 00:51:36 +00:00
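A simplified sketch of the prefix stripping from d98b3c3070, operating on strings so it runs on any platform. `strip_verbatim_prefix` is a hypothetical helper and the actual change may be implemented differently. It only handles the `\\?\C:\...` form and deliberately leaves `\\?\UNC\...` paths untouched, because those cannot be shortened by slicing alone.

```rust
/// Remove a leading `\\?\` when it is followed by a drive letter path.
pub fn strip_verbatim_prefix(path: &str) -> &str {
    const VERBATIM: &str = r"\\?\";
    match path.strip_prefix(VERBATIM) {
        Some(rest) if starts_with_drive(rest) => rest,
        // Anything else (plain paths, `\\?\UNC\...`) is returned unchanged.
        _ => path,
    }
}

/// True for strings beginning like `C:\`.
fn starts_with_drive(rest: &str) -> bool {
    let bytes = rest.as_bytes();
    bytes.len() >= 3 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' && bytes[2] == b'\\'
}
```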
Charlie Marsh
ba4c6e1a55
Remove unused deps (#1273) 2024-02-11 18:53:58 +00:00
Charlie Marsh
32aacc35a9
Bump version to v0.0.4 (#1269) 2024-02-09 16:42:17 -05:00
Zanie Blue
ffb19b9a52
Add an error messages highlight to the README (#1264) 2024-02-08 23:12:22 +00:00
konsti
561e33e353
Validate instead of discovering python patch version (#1266)
Contrary to our prior assumption, we can't reliably select a specific
patch version. With the deadsnakes PPA for example, `python3.12` is
installed into `PATH`, but `python3.12.1` isn't. Based on the assumption
(or rather, observation) that users have a single python patch version
per python minor version installed, generally the latest, we only check
if the installed patch version matches the selected patch version, if
any, instead of searching for one.

In the process, I deduplicated the python discovery logic.
2024-02-08 22:38:00 +01:00
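The "validate instead of discover" idea from 561e33e353 can be sketched as follows. `Request`, `parse_request` and `validate` are hypothetical names. The interpreter is located by major and minor version only, and a requested patch version is compared against what that interpreter reports.

```rust
/// A requested Python version; the patch component is optional.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Request {
    pub major: u32,
    pub minor: u32,
    pub patch: Option<u32>,
}

/// Parse `3.12` or `3.12.1`; anything else yields `None`.
pub fn parse_request(text: &str) -> Option<Request> {
    let mut parts = text.split('.').map(|part| part.parse::<u32>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = match parts.next() {
        Some(parsed) => Some(parsed?),
        None => None,
    };
    if parts.next().is_some() {
        return None;
    }
    Some(Request { major, minor, patch })
}

/// Check the version reported by the interpreter that was found on `PATH`.
pub fn validate(request: Request, installed: (u32, u32, u32)) -> Result<(), String> {
    let (major, minor, patch) = installed;
    if (request.major, request.minor) != (major, minor) {
        return Err(format!(
            "requested {}.{}, but the interpreter is {major}.{minor}.{patch}",
            request.major, request.minor
        ));
    }
    match request.patch {
        Some(wanted) if wanted != patch => Err(format!(
            "requested patch {wanted}, but the interpreter is {major}.{minor}.{patch}"
        )),
        _ => Ok(()),
    }
}
```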
konsti
1dc9904f8c
Run the test suite on windows in CI (#1262)
Run `cargo test` on windows in CI, pulling the switch on tier 1 windows
support.

These changes make the bootstrap script virtually required for running
the tests. This gives us consistency between local runs and CI, but it also locks
our tests to python-build-standalone and an artificial `PATH`.

I've deleted the shell bootstrap script in favor of only the python one,
which also runs on windows. I've left the (sym)link creation of the
bootstrap in place, even though it is not used by the tests anymore.

I've reactivated the three tests that would previously stack overflow by
doubling their stack sizes. The stack overflows only happen in debug
mode, so this is neither a user facing problem nor an actual problem
with our code and this workaround seems better than optimizing our code
for a case that the (release) compiler can optimize much better for.

The handling of patch versions will be fixed in a follow-up PR.

Closes #1160 
Closes #1161

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-02-08 22:09:55 +01:00
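The commit above does not say how the stack sizes were doubled. One common approach, shown here purely as an assumption, is to run the deep code on a thread created with an explicit stack size via `std::thread::Builder`. `run_with_stack` is a hypothetical helper.

```rust
use std::thread;

/// Run `work` on a fresh thread with the given stack size in bytes.
pub fn run_with_stack<F, T>(stack_bytes: usize, work: F) -> thread::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
}

/// Deliberately recursive helper to exercise the stack.
pub fn depth(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + depth(n - 1)
    }
}
```

A test that overflows in debug builds could then wrap its body in `run_with_stack(8 * 1024 * 1024, ...)`; the 8 MiB figure is only an example.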
Andrew Gallant
96276d9e3e
puffin-resolver: simplify version map construction (#1267)
In the process of making VersionMap construction lazy, I realized this
refactoring would be useful to me. It also simplifies a fair bit of case
analysis and does fewer BTreeMap lookups during construction. With that
said, this doesn't seem to matter for perf:

```
$ hyperfine -w10 --runs 50 \
    "puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" \
    "puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null"
Benchmark 1: puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     146.8 ms ±   4.1 ms    [User: 350.1 ms, System: 314.2 ms]
  Range (min … max):   140.7 ms … 158.0 ms    50 runs

Benchmark 2: puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     146.8 ms ±   4.5 ms    [User: 359.8 ms, System: 308.3 ms]
  Range (min … max):   138.2 ms … 160.1 ms    50 runs

Summary
  puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
    1.00 ± 0.04 times faster than puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
```

But the simplification is still nice, and will decrease the delta
between what we have now and a lazy version map.
2024-02-08 15:33:33 -05:00
Zanie Blue
fc2ab611d5
Use "locations" instead of "listings" for find links errors (#1263) 2024-02-07 10:28:22 -06:00
konsti
ab45485eb5
Reduce stack sizes further and ignore remaining tests (#1261)
This PR reduces the stack sizes on windows a little further using the
stack traces from stack overflows combined with looking at the type
sizes. Ultimately, it ignores the three remaining tests failing in debug
on windows due to stack overflows to unblock `cargo test` for windows on
CI.

444 tests run: 444 passed (39 slow), 1 skipped
2024-02-06 23:08:18 +01:00
konsti
e0cdf1a16f
Use anstream consistently and remove clippy lints (#1260)
We need to use the anstream print macros instead of the std print
macros, otherwise we risk wrong color behavior
(https://github.com/astral-sh/puffin/pull/1258#discussion_r1480428236).
Luckily, the `print_stderr` and `print_stdout` lints catch usages of the
std prints.

This PR switches over to anstream consistently and removes the now
redundant clippy lints. The lints should catch missing anstream usage in
the future.
2024-02-06 22:16:26 +01:00
konsti
f4ca175df4
Search and replace windows specific tests in deps (#1255)
Remove windows-only dependencies from the snapshot output using regex.
We now do the filtering entirely on our own without relying on insta
settings.

435 tests run: 430 passed (30 slow), 5 failed, 1 skipped
2024-02-06 19:31:42 +00:00
konsti
ac49dec4a2
Multiple entries in PUFFIN_PYTHON_PATH for windows tests (#1254)
There are no binary installers for the latest patch versions of cpython
for windows, and building them is hard. As an alternative, we download
python-build-standalone cpythons and put them into `<project
root>/bin`. On unix, we can symlink `pythonx.y.z` into this directory
and point `PUFFIN_PYTHON_PATH` to it. On windows, all pythons are called
`python.exe` and they don't like being linked. Instead, we add the path
to each directory containing a `python.exe` to `PUFFIN_PYTHON_PATH`,
similar to the regular `PATH`. The python discovery on windows was
extended to respect `PUFFIN_PYTHON_PATH` where needed.

These changes mean that we don't need to (sym)link pythons anymore and
could drop that part of the script.

435 tests run: 389 passed (21 slow), 46 failed, 1 skipped
2024-02-06 20:28:30 +01:00
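A sketch of treating `PUFFIN_PYTHON_PATH` like `PATH`, as ac49dec4a2 describes: split the value with the platform's separator and look for the interpreter in each directory. `python_candidates` is a hypothetical helper, and a real implementation would also check that the files exist.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Expand a `PATH`-style value into one candidate executable per directory.
pub fn python_candidates(search_path: &OsStr, exe_name: &str) -> Vec<PathBuf> {
    // `split_paths` uses `;` on Windows and `:` on Unix.
    env::split_paths(search_path)
        .map(|dir| dir.join(exe_name))
        .collect()
}
```

In practice the value would come from `std::env::var_os("PUFFIN_PYTHON_PATH")`, with `python.exe` as the executable name on Windows.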
Charlie Marsh
91118a962a
Offer tip when user omits pip prefix (#1257)
## Summary

Open to other opinions here. We could just continue (and warn), prompt
the user with a confirmation, etc.

(The weird thing about those two options is we might need to validate
the command-line arguments _before_ we do that -- so you could get
errors for bad arguments, and then get a warning that your subcommand is
wrong. I can probably avoid that with more work if it feels like a
better outcome though.)

Closes https://github.com/astral-sh/puffin/issues/1256.
2024-02-06 19:25:07 +00:00
Charlie Marsh
62416286e2
Remove add and remove commands (#1259)
## Summary

These add and remove dependencies from a `pyproject.toml` -- but they're
currently hidden, and don't match the rest of the workflow. We can
re-add them when the time is right.
2024-02-06 14:18:27 -05:00
Zanie Blue
d4bbaf1755
Add hint for --no-index without --find-links (#1258)
Since unavailable packages with `--no-index` can be confusing when the
user does not also provide `--find-links` we add a hint for this case.
Required some plumbing to get the required information to the
`NoSolution` error.

---------

Co-authored-by: konstin <konstin@mailbox.org>
2024-02-06 11:04:14 -06:00
konsti
b2a810fe37
Add windows specific filters for tests (#1231)
Add more windows specific filters in various places.

435 tests run: 333 passed (21 slow), 102 failed, 1 skipped
2024-02-06 15:58:16 +01:00
Andrew Gallant
d4b4c21133
initial implementation of zero-copy deserialization for SimpleMetadata (#1249)
(Please review this PR commit by commit.)

This PR closes an initial loop on zero-copy deserialization. That
is, it provides a way to get an `Archived<SimpleMetadata>` (spelled
`OwnedArchive<SimpleMetadata>` in the code) from a `CachedClient`. The
main benefit of zero-copy deserialization is that we can read bytes
from a file, cast those bytes to a structured representation without
cost, and then start using that type as any other Rust type. The
"catch" is that the structured representation is not the actual type
you started with, but the "archived" version of it.

In order to make all this work, we ended up needing to shave a rather
large yak: we had to re-implement HTTP cache semantics. Previously,
we were using the `http-cache-semantics` crate. While it does support
Serde, it doesn't support `rkyv`. Moreover, even simple support for
`rkyv` wouldn't be enough. What we actually want is for the HTTP cache
semantics to be implemented on the *archived* type so that we can
decide whether our cached response is stale or not without needing to
do a full deserialization into the unarchived type. This is why, in
this PR, you'll see `impl ArchivedCachePolicy { ... }` instead of
`impl CachePolicy { ... }`. (The `derive(rkyv::Archive)` macro
automatically introduces the `ArchivedCachePolicy` type into the
current namespace.)

Unfortunately, this PR does not fully realize the dream that is
zero-copy deserialization. Namely, while a `CachedClient` can now
provide an `OwnedArchive<SimpleMetadata>`, the rest of our code
doesn't really make use of it. Indeed, as soon as we go to build a
`VersionMap`, we eagerly convert our archived metadata into an owned
`SimpleMetadata` via deserialization (that *isn't* zero-copy). After
this change, a lot of the work now shifts to `rkyv` deserialization
and `VersionMap` construction. More precisely, the main thing we drop
here is `CachePolicy` deserialization (which is now truly zero-copy)
and the parsing of the MessagePack format for `SimpleMetadata`. But we
are still paying for deserialization. We're just paying for it in a
different place.

This PR does seem to bring a speed-up, but it is somewhat underwhelming.
My measurements have been pretty noisy, but I get a 1.1x speedup fairly
often:

```
$ hyperfine -w5 "puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" "puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null" ; A kang
Benchmark 1: puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     164.4 ms ±  18.8 ms    [User: 427.1 ms, System: 348.6 ms]
  Range (min … max):   131.1 ms … 190.5 ms    18 runs

Benchmark 2: puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
  Time (mean ± σ):     148.3 ms ±  10.2 ms    [User: 357.1 ms, System: 319.4 ms]
  Range (min … max):   136.8 ms … 184.4 ms    19 runs

Summary
  puffin-test pip compile --cache-dir ~/astral/tmp/cache-test ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null ran
    1.11 ± 0.15 times faster than puffin-main pip compile --cache-dir ~/astral/tmp/cache-main ~/astral/tmp/reqs/home-assistant-reduced.in -o /dev/null
```

One downside is that this does increase cache size (`rkyv`'s
serialization format is not as compact as MessagePack). On disk size
increases by about 1.8x for our `simple-v0` cache.

```
$ sort-filesize cache-main
4.0K    cache-main/CACHEDIR.TAG
4.0K    cache-main/.gitignore
8.0K    cache-main/interpreter-v0
8.7M    cache-main/wheels-v0
18M     cache-main/archive-v0
59M     cache-main/simple-v0
109M    cache-main/built-wheels-v0
193M    cache-main
193M    total

$ sort-filesize cache-test
4.0K    cache-test/CACHEDIR.TAG
4.0K    cache-test/.gitignore
8.0K    cache-test/interpreter-v0
8.7M    cache-test/wheels-v0
18M     cache-test/archive-v0
107M    cache-test/simple-v0
109M    cache-test/built-wheels-v0
242M    cache-test
242M    total
```

Also, while I initially intended to do a simplistic implementation of
HTTP cache semantics, I found that everything was somewhat
inter-connected. I could have written code that _specifically_ only worked
with the present behavior of PyPI, but then it would need to be special
cased and everything else would need to continue to use
`http-cache-semantics`. By implementing what we need based on what Puffin
actually is (which is still less than what `http-cache-semantics` does),
we can avoid special casing and use zero-copy deserialization for our
cache policy in _all_ cases.
2024-02-05 16:47:53 -05:00
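To illustrate the idea of answering "is this cached response stale?" without a full deserialization, here is a hand-rolled sketch that reads fields in place from a byte buffer. It does not use `rkyv`. The 16-byte header layout, `CachedEntryView` and `encode` are all hypothetical, and the freshness rule is far simpler than real HTTP cache semantics.

```rust
/// Header layout (assumed): 8 bytes `stored_at`, 8 bytes `max_age`, little-endian.
pub const HEADER_LEN: usize = 16;

/// A borrowed view over an encoded cache entry; nothing is copied or parsed up front.
pub struct CachedEntryView<'a> {
    bytes: &'a [u8],
}

impl<'a> CachedEntryView<'a> {
    /// Only validates that the header is present.
    pub fn new(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() >= HEADER_LEN {
            Some(Self { bytes })
        } else {
            None
        }
    }

    /// Decode one `u64` field directly from the buffer.
    fn read_u64(&self, offset: usize) -> u64 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&self.bytes[offset..offset + 8]);
        u64::from_le_bytes(buf)
    }

    pub fn stored_at(&self) -> u64 {
        self.read_u64(0)
    }

    pub fn max_age(&self) -> u64 {
        self.read_u64(8)
    }

    /// Freshness is decided from the header alone; the body is never touched.
    pub fn is_fresh(&self, now: u64) -> bool {
        now.saturating_sub(self.stored_at()) < self.max_age()
    }

    /// The payload, still borrowed from the original buffer.
    pub fn body(&self) -> &'a [u8] {
        &self.bytes[HEADER_LEN..]
    }
}

/// Build an entry in the assumed layout (used here only to create test input).
pub fn encode(stored_at: u64, max_age: u64, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + body.len());
    out.extend_from_slice(&stored_at.to_le_bytes());
    out.extend_from_slice(&max_age.to_le_bytes());
    out.extend_from_slice(body);
    out
}
```

Putting the policy methods on the view rather than on an owned struct mirrors the `impl ArchivedCachePolicy { ... }` choice mentioned in the PR description.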
Charlie Marsh
398659a9b0
Show yank warnings for pip install (#1253)
Closes https://github.com/astral-sh/puffin/issues/1252.
2024-02-05 17:15:44 +00:00
Zanie Blue
d090acf13d
Improve error messaging when a dependency is not found (#1241)
Previously, whenever we encountered a missing package we would throw an
error without information about why the package was requested. This
meant that if a transitive dependency required a missing package, the
user would have no idea why it was even selected. Here, we track
`NotFound` and `NoIndex` errors as `NoVersions` incompatibilities with
an attached reason. Improves our test coverage for `--no-index` without
`--find-links`.

The
[snapshots](https://github.com/astral-sh/puffin/pull/1241/files#diff-3eea1658f165476252f1f061d0aa9f915aabdceafac21611cdf45019447f60ec)
show a nice improvement.

I think this will also enable backtracking to another version if some
version of a transitive dependency has a missing dependency. I'll write a
scenario for that next.

Requires https://github.com/zanieb/pubgrub/pull/22
2024-02-05 08:43:05 -06:00
Charlie Marsh
be9125b0f0
Remove unnecessary is_dir in clone_recursive (#1247) 2024-02-04 23:54:22 +00:00
Charlie Marsh
9d42cfd09b
Clarify documentation for --no-index (#1243)
Closes #1242.
2024-02-04 18:46:01 -05:00
konsti
8116f6131a
Document profiling and tracing-durations-export (#1234)
Profiling is already extensively documented in ruff, so we can just link
to that guide. I added `tracing-durations-export` to the guide because I
found it a useful tool for optimizing.
2024-02-04 21:57:16 +00:00
Charlie Marsh
0e35041ccd
Add Python version support (#1239)
Closes https://github.com/astral-sh/puffin/issues/1221.
2024-02-04 21:56:00 +00:00
Andrew Gallant
586eeb6eca
tests: update snapshot for new pip release (#1245)
See: https://pypi.org/project/pip/#history
2024-02-03 15:44:13 -05:00