Commit graph

189 commits

Author SHA1 Message Date
konsti
5820a9d937
Update dependencies (#794)
Pull in a bunch of updates so they get some testing before we announce
the project. textwrap 0.16 is blocked on miette updating, http 1.0 on
reqwest.
2024-01-05 11:40:12 -05:00
Charlie Marsh
cfffcbb269
Cancel waiting tasks on resolution error (#753)
## Summary

I don't understand why this works (because I don't understand why it's
erroring) but it does. See:
https://github.com/astral-sh/puffin/pull/746#issuecomment-1875722454.

## Test Plan

```
cargo run --bin puffin pip-install requires-transitive-incompatible-with-transitive-8329cfc0 --extra-index-url https://test.pypi.org/simple -n
```
2024-01-03 20:18:27 +00:00
Charlie Marsh
a8e52d2899
Split resolver.rs into a module (#752)
This is just getting hard to navigate. No code changes, just moving
stuff around.
2024-01-03 14:02:30 -05:00
Charlie Marsh
48c7359622
Always simplify dependency sets (#748)
`simplify_set` can itself simplify to the full range, so it seems like
we should be checking if the set is `Range::full` _after_ simplifying
rather than before.
2024-01-03 13:21:03 -05:00
Charlie Marsh
607a5bee6d
Use register_owned in prefetch path (#750) 2024-01-03 17:31:23 +00:00
Charlie Marsh
fd556ccd44
Model Python version as a PubGrub package (#745)
## Summary

This PR modifies the resolver to treat the Python version as a package,
which allows for better error messages (since we no longer treat
incompatible packages as if they "don't exist at all").

There are a few tricky pieces here...

First, we need to track both the interpreter's Python version and the
_target_ Python version, because we support resolving for other versions
via `--python 3.7`.

Second, we allow using incompatible wheels during resolution, as long as
there's a compatible source distribution. So we still need to test for
`requires-python` compatibility when selecting distributions.

This could use more testing, but it feels like an area where `packse`
would be more productive than writing PyPI tests.

Closes https://github.com/astral-sh/puffin/issues/406.
2024-01-03 15:20:45 +00:00
Charlie Marsh
5a98add54e
Always pre-fetch distribution metadata (#744)
This PR fixes our prefetching logic to ensure that we always attempt to
prefetch the "best-guess" distribution for all dependencies. This logic
already existed, but because we only attempted to prefetch when package
metadata was available, it almost never triggered. Now, we wait for the
package metadata to become available, _then_ kick off the "best-guess"
prefetch (for every package).

In my testing, this dramatically improves performance (like 2x). I'm
wondering if this regressed at some point?

Closes #743.

Co-authored-by: konsti <konstin@mailbox.org>
2024-01-03 11:37:45 +01:00
Charlie Marsh
ba23115465
Remove some package clones (#749) 2024-01-02 23:21:46 -05:00
Charlie Marsh
94076d6000
Use dependency package when simplifying dependency set (#747)
This manifested itself here:
https://github.com/astral-sh/puffin/pull/745/files#r1439912440.
2024-01-02 20:26:56 -06:00
konsti
26f597a787
Add spans to all significant tasks (#740)
I've tried to investigate puffin's performance wrt to builds and
parallelism in general, but found the previous instrumentation to
granular. I've tried to add spans to every function that either needs
noticeable io or cpu resources without creating duplication. This also
fixes some wrong tracing usage on async functions
(https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code)
and some spans that weren't actually entered.
2024-01-02 16:17:03 +00:00
konsti
35f6ea204b
Remove Box::pin usages (#738)
Rust 1.75 update follow-up, simplifies the code.
2023-12-29 15:49:12 +00:00
Charlie Marsh
2cfa4a3574
Add a dedicated error message to hint users towards enabling pre-releases (#697)
This PR adds a dedicated error message for resolutions that fail, but
might've succeeded if pre-releases were allowed. Specifically, if we see
a failed resolution, and failed to find a version for a package that
included a pre-release marker, we add a hint nudging the user to
explicitly enable all pre-releases.

We'd prefer a solution like
https://github.com/astral-sh/puffin/pull/666, but believe that it will
break some assumptions in PubGrub, so this is the lighter-weight
solution.

Closes https://github.com/astral-sh/puffin/issues/659.
2023-12-28 21:44:35 -05:00
konsti
2d4cb1ebf2
Rust 1.75 (#736)
The `async fn` and return-position `impl Trait` in traits improve
`BuildContext` ergonomics. The traits use `impl Future` over `async fn`
to make the send bound explicit
(https://blog.rust-lang.org/2023/12/21/async-fn-rpit-in-traits.html).

The remaining changes are due to clippy.
2023-12-28 16:08:35 -04:00
Charlie Marsh
007f52bb4e
Add support for relative URLs in simple metadata responses (#721)
## Summary

This PR adds support for relative URLs in the simple JSON responses. We
already support relative URLs for HTML responses, but the handling has
been consolidated between the two. Similar to index URLs, we now store
the base alongside the metadata, and use the base when resolving the
URL.

Closes #455.

## Test Plan

`cargo test` (to test HTML indexes). Separately, I also ran `cargo run
-p puffin-cli -- pip-compile requirements.in -n
--index-url=http://localhost:3141/packages/pypi/+simple` on the
`zb/relative` branch with `packse` running, and forced both HTML and
JSON by limiting the `accept` header.
2023-12-27 08:53:21 -05:00
Charlie Marsh
188ab75769
Split File into internal and external type (#729)
## Summary

This PR makes the `pypi_types::File` a response-only type (i.e., a type
that's only used when deserializing over the wire), and adds a separate
internal `File` type. Right now, the representations are similar, but
already, we can avoid the "lenient" deserialization on our internal
`File` type, and avoid the special-casing of the property names that's
required in the JSON. Over time, we can evolve this representation
entirely separately from the representation we receive from PyPI and
other indexes.
2023-12-25 15:42:28 -05:00
Charlie Marsh
6ff21374dc
Split puffin-cache into Puffin-specific and generic utilities (#728)
This crate started off as generic caching utilities, but we started
adding a lot of Puffin-specific stuff (like the cache buckets
abstraction that knows about Git vs. direct URL vs. indexes and so on).
This PR moves the generic stuff into a new `cache-key` crate.
2023-12-25 14:38:56 +00:00
konsti
e60f0ec732
Update pubgrub (#713)
Easier than i expected: We simply never construct the pubgrub error
variants since we have our own main loop. The `unreachable!()`s can be
removed when never is stabilized
2023-12-20 23:56:59 +01:00
konsti
9f8b7e7e12
Refactor DistFinder to allow handling errors (#709)
For the install tests, i need the ability to ignore failures in the
`DistFinder`. To avoid just copy&pasting a version that collects errors
separately, i followed
https://gendignoux.com/blog/2021/04/01/rust-async-streams-futures-part1.html
and switched the custom channel over to an async stream yielding
`Result` items.

I like the async streams mirror the normal iterator api.
2023-12-20 04:07:55 +00:00
Andrew Gallant
aa9f47bbde
improve tests for version parser (#696)
The high level goal here is to improve the tests for the version parser.
Namely, we now check not just that version strings parse successfully,
but that they parse to the expected result.

We also do a few other cleanups. Most notably, `Version` is now an
opaque type so that we can more easily change its representation going
forward.

Reviewing commit-by-commit is suggested. :-)
2023-12-19 12:25:32 -05:00
Charlie Marsh
6f90edda78
Reduce visibility of PubGrubReportFormatter (#699) 2023-12-19 08:53:38 -06:00
Charlie Marsh
878bb5c035
Remove remaining snapshot files from resolver test (#698) 2023-12-19 05:41:50 +00:00
Charlie Marsh
3660d8a08e
Introduce separate traits for ahead-of-time and installed metadata (#692)
This is a pure refactor to follow-up #690, to separate the metadata that
we know upfront about distributions (like the version, for
registry-based distributions) vs. the metadata that requires building
(like the version, for URL-based distributions).
2023-12-18 22:37:45 +00:00
Charlie Marsh
365c860e27
Show fully-resolved URLs in non-resolution contexts (#689)
We now show the fully-resolved URL, rather than the URL as given by the
user, _everywhere_ except for the output resolution file (which should
retain relative paths, unexpanded environment variables, etc.).

Closes https://github.com/astral-sh/puffin/issues/687.
2023-12-18 22:10:24 +00:00
Charlie Marsh
207bb83a1c
Rename puffin-warnings macros to avoid tracing collision (#694)
Also more consistent with Ruff.
2023-12-18 21:33:21 +00:00
Charlie Marsh
0bb2c92246
Add editable install support to pip-install (#675)
Per the title: adds support for `-e` installs to `puffin pip-install`.
There were some challenges here around threading the editable installs
to the right places. Namely, we want to build _once_, then reuse the
editable installs from the resolution. At present, we were losing the
`editable: true` flag on the `Dist` that came back through the
resolution, so it required some changes to the resolver.

Closes https://github.com/astral-sh/puffin/issues/672.
2023-12-18 09:52:32 +01:00
konsti
f059c6e6a6
Support editable in pip-sync and pip-compile (#587)
Support `-e path/do/dir` in pip-sync and and pip-compile.
2023-12-16 22:37:34 +00:00
Charlie Marsh
9470c20e7a
Avoid double resolution during source builds (#656)
## Summary

This PR ensures that we re-use the resolution to install the build
dependencies when building a source distribution. Currently, we only
pass along the list of requirements, and then use the `Finder` to map
each requirement to a distribution. But we already determine the correct
distribution when resolving!

Closes https://github.com/astral-sh/puffin/issues/655.
2023-12-15 17:27:16 +00:00
Charlie Marsh
ed8dfbfcf7
Preserve verbatim URLs (#639)
## Summary

This PR adds a `VerbatimUrl` struct to preserve verbatim URLs throughout
the resolution and installation pipeline. In short, alongside the parsed
`Url`, we also keep the URL as written by the user. This enables us to
display the URL exactly as written by the user, rather than the
serialized path that we use internally.

This will be especially useful once we start expanding environment
variables since, at that point, we'll be able to write the version of
the URL that includes the _unexpected_ environment variable to the
output file.
2023-12-14 15:03:39 +00:00
Charlie Marsh
eef9612719
Allow reporters to take dyn Metadata (#645) 2023-12-14 12:36:28 +01:00
Charlie Marsh
8071a23863
Add dedicated ID types to avoid opaque strings (#642)
This allows us to enforce type safety within the resolver. For example,
in the index, we can remove `String` as a key type and enforce that
callers _must_ present us with a `PackageId`. (This actually caught one
bug, where we were using the SHA rather than the package ID. That bug
shouldn't have had any effect given where it was, since those are 1:1,
but it's still problematic.)
2023-12-14 00:53:33 +00:00
Charlie Marsh
3549d9638e
Inline all snapshot files (#641)
Right now, we're inconsistent between checking in and inlining these.
The outputs are small in Puffin, so let's just inline them in all cases.
2023-12-14 00:35:38 +00:00
Charlie Marsh
2da6563a64
Use Manifest::simple in tests (#638) 2023-12-13 17:41:29 +00:00
Charlie Marsh
eb1a630db2
Avoid hard-error for non-existent extras (#627)
## Summary

When resolving `transformers[tensorboard]`, the `[tensorboard]` extra
doesn't exist. Previously, we returned "unknown" dependencies for this
variant, which leads the resolution to try all versions, then fail. This
PR instead warns, but returns the base dependencies for the package,
which matches `pip`. (Poetry doesn't even warn, it just proceeds as
normal.)

Arguably, it would be better to return a custom incompatibility here and
then propagate... But this PR is better than the status quo, and I don't
know if we have support for that behavior yet...? (\cc @zanieb)

Closes #386.

Closes https://github.com/astral-sh/puffin/issues/423.
2023-12-13 17:36:27 +00:00
Charlie Marsh
69581c03c3
Enable package overrides in pip-compile (#631)
## Summary

This PR enables overrides to be passed to `pip-compile` and
`pip-install` via a new `--overrides` flag.

When overrides are provided, we effectively replace any requirements
that are overridden with the overridden versions. This is applied at all
depths of the tree.

The merge semantics are such that we replace _all_ requirements of a
package with _all_ requirements from the overrides files. So, for
example, if a package declares:

```
foo >= 1.0; python_version < '3.11'
foo < 1.0; python_version >= '3.11'
```

And the user provides an override like:
```
foo >= 2.0
```

Then _both_ of the `foo` requirements in the package will be replaced
with the override.

If instead, the user provided an override like:
```
foo >= 2.0; python_version < '3.11'
foo < 3.0; python_version >= '3.11'
```

Then we'd replace _both_ of the original `foo` requirements with both of
these overrides. (In technical terms, for each package in the
requirements file, we flat-map over its overrides.)

Closes https://github.com/astral-sh/puffin/issues/511.
2023-12-13 15:03:38 +00:00
Charlie Marsh
cbfd39093e
Clean up some function signatures (#633) 2023-12-13 06:21:47 +00:00
Charlie Marsh
a24eb57e93
Make warnings user-facing (#628)
## Summary

Now, `puffin_warnings::warn_once` and `puffin_warnings::warn` will go to
`stderr`, as long as the user isn't running under `--quiet`. Previously,
these went through `tracing`, and so were only visible when running
under `--verbose`.
2023-12-12 21:24:38 -05:00
Zanie Blue
490fb55ac5
Use available versions to simplify unsat error reports (#547)
Uses https://github.com/pubgrub-rs/pubgrub/pull/156 to consolidate
version ranges in error reports using the actual available versions for
each package.

Alternative to https://github.com/zanieb/pubgrub/pull/8 which implements
this behavior as a method in the `Reporter` — here it's implemented in
our custom report formatter (#521) instead which requires no upstream
changes.

Requires https://github.com/zanieb/pubgrub/pull/11 to only retrieve the
versions for packages that will be used in the report.

This is a work in progress. Some things to do:
- ~We may want to allow lazy retrieval of the version maps from the
formatter~
- [x] We should probably create a separate error type for no solution
instead of mixing them with other resolve errors
- ~We can probably do something smarter than creating vectors to hold
the versions~
- [x] This degrades error messages when a single version is not
available, we'll need to special case that
- [x] It seems safer to coerce the error type in `resolve` instead of
`solve` if feasible
2023-12-12 23:25:16 +00:00
konsti
a24a681db9
Towards using prepare_metadata_for_build_wheel in the resolver (#616)
Make `prepare_metadata_for_build_wheel` accessible across the puffin
codebase by splitting the built call into a setup, a metadata and a
wheel call. This does not actually use the hook yet, but it's the
required refactoring for it.

Part of #599.
2023-12-12 20:45:37 +00:00
Charlie Marsh
85c37b2b9c
Add extra to debug logging (#625) 2023-12-12 20:09:09 +00:00
Charlie Marsh
f459e1ee50
Use a non-async Mutex in OnceMap (#624)
I don't know why, but this seems to resolve
https://github.com/astral-sh/puffin/issues/619. The Tokio docs also say
that using Tokio's Mutex is _not_ recommended unless you need to hold
the Mutex across an `.await`, which we don't.

Since this is a non-deterministic failure, I just ran it a bunch of
times and ensured it didn't hang (whereas it did hang occasionally prior
to this PR).

Closes https://github.com/astral-sh/puffin/issues/619
2023-12-12 14:59:45 -05:00
Charlie Marsh
3aaab32a9d
Omit extra in resolver progress (#623)
Closes #621.
2023-12-12 12:41:18 -05:00
Charlie Marsh
6c7f5cb846
Validate installed packages in virtual environment (#611)
## Summary

Now, after running `pip-install`, we validate that the set of installed
packages is consistent -- that is, that we don't have any packages that
are missing dependencies, or incompatible versions of installed
dependencies.
2023-12-12 17:33:38 +00:00
Charlie Marsh
c764155988
Avoid double-resolving during pip-install (#610)
## Summary

At present, when performing a `pip-install`, we first do a resolution,
then take the set of requirements and basically run them through our
`pip-sync`, which itself includes re-resolving the dependencies to get a
specific `Dist` for each package. (E.g., the set of requirements might
say `flask==3.0.0`, but the installer needs a specific _wheel_ or source
distribution to install.)

This PR removes this second resolution by exposing the set of pinned
packages from the resolution. The main challenge here is that we have an
optimization in the resolver such that we let the resolver read metadata
from an incompatible wheel as long as a source distribution exists for a
given package. This lets us avoid building source distributions in the
resolver under the assumption that we'll be able to install the package
later on, if needed. As such, the resolver now needs to track the
resolution and installation filenames separately.
2023-12-12 17:29:09 +00:00
Charlie Marsh
1181288078
Download, build, and install in a single pipeline phase (#605)
## Summary

At present, we have two separate phases within the installation pipeline
related to populating wheels into the cache. The first phase downloads
the distribution, and then builds any source distributions into wheels;
the second phase unzips all the built wheels into the cache.

This PR merges those two phases into one, such that we seamlessly
download, build, and unzip wheels in one pass. This is more efficient,
since we can start unzipping while we build. It also ensures that if the
install _fails_ partway through, we don't end up with a bunch of
downloaded wheels that we never had a chance to unzip. The code is also
much simpler.

The main downside is that the user-facing feedback isn't as granular,
since we only have one phase and one progress bar for what was
originally three distinct phases.

Closes https://github.com/astral-sh/puffin/issues/571.

## Test Plan

I ran the benchmark script on two separate requirements files, and saw a
7% and 31% speedup respectively:

```text
+ TARGET=./scripts/benchmarks/requirements.txt
+ hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache'
Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache
  Time (mean ± σ):     269.4 ms ±  33.0 ms    [User: 42.4 ms, System: 117.5 ms]
  Range (min … max):   221.7 ms … 446.7 ms    100 runs

Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache
  Time (mean ± σ):     250.6 ms ±  28.3 ms    [User: 41.5 ms, System: 127.4 ms]
  Range (min … max):   207.6 ms … 336.4 ms    100 runs

Summary
  './target/release/puffin pip-sync ./scripts/benchmarks/requirements.txt --no-cache' ran
    1.07 ± 0.18 times faster than './target/release/main pip-sync ./scripts/benchmarks/requirements.txt --no-cache'
```

```text
+ TARGET=./scripts/benchmarks/requirements-large.txt
+ hyperfine --runs 100 --warmup 10 --prepare 'virtualenv --clear .venv' './target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' --prepare 'virtualenv --clear .venv' './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache'
Benchmark 1: ./target/release/main pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache
  Time (mean ± σ):      5.053 s ±  0.354 s    [User: 1.413 s, System: 6.710 s]
  Range (min … max):    4.584 s …  6.333 s    100 runs

Benchmark 2: ./target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache
  Time (mean ± σ):      3.845 s ±  0.225 s    [User: 1.364 s, System: 6.970 s]
  Range (min … max):    3.482 s …  4.715 s    100 runs

Summary
  './target/release/puffin pip-sync ./scripts/benchmarks/requirements-large.txt --no-cache' ran
```
2023-12-11 15:42:29 +00:00
konsti
b84fbb86b2
Impl Version debug as display (#606)
Currently, `dbg!` is hard to read because versions are verbose, showing
all optional fields, and we have a lot of versions. Changing debug
formatting to displaying the version number (which can be losslessly
converted to the struct and back) makes this more readable.

See e.g.
https://gist.github.com/konstin/38c0f32b109dffa73b3aa0ab86b9662b

**Before**

```text
version: Version {
    epoch: 0,
    release: [
        1,
        2,
        3,
    ],
    pre: None,
    post: None,
    dev: None,
    local: None,
},
```

**After**

```text
version: "1.2.3",
```
2023-12-11 16:38:14 +01:00
Charlie Marsh
a24534b0ce
Use rustc-hash instead of fxhash crate (#594)
`fxhash` is the old, less maintained version of this crate
(`rustc-hash`). We use the latter in Ruff.
2023-12-08 20:27:49 +00:00
konsti
6005d7a552
Keep track of in flight unzips using OnceMap (#544)
I saw warnings when we were e.g. unzipping wheel and setuptools in two
tasks at the same time. We now keep track of in flight unzips.

This introduces a `OnceMap` abstraction which we also use in the
resolver.
2023-12-08 20:18:11 +00:00
Zanie Blue
ef7be9103c
Parse SimpleJson into categorized data in the client (#522)
Extends #517 with a suggestion from @konstin to parse the `SimpleJson`
into an intermediate type `SimpleMetadata(BTreeMap<Version,
VersionFiles>)` before converting to a `VersionMap`. This reduces the
number of times we need to parse the response. Additionally, we cache
the parsed response now instead of `SimpleJson`.

`VersionFiles` stores two vectors with
`WheelFilename`/`SourceDistFilename` and `File` tuples. These can be
iterated over together or separately. A new enum `DistFilename` was
added to capture the `SourceDistFilename` and `WheelFilename` variants
allowing iteration over both vectors.
2023-12-07 11:04:47 -06:00
Zanie Blue
2bb04771ce
Allow switching out the resolver's IO (#517)
I'm working off of @konstin's commit here to implement arbitrary unsat
test cases for the resolver.

The entirety of the resolver's io are two functions: Get the version map
for a package (PEP 440 version -> distribution) and get the metadata for
a distribution. A new trait `ResolverProvider` abstracts these two away and
allows replacing the real network requests e.g. with stored responses
(https://github.com/pradyunsg/pip-resolver-benchmarks/blob/main/scenarios/pyrax_198.json).

---------

Co-authored-by: konsti <konstin@mailbox.org>
2023-12-06 11:53:16 -06:00
Charlie Marsh
2d1e19e474
Allow yanked versions when specified via == (#561)
## Summary

This enables users to rely on yanked versions via explicit `==` markers,
which is necessary in some projects (and, in my opinion, reasonable).

Closes #551.
2023-12-05 09:44:06 +01:00