First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.
Then, update directory and file names:
```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```
Then add all the files again
```
# Add all the files again
git add crates
git add python/uv
# This one needs a force-add
git add -f crates/uv-trampoline
```
Since unavailable packages with `--no-index` can be confusing when the
user does not also provide `--find-links` we add a hint for this case.
Required some plumbing to get the required information to the
`NoSolution` error.
---------
Co-authored-by: konstin <konstin@mailbox.org>
Previously, whenever we encountered a missing package we would throw an
error without information about why the package was requested. This
meant that if a transitive dependency required a missing package, the
user would have no idea why it was even selected. Here, we track
`NotFound` and `NoIndex` errors as `NoVersions` incompatibilities with
an attached reason. Improves our test coverage for `--no-index` without
`--find-links`.
The
[snapshots](https://github.com/astral-sh/puffin/pull/1241/files#diff-3eea1658f165476252f1f061d0aa9f915aabdceafac21611cdf45019447f60ec)
show a nice improvement.
I think this will also enable backtracking to another version if some
version of transitive dependency has a missing dependency. I'll write a
scenario for that next.
Requires https://github.com/zanieb/pubgrub/pull/22
## Summary
Previously, we were blocking operations that could run in parallel. We
would send request through our main requests channel, but not yield so
that the receiver could only start processing requests much later than
necessary. We solve this by switching to the async
`tokio::sync::mpsc::channel`, where send is an async functions that
yields.
Due to the increased parallelism cache deserialization and the
conversion from simple api request to version map became bottlenecks, so
i moved them to `spawn_blocking`. Together these result in a 30-60%
speedup for larger warm cache resolution. Small cases such as black
already resolve in 5.7 ms on my machine so there's no speedup to be
gained, refresh and no cache were to noisy to get signal from.
Note for the future: Revisit the bounded channel if we want to produce
requests from `process_request`, too, (this would be good for
prefetching) to avoid deadlocks.
## Details
We can look at the behavior change through the spans:
```
RUST_LOG=puffin=info TRACING_DURATIONS_FILE=target/traces/jupyter-warm-branch.ndjson cargo run --features tracing-durations-export --bin puffin-dev --profile profiling -- resolve jupyter 2> /dev/null
```
Below, you can see how on main, we have discrete phases: All (cached)
simple api requests in parallel, then all (cached) metadata requests in
parallel, repeat until done. The solver is mostly waiting until it has
it's version map from the simple API query to be able to choose a
version. The main thread is blocked by process requests.
In the PR branch, the simple api requests succeeds much earlier,
allowing the solver to advance and also to schedule more prefetching.
Due to that `parse_cache` and `from_metadata` became bottlenecks, so i
moved them off the main thread (green color, and their spans can now
overlap because they can run on multiple threads in parallel). The main
thread isn't blocked on `process_request` anymore, instead it has
frequent idle times. The spans are all much shorter, which indicates
that on main they could have finished much earlier, but a task didn't
yield so they weren't scheduled to finish (though i haven't dug deep
enough to understand the exact scheduling between the process request
stream and the solver here).
**main**

**PR**

## Benchmarks
```
$ hyperfine --warmup 3 "target/profiling/main-dev resolve jupyter" "target/profiling/branch-dev resolve jupyter"
Benchmark 1: target/profiling/main-dev resolve jupyter
Time (mean ± σ): 29.1 ms ± 0.7 ms [User: 22.9 ms, System: 11.1 ms]
Range (min … max): 27.7 ms … 32.2 ms 103 runs
Benchmark 2: target/profiling/branch-dev resolve jupyter
Time (mean ± σ): 18.8 ms ± 1.1 ms [User: 37.0 ms, System: 22.7 ms]
Range (min … max): 16.5 ms … 21.9 ms 154 runs
Summary
target/profiling/branch-dev resolve jupyter ran
1.55 ± 0.10 times faster than target/profiling/main-dev resolve jupyter
$ hyperfine --warmup 3 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/branch-dev resolve meine_stadt_transparent"
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
Time (mean ± σ): 37.8 ms ± 0.9 ms [User: 30.7 ms, System: 14.1 ms]
Range (min … max): 36.6 ms … 41.5 ms 79 runs
Benchmark 2: target/profiling/branch-dev resolve meine_stadt_transparent
Time (mean ± σ): 24.7 ms ± 1.5 ms [User: 47.0 ms, System: 39.3 ms]
Range (min … max): 21.5 ms … 28.7 ms 113 runs
Summary
target/profiling/branch-dev resolve meine_stadt_transparent ran
1.53 ± 0.10 times faster than target/profiling/main-dev resolve meine_stadt_transparent
$ hyperfine --warmup 3 "target/profiling/main pip compile scripts/requirements/home-assistant.in" "target/profiling/branch pip compile scripts/requirements/home-assistant.in"
Benchmark 1: target/profiling/main pip compile scripts/requirements/home-assistant.in
Time (mean ± σ): 229.0 ms ± 2.8 ms [User: 197.3 ms, System: 63.7 ms]
Range (min … max): 225.8 ms … 234.0 ms 13 runs
Benchmark 2: target/profiling/branch pip compile scripts/requirements/home-assistant.in
Time (mean ± σ): 91.4 ms ± 5.3 ms [User: 289.2 ms, System: 176.9 ms]
Range (min … max): 81.0 ms … 104.7 ms 32 runs
Summary
target/profiling/branch pip compile scripts/requirements/home-assistant.in ran
2.50 ± 0.15 times faster than target/profiling/main pip compile scripts/requirements/home-assistant.in
```
## Summary
This is an attempt to https://github.com/astral-sh/puffin/pull/1163 by
removing the `WaitMap` and gaining more granular control over the values
that we hold over `await` boundaries.
## Summary
This is my guess as to the source of the resolver flake, based on
information and extensive debugging from @zanieb. In short, if we rely
on `self.index.packages` as a source of truth during error reporting, we
open ourselves up to a source of non-determinism, because we fetch
package metadata asynchronously in the background while we solve -- so
packages _could_ be included in or excluded from the index depending on
the order in which those requests are returned.
So, instead, we now track the set of packages that _were_ visited by the
solver. Visiting a package _requires_ that we wait for its metadata to
be available. By limiting analysis to those packages that were visited
during solving, we are faithfully representing the state of the solver
at the time of failure.
Closes#863
## Summary
Use a single error type in `puffin_distribution`, rather than two
confusingly similar types between `DistributionDatabase` and the source
distribution module.
Also removes the `#[from]` for IO errors and replaces with explicit
wrapping, which is verbose but removes a bunch of incorrect error
messages.
Requires https://github.com/zanieb/pubgrub/pull/20
In short, `UnusableDependencies` can be generalized into `Unavailable`
which encompasses incompatibilities where a package range which is
unusable for some inherent reason as well as when its dependencies are
unusable. We can eventually use this to track more incompatibilities in
the solver. I made the reason string required because I can't see a case
where we should leave it out.
Additionally, this improves the display of conflicts in the root
requirements.
This PR replaces a few uses of hash maps/sets with btree maps/sets and
index maps/sets. This has the benefit of guaranteeing a deterministic
order of iteration.
I made these changes as part of looking into a flaky test.
Unfortunately, I'm not optimistic that anything here will actually fix
the flaky test, since I don't believe anything was actually dependent
on the order of iteration.
## Summary
I'm running into some annoyances converting `&Version` to
`&PubGrubVersion` (which is just a wrapper type around `Version`), and I
realized... We don't even need `PubGrubVersion`?
The reason we "need" it today is due to the orphan trait rule: `Version`
is defined in `pep440_rs`, but we want to `impl
pubgrub::version::Version for Version` in the resolver crate.
Instead of introducing a new type here, which leads to a lot of
awkwardness around conversion and API isolation, what if we instead just
implement `pubgrub::version::Version` in `pep440_rs` via a feature? That
way, we can just use `Version` everywhere without any confusion and
conversion for the wrapper type.
Refactoring split out from find links support: Find links files can be
represented as `Dist`, but not really as `File`, they don't have url nor
hashes.
`DistRequiresPython` is somewhat odd as an in between type.
## Summary
This PR modifies the resolver to treat the Python version as a package,
which allows for better error messages (since we no longer treat
incompatible packages as if they "don't exist at all").
There are a few tricky pieces here...
First, we need to track both the interpreter's Python version and the
_target_ Python version, because we support resolving for other versions
via `--python 3.7`.
Second, we allow using incompatible wheels during resolution, as long as
there's a compatible source distribution. So we still need to test for
`requires-python` compatibility when selecting distributions.
This could use more testing, but it feels like an area where `packse`
would be more productive than writing PyPI tests.
Closes https://github.com/astral-sh/puffin/issues/406.
This PR adds a dedicated error message for resolutions that fail, but
might've succeeded if pre-releases were allowed. Specifically, if we see
a failed resolution, and failed to find a version for a package that
included a pre-release marker, we add a hint nudging the user to
explicitly enable all pre-releases.
We'd prefer a solution like
https://github.com/astral-sh/puffin/pull/666, but believe that it will
break some assumptions in PubGrub, so this is the lighter-weight
solution.
Closes https://github.com/astral-sh/puffin/issues/659.
## Summary
This PR adds support for relative URLs in the simple JSON responses. We
already support relative URLs for HTML responses, but the handling has
been consolidated between the two. Similar to index URLs, we now store
the base alongside the metadata, and use the base when resolving the
URL.
Closes#455.
## Test Plan
`cargo test` (to test HTML indexes). Separately, I also ran `cargo run
-p puffin-cli -- pip-compile requirements.in -n
--index-url=http://localhost:3141/packages/pypi/+simple` on the
`zb/relative` branch with `packse` running, and forced both HTML and
JSON by limiting the `accept` header.
## Summary
This PR makes the `pypi_types::File` a response-only type (i.e., a type
that's only used when deserializing over the wire), and adds a separate
internal `File` type. Right now, the representations are similar, but
already, we can avoid the "lenient" deserialization on our internal
`File` type, and avoid the special-casing of the property names that's
required in the JSON. Over time, we can evolve this representation
entirely separately from the representation we receive from PyPI and
other indexes.
Easier than i expected: We simply never construct the pubgrub error
variants since we have our own main loop. The `unreachable!()`s can be
removed when never is stabilized
Uses https://github.com/pubgrub-rs/pubgrub/pull/156 to consolidate
version ranges in error reports using the actual available versions for
each package.
Alternative to https://github.com/zanieb/pubgrub/pull/8 which implements
this behavior as a method in the `Reporter` — here it's implemented in
our custom report formatter (#521) instead which requires no upstream
changes.
Requires https://github.com/zanieb/pubgrub/pull/11 to only retrieve the
versions for packages that will be used in the report.
This is a work in progress. Some things to do:
- ~We may want to allow lazy retrieval of the version maps from the
formatter~
- [x] We should probably create a separate error type for no solution
instead of mixing them with other resolve errors
- ~We can probably do something smarter than creating vectors to hold
the versions~
- [x] This degrades error messages when a single version is not
available, we'll need to special case that
- [x] It seems safer to coerce the error type in `resolve` instead of
`solve` if feasible
## Summary and motivation
For a given source dist, we store the metadata of each wheel built
through it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel. If there is one we use that metadata, if there isn't, we
build one. If the source is stale, we build a wheel and override
`metadata.json` with that single wheel. This PR thereby ties the local
built wheel metadata cache to the freshness of the remote source dist.
This functionality is available through `SourceDistCachedBuilder`.
`puffin_installer::Builder`, `puffin_installer::Downloader` and
`Fetcher` are removed, instead there are now `FetchAndBuild` which calls
into the also new `SourceDistCachedBuilder`. `FetchAndBuild` is the new
main high-level abstraction: It spawns parallel fetching/building, for
wheel metadata it calls into the registry client, for wheel files it
fetches them, for source dists it calls `SourceDistCachedBuilder`. It
handles locks around builds, and newly added also inter-process file
locking for git operations.
Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.
In a follow-up PR, I'll also clear built wheels when they've become
stale.
Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.
Closes#473
## Source dist cache structure
Entries by supported sources:
* `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`
But the url filename does not need to be a valid source dist filename
(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following and we have to take any string as
filename:
* `<build wheel metadata
cache>/url/<sha256(url)>/master.zip/metadata.json`
Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│ └── 5c56bc1c58c34c11
│ └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│ └── metadata.json
├── pypi
│ └── django-allauth-0.51.0.tar.gz
│ └── metadata.json
└── url
└── 6781bd6440ae72c2
└── werkzeug-3.0.1.tar.gz
└── metadata.json
```
The inside of a `metadata.json`:
```json
{
"data": {
"django_allauth-0.51.0-py3-none-any.whl": {
"metadata-version": "2.1",
"name": "django-allauth",
"version": "0.51.0",
...
}
}
}
```
## Summary
This PR adds support for local path dependencies. The approach mostly
just falls out of our existing approach and infrastructure for Git and
URL dependencies.
Closes https://github.com/astral-sh/puffin/issues/436. (We'll open a
separate issue for editable installs.)
## Test Plan
Added `pip-compile` tests that pre-download a wheel or source
distribution, then install it via local path.
## Summary
A variety of small refactors to the distribution types crate to (1)
return `Result` if we find an invalid wheel, rather than treating it as
a source distribution with a `.whl` suffix, and (2) DRY up some repeated
code around URLs.
## Summary
This PR unifies the behavior that lived in the resolver's `distribution`
crates with the behaviors that were spread between the various structs
in the installer crate into a single `Fetcher` struct that is intended
to manage all interactions with distributions. Specifically, the
interface of this struct is such that it can access distribution
metadata, download distributions, return those downloads, etc., all with
a common cache.
Overall, this is mostly just DRYing up code that was repeated between
the two crates, and putting it behind a reasonable shared interface.
## Summary
This crate only contains types, and I want to introduce a new crate for
all _operations_ on distributions, so this feels like a more natural
name given we also have `pypi-types`.
Extends #424 with support for URL dependency incompatibilities.
Requires changes to `miette` to prevent URLs from being word wrapped;
accepted upstream in https://github.com/zkat/miette/pull/321
Addresses
https://github.com/astral-sh/puffin/issues/309#issuecomment-1792648969
Similar to #338 this throws an error when merging versions results in an
empty set. Instead of propagating that error, we capture it and return a
new dependency type of `Unusable`. Unusable dependencies are a new
incompatibility kind which includes an arbitrary "reason" string that we
present to the user. Adding a new incompatibility kind requires changes
to the vendored pubgrub crate.
We could use this same incompatibility kind for conflicting urls as in
#284 which should allow the solver to backtrack to another valid version
instead of failing (see #425).
Unlike #383 this does not require changes to PubGrub's package mapping
model. I think in the long run we'll want PubGrub to accept multiple
versions per package to solve this specific issue, but we're interested
in it being merged upstream first. This pull request is just using the
issue as a simple case to explore adding a new incompatibility type.
We may or may not be able convince them to add this new incompatibility
type upstream. As discussed in
https://github.com/pubgrub-rs/pubgrub/issues/152, we may want a more
general incompatibility kind instead which can be used for arbitrary
problems. An upstream pull request has been opened for discussion at
https://github.com/pubgrub-rs/pubgrub/pull/153.
Related to:
- https://github.com/pubgrub-rs/pubgrub/issues/152
- #338
- #383
---------
Co-authored-by: konsti <konstin@mailbox.org>
## Summary
This PR refactors our `RemoteDistribution` type such that it now follows
a clear hierarchy that matches the actual variants, and encodes the
differences between source and built distributions:
```rust
pub enum Distribution {
Built(BuiltDistribution),
Source(SourceDistribution),
}
pub enum BuiltDistribution {
Registry(RegistryBuiltDistribution),
DirectUrl(DirectUrlBuiltDistribution),
}
pub enum SourceDistribution {
Registry(RegistrySourceDistribution),
DirectUrl(DirectUrlSourceDistribution),
Git(GitSourceDistribution),
}
/// A built distribution (wheel) that exists in a registry, like `PyPI`.
pub struct RegistryBuiltDistribution {
pub name: PackageName,
pub version: Version,
pub file: File,
}
/// A built distribution (wheel) that exists at an arbitrary URL.
pub struct DirectUrlBuiltDistribution {
pub name: PackageName,
pub url: Url,
}
/// A source distribution that exists in a registry, like `PyPI`.
pub struct RegistrySourceDistribution {
pub name: PackageName,
pub version: Version,
pub file: File,
}
/// A source distribution that exists at an arbitrary URL.
pub struct DirectUrlSourceDistribution {
pub name: PackageName,
pub url: Url,
}
/// A source distribution that exists in a Git repository.
pub struct GitSourceDistribution {
pub name: PackageName,
pub url: Url,
}
```
Most of the PR just stems downstream from this change. There are no
behavioral changes, so I'm largely relying on lint, tests, and the
compiler for correctness.
Closes https://github.com/astral-sh/puffin/issues/356.
The example from the issue now renders as:
```
❯ cargo run --bin puffin-dev -q -- resolve-cli tensorflow-cpu-aws
puffin-dev failed
Caused by: No solution found when resolving build dependencies for source distribution:
Caused by: Because there is no available version for tensorflow-cpu-aws and root depends on tensorflow-cpu-aws, version solving failed.
```
In the resolver, our current model for solving URL dependencies requires
that we visit the URL dependency _before_ the registry-based dependency.
This PR encodes a strict requirement that all URL dependencies be
declared upfront, either as requirements or constraints.
I wrote more about how it works and why it's necessary in documentation
[here](https://github.com/astral-sh/puffin/pull/319/files#diff-2b1c4f36af0c62a2b7bebeae9473ae083588f2a6b18a3ec52393a24266adecbbR20).
I think we could relax this constraint over time, but it requires a more
sophisticated model -- and for now, I just want something that's (1)
correct, (2) easy for us to reason about, and (3) easy for users to
reason about.
As additional motivation... allowing arbitrary URL dependencies anywhere
in the tree creates some really confusing situations in which I'm not
even sure what the right answers are. For example, assume you declare a
direct dependency on `Werkzeug==2.0.0`. You then depend on a version of
Flask that depends on a version of `Werkzeug` from some arbitrary URL.
You build the source distribution at that arbitrary URL, and it turns
out it _does_ build to a declared version of 2.0.0. What should happen?
(And if it resolves to a version that _isn't_ 2.0.0, what should happen
_then_?) I suspect different tools handle this differently, but it must
lead to a lot of "silent" failures. In my testing of Poetry, it seems
like Poetry just ignores the URL dependency, which seems wrong, but is
also a behavior we could implement in the future.
Closes https://github.com/astral-sh/puffin/issues/303.
Closes https://github.com/astral-sh/puffin/issues/284.
## Summary
This PR adds support for resolving and installing dependencies via
direct URLs, like:
```
werkzeug @ 960bb4017c/Werkzeug-2.0.0-py3-none-any.whl
```
These are fairly common (e.g., with `torch`), but you most often see
them as Git dependencies.
Broadly, structs like `RemoteDistribution` and friends are now enums
that can represent either registry-based dependencies or URL-based
dependencies:
```rust
/// A built distribution (wheel) that exists as a remote file (e.g., on `PyPI`).
#[derive(Debug, Clone)]
#[allow(clippy::large_enum_variant)]
pub enum RemoteDistribution {
/// The distribution exists in a registry, like `PyPI`.
Registry(PackageName, Version, File),
/// The distribution exists at an arbitrary URL.
Url(PackageName, Url),
}
```
In the resolver, we now allow packages to take on an extra, optional
`Url` field:
```rust
#[derive(Debug, Clone, Eq, Derivative)]
#[derivative(PartialEq, Hash)]
pub enum PubGrubPackage {
Root,
Package(
PackageName,
Option<DistInfoName>,
#[derivative(PartialEq = "ignore")]
#[derivative(PartialOrd = "ignore")]
#[derivative(Hash = "ignore")]
Option<Url>,
),
}
```
However, for the purpose of version satisfaction, we ignore the URL.
This allows for the URL dependency to satisfy the transitive request in
cases like:
```
flask==3.0.0
werkzeug @ 254c3e9b5f/werkzeug-3.0.1-py3-none-any.whl
```
There are a couple limitations in the current approach:
- The caching for remote URLs is done separately in the resolver vs. the
installer. I decided not to sweat this too much... We need to figure out
caching holistically.
- We don't support any sort of time-based cache for remote URLs -- they
just exist forever. This will be a problem for URL dependencies, where
we need some way to evict and refresh them. But I've deferred it for
now.
- I think I need to redo how this is modeled in the resolver, because
right now, we don't detect a variety of invalid cases, e.g., providing
two different URLs for a dependency, asking for a URL dependency and a
_different version_ of the same dependency in the list of first-party
dependencies, etc.
- (We don't yet support VCS dependencies.)
This is isn't ready, but it can resolve
`meine_stadt_transparent==0.2.14`.
The source distributions are currently being built serially one after
the other, i don't know if that is incidentally due to the resolution
order, because sdist building is blocking or because of something in the
resolver that could be improved.
It's a bit annoying that the thing that was supposed to do http requests
now suddenly also has to a whole download/unpack/resolve/install/build
routine, it messes up the type hierarchy. The much bigger problem though
is avoid recursive crate dependencies, it's the reason for the callback
and for splitting the builder into two crates (badly named atm)
As elsewhere, we just use the `pip` and `pip-compile` APIs. So we
support `--index-url` to override PyPI, then `--extra-index-url` to add
_additional_ indexes, and `--no-index` to avoid hitting the index at
all.
Closes#156.
Updates to `29c48fb9f3daa11bd02794edd55060d0b01ee705` from the
`pubgrub-rs` dev branch. This lets us reduce the number of changes we've
made to PubGrub itself (now, only changing visibility to export a few
things from the `solver.rs` module).
This PR reverts #109 which is actually a performance _regression_ since
we need to iterate over a bunch of wheels that we could otherwise
entirely ignore.
Rather than constantly iterating over all files and testing their
compatibility with the current platform, just store wheels we can
actually consider in the solver cache.
## Summary
This PR enables the proof-of-concept resolver to backtrack by way of
using the `pubgrub-rs` crate.
Rather than using PubGrub as a _framework_ (implementing the
`DependencyProvider` trait, letting PubGrub call us), I've instead
copied over PubGrub's primary solver hook (which is only ~100 lines or
so) and modified it for our purposes (e.g., made it async).
There's a lot to improve here, but it's a start that will let us
understand PubGrub's appropriateness for this problem space. A few
observations:
- In simple cases, the resolver is slower than our current (naive)
resolver. I think it's just that the pipelining isn't as efficient as in
the naive case, where we can just stream package and version fetches
concurrently without any bottlenecks.
- A lot of the code here relates to bridging PubGrub with our own
abstractions -- so we need a `PubGrubPackage`, a `PubGrubVersion`, etc.