## Summary
This PR enables `puffin clean` to accept package names as command line
arguments, and selectively purge entries from the cache tied to the
given package.
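As a rough sketch of the selective purge (assuming a hypothetical flat cache layout with one subdirectory per canonicalized package name; the real cache structure differs):
```rust
use std::fs;
use std::io;
use std::path::Path;

/// Remove cache entries for the given packages; with no packages, clear everything.
fn clean(cache_dir: &Path, packages: &[String]) -> io::Result<()> {
    if packages.is_empty() {
        return fs::remove_dir_all(cache_dir);
    }
    for package in packages {
        // Hypothetical layout: one subdirectory per canonicalized package name.
        let entry = cache_dir.join(package);
        if entry.exists() {
            fs::remove_dir_all(&entry)?;
        }
    }
    Ok(())
}
```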
Related to #572.
## Test Plan
Modified all the caching tests to run an additional step to (1) purge
the cache, and (2) re-install the package.
Also ensures that we filter out any incompatible requirements when
building the install plan. In general, we assume that requirements were
generated by `pip-compile`, in which case all requirements should be
compatible and there should be no duplicates; but we should handle this
case gracefully.
Closes https://github.com/astral-sh/puffin/issues/582.
This PR adds caching support for built wheels in the installer.
Specifically, the `RegistryWheelIndex` now indexes both downloaded and
built wheels (from registries), and we have a new `BuiltWheelIndex` that
takes a subdirectory and returns the "best-matching" compatible wheel.
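A rough sketch of that lookup, with tag ranking left abstract (the `priority` closure is a hypothetical stand-in for real wheel-tag compatibility scoring):
```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return the highest-priority compatible wheel in `dir`, if any.
fn best_wheel(
    dir: &Path,
    priority: impl Fn(&str) -> Option<usize>,
) -> io::Result<Option<PathBuf>> {
    let mut best: Option<(usize, PathBuf)> = None;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let Some(name) = path.file_name().and_then(|n| n.to_str()) else {
            continue;
        };
        if !name.ends_with(".whl") {
            continue;
        }
        // Skip incompatible wheels; keep the highest-priority compatible one.
        if let Some(p) = priority(name) {
            if best.as_ref().map_or(true, |(bp, _)| p > *bp) {
                best = Some((p, path));
            }
        }
    }
    Ok(best.map(|(_, path)| path))
}
```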
Closes #570.
Adds a few more tests for re-installs with various kinds of source
distributions, and changes the tests to use packages that we can safely
import (via `check_command`) for extra validation.
Once we properly respect cached built wheels, we should expect these
snapshots to change, since we'll no longer download and re-build
unnecessarily.
## Summary
This enables users to rely on yanked versions via explicit `==` markers,
which is necessary in some projects (and, in my opinion, reasonable).
Closes #551.
## Summary
Allows, e.g., `--python-version 3.7` or `--python-version 3.7.9`. This
was also feedback I received in the original PR.
Closes https://github.com/astral-sh/puffin/issues/533.
## Summary
This PR modifies `puffin-build` to be closer in behavior to
[pip](a15dd75d98/src/pip/_internal/pyproject.py (L53))
and
[build](de5b44b0c2/src/build/__init__.py (L94)).
Specifically, if a project contains a `[build-system]` field, but no
`build-backend`, we now perform a PEP 517 build (instead of using
`setup.py` directly) _and_ respect the `requires` of the
`[build-system]`. Without this change, we were failing to build source
distributions for packages like `ujson`.
Closes #527.
---------
Co-authored-by: konstin <konstin@mailbox.org>
Remove built wheels alongside their metadata when their index source
dist or url source dist changed. For git source dists, we currently
don't clear the previous build but use a new directory (not sure what's
right here - are there any generic cache GC approaches out there? I've
seen that e.g. Spotify keeps its cache at 10 GB max, but I also haven't
seen any reusable, well-tested approaches for this). Path distributions
are unchanged (#478).
I like the structure of metadata alongside the wheel for cache
invalidation; I'll try to do that for `wheels-v0`/`wheel-metadata-v0`
too. (The unzipped wheels afaik currently lack cache invalidation when
the remote changed.) This should give us roughly the same structure for
wheels and built wheels and a very similar pattern of invalidation.
Previously, when installing a package we would delete the target
directory before copying (or linking) the contents of the package.
However, this means that we do not properly support namespace packages
which can share a target directory. Instead, the last package to be
installed would override existing packages. Since we install packages
in parallel, this could result in a race condition where the target
directory already exists, which is not allowed when using `clonefile`.
See example error in #515.
c7e63d2dce
provides a regression test for this — it fails on `main`.
Here, we implement a recursive merge when the target directory already
exists. Both packages will be installed into the same directory. We no
longer delete the target directory, which seems okay since we uninstall
packages before installing now.
When files conflict, we will likely still throw an error. The correct
behavior in this case is unclear: if we just take "first write wins" or
"last write wins", we could end up with some files from one package and
some from another, resulting in two broken packages. A
possible solution here is to lock the target directories while copying.
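A simplified sketch of the merge (the real code links or clones rather than copying, and still has to decide what to do on conflicting files):
```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively merge `src` into `dst` instead of failing when `dst` exists
/// (e.g., a namespace-package directory shared by several packages).
fn merge_dir(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?; // Ok if `dst` already exists.
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            merge_dir(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}
```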
## Summary and motivation
For a given source dist, we store the metadata of each wheel built
through it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel. If there is one we use that metadata, if there isn't, we
build one. If the source is stale, we build a wheel and override
`metadata.json` with that single wheel. This PR thereby ties the local
built wheel metadata cache to the freshness of the remote source dist.
This functionality is available through `SourceDistCachedBuilder`.
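In sketch form, the control flow looks roughly like this (all types and helpers here are illustrative stand-ins for the real cache machinery):
```rust
#[derive(Clone)]
struct Metadata {
    name: String,
    version: String,
}

enum Freshness {
    Fresh,
    Stale,
}

/// `cached` stands in for the wheels recorded in `metadata.json`.
fn wheel_metadata(
    freshness: Freshness,
    cached: &mut Vec<Metadata>,
    compatible: impl Fn(&Metadata) -> bool,
    build: impl Fn() -> Metadata,
) -> Metadata {
    match freshness {
        Freshness::Fresh => {
            // Fresh source dist: reuse a matching wheel's metadata if one exists.
            if let Some(metadata) = cached.iter().find(|m| compatible(m)) {
                return metadata.clone();
            }
            // No compatible wheel yet: build one and append it to the cache.
            let metadata = build();
            cached.push(metadata.clone());
            metadata
        }
        Freshness::Stale => {
            // Stale source dist: rebuild and overwrite the cache with the new wheel.
            let metadata = build();
            *cached = vec![metadata.clone()];
            metadata
        }
    }
}
```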
`puffin_installer::Builder`, `puffin_installer::Downloader`, and
`Fetcher` are removed; instead, there is now `FetchAndBuild`, which
calls into the also-new `SourceDistCachedBuilder`. `FetchAndBuild` is
the new main high-level abstraction: it spawns parallel
fetching/building; for wheel metadata it calls into the registry
client, for wheel files it fetches them, and for source dists it calls
`SourceDistCachedBuilder`. It handles locks around builds, as well as
the newly added inter-process file locking for git operations.
Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.
In a follow-up PR, I'll also clear built wheels when they've become
stale.
Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.
Closes #473
## Source dist cache structure
Entries by supported sources:
* `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`
But the url filename does not need to be a valid source dist filename
(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so we have to accept any string as the filename; it could also be:
* `<build wheel metadata cache>/url/<sha256(url)>/master.zip/metadata.json`
Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│ └── 5c56bc1c58c34c11
│ └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│ └── metadata.json
├── pypi
│ └── django-allauth-0.51.0.tar.gz
│ └── metadata.json
└── url
└── 6781bd6440ae72c2
└── werkzeug-3.0.1.tar.gz
└── metadata.json
```
The inside of a `metadata.json`:
```json
{
"data": {
"django_allauth-0.51.0-py3-none-any.whl": {
"metadata-version": "2.1",
"name": "django-allauth",
"version": "0.51.0",
...
}
}
}
```
## Summary
This PR adds support for local path dependencies. The approach mostly
just falls out of our existing approach and infrastructure for Git and
URL dependencies.
Closes https://github.com/astral-sh/puffin/issues/436. (We'll open a
separate issue for editable installs.)
## Test Plan
Added `pip-compile` tests that pre-download a wheel or source
distribution, then install it via local path.
Extends #424 with support for URL dependency incompatibilities.
Requires changes to `miette` to prevent URLs from being word wrapped;
accepted upstream in https://github.com/zkat/miette/pull/321
Addresses
https://github.com/astral-sh/puffin/issues/309#issuecomment-1792648969
Similar to #338 this throws an error when merging versions results in an
empty set. Instead of propagating that error, we capture it and return a
new dependency type of `Unusable`. Unusable dependencies are a new
incompatibility kind which includes an arbitrary "reason" string that we
present to the user. Adding a new incompatibility kind requires changes
to the vendored pubgrub crate.
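An illustrative shape for the new kind (not the vendored crate's actual types):
```rust
/// What a package's dependencies resolve to after merging constraints.
enum Dependencies<Version> {
    /// The usual case: a set of allowed versions.
    Known(Vec<Version>),
    /// The merge produced an empty set; carry a reason to show the user.
    Unusable(String),
}

/// Merging two version sets can now yield `Unusable` instead of an error.
fn merge(a: Vec<u32>, b: Vec<u32>) -> Dependencies<u32> {
    let merged: Vec<u32> = a.into_iter().filter(|v| b.contains(v)).collect();
    if merged.is_empty() {
        Dependencies::Unusable("conflicting requirements yield no versions".to_string())
    } else {
        Dependencies::Known(merged)
    }
}
```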
We could use this same incompatibility kind for conflicting urls as in
#284, which should allow the solver to backtrack to another valid
version instead of failing (see #425).
Unlike #383 this does not require changes to PubGrub's package mapping
model. I think in the long run we'll want PubGrub to accept multiple
versions per package to solve this specific issue, but we're interested
in it being merged upstream first. This pull request is just using the
issue as a simple case to explore adding a new incompatibility type.
We may or may not be able to convince them to add this new
incompatibility type upstream. As discussed in
https://github.com/pubgrub-rs/pubgrub/issues/152, we may want a more
general incompatibility kind instead which can be used for arbitrary
problems. An upstream pull request has been opened for discussion at
https://github.com/pubgrub-rs/pubgrub/pull/153.
Related to:
- https://github.com/pubgrub-rs/pubgrub/issues/152
- #338
- #383
---------
Co-authored-by: konsti <konstin@mailbox.org>
This works by filtering out files with a more recent upload time, so if
the index you use does not provide upload times, the results might be
inaccurate. PyPI provides upload times for all files. That is, the
field is non-nullable in the warehouse schema, but the simple API PEP
does not include it.
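A sketch of the filtering step, with an illustrative `File` type; files without a timestamp are kept, which is exactly why results can be inaccurate on indexes that omit the field:
```rust
use std::time::SystemTime;

struct File {
    filename: String,
    upload_time: Option<SystemTime>,
}

/// Keep only files uploaded at or before the cutoff.
fn exclude_newer(files: Vec<File>, cutoff: SystemTime) -> Vec<File> {
    files
        .into_iter()
        .filter(|file| match file.upload_time {
            Some(uploaded) => uploaded <= cutoff,
            // No upload time: we can't tell, so keep the file.
            None => true,
        })
        .collect()
}
```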
If you have only PyPI dependencies, this means deterministic,
reproducible(!) resolution. We could try doing the same for git repos,
but it doesn't seem worth the effort; I'd recommend pinning commits
instead, since git histories are arbitrarily malleable, and if you care
about reproducibility you should not use git dependencies but a custom
index.
Timestamps are given either as RFC 3339 timestamps such as
`2006-12-02T02:07:43Z` or as UTC dates such as `2006-12-02`. Dates are
interpreted as including that entire day, i.e., up to midnight UTC at
its end. Accepting bare dates is required to make this ergonomic, and
midnight seems like a sensible cutoff.
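Parsing might look roughly like this with `chrono` (a sketch; the exact cutoff handling is an assumption):
```rust
use chrono::{DateTime, NaiveDate, NaiveDateTime, Utc};

fn parse_exclude_newer(input: &str) -> Option<NaiveDateTime> {
    // Full RFC 3339 timestamp, e.g. `2006-12-02T02:07:43Z`.
    if let Ok(timestamp) = DateTime::parse_from_rfc3339(input) {
        return Some(timestamp.with_timezone(&Utc).naive_utc());
    }
    // Bare UTC date, e.g. `2006-12-02`: widen to the end of that day,
    // i.e. midnight UTC of the following day.
    let date = NaiveDate::parse_from_str(input, "%Y-%m-%d").ok()?;
    date.succ_opt()?.and_hms_opt(0, 0, 0)
}
```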
In action for `pandas`:
```console
$ target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
Resolved 6 packages in 679ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
numpy==1.26.2
# via pandas
pandas==2.1.3
python-dateutil==2.8.2
# via pandas
pytz==2023.3.post1
# via pandas
six==1.16.0
# via python-dateutil
tzdata==2023.3
# via pandas
$ target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
Resolved 5 packages in 655ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
numpy==1.23.4
# via pandas
pandas==1.5.1
python-dateutil==2.8.2
# via pandas
pytz==2022.6
# via pandas
six==1.16.0
# via python-dateutil
$ target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
Resolved 5 packages in 594ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
# target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
numpy==1.21.4
# via pandas
pandas==1.3.4
python-dateutil==2.8.2
# via pandas
pytz==2021.3
# via pandas
six==1.16.0
# via python-dateutil
```
Previously, we were assuming that `which <python>` returns the path to
the python executable. This is not true when using pyenv shims, which
are bash scripts. Instead, we have to use `sys.executable`. Luckily,
we're already querying the python interpreter and can do it in that
pass.
We are also not allowed to cache the execution of the python interpreter
through the shim because pyenv might change the target. As a heuristic,
we check whether `sys.executable`, the real binary, is the same as our
canonicalized `which` result.
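A sketch of that heuristic (illustrative, not the actual interpreter-query code):
```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Ask the interpreter itself where its real binary lives.
fn sys_executable(python: &Path) -> io::Result<PathBuf> {
    let output = Command::new(python)
        .args(["-c", "import sys; print(sys.executable)"])
        .output()?;
    Ok(PathBuf::from(String::from_utf8_lossy(&output.stdout).trim()))
}

/// Only cache the query if `which` did not resolve to a shim: for a real
/// interpreter, `sys.executable` matches the canonicalized `which` result.
fn is_cacheable(which_result: &Path) -> io::Result<bool> {
    let canonical = which_result.canonicalize()?;
    Ok(sys_executable(which_result)? == canonical)
}
```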
---------
Co-authored-by: Zanie Blue <contact@zanie.dev>
## Summary
This PR implements logic to sort wheels by priority, where priority is
defined as preferring more "specific" wheels over less "specific"
wheels. For example, in the case of Black, my machine now selects
`black-23.11.0-cp311-cp311-macosx_11_0_arm64.whl`, whereas sorting by
lowest priority instead gives me `black-23.11.0-py3-none-any.whl`.
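To illustrate the idea with a toy scoring function (not the actual tag-ranking implementation):
```rust
// Compute a specificity score for a wheel's tags; higher is more specific.
fn tag_specificity(python: &str, abi: &str, platform: &str) -> u32 {
    let mut score = 0;
    if platform != "any" {
        score += 4; // Platform-specific beats `any`.
    }
    if abi != "none" {
        score += 2; // ABI-specific beats `none`.
    }
    if python.starts_with("cp") {
        score += 1; // Interpreter-specific beats generic `py3`.
    }
    score
}

fn main() {
    let mut wheels = vec![
        ("black-23.11.0-py3-none-any.whl", ("py3", "none", "any")),
        (
            "black-23.11.0-cp311-cp311-macosx_11_0_arm64.whl",
            ("cp311", "cp311", "macosx_11_0_arm64"),
        ),
    ];
    // Most specific wheel first.
    wheels.sort_by_key(|(_, (py, abi, plat))| std::cmp::Reverse(tag_specificity(py, abi, plat)));
    println!("{}", wheels[0].0); // The arm64 wheel wins.
}
```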
As part of this change, I've also modified the resolver to fall back to
using incompatible wheels when determining package metadata, if no
compatible wheels are available.
The `VersionMap` was also moved out of `resolver.rs` and into its own
file with a wrapper type, for clarity.
Closes https://github.com/astral-sh/puffin/issues/380.
Closes https://github.com/astral-sh/puffin/issues/421.
The pip-compile tests now explicitly set their python version, and
`puffin venv` now resolves, e.g., `python3.12` correctly. The venv
creation is moved to a shared method.
Implement two behaviors for yanked versions:
* During `pip-compile`, yanked versions are filtered out entirely; we
currently treat them as if they don't exist. This leads to confusing
error messages, because a version that does exist seems to have suddenly
disappeared.
* During `pip-sync`, we warn when we fetch a remote distribution and it
has been yanked. We currently don't warn on cached or installed
distributions that have been yanked.
Filter out source dists and wheels whose `requires-python` from the
simple API is incompatible with the current python version.
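A sketch of the filter, with the PEP 440 specifier check left abstract as a `satisfies` closure (the `IndexFile` type is illustrative):
```rust
struct IndexFile {
    filename: String,
    requires_python: Option<String>,
}

/// Keep only files whose `requires-python` admits the current interpreter.
fn compatible_files(
    files: Vec<IndexFile>,
    satisfies: impl Fn(&str) -> bool,
) -> Vec<IndexFile> {
    files
        .into_iter()
        .filter(|file| match &file.requires_python {
            // Drop files whose `requires-python` excludes our interpreter.
            Some(spec) => satisfies(spec),
            // No constraint: always eligible.
            None => true,
        })
        .collect()
}
```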
This change showed an important problem: When we use a fake python
version for resolving, building source distributions breaks down because
we can only build with versions we actually have.
This change became surprisingly big. The tests now require python 3.7 to
be installed, but changing that would mean an even bigger change.
Fixes #388
## Summary
It looks like, when you install `pip`, it includes a bunch of
`__pycache__` directories in the RECORD file (although these directories
don't exist until you run `pip`). Our uninstaller assumed that the
RECORD file only contained _files_.
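A sketch of the corresponding fix (illustrative, not the actual uninstaller code):
```rust
use std::fs;
use std::io;
use std::path::Path;

/// Remove a RECORD entry, tolerating directories and missing paths.
fn remove_record_entry(path: &Path) -> io::Result<()> {
    match fs::symlink_metadata(path) {
        // Directories (e.g. `__pycache__`) are removed recursively.
        Ok(metadata) if metadata.is_dir() => fs::remove_dir_all(path),
        Ok(_) => fs::remove_file(path),
        // Entries listed in RECORD but never created are fine to skip.
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(err) => Err(err),
    }
}
```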
Closes https://github.com/astral-sh/puffin/issues/389.
## Summary
Low-priority but fun thing to end the day. You can now pass
`--target-version py37`, and we'll generate a resolution for Python 3.7.
See: https://github.com/astral-sh/puffin/issues/183.
It looks like Cargo; notice the bold green lines at the top (which
appear during the resolution, to indicate Git fetches and source
distribution builds):
<img width="868" alt="Screen Shot 2023-11-06 at 11 28 47 PM"
src="9647a480-7be7-41e9-b1d3-69faefd054ae">
<img width="868" alt="Screen Shot 2023-11-06 at 11 28 51 PM"
src="6bc491aa-5b51-4b37-9ee1-257f1bc1c049">
Closes https://github.com/astral-sh/puffin/issues/287, although we can
do a lot more here.
We now write the `direct_url.json` when installing, and _skip_
installing if we find a package installed via the direct URL that the
user is requesting.
A lot of TODOs, especially around cleaning up the `Source` abstraction
and its relationship to `DirectUrl`. I'm gonna keep working on these
today, but this works and makes the requirements clear.
Closes #332.
## Summary
This is a first-pass at adding source distribution support to the
installer.
The previous installation flow was:
1. Come up with a plan.
2. Find a distribution (specific file) for every package that we'll need
to download.
3. Download those distributions.
4. Unzip them (since we assumed they were all wheels).
5. Install them into the virtual environment.
Now, Step (3) downloads both wheels and source distributions, and we
insert a step between Steps (3) and (4) to build any source
distributions into zipped wheels.
There are a bunch of TODOs, the most important (IMO) is that we
basically have two implementations of downloading and building, between
the stuff in `puffin_installer` and `puffin_resolver` (namely in
`crates/puffin-resolver/src/distribution`). I didn't attempt to clean
that up here -- it's already a problem, and it's related to the overall
problem we need to solve around unified caching and resource management.
Closes #243.
`PackageName` and `ExtraName` can now only be constructed from valid
names. They share the same rules, so I gave them the same
implementation. Constructors are split between `new` (owned) and
`from_str` (borrowed), with the owned version avoiding allocations.
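A sketch of the shared PEP 503-style normalization (runs of `-`, `_`, and `.` collapse to a single `-`, lowercased); the real constructors differ in their API:
```rust
fn normalize(name: &str) -> Result<String, String> {
    if name.is_empty() {
        return Err("name must be non-empty".to_string());
    }
    let mut normalized = String::with_capacity(name.len());
    let mut last_was_separator = false;
    for c in name.chars() {
        match c {
            'A'..='Z' | 'a'..='z' | '0'..='9' => {
                normalized.push(c.to_ascii_lowercase());
                last_was_separator = false;
            }
            '-' | '_' | '.' => {
                // Collapse runs of separators into a single `-`.
                if !last_was_separator {
                    normalized.push('-');
                    last_was_separator = true;
                }
            }
            _ => return Err(format!("invalid character in name: {c:?}")),
        }
    }
    Ok(normalized)
}
```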
Closes #279
---------
Co-authored-by: Zanie <contact@zanie.dev>
In the resolver, our current model for solving URL dependencies requires
that we visit the URL dependency _before_ the registry-based dependency.
This PR encodes a strict requirement that all URL dependencies be
declared upfront, either as requirements or constraints.
I wrote more about how it works and why it's necessary in documentation
[here](https://github.com/astral-sh/puffin/pull/319/files#diff-2b1c4f36af0c62a2b7bebeae9473ae083588f2a6b18a3ec52393a24266adecbbR20).
I think we could relax this constraint over time, but it requires a more
sophisticated model -- and for now, I just want something that's (1)
correct, (2) easy for us to reason about, and (3) easy for users to
reason about.
As additional motivation... allowing arbitrary URL dependencies anywhere
in the tree creates some really confusing situations in which I'm not
even sure what the right answers are. For example, assume you declare a
direct dependency on `Werkzeug==2.0.0`. You then depend on a version of
Flask that depends on a version of `Werkzeug` from some arbitrary URL.
You build the source distribution at that arbitrary URL, and it turns
out it _does_ build to a declared version of 2.0.0. What should happen?
(And if it resolves to a version that _isn't_ 2.0.0, what should happen
_then_?) I suspect different tools handle this differently, but it must
lead to a lot of "silent" failures. In my testing of Poetry, it seems
like Poetry just ignores the URL dependency, which seems wrong, but is
also a behavior we could implement in the future.
Closes https://github.com/astral-sh/puffin/issues/303.
Closes https://github.com/astral-sh/puffin/issues/284.
Ensures that if we need to access the same Git repo twice in a
resolution, we only have one handle to that repo at a time. (Otherwise,
`git2` panics.)
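One way to sketch this is a global map from repository URL to a mutex (the map itself is hypothetical; the real code ties the locks to the resolver's state):
```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Return the lock for a given repository URL, creating it on first use.
fn repo_lock(url: &str) -> Arc<Mutex<()>> {
    static LOCKS: OnceLock<Mutex<HashMap<String, Arc<Mutex<()>>>>> = OnceLock::new();
    let locks = LOCKS.get_or_init(|| Mutex::new(HashMap::new()));
    let mut locks = locks.lock().unwrap();
    locks.entry(url.to_string()).or_default().clone()
}

// Usage: hold the guard for the duration of any `git2` operation.
// let _guard = repo_lock("https://github.com/pydantic/pydantic-extra-types")
//     .lock()
//     .unwrap();
```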
This PR adds a mechanism by which we can ensure that we _always_ try to
refresh Git dependencies when resolving; further, we now write the fully
resolved SHA to the "lockfile". However, nothing in the code _assumes_
we do this, so the installer will remain agnostic to this behavior.
The specific approach taken here is minimally invasive. Specifically,
when we try to fetch a source distribution, we check if it's a Git
dependency; if it is, we fetch, and return the exact SHA, which we then
map back to a new URL. In the resolver, we keep track of URL
"redirects", and then we use the redirect (1) for the actual source
distribution building, and (2) when writing back out to the lockfile. As
such, none of the types outside of the resolver change at all, since
we're just mapping `RemoteDistribution` to `RemoteDistribution`, but
swapping out the internal URLs.
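A sketch of the redirect bookkeeping (the `fetch` closure stands in for the actual Git fetch that resolves a ref to a full SHA):
```rust
use std::collections::HashMap;

struct Redirects {
    /// Requested URL -> URL pinned to an exact SHA.
    precise: HashMap<String, String>,
}

impl Redirects {
    /// Resolve each Git URL at most once; reuse the pinned URL both for
    /// source distribution builds and when writing the lockfile.
    fn resolve(&mut self, url: &str, fetch: impl Fn(&str) -> String) -> String {
        self.precise
            .entry(url.to_string())
            .or_insert_with(|| {
                // e.g. `git+https://…@main` becomes `git+https://…@<full sha>`.
                fetch(url)
            })
            .clone()
    }
}
```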
There are some inefficiencies here since, e.g., we do the Git fetch,
send back the "precise" URL, then a moment later, do a Git checkout of
that URL (which will be _mostly_ a no-op -- since we have a full SHA, we
don't have to fetch anything, but we _do_ check back on disk to see if
the SHA is still checked out). A more efficient approach would be to
return the path to the checked-out revision when we do this conversion
to a "precise" URL, since we'd then only interact with the Git repo
exactly once. But this runs the risk that the checked-out SHA changes
between the time we make the "precise" URL and the time we build the
source distribution.
Closes #286.
Extends #295. Closes #214.
Copies some of the implementations from `pubgrub::report` so we can
implement `PubGrubPackage`-specific display when explaining failed
resolutions.
Here, we just drop the dummy version number if it's a
`PubGrubPackage::Root` package. In the future, we can further customize
reporting.