Updates our `--no-binary` option and adds a `--only-binary` option for
compatibility with `pip`, which uses `:all:`, `:none:`, and `<name>` for
specifying packages.
This required adding support for `--only-binary <name>` to our
resolver; previously, it was only a boolean toggle.
Retains `--no-build`, which is equivalent to `--only-binary :all:`. This
case is common enough for safety that I would prefer it be available
without pip's awkward `:all:` syntax.
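For illustration, some hypothetical invocations of the pip-style syntax (the exact command shape is an assumption, not part of this change):
```console
$ puffin pip install --only-binary :all: -r requirements.txt    # never build from source
$ puffin pip install --no-binary cchardet -r requirements.txt   # build cchardet from source
$ puffin pip install --no-build -r requirements.txt             # shorthand for --only-binary :all:
```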
---------
Co-authored-by: konsti <konstin@mailbox.org>
Since unavailable packages with `--no-index` can be confusing when the
user does not also provide `--find-links`, we add a hint for this case.
This required some plumbing to get the necessary information to the
`NoSolution` error.
---------
Co-authored-by: konstin <konstin@mailbox.org>
## Summary
First batch of changes for windows support. Notable changes:
* Fixes all compile errors and adds windows-specific paths.
* Working venv creation on windows, both from a base interpreter and
from a venv. This requires querying `stdlib` from the sysconfig paths to
find the launcher.
* Basic url/path conversion handling for windows.
* `if cfg!(...)` instead of `#[cfg()]`. This should make it easier to
keep everything compiling across platforms; see the sketch below.
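A minimal sketch of the pattern (the function and directory names are illustrative, not from this PR):
```rust
// With `if cfg!(windows)`, both branches are type-checked on every platform,
// so a Linux build still catches compile errors in Windows-only code; with
// `#[cfg(windows)]`, the other platform's branch is compiled out entirely.
fn venv_scripts_dir() -> &'static str {
    if cfg!(windows) {
        "Scripts"
    } else {
        "bin"
    }
}
```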
## Outlook
Test summary: 402 tests run: 299 passed (15 slow), 103 failed, 1 skipped
There are various reasons for the remaining test failures:
* Windows-specific colorama and tzdata dependencies that change the
snapshot slightly. This is by far the biggest batch.
* Some url-path handling issues. I fixed some in the PR, some remain.
* Lack of the latest python patch versions for older pythons on my
machine, since there are no builds for windows and we need to register
them in the registry for them to be picked up by `py --list-paths` (CC
@zanieb RE #1070).
* Lack of entrypoint launchers.
* ... likely more
## Summary
This PR adds support for PyPy wheels by changing the compatible tags
based on the implementation name and version of the current interpreter.
For now, we only support CPython and PyPy, and explicitly error out when
given other interpreters. (Is this right? Should we just fall back to
CPython tags...? Or skip the ABI-specific tags for unknown
interpreters?)
The logic is based on
4d85340613/src/packaging/tags.py (L247).
Note, however, that `packaging` uses the `EXT_SUFFIX` variable from
`sysconfig`... Instead, I looked at the way that PyPy formats the tags,
and recreated them based on the Python and implementation version. For
example, PyPy wheels look like
`cchardet-2.1.7-pp37-pypy37_pp73-win_amd64.whl` -- so that's `pp37` for
PyPy with Python version 3.7, and then `pypy37_pp73` for PyPy with
Python version 3.7 and PyPy version 7.3.
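As a sketch of that scheme (a hypothetical helper, not the actual implementation):
```rust
// Recreate the PyPy tags from the Python and PyPy versions, per the wheel
// naming above: Python 3.7 + PyPy 7.3 -> ("pp37", "pypy37_pp73").
fn pypy_tags(python: (u8, u8), pypy: (u8, u8)) -> (String, String) {
    let interpreter = format!("pp{}{}", python.0, python.1);
    let abi = format!("pypy{}{}_pp{}{}", python.0, python.1, pypy.0, pypy.1);
    (interpreter, abi)
}
```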
Closes https://github.com/astral-sh/puffin/issues/1013.
## Test Plan
I tested this manually, but I couldn't find macOS universal PyPy
wheels... So instead I added `cchardet` to a `requirements.in`, ran
`cargo run pip sync requirements.in --index-url
https://pypy.kmtea.eu/simple --verbose`, and added logging to verify
that the platform tags matched (even if the architecture didn't).
Adds support for disabling installation from pre-built wheels, i.e., the
package must be built from source locally.
We will still always use pre-built wheels for metadata during
resolution.
Available via `--no-binary` and `--no-binary-package <name>` flags in
`pip install` and `pip sync`. There is no flag for `pip compile` since
no installation happens there.
```
--no-binary
    Don't install pre-built wheels.

    When enabled, all installed packages will be installed from a source distribution.
    The resolver will still use pre-built wheels for metadata.

--no-binary-package <NO_BINARY_PACKAGE>
    Don't install pre-built wheels for a specific package.

    When enabled, the specified packages will be installed from a source distribution.
    The resolver will still use pre-built wheels for metadata.
```
When packages are already installed, the `--no-binary` flag will have no
effect without the `--reinstall` flag. In the future, I'd like to change
this by tracking if a local distribution is from a pre-built wheel or a
locally-built wheel. However, this is significantly more complex and
different than `pip`'s behavior so deferring for now.
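For example (a hypothetical invocation), forcing already-installed packages to be rebuilt from source means combining the two flags:
```console
$ puffin pip sync requirements.txt --no-binary --reinstall
```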
For reference, `pip`'s flag works as follows:
```
--no-binary <format_control>
    Do not use binary packages. Can be supplied multiple times, and each time adds to the
    existing value. Accepts either ":all:" to disable all binary packages, ":none:" to empty the
    set (notice the colons), or one or more package names with commas between them (no colons).
    Note that some packages are tricky to compile and may fail to install when this option is
    used on them.
```
Note we are not matching the exact `pip` interface here because it seems
complicated to use. I think we may want to consider adjusting our
interface for this behavior since we're not entirely compatible anyway
e.g. I think `--force-build` and `--force-build-package` are clearer
names. We could also consider matching the `pip` interface or only
allowing `--no-binary <package>` for compatibility. We can of course do
whatever we want in our _own_ install interfaces later.
Additionally, we may want to further consider the semantics of
`--no-binary`. For example, if I run `pip install pydantic --no-binary`
I expect _just_ Pydantic to be installed without binaries but by default
we will build all of Pydantic's dependencies too.
This work was prompted by #895, as it is much easier to measure
performance gains from building source distributions if we have a flag
to ensure we actually build source distributions. Additionally, this is
a flag I have used frequently in production to debug packages that ship
Cythonized wheels.
Improves some of the "no versions of <package> are available" messages
by showing the complement (inversion) of the package's version range.
Does not address cases like
```
Because there are no versions of crow that satisfy any of:
crow>1.0.0,<2.0.0a5
crow>2.0.0a7,<2.0.0b1
crow>2.0.0b1,<2.0.0b5
...
```
which are a bit more complicated; I'll focus on those cases in a
follow-up.
## Summary
This PR uses a single `Index` that's shared between the top-level
resolver and any sub-resolutions that happen in the course of that
top-level resolution (namely, to resolve build dependencies for any
source distributions).
In theory it's an optimization, since (e.g.) if we have two packages
that both need the `flit-core` build system, and we attempt to build
them both at once, we'll only fetch its metadata _once_, and share it
across the two resolutions. In practice, I haven't been able to get this
to show up in benchmarks. I suspect you'd need a _lot_ of source
distributions for it to matter... Though it may still be worth doing, as
it strikes me as a cleaner design.
Closes #200.
Closes #541.
## Summary
This PR restructures the flat index fetching in a few ways:
1. It now lives in its own `FlatIndexClient`, since it felt a bit
awkward (in my opinion) for it to live in `RegistryClient`.
2. We now fetch the `FlatIndex` outside of the resolver. This has a few
benefits: (1) the resolver constructor is no longer `async` and no longer
returns `Result`, which feels better for a resolver; and (2) we can
share the `FlatIndex` across resolutions rather than re-fetching it for
every source distribution build.
Add directory `--find-links` support for local paths to pip-compile.
It seems that pip joins all sources and then picks the best package. We
explicitly give find-links packages precedence when the same package
exists both on an index and locally, by prefilling the `VersionMap`;
otherwise, they are added as another index and the existing rules of
precedence apply.
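A minimal sketch of that precedence rule (types and names are hypothetical):
```rust
use std::collections::BTreeMap;

type Version = String;

#[derive(Clone)]
struct DistFile {
    filename: String,
}

// Prefill the map from find-links entries; registry entries are only added
// for versions that the flat index did not already provide.
fn build_version_map(
    flat_index: BTreeMap<Version, DistFile>,
    registry: BTreeMap<Version, DistFile>,
) -> BTreeMap<Version, DistFile> {
    let mut map = flat_index;
    for (version, file) in registry {
        map.entry(version).or_insert(file);
    }
    map
}
```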
Internally, the feature is called _flat index_, which is more meaningful
than _find links_: we're not looking for links, we're picking up local
directories, and (TBD) supporting another index format that's just a flat
list of files instead of a nested index.
`RegistryBuiltDist` and `RegistrySourceDist` now use `WheelFilename` and
`SourceDistFilename` respectively. The `File` inside `RegistryBuiltDist`
and `RegistrySourceDist` gained the ability to represent both a url and
a path so that `--find-links` with a url and with a path works the same,
both being locked as `<package_name>@<version>` instead of
`<package_name> @ <url>`. (This is more of a detail; this PR in general
still works if we strip that and have directory find-links represented as
`<package_name> @ file:///path/to/file.ext`.)
`PrioritizedDistribution` and `FlatIndex` have been moved to locations
where we can use them in the upstack PR.
I added a `scripts/wheels` directory with stripped down wheels to use
for testing.
We're lacking tests for correct tag priority precedence with flat
indexes; I only confirmed this manually, since it is not covered in the
pip-compile or pip-sync output.
Closes #876
## Summary
Our current setup uses the legacy `setup.py`-based builds if a
`pyproject.toml` file isn't present. This matches pip's behavior.
However, `pypa/build` uses PEP 517-based builds in such cases, and it
looks like pip plans to make that the default
(https://github.com/pypa/pip/issues/9175), with the limiting factor
being performance issues related to isolated builds.
This is now the default behavior, but the `--legacy-setup-py` flag
allows users to opt in to using `setup.py` directly for distributions
that lack a `pyproject.toml`.
## Summary
If pre-releases are available for a package that we otherwise couldn't
resolve, we now show a hint that includes one of the example versions.
Closes https://github.com/astral-sh/puffin/issues/811.
Adjusts the display of "no versions available" in error messages to be
consistent with other package/range pairings, i.e., we usually display
`<package-name><range>`.
In the past, I moved us to `owo-colors`
(https://github.com/astral-sh/puffin/pull/121); then, we moved back,
because we ran into issues with overriding the settings to force-disable
colors. But `anstream` solved those problems, so I'm moving us _back_ to
`owo-colors`, since it's what `anstream` recommends, and it's already
used by many of our dependencies (`miette`, `configparser`).
---------
Co-authored-by: konstin <konstin@mailbox.org>
## Summary
We can use `anstream` for all color control, rather than going through
`colored`. Note that we still need the `colored` crate, since `colored`
and `anstream` solve different problems. (`anstream` recommends using
`owo-colors` alongside it, but `colored` seems to work fine?)
Resolves the issue raised in
https://github.com/astral-sh/puffin/pull/742 via `anstream` rather than
`colored`.
Closes https://github.com/astral-sh/puffin/issues/782.
## Summary
This PR modifies the resolver to treat the Python version as a package,
which allows for better error messages (since we no longer treat
incompatible packages as if they "don't exist at all").
There are a few tricky pieces here...
First, we need to track both the interpreter's Python version and the
_target_ Python version, because we support resolving for other versions
via `--python 3.7`.
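A sketch of the two-version split (field names are illustrative, not the actual implementation):
```rust
// The resolver consults the *target* version for `requires-python` checks,
// while builds can only run on the interpreter that's actually installed.
struct PythonRequirement {
    /// Version of the interpreter doing the resolving (and any builds).
    installed: (u8, u8),
    /// Version we are resolving for, e.g. from `--python 3.7`.
    target: (u8, u8),
}
```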
Second, we allow using incompatible wheels during resolution, as long as
there's a compatible source distribution. So we still need to test for
`requires-python` compatibility when selecting distributions.
This could use more testing, but it feels like an area where `packse`
would be more productive than writing PyPI tests.
Closes https://github.com/astral-sh/puffin/issues/406.
This PR adds a dedicated error message for resolutions that fail, but
might've succeeded if pre-releases were allowed. Specifically, if we see
a failed resolution, and failed to find a version for a package that
included a pre-release marker, we add a hint nudging the user to
explicitly enable all pre-releases.
We'd prefer a solution like
https://github.com/astral-sh/puffin/pull/666, but believe that it will
break some assumptions in PubGrub, so this is the lighter-weight
solution.
Closes https://github.com/astral-sh/puffin/issues/659.
The `async fn` and return-position `impl Trait` in traits improve
`BuildContext` ergonomics. The traits use `impl Future` over `async fn`
to make the `Send` bound explicit
(https://blog.rust-lang.org/2023/12/21/async-fn-rpit-in-traits.html).
The remaining changes are due to clippy.
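A minimal sketch of the pattern (the trait and method are hypothetical):
```rust
use std::future::Future;

trait BuildContextLike {
    // Spelling the return type as `impl Future + Send` makes the `Send`
    // bound explicit, which a bare `async fn` in a trait cannot express.
    fn setup_build(&self) -> impl Future<Output = String> + Send;
}

struct Ctx;

impl BuildContextLike for Ctx {
    // Implementations may still use `async fn`; the resulting future just
    // has to satisfy the trait's `Send` bound.
    async fn setup_build(&self) -> String {
        "ok".to_string()
    }
}
```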
## Summary
This PR ensures that we re-use the resolution to install the build
dependencies when building a source distribution. Currently, we only
pass along the list of requirements, and then use the `Finder` to map
each requirement to a distribution. But we already determine the correct
distribution when resolving!
Closes https://github.com/astral-sh/puffin/issues/655.
## Summary
When resolving `transformers[tensorboard]`, the `[tensorboard]` extra
doesn't exist. Previously, we returned "unknown" dependencies for this
variant, which leads the resolution to try all versions, then fail. This
PR instead warns, but returns the base dependencies for the package,
which matches `pip`. (Poetry doesn't even warn, it just proceeds as
normal.)
Arguably, it would be better to return a custom incompatibility here and
then propagate... But this PR is better than the status quo, and I don't
know if we have support for that behavior yet...? (cc @zanieb)
Closes #386.
Closes https://github.com/astral-sh/puffin/issues/423.
## Summary
This PR enables overrides to be passed to `pip-compile` and
`pip-install` via a new `--overrides` flag.
When overrides are provided, we effectively replace any requirements
that are overridden with the overridden versions. This is applied at all
depths of the tree.
The merge semantics are such that we replace _all_ requirements of a
package with _all_ requirements from the overrides files. So, for
example, if a package declares:
```
foo >= 1.0; python_version < '3.11'
foo < 1.0; python_version >= '3.11'
```
And the user provides an override like:
```
foo >= 2.0
```
Then _both_ of the `foo` requirements in the package will be replaced
with the override.
If instead, the user provided an override like:
```
foo >= 2.0; python_version < '3.11'
foo < 3.0; python_version >= '3.11'
```
Then we'd replace _both_ of the original `foo` requirements with both of
these overrides. (In technical terms, for each package in the
requirements file, we flat-map over its overrides.)
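A minimal sketch of those semantics (hypothetical types, not the actual implementation):
```rust
use std::collections::HashMap;

#[derive(Clone)]
struct Requirement {
    name: String,
    specifier: String, // e.g. ">= 2.0; python_version < '3.11'"
}

// Every requirement whose package name has overrides is replaced by *all*
// of the override entries for that name; everything else passes through.
fn apply_overrides(
    requirements: Vec<Requirement>,
    overrides: &HashMap<String, Vec<Requirement>>,
) -> Vec<Requirement> {
    requirements
        .into_iter()
        .flat_map(|requirement| {
            overrides
                .get(&requirement.name)
                .cloned()
                .unwrap_or_else(|| vec![requirement])
        })
        .collect()
}
```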
Closes https://github.com/astral-sh/puffin/issues/511.
Make `prepare_metadata_for_build_wheel` accessible across the puffin
codebase by splitting the build call into a setup call, a metadata call,
and a wheel call. This does not actually use the hook yet, but it's the
required refactoring for it.
Part of #599.
This is mostly a mechanical refactor that moves 80% of our code to the
same cache abstraction.
It introduces a `Cache` type, which abstracts away the path of the cache
and the temp dir drop, and is passed throughout the codebase. To get a
specific cache bucket, you request your `CacheBucket` from the `Cache`.
`CacheBucket` centralizes the names of all cache buckets, moving them
away from the string constants spread throughout the crates.
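A rough sketch of the shape (everything beyond the bucket directory names is illustrative):
```rust
use std::path::PathBuf;

struct Cache {
    root: PathBuf,
}

#[derive(Clone, Copy)]
enum CacheBucket {
    Interpreter,
    BuiltWheelMetadata,
    // ...one variant per bucket
}

impl CacheBucket {
    // Each bucket owns its directory name, replacing the string constants
    // previously spread throughout the crates.
    fn dir_name(self) -> &'static str {
        match self {
            CacheBucket::Interpreter => "interpreter-v0",
            CacheBucket::BuiltWheelMetadata => "built-wheel-metadata-v0",
        }
    }
}

impl Cache {
    fn bucket(&self, bucket: CacheBucket) -> PathBuf {
        self.root.join(bucket.dir_name())
    }
}
```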
Specifically for working with the `CachedClient`, there is a
`CacheEntry`. I'm not sure yet if that is a strict improvement over
`cache_dir: PathBuf, cache_file: String`; I may have to revisit that
later.
The interpreter cache moved into `interpreter-v0`.
We can use the `CacheBucket` documentation to describe the cache
structure in each bucket:

## Summary and motivation
For a given source dist, we store the metadata of each wheel built from
it in `built-wheel-metadata-v0/pypi/<source dist
filename>/metadata.json`. During resolution, we check the cache status
of the source dist. If it is fresh, we check `metadata.json` for a
matching wheel. If there is one, we use that metadata; if there isn't, we
build one. If the source dist is stale, we build a wheel and overwrite
`metadata.json` with that single wheel. This PR thereby ties the local
built wheel metadata cache to the freshness of the remote source dist.
This functionality is available through `SourceDistCachedBuilder`.
`puffin_installer::Builder`, `puffin_installer::Downloader`, and
`Fetcher` are removed; instead, there is now `FetchAndBuild`, which calls
into the also-new `SourceDistCachedBuilder`. `FetchAndBuild` is the new
main high-level abstraction: it spawns parallel fetching/building; for
wheel metadata it calls into the registry client, for wheel files it
fetches them, and for source dists it calls `SourceDistCachedBuilder`. It
handles locks around builds, and, newly added, inter-process file
locking for git operations.
Fetching and building source distributions now happens in parallel in
`pip-sync`, i.e. we don't have to wait for the largest wheel to be
downloaded to start building source distributions.
In a follow-up PR, I'll also clear built wheels when they've become
stale.
Another effect is that in a fully cached resolution, we need neither zip
reading nor email parsing.
Closes #473
## Source dist cache structure
Entries by supported sources:
* `<build wheel metadata cache>/pypi/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/<sha256(index-url)>/foo-1.0.0.zip/metadata.json`
* `<build wheel metadata
cache>/url/<sha256(url)>/foo-1.0.0.zip/metadata.json`
But the url filename does not need to be a valid source dist filename
(<https://github.com/search?q=path%3A**%2Frequirements.txt+master.zip&type=code>),
so it could also be the following, and we have to accept any string as
the filename:
* `<build wheel metadata
cache>/url/<sha256(url)>/master.zip/metadata.json`
Example:
```text
# git source dist
pydantic-extra-types @ git+https://github.com/pydantic/pydantic-extra-types.git
# pypi source dist
django_allauth==0.51.0
# url source dist
werkzeug @ ff1904eb5e/werkzeug-3.0.1.tar.gz
```
will be stored as
```text
built-wheel-metadata-v0
├── git
│   └── 5c56bc1c58c34c11
│       └── 843b753e9e8cb74e83cac55598719b39a4d5ef1f
│           └── metadata.json
├── pypi
│   └── django-allauth-0.51.0.tar.gz
│       └── metadata.json
└── url
    └── 6781bd6440ae72c2
        └── werkzeug-3.0.1.tar.gz
            └── metadata.json
```
The inside of a `metadata.json`:
```json
{
  "data": {
    "django_allauth-0.51.0-py3-none-any.whl": {
      "metadata-version": "2.1",
      "name": "django-allauth",
      "version": "0.51.0",
      ...
    }
  }
}
```
Preparing for #235, some refactoring to `puffin_interpreter`.
* Added a dedicated error type instead of anyhow
* `InterpreterInfo` -> `Interpreter`
* `detect_virtual_env` now returns an option so it can be chained for
#235
## Summary
This is a refactor to address a TODO in the build context whereby we
aren't respecting the resolution options in recursive resolutions. Now,
the options are split out from the resolution _manifest_, and shared
across the build context tree.
This works by filtering out files with a more recent upload time, so if
the index you use does not provide upload times, the results might be
inaccurate. pypi provides upload times for all files. That is, the field
is non-nullable in the warehouse schema, but the simple API PEP does not
specify this field.
If you have only pypi dependencies, this means deterministic,
reproducible(!) resolution. We could try doing the same for git repos,
but it doesn't seem worth the effort; I'd recommend pinning commits
instead, since git histories are arbitrarily malleable, and if you care
about reproducibility you should not use git dependencies but a custom
index.
Timestamps are given either as RFC 3339 timestamps such as
`2006-12-02T02:07:43Z` or as UTC dates in the same format such as
`2006-12-02`. Dates are interpreted as including the whole day, i.e.
until the midnight UTC that ends it. Supporting date-only input is
required to make this ergonomic, and midnight seems like an ergonomic
choice.
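A sketch of that parsing using `chrono` (the helper is hypothetical; the inclusive-day cutoff follows the description above):
```rust
use chrono::{DateTime, NaiveDate, Utc};

fn parse_exclude_newer(input: &str) -> Option<DateTime<Utc>> {
    // Full RFC 3339 timestamp, e.g. `2006-12-02T02:07:43Z`.
    if let Ok(timestamp) = DateTime::parse_from_rfc3339(input) {
        return Some(timestamp.with_timezone(&Utc));
    }
    // Date only, e.g. `2006-12-02`: the whole day is included, so the
    // cutoff is the midnight UTC that ends it.
    let date = NaiveDate::parse_from_str(input, "%Y-%m-%d").ok()?;
    let cutoff = date.succ_opt()?.and_hms_opt(0, 0, 0)?;
    Some(DateTime::from_naive_utc_and_offset(cutoff, Utc))
}
```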
In action for `pandas`:
```console
$ target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
Resolved 6 packages in 679ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
#    target/debug/puffin pip-compile --exclude-newer 2023-11-16 target/pandas.in
numpy==1.26.2
    # via pandas
pandas==2.1.3
python-dateutil==2.8.2
    # via pandas
pytz==2023.3.post1
    # via pandas
six==1.16.0
    # via python-dateutil
tzdata==2023.3
    # via pandas
$ target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
Resolved 5 packages in 655ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
#    target/debug/puffin pip-compile --exclude-newer 2022-11-16 target/pandas.in
numpy==1.23.4
    # via pandas
pandas==1.5.1
python-dateutil==2.8.2
    # via pandas
pytz==2022.6
    # via pandas
six==1.16.0
    # via python-dateutil
$ target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
Resolved 5 packages in 594ms
# This file was autogenerated by Puffin v0.0.1 via the following command:
#    target/debug/puffin pip-compile --exclude-newer 2021-11-16 target/pandas.in
numpy==1.21.4
    # via pandas
pandas==1.3.4
python-dateutil==2.8.2
    # via pandas
pytz==2021.3
    # via pandas
six==1.16.0
    # via python-dateutil
```
Previously, we were assuming that `which <python>` returns the path to
the python executable. This is not true when using pyenv shims, which
are bash scripts. Instead, we have to use `sys.executable`. Luckily,
we're already querying the python interpreter and can do it in that
pass.
We are also not allowed to cache the execution of the python interpreter
through the shim, because pyenv might change the target. As a heuristic,
we check whether `sys.executable`, the real binary, is the same as our
canonicalized `which` result.
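A minimal sketch of that heuristic (the function name is hypothetical):
```rust
use std::io;
use std::path::Path;

// Cache interpreter info only when the canonicalized `which` result and
// `sys.executable` agree, i.e. we are not behind a mutable pyenv shim.
fn is_cacheable(which_result: &Path, sys_executable: &Path) -> io::Result<bool> {
    Ok(which_result.canonicalize()? == sys_executable.canonicalize()?)
}
```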
---------
Co-authored-by: Zanie Blue <contact@zanie.dev>
Filter out source dists and wheels whose `requires-python` from the
simple API is incompatible with the current python version.
This change showed an important problem: When we use a fake python
version for resolving, building source distributions breaks down because
we can only build with versions we actually have.
This change became surprisingly big. The tests now require python 3.7 to
be installed, but changing that would mean an even bigger change.
Fixes #388
## Summary
It looks like, when you install `pip`, it includes a bunch of
`__pycache__` directories in the RECORD file (although these directories
don't exist until you run `pip`). Our uninstaller assumed that the
RECORD file only contained _files_.
Closes https://github.com/astral-sh/puffin/issues/389.
I intend this to become the main form of caching for puffin: you can
make http requests, you transform the data to what you really need, you
have control over the cache key, and the cache is always json (or
anything else much faster we want to replace it with, as long as it's
serde!).
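A rough sketch of the pattern (the signature is an assumption, not the real `CachedClient` API):
```rust
use serde::{de::DeserializeOwned, Serialize};
use std::future::Future;
use std::path::Path;

// On a hit, deserialize the cached json; on a miss (or an unreadable
// entry), run the request + transform and persist the result as json.
async fn get_cached<T, F, Fut>(cache_entry: &Path, fetch_and_transform: F) -> std::io::Result<T>
where
    T: Serialize + DeserializeOwned,
    F: FnOnce() -> Fut,
    Fut: Future<Output = std::io::Result<T>>,
{
    if let Ok(bytes) = std::fs::read(cache_entry) {
        if let Ok(cached) = serde_json::from_slice(&bytes) {
            return Ok(cached);
        }
    }
    let fresh = fetch_and_transform().await?;
    std::fs::write(cache_entry, serde_json::to_vec(&fresh)?)?;
    Ok(fresh)
}
```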
One of the most common errors I observed is build failures due to
missing header files. On ubuntu, this generally means that you need to
install some `<...>-dev` package that the documentation tells you about,
e.g. [mysqlclient](https://github.com/PyMySQL/mysqlclient#linux) needs
`default-libmysqlclient-dev`, [some psycopg
versions](https://www.psycopg.org/psycopg3/docs/basic/install.html#local-installation)
(I remember that this was always required at some earlier point) require
`libpq-dev`, and pygraphviz wants `graphviz-dev`. This is quite common
for many scientific packages (where conda has an advantage, because it
can provide those packages as a dependency).
The error message can be completely inscrutable if you're just a python
programmer (or user) and not a C programmer (example: pygraphviz):
```
warning: no files found matching '*.png' under directory 'doc'
warning: no files found matching '*.txt' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
pygraphviz/graphviz_wrap.c:3020:10: fatal error: graphviz/cgraph.h: No such file or directory
3020 | #include "graphviz/cgraph.h"
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
```
The only relevant part is `fatal error: graphviz/cgraph.h: No such file
or directory`. Why is this file not there, and how do I get it to be
there?
This is even harder to spot in pip's output, where it's 11 lines above
the last line:

I've special-cased missing headers and made sure that the last line
tells you the important information: we're missing some header, please
check the documentation of {package} {version} for what to install:

Scrolling up:

The difference gets even clearer with a default ubuntu terminal with its
80 columns:

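For reference, a toy sketch of how such a hint can be derived from the build output (the matching and wording are assumptions, not the actual implementation):
```rust
// Scan the build output for gcc's "fatal error: <header>: No such file or
// directory" and surface the header plus the package docs to check.
fn missing_header_hint(build_output: &str, package: &str, version: &str) -> Option<String> {
    for line in build_output.lines() {
        if let Some(rest) = line.split("fatal error: ").nth(1) {
            if let Some(header) = rest.strip_suffix(": No such file or directory") {
                return Some(format!(
                    "Missing header `{header}`; check the documentation of \
                     {package} {version} for which system package to install"
                ));
            }
        }
    }
    None
}
```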
---
Note that the situation is better for a missing compiler; there I get:
```
[...]
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
error: command 'gcc' failed: No such file or directory
---
```
Putting the last line into google, the first two results tell me to
`sudo apt-get install gcc`, and the third even tells me about `sudo apt
install build-essential`.
Closes https://github.com/astral-sh/puffin/issues/356.
The example from the issue now renders as:
```
❯ cargo run --bin puffin-dev -q -- resolve-cli tensorflow-cpu-aws
puffin-dev failed
Caused by: No solution found when resolving build dependencies for source distribution:
Caused by: Because there is no available version for tensorflow-cpu-aws and root depends on tensorflow-cpu-aws, version solving failed.
```
Part of https://github.com/astral-sh/puffin/issues/214
Adds a `project: Option<PackageName>` to the `Manifest`, `Resolver`, and
`RequirementsSpecification`, to populate an optional `name` for
`PubGrubPackage::Root`.
I'll work on removing the version number next.
Should we consider using the parent directory name when a
`pyproject.toml` file is not present?
This PR makes the cache non-optional in most of Puffin, which simplifies
the code, allows us to reuse the cache within a single command (even
with `--no-cache`), and also allows us to use the cache for disk storage
across an invocation.
I left the cache as optional for the `Virtualenv` and `InterpreterInfo`
abstractions, since those are generic enough that it seems nice to have
a non-cached version, but it's kind of arbitrary.