## Summary
This PR extends the thinking in #10525 to platform tags, and then uses
the structured tag enums everywhere, rather than passing around strings.
I think this is a big improvement! It means we're no longer doing ad hoc
tag parsing all over the place.
From PEP 517:
```python
def prepare_metadata_for_build_wheel(metadata_directory, config_settings=None):
...
```
> Must create a .dist-info directory containing wheel metadata inside
the specified metadata_directory (i.e., creates a directory like
{metadata_directory}/{package}-{version}.dist-info/).
```python
def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
...
```
> If the build frontend has previously called
prepare_metadata_for_build_wheel and depends on the wheel resulting from
this call to have metadata matching this earlier call, then it should
provide the path to the created .dist-info directory as the
metadata_directory argument.
Notice that the `metadata_directory` is different for the both hooks:
For `prepare_metadata_for_build_wheel` is doesn't contain the
`.dist-info` directory as final segment, for `build_wheel` it does.
Previously, the code assumed that both directories didn't contain the
`.dist-info` for both cases.
Checked with:
```
maturin build
uv init test-uv-build-backend --build-backend uv
cd test-uv-build-backend
uv build --sdist --preview
cd ..
UV_PREVIEW=1 pip install test-uv-build-backend/dist/test_uv_build_backend-0.1.0.tar.gz --no-index --find-links target/wheels/ -v --no-cache-dir
```
Fixes#9969
This is like #9556, but at the level of all other builds, including the
resolver and installer. Going through PEP 517 to build a package is
slow, so when building a package with the uv build backend, we can call
into the uv build backend directly instead: No temporary virtual env, no
temp venv sync, no python subprocess calls, no uv subprocess calls.
This fast path is gated through preview. Since the uv wheel is not
available at test time, I've manually confirmed the feature by comparing
`uv venv && cargo run pip install . -v --preview --reinstall .` and `uv
venv && cargo run pip install . -v --reinstall .`. When hacking the
preview so that the python uv build backend works without the setting
the direct build also (wheel built with `maturin build --profile
profiling`), we can see the perfomance difference:
```
$ hyperfine --prepare "uv venv" --warmup 3 \
"UV_PREVIEW=1 target/profiling/uv pip install --no-deps --reinstall scripts/packages/built-by-uv --preview" \
"target/profiling/uv pip install --no-deps --reinstall scripts/packages/built-by-uv --find-links target/wheels/"
Benchmark 1: UV_PREVIEW=1 target/profiling/uv pip install --no-deps --reinstall scripts/packages/built-by-uv --preview
Time (mean ± σ): 33.1 ms ± 2.5 ms [User: 25.7 ms, System: 13.0 ms]
Range (min … max): 29.8 ms … 47.3 ms 73 runs
Benchmark 2: target/profiling/uv pip install --no-deps --reinstall scripts/packages/built-by-uv --find-links target/wheels/
Time (mean ± σ): 115.1 ms ± 4.3 ms [User: 54.0 ms, System: 27.0 ms]
Range (min … max): 109.2 ms … 123.8 ms 25 runs
Summary
UV_PREVIEW=1 target/profiling/uv pip install --no-deps --reinstall scripts/packages/built-by-uv --preview ran
3.48 ± 0.29 times faster than target/profiling/uv pip install --no-deps --reinstall scripts/packages/built-by-uv --find-links target/wheels/
```
Do we need a global option to disable the fast path? There is one for
`uv build` because `--force-pep517` moves `uv build` much closer to a
`pip install` from source that a user of a library would experience (See
discussion at #9610), but uv overall doesn't really make guarantees
around the build env of dependencies, so I consider the direct build a
valid option.
Best reviewed commit-by-commit, only the last commit is the actual
implementation, while the preview mode introduction is just a
refactoring touching too many files.
Using the directory writer trait, we can collect the files instead of
writing them to a real sink. This builds up to a `uv build --list`
similar to `cargo package --list`. It is not connected to the cli yet.
For listing files, we first use a directory writer for source dists,
which we will use for collecting the filenames instead of writing the
archive in the future. I've split breaking `lib.rs` of uv-build-backend
into modules into the next PR.
No logic changes, only restructuring.
Best reviewed commit-by-commit
Going through PEP 517 to build a package is slow, so when building a
package with the uv build backend, we can call into the uv build backend
directly. This is the basis for the `uv build --list`.
This does not enable the fast path for general source dependencies.
There is a possible difference in execution if the latest uv version is
newer than the one currently running: The PEP 517 path would use the
latest version, while the fast path uses the current version.
Please review commit-by-commit
### Benchmark
`built_with_uv`, using the fast path:
```
$ hyperfine "~/projects/uv/target/profiling/uv build"
Time (mean ± σ): 9.2 ms ± 1.1 ms [User: 4.6 ms, System: 4.6 ms]
Range (min … max): 6.4 ms … 12.7 ms 290 runs
```
`hatcling_editable`, with hatchling being optimized for fast startup
times:
```
$ hyperfine "~/projects/uv/target/profiling/uv build"
Time (mean ± σ): 270.5 ms ± 18.4 ms [User: 230.8 ms, System: 44.5 ms]
Range (min … max): 250.7 ms … 298.4 ms 10 runs
```
## Summary
After #9524, I noticed two other dependencies were misaligned.
Since the previous PR has been merged, I was thinking I could submit
those two misses.
Of course, open to any comments/decline!
Thanks!! 🙂
## Test Plan
All units tests are still passing on my side. Let's see with the
pull-request CI again 😄
When adding excludes, we usually don't want to include python cache
files. On the contrary, I haven't seen any project in my ecosystem
research that would want any of `__pycache__`, `*.pyc`, `*.pyo` to be
included. By moving them behind a `default-excludes` toggle, they are
always active even when defining custom excludes, but can be deactivated
if the user so chooses.
With includes and excludes being this small again, we can roll back the
include-exclude anchored difference to always using anchored globs (i.e.
you would need to use `**/build-*.h` below).
A pyproject.toml with custom settings with the change applied:
```toml
[project]
name = "foo"
version = "0.1.0"
readme = "README.md"
license-files = ["LICENSE*", "third-party-licenses/*"]
[tool.uv.build-backend]
# A file we need for the source dist -> wheel step, but not in the wheel itself (currently unused)
source-include = ["data/build-script.py"]
# A temporary or generated file we want to ignore
source-exclude = ["/src/foo/not-packaged.txt"]
# Headers are build-only
wheel-exclude = ["build-*.h"]
[tool.uv.build-backend.data]
scripts = "scripts"
data = "assets"
headers = "header"
[build-system]
requires = ["uv>=0.5.5,<0.6"]
build-backend = "uv"
```
When building the source distribution, we always need to include
`pyproject.toml` and the module, when building the wheel, we always
include the module but nothing else at top level. Since we only allow a
single module per wheel, that means that there are no specific wheel
includes. This means we have source includes, source excludes, wheel
excludes, but no wheel includes: This is defined by the module root,
plus the metadata files and data directories separately.
Extra source dist includes are currently unused (they can't end up in
the wheel currently), but it makes sense to model them here, they will
be needed for any sort of procedural build step.
This results in the following fields being relevant for inclusions and
exclusion:
* `pyproject.toml` (always included in the source dist)
* project.readme: PEP 621
* project.license-files: PEP 639
* module_root: `Path`
* source_include: `Vec<Glob>`
* source_exclude: `Vec<Glob>`
* wheel_exclude: `Vec<Glob>`
* data: `Map<KnownDataName, Path>`
An opinionated choice is that that wheel excludes always contain the
source excludes: Otherwise you could have a path A in the source tree
that gets included when building the wheel directly from the source
tree, but not when going through the source dist as intermediary,
because A is in source excludes, but not in the wheel excludes. This has
been a source of errors previously.
In the process, I fixed a bug where we would skip directories and only
include the files and were missing license due to absolute globs.
<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:
- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->
## Summary
While working on potential bug fixes with temporary files on Windows (I
think I am currently ecountering the same issue as #2810)
I noticed that sub-workspaces were not all having the same `tempfile`
version. And they were not relying on the cargo root project dependency.
I don't know at all if it was done on purpose or not.
(I also wanted to override the root dependency with a local source but
it was not possible due to sub-workspaces not relying on the same).
The root lockfile already pinned to the `3.14.0`. Some sub-workspaces
were depending on the `3.12.0`, some others on the `3.14.0`. So I
updated the root `Cargo.toml` to the `3.14.0`.
Feel free to decline if it was done on purpose! No worries at all
🙂
Thanks!
<!-- What's the purpose of the change? What does it do, and why? -->
## Test Plan
All units tests are still passing on my side. Let's see with the
pull-request CI 😄
This PR contains three smaller improvements:
* Improve the include/exclude logging. We're still showing the current
directory as empty backticks, not sure what to do about that
* Add early stopping to license file globbing, so we don't traverse the
whole directory recursively when license files can only be in few places
* Support explicit wheel excludes. These are still not entirely right,
but at least we're correctly excluding compiled python files now. The
next step is to make sure that the wheel excludes contain all pattern
from source dist excludes, to make sure source tree -> wheel can't have
more files than source tree -> source dist -> wheel.
When building only a single crate in the workspace to run its tests, we
often recompile a lot of other, unrelated crates. Whenever cargo has a
different set of crate features, it needs to recompile. By moving some
features (non-exhaustive for now) to the workspace level, we always
activate them an avoid recompiling.
The cargo docs mismatch the behavior of cargo around default-deps, so I
filed that upstream and left most `default-features` mismatches:
https://github.com/rust-lang/cargo/issues/14841.
Reference script:
```python
import tomllib
from collections import defaultdict
from pathlib import Path
uv = Path("/home/konsti/projects/uv")
skip_list = ["uv-trampoline", "uv-dev", "uv-performance-flate2-backend", "uv-performance-memory-allocator"]
root_feature_map = defaultdict(set)
root_default_features = defaultdict(bool)
cargo_toml = tomllib.loads(uv.joinpath("Cargo.toml").read_text())
for dep, declaration in cargo_toml["workspace"]["dependencies"].items():
root_default_features[dep] = root_default_features[dep] or declaration.get("default-features", True)
root_feature_map[dep].update(declaration.get("features", []))
feature_map = defaultdict(set)
default_features = defaultdict(bool)
for crate in uv.joinpath("crates").iterdir():
if crate.name in skip_list:
continue
if not crate.joinpath("Cargo.toml").is_file():
continue
cargo_toml = tomllib.loads(crate.joinpath("Cargo.toml").read_text())
for dep, declaration in cargo_toml.get("dependencies", {}).items():
# If any item uses default features, they are used everywhere
default_features[dep] = default_features[dep] or declaration.get("default-features", True)
feature_map[dep].update(declaration.get("features", []))
for dep, features in sorted(feature_map.items()):
features = features - root_feature_map.get(dep, set())
if not features and default_features[dep] == root_default_features[dep]:
continue
print(dep, default_features[dep], sorted(features))
```
Allow including data files in wheels, configured through
`pyproject.toml`. This configuration is currently only read in the build
backend. We'd only start using it in the frontend when we're adding a
fast path.
Each data entry is a directory, whose contents are copied to the
matching directory in the wheel in
`<name>-<version>.data/(purelib|platlib|headers|scripts|data)`. Upon
installation, this data is moved to its target location, as defined by
<https://docs.python.org/3.12/library/sysconfig.html#installation-paths>:
- `data`: Installed over the virtualenv environment root. Warning: This
may override existing files!
- `scripts`: Installed to the directory for executables, `<venv>/bin` on
Unix or `<venv>\Scripts` on Windows. This directory is added to PATH
when the virtual environment is activated or when using `uv run`, so
this data type can be used to install additional binaries. Consider
using `project.scripts` instead for starting Python code.
- `headers`: Installed to the include directory, where compilers
building Python packages with this package as built requirement will
search for header files.
- `purelib` and `platlib`: Installed to the `site-packages` directory.
It is not recommended to uses these two options.
For simplicity, for now we're just defining a directory to be copied for
each data directory, while using the glob based include mechanism in the
background. We thereby introduce a third mechanism next to the main
includes and the PEP 639 mechanism, which is not what we should finalize
on.
## Summary
These were moved as part of a broader refactor to create a single
integration test module. That "single integration test module" did
indeed have a big impact on compile times, which is great! But we aren't
seeing any benefit from moving these tests into their own files (despite
the claim in [this blog
post](https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html),
I see the same compilation pattern regardless of where the tests are
located). Plus, we don't have many of these, and same-file tests is such
a strong Rust convention.
When building source distributions, we need to include the readme, so it
can become part the METADATA body when building the wheel. We also need
to support the license files from PEP 639. When building the source
distribution, we copy those file relative to their origin, when building
the wheel, we copy them to `.dist-info/licenses`.
The test for idempotence in wheel building is merged into the file
listing test, which also covers that source tree -> source dist -> wheel
is equivalent to source tree -> wheel, though we do need to check for
file inclusion stronger here.
Best reviewed commit-by-commit
A first milestone: source tree -> source dist -> wheel -> install works.
This PR adds a test for this.
There's obviously a lot still missing, including basics such as the
Readme inclusion.
When doing a directory traversal for source dist inclusion, we want to
offer the user include and exclude options, and we want to avoid
traversing irrelevant directories. The latter is important for
performance, especially on network file systems, but also with large
data directories, or (not-included) directories with other permissions.
To support this, we introduce `GlobDirFilter`, which uses a DFA from
regex_automata to determine whether any children of a directory can be
included and skips the directory if not.
The globs are based on PEP 639. The syntax is more restricted than glob
or globset, but it's standardized. I chose it over glob or globset
because we're already using this syntax for `project.license-files` a
required by PEP 639, so it makes sense to use the same globs for all
includes (see e.g.
4f52a3bb62/pyproject.toml (L36-L48)
for example with same semantics for include and exclude)
### Semantics
Glob semantics are complex due to mixing directories and files,
expectations around simplicity and our need to exclude most of the tree
in the project from traversal. The current draft uses a syntax that
optimizes for simple default use cases for the start.
#### includes
Glob expressions which files and directories to include in the source
distribution.
Includes are anchored, which means that `pyproject.toml` includes only
`<project root>/pyproject.toml`. Use for example `assets/**/sample.csv`
to include for all
`sample.csv` files in `<project root>/assets` or any child directory. To
recursively include
all files under a directory, use a `/**` suffix, e.g. `src/**`. For
performance and
reproducibility, avoid unanchored matches such as `**/sample.csv`.
The glob syntax is the reduced portable glob from
[PEP 639](https://peps.python.org/pep-0639/#add-license-FILES-key).
#### excludes
Glob expressions which files and directories to exclude from the
previous source
distribution includes.
Excludes are not, which means that `__pycache__` excludes all
directories named
`__pycache__` and it's children anywhere. To anchor a directory, use a
`/` prefix, e.g.,
`/dist` will exclude only `<project root>/dist`.
The glob syntax is the reduced portable glob from
[PEP 639](https://peps.python.org/pep-0639/#add-license-FILES-key).
Very basic source distribution support. What's included:
- Include and exclude patterns (hard-coded): Currently, we have
globset+walkdir in one part and glob in the other. I'll migrate
everything to globset+walkset and some custom perf optimizations to
avoid traversing irrelevant directories on top. I'll also pick a glob
syntax (or subset), PEP 639 seems like a good candidate since it's
consistent with what we already have to support.
- Add the `PKG-INFO` file with metadata: Thanks to Code Metadata 2.2,
this metadata is reliable and can be read statically by external tools.
Example output:
```
$ tar -ztvf dist/dummy-0.1.0.tar.gz
-rw-r--r-- 0/0 154 1970-01-01 01:00 dummy-0.1.0/PKG-INFO
-rw-rw-r-- 0/0 509 1970-01-01 01:00 dummy-0.1.0/pyproject.toml
drwxrwxr-x 0/0 0 1970-01-01 01:00 dummy-0.1.0/src/dummy
drwxrwxr-x 0/0 0 1970-01-01 01:00 dummy-0.1.0/src/dummy/submodule
-rw-rw-r-- 0/0 30 1970-01-01 01:00 dummy-0.1.0/src/dummy/submodule/impl.py
-rw-rw-r-- 0/0 14 1970-01-01 01:00 dummy-0.1.0/src/dummy/submodule/__init__.py
-rw-rw-r-- 0/0 12 1970-01-01 01:00 dummy-0.1.0/src/dummy/__init__.py
```
No tests since the source distributions don't build valid wheels yet.
As per
https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html
Before that, there were 91 separate integration tests binary.
(As discussed on Discord — I've done the `uv` crate, there's still a few
more commits coming before this is mergeable, and I want to see how it
performs in CI and locally).