Add a `--compile` option to `pip install` and `pip sync`.
I chose to implement this as a separate pass over the entire venv. If we
wanted to compile during installation, we'd have to make sure that
writing is exclusive, to avoid concurrent processes writing broken
`.pyc` files. Additionally, this ensures that the entire site-packages
are bytecode compiled, even if there are packages that aren't from this
`uv` invocation. The disadvantage is that we do not update RECORD and
rely on this comment from [PEP 491](https://peps.python.org/pep-0491/):
> Uninstallers should be smart enough to remove .pyc even if it is not
mentioned in RECORD.
If this is a problem we can change it to run during installation and
write RECORD entries.
Internally, this is implemented as an async work-stealing subprocess
worker pool. The producer is a directory traversal over site-packages,
sending each `.py` file to a bounded async FIFO queue/channel. Each
worker has a long-running python process. It pops the queue to get a
single path (or exists if the channel is closed), then sends it to
stdin, waits until it's informed that the compilation is done through a
line on stdout, and repeat. This is fast, e.g. installing `jupyter
plotly` on Python 3.12 it processes 15876 files in 319ms with 32 threads
(vs. 3.8s with a single core). The python processes internally calls
`compileall.compile_file`, the same as pip.
Like pip, we ignore and silence all compilation errors
(https://github.com/astral-sh/uv/issues/1559). There is a 10s timeout to
handle the case when the workers got stuck. For the reviewers, please
check if i missed any spots where we could deadlock, this is the hardest
part of this PR.
I've added `uv-dev compile <dir>` and `uv-dev clear-compile <dir>`
commands, mainly for my own benchmarking. I don't want to expose them in
`uv`, they almost certainly not the correct workflow and we don't want
to support them.
Fixes#1788Closes#1559Closes#1928
## Summary
This will make it easier to use the paths returned by `distutils.py`
(for some cases). No code or behavior changes; just removing some fields
we don't need.
Update #1918 to handle https://github.com/pypa/pip/pull/12536, where pip
removed their python minor entrypoint. The pip test is semi-functional
since it builds pip from source instead of using a wheel with the wrong
entrypoint, we have to update it when this pip version has a release.
Closes#1593.
## Summary
This is based on Pradyun's installer branch
(d01624e5f2/src/installer/scripts.py (L54)),
which is itself based on pip
(0ad4c94be7/src/pip/_vendor/distlib/scripts.py (L136)).
The gist of it is: on Posix platforms, if a path contains a space (or is
too long), we wrap the shebang in a `/bin/sh` invocation.
Closes https://github.com/astral-sh/uv/issues/2076.
## Test Plan
```
❯ cargo run venv "foo"
Finished dev [unoptimized + debuginfo] target(s) in 0.14s
Running `target/debug/uv venv foo`
Using Python 3.12.0 interpreter at: /Users/crmarsh/.local/share/rtx/installs/python/3.12.0/bin/python3
Creating virtualenv at: foo
Activate with: source foo/bin/activate
❯ source "foo bar/bin/activate"
❯ which black
black not found
❯ cargo run pip install black
Resolved 6 packages in 177ms
Installed 6 packages in 17ms
+ black==24.2.0
+ click==8.1.7
+ mypy-extensions==1.0.0
+ packaging==23.2
+ pathspec==0.12.1
+ platformdirs==4.2.0
❯ which black
/Users/crmarsh/workspace/uv/foo bar/bin/black
❯ black
Usage: black [OPTIONS] SRC ...
One of 'SRC' or 'code' is required.
❯ cat "foo bar/bin/black"
#!/bin/sh
'''exec' '/Users/crmarsh/workspace/uv/foo bar/bin/python' "$0" "$@"
' '''
# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == "__main__":
sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
sys.exit(patched_main())
```
## Summary
This was a missed find-and-replace. We shouldn't assume `layout.platlib`
here, since `RECORD` will be written to `site_packages` (which could be
`layout.purelib`).
This is hard to reproduce. You need a _fresh_ environment where
`purelib` and `platlib` differ (which isn't the case for virtualenvs, at
least typically), and you need to be installing a new package that is a
purelib. I tested it by manually changing `platlib` to point to a
different path.
Closes https://github.com/astral-sh/uv/issues/2064.
## Summary
This PR adds a `--python` flag that allows users to provide a specific
Python interpreter into which `uv` should install packages. This would
replace the `VIRTUAL_ENV=` workaround that folks have been using to
install into arbitrary, system environments, while _also_ actually being
correct for installing into non-virtual environments, where the bin and
site-packages paths can differ.
The approach taken here is to use `sysconfig.get_paths()` to get the
correct paths from the interpreter, and then use those for determining
the `bin` and `site-packages` directories, rather than constructing them
based on hard-coded expectations for each platform.
Closes https://github.com/astral-sh/uv/issues/1396.
Closes https://github.com/astral-sh/uv/issues/1779.
Closes https://github.com/astral-sh/uv/issues/1988.
## Test Plan
- Verified that, on my Windows machine, I was able to install `requests`
into a global environment with: `cargo run pip install requests --python
'C:\\Users\\crmarsh\\AppData\\Local\\Programs\\Python\\Python3.12\\python.exe`,
then `python` and `import requests`.
- Verified that, on macOS, I was able to install `requests` into a
global environment installed via Homebrew with: `cargo run pip install
requests --python $(which python3.8)`.
Address a few pedantic lints
lints are separated into separate commits so they can be reviewed
individually.
I've not added enforcement for any of these lints, but that could be
added if desirable.
Users expect pip to have `pip`, `pip3` and `pip3.x` entrypoints. But pip
is a universal wheel, so it contains the `pip3.x` entrypoint where it
was built on. To fix this, pip special cases itself when installing
(3898741e29/src/pip/_internal/operations/install/wheel.py (L283)),
replacing the wheel entrypoint with one for the current version. We now
do the same.
Fixes#1593
## Summary
Fixes#1444.
In situations where the installer fails to perform a reflink, a regular
file copy is also attempted, as a fallback. This circumvents issues with
linking files across filesystems or volumes.
## Test Plan
N/A
## Summary
If a registry doesn't support range requests, then today, we download
the entire wheel to disk and then read the metadata from the downloaded
archive. This PR instead modifies the registry client to stream the
zipfile and stop as soon as it's seen the metadata, which should be more
efficient.
Closes https://github.com/astral-sh/uv/issues/1596.
## Test Plan
Made this the _only_ path for downloading metadata; verified that the
test suite passed.
## Summary
This _could_ fix https://github.com/astral-sh/uv/issues/1454, but I'm
not sure. I was able to replicate by forcing a bunch of error states.
But, in short, if we fail to hardlink on the initial copy due to a file
existing, and then fail _again_, we fallback to copying. But if we copy,
then the tempfile doesn't exist, and so the `fs_err::rename(&tempfile,
&out_path)?;` will fail with "File not found".
This PR just ensures that the cases are explicitly mutually exclusive:
we only attempt to rename if the hardlink succeeded.
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.
Then, update directory and file names:
```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```
Then add all the files again
```
# Add all the files again
git add crates
git add python/uv
# This one needs a force-add
git add -f crates/uv-trampoline
```
## Summary
For PEP 517 builds, the current working directory needs to be set to the
directory of the source distribution. It turns out that on Windows, if
you use a UNC path for the working directory, then relative paths are
interpreted relative to the root of the current drive
([source](https://www.fileside.app/blog/2023-03-17_windows-file-paths/#paths-relative-to-the-root-of-the-current-drive)).
So, when builds attempted to resolve relative paths, they always
errored...
This PR ensures that we remove the UNC prefix when setting the current
working directory.
Closes#1238.
## Test Plan
I tested this on my Windows machine by installing `ujson` with
`--no-binary ujson`. (I don't want to add that specific test, since it's
really slow to build.)
Split out from the large test refactoring PR. Use `normalized_display`
in tests and two more thiserror derives to match snapshots and output,
and other small windows fixes.
## Summary
See: https://github.com/astral-sh/puffin/issues/1224
## Test Plan
Ran `python -m scripts.bench --puffin
scripts/requirements/compiled/jupyter.txt --min-runs 100 --benchmark
install-warm --verbose` several times, which failed eventually on `main`
but not on this branch.
Lacking windows compatible aarch64 hardware, i cross compiled the
trampoline from x86_64 linux to aarch64-pc-windows-msvc; I added the
instructions to the puffin-trampoline readme. With some testing on an
aarch64 windows machine, this should be sufficient to build working
win_arm64 tagged wheels.
i686-pc-windows-msvc is failing with an error:
```
error: linking with `lld-link` failed: exit status: 1
= note: lld-link: error: undefined symbol: __aulldiv
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced 4 more times
lld-link: error: undefined symbol: __aullrem
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced 4 more times
```
In Rust, `fs::copy` automatically preserves permissions (see:
https://doc.rust-lang.org/std/fs/fn.copy.html).
Elsewhere, when copying from the zip archive out to the cache, we can
set permissions during file creation, rather than as a separate call.
Both of these should be slightly more efficient.
## Background
In virtual environments, we want to install python programs as console
commands, e.g. `black .` over `python -m black .`. They may be called
[entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/)
or scripts. For entrypoints, we're given a module name and function to
call in that module.
On Unix, we generate a minimal python script launcher. Text files are
runnable on unix by adding a shebang at their top, e.g.
```python
#!/usr/bin/env python
```
will make the operating system run the file with the current python
interpreter. A venv launcher for black in `/home/ferris/colorize/.venv`
(module name: `black`, function to call: `patched_main`) would look like
this:
```python
#!/home/ferris/colorize/.venv/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == "__main__":
sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
sys.exit(patched_main())
```
On windows, this doesn't work, we can only rely on launching `.exe`
files.
## Summary
We use posy's rust implementation of a trampoline, which is based on
distlib's c++ implementation. We pre-build a minimal exe and append the
launcher script as stored zip archive behind it. The exe will look for
the venv python interpreter next to it and use it to execute the
appended script.
The changes in this PR make the `black` entrypoint work:
```powershell
cargo run -- venv .venv
cargo run -q -- pip install black
.\.venv\Scripts\black --version
```
Integration with our existing tests will be done in follow-up PRs.
## Implementation and Details
I've vendored the posy trampoline crate. It is a formatted, renamed and
slightly changed for embedding version of
https://github.com/njsmith/posy/pull/28.
The posy launchers are smaller than the distlib launchers, 16K vs 106K
for black. Currently only `x86_64-pc-windows-msvc` is supported. The
crate requires a nightly compiler for its no-std binary size tricks.
On windows, an application can be launched with a console or without (to
create windows instead), which needs two different launchers. The gui
launcher will subsequently use `pythonw.exe` while the console launcher
uses `python.exe`.
## Summary
This PR adds a `NormalizedDisplay` trait that we can use for user-facing
paths, to strip the UNC prefix on Windows.
On other platforms, the implementation is a no-op (vs. `Display`).
I audited all usages of `.display()`, and changed any that were
user-facing, either via `println!` or `eprintln!`, or by way of being
included in error messages. I did _not_ change uses that were only in
tests or only went to tracing.
Closes https://github.com/astral-sh/puffin/issues/1084.
## Summary
First batch of changes for windows support. Notable changes:
* Fixes all compile errors and added windows specific paths.
* Working venv creation on windows, both from a base interpreter and
from a venv. This requires querying `stdlib` from the sysconfig paths to
find the launcher.
* Basic url/path conversion handling for windows.
* `if cfg!(...)` instead of `#[cfg()]`. This should make it easier to
keep everything compiling across platforms.
## Outlook
Test summary: 402 tests run: 299 passed (15 slow), 103 failed, 1 skipped
There are various reason for the remaining test failure:
* Windows-specific colorama and tzdata dependencies that change the
snapshot slightly. This is by far the biggest batch.
* Some url-path handling issues. I fixed some in the PR, some remain.
* Lack of the latest python patch versions for older pythons on my
machine, since there are no builds for windows and we need to register
them in the registry for them to be picked up for `py --list-paths` (CC
@zanieb RE #1070).
* Lack of entrypoint launchers.
* ... likely more
On ubuntu and python 3.10,
```
cargo run -q -- pip-install --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html "jax[cuda12_pip]==0.4.23"
```
non-deterministically but for me consistently fails to install with
messages such as
```
error: Failed to install: nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (nvidia-nccl-cu12==2.19.3)
Caused by: failed to remove file `/home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py`
Caused by: No such file or directory (os error 2)
```
```
error: Failed to install: nvidia_cublas_cu12-12.3.4.1-py3-none-manylinux1_x86_64.whl (nvidia-cublas-cu12==12.3.4.1)
Caused by: Replacing an existing file or directory failed
```
```
error: Failed to install: nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64.whl (nvidia-cuda-nvcc-cu12==12.3.107)
Caused by: failed to hardlink file from /home/konsti/.cache/puffin/wheels-v0/pypi/nvidia-cuda-nvcc-cu12/nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64/nvidia/__init__.py to /home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py
Caused by: File exists (os error 17)
```
We install a lot of nvidia package, that all contain
`nvidia/__init__.py`, since they all install themselves into the
`nvidia` module:
```
nvidia-cublas-cu12==12.3.4.1
nvidia-cuda-cupti-cu12==12.3.101
nvidia-cuda-nvcc-cu12==12.3.107
nvidia-cuda-nvrtc-cu12==12.3.107
nvidia-cuda-runtime-cu12==12.3.101
nvidia-cudnn-cu12==8.9.7.29
nvidia-cufft-cu12==11.0.12.1
nvidia-cusolver-cu12==11.5.4.101
nvidia-cusparse-cu12==12.2.0.103
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
```
```
$ tree -L 1 .venv/lib/python3.10/site-packages/nvidia
.venv/lib/python3.10/site-packages/nvidia
├── cublas
├── cuda_cupti
├── cuda_nvcc
├── cuda_nvrtc
├── cuda_runtime
├── cudnn
├── cufft
├── cusolver
├── cusparse
├── __init__.py
├── nccl
└── nvjitlink
```
When installing we get a race condition, each package installation is
its own thread:
* Installer Thread 1 creates `nvidia/__init__.py`
* Installer Thread 2 sees an existing `nvidia/__init__.py`
* Installer Thread 3 sees an existing `nvidia/__init__.py`
* Installer Thread 2 removes `nvidia/__init__.py`
* Installer Thread 3 tries to remove `nvidia/__init__.py`, it doesn't
exist anymore -> failure.
We switch to a new strategy: When the target files exists, we don't
remove it, but instead hardlink the source file to a tempfile first,
then renaming the tempfile to the target file. Renaming is considered an
atomic operation.
I've put the logging on debug level because they cases indicate a
conflict between two packages while being rare.
Closes#925
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
Noticed these when working on something unrelated. Generally:
- Prefer `entry.file_type()` over `entry.path().is_file()` or similar,
as the former is almost always free on Unix.
- Call `entry.path()` once, since it allocates internally (returns a
`PathBuf`).
I've tried to investigate puffin's performance wrt to builds and
parallelism in general, but found the previous instrumentation to
granular. I've tried to add spans to every function that either needs
noticeable io or cpu resources without creating duplication. This also
fixes some wrong tracing usage on async functions
(https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code)
and some spans that weren't actually entered.
This PR combines three small changes to finish up the install-many
testing.
* Download pypi_10k_most_dependents.txt in script I'd like to have the
setup process of the large scale checks automated.
* Some install-many dev script improvements
* Fix mkl_fft-1.3.6-58-cp310-cp310-manylinux2014_x86_64.whl:
mkl_fft-1.3.6-58-cp310-cp310-manylinux2014_x86_64.whl has multiple
Wheel-Version entries, we have to ignore that like pip
Apart from the mkl-fft fix the only other errors i've seen showing up
are
https://github.com/astral-sh/puffin/issues/520#issuecomment-1869625642.
Similar to #516, but for individual files.
## Test Plan
Ran:
```sh
cargo run -p puffin-cli -- pip-uninstall plaid-python
mkdir -p /Users/crmarsh/workspace/puffin/.venv/lib/python3.10/site-packages/tests
echo "x=1" > /Users/crmarsh/workspace/puffin/.venv/lib/python3.10/site-packages/tests/__init__.py
cargo run -p puffin-cli -- pip-sync requirements.txt --no-cache --verbose
```
Previously, when installing a package we would delete the target
directory before copying (or linking) the contents of the package.
However, this means that we do not properly support namespace packages
which can share a target directory. Instead the last package to be
installed would be override existing packages. Since we install packages
in parallel, this could result in a race condition where the target
directory already exists which is not allowed when using `clonefile`.
See example error in #515.
c7e63d2dce
provides a regression test for this — it fails on `main`.
Here, we implement a recursive merge when the target directory already
exists. Both packages will be installed into the same directory. We no
longer delete the target directory, which seems okay since we uninstall
packages before installing now.
When files conflict, we will likely throw an error still. The correct
behavior to implement in this case is unclear, as if we just take "first
write wins" or "last write wins" we could end up with some files from
one package and some from another resulting in two broken packages. A
possible solution here is to lock the target directories while copying.
Ensure that we consistently show a path for all io errors in
install-wheel-rs either (preferred) through `fs_err`, or as fallback by
a custom error type. For zip reading errors, we rely on the caller to
add the name and/or location of the wheel.