## Summary
This _could_ fix https://github.com/astral-sh/uv/issues/1454, but I'm
not sure. I was able to replicate by forcing a bunch of error states.
But, in short, if we fail to hardlink on the initial copy due to a file
existing, and then fail _again_, we fallback to copying. But if we copy,
then the tempfile doesn't exist, and so the `fs_err::rename(&tempfile,
&out_path)?;` will fail with "File not found".
This PR just ensures that the cases are explicitly mutually exclusive:
we only attempt to rename if the hardlink succeeded.
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.
Then, update directory and file names:
```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```
Then add all the files again
```
# Add all the files again
git add crates
git add python/uv
# This one needs a force-add
git add -f crates/uv-trampoline
```
## Summary
For PEP 517 builds, the current working directory needs to be set to the
directory of the source distribution. It turns out that on Windows, if
you use a UNC path for the working directory, then relative paths are
interpreted relative to the root of the current drive
([source](https://www.fileside.app/blog/2023-03-17_windows-file-paths/#paths-relative-to-the-root-of-the-current-drive)).
So, when builds attempted to resolve relative paths, they always
errored...
This PR ensures that we remove the UNC prefix when setting the current
working directory.
Closes#1238.
## Test Plan
I tested this on my Windows machine by installing `ujson` with
`--no-binary ujson`. (I don't want to add that specific test, since it's
really slow to build.)
Split out from the large test refactoring PR. Use `normalized_display`
in tests and two more thiserror derives to match snapshots and output,
and other small windows fixes.
## Summary
See: https://github.com/astral-sh/puffin/issues/1224
## Test Plan
Ran `python -m scripts.bench --puffin
scripts/requirements/compiled/jupyter.txt --min-runs 100 --benchmark
install-warm --verbose` several times, which failed eventually on `main`
but not on this branch.
Lacking windows compatible aarch64 hardware, i cross compiled the
trampoline from x86_64 linux to aarch64-pc-windows-msvc; I added the
instructions to the puffin-trampoline readme. With some testing on an
aarch64 windows machine, this should be sufficient to build working
win_arm64 tagged wheels.
i686-pc-windows-msvc is failing with an error:
```
error: linking with `lld-link` failed: exit status: 1
= note: lld-link: error: undefined symbol: __aulldiv
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced 4 more times
lld-link: error: undefined symbol: __aullrem
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced by libcompiler_builtins-2fb09dee087e9f64.rlib(compiler_builtins-2fb09dee087e9f64.compiler_builtins.597f0152646f1b8-cgu.0.rcgu.o):(compiler_builtins::int::specialized_div_rem::u128_div_rem::h06aed1e23a3f8f5c)
>>> referenced 4 more times
```
In Rust, `fs::copy` automatically preserves permissions (see:
https://doc.rust-lang.org/std/fs/fn.copy.html).
Elsewhere, when copying from the zip archive out to the cache, we can
set permissions during file creation, rather than as a separate call.
Both of these should be slightly more efficient.
## Background
In virtual environments, we want to install python programs as console
commands, e.g. `black .` over `python -m black .`. They may be called
[entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/)
or scripts. For entrypoints, we're given a module name and function to
call in that module.
On Unix, we generate a minimal python script launcher. Text files are
runnable on unix by adding a shebang at their top, e.g.
```python
#!/usr/bin/env python
```
will make the operating system run the file with the current python
interpreter. A venv launcher for black in `/home/ferris/colorize/.venv`
(module name: `black`, function to call: `patched_main`) would look like
this:
```python
#!/home/ferris/colorize/.venv/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == "__main__":
sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
sys.exit(patched_main())
```
On windows, this doesn't work, we can only rely on launching `.exe`
files.
## Summary
We use posy's rust implementation of a trampoline, which is based on
distlib's c++ implementation. We pre-build a minimal exe and append the
launcher script as stored zip archive behind it. The exe will look for
the venv python interpreter next to it and use it to execute the
appended script.
The changes in this PR make the `black` entrypoint work:
```powershell
cargo run -- venv .venv
cargo run -q -- pip install black
.\.venv\Scripts\black --version
```
Integration with our existing tests will be done in follow-up PRs.
## Implementation and Details
I've vendored the posy trampoline crate. It is a formatted, renamed and
slightly changed for embedding version of
https://github.com/njsmith/posy/pull/28.
The posy launchers are smaller than the distlib launchers, 16K vs 106K
for black. Currently only `x86_64-pc-windows-msvc` is supported. The
crate requires a nightly compiler for its no-std binary size tricks.
On windows, an application can be launched with a console or without (to
create windows instead), which needs two different launchers. The gui
launcher will subsequently use `pythonw.exe` while the console launcher
uses `python.exe`.
## Summary
This PR adds a `NormalizedDisplay` trait that we can use for user-facing
paths, to strip the UNC prefix on Windows.
On other platforms, the implementation is a no-op (vs. `Display`).
I audited all usages of `.display()`, and changed any that were
user-facing, either via `println!` or `eprintln!`, or by way of being
included in error messages. I did _not_ change uses that were only in
tests or only went to tracing.
Closes https://github.com/astral-sh/puffin/issues/1084.
## Summary
First batch of changes for windows support. Notable changes:
* Fixes all compile errors and added windows specific paths.
* Working venv creation on windows, both from a base interpreter and
from a venv. This requires querying `stdlib` from the sysconfig paths to
find the launcher.
* Basic url/path conversion handling for windows.
* `if cfg!(...)` instead of `#[cfg()]`. This should make it easier to
keep everything compiling across platforms.
## Outlook
Test summary: 402 tests run: 299 passed (15 slow), 103 failed, 1 skipped
There are various reason for the remaining test failure:
* Windows-specific colorama and tzdata dependencies that change the
snapshot slightly. This is by far the biggest batch.
* Some url-path handling issues. I fixed some in the PR, some remain.
* Lack of the latest python patch versions for older pythons on my
machine, since there are no builds for windows and we need to register
them in the registry for them to be picked up for `py --list-paths` (CC
@zanieb RE #1070).
* Lack of entrypoint launchers.
* ... likely more
On ubuntu and python 3.10,
```
cargo run -q -- pip-install --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html "jax[cuda12_pip]==0.4.23"
```
non-deterministically but for me consistently fails to install with
messages such as
```
error: Failed to install: nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (nvidia-nccl-cu12==2.19.3)
Caused by: failed to remove file `/home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py`
Caused by: No such file or directory (os error 2)
```
```
error: Failed to install: nvidia_cublas_cu12-12.3.4.1-py3-none-manylinux1_x86_64.whl (nvidia-cublas-cu12==12.3.4.1)
Caused by: Replacing an existing file or directory failed
```
```
error: Failed to install: nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64.whl (nvidia-cuda-nvcc-cu12==12.3.107)
Caused by: failed to hardlink file from /home/konsti/.cache/puffin/wheels-v0/pypi/nvidia-cuda-nvcc-cu12/nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64/nvidia/__init__.py to /home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py
Caused by: File exists (os error 17)
```
We install a lot of nvidia package, that all contain
`nvidia/__init__.py`, since they all install themselves into the
`nvidia` module:
```
nvidia-cublas-cu12==12.3.4.1
nvidia-cuda-cupti-cu12==12.3.101
nvidia-cuda-nvcc-cu12==12.3.107
nvidia-cuda-nvrtc-cu12==12.3.107
nvidia-cuda-runtime-cu12==12.3.101
nvidia-cudnn-cu12==8.9.7.29
nvidia-cufft-cu12==11.0.12.1
nvidia-cusolver-cu12==11.5.4.101
nvidia-cusparse-cu12==12.2.0.103
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
```
```
$ tree -L 1 .venv/lib/python3.10/site-packages/nvidia
.venv/lib/python3.10/site-packages/nvidia
├── cublas
├── cuda_cupti
├── cuda_nvcc
├── cuda_nvrtc
├── cuda_runtime
├── cudnn
├── cufft
├── cusolver
├── cusparse
├── __init__.py
├── nccl
└── nvjitlink
```
When installing we get a race condition, each package installation is
its own thread:
* Installer Thread 1 creates `nvidia/__init__.py`
* Installer Thread 2 sees an existing `nvidia/__init__.py`
* Installer Thread 3 sees an existing `nvidia/__init__.py`
* Installer Thread 2 removes `nvidia/__init__.py`
* Installer Thread 3 tries to remove `nvidia/__init__.py`, it doesn't
exist anymore -> failure.
We switch to a new strategy: When the target files exists, we don't
remove it, but instead hardlink the source file to a tempfile first,
then renaming the tempfile to the target file. Renaming is considered an
atomic operation.
I've put the logging on debug level because they cases indicate a
conflict between two packages while being rare.
Closes#925
---------
Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
Noticed these when working on something unrelated. Generally:
- Prefer `entry.file_type()` over `entry.path().is_file()` or similar,
as the former is almost always free on Unix.
- Call `entry.path()` once, since it allocates internally (returns a
`PathBuf`).
I've tried to investigate puffin's performance wrt to builds and
parallelism in general, but found the previous instrumentation to
granular. I've tried to add spans to every function that either needs
noticeable io or cpu resources without creating duplication. This also
fixes some wrong tracing usage on async functions
(https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code)
and some spans that weren't actually entered.
This PR combines three small changes to finish up the install-many
testing.
* Download pypi_10k_most_dependents.txt in script I'd like to have the
setup process of the large scale checks automated.
* Some install-many dev script improvements
* Fix mkl_fft-1.3.6-58-cp310-cp310-manylinux2014_x86_64.whl:
mkl_fft-1.3.6-58-cp310-cp310-manylinux2014_x86_64.whl has multiple
Wheel-Version entries, we have to ignore that like pip
Apart from the mkl-fft fix the only other errors i've seen showing up
are
https://github.com/astral-sh/puffin/issues/520#issuecomment-1869625642.
Similar to #516, but for individual files.
## Test Plan
Ran:
```sh
cargo run -p puffin-cli -- pip-uninstall plaid-python
mkdir -p /Users/crmarsh/workspace/puffin/.venv/lib/python3.10/site-packages/tests
echo "x=1" > /Users/crmarsh/workspace/puffin/.venv/lib/python3.10/site-packages/tests/__init__.py
cargo run -p puffin-cli -- pip-sync requirements.txt --no-cache --verbose
```
Previously, when installing a package we would delete the target
directory before copying (or linking) the contents of the package.
However, this means that we do not properly support namespace packages
which can share a target directory. Instead the last package to be
installed would be override existing packages. Since we install packages
in parallel, this could result in a race condition where the target
directory already exists which is not allowed when using `clonefile`.
See example error in #515.
c7e63d2dce
provides a regression test for this — it fails on `main`.
Here, we implement a recursive merge when the target directory already
exists. Both packages will be installed into the same directory. We no
longer delete the target directory, which seems okay since we uninstall
packages before installing now.
When files conflict, we will likely throw an error still. The correct
behavior to implement in this case is unclear, as if we just take "first
write wins" or "last write wins" we could end up with some files from
one package and some from another resulting in two broken packages. A
possible solution here is to lock the target directories while copying.
Ensure that we consistently show a path for all io errors in
install-wheel-rs either (preferred) through `fs_err`, or as fallback by
a custom error type. For zip reading errors, we rely on the caller to
add the name and/or location of the wheel.
## Summary
Closes https://github.com/astral-sh/puffin/issues/390.
## Test Plan
Installed `jupyter_core==5.5.0`, then removed the `jupyter_core` and
`jupyter_core-5.5.0.dist-info` directories from my virtualenv manually,
but left `jupyter.py`. I then re-ran `puffin pip-compile`, and verified
that it errored on `main` but succeeded here.
## Summary
It looks like, when you install `pip`, it includes a bunch of
`__pycache__` directories in the RECORD file (although these directories
don't exist until you run `pip`). Our uninstaller assumed that the
RECORD file only contained _files_.
Closes https://github.com/astral-sh/puffin/issues/389.
One of the most common errors i observed are build failures due to
missing header files. On ubuntu, this generally means that you need to
install some `<...>-dev` package that the documentation tells you about,
e.g. [mysqlclient](https://github.com/PyMySQL/mysqlclient#linux) needs
`default-libmysqlclient-dev`, [some psycopg
versions](https://www.psycopg.org/psycopg3/docs/basic/install.html#local-installation)
(i remember that this was always required at some earlier point) require
`libpq-dev` and pygraphviz wants `graphviz-dev`. This is quite common
for many scientific packages (where conda has an advantage because they
can provide those package as a dependency).
The error message can be completely inscrutable if you're just a python
programmer (or user) and not a c programmer (example: pygraphviz):
```
warning: no files found matching '*.png' under directory 'doc'
warning: no files found matching '*.txt' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
pygraphviz/graphviz_wrap.c:3020:10: fatal error: graphviz/cgraph.h: No such file or directory
3020 | #include "graphviz/cgraph.h"
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
```
The only relevant part is `Fatal error: graphviz/cgraph.h: No such file
or directory`. Why is this file not there and how do i get it to be
there?
This is even harder to spot in pip's output, where it's 11 lines above
the last line:

I've special cased missing headers and made sure that the last line
tells you the important information: We're missing some header, please
check the documentation of {package} {version} for what to install:

Scrolling up:

The difference gets even clearer with a default ubuntu terminal with its
80 columns:

---
Note that the situation is better for a missing compiler, there i get:
```
[...]
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
error: command 'gcc' failed: No such file or directory
---
```
Putting the last line into google, the first two results tell me to
`sudo apt-get install gcc`, the third even tells me about `sudo apt
install build-essential`
## Summary
This PR just adds the logic in `install-wheel-rs` to write
`direct_url.json`. We're not actually taking advantage of it yet (or
wiring it through) in Puffin.
Part of https://github.com/astral-sh/puffin/issues/332.
Ran `cargo upgrade --incompatible`, seems there are no changes required.
From cacache 0.12.0:
> BREAKING CHANGE: some signatures for copy have changed, and copy no
longer automatically reflinks
`which` 5.0.0 seems to have only error message changes.
From
https://packaging.python.org/en/latest/specifications/recording-installed-packages/#recording-installed-packages
> This directory is named as {name}-{version}.dist-info, with name and
version fields corresponding to Core metadata specifications. Both
fields must be normalized (see Package name normalization and PEP 440
for the definition of normalization for each field respectively), and
replace dash (-) characters with underscore (_) characters, so the
.dist-info directory always has exactly one dash (-) character in its
stem, separating the name and version fields.
Follow up to #278