Commit graph

41 commits

Author SHA1 Message Date
Bas Schoenmaeckers
1218766ea5
Clone individual files on windows ReFS (#3551)
Windows does not support cloning whole directories so clone each file
instead.

closes #3547 

## Test Plan

Ran ` uv pip install setuptools --link-mode=clone` manually
2024-05-13 12:03:39 -04:00
konsti
f63a1fde42
Fix span recording with install_wheel_rs (#3293)
We would only record spans for `uv` prefixed crates, while the rayon
operations are in `install_wheel_rs`, therefore "disappearing" spans
from rayon.

With these changes, we can profile the parallel installation:


![jupyter](bb3c626a-ba9a-443d-9727-ac1e3bf14a71)
2024-04-28 18:33:40 +00:00
Charlie Marsh
ad99e3af63
Create --target directories lazily (#3274)
## Summary

Based on feedback in
https://github.com/astral-sh/uv/pull/3257#issuecomment-2078560574.
2024-04-26 03:36:56 +00:00
Charlie Marsh
7fb2bf816f
Add JSON Schema support (#3046)
## Summary

This PR adds JSON Schema support. The setup mirrors Ruff's own.
2024-04-17 17:24:41 +00:00
Charlie Marsh
b3f98d5e05
Use kebab-case for serde enums (#3080)
By default, these serialize as (e.g.) `LowestDirect`. This now matches
the format we use in Ruff.
2024-04-17 01:13:39 +00:00
Charlie Marsh
295b58ad37
Add uv-workspace crate with settings discovery and deserialization (#3007)
## Summary

This PR adds basic struct definitions along with a "workspace" concept
for discovering settings. (The "workspace" terminology is used to match
Ruff; I did not invent it.)

A few notes:

- We discover any `pyproject.toml` or `uv.toml` file in any parent
directory of the current working directory. (We could adjust this to
look at the directories of the input files.)
- We don't actually do anything with the configuration yet; but those
PRs are large and I want this to be reviewed in isolation.
2024-04-16 13:56:47 -04:00
konsti
d2da575c41
Log hardlink failures (#3015)
Inspired by https://github.com/astral-sh/uv/issues/2964, we now properly
log hardlink failures, e.g. when the cache is a docker container but the
venv is in a bind mount, e.g.:

```
DEBUG Failed to hardlink `/code/venv/uv/lib/python3.12/site-packages/asgiref-3.8.1.dist-info/WHEEL` to `/root/.cache/uv/archive-v0/nnpkKgUoM3LMxcNDmEKJQ/asgiref-3.8.1.dist-info/WHEEL`, attempting to copy files as a fallback
```
2024-04-12 15:06:38 +00:00
Zanie Blue
baa30697a4
Ensure mtime of site packages is updated during wheel installation (#2545)
Closes https://github.com/astral-sh/uv/issues/2530
2024-03-20 00:54:05 +00:00
Charlie Marsh
492ffbf997
Loosen .dist-info validation to accept arbitrary versions (#2441)
## Summary

It turns out that pip does _not_ validate the normalization of the
version specifier in the `.dist-info` directory. In particular, it seems
that some tools replace the `+` in a local version segment with a `_`.

Closes https://github.com/astral-sh/uv/issues/2424.
2024-03-14 09:04:39 -04:00
Charlie Marsh
5fed1f6259
Use simpler pip-like Scheme for install paths (#2173)
## Summary

This will make it easier to use the paths returned by `distutils.py`
(for some cases). No code or behavior changes; just removing some fields
we don't need.
2024-03-04 15:50:13 -05:00
Charlie Marsh
6c23cffd07
Avoid assuming RECORD file is in platlib (#2091)
## Summary

This was a missed find-and-replace. We shouldn't assume `layout.platlib`
here, since `RECORD` will be written to `site_packages` (which could be
`layout.purelib`).

This is hard to reproduce. You need a _fresh_ environment where
`purelib` and `platlib` differ (which isn't the case for virtualenvs, at
least typically), and you need to be installing a new package that is a
purelib. I tested it by manually changing `platlib` to point to a
different path.

Closes https://github.com/astral-sh/uv/issues/2064.
2024-02-29 17:21:49 +00:00
Charlie Marsh
10175143d1
Add a --python flag to allow installation into arbitrary Python interpreters (#2000)
## Summary

This PR adds a `--python` flag that allows users to provide a specific
Python interpreter into which `uv` should install packages. This would
replace the `VIRTUAL_ENV=` workaround that folks have been using to
install into arbitrary, system environments, while _also_ actually being
correct for installing into non-virtual environments, where the bin and
site-packages paths can differ.

The approach taken here is to use `sysconfig.get_paths()` to get the
correct paths from the interpreter, and then use those for determining
the `bin` and `site-packages` directories, rather than constructing them
based on hard-coded expectations for each platform.

Closes https://github.com/astral-sh/uv/issues/1396.

Closes https://github.com/astral-sh/uv/issues/1779.

Closes https://github.com/astral-sh/uv/issues/1988.

## Test Plan

- Verified that, on my Windows machine, I was able to install `requests`
into a global environment with: `cargo run pip install requests --python
'C:\\Users\\crmarsh\\AppData\\Local\\Programs\\Python\\Python3.12\\python.exe`,
then `python` and `import requests`.
- Verified that, on macOS, I was able to install `requests` into a
global environment installed via Homebrew with: `cargo run pip install
requests --python $(which python3.8)`.
2024-02-28 02:10:29 +00:00
Charlie Marsh
5997d0da3d
Remove some unused code from install-wheel-rs (#2001)
I need to make a bunch of changes to this crate, and I'm finding that
the existing unused interfaces are really getting in the way.
2024-02-27 04:27:25 +00:00
konsti
11ed4f7183
Generate versioned pip launchers (#1918)
Users expect pip to have `pip`, `pip3` and `pip3.x` entrypoints. But pip
is a universal wheel, so it contains the `pip3.x` entrypoint where it
was built on. To fix this, pip special cases itself when installing
(3898741e29/src/pip/_internal/operations/install/wheel.py (L283)),
replacing the wheel entrypoint with one for the current version. We now
do the same.

Fixes #1593
2024-02-23 18:01:31 +00:00
Jane Lewis
da3a7ec801
Linker copies files as a fallback when ref-linking fails (#1773)
## Summary

Fixes #1444.

In situations where the installer fails to perform a reflink, a regular
file copy is also attempted, as a fallback. This circumvents issues with
linking files across filesystems or volumes.

## Test Plan
N/A
2024-02-21 21:57:31 -05:00
Charlie Marsh
4e0b6f8f84
Avoid attempting rename in copy fallback path (#1546)
## Summary

This _could_ fix https://github.com/astral-sh/uv/issues/1454, but I'm
not sure. I was able to replicate by forcing a bunch of error states.
But, in short, if we fail to hardlink on the initial copy due to a file
existing, and then fail _again_, we fallback to copying. But if we copy,
then the tempfile doesn't exist, and so the `fs_err::rename(&tempfile,
&out_path)?;` will fail with "File not found".

This PR just ensures that the cases are explicitly mutually exclusive:
we only attempt to rename if the hardlink succeeded.
2024-02-16 17:08:49 -05:00
Zanie Blue
2586f655bb
Rename to uv (#1302)
First, replace all usages in files in-place. I used my editor for this.
If someone wants to add a one-liner that'd be fun.

Then, update directory and file names:

```
# Run twice for nested directories
find . -type d -print0 | xargs -0 rename s/puffin/uv/g
find . -type d -print0 | xargs -0 rename s/puffin/uv/g

# Update files
find . -type f -print0 | xargs -0 rename s/puffin/uv/g
```

Then add all the files again

```
# Add all the files again
git add crates
git add python/uv

# This one needs a force-add
git add -f crates/uv-trampoline
```
2024-02-15 11:19:46 -06:00
Charlie Marsh
be9125b0f0
Remove unnecessary is_dir in clone_recursive (#1247) 2024-02-04 23:54:22 +00:00
Charlie Marsh
bb49ebee1e
Avoid race condition in clone file replacement (#1229)
## Summary

I've never seen this in practice but in theory it is possible, and we
have the same guardrail in the hardlink path.
2024-02-01 10:55:23 -05:00
Charlie Marsh
d243250dec
Avoid unnecessary permissions changes for copy paths (#1152)
In Rust, `fs::copy` automatically preserves permissions (see:
https://doc.rust-lang.org/std/fs/fn.copy.html).

Elsewhere, when copying from the zip archive out to the cache, we can
set permissions during file creation, rather than as a separate call.

Both of these should be slightly more efficient.
2024-01-27 22:11:55 -05:00
konsti
39021263dd
Windows launchers using posy trampolines (#1092)
## Background

In virtual environments, we want to install python programs as console
commands, e.g. `black .` over `python -m black .`. They may be called
[entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/)
or scripts. For entrypoints, we're given a module name and function to
call in that module.

On Unix, we generate a minimal python script launcher. Text files are
runnable on unix by adding a shebang at their top, e.g.

```python
#!/usr/bin/env python
```

will make the operating system run the file with the current python
interpreter. A venv launcher for black in `/home/ferris/colorize/.venv`
(module name: `black`, function to call: `patched_main`) would look like
this:

```python
#!/home/ferris/colorize/.venv/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == "__main__":
    sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
    sys.exit(patched_main())
```

On windows, this doesn't work, we can only rely on launching `.exe`
files.

## Summary

We use posy's rust implementation of a trampoline, which is based on
distlib's c++ implementation. We pre-build a minimal exe and append the
launcher script as stored zip archive behind it. The exe will look for
the venv python interpreter next to it and use it to execute the
appended script.

The changes in this PR make the `black` entrypoint work:

```powershell
cargo run -- venv .venv
cargo run -q -- pip install black
.\.venv\Scripts\black --version
```

Integration with our existing tests will be done in follow-up PRs.

## Implementation and Details

I've vendored the posy trampoline crate. It is a formatted, renamed and
slightly changed for embedding version of
https://github.com/njsmith/posy/pull/28.

The posy launchers are smaller than the distlib launchers, 16K vs 106K
for black. Currently only `x86_64-pc-windows-msvc` is supported. The
crate requires a nightly compiler for its no-std binary size tricks.

On windows, an application can be launched with a console or without (to
create windows instead), which needs two different launchers. The gui
launcher will subsequently use `pythonw.exe` while the console launcher
uses `python.exe`.
2024-01-26 13:54:11 +00:00
Charlie Marsh
69c72b6fa1
Validate wheel metadata against filename (#1002)
Closes #983.
2024-01-19 05:48:55 +00:00
konsti
5051b2c004
Use tempfile to prevent install io race crashes (#929)
On ubuntu and python 3.10,

```
cargo run -q -- pip-install --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html "jax[cuda12_pip]==0.4.23"
```

non-deterministically but for me consistently fails to install with
messages such as

```
error: Failed to install: nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (nvidia-nccl-cu12==2.19.3)
  Caused by: failed to remove file `/home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py`
  Caused by: No such file or directory (os error 2)
```

```
error: Failed to install: nvidia_cublas_cu12-12.3.4.1-py3-none-manylinux1_x86_64.whl (nvidia-cublas-cu12==12.3.4.1)
  Caused by: Replacing an existing file or directory failed
```

```
error: Failed to install: nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64.whl (nvidia-cuda-nvcc-cu12==12.3.107)
  Caused by: failed to hardlink file from /home/konsti/.cache/puffin/wheels-v0/pypi/nvidia-cuda-nvcc-cu12/nvidia_cuda_nvcc_cu12-12.3.107-py3-none-manylinux1_x86_64/nvidia/__init__.py to /home/konsti/projects/puffin/.venv/lib/python3.10/site-packages/nvidia/__init__.py
  Caused by: File exists (os error 17)
```

We install a lot of nvidia package, that all contain
`nvidia/__init__.py`, since they all install themselves into the
`nvidia` module:

```
nvidia-cublas-cu12==12.3.4.1
nvidia-cuda-cupti-cu12==12.3.101
nvidia-cuda-nvcc-cu12==12.3.107
nvidia-cuda-nvrtc-cu12==12.3.107
nvidia-cuda-runtime-cu12==12.3.101
nvidia-cudnn-cu12==8.9.7.29
nvidia-cufft-cu12==11.0.12.1
nvidia-cusolver-cu12==11.5.4.101
nvidia-cusparse-cu12==12.2.0.103
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
```

```
$  tree -L 1 .venv/lib/python3.10/site-packages/nvidia
.venv/lib/python3.10/site-packages/nvidia
├── cublas
├── cuda_cupti
├── cuda_nvcc
├── cuda_nvrtc
├── cuda_runtime
├── cudnn
├── cufft
├── cusolver
├── cusparse
├── __init__.py
├── nccl
└── nvjitlink
```

When installing we get a race condition, each package installation is
its own thread:
* Installer Thread 1 creates `nvidia/__init__.py`
* Installer Thread 2 sees an existing  `nvidia/__init__.py`
* Installer Thread 3 sees an existing  `nvidia/__init__.py`
* Installer Thread 2 removes `nvidia/__init__.py`
* Installer Thread 3 tries to remove `nvidia/__init__.py`, it doesn't
exist anymore -> failure.

We switch to a new strategy: When the target files exists, we don't
remove it, but instead hardlink the source file to a tempfile first,
then renaming the tempfile to the target file. Renaming is considered an
atomic operation.

I've put the logging on debug level because they cases indicate a
conflict between two packages while being rare.

Closes #925

---------

Co-authored-by: Charlie Marsh <charlie.r.marsh@gmail.com>
2024-01-16 21:07:39 +00:00
Charlie Marsh
c06bf658bb
Remove some filesystem calls from the installer (#834)
Noticed these when working on something unrelated. Generally:

- Prefer `entry.file_type()` over `entry.path().is_file()` or similar,
as the former is almost always free on Unix.
- Call `entry.path()` once, since it allocates internally (returns a
`PathBuf`).
2024-01-08 12:59:01 -05:00
konsti
3f587156ec
Improve install instrumentation (#829)
Add tracing spans to different phases of the wheel installation.
2024-01-08 10:13:59 +00:00
Charlie Marsh
02b157085e
Add INSTALLER file to install-wheel-rs (#760)
See:
https://packaging.python.org/en/latest/specifications/recording-installed-packages/#the-installer-file
2024-01-03 17:30:54 -05:00
konsti
26f597a787
Add spans to all significant tasks (#740)
I've tried to investigate puffin's performance wrt to builds and
parallelism in general, but found the previous instrumentation to
granular. I've tried to add spans to every function that either needs
noticeable io or cpu resources without creating duplication. This also
fixes some wrong tracing usage on async functions
(https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code)
and some spans that weren't actually entered.
2024-01-02 16:17:03 +00:00
Charlie Marsh
af13c83177
Overwrite individual files when reflinking (#556)
Similar to #516, but for individual files.

## Test Plan

Ran:

```sh
cargo run -p puffin-cli -- pip-uninstall plaid-python
mkdir -p /Users/crmarsh/workspace/puffin/.venv/lib/python3.10/site-packages/tests
echo "x=1" > /Users/crmarsh/workspace/puffin/.venv/lib/python3.10/site-packages/tests/__init__.py
cargo run -p puffin-cli -- pip-sync requirements.txt --no-cache --verbose
```
2023-12-04 23:59:35 +00:00
Charlie Marsh
3b55d0b295
Deduplicate various .dist-info/METADATA read implementations (#531)
Closes https://github.com/astral-sh/puffin/issues/484.
2023-12-03 21:29:00 -05:00
Zanie Blue
5f1f207628
Recursively merge existing package directories on installation (#516)
Previously, when installing a package we would delete the target
directory before copying (or linking) the contents of the package.
However, this means that we do not properly support namespace packages
which can share a target directory. Instead the last package to be
installed would be override existing packages. Since we install packages
in parallel, this could result in a race condition where the target
directory already exists which is not allowed when using `clonefile`.
See example error in #515.
c7e63d2dce
provides a regression test for this — it fails on `main`.

Here, we implement a recursive merge when the target directory already
exists. Both packages will be installed into the same directory. We no
longer delete the target directory, which seems okay since we uninstall
packages before installing now.

When files conflict, we will likely throw an error still. The correct
behavior to implement in this case is unclear, as if we just take "first
write wins" or "last write wins" we could end up with some files from
one package and some from another resulting in two broken packages. A
possible solution here is to lock the target directories while copying.
2023-11-30 10:14:51 -06:00
konsti
6841c06e2d
Show error paths in install-wheel-rs (#514)
Ensure that we consistently show a path for all io errors in
install-wheel-rs either (preferred) through `fs_err`, or as fallback by
a custom error type. For zip reading errors, we rely on the caller to
add the name and/or location of the wheel.
2023-11-29 20:14:34 +01:00
Charlie Marsh
06b312de7e
Overwrite existing files when hardlinking (#402)
## Summary

Closes https://github.com/astral-sh/puffin/issues/390.

## Test Plan

Installed `jupyter_core==5.5.0`, then removed the `jupyter_core` and
`jupyter_core-5.5.0.dist-info` directories from my virtualenv manually,
but left `jupyter.py`. I then re-ran `puffin pip-compile`, and verified
that it errored on `main` but succeeded here.
2023-11-10 20:24:19 +00:00
Charlie Marsh
56a4b51eb6
Refactor hardlink fallback to use an enum (#401)
Makes an invalid state unrepresentable (`first_try_hard_linking = true`,
`use_copy_fallback` = true`).
2023-11-10 15:18:51 -05:00
konsti
d407bbbee6
Special case missing header build errors (on linux) (#354)
One of the most common errors i observed are build failures due to
missing header files. On ubuntu, this generally means that you need to
install some `<...>-dev` package that the documentation tells you about,
e.g. [mysqlclient](https://github.com/PyMySQL/mysqlclient#linux) needs
`default-libmysqlclient-dev`, [some psycopg
versions](https://www.psycopg.org/psycopg3/docs/basic/install.html#local-installation)
(i remember that this was always required at some earlier point) require
`libpq-dev` and pygraphviz wants `graphviz-dev`. This is quite common
for many scientific packages (where conda has an advantage because they
can provide those package as a dependency).

The error message can be completely inscrutable if you're just a python
programmer (or user) and not a c programmer (example: pygraphviz):

```
warning: no files found matching '*.png' under directory 'doc'
warning: no files found matching '*.txt' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
pygraphviz/graphviz_wrap.c:3020:10: fatal error: graphviz/cgraph.h: No such file or directory
 3020 | #include "graphviz/cgraph.h"
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
```

The only relevant part is `Fatal error: graphviz/cgraph.h: No such file
or directory`. Why is this file not there and how do i get it to be
there?

This is even harder to spot in pip's output, where it's 11 lines above
the last line:


![image](7a3d7279-e7b1-4511-ab22-d0a35be5e672)

I've special cased missing headers and made sure that the last line
tells you the important information: We're missing some header, please
check the documentation of {package} {version} for what to install:


![image](4bbb8923-5a82-472f-ab1f-9e1471aa2896)

Scrolling up:


![image](89a2495a-e188-4288-b534-ad885ee08763)

The difference gets even clearer with a default ubuntu terminal with its
80 columns:


![image](49fb27bc-07c6-4b10-a1a1-30ec8e112438)

---

Note that the situation is better for a missing compiler, there i get:

```
[...]
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
error: command 'gcc' failed: No such file or directory
---
```
Putting the last line into google, the first two results tell me to
`sudo apt-get install gcc`, the third even tells me about `sudo apt
install build-essential`
2023-11-08 15:26:39 +00:00
Charlie Marsh
b013ea9c93
Move DirectUrl into pypi-types (#343)
This needs to be reused elsewhere, and there's nothing specific to wheel
installation about it.
2023-11-06 18:26:33 +00:00
Charlie Marsh
d9bcfafa16
Write direct_url.json in wheel installer (#337)
## Summary

This PR just adds the logic in `install-wheel-rs` to write
`direct_url.json`. We're not actually taking advantage of it yet (or
wiring it through) in Puffin.

Part of https://github.com/astral-sh/puffin/issues/332.
2023-11-06 17:09:28 +00:00
konsti
c6aa1cd7a3
Only fall back to copy when the first hard linking failed (#268)
Hard linking might not be supported but we (afaik) can't detect this
ahead of time, so we'll try hard linking the first file, if this
succeeds we'll know later hard linking errors are not due to lack of
os/fs support, if it fails we'll switch to copying for the rest of the
install. Follow up to
https://github.com/astral-sh/puffin/pull/237#discussion_r1376705137
2023-11-01 18:35:52 +01:00
konsti
35d6bd761b
Fallback to copy if hardlinking failed (#237) 2023-10-30 19:10:01 +00:00
Charlie Marsh
ffbf6b6c16
Avoid symlinking RECORD file (#227)
This is the one file that gets modified during installation. Hardlinking
it is bad!
2023-10-30 03:10:38 +00:00
konsti
889f6173cc
Unify python interpreter abstractions (#178)
Previously, we had two python interpreter metadata structs, one in
gourgeist and one in puffin. Both would spawn a subprocess to query
overlapping metadata and both would appear in the cli crate, if you
weren't careful you could even have to different base interpreters at
once. This change unifies this to one set of metadata, queried and
cached once.

Another effect of this crate is proper separation of python interpreter
and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv
and conda installed python) has a set of metadata. A venv has a root and
inherits the base python metadata except for `sys.prefix`, which unlike
`sys.base_prefix`, gets set to the venv root. From the root and the
interpreter info we can compute the paths inside the venv. We can reuse
the interpreter info of the base interpreter when creating a venv
without having to query the newly created `python`.
2023-10-25 20:11:36 +00:00
Charlie Marsh
49a27ff33c
Add support for parameterized link modes (#164)
Allows the user to select between clone, hardlink, and copy semantics
for installs. (The pnpm documentation has a decent description of what
these mean: https://pnpm.io/npmrc#package-import-method.)

Closes #159.
2023-10-22 04:35:50 +00:00
Renamed from crates/install-wheel-rs/src/unpacked.rs (Browse further)