Commit graph

94 commits

Author SHA1 Message Date
Charlie Marsh
af13c83177
Overwrite individual files when reflinking (#556)
Similar to #516, but for individual files.

## Test Plan

Ran:

```sh
cargo run -p puffin-cli -- pip-uninstall plaid-python
mkdir -p /Users/crmarsh/workspace/puffin/.venv/lib/python3.10/site-packages/tests
echo "x=1" > /Users/crmarsh/workspace/puffin/.venv/lib/python3.10/site-packages/tests/__init__.py
cargo run -p puffin-cli -- pip-sync requirements.txt --no-cache --verbose
```
2023-12-04 23:59:35 +00:00
Charlie Marsh
3b55d0b295
Deduplicate various .dist-info/METADATA read implementations (#531)
Closes https://github.com/astral-sh/puffin/issues/484.
2023-12-03 21:29:00 -05:00
Zanie Blue
5f1f207628
Recursively merge existing package directories on installation (#516)
Previously, when installing a package we would delete the target
directory before copying (or linking) the contents of the package.
However, this means that we do not properly support namespace packages
which can share a target directory. Instead the last package to be
installed would be override existing packages. Since we install packages
in parallel, this could result in a race condition where the target
directory already exists which is not allowed when using `clonefile`.
See example error in #515.
c7e63d2dce
provides a regression test for this — it fails on `main`.

Here, we implement a recursive merge when the target directory already
exists. Both packages will be installed into the same directory. We no
longer delete the target directory, which seems okay since we uninstall
packages before installing now.

When files conflict, we will likely throw an error still. The correct
behavior to implement in this case is unclear, as if we just take "first
write wins" or "last write wins" we could end up with some files from
one package and some from another resulting in two broken packages. A
possible solution here is to lock the target directories while copying.
2023-11-30 10:14:51 -06:00
konsti
6841c06e2d
Show error paths in install-wheel-rs (#514)
Ensure that we consistently show a path for all io errors in
install-wheel-rs either (preferred) through `fs_err`, or as fallback by
a custom error type. For zip reading errors, we rely on the caller to
add the name and/or location of the wheel.
2023-11-29 20:14:34 +01:00
Charlie Marsh
9d35128840
Use Clippy lint table over Cargo config (#490)
Closes https://github.com/astral-sh/puffin/issues/482.
2023-11-22 15:10:27 +00:00
konsti
7c7daa8f83
Consistent Cargo.toml syntax (#483)
Remove the last Cargo.toml inconsistencies, see
1526b3458a (r1401083681).
Now all `[dependencies]` are workspace dependencies.
2023-11-22 08:34:08 +00:00
Charlie Marsh
8decb29bad
Use a dedicated error type for puffin-distribution (#466) 2023-11-20 11:38:27 +00:00
Charlie Marsh
b1c29447df
Use temp_dir casing everywhere (#440) 2023-11-16 21:04:10 +00:00
Charlie Marsh
06b312de7e
Overwrite existing files when hardlinking (#402)
## Summary

Closes https://github.com/astral-sh/puffin/issues/390.

## Test Plan

Installed `jupyter_core==5.5.0`, then removed the `jupyter_core` and
`jupyter_core-5.5.0.dist-info` directories from my virtualenv manually,
but left `jupyter.py`. I then re-ran `puffin pip-compile`, and verified
that it errored on `main` but succeeded here.
2023-11-10 20:24:19 +00:00
Charlie Marsh
56a4b51eb6
Refactor hardlink fallback to use an enum (#401)
Makes an invalid state unrepresentable (`first_try_hard_linking = true`,
`use_copy_fallback` = true`).
2023-11-10 15:18:51 -05:00
Charlie Marsh
e8108cb28b
Remove __pycache__ directories when uninstalling (#397)
According to the [packaging
documentation](https://packaging.python.org/en/latest/specifications/binary-distribution-format/#binary-distribution-format),
"uninstallers should be smart enough to remove .pyc even if it is not
mentioned in RECORD". Previously, we weren't handling this case, so if
you installed via Puffin, then imported a file (to trigger bytecode
compilation), then uninstalled, we'd leave spare `__pycache__`
directories around.

Closes https://github.com/astral-sh/puffin/issues/395.
2023-11-10 14:55:33 -05:00
Charlie Marsh
b3edf7c2b2
Delete any directories listed in the RECORD file (#394)
## Summary

It looks like, when you install `pip`, it includes a bunch of
`__pycache__` directories in the RECORD file (although these directories
don't exist until you run `pip`). Our uninstaller assumed that the
RECORD file only contained _files_.

Closes https://github.com/astral-sh/puffin/issues/389.
2023-11-10 18:17:52 +00:00
Charlie Marsh
6a15950cb5
Rename Distribution to Dist in all structs and traits (#384)
We tend to avoid abbreviations, but this one is just so long and
absolutely ubiquitous.
2023-11-10 14:55:11 +00:00
konsti
d407bbbee6
Special case missing header build errors (on linux) (#354)
One of the most common errors i observed are build failures due to
missing header files. On ubuntu, this generally means that you need to
install some `<...>-dev` package that the documentation tells you about,
e.g. [mysqlclient](https://github.com/PyMySQL/mysqlclient#linux) needs
`default-libmysqlclient-dev`, [some psycopg
versions](https://www.psycopg.org/psycopg3/docs/basic/install.html#local-installation)
(i remember that this was always required at some earlier point) require
`libpq-dev` and pygraphviz wants `graphviz-dev`. This is quite common
for many scientific packages (where conda has an advantage because they
can provide those package as a dependency).

The error message can be completely inscrutable if you're just a python
programmer (or user) and not a c programmer (example: pygraphviz):

```
warning: no files found matching '*.png' under directory 'doc'
warning: no files found matching '*.txt' under directory 'doc'
warning: no files found matching '*.css' under directory 'doc'
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
pygraphviz/graphviz_wrap.c:3020:10: fatal error: graphviz/cgraph.h: No such file or directory
 3020 | #include "graphviz/cgraph.h"
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
```

The only relevant part is `Fatal error: graphviz/cgraph.h: No such file
or directory`. Why is this file not there and how do i get it to be
there?

This is even harder to spot in pip's output, where it's 11 lines above
the last line:


![image](7a3d7279-e7b1-4511-ab22-d0a35be5e672)

I've special cased missing headers and made sure that the last line
tells you the important information: We're missing some header, please
check the documentation of {package} {version} for what to install:


![image](4bbb8923-5a82-472f-ab1f-9e1471aa2896)

Scrolling up:


![image](89a2495a-e188-4288-b534-ad885ee08763)

The difference gets even clearer with a default ubuntu terminal with its
80 columns:


![image](49fb27bc-07c6-4b10-a1a1-30ec8e112438)

---

Note that the situation is better for a missing compiler, there i get:

```
[...]
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '.svn' found anywhere in distribution
no previously-included directories found matching 'doc/build'
error: command 'gcc' failed: No such file or directory
---
```
Putting the last line into google, the first two results tell me to
`sudo apt-get install gcc`, the third even tells me about `sudo apt
install build-essential`
2023-11-08 15:26:39 +00:00
konsti
fbe28d3b7c
Fix mastodon-py dist-info handling (#336)
mastodon-py 1.5.1 uses a dot in its dist-info dir name, which we
previously didn't handle, causing home-assistant to fail. The new
implementation is based on
2f83540272/src/packaging/utils.py (L146-L172).

Part of #199

```
unzip -l  Mastodon.py-1.5.1-py2.py3-none-any.whl
Archive:  Mastodon.py-1.5.1-py2.py3-none-any.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
   153929  2020-02-29 17:39   mastodon/Mastodon.py
     1029  2019-10-11 19:15   mastodon/__init__.py
     7357  2019-10-11 20:24   mastodon/streaming.py
       10  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/DESCRIPTION.rst
     1398  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/metadata.json
        9  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/top_level.txt
      110  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/WHEEL
     1543  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/METADATA
      753  2020-03-14 18:14   Mastodon.py-1.5.1.dist-info/RECORD
---------                     -------
   166138                     9 files
```
2023-11-07 12:36:11 +01:00
Charlie Marsh
b013ea9c93
Move DirectUrl into pypi-types (#343)
This needs to be reused elsewhere, and there's nothing specific to wheel
installation about it.
2023-11-06 18:26:33 +00:00
Charlie Marsh
d9bcfafa16
Write direct_url.json in wheel installer (#337)
## Summary

This PR just adds the logic in `install-wheel-rs` to write
`direct_url.json`. We're not actually taking advantage of it yet (or
wiring it through) in Puffin.

Part of https://github.com/astral-sh/puffin/issues/332.
2023-11-06 17:09:28 +00:00
konsti
9b077f3d0f
cargo upgrade --incompatible (#330)
Ran `cargo upgrade --incompatible`, seems there are no changes required.

From cacache 0.12.0:
> BREAKING CHANGE: some signatures for copy have changed, and copy no
longer automatically reflinks

`which` 5.0.0 seems to have only error message changes.
2023-11-06 14:14:47 +00:00
konsti
b2439b24a1
Fetch wheel metadata by async range requests on the remote wheel (#301)
Use range requests and async zip to extract the METADATA file from a
remote wheel.

We currently only cache when the remote says the remote declares the
resource as immutable, see
https://github.com/06chaynes/http-cache/issues/57 and
https://github.com/baszalmstra/async_http_range_reader/pull/1 . The
cache is stored as json with the description omitted, this improve cache
deserialization performance.
2023-11-06 15:06:49 +01:00
konsti
e008c43f29
Add PackageName::as_dist_info_name (#305)
From
https://packaging.python.org/en/latest/specifications/recording-installed-packages/#recording-installed-packages

> This directory is named as {name}-{version}.dist-info, with name and
version fields corresponding to Core metadata specifications. Both
fields must be normalized (see Package name normalization and PEP 440
for the definition of normalization for each field respectively), and
replace dash (-) characters with underscore (_) characters, so the
.dist-info directory always has exactly one dash (-) character in its
stem, separating the name and version fields.

Follow up to #278
2023-11-03 08:16:44 +00:00
konsti
4adaa9a700
Wheel filename distribution package name (#278)
The normalized name abstractions were not consistently, this PR uses
them where they were previously missing:
* `WheelFilename::distribution`
* `Requirement::name`
* `Requirement::extras`
* `Metadata21::name`
* `Metadata21::provides_dist`

With `puffin-package` depending on `pep508_rs` this would be cyclical
crate dependency, so `puffin-normalize` gets split out from
`puffin-package`.

`DistInfoName` has the same task and semantics as `PackageName`, so it's
merged into the latter.

`PackageName` and `ExtraName` documentation is moved onto the type and
their constructors are called `new` instead of `normalize`. We now use
these constructors rarely enough the implicit allocation by
`to_string()` shouldn't matter anymore, while more actual cloning
becomes visible.
2023-11-02 11:15:27 +00:00
konsti
c6aa1cd7a3
Only fall back to copy when the first hard linking failed (#268)
Hard linking might not be supported but we (afaik) can't detect this
ahead of time, so we'll try hard linking the first file, if this
succeeds we'll know later hard linking errors are not due to lack of
os/fs support, if it fails we'll switch to copying for the rest of the
install. Follow up to
https://github.com/astral-sh/puffin/pull/237#discussion_r1376705137
2023-11-01 18:35:52 +01:00
Charlie Marsh
3312ce30f5
Upgrade crates and remove unused dependencies (#256) 2023-10-31 13:16:58 -04:00
konsti
35d6bd761b
Fallback to copy if hardlinking failed (#237) 2023-10-30 19:10:01 +00:00
Charlie Marsh
ffbf6b6c16
Avoid symlinking RECORD file (#227)
This is the one file that gets modified during installation. Hardlinking
it is bad!
2023-10-30 03:10:38 +00:00
konsti
889f6173cc
Unify python interpreter abstractions (#178)
Previously, we had two python interpreter metadata structs, one in
gourgeist and one in puffin. Both would spawn a subprocess to query
overlapping metadata and both would appear in the cli crate, if you
weren't careful you could even have to different base interpreters at
once. This change unifies this to one set of metadata, queried and
cached once.

Another effect of this crate is proper separation of python interpreter
and venv. A base interpreter (such as `/usr/bin/python/`, but also pyenv
and conda installed python) has a set of metadata. A venv has a root and
inherits the base python metadata except for `sys.prefix`, which unlike
`sys.base_prefix`, gets set to the venv root. From the root and the
interpreter info we can compute the paths inside the venv. We can reuse
the interpreter info of the base interpreter when creating a venv
without having to query the newly created `python`.
2023-10-25 20:11:36 +00:00
konsti
1fbe328257
Build source distributions in the resolver (#138)
This is isn't ready, but it can resolve
`meine_stadt_transparent==0.2.14`.

The source distributions are currently being built serially one after
the other, i don't know if that is incidentally due to the resolution
order, because sdist building is blocking or because of something in the
resolver that could be improved.

It's a bit annoying that the thing that was supposed to do http requests
now suddenly also has to a whole download/unpack/resolve/install/build
routine, it messes up the type hierarchy. The much bigger problem though
is avoid recursive crate dependencies, it's the reason for the callback
and for splitting the builder into two crates (badly named atm)
2023-10-25 20:05:13 +00:00
konstin
815c2117c8 Clippy 2023-10-23 13:54:31 +02:00
Charlie Marsh
49a27ff33c
Add support for parameterized link modes (#164)
Allows the user to select between clone, hardlink, and copy semantics
for installs. (The pnpm documentation has a decent description of what
these mean: https://pnpm.io/npmrc#package-import-method.)

Closes #159.
2023-10-22 04:35:50 +00:00
konsti
ae9d1f7572
Add source distribution filename abstraction (#154)
The need for this became clear when working on the source distribution
integration into the resolver.

While at it i also switch the `WheelFilename` version to the parsed
`pep440_rs` version now that we have this crate.
2023-10-20 17:45:57 +02:00
Charlie Marsh
4645f79237
Use FxHash (#151) 2023-10-20 05:26:06 +00:00
Charlie Marsh
7ef6c0315c
Unify site-packages into distribution enum (#136)
Gets rid of the custom `DistInfo` struct in the site-packages
abstraction in favor of a new kind of distribution
(`InstalledDistribution`). No change in behavior.
2023-10-19 04:37:52 +00:00
Charlie Marsh
573f5832a3
Allow uninstall to take multiple packages and files (#125)
Moves the command to `puffin pip-uninstall` for now to separate from the
managed interface, and redoes the command output.
2023-10-18 22:30:11 -04:00
Charlie Marsh
7e8ffeb2df
Use fs-err in more crates (#100)
Closes https://github.com/astral-sh/puffin/issues/88.
2023-10-16 13:37:58 +00:00
Charlie Marsh
85162d1111
Parallelize wheel installations with Rayon (#84)
It looks like using _either_ async Rust with a `JoinSet` _or_
parallelizing a fixed threadpool with Rayon provide about a ~5% speed-up
over our current serial approach:

```console
❯ hyperfine --runs 30 --warmup 5 --prepare "./target/release/puffin venv .venv" \
  "./target/release/rayon sync ./scripts/benchmarks/requirements-large.txt" \
  "./target/release/async sync ./scripts/benchmarks/requirements-large.txt" \
  "./target/release/main sync ./scripts/benchmarks/requirements-large.txt"
Benchmark 1: ./target/release/rayon sync ./scripts/benchmarks/requirements-large.txt
  Time (mean ± σ):     295.7 ms ±  16.9 ms    [User: 28.6 ms, System: 263.3 ms]
  Range (min … max):   249.2 ms … 315.9 ms    30 runs

Benchmark 2: ./target/release/async sync ./scripts/benchmarks/requirements-large.txt
  Time (mean ± σ):     296.2 ms ±  20.2 ms    [User: 36.1 ms, System: 340.1 ms]
  Range (min … max):   258.0 ms … 359.4 ms    30 runs

Benchmark 3: ./target/release/main sync ./scripts/benchmarks/requirements-large.txt
  Time (mean ± σ):     306.6 ms ±  19.5 ms    [User: 25.3 ms, System: 220.5 ms]
  Range (min … max):   269.6 ms … 332.2 ms    30 runs

Summary
  './target/release/rayon sync ./scripts/benchmarks/requirements-large.txt' ran
    1.00 ± 0.09 times faster than './target/release/async sync ./scripts/benchmarks/requirements-large.txt'
    1.04 ± 0.09 times faster than './target/release/main sync ./scripts/benchmarks/requirements-large.txt'
```

It's much easier to just parallelize with Rayon and avoid async in the
underlying wheel code, so this PR takes that approach for now.
2023-10-10 23:46:30 -04:00
Charlie Marsh
a0294a510c
Rework puffin sync output to summarize (#81)
This also moves away from using `tracing` for user-facing logging,
instead introducing a new `Printer` abstraction.

Closes #66.
2023-10-10 03:29:09 +00:00
Charlie Marsh
ba2b200fce
Enable release builds via cargo-dist (#79) 2023-10-09 20:48:55 +00:00
Charlie Marsh
b90140e1bc
Add support for wheel uninstalls (#77)
Closes #36.
2023-10-09 14:14:33 -04:00
Charlie Marsh
ba72950546
Avoid passing cached wheels to the resolver step (#70)
When we go to install a locked `requirements.txt`, if a wheel is already
available in the local cache, and matches the version specifiers, we can
just use it directly without fetching the package metadata. This speeds
up the no-op case by about 33%.

Closes https://github.com/astral-sh/puffin/issues/48.
2023-10-08 22:17:19 -04:00
Charlie Marsh
5b71cfdd0b
Remove Monotrail-specific code from install-wheel-rs (#68)
I think this isn't necessary to support in this generic crate. If we
choose to adopt Monotrail-style concepts, we'll likely need to rework
them anyway.
2023-10-08 18:28:57 -04:00
Charlie Marsh
adbee4fb32
Use recursive clonefile calls on macOS (#67)
It turns out that on macOS, you can pass `clonefile` a directory to
recursively copy an entire directory. This speeds up wheel installation
dramatically, by about 3x.
2023-10-08 21:44:02 +00:00
Charlie Marsh
2a846e76b7
Store unzipped wheels in a cache (#49)
This PR massively speeds up the case in which you need to install wheels
that already exist in the global cache.

The new strategy is as follows:

- Download the wheel into the content-addressed cache.
- Unzip the wheel into the cache, but ignore content-addressing. It
turns out that writing to `cacache` for every file in the zip added a
ton of overhead, and I don't see any actual advantages to doing so.
Instead, we just unzip the contents into a directory at, e.g.,
`~/.cache/puffin/django-4.1.5`.
- (The unzip itself is now parallelized with Rayon.)
- When installing the wheel, we now support unzipping from a directory
instead of a zip archive. This required duplicating and tweaking a few
functions.
- When installing the wheel, we now use reflinks (or copy-on-write
links). These have a few fantastic properties: (1) they're extremely
cheap to create (on macOS, they are allegedly faster than hard links);
(2) they minimize disk space, since we avoid copying files entirely in
the vast majority of cases; and (3) if the user then edits a file
locally, the cache doesn't get polluted. Orogene, Bun, and soon pnpm all
use reflinks.

Puffin is now ~15x faster than `pip` for the common case of installing
cached data into a fresh environment.

Closes https://github.com/astral-sh/puffin/issues/21.

Closes https://github.com/astral-sh/puffin/issues/39.
2023-10-08 04:04:48 +00:00
Charlie Marsh
ae28552b3a
Use local copy of install-wheel-rs (#34)
This PR modifies the `install-wheel-rs` (and a few other crates) to get
everything playing nicely. Specifically, CI should pass, and all these
crates now use workspace dependencies between one another.

As part of this change, I split out the wheel name parsing into its own
`wheel-filename` crate, and the compatibility tag parsing into its own
`platform-tags` crate.
2023-10-07 01:43:55 +00:00
Charlie Marsh
e824fe6d2b
Copy over install-wheel-rs crate (#33)
This PR copies over the `install-wheel-rs` crate at commit
`10730ea1a84c58af6b35fb74c89ed0578ab042b6` with no modifications.

It won't pass CI, but modifications will intentionally be confined to
later PRs.
2023-10-06 21:38:38 -04:00