Commit graph

709 commits

Author SHA1 Message Date
Zanie
dd5b5fd6e3 Baseline 2024-01-26 14:48:16 -06:00
Zanie
3e74c93bb7 Reduce changes to the minimum necessary 2024-01-26 14:43:23 -06:00
Zanie
ca97e0ec3e Make sure for sure that the versions are sorted going into simplify 2024-01-26 14:11:49 -06:00
Zanie
78004ba66f Order insertions into the available versions map 2024-01-26 14:11:47 -06:00
Zanie Blue
5cc4e5d31e
Add pip compile test where specific Python versions are available on the system (#1111)
Extends https://github.com/astral-sh/puffin/pull/1106 with the scenario
from https://github.com/zanieb/packse/pull/95 which tests that `pip
compile` will use the matching system Python version for builds when
available
2024-01-26 18:38:24 +00:00
Zanie Blue
91f421cf97
Do not allow pip compile scenario tests to discover other Python versions (#1106)
In https://github.com/astral-sh/puffin/pull/1040 we broke the pip
compile scenarios designed to test failure when a required Python
version is not available — resolution succeeded because all of the
Python versions were available in CI. Following #1105 we have the
ability to isolate tests from Python versions available in the system.
Here, we limit the scenarios to only the Python version in the current
environment, restoring our ability to test the error messages.

With https://github.com/zanieb/packse/pull/95, we will be able to
specify scenarios with access to additional system Python versions. This
will allow us to include test coverage where resolution can succeed by
using a version available elsewhere on the system. See #1111 for this
follow-up.
2024-01-26 18:18:15 +00:00
Zanie Blue
21577ad002
Add bootstrapping and isolation of development Python versions (#1105)
Replaces https://github.com/astral-sh/puffin/pull/1068 and #1070 which
were more complicated than I wanted.

- Introduces a `.python-versions` file which defines the Python versions
needed for development
- Adds a Bash script at `scripts/bootstrap/install` which installs the
required Python versions from `python-build-standalone` to `./bin`
- Checks in a `versions.json` file with metadata about available
versions on each platform and a `fetch-version` Python script derived
from `rye` for updating the versions
- Updates CI to use these Python builds instead of the `setup-python`
action
- Updates to the latest packse scenarios which require Python 3.8+
instead of 3.7+ since we cannot use 3.7 anymore and includes new test
coverage of patch Python version requests
- Adds a `PUFFIN_PYTHON_PATH` variable to prevent lookup of system
Python versions for isolation during development

Tested on Linux (via CI) and macOS (locally) — presumably it will be a
bit more complicated to do proper Windows support.
2024-01-26 12:12:48 -06:00
Charlie Marsh
cc0e211074
Avoid embedding launcher scripts on non-Windows (#1124)
Just to reduce binary size on all other platforms.
2024-01-26 17:19:05 +01:00
Charlie Marsh
f946d46273
Avoid allocating a max-size buffer (#1123)
This seems potentially-dangerous with no upside.
2024-01-26 14:27:19 +00:00
konsti
39021263dd
Windows launchers using posy trampolines (#1092)
## Background

In virtual environments, we want to install python programs as console
commands, e.g. `black .` over `python -m black .`. They may be called
[entrypoints](https://packaging.python.org/en/latest/specifications/entry-points/)
or scripts. For entrypoints, we're given a module name and function to
call in that module.

On Unix, we generate a minimal python script launcher. Text files are
runnable on unix by adding a shebang at their top, e.g.

```python
#!/usr/bin/env python
```

will make the operating system run the file with the current python
interpreter. A venv launcher for black in `/home/ferris/colorize/.venv`
(module name: `black`, function to call: `patched_main`) would look like
this:

```python
#!/home/ferris/colorize/.venv/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from black import patched_main
if __name__ == "__main__":
    sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
    sys.exit(patched_main())
```

On windows, this doesn't work, we can only rely on launching `.exe`
files.

## Summary

We use posy's rust implementation of a trampoline, which is based on
distlib's c++ implementation. We pre-build a minimal exe and append the
launcher script as stored zip archive behind it. The exe will look for
the venv python interpreter next to it and use it to execute the
appended script.

The changes in this PR make the `black` entrypoint work:

```powershell
cargo run -- venv .venv
cargo run -q -- pip install black
.\.venv\Scripts\black --version
```

Integration with our existing tests will be done in follow-up PRs.

## Implementation and Details

I've vendored the posy trampoline crate. It is a formatted, renamed and
slightly changed for embedding version of
https://github.com/njsmith/posy/pull/28.

The posy launchers are smaller than the distlib launchers, 16K vs 106K
for black. Currently only `x86_64-pc-windows-msvc` is supported. The
crate requires a nightly compiler for its no-std binary size tricks.

On windows, an application can be launched with a console or without (to
create windows instead), which needs two different launchers. The gui
launcher will subsequently use `pythonw.exe` while the console launcher
uses `python.exe`.
2024-01-26 13:54:11 +00:00
konsti
f1d3b08c12
Add missing version to pip sync test (#1121)
The test started failing due to a newer version on pypi.
2024-01-26 13:36:25 +00:00
Charlie Marsh
361a2039d2
Add --no-annotate and --no-header flags (#1117)
Closes #1107.
Closes #1108.
2024-01-26 12:14:18 +00:00
Charlie Marsh
7755f986c3
Support extras in editable requirements (#1113)
## Summary

This PR adds support for requirements like `-e .[d]`.

Closes #1091.
2024-01-26 12:07:51 +00:00
Charlie Marsh
f593b65447
Remove refresh checks from the install plan (#1119)
## Summary

Rather than checking cache freshness in the install plan, it's a lot
simple to have the install plan _never_ return cached data when the
refresh policy is in place, and then rely on the distribution database
to check for freshness. The original implementation didn't support this,
since the distribution database was rebuilding things too often. Now, it
rarely rebuilds (it's much better about this), so it seems conceptually
much simpler to split up the responsibilities like this.
2024-01-25 22:48:16 -05:00
Charlie Marsh
50057cd5f2
Re-add Cargo's known hosts checking (#1118)
## Summary

This ensures that (like Cargo) we don't suffer from
https://github.com/advisories/GHSA-r5w3-xm58-jv6j, by way of checking
known hosts when fetching via `libgit2`.

The implementation is taken from Cargo itself, modified to remove all
configuration, since we don't yet support configuration for known hosts,
etc.

Closes #285.
2024-01-25 22:29:36 -05:00
Charlie Marsh
67b41427cc
Store source distribution directly in the cache (#1116)
I want to move towards using the archive bucket exclusively for wheels.
We never overwrite source distributions, so there's no need to symlink
them.
2024-01-25 20:52:31 -05:00
Charlie Marsh
77351c7874
Use snapshots for requirements.txt error tests (#1115)
## Summary

I find these too difficult to edit and maintain. This brings them closer
to the rest of our testing setups.
2024-01-25 20:35:52 -05:00
Charlie Marsh
57c116ee9a
Move Black editable to flit backend (#1114)
I ran into a bug in PDM that's making it impossible to use the Black
example for extras: https://github.com/pdm-project/pdm/issues/2591.

I've confirmed that Flit handles it correctly.
2024-01-25 19:54:54 -05:00
Zanie Blue
3a05ef5285
Add venv tests for missing Python versions (#1096)
These demonstrate some lackluster error messages.
2024-01-25 13:57:05 -06:00
Charlie Marsh
f36c167982
Use a consolidated error for distribution failures (#1104)
## Summary

Use a single error type in `puffin_distribution`, rather than two
confusingly similar types between `DistributionDatabase` and the source
distribution module.

Also removes the `#[from]` for IO errors and replaces with explicit
wrapping, which is verbose but removes a bunch of incorrect error
messages.
2024-01-25 14:49:11 -05:00
Charlie Marsh
8ef819e07e
Remove Option wrapper from requirement extras (#1103)
There's no semantic difference between `None` and empty, so seems
simpler to represent this way.
2024-01-25 13:21:53 -05:00
Andrew Gallant
067acfe79e
puffin-client: rejigger error type (#1102)
This PR changes the error type to be boxed internally so that it uses
less size on the stack. This makes functions returning `Result<T,
Error>`, in particular, return something much smaller.

The specific thing that motivated this was Clippy lints firing when I
tried to refactor code in this crate.

I chose to achieve boxing by splitting the enum out into a separate
type, and then wiring up the necessary `From` impl to make error
conversions easy, and then making `Error` itself opaque. We could expose
the `Box`, but there isn't a ton of benefit in doing so because one
cannot pattern match through a `Box`.

This required using more explicit error conversions in several places.
And as a result, I was able to remove all `#[from]` attributes on
non-transparent error variants.
2024-01-25 13:13:21 -05:00
Charlie Marsh
3e86c80874
Set buffer size when unzipping (#1101)
The zip archive includes an uncompressed size header, which we can use
to preallocate.
2024-01-25 17:58:36 +00:00
Charlie Marsh
e0902d7d5a
Make puffin-fs tokio dependency opt-in (#1100) 2024-01-25 12:47:46 -05:00
Charlie Marsh
5ad2e60561
Use same-file to detect interpreter shims (#1099)
Our existing detection doesn't work on Windows, because we canoncalize
the interpreter path but not `info.sys_executable`, so the former
includes the UNC prefix, etc. This is cross-platform and gets at the
intent of the check.
2024-01-25 12:27:49 -05:00
Charlie Marsh
f4939e50a6
Remove UNC prefixes on Windows (#1086)
## Summary

This PR adds a `NormalizedDisplay` trait that we can use for user-facing
paths, to strip the UNC prefix on Windows.

On other platforms, the implementation is a no-op (vs. `Display`).

I audited all usages of `.display()`, and changed any that were
user-facing, either via `println!` or `eprintln!`, or by way of being
included in error messages. I did _not_ change uses that were only in
tests or only went to tracing.

Closes https://github.com/astral-sh/puffin/issues/1084.
2024-01-25 11:44:22 -05:00
konsti
035cd81ac8
Fix venv PATH on windows (#1095)
Windows uses `;` instead of `:` to separate `PATH` entries. This pull
request switches from manually using `:` to the `std::env` functions.
This fixes

```
puffin pip install -e scripts/editable-installs/maturin_editable
```

on windows.
2024-01-25 15:40:52 +00:00
Charlie Marsh
904db967af
Use junctions instead of symlinks on Windows (#1087)
## Summary

When we unzip wheels in the cache, we write the directories out to an
`archive-v0` bucket, and then symlink into that bucket from the
`wheels-v0` and `built-wheels-v0` buckets.

On Windows, symlinks are not well supported. Specifically, they need to
be explicitly enabled by the user. So, instead of symlinks, we now use
junctions, which are well-supported on Windows, and allow you to
(effectively) symlink a directory to another directory. This PR
implements said junction support, which gets the core installer working
on Windows.

In the past, we also used symlinks to implement another primitive: we
wanted to be able to replace a directory "atomically" (I put
"atomically" in quotes because I don't know if it's actually a
guaranteed atomic operation), in case someone was trying to use the
directory while we were replacing it (as opposed to deleting the
directory, then moving it into place).

On Windows, it doesn't appear to be possible to atomically replace a
junction. So instead, I'm using a new design, whereby the cache always
returns canonicalized paths. We know these canonicalized paths are
unique and won't be replaced, so they're safe for writers to rely on. In
general, when we write new data to the cache, we now return the
canonicalized path. When we read from the cache, and try to identify
(e.g.) the set of wheels available to us, we canonicalize the links
immediately and consider them non-existent if that operation fails.

Closes #1085.

---------

Co-authored-by: konstin <konstin@mailbox.org>
2024-01-25 10:06:38 +01:00
Charlie Marsh
036b7e5f43
Use parse_headers rather than parsing body (#1090)
Looking at the internals, this should make almost no difference in
performance, but anyway...
2024-01-25 09:41:21 +01:00
Zanie Blue
ed1ac640b9
Consolidate UnusableDependencies into a generic Unavailable incompatibility (#1088)
Requires https://github.com/zanieb/pubgrub/pull/20

In short, `UnusableDependencies` can be generalized into `Unavailable`
which encompasses incompatibilities where a package range which is
unusable for some inherent reason as well as when its dependencies are
unusable. We can eventually use this to track more incompatibilities in
the solver. I made the reason string required because I can't see a case
where we should leave it out.

Additionally, this improves the display of conflicts in the root
requirements.
2024-01-24 22:10:44 -06:00
Zanie Blue
091f8e09ff
Use a cache directory for venv tests (#1089) 2024-01-24 22:09:37 -06:00
konsti
ed6a1606b9
Use which::which instead of which::which_global (#1083)
`which::which_global` does not resolve relative paths, which we want to
support, while `which::which` does.
2024-01-24 18:35:57 -06:00
Charlie Marsh
cedd2e0b3f
Use a buffered reader for wheel metadata (#1082)
## Summary

It turns out this is significantly faster when reading (e.g.) _just_ the
`METADATA` file from a zipped wheel.

I audited other `File::open` usages, and everything else seems to be
using a buffered reader already (directly, or in whatever third-party
crate it's passed to) _or_ is read immediately in full.

See the criterion benchmark:

```
file_reader/numpy-1.26.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
                        time:   [6.9618 ms 6.9664 ms 6.9713 ms]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
file_reader/flask-3.0.1-py3-none-any.whl
                        time:   [237.50 µs 238.25 µs 239.13 µs]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

buffered_reader/numpy-1.26.3-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
                        time:   [648.92 µs 653.85 µs 660.09 µs]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
buffered_reader/flask-3.0.1-py3-none-any.whl
                        time:   [39.578 µs 39.712 µs 39.869 µs]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
```
2024-01-24 15:22:55 -05:00
Zanie Blue
0019fe71f6
Add warning when target version does not match build version (#1072)
Follow-up to https://github.com/astral-sh/puffin/pull/1040 adding a
user-facing warning when we cannot build with their requested version.

e.g.

```
❯ cargo run -- pip compile requirements.in --python-version 3.11.4 --no-build
Resolved 8 packages in 483ms
❯ cargo run -- pip compile requirements.in --python-version 3.11.4
warning: The requested Python version 3.11.4 is not available; 3.11.7 will be used to build dependencies instead.
Resolved 8 packages in 71ms
❯ cargo run -- pip compile requirements.in --python-version 3.11
Resolved 8 packages in 71ms
```
2024-01-24 13:42:19 -06:00
Charlie Marsh
738e8341e2
Use a consistent Timestamp struct (#1081)
## Summary

This PR uses `ctime` consistently on Unix as a more conservative
approach to change detection. It also ensures that our timestamp
abstraction is entirely internal, so we can change the representation
and logic easily across the codebase in the future.
2024-01-24 14:21:31 -05:00
Zanie Blue
bdfabfb088
Fixup doc for find_best (#1079) 2024-01-24 12:55:01 -06:00
konsti
2e0ce70d13
Initial windows support (#940)
## Summary

First batch of changes for windows support. Notable changes:

* Fixes all compile errors and added windows specific paths.
* Working venv creation on windows, both from a base interpreter and
from a venv. This requires querying `stdlib` from the sysconfig paths to
find the launcher.
* Basic url/path conversion handling for windows.
* `if cfg!(...)` instead of `#[cfg()]`. This should make it easier to
keep everything compiling across platforms.

## Outlook

Test summary: 402 tests run: 299 passed (15 slow), 103 failed, 1 skipped

There are various reason for the remaining test failure:
* Windows-specific colorama and tzdata dependencies that change the
snapshot slightly. This is by far the biggest batch.
* Some url-path handling issues. I fixed some in the PR, some remain.
* Lack of the latest python patch versions for older pythons on my
machine, since there are no builds for windows and we need to register
them in the registry for them to be picked up for `py --list-paths` (CC
@zanieb RE #1070).
* Lack of entrypoint launchers.
* ... likely more
2024-01-24 18:27:49 +01:00
Zanie Blue
ea4ab29bad
Prefer target Python version over current version for builds (#1040)
Extends #1029 
Closes https://github.com/astral-sh/puffin/issues/1038

Instead of always using the current Python version for builds when a
target version is provided, we will do our best to use a compatible
Python version for builds.

Removes behavior where Python versions without patch versions were
always assumed to be the latest known patch version (previously
discussed in https://github.com/astral-sh/puffin/pull/534). While this
was convenient for resolutions which include packages which require
minimum patch versions e.g. `requires-python=">=3.7.4"`, it conflicts
with the idea that the target Python version you provide is the
_minimum_ compatible version. Additionally, it complicates interpreter
lookup as we cannot tell if the user has asked for that specific patch
version or not.
2024-01-24 11:12:02 -06:00
Charlie Marsh
0519375bd6
Remove some unused dependencies (#1077) 2024-01-24 11:58:21 -05:00
Charlie Marsh
afb571643f
Avoid unzipping local wheels when fresh (#1076)
Since the archive is a single file in this case, we can rely on the
modification timestamp to check for freshness.
2024-01-24 15:01:16 +00:00
konsti
411613a24e
No python prefix in packse scenarios (#1066)
In windows, `python3.9` and `python3.11` are not in `PATH`. Instead, we
should pass only the python version to `puffin venv -p` in packse
scenarios (#1039).
2024-01-24 11:22:48 +00:00
Charlie Marsh
63f3434b21
Use nanoid instead of uuid (#1074)
## Summary

Gives us equivalent randomness with ~half as many characters.
2024-01-24 05:05:14 +00:00
Andrew Gallant
eebc2f340a
make some things guaranteed to be deterministic (#1065)
This PR replaces a few uses of hash maps/sets with btree maps/sets and
index maps/sets. This has the benefit of guaranteeing a deterministic
order of iteration.

I made these changes as part of looking into a flaky test.
Unfortunately, I'm not optimistic that anything here will actually fix
the flaky test, since I don't believe anything was actually dependent
on the order of iteration.
2024-01-23 20:30:33 -05:00
Charlie Marsh
1b3a3f4e80
Add --refresh behavior to the cache (#1057)
## Summary

This PR is an alternative approach to #949 which should be much safer.
As in #949, we add a `Refresh` policy to the cache. However, instead of
deleting entries from the cache the first time we read them, we now
check if the entry is sufficiently new (created after the start of the
command) if the refresh policy applies. If the entry is stale, then we
avoid reading it and continue onward, relying on the cache to
appropriately overwrite based on "new" data. (This relies on the
preceding PRs, which ensure the cache is append-only, and ensure that we
can atomically overwrite.)

Unfortunately, there are just a lot of paths through the cache, and
didn't data is handled with different policies, so I really had to go
through and consider the "right" behavior for each case. For example,
the HTTP requests can use `max-age=0, must-revalidate`. But for the
routes that are based on filesystem modification, we need to do
something slightly different.

Closes #945.
2024-01-23 18:30:26 -05:00
Charlie Marsh
cf8b452414
Track HTTP caches for URL wheels (#1071)
## Summary

This PR ensures that we store HTTP caching information for wheels.
Previously, we only stored these for source distributions. This will be
helpful for refresh, since we can avoid re-downloading wheels that are
unchanged per HTTP caching semantics.

There should be zero performance hit here for warm installs, and only an
extremely small hit for cold installs (writing the HTTP cache data to
disk). The hyperfine benchmarks reflect this.
2024-01-23 17:31:42 -05:00
Charlie Marsh
09f5884f28
Avoid revalidating immutable HTTP responses (#1069)
## Summary

If you send a revalidation request to a resource that returns an
`immutable` directive, the server apparently returns a 200 instead of a
304? In other words, the server can ignore the revalidation request.
This PR adds handling on top of the HTTP cache semantics to respect
immutable resources, which is especially useful since all PyPI files are
immutable.
2024-01-23 16:22:21 -05:00
Charlie Marsh
5621c414cf
Use symlinks for directories entries in cache (#1037)
## Summary

One problem we have in the cache today is that we can't overwrite
entries atomically, because we store unzipped _directories_ in the cache
(which makes installation _much_ faster than storing zipped
directories). So, if you ignore the existing contents of the cache when
writing, you might run into an error, because you might attempt to write
a directory where a directory already exists.

This is especially annoying for cache refresh, because in order to
refresh the cache, we have to purge it (i.e., delete a bunch of stuff),
which is also highly unsafe if Puffin is running across multiple threads
or multiple processes.

The solution I'm proposing here is that whenever we persist a
_directory_ to the cache, we persist it to a special "archive" bucket.
Then, within the other buckets, directory entries are actually symlinks
into that "archive" bucket. With symlinks, we can atomically replace,
which means we can easily overwrite cache entries without having to
delete from the cache.

The main downside is that we'll now accumulate dangling entries in the
"archive" bucket, and so we'll need to implement some form of garbage
collection to ensure that we remove entries with no symlinks. Another
downside is that cache reads and writes will be a bit slower, since we
need to deal with creating and resolving these symlinks.

As an example... after this change, the cache entry for this unzipped
wheel is actually a symlink:

![Screenshot 2024-01-22 at 11 56
18 AM](99ff6940-5096-4246-8d16-2a7bdcdd8d4b)

Then, within the archive directory, we actually have two unique entries
(since I intentionally ran the command twice to ensure overwrites were
safe):

![Screenshot 2024-01-22 at 11 56
22 AM](717d04e2-25d9-4225-b190-bad1441868c6)
2024-01-23 19:52:37 +00:00
Charlie Marsh
556080225d
Use ctime for interpreter timestamps (#1067)
Per https://apenwarr.ca/log/20181113, `ctime` should be a lot more
conservative, and should detect things like the issue we see with the
python-build-standalone builds, where the `mtime` is identical across
builds.

On Windows, I'm just using `last_write_time`. But we should probably add
`volume_serial_number` and other attributes via
[`winapi_util`](https://docs.rs/winapi-util/latest/winapi_util/index.html).
2024-01-23 19:52:20 +00:00
Charlie Marsh
6561617c56
Store source distribution builds under a unique manifest ID (#1051)
## Summary

This is a refactor of the source distribution cache that again aims to
make the cache purely additive. Instead of deleting all built wheels
when the cache gets invalidated (e.g., because the source distribution
changed on PyPI or something), we now treat each invalidation as its own
cache directory. The manifest inside of the source distribution
directory now becomes a pointer to the "latest" version of the source
distribution cache.

Here's a visual example:

![Screenshot 2024-01-22 at 5 35
41 PM](ca103c83-e116-4956-b91c-8434fe62cffe)

With this change, we avoid deleting built distributions that might be
relied on elsewhere and maintain our invariant that the cache is purely
additive. The cost is that we now preserve stale wheels, but we should
add a garbage collection mechanism to deal with that.
2024-01-23 19:49:11 +00:00
Charlie Marsh
e32027e384
Avoid persisting manifest data in standalone file (#1044)
## Summary

This PR gets rid of the manifest that we store for source distributions.
Historically, that manifest included the source distribution metadata,
plus a list of built wheels.

The problem with the manifest is that it duplicates state, since we now
have to look at both the manifest and the filesystem to understand the
cache state. Instead, I think we should treat the cache as the source of
truth, and get rid of the duplicated state in the manifest.

Now, we store the manifest (which is merely used to check for cache
freshness -- in future PRs, I will repurpose it though, so I left it
around), then the distribution metadata as its own file, then any
distributions in the same directory. When we want to see if there are
any valid distributions, we `readdir` on the directory. This is also
much more consistent with how the install plan works.
2024-01-23 19:46:48 +00:00