Ensure we're using atomic writes everywhere in our cache to avoid broken
cache records and errors with parallel puffin actions
(https://github.com/astral-sh/puffin/pull/544#issuecomment-1838841581).
All JSON files that are written to the cache are written atomically, and
the built wheels are written to a temp dir and then moved atomically. I
didn't touch venv creation though; I don't think that's worth it, since
Python does not support atomic package installation by design.
This removes the last usage of cacache by replacing it with a custom,
flat json caching keyed by the digest of the executable path.
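
A minimal sketch of the write-to-temp-then-rename pattern described above, using only the standard library. The function names and the temp-file naming scheme are assumptions for illustration, not the code from this PR; real code would likely want a unique temp name per writer (the process id alone does not separate threads).

```rust
use std::ffi::OsString;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `data` to `path` via a sibling temp file plus a rename, so readers
/// see either the old file or the complete new one, never a partial write.
pub fn write_atomic(path: &Path, data: &[u8]) -> io::Result<()> {
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;

    // Hypothetical naming: `.<name>.<pid>.tmp` next to the target, so the
    // rename stays on one filesystem.
    let mut tmp_name = OsString::from(".");
    tmp_name.push(file_name);
    tmp_name.push(format!(".{}.tmp", std::process::id()));
    let tmp_path = path.with_file_name(tmp_name);

    let result = write_and_rename(&tmp_path, path, data);
    if result.is_err() {
        // Best-effort cleanup; the caller still sees the original error.
        let _ = fs::remove_file(&tmp_path);
    }
    result
}

fn write_and_rename(tmp_path: &Path, path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = File::create(tmp_path)?;
    file.write_all(data)?;
    // Flush to disk before the file becomes visible under its final name.
    file.sync_all()?;
    drop(file);
    fs::rename(tmp_path, path)
}
```

Keeping the temp file in the same directory as the target is the important design choice here: a rename across filesystems is not atomic.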

A step towards #478. I've made `CachedByTimestamp<T>` generic over `T`
but intentionally not moved it to `puffin-cache` yet.
This is mostly a mechanical refactor that moves 80% of our code to the
same cache abstraction.
It introduces `Cache`, which abstracts away the path of the cache
and the temp dir drop and is passed throughout the codebase. To get a
specific cache bucket, you need to request your `CacheBucket` from
`Cache`. `CacheBucket` centralizes the names of all cache
buckets, moving them away from the string constants spread throughout
the crates.
Specifically for working with the `CachedClient`, there is a
`CacheEntry`. I'm not sure yet if that is a strict improvement over
`cache_dir: PathBuf, cache_file: String`; I may have to revisit that
later.
The interpreter cache moved into `interpreter-v0`.
We can use the `CacheBucket` page to document the cache structure in
each bucket.
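
To make the relationship between the three types concrete, here is a rough sketch of how they could fit together. Only the type names and the `interpreter-v0` directory come from the description above; the `Wheels` variant, its directory name, and all method names are assumptions, and the temp dir handling is left out.

```rust
use std::path::{Path, PathBuf};

/// All bucket names live in one place instead of scattered string constants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheBucket {
    Interpreter,
    Wheels, // hypothetical second bucket, for illustration only
}

impl CacheBucket {
    fn to_str(self) -> &'static str {
        match self {
            CacheBucket::Interpreter => "interpreter-v0",
            CacheBucket::Wheels => "wheels-v0", // assumed directory name
        }
    }
}

/// Owns the cache root; callers never build cache paths by hand.
#[derive(Debug, Clone)]
pub struct Cache {
    root: PathBuf,
}

impl Cache {
    pub fn from_path(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }

    /// Directory of a specific bucket below the cache root.
    pub fn bucket(&self, bucket: CacheBucket) -> PathBuf {
        self.root.join(bucket.to_str())
    }

    /// A (directory, file name) pair inside a bucket.
    pub fn entry(&self, bucket: CacheBucket, dir: impl AsRef<Path>, file: impl Into<String>) -> CacheEntry {
        CacheEntry {
            dir: self.bucket(bucket).join(dir),
            file: file.into(),
        }
    }
}

/// Mirrors the `cache_dir: PathBuf, cache_file: String` pair mentioned above.
#[derive(Debug, Clone)]
pub struct CacheEntry {
    pub dir: PathBuf,
    pub file: String,
}

impl CacheEntry {
    pub fn path(&self) -> PathBuf {
        self.dir.join(&self.file)
    }
}
```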

Preparing for #235, some refactoring to `puffin_interpreter`.
* Added a dedicated error type instead of anyhow
* `InterpreterInfo` -> `Interpreter`
* `detect_virtual_env` now returns an option so it can be chained for
#235
Previously, we were assuming that `which <python>` returns the path to
the python executable. This is not true when using pyenv shims, which
are bash scripts. Instead, we have to use `sys.executable`. Luckily,
we're already querying the python interpreter and can do it in that
pass.
We are also not allowed to cache the execution of the python interpreter
through the shim because pyenv might change the target. As a heuristic,
we check whether `sys.executable`, the real binary, is the same as our
canonicalized `which` result.
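
The heuristic can be sketched roughly as follows. The function name is hypothetical, and canonicalizing both sides (rather than only the `which` result) is a choice made for this sketch, not necessarily what the PR does.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns `true` if the path found on `PATH` resolves to the same file as the
/// `sys.executable` reported by the interpreter itself. A `false` result
/// suggests a shim (e.g. a pyenv bash script), whose target may change, so the
/// interpreter query should not be cached under the shim's path.
pub fn which_points_at_real_binary(which_result: &Path, sys_executable: &Path) -> io::Result<bool> {
    // `canonicalize` resolves symlinks and `.`/`..` components; it fails if the
    // path does not exist.
    let which_canonical = fs::canonicalize(which_result)?;
    let real_canonical = fs::canonicalize(sys_executable)?;
    Ok(which_canonical == real_canonical)
}
```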
---------
Co-authored-by: Zanie Blue <contact@zanie.dev>
## Summary
This PR just adds the logic in `install-wheel-rs` to write
`direct_url.json`. We're not actually taking advantage of it yet (or
wiring it through) in Puffin.
Part of https://github.com/astral-sh/puffin/issues/332.
This PR makes the cache non-optional in most of Puffin, which simplifies
the code, allows us to reuse the cache within a single command (even
with `--no-cache`), and also allows us to use the cache for disk storage
across an invocation.
I left the cache as optional for the `Virtualenv` and `InterpreterInfo`
abstractions, since those are generic enough that it seems nice to have
a non-cached version, but it's kind of arbitrary.
musl (which we already use in ruff) allows statically linked binaries on
Linux. This PR switches to rustls, and vendors and fixes the glibc
detection. Using static musl builds makes it easier to avoid glibc
errors in Docker, and we'll need it later for Alpine users anyway.
An alternative is using vendored openssl.
Previously, we had two Python interpreter metadata structs, one in
gourgeist and one in puffin. Both would spawn a subprocess to query
overlapping metadata and both would appear in the cli crate; if you
weren't careful, you could even have two different base interpreters at
once. This change unifies this into one set of metadata, queried and
cached once.
Another effect of this crate is proper separation of python interpreter
and venv. A base interpreter (such as `/usr/bin/python`, but also pyenv-
and conda-installed Pythons) has a set of metadata. A venv has a root and
inherits the base python metadata except for `sys.prefix`, which unlike
`sys.base_prefix`, gets set to the venv root. From the root and the
interpreter info we can compute the paths inside the venv. We can reuse
the interpreter info of the base interpreter when creating a venv
without having to query the newly created `python`.
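
A small sketch of that split, assuming a Unix venv layout (`bin/python`, `lib/pythonX.Y/site-packages`). The struct fields and method names are illustrative and not the crate's actual API.

```rust
use std::path::PathBuf;

/// Metadata queried once from a base interpreter (field set is illustrative).
#[derive(Debug, Clone)]
pub struct Interpreter {
    pub version: (u8, u8),
    pub base_prefix: PathBuf,
    pub prefix: PathBuf,
}

/// A venv is a root directory plus the inherited interpreter metadata.
#[derive(Debug, Clone)]
pub struct Virtualenv {
    root: PathBuf,
    interpreter: Interpreter,
}

impl Virtualenv {
    /// Derive the venv's metadata from the base interpreter without spawning
    /// the venv's own `python`: everything is inherited except `prefix`,
    /// which becomes the venv root.
    pub fn from_base(root: impl Into<PathBuf>, base: &Interpreter) -> Self {
        let root = root.into();
        let interpreter = Interpreter {
            prefix: root.clone(),
            ..base.clone()
        };
        Self { root, interpreter }
    }

    pub fn interpreter(&self) -> &Interpreter {
        &self.interpreter
    }

    /// Path of the venv's Python executable (Unix layout assumed).
    pub fn python(&self) -> PathBuf {
        self.root.join("bin").join("python")
    }

    /// Path of the venv's site-packages directory (Unix layout assumed).
    pub fn site_packages(&self) -> PathBuf {
        let (major, minor) = self.interpreter.version;
        self.root
            .join("lib")
            .join(format!("python{major}.{minor}"))
            .join("site-packages")
    }
}
```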
This isn't ready, but it can resolve
`meine_stadt_transparent==0.2.14`.
The source distributions are currently being built serially, one after
the other. I don't know whether that is incidentally due to the resolution
order, because sdist building is blocking, or because of something in the
resolver that could be improved.
It's a bit annoying that the thing that was supposed to do HTTP requests
now suddenly also has to do a whole download/unpack/resolve/install/build
routine; it messes up the type hierarchy. The much bigger problem though
is avoiding recursive crate dependencies; it's the reason for the callback
and for splitting the builder into two crates (badly named atm).
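
One common way to break such a cycle is to define the callback as a trait in the lower-level crate and implement it in the higher-level one. The sketch below simulates the two crates with modules; every name in it is hypothetical, and the "build" is a stand-in that only derives a wheel filename.

```rust
/// Stands in for the low-level crate: it knows *that* a source distribution
/// can be turned into a wheel, but not *how*.
mod client {
    pub trait SourceDistBuilder {
        fn build(&self, sdist_name: &str) -> Result<String, String>;
    }

    pub struct Client<'a> {
        pub builder: &'a dyn SourceDistBuilder,
    }

    impl Client<'_> {
        /// Wheels are used directly; anything else is handed to the callback.
        pub fn wheel_for(&self, filename: &str) -> Result<String, String> {
            if filename.ends_with(".whl") {
                Ok(filename.to_string())
            } else {
                self.builder.build(filename)
            }
        }
    }
}

/// Stands in for the higher-level crate: it depends on `client`, never the
/// other way around, so there is no dependency cycle.
mod builder {
    use super::client::SourceDistBuilder;

    pub struct FakeBuilder;

    impl SourceDistBuilder for FakeBuilder {
        fn build(&self, sdist_name: &str) -> Result<String, String> {
            // Placeholder for the real download/unpack/resolve/install/build routine.
            let stem = sdist_name
                .strip_suffix(".tar.gz")
                .ok_or_else(|| format!("unsupported archive: {sdist_name}"))?;
            Ok(format!("{stem}-py3-none-any.whl"))
        }
    }
}
```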
The need for this became clear when working on the source distribution
integration into the resolver.
While at it, I also switch the `WheelFilename` version to the parsed
`pep440_rs` version now that we have this crate.
This adds a basic sdist builder that has been tested with two source
distributions, one with a PEP 517 backend and one with setup.py.
It uses pip for requirements installation atm, lacks testing in all
directions, lacks checks for recursive requirements, can't pass in
already resolved versions, doesn't support prepare metadata for build to
allow resolution to continue without doing the actual (native) build,
error messages are mediocre, etc.
```console
$ RUST_LOG=puffin_build=debug puffin-build --wheels wheels downloads/tqdm-4.66.1.tar.gz
2023-10-16T12:28:35.503182Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Building downloads/tqdm-4.66.1.tar.gz
2023-10-16T12:28:35.521780Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:extract_archive: puffin_build: close time.busy=18.4ms time.idle=16.7µs
2023-10-16T12:28:35.845096Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:resolve_and_install: puffin_build: Calling pip to install build dependencies
2023-10-16T12:28:37.668660Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:resolve_and_install: puffin_build: close time.busy=1.82s time.idle=13.2µs
2023-10-16T12:28:37.668744Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Calling `setuptools.build_meta.get_requires_for_build_wheel()`
2023-10-16T12:28:38.159205Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:run_python_script{python_interpreter="/tmp/.tmpm4cTra/venv/bin/python"}: puffin_build: close time.busy=490ms time.idle=13.0µs
2023-10-16T12:28:38.159304Z DEBUG build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Calling `setuptools.build_meta.build_wheel()`
2023-10-16T12:28:38.501732Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}:run_python_script{python_interpreter="/tmp/.tmpm4cTra/venv/bin/python"}: puffin_build: close time.busy=342ms time.idle=15.2µs
2023-10-16T12:28:38.522700Z INFO build_sdist{path="downloads/tqdm-4.66.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: close time.busy=3.02s time.idle=16.2µs
Wheel built to /home/konsti/projects/puffin/crates/puffin-build/wheels/tqdm-4.66.1-py3-none-any.whl
2023-10-16T12:28:38.522772Z DEBUG puffin_build: Took 3020ms
$ puffin-build --wheels wheels downloads/geoextract-0.3.1.tar.gz
2023-10-16T12:28:40.884622Z DEBUG build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: Building downloads/geoextract-0.3.1.tar.gz
2023-10-16T12:28:40.887743Z INFO build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}:extract_archive: puffin_build: close time.busy=2.97ms time.idle=12.6µs
2023-10-16T12:28:41.469738Z INFO build_sdist{path="downloads/geoextract-0.3.1.tar.gz" base_python="/usr/bin/python3"}: puffin_build: close time.busy=585ms time.idle=15.3µs
Wheel built to /home/konsti/projects/puffin/crates/puffin-build/wheels/geoextract-0.3.1-py3-none-any.whl
2023-10-16T12:28:41.469814Z DEBUG puffin_build: Took 585ms
```
I think this isn't necessary to support in this generic crate. If we
choose to adopt Monotrail-style concepts, we'll likely need to rework
them anyway.
This PR gets `gourgeist` passing our local CI and integrated into the
broader workspace.
There's some duplication between concepts in `gourgeist` (like the
`InterpreterInfo`) and structs we have elsewhere, but we can tackle
those later.
This PR copies over the `gourgeist` crate at commit
`e64c17a263dac6933702dc8d155425c053fe885a` with no modifications.
It won't pass CI, but modifications will intentionally be confined to
later PRs.