uv/crates/uv-trampoline
samypr100 66699def2e
fix: adjust close_handles pointer offsets to match distlib cleanup_fds (#6955)
## Summary

Resolves issues mentioned in comments
* https://github.com/astral-sh/uv/issues/6699#issuecomment-2322515962
* https://github.com/astral-sh/uv/issues/6866#issuecomment-2322785906

Further investigation of the comments revealed that the pointer
arithmetic performed in `let handle_start = unsafe {
crt_magic.offset(1 + handle_count) };` from [posy
trampoline](dda22e6f90/src/trampolines/windows-trampolines/posy-trampoline/src/bounce.rs (L146))
was slightly wrong. Since `crt_magic` is a `*const u32`, `offset` counts in
units of `u32` (4 bytes), so offsetting by `1 + handle_count` moved the
pointer too far, allowing out-of-bounds reads or attempts to call
`CloseHandle` on garbage.

The offset had to change: the handles start `handle_count` bytes after
the initial `u32` count, as seen in
[launcher.c](888c48b568/PC/launcher.c (L578)).
Similarly, we needed to skip the first 3 handles, otherwise we'd still
be attempting to close standard I/O handles of the parent (in this case
the shell from `busybox.exe sh -l`).

I also added a few extra checks from `launcher.c`, which check whether
the handle value is `-2`, to match the distlib implementation more
closely and minimize differences.
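
The layout described above (a `u32` count, then `handle_count` flag bytes, then `handle_count` handles) can be sketched on a plain byte slice. This is a minimal, platform-independent illustration and not the code from `bounce.rs`: `closable_handles`, `RawHandle`, and the constant names are hypothetical, `isize` stands in for `HANDLE`, and skipping `-1` (`INVALID_HANDLE_VALUE`) alongside `-2` is an assumption.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Stand-in for a Windows `HANDLE`, which is pointer-sized (assumption for this sketch).
type RawHandle = isize;

const INVALID_HANDLE: RawHandle = -1; // value of INVALID_HANDLE_VALUE
const UNSET_HANDLE: RawHandle = -2; // the extra value mentioned above
const STDIO_HANDLES: usize = 3; // stdin, stdout, stderr belong to the parent

/// Returns the handles a launcher would close, or `None` if the buffer is too small.
fn closable_handles(buf: &[u8]) -> Option<Vec<RawHandle>> {
    let count_size = size_of::<u32>();
    let count_bytes: [u8; 4] = buf.get(..count_size)?.try_into().ok()?;
    let handle_count = u32::from_ne_bytes(count_bytes) as usize;

    // The handle array starts `handle_count` *bytes* after the count.
    // Offsetting a `*const u32` by `1 + handle_count` would instead move
    // `4 * (1 + handle_count)` bytes, which overshoots.
    let handles_start = count_size.checked_add(handle_count)?;
    let handles_len = handle_count.checked_mul(size_of::<RawHandle>())?;
    let handles_end = handles_start.checked_add(handles_len)?;

    // `get` doubles as the buffer size check: it fails if the range is out of bounds.
    let handle_bytes = buf.get(handles_start..handles_end)?;

    let handles = handle_bytes
        .chunks_exact(size_of::<RawHandle>())
        .skip(STDIO_HANDLES)
        .filter_map(|chunk| {
            let bytes: [u8; size_of::<RawHandle>()] = chunk.try_into().ok()?;
            Some(RawHandle::from_ne_bytes(bytes))
        })
        .filter(|&handle| handle != INVALID_HANDLE && handle != UNSET_HANDLE)
        .collect();
    Some(handles)
}
```

Working on byte offsets rather than a typed pointer keeps the units explicit, which is the distinction the fix hinges on.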

## Test Plan

Manually compiled distlib's launcher with additional logging and
replaced `Lib/site-packages/pip/_vendor/distlib/t64.exe` with the
compiled one to log pointers. As a result, I was able to verify the
retrieved handle memory addresses in this function actually match in
both uv and distlib's implementation from within busybox.exe nested
shell where this behavior can be observed and manually tested.

I was also able to confirm this fixes the issues mentioned in the
comments, at least with busybox's shell, but I assume this would fix the
case with cmake.

## Open areas

`launcher.c` also [checks the
size](888c48b568/PC/launcher.c (L573-L576))
of `cbReserved2` before retrieving `handle_start` which this function
currently doesn't do. If we wanted to, we could add the additional check
here as well, but I wasn't fully sure why it wasn't added in the first
place. Thoughts?

```rust
// Verify the buffer is large enough
if si.cbReserved2 < (size_of::<u32>() as isize + handle_count + size_of::<HANDLE>() as isize * handle_count) as u16 {
    return;
}
```

---------

Co-authored-by: konstin <konstin@mailbox.org>
2024-09-04 13:31:57 +02:00
.cargo feat: fix uv-trampoline renovate issues (#5204) 2024-07-23 10:12:28 +02:00
src fix: adjust close_handles pointer offsets to match distlib cleanup_fds (#6955) 2024-09-04 13:31:57 +02:00
tests feat: more rust in trampoline (#5750) 2024-08-07 08:19:38 +00:00
trampolines fix: adjust close_handles pointer offsets to match distlib cleanup_fds (#6955) 2024-09-04 13:31:57 +02:00
build.rs feat: fix uv-trampoline renovate issues (#5204) 2024-07-23 10:12:28 +02:00
Cargo.lock Remove path-absolutize dependency (#6589) 2024-08-25 12:01:07 +00:00
Cargo.toml feat: more rust in trampoline (#5750) 2024-08-07 08:19:38 +00:00
README.md Use prettier to format the documentation (#5708) 2024-08-02 08:58:31 -05:00
rust-toolchain.toml feat: re-enable std in uv-trampoline (#4722) 2024-07-06 20:38:45 +00:00

Windows trampolines

This is a fork of posy trampolines.

Building

Cross-compiling from Linux

Install cargo xwin. Use your package manager to install LLD and add the rustup targets:

sudo apt install llvm clang lld
rustup target add i686-pc-windows-msvc
rustup target add x86_64-pc-windows-msvc
rustup target add aarch64-pc-windows-msvc

Then, build the trampolines for all supported architectures:

cargo +nightly-2024-06-08 xwin build --xwin-arch x86 --release --target i686-pc-windows-msvc
cargo +nightly-2024-06-08 xwin build --release --target x86_64-pc-windows-msvc
cargo +nightly-2024-06-08 xwin build --release --target aarch64-pc-windows-msvc

Cross-compiling from macOS

Install cargo xwin. Use your package manager to install LLVM and add the rustup targets:

brew install llvm
rustup target add i686-pc-windows-msvc
rustup target add x86_64-pc-windows-msvc
rustup target add aarch64-pc-windows-msvc

Then, build the trampolines for all supported architectures:

cargo +nightly-2024-06-08 xwin build --release --target i686-pc-windows-msvc
cargo +nightly-2024-06-08 xwin build --release --target x86_64-pc-windows-msvc
cargo +nightly-2024-06-08 xwin build --release --target aarch64-pc-windows-msvc

Updating the prebuilt executables

After building the trampolines for all supported architectures:

cp target/aarch64-pc-windows-msvc/release/uv-trampoline-console.exe trampolines/uv-trampoline-aarch64-console.exe
cp target/aarch64-pc-windows-msvc/release/uv-trampoline-gui.exe trampolines/uv-trampoline-aarch64-gui.exe
cp target/x86_64-pc-windows-msvc/release/uv-trampoline-console.exe trampolines/uv-trampoline-x86_64-console.exe
cp target/x86_64-pc-windows-msvc/release/uv-trampoline-gui.exe trampolines/uv-trampoline-x86_64-gui.exe
cp target/i686-pc-windows-msvc/release/uv-trampoline-console.exe trampolines/uv-trampoline-i686-console.exe
cp target/i686-pc-windows-msvc/release/uv-trampoline-gui.exe trampolines/uv-trampoline-i686-gui.exe

Testing the trampolines

To perform a basic smoke test of the trampolines, run the following commands on a Windows machine, from the root of the repository:

cargo clean
cargo run venv
cargo run pip install black
.venv\Scripts\black --version

Background

What is this?

Sometimes you want to run a tool on Windows that's written in Python, like black or mypy or jupyter or whatever. But Windows does not know how to run Python files! It knows how to run .exe files. So we need to somehow convert our Python file into a .exe file.

That's what this does: it's a generic "trampoline" that lets us generate custom .exes for arbitrary Python scripts, and when invoked it bounces to invoking python <the script> instead.

How do you use it?

Basically, this looks up python.exe (for console programs) and invokes python.exe path\to\the\<the .exe>.
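
As a rough sketch of that invocation, the child's argument list puts the trampoline's own path right after the interpreter. `child_argv` is a hypothetical helper, and forwarding the user's arguments unchanged is an assumption of this sketch; the real code also has to deal with Windows command-line quoting, which is omitted here.

```rust
/// Builds the argument list for the child process:
/// `python.exe <path to this .exe> <user args...>`.
fn child_argv(python: &str, this_exe: &str, user_args: &[&str]) -> Vec<String> {
    let mut argv = Vec::with_capacity(2 + user_args.len());
    argv.push(python.to_string());
    // Python receives the .exe itself as the "script" to run.
    argv.push(this_exe.to_string());
    argv.extend(user_args.iter().map(|arg| arg.to_string()));
    argv
}
```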

The intended use is:

  • Take your Python script, name it __main__.py, and pack it into a .zip file. Then concatenate that .zip file onto the end of one of our prebuilt .exes.
  • After the zip file content, write the path to the Python executable that should run the script as a UTF-8 encoded string, followed by the path's length as a 32-bit little-endian integer.
  • At the very end, write the magic number UVUV in bytes.
launcher.exe
<zipped python script>
<path to python.exe>
<len(path to python.exe)>
<b'U', b'V', b'U', b'V'>

Then when you run python on the .exe, it will see the .zip trailer at the end of the .exe, and automagically look inside to find and execute __main__.py. Easy-peasy.
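
The trailer layout above can be sketched in a few lines of Rust. This is an illustrative, in-memory version that assumes std is available: `build_launcher` and `read_python_path` are hypothetical names, not functions from this crate, and the reader only shows how the trailer can be walked backwards from the magic number.

```rust
use std::convert::{TryFrom, TryInto};

const MAGIC: &[u8; 4] = b"UVUV";

/// Appends the zipped script, the Python path, its length, and the magic number
/// to a prebuilt launcher. Returns `None` if the path length does not fit in a `u32`.
fn build_launcher(prebuilt_exe: &[u8], zipped_script: &[u8], python_path: &str) -> Option<Vec<u8>> {
    let path_len = u32::try_from(python_path.len()).ok()?;
    let mut out = Vec::new();
    out.extend_from_slice(prebuilt_exe);
    out.extend_from_slice(zipped_script);
    out.extend_from_slice(python_path.as_bytes()); // UTF-8 encoded path
    out.extend_from_slice(&path_len.to_le_bytes()); // 32-bit little-endian length
    out.extend_from_slice(MAGIC);
    Some(out)
}

/// Walks the trailer backwards: magic number, then length, then the path itself.
fn read_python_path(launcher: &[u8]) -> Option<&str> {
    let magic_start = launcher.len().checked_sub(MAGIC.len())?;
    if launcher[magic_start..] != MAGIC[..] {
        return None;
    }
    let len_start = magic_start.checked_sub(4)?;
    let len_bytes: [u8; 4] = launcher[len_start..magic_start].try_into().ok()?;
    let path_len = usize::try_from(u32::from_le_bytes(len_bytes)).ok()?;
    let path_start = len_start.checked_sub(path_len)?;
    std::str::from_utf8(&launcher[path_start..len_start]).ok()
}
```

Putting the length after the path lets a reader find the path from the end of the file without knowing where the zip content stops.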

Why does this exist?

I probably could have used Vinay's C++ implementation from distlib, but what's the fun in that? In particular, optimizing for binary size was entertaining (these are ~7x smaller than the distlib ones, which doesn't matter much, but does a little bit, considering that it gets added to every Python script). There are also some minor advantages, like I think the Rust code is easier to understand (multiple files!) and it's convenient to be able to straightforwardly code the Python-finding logic we want. But mostly it was just an interesting challenge.

This does owe a lot to the distlib implementation though. The overall logic is copied more-or-less directly.

Anything I should know for hacking on this?

In order to minimize binary size, this uses panic="abort" and carefully avoids using core::fmt. This removes a bunch of runtime overhead: by default, Rust "hello world" on Windows is ~150 KB! So these binaries are ~10x smaller.

Of course the tradeoff is that this is an awkward super-limited environment. No C runtime and limited platform APIs... you don't even get panicking support by default. To work around this:

  • We use the windows crate to access Win32 APIs directly. Who needs a C runtime? Though uh, this does mean that literally all of our code is unsafe. Sorry!

  • diagnostics.rs uses ufmt and some cute Windows tricks to get a convenient version of eprintln! that works without core::fmt, and automatically either prints to the console if available or pops up a message box if not.

  • All the meat is in bounce.rs.

Miscellaneous tips:

  • cargo-bloat is a useful tool for checking what code is ending up in the final binary and how much space it's taking. (It makes it very obvious whether you've pulled in core::fmt!)

  • Lots of Rust built-in panicking checks will pull in core::fmt, e.g., if you ever use .unwrap() then suddenly our binaries double in size, because the if foo.is_none() { panic!(...) } that's hidden inside .unwrap() will invoke core::fmt, even if the unwrap will actually never fail. .unwrap_unchecked() avoids this. Similar for slice[idx] vs slice.get_unchecked(idx).
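
To make the last tip concrete, here is a small sketch of reading a trailing little-endian u32 without any panicking path. `trailing_u32` is a hypothetical helper, not code from bounce.rs, and whether a given panic site actually drags core::fmt into the binary depends on the compiler version and optimization settings, so cargo-bloat remains the way to check.

```rust
use std::convert::TryInto;

/// Reads the little-endian `u32` stored in the last four bytes of `bytes`.
/// Returns `None` instead of panicking when the slice is too short.
fn trailing_u32(bytes: &[u8]) -> Option<u32> {
    // `checked_sub` replaces the implicit overflow/bounds panic of `bytes[len - 4..]`.
    let start = bytes.len().checked_sub(4)?;

    // SAFETY: `start + 4 == bytes.len()`, so `start..` is in bounds.
    let tail = unsafe { bytes.get_unchecked(start..) };

    // SAFETY: `tail` has exactly four elements, so the conversion cannot fail.
    let array: [u8; 4] = unsafe { tail.try_into().unwrap_unchecked() };

    Some(u32::from_le_bytes(array))
}
```

Each unsafe block carries a SAFETY comment stating the invariant that the surrounding checks establish, which is the price of dropping the built-in checks.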

How do you build this stupid thing?

Building this can be frustrating, because the low-level compiler/runtime machinery has a bunch of implicit assumptions about the environment it'll run in, and the facilities that environment provides for things like memcpy, unwinding, etc. So we need to replace the bits that we actually need, and which bits we need can change depending on stuff like optimization options. For example: we use panic="abort", so we don't actually need unwinding support, but at lower optimization levels the compiler might not realize that, and still emit references to the unwinding helper __CxxFrameHandler3. And then the linker blows up because that symbol doesn't exist.

cargo build --release --target i686-pc-windows-msvc
cargo build --release --target x86_64-pc-windows-msvc
cargo build --release --target aarch64-pc-windows-msvc