mirror of
https://github.com/astral-sh/uv.git
synced 2025-08-04 02:48:17 +00:00
## Summary

Previously, we were blocking operations that could run in parallel. We would send requests through our main requests channel, but not yield, so the receiver could only start processing requests much later than necessary. We solve this by switching to the async `tokio::sync::mpsc::channel`, where send is an async function that yields.

Due to the increased parallelism, cache deserialization and the conversion from simple API request to version map became bottlenecks, so I moved them to `spawn_blocking`. Together these result in a 30-60% speedup for larger warm cache resolution. Small cases such as black already resolve in 5.7 ms on my machine, so there's no speedup to be gained; refresh and no-cache runs were too noisy to get a signal from.

Note for the future: Revisit the bounded channel if we want to produce requests from `process_request`, too (this would be good for prefetching), to avoid deadlocks.

## Details

We can look at the behavior change through the spans:

```
RUST_LOG=puffin=info TRACING_DURATIONS_FILE=target/traces/jupyter-warm-branch.ndjson cargo run --features tracing-durations-export --bin puffin-dev --profile profiling -- resolve jupyter 2> /dev/null
```

Below, you can see how on main, we have discrete phases: all (cached) simple API requests in parallel, then all (cached) metadata requests in parallel, repeat until done. The solver is mostly waiting until it has its version map from the simple API query to be able to choose a version. The main thread is blocked on processing requests.

In the PR branch, the simple API requests succeed much earlier, allowing the solver to advance and also to schedule more prefetching. Because of that, `parse_cache` and `from_metadata` became bottlenecks, so I moved them off the main thread (green color, and their spans can now overlap because they can run on multiple threads in parallel). The main thread isn't blocked on `process_request` anymore; instead it has frequent idle times.
The spans are all much shorter, which indicates that on main they could have finished much earlier, but a task didn't yield, so they weren't scheduled to finish (though I haven't dug deep enough to understand the exact scheduling between the process request stream and the solver here). **main** ![jupyter-warm-main]( |
||
---|---|---|
.. | ||
src | ||
tests | ||
Cargo.toml |