Split uv integration tests into command-area harnesses (#19770)
## Summary
This PR deliberately relaxes the single integration-test harness
introduced in [#8093](https://github.com/astral-sh/uv/pull/8093). In
2024, `uv` had 41 top-level integration-test targets and a nextest run
saw 91 binaries across the workspace. Shared test support was compiled
into many of those targets, and the original benchmark improved from
roughly 35 seconds to 9-14 seconds after consolidation.
Since then, shared test infrastructure has moved into the separately
compiled `uv-test` crate in
[#17551](https://github.com/astral-sh/uv/pull/17551), while the suite
has grown to 62 integration-test source modules and 2,958 tests. The
remaining single executable has therefore become a large link unit:
changing one localized test relinks the entire integration suite.
This PR groups those 62 modules into 12 coarse command-area harnesses.
It does not restore one harness per source file, and it keeps the tests
in the existing `uv` package with standard Cargo binary discovery. Each
harness owns a top-level directory containing its `main.rs` and test
modules, making membership visible without `#[path]` indirection. The
groups are coarse compilation boundaries rather than a strict product
taxonomy. Shared PyPI proxy support moves into `uv-test` so it can be
reused by the independent harnesses. The reorganization preserves the
exact 2,958-test inventory and suite assignments used by the benchmarks
below.
To compare the tradeoff directly, I derived one-, 12-, and 62-harness
layouts from the same commit. Each edit used a new compile-affecting
marker, with five interleaved warm rounds for the no-op and edit cases
and three rounds after cleaning the `uv` package. Total time includes
both `cargo test --package uv --tests --no-run --locked` and subsequent
nextest test discovery.
| Layout | Harnesses | No-op | Localized test edit | Shared `uv-test` edit | `uv` package rebuild | Integration executable size |
| -------- | --------: | ----: | ------------------: | --------------------: | -------------------: | --------------------------: |
| Single | 1 | 1.10s | 7.52s | 10.20s | 35.18s | 96.5 MiB |
| This PR | 12 | 1.22s | 2.60s | 7.60s | 33.30s | 591.2 MiB |
| Per-file | 62 | 2.01s | 2.58s | 17.51s | 42.95s | 1,700.9 MiB |
The 12-harness layout captures effectively all of the per-file benefit
for localized edits: 2.60 seconds versus 2.58 seconds. It avoids the
per-file penalties for shared edits, package rebuilds, executable
discovery, and disk usage. On this machine, it also outperformed the
single harness for shared edits and package rebuilds because the giant
single link unit outweighed the additional parallel link targets. All
three layouts discovered the same 2,958 tests.
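
As a rough check on the tradeoff, the ratios implied by the table can be recomputed directly. This is a small sketch that only re-derives numbers from the rows above; the `Layout` struct and `ratio` helper are illustrative names, not part of the PR.

```rust
/// One row of the benchmark table (times in seconds, size in MiB).
struct Layout {
    name: &'static str,
    localized_edit_s: f64,
    shared_edit_s: f64,
    rebuild_s: f64,
    size_mib: f64,
}

/// Ratio of `value` to `baseline`; above 1.0 means `value` is larger.
fn ratio(value: f64, baseline: f64) -> f64 {
    value / baseline
}

fn main() {
    let single = Layout {
        name: "Single",
        localized_edit_s: 7.52,
        shared_edit_s: 10.20,
        rebuild_s: 35.18,
        size_mib: 96.5,
    };
    let grouped = Layout {
        name: "This PR",
        localized_edit_s: 2.60,
        shared_edit_s: 7.60,
        rebuild_s: 33.30,
        size_mib: 591.2,
    };
    let per_file = Layout {
        name: "Per-file",
        localized_edit_s: 2.58,
        shared_edit_s: 17.51,
        rebuild_s: 42.95,
        size_mib: 1700.9,
    };

    // Speedups are relative to the single harness; below 1.0 means slower.
    for layout in [&grouped, &per_file] {
        println!(
            "{} vs {}: localized edit {:.2}x, shared edit {:.2}x, rebuild {:.2}x, size {:.1}x",
            layout.name,
            single.name,
            ratio(single.localized_edit_s, layout.localized_edit_s),
            ratio(single.shared_edit_s, layout.shared_edit_s),
            ratio(single.rebuild_s, layout.rebuild_s),
            ratio(layout.size_mib, single.size_mib),
        );
    }
}
```

The size column is the cost side of the tradeoff: the 12-harness layout is still roughly six times larger on disk than the single executable.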
CI timing is effectively unchanged. Compared with the median of three
adjacent `main` runs, two complete warm PR attempts produced:
| Job | `main` median | PR attempt 1 | PR attempt 2 | Warm PR median |
| ------------------ | ------------: | -----------: | -----------: | -------------: |
| Linux | 4:23 | 4:29 | 4:44 | 4:37 |
| macOS | 9:40 | 9:35 | 7:54 | 8:45 |
| Windows 1 of 3 | 4:00 | 4:09 | 3:54 | 4:02 |
| Windows 2 of 3 | 3:45 | 4:22 | 4:36 | 4:29 |
| Windows 3 of 3 | 3:51 | 3:54 | 4:25 | 4:10 |
| Runner-minutes sum | 25:53 | 26:29 | 25:33 | 26:01 |
| Slowest test job | 9:40 | 9:35 | 7:54 | 8:45 |
The warm PR median changes aggregate runner time by +0:08 (+0.5%) and
reduces the observed slowest test job by 0:55. The `main` runs and first
PR attempt used partial Rust cache hits, while the rerun used exact
cache hits, so the per-platform differences should be treated as runner
and cache variance rather than a reliable clean-CI speedup.
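
The aggregate figures can be reproduced from the table. The sketch below assumes the "Runner-minutes sum" row is computed per run, with the median then taken across runs; that reading is consistent with the two PR attempt columns but is an assumption. The helper names are illustrative.

```rust
/// Parse an `m:ss` duration from the CI table into whole seconds.
fn parse_mmss(s: &str) -> Option<u32> {
    let (minutes, seconds) = s.split_once(':')?;
    Some(minutes.parse::<u32>().ok()? * 60 + seconds.parse::<u32>().ok()?)
}

/// Sum the per-job durations of one CI run, in seconds.
/// Returns `None` if any entry fails to parse.
fn run_total(jobs: &[&str]) -> Option<u32> {
    jobs.iter().map(|job| parse_mmss(job)).sum()
}

/// Format whole seconds back into `m:ss`.
fn format_mmss(secs: u32) -> String {
    format!("{}:{:02}", secs / 60, secs % 60)
}

fn main() {
    // Linux, macOS, and Windows 1-3 for the two warm PR attempts.
    let attempt_1 = ["4:29", "9:35", "4:09", "4:22", "3:54"];
    let attempt_2 = ["4:44", "7:54", "3:54", "4:36", "4:25"];

    let total_1 = run_total(&attempt_1).expect("valid durations");
    let total_2 = run_total(&attempt_2).expect("valid durations");

    // With two runs, the median is the midpoint of the two totals.
    let warm_median = (total_1 + total_2) / 2;
    let main_median = parse_mmss("25:53").expect("valid duration");

    let delta = i64::from(warm_median) - i64::from(main_median);
    let percent = 100.0 * delta as f64 / f64::from(main_median);
    println!(
        "attempt 1 {}, attempt 2 {}, warm median {}, delta {:+}s ({:+.1}%)",
        format_mmss(total_1),
        format_mmss(total_2),
        format_mmss(warm_median),
        delta,
        percent,
    );
}
```

Summing each run first, rather than summing the per-job medians, keeps the aggregate tied to runs that actually happened.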
This is the narrower alternative discussed in #19603.
Add parallel discovery of Python versions for `uv python list` (#18684)
Currently, Python discovery happens sequentially, which is good most of
the time because uv will lazily exit once it finds a satisfactory
version. However, `uv python list` needs to query every interpreter on
the system, which makes sequential discovery quite slow.
Here, we add parallelization to discovery. Annoyingly, it means a fair
amount of code repetition, but I've done my best to minimize that.
In the long term, we may want to consolidate these code paths so that
uv can prefetch during normal discovery: if the interpreter we want is
far down the `PATH`, laziness does not save us anything.
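
The general shape of the change can be sketched with scoped threads from the standard library. This is not uv's actual implementation: `query_interpreter` and `discover_all` are hypothetical names, and the fake query only inspects the file name instead of spawning an interpreter.

```rust
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical stand-in for the expensive per-interpreter query.
/// For illustration it pretends the file name encodes the version.
fn query_interpreter(path: &Path) -> Option<String> {
    path.file_name()?
        .to_str()?
        .strip_prefix("python")
        .map(str::to_owned)
}

/// Query every candidate concurrently, returning results in candidate order.
fn discover_all(candidates: &[PathBuf]) -> Vec<(PathBuf, Option<String>)> {
    thread::scope(|scope| {
        // One scoped thread per candidate; scoped threads may borrow `candidates`.
        let handles: Vec<_> = candidates
            .iter()
            .map(|path| scope.spawn(move || query_interpreter(path)))
            .collect();

        // Joining in spawn order keeps the output order stable.
        candidates
            .iter()
            .cloned()
            .zip(
                handles
                    .into_iter()
                    .map(|handle| handle.join().expect("query thread panicked")),
            )
            .collect()
    })
}

fn main() {
    let candidates = vec![
        PathBuf::from("/usr/bin/python3.12"),
        PathBuf::from("/usr/bin/python3.9"),
    ];
    for (path, version) in discover_all(&candidates) {
        println!("{}: {:?}", path.display(), version);
    }
}
```

Spawning one thread per candidate is the simplest option for a sketch; a real implementation would likely bound concurrency. Unlike lazy sequential discovery, this always queries every candidate, which is why it suits `uv python list` in particular.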
```
Benchmark 1: main
Time (mean ± σ): 3.108 s ± 0.026 s [User: 2.290 s, System: 0.386 s]
Range (min … max): 3.072 s … 3.157 s 8 runs
Benchmark 2: branch
Time (mean ± σ): 414.1 ms ± 11.1 ms [User: 2955.2 ms, System: 743.7 ms]
Range (min … max): 398.1 ms … 429.5 ms 8 runs
Summary
branch ran
7.51 ± 0.21 times faster than main
```