Latest Results
bench: declarative CodSpeed gate comments and localized imports
Make the `#[cfg(not(codspeed))]` exclusions self-documenting and durable:
- Every gate now carries a comment stating exactly why the benchmark (or its
helper/import) is excluded, focused on the root cause (output-buffer
allocation + glibc `memcpy`/`memmove` ifunc variance and SIMD code-layout
sensitivity across runner images) rather than per-PR flake counts, which go
stale as the codebase moves. Each links to the analysis in PR #8519.
- Drop time-relative phrasing (e.g. "last N PRs", "fired in M PRs") so the
comments stay accurate over time.
- Move `Canonical`/`IntoArray` in alp_compress.rs into `decompress_rd`, the only
gated consumer, removing two per-import cfg attributes. In chunk_array_builder.rs
the gated imports serve several gated benches, so they stay at module scope with
a clear note; the two gated helper fns now explain their gating too.
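The gate-comment and localized-import pattern described above can be sketched as follows. This is a minimal illustration using only the standard library: `decode_to_canonical` and `encode_runs` are hypothetical stand-ins for a gated and a kept benchmark body, not Vortex code, and the comment wording is only an example of the style.

```rust
// Sketch only: these functions are hypothetical stand-ins, not Vortex code.

// Excluded from CodSpeed simulation: the measured cost is dominated by
// output-buffer allocation and copying rather than by compute, so results
// vary across runner images. Still built for local runs.
#[cfg(not(codspeed))]
pub fn decode_to_canonical(encoded: &[(u32, usize)]) -> Vec<u32> {
    // Needed only by this gated function, so the import lives here instead
    // of at module scope behind its own `#[cfg(not(codspeed))]` attribute.
    use std::iter::repeat;

    encoded
        .iter()
        .flat_map(|&(value, run_len)| repeat(value).take(run_len))
        .collect()
}

// Compute-bound counterpart: no gate, so it is built in every configuration.
pub fn encode_runs(values: &[u32]) -> Vec<(u32, usize)> {
    let mut runs: Vec<(u32, usize)> = Vec::new();
    for &v in values {
        // Extend the current run when the value repeats.
        if let Some(last) = runs.last_mut() {
            if last.0 == v {
                last.1 += 1;
                continue;
            }
        }
        runs.push((v, 1));
    }
    runs
}
```

Because the `use` sits inside the gated function, building with `--cfg codspeed` removes the function and its import together, which is why no separate attribute on the import is needed. A standalone `rustc` build may warn that `codspeed` is an unexpected cfg name unless the cfg is declared to the compiler.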
Verified: `cargo build`, `RUSTFLAGS="--cfg codspeed" cargo build`, and
`cargo +nightly fmt --check` are clean for vortex-alp, vortex-array, and
vortex-fastlanes bench targets; `cargo clippy -p vortex-alp --benches` is clean.
Signed-off-by: Claude <connor@spiraldb.com>
Claude-Session: https://claude.ai/code/session_01HGegegFqsRfGJuA8X9aHT7
claude/sharp-planck-i4ifv8
bench: gate flaky take_10k bitpacking benches from CodSpeed simulation
`take_10k_random`, `take_10k_contiguous`, `patched_take_10k_random`, and
`patched_take_10k_contiguous_patches` gather 10k elements and canonicalize the
result, so their CodSpeed-simulation instruction count is bimodal (~195 us vs
~255 us, ±23% for unchanged code) from output-buffer allocation + `memcpy` and
the SIMD bit-unpack's code-layout sensitivity across runner images. They fired
in 4+ recent unrelated PRs and on this PR itself, so they are gated with
`#[cfg(not(codspeed))]` and remain available via local `cargo bench`. The other
take variants in this file have not exhibited the instability and are kept.
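The size of the gap between the two modes depends on which mode is taken as the baseline. The helper below is a hypothetical sketch for checking that arithmetic; reading the 23% figure as the gap relative to the higher mode is an assumption, not something the message states.

```rust
/// Gap between two timings, expressed as a percentage of `baseline`.
/// Hypothetical helper for illustration; not part of the benchmark suite.
pub fn gap_percent(low: f64, high: f64, baseline: f64) -> f64 {
    (high - low) / baseline * 100.0
}

/// Both readings of the same gap: relative to the low mode and to the high mode.
pub fn gap_both_ways(low: f64, high: f64) -> (f64, f64) {
    (gap_percent(low, high, low), gap_percent(low, high, high))
}
```

For the 195 us and 255 us modes, `gap_both_ways(195.0, 255.0)` gives roughly 30.8% relative to the lower mode and roughly 23.5% relative to the higher one.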
Verified: `cargo codspeed build` is clean, the four benches are excluded from
the built suite while the other take variants remain, and the local
`cargo bench` build, `cargo fmt`, and `cargo clippy` pass.
Signed-off-by: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GXdjWYp7AbSKwn2bw6GYsf
claude/sharp-planck-i4ifv8
bench: gate CodSpeed-unstable canonicalization benches from simulation
A small set of microbenchmarks reports false-positive regressions in nearly
every PR. Their CodSpeed CPU-simulation instruction count is dominated by
output-buffer allocation and glibc `memcpy`/`memmove` (whose `ifunc`-selected
implementation varies across runner images) rather than by Vortex compute, so
they move bidirectionally by 10-90% for unchanged code and CodSpeed flags
"different runtime environments" on the comparisons. They cannot be stabilized
under simulation, so per `docs/developer-guide/benchmarking.md` they are gated
with `#[cfg(not(codspeed))]` and remain available via local `cargo bench`.
Gated from CodSpeed (kept for local runs):
- alp_compress.rs: `decompress_rd` (decode-to-canonical; moved in 7/9 sampled
PRs, 842-1025 us for identical code). `compress_rd` (encode, compute-bound,
never flaky) is kept.
- chunk_array_builder.rs: `chunked_varbinview_*` (string canonicalization,
memcpy-bound; flaky in 6/9 PRs) and `chunked_bool_canonical_into` (also
below the ~16-35 us noise floor, ~2x swings). The compute-bound
`chunked_opt_bool_*` and `chunked_constant_*` benches are kept.
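The effect of the gate on which benches end up in a given build can be sketched with the `cfg!` macro. The function below is a hypothetical illustration, not how the bench harness registers benchmarks, and it lists only the concretely named benches from the message.

```rust
/// Names of the benches that would be present in the current build.
/// Hypothetical illustration only; the real suites register benches
/// through their harness, not through a list like this.
pub fn benches_in_build() -> Vec<&'static str> {
    // Compute-bound bench: kept in every configuration.
    let mut names = vec!["compress_rd"];

    // Allocation- and memcpy-bound benches: present only when the build
    // is not compiled with `--cfg codspeed`.
    if cfg!(not(codspeed)) {
        names.push("decompress_rd");
        names.push("chunked_bool_canonical_into");
    }
    names
}
```

Unlike the `#[cfg(not(codspeed))]` attribute, `cfg!` still compiles and type-checks both branches, so it suits a check like this one but does not remove the gated code or its imports from the build.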
Verified: both suites build and run under `cargo codspeed` (Simulation mode),
the gated benches are excluded while the kept benches still execute, and the
local `cargo bench` path, `cargo fmt`, and `cargo clippy` are clean.
Signed-off-by: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GXdjWYp7AbSKwn2bw6GYsf
claude/sharp-planck-i4ifv8
Latest Branches
+14%
claude/sharp-planck-i4ifv8 +8%
ngates/duckdb-chunk-exporter -7%
© 2026 CodSpeed Technology