refactor: prune optimizations falsified by codegen audit

Removes optimizations that produce no measurable codegen change across the
audited compiler matrix (g++-12/13/14/15, clang++-18/19/20/21/22) on Zen4.

Dropped (byte-identical asm in all 36 baseline cells, or dead under C++20):
- POET_PUSH_OPTIMIZE / POET_POP_OPTIMIZE block in macros.hpp and all use sites
in for_utils.hpp, dynamic_for.hpp, dispatch.hpp. The IRA-pressure / vector-
width pragmas leave no trace in regions exercised by dispatch and
dynamic_for under -O3 -march=native on every supported GCC.
- POET_ASSUME(flat_idx < table_size) at dispatch.hpp:664. The compiler
already infers the bound from the dispatch_npos check above, so the hint
changes nothing in the generated code.
- De Bruijn CTZ fallback in macros.hpp. Dead code: every supported toolchain
takes the C++20 std::countr_zero branch. Replaced with #error so a future
exotic compiler fails loudly at build time instead of silently using a slow
lookup.
- POET_LIKELY hints at the hot-path use sites where the matrix shows them
to be noise (dispatch.hpp:222, 257, 331, 645) and the matching
POET_UNLIKELY at dynamic_for.hpp:154. Cold/error-path POET_LIKELY/UNLIKELY
uses are kept (dispatch.hpp:298, 663; dynamic_for.hpp:139, 187, 190, 202,
221, 234); those annotate genuinely rare branches.

Kept (proven load-bearing by the audit, with new comments citing evidence):
- Power-of-2 stride fast-path in dynamic_for.hpp:159-164. Removing it shrinks
GCC asm by ~9 instructions but replaces tzcntq+shrxq (cpi 0.4+0.5) with
divq (cpi 7.0). simdref-show + llvm-mca-22 -mcpu=znver4 agree on an
~18x block-RThroughput penalty if removed.
- POET_NOINLINE_FLATTEN on tail_binary{,_ct}_noinline. Forcing inline grows
dynamic_for region asm by 13-21% on every audited compiler.
- Strided sparse div/mod path in dispatch.hpp. Forcing lower_bound costs
+12 to +18 insns on every audited compiler for sparse dispatches.

Verified: 444/444 tests pass on gcc-15 and clang-21 (release).
Bench delta (build-bench, znver4): dynamic_for ±0.7% (noise floor); dispatch
hit-path -2 to -15% (faster); dispatch miss-path ±2-3ns absolute, mixed sign
(noise on sub-5ns operations, consistent with the audit's prediction that the
removed POET_LIKELY hints have no measurable effect on a modern branch
predictor).