DiamonDinoia
poet

Performance History

Latest Results

docs(readme): align "Full source" text baseline with CE badge

Wraps the source link in <sub> and joins it with the Compiler Explorer badge on a single line so the text and badge image render on the same visual baseline instead of stacking.
main
8 days ago
refactor: prune optimizations falsified by codegen audit

Removes optimizations that produce no measurable codegen change across the audited compiler matrix (g++-12/13/14/15, clang++-18/19/20/21/22) on Zen4.

Dropped (byte-identical asm in all 36 baseline cells, or dead under C++20):
- POET_PUSH_OPTIMIZE / POET_POP_OPTIMIZE block in macros.hpp and all use sites in for_utils.hpp, dynamic_for.hpp, dispatch.hpp. The IRA-pressure / vector-width pragmas leave no trace in regions exercised by dispatch and dynamic_for under -O3 -march=native on every supported GCC.
- POET_ASSUME(flat_idx < table_size) at dispatch.hpp:664. The compiler already infers the bound from the dispatch_npos check above; the hint contributes zero.
- DeBruijn CTZ fallback in macros.hpp. Dead code: the C++20 std::countr_zero branch reaches every supported toolchain. Replaced with #error so a future exotic compiler trips loudly instead of silently using a slow lookup.
- POET_LIKELY hints at the hot-path use sites where the matrix shows them to be noise (dispatch.hpp:222, 257, 331, 645) and the matching POET_UNLIKELY at dynamic_for.hpp:154. Cold/error-path POET_LIKELY/UNLIKELY uses are kept (dispatch.hpp:298, 663; dynamic_for.hpp:139, 187, 190, 202, 221, 234); those annotate genuinely rare branches.

Kept (proven load-bearing by the audit, with new comments citing evidence):
- Power-of-2 stride fast-path in dynamic_for.hpp:159-164. Removing it shrinks GCC asm by ~9 instructions but replaces tzcntq+shrxq (cpi 0.4+0.5) with divq (cpi 7.0). simdref-show + llvm-mca-22 -mcpu=znver4 agree on an ~18x block-RThroughput penalty if removed.
- POET_NOINLINE_FLATTEN on tail_binary{,_ct}_noinline. Forcing inline grows dynamic_for region asm by 13-21% on every audited compiler.
- Strided sparse div/mod path in dispatch.hpp. Forcing lower_bound costs +12 to +18 insns on every audited compiler for sparse dispatches.

Verified: 444/444 tests pass on gcc-15 and clang-21 (release).

Bench delta (build-bench, znver4):
- dynamic_for: ±0.7% (noise floor).
- dispatch hit-path: -2 to -15% (faster).
- dispatch miss-path: ±2-3ns absolute, mixed sign (noise on sub-5ns operations, consistent with audit prediction that the removed POET_LIKELY hints are below modern-predictor signal).
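The power-of-2 stride fast path that this commit keeps can be illustrated roughly as below. This is a minimal sketch and not POET's code: `ctz64`, `iteration_count`, and `iteration_count_div` are hypothetical names, and the loop shape (`begin <= end`, non-zero unsigned stride) is an assumption. The compiler-builtin branch exists only so the sketch also builds under pre-C++20 flags; the commit itself describes going straight from `std::countr_zero` to `#error`.

```cpp
#include <bit>      // std::countr_zero when built as C++20
#include <cassert>
#include <cstdint>

// Count-trailing-zeros selection (hypothetical wrapper, not POET's macro).
#if defined(__cpp_lib_bitops)
inline int ctz64(std::uint64_t x) { return std::countr_zero(x); }
#elif defined(__GNUC__) || defined(__clang__)
// Assumption: kept only so this sketch compiles without -std=c++20.
inline int ctz64(std::uint64_t x) { return __builtin_ctzll(x); }
#else
#error "no count-trailing-zeros primitive available on this toolchain"
#endif

// Reference version: always divides.
// Returns the trip count of: for (i = begin; i < end; i += stride).
inline std::uint64_t iteration_count_div(std::uint64_t begin,
                                         std::uint64_t end,
                                         std::uint64_t stride) {
    assert(stride != 0 && begin <= end);
    const std::uint64_t span = end - begin;
    return span == 0 ? 0 : (span - 1) / stride + 1;
}

// Fast-path version: when stride is a power of two, the division
// becomes a right shift by the number of trailing zero bits.
inline std::uint64_t iteration_count(std::uint64_t begin,
                                     std::uint64_t end,
                                     std::uint64_t stride) {
    assert(stride != 0 && begin <= end);
    const std::uint64_t span = end - begin;
    if (span == 0) return 0;
    if ((stride & (stride - 1)) == 0) {           // exactly one bit set
        return ((span - 1) >> ctz64(stride)) + 1; // shift instead of divide
    }
    return (span - 1) / stride + 1;               // general case
}
```

For example, `iteration_count(0, 10, 4)` takes the shift path and yields 3 (i = 0, 4, 8), while `iteration_count(0, 10, 3)` falls back to the division and yields 4. Whether the branch pays off depends on the target; the figures quoted in the commit are specific to Zen4.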
main
9 days ago

Latest Branches

No pull requests found. As pull requests are created, their performance will appear here.
© 2026 CodSpeed Technology