DiamonDinoia/poet - CodSpeed

poet


Performance History

Latest Results

docs(readme): align "Full source" text baseline with CE badge

Wraps the source link in <sub> and joins it with the Compiler Explorer badge on a single line so the text and badge image render on the same visual baseline instead of stacking.

main

8 days ago

fix(benchmarks): make index-only bench compile under poet/poet.hpp

Two issues caught by Benchmarks/CodSpeed CI:
- poet/poet.hpp ends with undef_macros.hpp, so POET_FORCEINLINE, POET_UNLIKELY, POET_ALWAYS_INLINE_LAMBDA were undefined in this TU. Re-include poet/core/macros.hpp after the umbrella.
- blocked_for took Callable& and rejected the temporary lambda passed by run_blocked_for. Switch to forwarding reference.
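The second fix can be illustrated with a minimal sketch. The names `blocked_for_sketch` and `sum_indices` are hypothetical stand-ins, not POET's or the benchmark's real signatures; the point is only that a `Callable&` parameter cannot bind to a temporary lambda, while a forwarding reference (`Callable&&` on a deduced template parameter) accepts both lvalues and temporaries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical blocked loop helper (an assumption for illustration only).
// With `Callable& body`, passing a lambda expression directly would fail to
// compile: a non-const lvalue reference cannot bind to a temporary.
// `Callable&& body` on a deduced type is a forwarding reference and binds to
// lvalues and temporaries alike.
template <typename Callable>
void blocked_for_sketch(std::size_t begin, std::size_t end, std::size_t block,
                        Callable&& body) {
    assert(block > 0 && "block size must be positive");
    std::size_t lo = begin;
    while (lo < end) {
        // Clamp the last block so it never runs past `end`.
        const std::size_t hi = (end - lo < block) ? end : lo + block;
        // `body` is invoked many times, so it is called as an lvalue here
        // rather than forwarded (forwarding could move from it).
        for (std::size_t i = lo; i < hi; ++i) {
            body(i);
        }
        lo = hi;
    }
}

// Hypothetical caller that passes a temporary lambda, mirroring the shape of
// the call site described in the commit message.
inline std::size_t sum_indices(std::size_t n) {
    std::size_t total = 0;
    blocked_for_sketch(std::size_t{0}, n, std::size_t{4},
                       [&total](std::size_t i) { total += i; });
    return total;
}
```

Under these assumptions, `sum_indices(10)` visits the indices 0 through 9 in blocks of four and accumulates their sum.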

main

8 days ago

feat(examples): runnable Compiler Explorer links + microbench

- 8 self-contained examples (3 new: polynomial, dot_product, benchmark)
- tools/make_godbolt_links.py generates short godbolt URLs via CE's URL-include feature; idempotent regeneration; HTML redirect emitter
- godbolt-links workflow publishes shortlinks + redirect pages to an orphan godbolt-links branch served via GitHub Pages
- README/docs link to stable Pages URLs, so main never churns
- ci.yml builds example targets

main

8 days ago

fix: drop unused dimensions/table_size locals in dispatch_nd

MSVC /W4 + warnings-as-errors flagged C4189 (unused local) on the table_size constexpr in dispatch_nd, breaking the windows-2022 msvc Debug job. The values were leftover from earlier debug scaffolding and never read.

main

9 days ago

refactor: prune optimizations falsified by codegen audit

Removes optimizations that produce no measurable codegen change across the audited compiler matrix (g++-12/13/14/15, clang++-18/19/20/21/22) on Zen4.

Dropped (byte-identical asm in all 36 baseline cells, or dead under C++20):
- POET_PUSH_OPTIMIZE / POET_POP_OPTIMIZE block in macros.hpp and all use sites in for_utils.hpp, dynamic_for.hpp, dispatch.hpp. The IRA-pressure / vector-width pragmas leave no trace in regions exercised by dispatch and dynamic_for under -O3 -march=native on every supported GCC.
- POET_ASSUME(flat_idx < table_size) at dispatch.hpp:664. The compiler already infers the bound from the dispatch_npos check above; the hint contributes zero.
- DeBruijn CTZ fallback in macros.hpp. Dead code: the C++20 std::countr_zero branch reaches every supported toolchain. Replaced with #error so a future exotic compiler trips loudly instead of silently using a slow lookup.
- POET_LIKELY hints at the hot-path use sites where the matrix shows them to be noise (dispatch.hpp:222, 257, 331, 645) and the matching POET_UNLIKELY at dynamic_for.hpp:154. Cold/error-path POET_LIKELY/UNLIKELY uses are kept (dispatch.hpp:298, 663; dynamic_for.hpp:139, 187, 190, 202, 221, 234); those annotate genuinely rare branches.

Kept (proven load-bearing by the audit, with new comments citing evidence):
- Power-of-2 stride fast-path in dynamic_for.hpp:159-164. Removing it shrinks GCC asm by ~9 instructions but replaces tzcntq+shrxq (cpi 0.4+0.5) with divq (cpi 7.0). simdref-show + llvm-mca-22 -mcpu=znver4 agree on an ~18x block-RThroughput penalty if removed.
- POET_NOINLINE_FLATTEN on tail_binary{,_ct}_noinline. Forcing inline grows dynamic_for region asm by 13-21% on every audited compiler.
- Strided sparse div/mod path in dispatch.hpp. Forcing lower_bound costs +12 to +18 insns on every audited compiler for sparse dispatches.

Verified: 444/444 tests pass on gcc-15 and clang-21 (release).

Bench delta (build-bench, znver4):
- dynamic_for ±0.7% (noise floor)
- dispatch hit-path -2 to -15% (faster)
- dispatch miss-path ±2-3ns absolute, mixed sign (noise on sub-5ns operations, consistent with audit prediction that the removed POET_LIKELY hints are below modern-predictor signal)

main

9 days ago

docs: add inline comments on gnarly template machinery

Explain the non-obvious WHY for the dispatch and dynamic_for template internals: short-circuit folds, unsigned-subtraction bound tricks, sorted/strided/sparse lookup strategies, flat-to-per-dim decoding, stateless vs stateful thunks, leading-param split, and the tail_binary log2(N) cascade.
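Two of the techniques named above are common enough to sketch generically. The following is an illustration under assumptions, not POET's implementation: `in_inclusive_range` and `decode_flat` are hypothetical names, and the row-major layout (last dimension varies fastest) is an assumption about the flat index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Unsigned-subtraction bound trick: checks lo <= v <= hi with one comparison.
// Assumes lo <= hi. If v < lo, the unsigned difference wraps around to a very
// large value, which exceeds (hi - lo) and fails the test.
constexpr bool in_inclusive_range(int v, int lo, int hi) {
    return static_cast<unsigned>(v) - static_cast<unsigned>(lo)
        <= static_cast<unsigned>(hi) - static_cast<unsigned>(lo);
}

// Flat-to-per-dimension decoding for an N-dimensional table, assuming a
// row-major flat index. Each step peels off one dimension with a remainder
// and a division, starting from the fastest-varying (last) dimension.
template <std::size_t N>
std::array<std::size_t, N> decode_flat(std::size_t flat,
                                       const std::array<std::size_t, N>& extents) {
    std::array<std::size_t, N> idx{};
    for (std::size_t d = N; d-- > 0;) {
        assert(extents[d] > 0 && "every extent must be positive");
        idx[d] = flat % extents[d];
        flat /= extents[d];
    }
    return idx;
}
```

For extents {2, 3}, a flat index of 5 decodes to the per-dimension indices {1, 2}, since 1 * 3 + 2 = 5.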

main

12 days ago

refactor: rename public dispatch API to snake_case + add version wiring

Rename awkward public names before the 0.0.0 cut:
- make_range -> inclusive_range
- DispatchParam -> dispatch_param
- DispatchSet -> dispatch_set
- T<Vs...> -> tuple_<Vs...>
- throw_t -> throw_on_no_match (tag type throw_on_no_match_t kept)

Add VERSION + cmake/GenerateVersion.cmake + include/poet/version.hpp(.in) to derive POET_VERSION_FULL (MAJOR.MINOR.PATCH[-dev.N]) from the tracked VERSION file plus git commit count. The script is driven by both the build (include(...) during configure) and a local pre-commit hook (cmake -DCHECK=ON -P).

Drop +dirty: the index is always non-empty during pre-commit, which would make the suffix sticky; rev-list --count HEAD only sees committed history, so retries of the same commit are stable and the count only advances when a new commit lands.

Docs narrative scrubbed of cpu_info file name; refers to symbols (poet::available_registers(), poet::cache_line()) and the "CPU detection" grouping instead.

main

12 days ago

docs: restore README benchmark charts

main

2 months ago

Latest Branches

No pull requests found. As pull requests are created, their performance will appear here.
