art049
image-rs
Blog
Docs
Changelog
Overview
Branches
Benchmarks
Runs
Performance History
No performance history available yet
Once you have some commits, you will be able to see the performance history of your primary branch.
Latest Results
Local run by art049, 3 days ago
Local run by art049, 3 days ago
Use platform-aware fast rounding for FloatNearest::to_u8/to_u16
Introduce fast_round_f32 that delegates to hardware rounding (roundss/frintn) on SSE 4.1 and aarch64, and falls back to the mantissa snapping trick ((x + 2^23) - 2^23) elsewhere to avoid the costly libm roundf call.
perf/faster-blur
3 days ago
Use platform-aware fast rounding for FloatNearest::to_u8/to_u16
Introduce fast_round_f32 that delegates to hardware rounding (roundss/frintn) on SSE 4.1 and aarch64, and falls back to the mantissa snapping trick ((x + 2^23) - 2^23) elsewhere to avoid the costly libm roundf call.
perf/faster-blur
3 days ago
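The portable fallback in this commit relies on how f32 values are spaced: every float in [2^23, 2^24) is an integer, so adding 2^23 to a non-negative value below 2^23 forces the FPU to round it, and subtracting 2^23 afterwards is exact. The sketch below shows only that fallback, not the SSE 4.1 or aarch64 paths. The names `fast_round_f32` and `to_u8_nearest`, the clamp-first policy and the NaN-to-0 choice are assumptions for illustration, not the crate's actual code.

```rust
/// 2^23: the smallest positive f32 whose ULP (gap to the next float) is 1.0.
const MAGIC: f32 = 8_388_608.0;

/// Hypothetical portable fallback in the spirit of `fast_round_f32`.
/// Assumes `x` is finite and in [0.0, 2^23); callers clamp first.
/// In the default round-to-nearest-even mode, `x + MAGIC` lands where only
/// integers are representable, so the addition does the rounding and the
/// subtraction of MAGIC is exact.
#[inline]
fn fast_round_f32(x: f32) -> f32 {
    debug_assert!(x >= 0.0 && x < MAGIC);
    (x + MAGIC) - MAGIC
}

/// Hypothetical FloatNearest-style conversion: clamp to the u8 range, then round.
fn to_u8_nearest(v: f32) -> u8 {
    // `f32::clamp` passes NaN through, so map NaN to 0 explicitly (an assumption).
    let clamped = if v.is_nan() { 0.0 } else { v.clamp(0.0, 255.0) };
    fast_round_f32(clamped) as u8
}

fn main() {
    for v in [0.4_f32, 2.5, 3.5, 127.6, 300.0, -1.0] {
        println!("{v} -> {}", to_u8_nearest(v));
    }
}
```

One behavioral detail: in the default IEEE 754 rounding mode the trick rounds halfway cases to even (2.5 becomes 2.0, 3.5 becomes 4.0), as do roundss in nearest mode and frintn, whereas `f32::round` and libm `roundf` round halfway cases away from zero (2.5 becomes 3.0).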
Replace integer division with reciprocal multiplication in u8 blur
Precompute ceil(2^32 / kernel_size) once per pass, then use a multiply-shift to normalize accumulators instead of integer division.
perf/faster-blur
3 days ago
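The multiply-shift idea in this commit replaces `acc / k` with `(acc * m) >> 32`, where `m = ceil(2^32 / k)` is computed once per pass. A minimal sketch, assuming the goal is to reproduce floor (integer-division) semantics and that u64 intermediates are acceptable; `reciprocal`, `div_by_kernel` and `normalize_row` are hypothetical names.

```rust
/// Hypothetical helper: m = ceil(2^32 / kernel_size), computed once per blur pass.
/// Stored as u64 because kernel_size == 1 gives exactly 2^32, one past u32::MAX.
fn reciprocal(kernel_size: u32) -> u64 {
    assert!(kernel_size > 0, "kernel size must be non-zero");
    let k = u64::from(kernel_size);
    ((1u64 << 32) + k - 1) / k
}

/// Computes floor(acc / kernel_size) as (acc * m) >> 32, with no division.
/// The product of a u32 and a value <= 2^32 always fits in u64.
#[inline]
fn div_by_kernel(acc: u32, recip: u64) -> u32 {
    ((u64::from(acc) * recip) >> 32) as u32
}

/// Hypothetical normalization step: turn per-pixel window sums into u8 averages.
fn normalize_row(sums: &[u32], kernel_size: u32, out: &mut [u8]) {
    let recip = reciprocal(kernel_size); // the only division in the pass
    for (o, &acc) in out.iter_mut().zip(sums) {
        *o = div_by_kernel(acc, recip) as u8;
    }
}

fn main() {
    let sums = [0u32, 3 * 255, 100, 1];
    let mut out = [0u8; 4];
    normalize_row(&sums, 3, &mut out);
    println!("{out:?}");
}
```

Why it is exact for u8 data: writing m·k = 2^32 + r with 0 ≤ r < k, the result equals floor(acc / k) whenever acc·r < 2^32. With u8 window sums bounded by 255·k, a sufficient condition is 255·k·(k − 1) < 2^32, which holds for every kernel size up to 4104 (a sufficient bound, not a tight one).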
Local run by art049, 3 days ago
Local run by art049, 3 days ago
Use u32 integer accumulators for u8 fast blur
The box blur hot path used f32 accumulators for all pixel types. For u8 images (the dominant case), this meant every pixel went through to_f32/from_f32 conversions and software roundf. Replace with u32 integer arithmetic for u8 pixels, dispatched at compile time by pixel size and channel count (1-4). ~1.64x wall-clock speedup on all fast blur benchmarks.
wip
perf/faster-blur
3 days ago
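A sketch of the idea behind this commit: a sliding-window box blur over one row of interleaved u8 pixels that keeps per-channel sums in u32 and never converts to f32. The channel count is a const generic, so each of the 1 to 4 channel variants is compiled as its own function and a small `match` picks one at runtime. The names `box_blur_row_u8` and `box_blur_row_u8_dyn`, the clamp-to-edge border handling and the round-half-up averaging are assumptions, not image-rs's implementation.

```rust
/// Hypothetical horizontal box-blur pass over one row of interleaved u8 pixels.
/// `C` (channels, 1..=4) is a const generic, so every channel count gets its own
/// monomorphized copy with a fixed-size accumulator array.
/// Assumption: out-of-range neighbours are clamped to the edge pixel.
fn box_blur_row_u8<const C: usize>(src: &[u8], dst: &mut [u8], radius: usize) {
    assert!((1..=4).contains(&C), "1 to 4 channels supported");
    assert_eq!(src.len(), dst.len());
    assert_eq!(src.len() % C, 0);
    let width = src.len() / C;
    if width == 0 {
        return;
    }
    let last = width - 1;
    let kernel_size = (2 * radius + 1) as u32;
    // Read channel `c` of pixel `x`, clamping `x` to the right edge.
    let px = |x: usize, c: usize| u32::from(src[x.min(last) * C + c]);

    // Prime the window centred on x = 0: positions -radius..=0 all clamp to pixel 0.
    let mut acc = [0u32; C];
    for c in 0..C {
        acc[c] = px(0, c) * (radius as u32 + 1);
        for x in 1..=radius {
            acc[c] += px(x, c);
        }
    }

    for x in 0..width {
        for c in 0..C {
            // Integer average rounded half-up; the result is at most 255.
            dst[x * C + c] = ((acc[c] + kernel_size / 2) / kernel_size) as u8;
            // Slide right: add the pixel entering the window, drop the one leaving it.
            // `acc[c]` already contains `outgoing`, so this cannot underflow.
            let incoming = px(x + radius + 1, c);
            let outgoing = px(x.saturating_sub(radius), c);
            acc[c] = acc[c] + incoming - outgoing;
        }
    }
}

/// Hypothetical runtime entry point that dispatches to a monomorphized kernel.
fn box_blur_row_u8_dyn(channels: usize, src: &[u8], dst: &mut [u8], radius: usize) {
    match channels {
        1 => box_blur_row_u8::<1>(src, dst, radius),
        2 => box_blur_row_u8::<2>(src, dst, radius),
        3 => box_blur_row_u8::<3>(src, dst, radius),
        4 => box_blur_row_u8::<4>(src, dst, radius),
        _ => panic!("unsupported channel count: {channels}"),
    }
}

fn main() {
    let src = [0u8, 0, 255, 0, 0];
    let mut dst = [0u8; 5];
    box_blur_row_u8_dyn(1, &src, &mut dst, 1);
    println!("{dst:?}");
}
```

A u32 sum cannot overflow here: the largest possible window sum is 255 times the kernel size, far below u32::MAX for any realistic radius.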
Active Branches
Make fast and gaussian blur faster
last run
3 days ago
#2
CodSpeed Performance Gauge
×5.6
Add CodSpeed continuous performance benchmarking
last run
4 days ago
#1
CodSpeed Performance Gauge
N/A
© 2026 CodSpeed Technology