Commits
feat[fastlanes]: add BMI2 PEXT/PDEP transpose and fix GFNI implementations
feat[fastlanes]: add scalar_fast and ARM64 NEON transpose implementations
perf[fastlanes]: fully unroll BMI2 transpose for 12% performance gain
test[fastlanes]: add verification against fastlanes crate transpose
feat[fastlanes]: add AVX-512 VBMI transpose with 7.5x speedup
feat[fastlanes]: add dual-block VBMI transpose for 10% more throughput
feat[fastlanes]: add 4-block VBMI transpose for 7% additional speedup