vortex-data
vortex

feat[fastlanes]: add optimized 1024-bit transpose implementations

#6135
Comparing claude/bitpacking-transpose-optimization-tM1U4 (17c7783) with develop (1a6ece1)
CodSpeed Performance Gauge: -44%
Improvement: 2
Regression: 9
Untouched: 1251
New: 16
Skipped: 1215
Archived: 75

Benchmarks

2568 total
u8_FoR[10M]
vortex-cuda/benches/for_cuda.rs::benches::cuda_benchmarks::benchmark_for_cuda::FoR_cuda_u8
-44% (5.7 µs → 10.2 µs)
canonical_into_non_nullable[(10000, 100, 0.0)]
encodings/fastlanes/benches/canonicalize_bench.rs
-30% (1.9 ms → 2.7 ms)
into_canonical_non_nullable[(10000, 100, 0.0)]
encodings/fastlanes/benches/canonicalize_bench.rs
-29% (1.9 ms → 2.7 ms)
canonical_into_non_nullable[(10000, 100, 0.01)]
encodings/fastlanes/benches/canonicalize_bench.rs
-28% (2.1 ms → 3 ms)
u16_FoR[10M]
vortex-cuda/benches/for_cuda.rs::benches::cuda_benchmarks::benchmark_for_cuda::FoR_cuda_u16
-27% (7.7 µs → 10.5 µs)
into_canonical_non_nullable[(10000, 100, 0.01)]
encodings/fastlanes/benches/canonicalize_bench.rs
-27% (2.2 ms → 3 ms)
canonical_into_non_nullable[(10000, 100, 0.1)]
encodings/fastlanes/benches/canonicalize_bench.rs
-18% (3.7 ms → 4.5 ms)
into_canonical_non_nullable[(10000, 100, 0.1)]
encodings/fastlanes/benches/canonicalize_bench.rs
-18% (3.8 ms → 4.6 ms)
into_canonical_nullable[(10000, 100, 0.0)]
encodings/fastlanes/benches/canonicalize_bench.rs
-16% (4.4 ms → 5.2 ms)
canonical_into_nullable[(10000, 100, 0.0)]
encodings/fastlanes/benches/canonicalize_bench.rs
+20% (4.9 ms → 4.1 ms)
canonical_into_nullable[(10000, 10, 0.0)]
encodings/fastlanes/benches/canonicalize_bench.rs
+19% (528.5 µs → 445.6 µs)
transpose_baseline_throughput
encodings/fastlanes/benches/transpose_bench.rs
N/A (N/A → 2.5 ms)
transpose_best_throughput
encodings/fastlanes/benches/transpose_bench.rs
N/A (N/A → 92.8 µs)
transpose_scalar_throughput
encodings/fastlanes/benches/transpose_bench.rs
N/A (N/A → 661 µs)
transpose_scalar
encodings/fastlanes/benches/transpose_bench.rs
N/A (N/A → 3.4 µs)
transpose_avx2
encodings/fastlanes/benches/transpose_bench.rs::x86_benches
N/A (N/A → 2.8 µs)
transpose_baseline
encodings/fastlanes/benches/transpose_bench.rs
N/A (N/A → 10.9 µs)
transpose_best
encodings/fastlanes/benches/transpose_bench.rs
N/A (N/A → 2 µs)
untranspose_best
encodings/fastlanes/benches/transpose_bench.rs
N/A (N/A → 2.8 µs)
untranspose_scalar
encodings/fastlanes/benches/transpose_bench.rs
N/A (N/A → 3.2 µs)
untranspose_baseline
encodings/fastlanes/benches/transpose_bench.rs
N/A (N/A → 10.9 µs)
transpose_avx2_throughput
encodings/fastlanes/benches/transpose_bench.rs::x86_throughput_benches
N/A (N/A → 314.3 µs)
untranspose_bmi2
encodings/fastlanes/benches/transpose_bench.rs::x86_untranspose_benches
N/A (N/A → 2.7 µs)
transpose_scalar_fast_throughput
encodings/fastlanes/benches/transpose_bench.rs
N/A (N/A → 64.2 µs)
transpose_bmi2
encodings/fastlanes/benches/transpose_bench.rs::x86_benches
N/A (N/A → 1.8 µs)
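
The gauge percentages in the benchmark list appear consistent with a relative speed change of base / head - 1, taking the first time in each entry as the base run and the second as the head run. That reading is inferred from the listed numbers, not stated by the report, and the function name below is hypothetical. A minimal sketch of the arithmetic under that assumption:

```rust
/// Relative speed change in percent, ASSUMING the gauge is `base / head - 1`.
/// Both times must be in the same unit.
/// Negative: the head run is slower than the base run (a regression).
/// Positive: the head run is faster than the base run (an improvement).
fn speed_change_percent(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}
```

For example, `speed_change_percent(5.7, 10.2)` rounds to -44 and `speed_change_percent(4.9, 4.1)` rounds to 20, matching the first and the tenth entries. Because the displayed times are themselves rounded, some entries may differ from this calculation by a point or two.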

Commits

Base: develop (1a6ece1)
-29.9%  feat[fastlanes]: add BMI2 PEXT/PDEP transpose and fix GFNI implementations (338b2bc, 4 days ago, by claude)
0%  feat[fastlanes]: add scalar_fast and ARM64 NEON transpose implementations (f13f085, 4 days ago, by claude)
0%  perf[fastlanes]: fully unroll BMI2 transpose for 12% performance gain (c471853, 4 days ago, by claude)
-15.84%  test[fastlanes]: add verification against fastlanes crate transpose (7282427, 4 days ago, by claude)
+15.84%  feat[fastlanes]: add AVX-512 VBMI transpose with 7.5x speedup (ccf3a19, 4 days ago, by claude)
-15.69%  feat[fastlanes]: add dual-block VBMI transpose for 10% more throughput (99299a5, 4 days ago, by claude)
+15.69%  feat[fastlanes]: add 4-block VBMI transpose for 7% additional speedup (2cbd439, 4 days ago, by claude)
-14.45%  docs[fastlanes]: add transpose optimization plan and results (17c7783, 3 days ago, by claude)
© 2026 CodSpeed Technology