vortex-data / vortex

feat[fastlanes]: add optimized 1024-bit transpose implementations

#6135
Comparing claude/bitpacking-transpose-optimization-tM1U4 (2cbd439) with develop (13f120f)
CodSpeed Performance Gauge: -30%

Improvements: 3
Regressions: 7
Untouched: 1252
New: 16
Skipped: 1290

Benchmarks

2568 total
| Benchmark | Source | Change | Base | Head |
|---|---|---|---|---|
| canonical_into_non_nullable[(10000, 100, 0.0)] | encodings/fastlanes/benches/canonicalize_bench.rs | -30% | 1.9 ms | 2.7 ms |
| into_canonical_non_nullable[(10000, 100, 0.0)] | encodings/fastlanes/benches/canonicalize_bench.rs | -29% | 1.9 ms | 2.7 ms |
| canonical_into_non_nullable[(10000, 100, 0.01)] | encodings/fastlanes/benches/canonicalize_bench.rs | -28% | 2.1 ms | 3 ms |
| into_canonical_non_nullable[(10000, 100, 0.01)] | encodings/fastlanes/benches/canonicalize_bench.rs | -27% | 2.2 ms | 3 ms |
| canonical_into_non_nullable[(10000, 100, 0.1)] | encodings/fastlanes/benches/canonicalize_bench.rs | -18% | 3.7 ms | 4.5 ms |
| into_canonical_non_nullable[(10000, 100, 0.1)] | encodings/fastlanes/benches/canonicalize_bench.rs | -18% | 3.8 ms | 4.6 ms |
| into_canonical_nullable[(10000, 100, 0.0)] | encodings/fastlanes/benches/canonicalize_bench.rs | -16% | 4.4 ms | 5.2 ms |
| u8_FoR[10M] | vortex-cuda/benches/for_cuda.rs::benches::cuda_benchmarks::benchmark_for_cuda::FoR_cuda_u8 | ×13 | 71.7 µs | 5.6 µs |
| canonical_into_nullable[(10000, 100, 0.0)] | encodings/fastlanes/benches/canonicalize_bench.rs | +20% | 4.9 ms | 4.1 ms |
| canonical_into_nullable[(10000, 10, 0.0)] | encodings/fastlanes/benches/canonicalize_bench.rs | +19% | 528.5 µs | 445.6 µs |
| transpose_baseline_throughput | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 2.5 ms |
| transpose_baseline | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 10.9 µs |
| transpose_best_throughput | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 92.8 µs |
| transpose_best | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 2 µs |
| transpose_scalar | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 3.4 µs |
| untranspose_best | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 2.8 µs |
| transpose_scalar_throughput | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 661 µs |
| transpose_scalar_fast | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 1.7 µs |
| untranspose_baseline | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 10.9 µs |
| transpose_scalar_fast_throughput | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 64.2 µs |
| transpose_avx2 | encodings/fastlanes/benches/transpose_bench.rs::x86_benches | N/A | N/A | 2.8 µs |
| untranspose_scalar | encodings/fastlanes/benches/transpose_bench.rs | N/A | N/A | 3.2 µs |
| transpose_bmi2 | encodings/fastlanes/benches/transpose_bench.rs::x86_benches | N/A | N/A | 1.8 µs |
| transpose_avx2_throughput | encodings/fastlanes/benches/transpose_bench.rs::x86_throughput_benches | N/A | N/A | 314.3 µs |
| transpose_bmi2_throughput | encodings/fastlanes/benches/transpose_bench.rs::x86_throughput_benches | N/A | N/A | 90.1 µs |
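
The change figures above appear consistent with computing base / head - 1 from the two times listed for each benchmark, taking the first time as the base (develop) measurement and the second as the head measurement. That reading is an inference from the numbers in this report, not something the report states, and the displayed times are rounded, so a few rows differ by a percentage point. A minimal sketch under those assumptions (the function names and the multiplier cutoff are hypothetical):

```rust
/// Relative speed change, assuming the gauge reports base / head - 1.
/// Positive means the head commit is faster, negative means it is slower.
fn speed_change(base: f64, head: f64) -> f64 {
    base / head - 1.0
}

/// Render the change the way the report seems to display it: a signed
/// percentage, or a "×N" multiplier for large speedups.
/// The 2.0 cutoff is an assumption chosen for illustration only.
fn gauge_label(base: f64, head: f64) -> String {
    let ratio = base / head;
    if ratio >= 2.0 {
        format!("×{:.0}", ratio)
    } else {
        format!("{:+.0}%", speed_change(base, head) * 100.0)
    }
}
```

For instance, `gauge_label(4.9, 4.1)` yields `+20%` and `gauge_label(71.7, 5.6)` yields `×13`, matching the corresponding entries. Times must be passed in the same unit.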

Commits

Base: develop (13f120f)

| Change | Commit | Hash | Committed | Author |
|---|---|---|---|---|
| -29.9% | feat[fastlanes]: add BMI2 PEXT/PDEP transpose and fix GFNI implementations | 338b2bc | 7 hours ago | claude |
| 0% | feat[fastlanes]: add scalar_fast and ARM64 NEON transpose implementations | f13f085 | 7 hours ago | claude |
| 0% | perf[fastlanes]: fully unroll BMI2 transpose for 12% performance gain | c471853 | 7 hours ago | claude |
| -15.84% | test[fastlanes]: add verification against fastlanes crate transpose | 7282427 | 7 hours ago | claude |
| +15.84% | feat[fastlanes]: add AVX-512 VBMI transpose with 7.5x speedup | ccf3a19 | 6 hours ago | claude |
| -15.69% | feat[fastlanes]: add dual-block VBMI transpose for 10% more throughput | 99299a5 | 6 hours ago | claude |
| +15.69% | feat[fastlanes]: add 4-block VBMI transpose for 7% additional speedup | 2cbd439 | 5 hours ago | claude |
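
Several of the commits above refer to the BMI2 PEXT instruction. As background only, the sketch below emulates PEXT in portable Rust and uses it to transpose an 8x8 bit matrix. It is a generic illustration of the bit-gathering idea and is not taken from this pull request; the 1024-bit layout and the actual implementations are not shown in this report. The names `pext_u64` and `transpose_8x8` are hypothetical. On x86-64 hardware with BMI2, the corresponding intrinsic is `core::arch::x86_64::_pext_u64`.

```rust
/// Portable emulation of PEXT: gather the bits of `src` selected by `mask`
/// into the low bits of the result, preserving their order.
fn pext_u64(src: u64, mut mask: u64) -> u64 {
    let mut out = 0u64;
    let mut k = 0;
    while mask != 0 {
        // Isolate the lowest set bit of the mask.
        let lowest = mask & mask.wrapping_neg();
        if src & lowest != 0 {
            out |= 1u64 << k;
        }
        k += 1;
        // Clear that bit and move on to the next selected position.
        mask &= mask - 1;
    }
    out
}

/// Transpose an 8x8 bit matrix packed into a u64, where byte r holds row r
/// and bit c of that byte holds column c. (Illustrative layout only.)
fn transpose_8x8(x: u64) -> u64 {
    // Selects bit 0 of every byte, i.e. one column of the matrix.
    const COLUMN: u64 = 0x0101_0101_0101_0101;
    let mut out = 0u64;
    for c in 0..8 {
        // Shift column c into bit 0 of each byte, gather it into one byte,
        // and store it as row c of the result.
        out |= pext_u64(x >> c, COLUMN) << (8 * c);
    }
    out
}
```

With this layout, a matrix whose first row is all ones (`0xFF`) transposes to one whose first column is all ones (`0x0101_0101_0101_0101`), and applying `transpose_8x8` twice returns the original value.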