Avatar for the OpenMathLib user
OpenMathLib
OpenBLAS
BlogDocsChangelog

Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for A64FX

#5353Merged
Comparing
nakagawa-fj:feature/gemm_divide_rate_for_A64FX
(
5253c8f
) with
develop
(
8f0a1a3
)
CodSpeed Performance Gauge
0%
Untouched
62

Benchmarks

62 total
test_dgbmv[1-100-s]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
36.9 µs36.8 µs
test_gesv[100-d]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
395.1 µs394.1 µs
test_syrk[100-d]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
340.8 µs340.1 µs
test_nrm2[1000-dz]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
35.1 µs35 µs
test_syrk[100-s]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
213.2 µs213 µs
test_syrk[100-c]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
473.3 µs473.1 µs
test_dgemv[1000-s]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
7 ms7 ms
test_gesdd[mn1-d]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
93.8 ms93.8 ms
test_gesv[1000-d]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
93.3 ms93.3 ms
test_gesv[100-s]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
257.3 µs257.2 µs
test_gemm[100-d]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
470.9 µs470.8 µs
test_dgbmv[1-1000-c]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
99.1 µs99.1 µs
test_syev[200-s]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
49.1 ms49.1 ms
test_dgemv[100-c]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
148.6 µs148.6 µs
test_gemm[1000-s]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
117.4 ms117.4 ms
test_syrk[1000-s]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
65.4 ms65.4 ms
test_syrk[1000-c]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
227.5 ms227.5 ms
test_syrk[1000-d]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
130.4 ms130.4 ms
test_gesv[1000-c]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
188.6 ms188.6 ms
test_syrk[1000-z]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
476.4 ms476.4 ms
test_gemm[1000-d]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
239.4 ms239.4 ms
test_dgemv[1000-z]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
26.3 ms26.3 ms
test_gesv[1000-z]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
353.6 ms353.6 ms
test_gesdd[mn1-s]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
65.2 ms65.2 ms
test_dgemv[1000-d]
benchmark/pybench/benchmarks/bench_blas.py
CodSpeed Performance Gauge
0%
13.9 ms13.9 ms

Commits

Click on a commit to change the comparison range
Base
develop
8f0a1a3
-0.13%
Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for
5253c8f
11 months ago
by nakagawa-fj
© 2026 CodSpeed Technology
Home Terms Privacy Docs