OpenMathLib
OpenBLAS

Added ability to accumulate in FP16 for GEMM for RISC-V

#5640
Comparing ChipKerchner:RVV_Narrow_Accumulate_FP16_GEMM (e3cb067) with develop (986ba29)
Performance change: 0%
Untouched benchmarks: 62

Benchmarks

62 total
All benchmarks listed are defined in benchmark/pybench/benchmarks/bench_blas.py.

Benchmark               Change   Measured times
test_nrm2[1000-dz]      0%       35.4 µs / 35.3 µs
test_dgbmv[1-100-d]     0%       37.9 µs / 37.8 µs
test_nrm2[1000-d]       0%       30.4 µs / 30.3 µs
test_nrm2[100-dz]       0%       28.7 µs / 28.6 µs
test_daxpy[1000-c]      0%       32.6 µs / 32.5 µs
test_daxpy[100-s]       0%       24 µs / 23.9 µs
test_daxpy[100-d]       0%       24.1 µs / 24.1 µs
test_dgbmv[1-1000-c]    0%       99.5 µs / 99.3 µs
test_dgbmv[1-1000-s]    0%       74.9 µs / 74.8 µs
test_gesv[100-z]        0%       938.3 µs / 937.1 µs
test_dgemv[100-s]       0%       104 µs / 103.9 µs
test_gesdd[mn0-s]       0%       109.3 µs / 109.2 µs
test_dot[100]           0%       22.4 µs / 22.4 µs
test_daxpy[100-z]       0%       25.9 µs / 25.8 µs
test_gesv[100-s]        0%       256.8 µs / 256.7 µs
test_syrk[100-s]        0%       213.4 µs / 213.3 µs
test_dgemv[100-z]       0%       231.2 µs / 231.1 µs
test_daxpy[1000-d]      0%       32.4 µs / 32.3 µs
test_gemm[100-s]        0%       273.1 µs / 273 µs
test_dgbmv[1-1000-d]    0%       83.5 µs / 83.5 µs
test_gemm[100-d]        0%       471.2 µs / 471 µs
test_gemm[100-z]        0%       1.2 ms / 1.2 ms
test_gemm[100-c]        0%       659.6 µs / 659.4 µs
test_syrk[100-z]        0%       856.6 µs / 856.5 µs
test_syrk[100-d]        0%       339.5 µs / 339.4 µs

Commits

Base: develop (986ba29)

-0.18%  b5f2a50  Added ability to accumulate in FP16 for GEMM. Widens once at the end of loops. (1 day ago, by ChipKerchner)
+0.19%  aa1cebd  128-bit versions. (23 hours ago, by ChipKerchner)
-0.19%  74d9fe2  Forgot to add definition. (23 hours ago, by ChipKerchner)
+0.19%  e3cb067  Fixed MADD to use float16 values. Use LMUL = 2 in main loop. Now 1.85X faster on BananaPi. (17 hours ago, by ChipKerchner)
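
The first commit describes accumulating GEMM partial sums in FP16 and widening once at the end of the loops. As a rough illustration of that idea (not the PR's RVV kernel), the Python sketch below emulates FP16 rounding with the standard library's half-precision `struct` format and compares a narrow accumulator with an FP32 one for a single inner product. The helpers `to_fp16`, `to_fp32`, `dot_narrow`, and `dot_wide` are hypothetical names introduced here, and rounding after every multiply-add is an assumption about how a narrow accumulator behaves, not a description of the actual kernel.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_narrow(a, b):
    """FP16 inputs, FP16 accumulator; widen to FP32 once after the loop.

    Assumption: the accumulator is rounded to FP16 after each multiply-add.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x) * to_fp16(y))
    return to_fp32(acc)  # single widening step at the end


def dot_wide(a, b):
    """FP16 inputs, FP32 accumulator (widening on every multiply-add)."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + to_fp16(x) * to_fp16(y))
    return acc


if __name__ == "__main__":
    ones = [1.0] * 4096
    # FP16 has a spacing of 2 between 2048 and 4096, so adding 1.0 to 2048
    # rounds back to 2048 and the narrow accumulator stops growing.
    print(dot_narrow(ones, ones))  # 2048.0
    print(dot_wide(ones, ones))    # 4096.0
```

The narrow variant does less widening work per step, which is where a speedup like the one reported in the last commit could come from, at the cost of reduced precision for long sums. Whether that trade-off is acceptable depends on the values and the inner dimension being summed.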