OpenMathLib
OpenBLAS

Split VORTEXM4 from VORTEX target and fix SGEMM_DIRECT support for SME-capable targets

#5423 (Merged)
Comparing martin-frbg:issue5414 (6f225da) with develop (e07bea1)
Overall performance change: 0%
Untouched benchmarks: 62

Benchmarks

62 total
All benchmarks listed here are defined in benchmark/pybench/benchmarks/bench_blas.py.

Benchmark               Change   Base       Head
test_nrm2[100-d]        +3%      38 µs      36.8 µs
test_dot[100]           0%       22.6 µs    22.6 µs
test_gesv[100-s]        0%       257.5 µs   257 µs
test_dgemv[100-z]       0%       231.5 µs   231.2 µs
test_dgbmv[1-100-s]     0%       37.3 µs    37.3 µs
test_gesdd[mn0-d]       0%       120.3 µs   120.2 µs
test_dgbmv[1-1000-s]    0%       75.2 µs    75.1 µs
test_dgemv[1000-s]      0%       7 ms       7 ms
test_syrk[100-d]        0%       339.7 µs   339.6 µs
test_syev[50-d]         0%       1.4 ms     1.4 ms
test_daxpy[1000-z]      0%       40.6 µs    40.6 µs
test_syrk[100-c]        0%       472.5 µs   472.5 µs
test_gemm[100-c]        0%       659.7 µs   659.6 µs
test_syev[200-d]        0%       58.6 ms    58.6 ms
test_gesdd[mn1-d]       0%       93.8 ms    93.8 ms
test_gesv[1000-z]       0%       353.6 ms   353.6 ms
test_gesdd[mn1-s]       0%       65.2 ms    65.2 ms
test_gesv[1000-d]       0%       93.3 ms    93.3 ms
test_gemm[100-z]        0%       1.2 ms     1.2 ms
test_syev[200-s]        0%       49.1 ms    49.1 ms
test_syrk[100-s]        0%       213.5 µs   213.5 µs
test_syrk[1000-d]       0%       130.4 ms   130.4 ms
test_syrk[1000-c]       0%       227.5 ms   227.5 ms
test_dgemv[1000-c]      0%       14.8 ms    14.8 ms
test_gemm[1000-c]       0%       426 ms     426 ms
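
The percentages shown next to each benchmark appear to be the relative difference between the two reported timings, rounded to a whole percent. The sketch below reproduces that arithmetic for three of the rows. It assumes the first timing is the base (develop) and the second is the head (issue5414), and that a positive value means the head is faster. The function name `relative_change` is hypothetical, and the exact formula CodSpeed uses is not stated on this page.

```python
def relative_change(base: float, head: float) -> float:
    """Percent change of head relative to base.

    Assumed convention (not confirmed by the report): positive means
    the head measurement is faster (smaller) than the base measurement.
    """
    return (base - head) / head * 100.0


# (base, head) timings in microseconds, copied from the report above.
samples = {
    "test_nrm2[100-d]": (38.0, 36.8),
    "test_gesv[100-s]": (257.5, 257.0),
    "test_dgemv[100-z]": (231.5, 231.2),
}

for name, (base, head) in samples.items():
    # Round to a whole percent, as the report's gauge appears to do.
    print(f"{name}: {relative_change(base, head):+.0f}%")
```

Dividing by the base instead of the head gives the same rounded values for these rows, so the data on this page cannot distinguish between the two conventions.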

Commits

Base: develop (e07bea1)

All commits are by martin-frbg. The percentage is the performance change reported for that commit.

Change   Commit    Age            Title
-0.15%   18f9582   6 months ago   Add VORTEXM4
0%       ca542f3   6 months ago   Add VORTEXM4
0%       bf98e44   6 months ago   Add VORTEXM4 to DYNAMIC_ARCH list
0%       4609732   6 months ago   Relax version number requirement for AppleClang
0%       107c883   6 months ago   Update SME-related kernels
+0.07%   edaa73f   6 months ago   Hide the local 2VLx2VL symbol as static is insufficient for this with gcc
-0.1%    1ee8879   6 months ago   Add VORTEXM4
+0.13%   b4fc09e   6 months ago   Add registers d8 to d15 to clobber lists as the code does not expressly save them
+0.03%   2b5d8c7   6 months ago   remove debugging printout
+0.02%   fc516af   5 months ago   Merge branch 'develop' into issue5414
-0.01%   b3d0bc4   5 months ago   Update Makefile.L3
+0.01%   c889558   5 months ago   Rework for DYNAMIC_ARCH use and use of SGEMM functions by SSYMM
+0.07%   47a66ae   5 months ago   Update limits based on benchmarking the SME code on Apple M4
-0.03%   9bfc361   5 months ago   Merge branch 'OpenMathLib:develop' into issue5414
-0.14%   682f61e   4 months ago   Add prototype for gotoblas_corename
+0.18%   ea85b66   3 months ago   Merge branch 'OpenMathLib:develop' into issue5414
0%       9c0965b   3 months ago   Merge branch 'OpenMathLib:develop' into issue5414
0%       8c0b13c   3 months ago   Merge branch 'OpenMathLib:develop' into issue5414
-0.06%   7d35bf6   3 months ago   Add cpuid for Apple M5 (from a PR to the archspec project)
+0.05%   c3c857c   3 months ago   fix sequence
0%       825d3ad   3 months ago   AppleClang does not define feature local_streaming
+0.01%   e85efb8   3 months ago   remove za from clobber lists
+0.03%   67fd33e   2 months ago   syntax fix
0%       02bc005   2 months ago   reset SVE and SME capabilities between targets
0%       e384396   2 months ago   Use the armv9 capability set in the compiler test for SME
-0.15%   a9a6eda   2 months ago   Adapt for DYNAMIC_ARCH with multiple ...preprocess symbols
+0.05%   6de062c   1 month ago    Merge branch 'OpenMathLib:develop' into issue5414
-0.01%   aafd3cb   1 month ago    Merge branch 'OpenMathLib:develop' into issue5414
0%       31150eb   1 month ago    Move early exit up; don't rely on support_sme() for now
0%       10ba0e6   1 month ago    fix missing parentheses on endif
0%       770ad68   1 month ago    Distinguish AppleClang from LLVM on ARM64
0%       31bb6ca   1 month ago    Apple Clang requires +sme in the arch string for M4
+0.01%   fa021e1   1 month ago    fix missing endif() and add AppleClang options for M4
-0.01%   6137236   1 month ago    fix os variable reference
-0.01%   6735872   1 month ago    drop the cpu=apple-m4 part as nonessential
+0.01%   d3e4b41   1 month ago    remove cpu=apple-m4 as not required and less portable
0%       88c583e   1 month ago    Update Makefile
-0.02%   7ffce1c   1 month ago    fix spurious change of (S)BGEMM parameters for NeoverseV1
+0.02%   93cd7b9   1 month ago    Force linking to clang_rt_builtins when using LLVM for AppleM4
0%       faa1875   1 month ago    typo fix
0%       55a10c7   1 month ago    Make VortexM4 available in DYNAMIC_ARCH on MacOS only
-0.03%   6f225da   1 month ago    make VORTEXM4 MacOS-only for now