fix RVV 1.0 detection code
There were a couple of issues with the detection code used to check
for RVV 1.0 on kernels that do not support hwprobe.
1. The vtype clobber was missing from the inline assembly.
2. The wrong form of vsetvli was being used. The vsetvli x0, x0 form
is inappropriate for this use case as it can only be safely used
in code where the value of vtype is known. The use of vsetvli
x0, x0 here can lead to a failure to detect RVV 1.0, if,
for example, the vill bit happens to be set before
detect_riscv64_rvv100 is called.
We fix both issues by adding the missing clobber and replacing the
destination operand of vsetvli with t0 (which we add to our clobbers).
Unlike vsetvli x0, x0, the vsetvli t0, x0 form does not depend on the
previous value of vtype.
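
For illustration, the corrected probe might look roughly like the
sketch below. This is not the code from the tree: the function name
rvv100_probe_sketch, the .option arch directive, the extra "vl"
clobber and the non-RISC-V fallback are assumptions added to keep the
example self-contained. It also assumes the caller has already
established that some vector extension is present, since vsetvli
traps on a hart without one.

```c
#include <stdint.h>

/*
 * Sketch only. Returns 1 if vsetvli accepted an RVV 1.0 vtype
 * encoding, 0 otherwise. The name is hypothetical.
 */
int rvv100_probe_sketch(void)
{
#if defined(__riscv) && (__riscv_xlen == 64)
  uint64_t supported;

  /*
   * vsetvli t0, x0 requests vl = VLMAX and is valid whatever vtype
   * held before, including vill = 1. If the encoding is rejected,
   * vill (the top bit of vtype) is set, so vtype reads back negative;
   * otherwise it is positive because ta and ma are set.
   */
  __asm__ volatile(".option push\n\t"
                   ".option arch, +v\n\t"        /* assumption: toolchain supports this */
                   "vsetvli t0, x0, e8, m1, ta, ma\n\t"
                   "csrr %0, vtype\n\t"
                   "slt %0, x0, %0\n\t"          /* 1 if 0 < vtype */
                   ".option pop\n\t"
                   : "=r" (supported)
                   :
                   : "t0", "vtype", "vl");       /* "vl" is an extra precaution */

  return (int)supported;
#else
  /* Not a 64-bit RISC-V build: report "not supported". */
  return 0;
#endif
}
```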

disable fp16 flags on RISC-V unless BUILD_HFLOAT16=1
The compiler options that enable 16-bit floating point instructions
should not be enabled by default when building the RISCV64_ZVL128B
and RISCV64_ZVL256B targets. The Zfh and Zvfh extensions are not part
of the 'V' extension and are not required by any of the RVA profiles,
so there is no guarantee that kernels built with Zfh and Zvfh will run
correctly on fully compliant RVA23U64 devices.
To fix the issue we only build the RISCV64_ZVL128B and RISCV64_ZVL256B
kernels with the half float flags if BUILD_HFLOAT16=1. We also update
the RISC-V dynamic detection code to disable the RISCV64_ZVL128B and
RISCV64_ZVL256B kernels at runtime if we've built with DYNAMIC_ARCH=1
and BUILD_HFLOAT16=1 and are running on a device that does not support
both Zfh and Zvfh.
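
The runtime rule described above can be sketched as a small predicate.
The struct riscv_caps_sketch type, its fields and the function name
are hypothetical and only illustrate the decision; how the real
detection code obtains the Zfh and Zvfh information is not shown here.

```c
#include <stdbool.h>

/* Hypothetical summary of what the running device supports. */
struct riscv_caps_sketch {
  bool rvv100; /* RVV 1.0 detected */
  bool zfh;    /* scalar half-precision extension present */
  bool zvfh;   /* vector half-precision extension present */
};

/*
 * Returns true if the RISCV64_ZVL128B / RISCV64_ZVL256B kernels may be
 * selected. built_with_hfloat16 stands for "the library was built with
 * BUILD_HFLOAT16=1", i.e. the kernels may contain fp16 instructions.
 */
bool zvl_kernels_usable_sketch(const struct riscv_caps_sketch *caps,
                               bool built_with_hfloat16)
{
  if (!caps->rvv100)
    return false;

  /* fp16-enabled kernels need both Zfh and Zvfh on the device. */
  if (built_with_hfloat16 && !(caps->zfh && caps->zvfh))
    return false;

  return true;
}
```

With this rule, a device that has RVV 1.0 but lacks Zvfh still gets
the vector kernels from a default build, and is steered away from them
only when the library was built with BUILD_HFLOAT16=1.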
Fixes: https://github.com/OpenMathLib/OpenBLAS/issues/5428