Commits
Added ability to accumulate in FP16 for GEMM. Widens once at the end of loops. (by ChipKerchner, 1 month ago)
128-bit versions. (by ChipKerchner, 1 month ago)
Forgot to add definition. (by ChipKerchner, 1 month ago)
Fixed MADD to use float16 values. Use LMUL = 2 in main loop. Now 1.85X faster on BananaPi. (by ChipKerchner, 1 month ago)
Convert inputs from BF16 to FP32 and use FP32 vector madds. 18% faster. (by ChipKerchner, 1 month ago)
Convert BF16 values once (and vectorized). (by ChipKerchner, 1 month ago)
One small change. (by ChipKerchner, 1 month ago)
Only convert B if M is greater than or equal to 4. (by ChipKerchner, 1 month ago)
Add flag for not converting A & B - will be used in future to do conversion during packing. (by ChipKerchner, 1 month ago)
Add dummy memsets - just in case. (by ChipKerchner, 1 month ago)
Add pre-RVA23 to BF16 GEMM. (by ChipKerchner, 29 days ago)
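
The BF16 commits in this list convert the inputs to FP32 once and then run the multiply-adds in FP32. The following is a minimal scalar sketch of that idea in C, not the code from these commits: the helper names (`bf16_to_fp32`, `bf16_buffer_to_fp32`, `dot_fp32`) are hypothetical, BF16 values are assumed to be stored as raw `uint16_t` bit patterns, and the vectorization and LMUL choices mentioned in the commit messages are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen one BF16 value (raw 16 bits) to FP32.
 * BF16 is the upper half of an IEEE-754 binary32, so shifting the
 * bits left by 16 and zero-filling the low half is an exact conversion. */
static float bf16_to_fp32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);   /* bit-cast without aliasing issues */
    return f;
}

/* Hypothetical helper: convert a whole buffer once, ahead of the
 * compute loop, so the inner loop never repeats the conversion. */
static void bf16_buffer_to_fp32(const uint16_t *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = bf16_to_fp32(src[i]);
}

/* Scalar stand-in for the inner GEMM loop: multiply-accumulate in FP32
 * over data that has already been converted. */
static float dot_fp32(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

A caller would run `bf16_buffer_to_fp32` on each operand once and then reuse the FP32 buffers across every `dot_fp32` call that touches them, which is the reuse the "convert once" commit message points at.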