BLAS Extension API - ?gemm_compute()

- Added support for two new APIs:
  1. sgemm_compute()
  2. dgemm_compute()
  These depend on the ?gemm_pack_get_size() and ?gemm_pack() APIs.
- ?gemm_compute() takes the packed matrix buffer (represented by the packed matrix identifier) and performs the GEMM operation: C := A * B + beta * C.
- When the kernel storage preference and the matrix storage scheme do not match, and the matrix being loaded is not already packed, that matrix is packed on the fly.
- Note: If both matrices are packed using the ?gemm_pack() API, the user is responsible for packing only one matrix with the alpha scalar and the other with a unit scalar.
- Note: Support is presently limited to a single thread. Both the pack and compute APIs are forced to use n_threads=1.

AMD-Internal: [CPUPL-3560]
Change-Id: I825d98a0a5038d31668d2a4b84b3ccc204e6c158
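
The note about scalars follows from the compute step having no alpha of its own: alpha is folded into a matrix at pack time, so packing both operands with alpha would apply it twice. The sketch below illustrates this with two hypothetical helpers, toy_pack() and toy_compute(), which are plain-C stand-ins written for this example. They are not the ?gemm_pack() or ?gemm_compute() signatures, and they assume row-major storage and a trivial "packed" layout (a scaled copy).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pack step: copy a row-major rows x cols
 * matrix into dest (leading dimension cols), scaling each element by alpha. */
static void toy_pack(size_t rows, size_t cols, float alpha,
                     const float *src, size_t ld, float *dest)
{
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            dest[i * cols + j] = alpha * src[i * ld + j];
}

/* Hypothetical stand-in for a compute step: C := A * B + beta * C.
 * There is no alpha parameter here; any scaling must already be in A or B. */
static void toy_compute(size_t m, size_t n, size_t k,
                        const float *a, size_t lda,
                        const float *b, size_t ldb,
                        float beta, float *c, size_t ldc)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * lda + p] * b[p * ldb + j];
            c[i * ldc + j] = acc + beta * c[i * ldc + j];
        }
    }
}
```

To obtain alpha * A * B + beta * C with both operands packed, pack A with alpha and B with 1.0f (or the reverse), then call the compute step. Packing both with alpha would scale the product by alpha squared.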
2026-05-05 15:01:13 +00:00 · 2023-07-17 12:44:42 +05:30
parent 81161066e5
commit c8f14edcf5
32 changed files with 3623 additions and 20 deletions
--- a/frame/thread/bli_thread.h
+++ b/frame/thread/bli_thread.h
@@ -60,6 +60,9 @@
 // Include the pack full thread decorator and related definitions and prototypes
 // for the pack code path.
 #include "bli_pack_full_decor.h"
+// Include the level-3 thread decorator and related definitions and prototypes
+// for the compute code path.
+#include "bli_l3_compute_decor.h"

 // Initialization-related prototypes.
 void bli_thread_init( void );