BLAS Extension API - ?gemm_compute()

- Added support for two new APIs:
	1. sgemm_compute()
	2. dgemm_compute()
  These depend on the ?gemm_pack_get_size() and ?gemm_pack() APIs.
- ?gemm_compute() takes the packed matrix buffer (represented by the
  packed matrix identifier) and performs the GEMM operation:
  C := A * B + beta * C.
- When the kernel storage preference does not match the storage scheme
  of a matrix, and that matrix has not been packed beforehand,
  on-the-fly packing is enabled to pack that matrix during the compute
  call.
- Note: If both matrices are packed using the ?gemm_pack() API, it is
  the responsibility of the user to pack only one matrix with the
  alpha scalar and the other with a unit scalar, so that alpha is
  applied to the product only once.
- Note: Support is presently limited to a single thread. Both the pack
  and compute APIs are forced to take n_threads=1.

AMD-Internal: [CPUPL-3560]
Change-Id: I825d98a0a5038d31668d2a4b84b3ccc204e6c158
Arnav Sharma
2023-07-17 12:44:42 +05:30
committed by Arnav Sharma
parent 81161066e5
commit c8f14edcf5
32 changed files with 3623 additions and 20 deletions

@@ -60,6 +60,9 @@
// Include the pack full thread decorator and related definitions and prototypes
// for the pack code path.
#include "bli_pack_full_decor.h"
// Include the level-3 thread decorator and related definitions and prototypes
// for the compute code path.
#include "bli_l3_compute_decor.h"
// Initialization-related prototypes.
void bli_thread_init( void );