Multi data type downscaling support for u8s8s16 - u8s8s16<u8|s8>

Downscaling is used when GEMM output is accumulated at a higher
precision and needs to be converted to a lower precision afterwards.
Currently, the u8s8s16 flavor of the API only supports downscaling to
s8 (int8_t), via aocl_gemm_u8s8s16os8, after results are accumulated
in int16_t.
LPGEMM is modified to support downscaling to data types other than
s8, such as u8 and s16. The framework (5-loop) passes the downscale
data type to the micro-kernels. Within the micro-kernel, the
appropriate beta scaling and output buffer store logic is executed
based on the downscale type. This support is only enabled for the
u8s8s16 flavor of the APIs.
The LPGEMM bench is also modified to accept the downscale data type
as an input for performance and accuracy testing.
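
A minimal sketch of the per-type dispatch described above, not the
actual micro-kernel code. The enum sketch_store_t and the helpers
load_c_as_s16 and store_downscaled are hypothetical names used only
for illustration. Saturating the int16_t accumulator to the target
range is an assumption, and the real kernels are vectorized and apply
scale factors that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the downscale type handed to the micro-kernel. */
typedef enum { SKETCH_S16, SKETCH_S8, SKETCH_U8 } sketch_store_t;

/* Clamp an int16_t value into [lo, hi] (assumed saturating behavior). */
static int16_t clamp_s16(int16_t v, int16_t lo, int16_t hi)
{
    return (v < lo) ? lo : (v > hi) ? hi : v;
}

/* Beta-scaling side: read element i of C in its stored type and
 * widen it to the int16_t accumulation type. */
static int16_t load_c_as_s16(const void *c, size_t i, sketch_store_t t)
{
    switch (t) {
    case SKETCH_S8: return (int16_t)((const int8_t *)c)[i];
    case SKETCH_U8: return (int16_t)((const uint8_t *)c)[i];
    case SKETCH_S16:
    default:        return ((const int16_t *)c)[i];
    }
}

/* Store side: write n int16_t accumulator values to dst, narrowing
 * them according to the requested downscale type. */
static void store_downscaled(const int16_t *acc, void *dst, size_t n,
                             sketch_store_t t)
{
    assert(acc != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        switch (t) {
        case SKETCH_S8:
            ((int8_t *)dst)[i] =
                (int8_t)clamp_s16(acc[i], INT8_MIN, INT8_MAX);
            break;
        case SKETCH_U8:
            ((uint8_t *)dst)[i] =
                (uint8_t)clamp_s16(acc[i], 0, UINT8_MAX);
            break;
        case SKETCH_S16:
        default:
            ((int16_t *)dst)[i] = acc[i]; /* no narrowing needed */
            break;
        }
    }
}
```

Passing a type tag instead of a boolean flag lets a single kernel
select both the load width used for beta scaling and the store width
of the output buffer.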

AMD-Internal: [SWLCSG-2313]
Change-Id: I723d0802baf8649e5e41236b239880a6043bfd30
mkadavil
2023-10-09 17:12:03 +05:30
committed by MithunMohan KadavilMadanaMohanan
parent a6a67fea2d
commit ea0324ab95
30 changed files with 1134 additions and 371 deletions

@@ -62,7 +62,7 @@ void lpgemm_rowvar_ ## LP_SFX \
lpgemm_thrinfo_t* thread, \
lpgemm_cntx_t* lcntx, \
lpgemm_post_op* post_op_list, \
- bool c_downscale \
+ AOCL_STORAGE_TYPE c_downscale \
) \
LPGEMM_5LOOP(uint8_t,int8_t,int32_t,u8s8s32o32);