Multi data type downscaling support for u8s8s16 - u8s8s16<u8|s8>

Downscaling is used when GEMM output is accumulated at a higher precision and needs to be converted to a lower precision afterwards. Currently the u8s8s16 flavor of the API only supports downscaling to s8 (int8_t), via aocl_gemm_u8s8s16os8, after results are accumulated at int16_t.

LPGEMM is modified to support downscaling to other data types, such as u8 and s16, in addition to s8. The framework (5-loop) passes the downscale data type to the micro-kernels. Within the micro-kernel, the appropriate beta scaling and output buffer store logic is executed based on the downscale type. This support is only enabled for the u8s8s16 flavor of the APIs.

The LPGEMM bench is also modified to support passing the downscale data type for performance and accuracy testing.

AMD-Internal: [SWLCSG-2313]
Change-Id: I723d0802baf8649e5e41236b239880a6043bfd30
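
The type-dependent beta scaling and store step described above can be illustrated with a minimal scalar sketch. It is not the micro-kernel code from this commit: the enum `sketch_storage_t`, the helpers, and `sketch_store_row` are hypothetical names invented for illustration. The saturating clamp and the integer beta are also assumptions, and any downscale scale factor is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the downscale type passed to the kernel;
   these are NOT the real AOCL_STORAGE_TYPE enumerators. */
typedef enum { SKETCH_S16, SKETCH_S8, SKETCH_U8 } sketch_storage_t;

/* Clamp v into [lo, hi] (saturation is an assumption of this sketch). */
static int32_t sketch_clamp(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Read element i of the output buffer according to its storage type,
   so that beta scaling sees the existing value of C. */
static int32_t sketch_load_c(const void* c, size_t i, sketch_storage_t t)
{
    switch (t) {
    case SKETCH_S8: return ((const int8_t*)c)[i];
    case SKETCH_U8: return ((const uint8_t*)c)[i];
    default:        return ((const int16_t*)c)[i];
    }
}

/* Apply beta scaling to one row of int16_t accumulators, then store the
   result in the requested output type. */
void sketch_store_row(const int16_t* acc, void* c, size_t n,
                      int16_t beta, sketch_storage_t t)
{
    for (size_t i = 0; i < n; ++i) {
        int32_t v = acc[i];
        if (beta != 0) {
            v += (int32_t)beta * sketch_load_c(c, i, t);
        }
        switch (t) {
        case SKETCH_S8:
            ((int8_t*)c)[i] = (int8_t)sketch_clamp(v, INT8_MIN, INT8_MAX);
            break;
        case SKETCH_U8:
            ((uint8_t*)c)[i] = (uint8_t)sketch_clamp(v, 0, UINT8_MAX);
            break;
        default:
            ((int16_t*)c)[i] = (int16_t)sketch_clamp(v, INT16_MIN, INT16_MAX);
            break;
        }
    }
}
```

Passing the output type as a value rather than a boolean flag lets one code path pick both the load used for beta scaling and the store width, which is the kind of dispatch the `bool` to `AOCL_STORAGE_TYPE` signature change in the diff makes possible.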
2026-05-05 15:01:13 +00:00 · 2023-10-09 17:12:03 +05:30
parent a6a67fea2d
commit ea0324ab95
30 changed files with 1134 additions and 371 deletions
--- a/addon/aocl_gemm/frame/lpgemm_5loop_interface_apis.h
+++ b/addon/aocl_gemm/frame/lpgemm_5loop_interface_apis.h
@@ -62,7 +62,7 @@ void lpgemm_rowvar_ ## LP_SFX \
       lpgemm_thrinfo_t*     thread, \
       lpgemm_cntx_t*        lcntx, \
       lpgemm_post_op*       post_op_list, \
-       bool                  c_downscale \
+       AOCL_STORAGE_TYPE     c_downscale \
     ) \

 LPGEMM_5LOOP(uint8_t,int8_t,int32_t,u8s8s32o32);