blis/addon/aocl_gemm/config at 2e1cc2f14a413cc445f2ba0e4ebdae81b94ff794 - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-11 01:30:00 +00:00


Meghana Vankadari 2e1cc2f14a Added bf16s4f32 kernels to handle m=4 cases

Details:
- In WOQ (weight-only quantization), special-case kernels are
  added for m = 4: the s4->bf16 conversion happens inside the
  compute kernel, so packing is avoided. In all other cases,
  the B matrix is dequantized and packed at the KC loop level,
  and the native bf16 kernels are reused at the compute level.
- Fixes in bench to avoid accuracy failures when the output
  datatype is bf16.

Change-Id: Ie8db42da536891693d5e82a5336b66514a50ccb2

2024-09-04 07:36:57 -04:00
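
The two code paths described in the commit message above can be illustrated with a small scalar sketch in C. This is not the AOCL kernel code: every function name here is hypothetical, and the nibble order (even index in the low nibble), the row-major layout of B, the single per-matrix scale, and the absence of a zero point are all assumptions made for illustration. The real kernels are vectorized and do the dequantize-and-pack step per KC block rather than for the whole matrix at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t; /* raw bf16 bits = upper 16 bits of an IEEE-754 float */

/* Sign-extend a 4-bit two's-complement value (range -8..7). */
static int8_t s4_to_s8(uint8_t nibble)
{
    return (int8_t)(((nibble & 0x0F) ^ 0x08) - 8);
}

/* float -> bf16 with round-to-nearest-even. NaN inputs are not handled. */
static bf16_t f32_to_bf16(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (bf16_t)(bits >> 16);
}

/* bf16 -> float is exact: the bf16 bits become the high half of the float. */
static float bf16_to_f32(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* ASSUMED layout: B is k x n, row-major, two s4 values per byte,
 * even element index in the low nibble. One scale for the whole matrix. */
static bf16_t load_s4_as_bf16(const uint8_t *b_s4, size_t idx, float scale)
{
    uint8_t byte = b_s4[idx / 2];
    uint8_t nib = (idx % 2 == 0) ? (uint8_t)(byte & 0x0F) : (uint8_t)(byte >> 4);
    return f32_to_bf16((float)s4_to_s8(nib) * scale);
}

/* m == 4 path: s4 -> bf16 conversion inside the compute loop, no packed B. */
static void gemm_fused_s4(const bf16_t *a, const uint8_t *b_s4, float scale,
                          float *c, size_t m, size_t n, size_t k)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += bf16_to_f32(a[i * k + p]) *
                       bf16_to_f32(load_s4_as_bf16(b_s4, p * n + j, scale));
            c[i * n + j] = acc;
        }
}

/* General path, step 1: dequantize B into a bf16 buffer up front. */
static void dequant_b(const uint8_t *b_s4, float scale, bf16_t *b_bf16, size_t count)
{
    for (size_t idx = 0; idx < count; ++idx)
        b_bf16[idx] = load_s4_as_bf16(b_s4, idx, scale);
}

/* General path, step 2: an ordinary bf16 x bf16 -> f32 product. */
static void gemm_bf16(const bf16_t *a, const bf16_t *b, float *c,
                      size_t m, size_t n, size_t k)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += bf16_to_f32(a[i * k + p]) * bf16_to_f32(b[p * n + j]);
            c[i * n + j] = acc;
        }
}

/* Hypothetical dispatcher mirroring the commit's description.
 * b_scratch must hold k*n bf16 values; it is only used when m != 4. */
void woq_gemm_sketch(const bf16_t *a, const uint8_t *b_s4, float scale,
                     bf16_t *b_scratch, float *c, size_t m, size_t n, size_t k)
{
    assert(a != NULL && b_s4 != NULL && c != NULL);
    if (m == 4) {
        gemm_fused_s4(a, b_s4, scale, c, m, n, k);
    } else {
        assert(b_scratch != NULL);
        dequant_b(b_s4, scale, b_scratch, k * n);
        gemm_bf16(a, b_scratch, c, m, n, k);
    }
}
```

Both branches compute the same product; the difference is only where the s4->bf16 conversion takes place. A plausible reading of the design is that with only four rows of A, each dequantized B element is reused too few times to repay writing a packed bf16 copy, whereas for larger m the one-time dequantization is spread across many rows.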

lpgemm_blksz_map.h  Added bf16s4f32 kernels to handle m=4 cases                         2024-09-04 07:36:57 -04:00
lpgemm_config.c     Element wise operations API for bfloat16 input matrix in LPGEMM.    2024-08-05 07:17:08 -04:00
lpgemm_config.h     Element wise operations API for bfloat16 input matrix in LPGEMM.    2024-08-05 07:17:08 -04:00
lpgemm_func_map.h   Element wise operations API for float(f32) input matrix in LPGEMM.  2024-08-27 03:28:52 -04:00