Mirror of https://github.com/amd/blis.git, synced 2026-05-12 10:05:38 +00:00
Details:
- Group quantization is a technique to improve accuracy in which the scale factors used to quantize inputs and weights vary at the group level, instead of per channel or per tensor.
- Added new bench files to test GEMM with symmetric static quantization.
- Added new get_size and reorder functions to account for storing the sum of column values separately per group.
- Added a new framework and kernels to support the same.
- The scale factors can be of type float or bf16.

AMD-Internal: [SWLCSG-3274]
Change-Id: I3e69ecd56faa2679a4f084031d35ffb76556230f
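To illustrate the idea in the commit message, the sketch below shows symmetric group quantization: one scale factor per group of `group_size` consecutive values along the reduction dimension, rather than one scale per channel or per tensor. This is a minimal NumPy illustration of the concept only; the function names, data layout, and helpers are hypothetical and do not reflect the BLIS/aocl_gemm API.

```python
import numpy as np

def group_symmetric_quantize(x, group_size=64):
    """Symmetric int8 quantization with one scale per group of
    `group_size` values along the flattened last axis (illustrative
    sketch, not the BLIS implementation)."""
    g = x.reshape(-1, group_size)
    # One scale per group: largest magnitude in the group maps to 127.
    scales = np.abs(g).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # guard all-zero groups
    q = np.clip(np.round(g / scales), -128, 127).astype(np.int8)
    return q.reshape(x.shape), scales.ravel()

def group_dequantize(q, scales, group_size=64):
    """Recover an approximation of the original values."""
    g = q.reshape(-1, group_size).astype(np.float32)
    return (g * scales[:, None]).reshape(q.shape)

x = np.random.randn(2, 128).astype(np.float32)
q, s = group_symmetric_quantize(x, group_size=64)
x_hat = group_dequantize(q, s, group_size=64)
```

Because each group gets its own scale, an outlier in one group cannot inflate the quantization step of the others, which is the accuracy benefit the commit describes over per-channel or per-tensor scaling.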
13 lines
984 B
Plaintext
r n n n r 5 10 128 128 10 10 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 4 10 127 127 10 10 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 3 10 127 127 10 10 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 2 10 127 127 10 10 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 1 10 127 127 10 10 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 293 4105 127 127 4105 4105 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 292 4105 127 127 4105 4105 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 291 4105 127 127 4105 4105 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 290 4105 127 127 4105 4105 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 289 4105 127 127 4105 4105 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 288 4096 4096 4096 4096 4096 s8s8s32of32:group_size=64,sym_quant_sf=f32
r n n n r 6 64 64 64 64 64 s8s8s32of32:group_size=4,sym_quant_sf=bf16
r n n n r 6 128 64 64 128 128 s8s8s32of32:group_size=4,sym_quant_sf=bf16