Commit Graph

6 Commits

Author SHA1 Message Date
Meghana Vankadari
69ca5dbcd6 Fixed compilation errors for gcc versions < 11.2
Details:
- Disabled the intrinsics code of the f32obf16 pack function
  for gcc < 11.2, as the instructions used in the kernels
  are not supported by those compiler versions.
- Added an early-return check for the WOQ APIs when compiling with
  gcc < 11.2.
- Fixed the check for whether JIT kernels are generated inside the
  batch_gemm API for the bf16 datatype.

AMD-Internal: [CPUPL-6327]

Change-Id: I0a017c67eb9d9d22a14e095e435dc397e265fb0a
2025-01-21 07:13:31 -05:00
Deepak Negi
182a6373b5 Added support to specify the bias data type in the u8s8s32/s8s8s32 APIs
Description:
1. The bias data type was previously derived only from the output data type.
2. An option is added to the pre-ops structure to select the bias data
   type (s8/s32/bf16) irrespective of the storage data type in the
   u8s8s32/s8s8s32 APIs.

AMD-Internal: SWLCSG-3302

Change-Id: I3c465fe428672d2d58c1c60115c46d2d5b11f0f4
2025-01-15 05:56:26 -05:00
Meghana Vankadari
852cdc6a9a Implemented batch_matmul for f32 & int8 datatypes
Details:
- The batch matmul performs a series of matmuls, processing
  more than one GEMM problem at once.
- Introduced a new parameter called batch_size for the user
  to indicate the number of GEMM problems in a batch/group.
- This operation supports processing GEMM problems with
  different parameters, including dims, post-ops, storage schemes, etc.
- This operation is optimized for problems where all the
  GEMMs in a batch are of the same size and shape.
- For now, threads are distributed equally among the GEMM
  problems irrespective of their dimensions, which
  leads to better performance for batches with identical GEMMs
  but performs sub-optimally for batches with non-identical GEMMs.
- Optimizations for batches with non-identical GEMMs are in progress.
- Added bench and input files for batch_matmul.
- Added logger functionality for batch_matmul APIs.

AMD-Internal: [SWLCSG-2944]
Change-Id: I83e26c1f30a5dd5a31139f6706ac74be0aa6bd9a
2025-01-10 04:10:53 -05:00
Mithun Mohan
ef4286a97e Multi-data type buffer and scale support for matrix add|mul post-ops in s32 API.
-As it stands, the buffer type in the matrix add|mul post-ops is expected
to be the same as the output C matrix type. This limitation is now
removed, and the user can specify the buffer type by setting the stor_type
attribute in the add|mul post-op struct. As of now, int8, int32, bfloat16
and float types are supported for the buffer in the s32 micro-kernels. The
same support is also added to the bf16 micro-kernels, with bfloat16 and
float supported for now.
-Previously, the buffer values were added/multiplied as-is to the
output registers while performing the matrix add|mul post-ops. Support
is now added for scaling these values before using them in the post-ops.
Both scalar and vector scale_factors are supported.
-The bias_stor_type attribute is renamed to stor_type in the bias post-ops.

AMD-Internal: [SWLCSG-3319]
Change-Id: I4046ab84481b02c55a71ebb7038e38aec840c0fa
2025-01-10 02:11:12 -05:00
Mithun Mohan
4a95f44d39 Buffer scale support for matrix add and matrix mul post-ops in bf16 API.
-Previously, the buffer values were added/multiplied as-is to the
output registers while performing the matrix add/mul post-ops. Support
is now added for scaling these values before using them in the post-ops.
Both scalar and vector scale_factors are supported.

AMD-Internal: [SWLCSG-3181]
Change-Id: Ifdb7160a1ea4f5ecccfa3ef31ecfed432898c14d
2025-01-08 10:35:50 +00:00
Meghana Vankadari
bfc512d3e1 Implemented batch_gemm for bf16bf16f32of32|bf16
Details:
- The batch matmul performs a series of matmuls, processing
  more than one GEMM problem at once.
- Introduced a new parameter called batch_size for the user
  to indicate the number of GEMM problems in a batch/group.
- This operation supports processing GEMM problems with
  different parameters, including dims, post-ops, storage schemes, etc.
- This operation is optimized for problems where all the
  GEMMs in a batch are of the same size and shape.
- For now, threads are distributed equally among the GEMM
  problems irrespective of their dimensions, which
  leads to better performance for batches with identical GEMMs
  but performs sub-optimally for batches with non-identical GEMMs.
- Optimizations for batches with non-identical GEMMs are in progress.
- Added bench and input files for batch_matmul.

AMD-Internal: [SWLCSG-2944]
Change-Id: Idc59db5b8c5794bf19f6f86bcb8455cd2599c155
2025-01-03 03:28:32 -05:00