amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-05 15:01:13 +00:00

Author	SHA1	Message	Date
Mithun Mohan	8d8a8e2f19	Light-weight logging framework for LPGEMM. - A light-weight mechanism/framework to log input details and a stringified version of the post-ops structure is added to LPGEMM. Additionally, the runtime of the API is also logged. The logging framework logs to a file with filename following the format aocl_gemm_log_<PID>_<TID>.txt. - To enable this feature, the AOCL_LPGEMM_LOGGER_SUPPORT=1 macro needs to be defined when compiling BLIS (with aocl_gemm addon enabled) by passing CFLAGS="-DAOCL_LPGEMM_LOGGER_SUPPORT=1" to ./configure. Additionally, AOCL_ENABLE_LPGEMM_LOGGER=1 has to be exported in the environment during LPGEMM runtime. AMD-Internal: [SWLCSG-3280] Change-Id: I30bfb35b2dc412df70044601b335938fc9f49cfb	2025-01-03 11:28:57 +00:00
Meghana Vankadari	bfc512d3e1	Implemented batch_gemm for bf16bf16f32of32|bf16 Details: - The batch matmul performs a series of matmuls, processing more than one GEMM problem at once. - Introduced a new parameter called batch_size for the user to indicate the number of GEMM problems in a batch/group. - This operation supports processing GEMM problems with different parameters, including dims, post-ops, storage schemes, etc. - This operation is optimized for problems where all the GEMMs in a batch are of the same size and shape. - For now, the threads are distributed equally among the GEMM problems irrespective of their dimensions, which gives better performance for batches with identical GEMMs but performs sub-optimally for batches with non-identical GEMMs. - Optimizations for batches with non-identical GEMMs are in progress. - Added bench and input files for batch_matmul. AMD-Internal: [SWLCSG-2944] Change-Id: Idc59db5b8c5794bf19f6f86bcb8455cd2599c155	2025-01-03 03:28:32 -05:00
mkadavil	f040ba617f	Element-wise operations API for bfloat16 input matrix in LPGEMM. - This API supports applying element-wise operations (e.g. post-ops) on a bfloat16 input matrix to get an output matrix of the same (bfloat16) or an upscaled (float) data type. - A benchmarking/testing framework for the same is added. AMD-Internal: [SWLCSG-2947] Change-Id: I43f1c269be1a1997d4912d8a3a97be5e5f3442d2	2024-08-05 07:17:08 -04:00
Meghana Vankadari	77bd9a7f17	Added parameter checking for LPGEMM APIs Change-Id: I6ea89fd0d2516539e5a4e9cd8537570b23194d89	2023-11-09 21:50:55 -05:00

4 Commits
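
Commit 8d8a8e2f19 describes two switches for the LPGEMM logger (a compile-time macro and a runtime environment variable) and a per-thread log file name. The sketch below only illustrates that gating and naming scheme. The helper names (`lpgemm_logger_enabled_at_runtime`, `lpgemm_log_filename`, `lpgemm_logger_active`) are hypothetical and are not taken from the BLIS sources; only the macro name, the environment variable name, and the file name format come from the commit message. Treating any value other than the literal string `1` as "disabled" is also an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the runtime switch named in the
   commit message (AOCL_ENABLE_LPGEMM_LOGGER=1) is set, otherwise 0.
   Assumption: only the exact value "1" enables logging. */
static int lpgemm_logger_enabled_at_runtime(void)
{
    const char *value = getenv("AOCL_ENABLE_LPGEMM_LOGGER");
    return (value != NULL && strcmp(value, "1") == 0);
}

/* Hypothetical helper: writes aocl_gemm_log_<PID>_<TID>.txt into buf.
   Returns the number of characters written (excluding the terminator),
   or -1 if the name does not fit in len bytes. */
static int lpgemm_log_filename(char *buf, size_t len, long pid, long tid)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "aocl_gemm_log_%ld_%ld.txt", pid, tid);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper combining both gates: logging is active only if
   the library was built with -DAOCL_LPGEMM_LOGGER_SUPPORT=1 AND the
   environment variable is exported at runtime. */
static int lpgemm_logger_active(void)
{
#ifdef AOCL_LPGEMM_LOGGER_SUPPORT
    return lpgemm_logger_enabled_at_runtime();
#else
    return 0; /* Logger support was not compiled in. */
#endif
}
```

With this layout, a build without the macro pays no runtime cost beyond a constant return, and a build with the macro still stays silent unless the environment variable is exported, which matches the two-step enablement the commit message describes.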