amd/blis - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-05 23:11:15 +00:00

Author	SHA1	Message	Date
Mithun Mohan	8d8a8e2f19	Light-weight logging framewok for LPGEMM. -A light-weight mechanism/framework to log input details and a stringified version of the post-ops structure is added to LPGEMM. Additionally the runtime of the API is also logged. The logging framework logs to a file with filename following the format aocl_gemm_log_<PID>_<TID>.txt. -To enable this feature, the AOCL_LPGEMM_LOGGER_SUPPORT=1 macro needs to be defined when compiling BLIS (with aocl_gemm addon enabled) by passing CFLAGS="-DAOCL_LPGEMM_LOGGER_SUPPORT=1" to ./configure. Additionally AOCL_ENABLE_LPGEMM_LOGGER=1 has to be exported in the environment during LPGEMM runtime. AMD-Internal: [SWLCSG-3280] Change-Id: I30bfb35b2dc412df70044601b335938fc9f49cfb	2025-01-03 11:28:57 +00:00
varshav2	d4e0fa9b4c	Revert duplicate check and fix bug in the check for post-ops - Revert of patch 1110983 - Duplicate check removal and early return for s8s8s32/u8s8s32 - Add fix - Added check to see if post-ops is enabled with col-major storage and return early in that case. Change-Id: Id3b8c97b6d1425dfb06f3b196e5acd60caee8fca	2024-08-29 06:52:14 -04:00
varshav2	e3c434080a	Fix duplicate check and early return in s8s8s32/u8s8s32 - removed the duplicate check for col-major inputs in s8s8s32/u8s8s32 APIs - Fixed the print in bench_lpgemm Change-Id: If40837b89927dd82d8aa6f620d1a7f2c24aed53c	2024-08-23 02:32:20 +05:30
mkadavil	d37c91dffa	Quantization (scale + zero point) support for BF16 LPGEMM api. -Quantization of f32 to bf16 (bf16 = (f32 * scale_factor) + zero_point) instead of just type conversion in aocl_gemm_bf16bf16f32obf16. -Support for multiple scale/sum/matrix_add/bias post-ops in a single LPGEMM api call. -Post-ops mask related fixes in lpgemv kernels . -Additional scale post-ops sanity checks. AMD-Internal: [SWLCSG-2945] Change-Id: I3b35cc413c176bb50bfdbd6acd4839a5ba7e94bb	2024-07-18 05:32:51 -04:00
Nallani Bhaskar	29db6eb42b	Added transB in all AVX512 based int8 API's Description: --Added support for tranB in u8s8s32o<s32\|s8> and s8s8s32o<s32\|s8> API's --Updated the bench_lpgemm by adding options to support transpose of B matrix --Updated data_gen_script.py in lpgemm bench according to latest input format. AMD-Internal: [SWLCSG-2582] Change-Id: I4a05cc390ae11440d6ff86da281dbafbeb907048	2024-05-23 03:46:13 +05:30
eashdash	ef134dc49f	Added Trans A feature for all INT8 LPGEMM APIs 1. Added Trans A feature to handle column major inputs for A matrix. 2. Trans A is enabled by on-the-go pack of A matrix. 3. The on-the-go pack of A converts a column storage MCxKC block of A into row storage MCxKC block as LPGEMM kernels are row major kernels. 4. New pack routines are added for conversion of A matrix from column major storage to row major storage. 5. LPGEMM Cntx is updated with pack kernel function pointers. 6. Packing of A matrix: - Converts column major input A to row major in blocks of MCxKC with newly added pack A functions when cs_a > 1. 7. Pack routines are added for AVX512 and AVX2 INT8 LPGEMM APIs. 8. Trans A feature is now supported in: 1. u8s8s32os32/os8 2. u8s8s16os16/os8/ou8 3. s8s8s32os32/os8 4. s8s8s16os16/os8 AMD-Internal: SWLCSG-2582 Change-Id: I7ce331545525a9a09f3853280615b55fcf2edabf	2024-01-30 03:40:56 -05:00
Meghana Vankadari	77bd9a7f17	Added parameter checking for LPGEMM APIs Change-Id: I6ea89fd0d2516539e5a4e9cd8537570b23194d89	2023-11-09 21:50:55 -05:00
Meghana Vankadari	f8f4343b55	Updated cntx with packA function pointer for AVX512_VNNI support Details: - Modified bench to support testing for sizes where matrix strides are larger than the corresponding dimensions. - Modified early-return checks in all interface APIs to check validity of strides in relation to the corresponding dimension rather than checking if strides are equal to dimensions. Change-Id: I382529b636a4acc75f6d93d997af22a168a7bfc4	2023-11-03 04:50:00 -04:00
mkadavil	ea0324ab95	Multi data type downscaling support for u8s8s16 - u8s8s16<u8\|s8> Downscaling is used when GEMM output is accumulated at a higher precision and needs to be converted to a lower precision afterwards. Currently the u8s8s16 flavor of api only supports downscaling to s8 (int8_t) via aocl_gemm_u8s8s16os8 after results are accumulated at int16_t. LPGEMM is modified to support downscaling to different data types, like u8, s16, apart from s8. The framework (5 loop) passes the downscale data type to the micro-kernels. Within the micro-kernel, based on the downscale type, appropriate beta scaling and output buffer store logic is executed. This support is only enabled for u8s8s16 flavor of api's. The LPGEMM bench is also modified to support passing downscale data type for performance and accuracy testing. AMD-Internal: [SWLCSG-2313] Change-Id: I723d0802baf8649e5e41236b239880a6043bfd30	2023-10-12 09:19:56 -04:00
Edward Smyth	bb4c158e63	Merge commit 'b683d01b' into amd-main * commit 'b683d01b': Use extra #undef when including ba/ex API headers. Minor preprocessor/header cleanup. Fixed typo in cpp guard in bli_util_ft.h. Defined eqsc, eqv, eqm to test object equality. Defined setijv, getijv to set/get vector elements. Minor API breakage in bli_pack API. Add err_t* "return" parameter to malloc functions. Always stay initialized after BLAS compat calls. Renamed membrk files/vars/functions to pba. Switch allocator mutexes to static initialization. AMD-Internal: [CPUPL-2698] Change-Id: Ied2ca8619f144d4b8a7123ac45a1be0dda3875df	2023-08-21 07:01:38 -04:00
eashdash	bd8cd763ff	Added NEW LPGEMM TYPE- S8S8S32/S8 1. New LPGEMM type - S8S8S32/S8 is added. 2. New interface, frame and kernel files are added. 3. Frame and kernel files added/modified for S8S8S32/S8 have 2 operations - Pack B and Mat Mul 4. Pack B kernel routines to pack B matrix for VNNI and compute the sum of every column of B matrix to implement the S8S8S32 operation using the VNNI instructions. 5. Mat Mul Kernel files to compute the GEMM output using the VNNI. Here the A matrix elements are converted from int8 to uint8 (VNNI works with A matrix type uint8 only). 6. Post GEMM computation, additional operations are performed on the accumulated outputs to get the correct results. 7. With this change, two new LPGEMM APIs are introduced in LPGEMM - s8s8s32os32 and s8s8s32os8. 8. All previously added post-ops are supported on S8S8S32/S8 also. AMD-Internal: [CPUPL-3154] Change-Id: Ib18f82bde557ea4a815a63adc7870c4234bfb9d3	2023-03-31 05:44:54 -04:00

11 Commits