Description:
1. AutoAWQ uses an int32 buffer to store eight 4-bit elements in the
   interleaved order [0, 2, 4, 6, 1, 3, 5, 7].
2. Support is added to convert this format back to the original
   sequential order [0, 1, 2, 3, 4, 5, 6, 7] before reordering
   in the AWQ API.
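As an illustration, the de-interleave step can be sketched as below. The helper name and scalar loop are hypothetical; the actual library kernel operates on whole buffers and is vectorized.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper (not the AWQ API itself): unpack one int32 word
 * holding eight 4-bit values stored in AutoAWQ interleaved order
 * [0, 2, 4, 6, 1, 3, 5, 7] back into sequential order [0..7]. */
static void awq_unpack_int32(uint32_t packed, uint8_t out[8])
{
    /* Nibble position i inside the word holds logical element awq_order[i]. */
    static const int awq_order[8] = { 0, 2, 4, 6, 1, 3, 5, 7 };
    for (int i = 0; i < 8; ++i)
    {
        uint8_t nibble = (uint8_t)((packed >> (4 * i)) & 0xFu);
        out[awq_order[i]] = nibble;
    }
}
```

For example, the word 0x75316420 (logical values 0..7 stored in AWQ order) unpacks to the sequential sequence 0, 1, 2, 3, 4, 5, 6, 7.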
AMD-Internal: SWLCSG-3169
Change-Id: I5395766060c200ab81d0b8be94356678a169ac13
Description:
1. Added group quantization and zero-point (zp) support in the
   aocl_gemm_bf16s4f32o<bf16|f32> API.
2. Group quantization is a technique to improve accuracy in which
   the scale factors used to dequantize the weights vary at group
   level instead of at per-channel or per-tensor level.
3. Added zp and scaling in the WoQ packb kernels so that, for
   large M values, zp and scaling are performed at the pack-B
   stage and the bf16 kernels are called.
4. Adding zp and scaling support to the default path in the WoQ
   kernels introduced some performance overhead when the M value
   is very small.
5. Added a group_size string to the lpgemm bench to read the
   group size from bench_input.txt; tested various combinations
   of matrix dimensions.
6. The scale factors can be of type float or bf16, and the
   zero-point values are expected in int8 format.
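A scalar sketch of group-level dequantization, assuming float scale factors and int8 zero points with one of each per group along K (the function name and layout are illustrative only, not the library's internal kernels):

```c
#include <assert.h>
#include <stdint.h>

/* Dequantize one K-length strip of quantized weights using group-level
 * scale factors and zero points: one (scale, zp) pair per group of
 * group_size consecutive elements, instead of one pair for the whole
 * column (per-channel) or the whole matrix (per-tensor). */
static void dequant_grouped(const int8_t *q, float *w, int K,
                            int group_size, const float *scales,
                            const int8_t *zp)
{
    for (int k = 0; k < K; ++k)
    {
        int g = k / group_size;  /* group index for this element */
        w[k] = ((float)q[k] - (float)zp[g]) * scales[g];
    }
}
```

With K = 4 and group_size = 2, elements 0-1 use scales[0]/zp[0] and elements 2-3 use scales[1]/zp[1], so finer-grained scale factors track the local weight distribution.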
AMD-Internal: [SWLCSG-3168, SWLCSG-3172]
Change-Id: Iff07b54d76edc7408eb2ea0b29ce8b4a04a38f57
Description:
The aocl_reorder_f32obf16 function is implemented to reorder an
input weight matrix of data type float to bfloat16.
The reordering is done to match the input requirements of the
aocl_gemm_bf16bf16f32o<f32|bf16> API.
The objective of the API is to convert a model/matrix of type f32
to bf16 so it can be processed when the machine supports the bf16
FMA instruction _mm512_dpbf16_ps but the model is still in float.
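The per-element conversion can be sketched as below: keep the top 16 bits of the IEEE-754 float with round-to-nearest-even. The helper name is made up and the real kernel is vectorized; NaN payloads are not treated specially here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar f32 -> bf16 conversion: bf16 is the upper 16 bits of an
 * IEEE-754 binary32 value, rounded to nearest-even on the discarded
 * lower 16 bits. */
static uint16_t f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));           /* type-pun safely     */
    uint32_t round = 0x7FFFu + ((bits >> 16) & 1u); /* nearest-even bias */
    return (uint16_t)((bits + round) >> 16);
}
```

For instance, 1.0f (0x3F800000) converts to 0x3F80 and -2.0f (0xC0000000) to 0xC000.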
Change-Id: Ib7c743d52d01a1ac09e84ac120577ec9e02f90f5
Details:
- Added a new API, unreorder, that converts a matrix from the
  reordered format back to its original format (row-major or
  col-major).
- Currently this API supports only the bf16 datatype.
- Added a corresponding bench and input file to test the accuracy
  of the API.
- The new API is supported only for the 'B' matrix.
- Modified the input validation checks in the reorder API to account
  for row vs. col storage of the matrix and transposes for the bf16
  datatype.
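A toy sketch of the reorder/unreorder round trip, using a made-up panel layout (NOT the library's bf16 packing schema) purely to illustrate that unreorder inverts reorder:

```c
#include <assert.h>

#define NR 4  /* toy panel width; assumes n is a multiple of NR */

/* Reorder a row-major k x n matrix B into column panels of width NR. */
static void toy_reorder(const float *b, float *rb, int k, int n)
{
    int idx = 0;
    for (int jp = 0; jp < n; jp += NR)          /* panel loop      */
        for (int i = 0; i < k; ++i)             /* rows            */
            for (int j = jp; j < jp + NR; ++j)  /* cols in panel   */
                rb[idx++] = b[i * n + j];
}

/* Inverse: walk the panels in the same order and scatter elements
 * back to their original row-major positions. */
static void toy_unreorder(const float *rb, float *b, int k, int n)
{
    int idx = 0;
    for (int jp = 0; jp < n; jp += NR)
        for (int i = 0; i < k; ++i)
            for (int j = jp; j < jp + NR; ++j)
                b[i * n + j] = rb[idx++];
}
```

Applying toy_reorder then toy_unreorder to any 2 x 8 matrix recovers the original contents exactly, which is the property the new bench verifies for the real API.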
Change-Id: Ifb9c53b7e6da6f607939c164eb016e82514581b7
- Fixed the framework of the bf16s4f32of32 API to correct
  pointer updates.
- Modified the pre_op structure to exclude the pre-op offset.
  The offset is now passed as a separate parameter to the
  scale-pack functions.
- Fixed the work distribution among threads in the MT scenario.
- Added block sizes and kernel pointers and verified the
  functionality of the new API.
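The kind of thread work distribution involved can be illustrated with a minimal balanced row split (the name and signature below are hypothetical, not the library's internals):

```c
#include <assert.h>

/* Balanced split of m rows among nt threads: each thread gets
 * floor(m/nt) rows, and the first m % nt threads take one extra row,
 * so the ranges are contiguous and cover all rows exactly once. */
static void thread_range(int m, int nt, int tid, int *start, int *len)
{
    int base = m / nt, rem = m % nt;
    *len   = base + (tid < rem ? 1 : 0);
    *start = tid * base + (tid < rem ? tid : rem);
}
```

For m = 10 and nt = 4 the ranges are [0,3), [3,6), [6,8), [8,10): contiguous, non-overlapping, and covering all 10 rows.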
AMD-Internal: [SWLCSG-2943]
Change-Id: I58fece240d62c798c880a2b2b7fa64e560cc753d
Description:
1. Added a new API, aocl_gemm_bf16s4f32of32, to support
   WoQ (Weight-only Quantization) in LLMs.
2. The API supports only a reordered B matrix with elements of
   signed 4-bit (S4) size.
3. Subtracting the zero point from and applying the scale to the
   B matrix is performed while packing B.
4. The zero-point and scale data should be passed by the user
   through the pre-ops data structure.
5. The API is still in an experimental state and NOT tested.
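The per-element math applied while packing B can be sketched as below; the helper names are illustrative and do not reflect the pre-ops data structure:

```c
#include <assert.h>
#include <stdint.h>

/* Extract one signed 4-bit (S4) value from a packed byte: two S4
 * elements share each byte, selected by the high/low nibble. */
static int8_t s4_extract(uint8_t byte, int high)
{
    uint8_t nib = high ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0xFu);
    return (int8_t)(int8_t)(nib << 4) >> 4;  /* sign-extend 4 -> 8 bits */
}

/* Dequantize: subtract the zero point, then multiply by the scale. */
static float s4_dequant(uint8_t byte, int high, int8_t zp, float scale)
{
    return ((float)s4_extract(byte, high) - (float)zp) * scale;
}
```

For example, the byte 0xF2 holds 2 in the low nibble and -1 (sign-extended 0xF) in the high nibble; with zp = 1 and scale = 2.0 the high element dequantizes to (-1 - 1) * 2.0 = -4.0.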
AMD-Internal: SWLCSG-2943
Change-Id: I10b159b64c2e2aaf39da5462685618ba8cc800ee
- To enable the Weight-only Quantization (WOQ) workflow, new LPGEMM
  APIs are required where the data types are A: bf16, B: int4, and
  C: f32/bf16. It is expected that the BF16 kernels will be reused
  within these APIs, so the B matrix needs to be reordered following
  the BF16 kernel schema, but with the reordered matrix type still
  being int4. To address this, new BF16 reorder kernels enabling
  this are added.
AMD-Internal: [SWLCSG-2943]
Change-Id: Ib770ecbf90a3d906deafece94b1a96e0b9412738
Description:
1. Updated the ERF function threshold from 3.91920590400 to 3.553
   to match the reference erf float implementation, which reduced
   errors at the borders, and also clipped the output to 1.0.
2. Updated the packa function call to use the pack function pointer
   in the bf16 API to avoid compilation issues on non-AVX512BF16
   archs.
3. Updated the lpgemm bench.
AMD-Internal: [SWLCSG-2423]
Change-Id: Id432c0669521285e6e6a151739d9a72a7340381d
Details:
- Modified the aocl_get_reorder_buf_size_ and aocl_reorder_ APIs
  to allow reordering from a column-major input matrix.
- Added new pack kernels that pack/reorder the B matrix from the
  column-major input format.
- Updated the early-return check conditions to account for the
  trans parameters.
- Updated the bench file to test/benchmark the transpose support.
AMD-Internal: [CPUPL-2268]
Change-Id: Ida66d7e3033c52cca0229c6b78d16976fbbecc4c
Details:
- Added new params (order, trans) to the aocl_get_reorder_buf_size_
  and aocl_reorder_ APIs.
- Added new pack kernels that pack the A matrix from either a
  row-major or column-major input matrix into the pack buffer in
  row-major format.
- Updated the cntx with pack kernel function pointers for packing
  the A matrix.
- Transposition of the A matrix is handled by packing it to
  row-major format at run-time.
- Updated the early-return check conditions to account for the
  trans parameters.
- Updated the bench file to test/benchmark the transpose support.
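The trans handling can be illustrated with a minimal scalar pack of A into a row-major buffer (a sketch only; the real kernels are blocked and vectorized, and the signature is hypothetical):

```c
#include <assert.h>

/* Pack an m x k A matrix into a row-major buffer regardless of whether
 * the input is stored row-major ('r') or column-major ('c'). For a
 * column-major input this amounts to a transpose performed at pack
 * time, which is how the transpose case is handled at run-time. */
static void pack_a_rowmajor(const float *a, float *pa, int m, int k,
                            char order)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < k; ++j)
            pa[i * k + j] = (order == 'r') ? a[i * k + j]   /* row-major */
                                           : a[j * m + i];  /* col-major */
}
```

Packing the same logical 2 x 3 matrix from its row-major and column-major storage produces identical row-major pack buffers, so the downstream compute kernels never need to know the original storage order.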
AMD-Internal: [SWLCSG-2268, SWLCSG-2442]
Change-Id: I43a113dc4bc11e6bb7cc4d768c239a16cb6bbea4