Introduced support for GEMV operations with group-level symmetric quantization for the S8S8S32032 API.
Framework Changes:
- Added macro definitions and function prototypes for GEMV with symmetric quantization in lpgemm_5loop_interface_apis.h and lpgemm_kernels.h.
- LPGEMV_M_EQ1_KERN2 for the lpgemv_m_one_s8s8s32os32_sym_quant kernel, and
- LPGEMV_N_EQ1_KERN2 for the lpgemv_n_one_s8s8s32os32_sym_quant kernel.
- Implemented the main GEMV framework for symmetric quantization in lpgemm_s8s8s32_sym_quant.c.
Kernel Changes:
- lpgemv_m_one_s8s8s32os32_sym_quant for handling the case where M = 1 and implemented in lpgemv_m_kernel_s8_grp_amd512vnni.c.
- lpgemv_n_one_s8s8s32os32_sym_quant for handling the case where N = 1 and implemented in lpgemv_n_kernel_s8_grp_amd512vnni.c.
- Updated the buffer reordering logic for group quantization for N=1 cases in aocl_gemm_s8s8s32os32_utils.c.
Notes
- Ensure that group_size is a factor of both K (and KC when K > KC).
- The B matrix must be provided in reordered format (mtag_b == REORDERED).
AMD-Internal: [SWLCSG-3604]
- Fixed the group size validation logic to correctly check if the
group_size is a multiple of 4.
- Previously the condition was incorrectly performing bitwise AND with
decimal 11 instead of binary 11 (decimal 3).
AMD-Internal: [CPUPL-6754]
Details:
- In reorder functions, validity of strides are being checked assuming
that the matrix to be reordered is always row-major. Modified the code
to take stor_order into consideration while checking for validity of
strides.
- This does not directly impact the functionality of GEMM as we don't
support GEMM on col-major matrices where A and/or B matrices are
reordered before GEMM computation. But this change makes sense when
reordering is viewed as an independent functionality irrespective of
what the reordered buffers will be used for.
Change-Id: If2cc4a353bca2f998ad557d6f128881bc9963330
Details:
- Group quantization is technique to improve accuracy
where scale factors to quantize inputs and weights
varies at group level instead of per channel
and per tensor level.
- Added new bench files to test GEMM with symmetric static
quantization.
- Added new get_size and reorder functions to account for
storing sum of col-values separately per group.
- Added new framework, kernels to support the same.
- The scalefactors could be of type float or bf16.
AMD-Internal:[SWLCSG-3274]
Change-Id: I3e69ecd56faa2679a4f084031d35ffb76556230f
Description:
1. Implement s8 unreorder API function which performs
unreordering of int8 matrix which is reordered
2. Removed bf16vnni check for bf16 unreorder reference API
because it can work on any architecture as it is reference
code
3. Tested the reference code for all main and fringe paths.
AMD-Interneal: [SWLCSG-3426]
Change-Id: I920f807be870e1db5f9d0784cdcec7b366e1eff5
- When n=1, reorder of B matrix is avoided to efficiently
process data. A dot-product based kernel is implemented to
perform gemv when n==1.
AMD-Internal: [SWLCSG-2354]
Change-Id: I6b73dfddd9a15e7b914d031646a1d913a7ab4761
Description:
--Added support for tranB in u8s8s32o<s32|s8> and
s8s8s32o<s32|s8> API's
--Updated the bench_lpgemm by adding options to
support transpose of B matrix
--Updated data_gen_script.py in lpgemm bench
according to latest input format.
AMD-Internal: [SWLCSG-2582]
Change-Id: I4a05cc390ae11440d6ff86da281dbafbeb907048
- Implemented optimized lpgemv for both m == 1 and n == 1 cases.
- Fixed few bugs in LPGEMV for bf16 and f32 datatypes.
- Fixed few bugs in JIT-based implementation of LPGEMM for BF16
datatype.
AMD-Internal: [SWLCSG-2354]
Change-Id: I245fd97c8f160b148656f782d241f86097a0cf38
* commit 'b683d01b':
Use extra #undef when including ba/ex API headers.
Minor preprocessor/header cleanup.
Fixed typo in cpp guard in bli_util_ft.h.
Defined eqsc, eqv, eqm to test object equality.
Defined setijv, getijv to set/get vector elements.
Minor API breakage in bli_pack API.
Add err_t* "return" parameter to malloc functions.
Always stay initialized after BLAS compat calls.
Renamed membrk files/vars/functions to pba.
Switch allocator mutexes to static initialization.
AMD-Internal: [CPUPL-2698]
Change-Id: Ied2ca8619f144d4b8a7123ac45a1be0dda3875df
1. New LPGEMM type - S8S8S32/S8 is added.
2. New interface, frame and kernel files are added.
3. Frame and kernel files added/modified for S8S8S32/S8 have
2 operations - Pack B and Mat Mul
4. Pack B kernel routines to pack B matrix for VNNI and compute the sum
of every column of B matrix to implement the S8S8S32 operation using
the VNNI instructions.
5. Mat Mul Kernel files to compute the GEMM output using the VNNI.
Here the A matrix elements are converted from int8 to uint8 (VNNI
works with A matrix type uint8 only).
6. Post GEMM computation, additional operations are performed on the
accumulated outputs to get the correct results.
7. With this change, two new LPGEMM APIs are introduced in LPGEMM -
s8s8s32os32 and s8s8s32os8.
8. All previously added post-ops are supported on S8S8S32/S8 also.
AMD-Internal: [CPUPL-3154]
Change-Id: Ib18f82bde557ea4a815a63adc7870c4234bfb9d3