Files
blis/frame/include
Vignesh Balasubramanian c4b84601da AVX512 optimizations for CGEMM(rank-1 kernel)
- Implemented an AVX512 rank-1 kernel that is
  expected to handle column-major storage schemes
  of A, B and C(without transposition) when k = 1.

- This kernel is single-threaded, and acts as a direct
  call from the BLAS layer for its compatible inputs.

- Defined custom BLAS and BLIS_IMPLI layers for CGEMM
  (instead of using the macro definition), in order to
  integrate the call to this kernel at runtime(based on
  the corresponding architecture and input constraints).

- Added unit-tests for functional and memory testing of the
  kernel.

- Updated the ZEN5 context to include the AVX512 CGEMM
  SUP kernels, with its cache-blocking parameters.

AMD-Internal: [CPUPL-6498]
Change-Id: I42a66c424325bd117ceb38970726a05e2896a46b
2025-03-06 20:14:05 +05:30
..