blis/kernels/zen/2 at 4cfbb47b870f9e484bfa94de81fb7cc0249dc830 - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-06-29 18:57:23 +00:00


Hari Govind S 349fc47ec5 DGEMV Optimizations for TRANSPOSE Cases

- Developed new AVX512 DGEMV kernels for Zen4/5 architectures and
  AVX2 kernels for Zen1/2/3 architectures. These kernels are written
  from the ground up and are independent of fused kernels.

- The primary DGEMV kernel processes matrix A in blocks of 8 columns.
  The remaining 1 to 7 fringe columns are handled by fringe kernels,
  which the primary kernel invokes as needed.

- Each kernel computes the dot product of each column of matrix A
  with vector x in chunks of 32 elements, accumulating the partial
  sums in accumulator registers. Fringe elements are handled in
  smaller chunks of 16, 8, etc. The accumulator registers are then
  reduced and the results added to vector y.

AMD-Internal: [CPUPL-5835]
Change-Id: I5cb9eb1330db095931586a7028fd7676fbbecc61
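
The blocking scheme described in the commit message can be sketched in
portable scalar C. This is only an illustration under stated assumptions,
not the code in bli_gemv_t_zen_int.c: the names dgemv_t_sketch, dot_cols
and accumulate_chunk are hypothetical, A is assumed to be column-major
with leading dimension lda, alpha/beta scaling is omitted, LANES = 4
stands in for the four doubles of an AVX2 register, and the fringe chunk
sizes below 8 (4, 2, 1) are a guess at what "etc." covers.

```c
#include <assert.h>
#include <stddef.h>

#define COL_BLOCK 8   /* columns per primary-kernel pass (from the commit) */
#define ROW_CHUNK 32  /* elements per main-loop iteration (from the commit) */
#define LANES     4   /* assumption: models one AVX2 register of doubles */

/* Hypothetical helper: accumulate `len` elements starting at row i for
 * nc columns, spreading partial sums over LANES accumulator slots. */
static void accumulate_chunk(double acc[][LANES], size_t nc,
                             const double *a, size_t lda,
                             const double *x, size_t i, size_t len)
{
    for (size_t j = 0; j < nc; ++j)
        for (size_t k = 0; k < len; ++k)
            acc[j][k % LANES] += a[j * lda + i + k] * x[i + k];
}

/* Hypothetical kernel body: y[j] += dot(A[:, j], x) for nc columns (1..8). */
static void dot_cols(size_t m, size_t nc, const double *a, size_t lda,
                     const double *x, double *y)
{
    double acc[COL_BLOCK][LANES] = {{0.0}};
    size_t i = 0;

    assert(nc >= 1 && nc <= COL_BLOCK);

    /* Main loop: full chunks of 32 elements. */
    for (; i + ROW_CHUNK <= m; i += ROW_CHUNK)
        accumulate_chunk(acc, nc, a, lda, x, i, ROW_CHUNK);

    /* Fringe: at most one chunk each of 16, 8, 4, 2 and 1 elements. */
    for (size_t chunk = ROW_CHUNK / 2; chunk >= 1; chunk /= 2) {
        if (i + chunk <= m) {
            accumulate_chunk(acc, nc, a, lda, x, i, chunk);
            i += chunk;
        }
    }

    /* Reduce each column's accumulator lanes and add the sum to y. */
    for (size_t j = 0; j < nc; ++j) {
        double sum = 0.0;
        for (size_t l = 0; l < LANES; ++l)
            sum += acc[j][l];
        y[j] += sum;
    }
}

/* Hypothetical driver: y += A^T * x, with A an m x n column-major matrix. */
void dgemv_t_sketch(size_t m, size_t n, const double *a, size_t lda,
                    const double *x, double *y)
{
    size_t j = 0;

    /* Primary path: blocks of 8 columns. */
    for (; j + COL_BLOCK <= n; j += COL_BLOCK)
        dot_cols(m, COL_BLOCK, a + j * lda, lda, x, y + j);

    /* Fringe path: the remaining 1 to 7 columns, if any. */
    if (j < n)
        dot_cols(m, n - j, a + j * lda, lda, x, y + j);
}
```

For example, with m = 37 and n = 11 the driver runs one 8-column block
followed by a 3-column fringe call, and each call consumes the rows as one
chunk of 32, then a chunk of 4 and a chunk of 1. A vectorized kernel would
replace the inner loops with fused multiply-add instructions and a
horizontal reduction, but the loop structure would be similar.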

2025-01-24 00:38:34 -05:00

bli_gemv_avx2.c

DGEMV Optimizations for NO_TRANSPOSE Cases

2024-12-12 10:26:50 -05:00

bli_gemv_t_zen_int.c

DGEMV Optimizations for TRANSPOSE Cases

2025-01-24 00:38:34 -05:00

bli_gemv_zen_int_4.c

Code cleanup: Miscellaneous fixes

2024-08-06 06:56:01 -04:00

bli_gemv_zen_ref.c

DGEMV Optimizations for Tiny Sizes