mangala v 0659a647e0 Gtestsuite: Micro Kernel Testing of ZGEMM API
Summary:
- Aims to perform accuracy testing of the ZGEMM micro kernels.
- BLIS kernels are called directly from the gtestsuite framework.
- Each micro kernel is invoked with the required input and output parameters.
- No objects are created to call the micro kernel.
- No framework code is invoked in this method.
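
The approach summarized above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the gtestsuite code: the `ukr_fn` interface, `toy_ukr`, `ref_zgemm`, `max_abs_error`, the operand layouts, and the fill values are all hypothetical stand-ins, and `ukr_fn` is deliberately not the actual BLIS kernel signature. The idea shown is only that a kernel taking raw pointers and strides can be checked against a simple reference computation of C := beta*C + alpha*A*B.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using zcomplex = std::complex<double>;

// Hypothetical raw-pointer kernel interface (NOT the actual BLIS signature).
// Assumed layout: A is m x k with contiguous columns, B is k x n with
// contiguous rows, C is addressed through row/column strides rs_c and cs_c.
using ukr_fn = void (*)(std::size_t m, std::size_t n, std::size_t k,
                        zcomplex alpha, const zcomplex* a, const zcomplex* b,
                        zcomplex beta, zcomplex* c,
                        std::size_t rs_c, std::size_t cs_c);

// Straightforward reference: C := beta*C + alpha*A*B.
void ref_zgemm(std::size_t m, std::size_t n, std::size_t k,
               zcomplex alpha, const zcomplex* a, const zcomplex* b,
               zcomplex beta, zcomplex* c,
               std::size_t rs_c, std::size_t cs_c) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            zcomplex acc(0.0, 0.0);
            for (std::size_t p = 0; p < k; ++p) acc += a[i + p * m] * b[p * n + j];
            zcomplex& cij = c[i * rs_c + j * cs_c];
            cij = beta * cij + alpha * acc;
        }
    }
}

// Stand-in for the kernel under test: the same math written as k rank-1
// updates into a temporary tile, followed by the alpha/beta scaling.
void toy_ukr(std::size_t m, std::size_t n, std::size_t k,
             zcomplex alpha, const zcomplex* a, const zcomplex* b,
             zcomplex beta, zcomplex* c,
             std::size_t rs_c, std::size_t cs_c) {
    std::vector<zcomplex> ab(m * n, zcomplex(0.0, 0.0));
    for (std::size_t p = 0; p < k; ++p)
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = 0; j < n; ++j)
                ab[i * n + j] += a[i + p * m] * b[p * n + j];
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            zcomplex& cij = c[i * rs_c + j * cs_c];
            cij = beta * cij + alpha * ab[i * n + j];
        }
}

// Runs the kernel and the reference on identical inputs and returns the
// largest element-wise absolute difference over the C buffer.
double max_abs_error(ukr_fn kernel, std::size_t m, std::size_t n, std::size_t k,
                     zcomplex alpha, zcomplex beta,
                     std::size_t rs_c, std::size_t cs_c) {
    assert(m > 0 && n > 0 && k > 0);
    std::vector<zcomplex> a(m * k), b(k * n);
    std::vector<zcomplex> c((m - 1) * rs_c + (n - 1) * cs_c + 1);
    // Deterministic, arbitrary fill values (illustrative only).
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = zcomplex(1.0 + 0.25 * (i % 3), 0.5);
    for (std::size_t i = 0; i < b.size(); ++i) b[i] = zcomplex(0.5 + 0.125 * (i % 4), 0.25);
    for (std::size_t i = 0; i < c.size(); ++i) c[i] = zcomplex(1.0 + 0.1 * (i % 5), -0.2);
    std::vector<zcomplex> c_ref = c;

    kernel(m, n, k, alpha, a.data(), b.data(), beta, c.data(), rs_c, cs_c);
    ref_zgemm(m, n, k, alpha, a.data(), b.data(), beta, c_ref.data(), rs_c, cs_c);

    double err = 0.0;
    for (std::size_t i = 0; i < c.size(); ++i) err = std::max(err, std::abs(c[i] - c_ref[i]));
    return err;
}
```

A test would then call, for example, `max_abs_error(toy_ukr, 3, 4, 8, alpha, beta, 1, 3)` for a column-major 3x4 tile and compare the result against a tolerance. The tolerance is left to the caller here because the threshold used by gtestsuite is not stated in this commit message.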

The following AVX2 and AVX512 micro kernels are tested using gtestsuite:

Native Kernels:
 - AVX2: bli_zgemm_haswell_asm_3x4
         bli_zgemm_zen_asm_2x6 (required for TRSM computation)
 - AVX512: bli_zgemm_zen4_asm_12x4
           bli_zgemm_zen4_asm_4x12 (required for TRSM computation)

SUP Kernels:
- AVX2 Kernels:
      bli_zgemmsup_rd_zen_asm_3x4m
      bli_zgemmsup_rd_zen_asm_3x2m
      bli_zgemmsup_rd_zen_asm_3x4n
      bli_zgemmsup_rd_zen_asm_2x4n
      bli_zgemmsup_rd_zen_asm_(2/1)x4
      bli_zgemmsup_rd_zen_asm_(2/1)x2
      bli_zgemmsup_rv_zen_asm_(2/1)x4
      bli_zgemmsup_rv_zen_asm_(2/1)x2
      bli_zgemmsup_rv_zen_asm_3x4m
      bli_zgemmsup_rv_zen_asm_3x2m
      bli_zgemmsup_rv_zen_asm_3x4n
      bli_zgemmsup_rv_zen_asm_2x4n
      bli_zgemmsup_rv_zen_asm_1x4n
      bli_zgemmsup_rv_zen_asm_3x2

- AVX512 kernels:
     bli_zgemmsup_cv_zen4_asm_12x4m
     bli_zgemmsup_cv_zen4_asm_12x3m
     bli_zgemmsup_cv_zen4_asm_12x2m
     bli_zgemmsup_cv_zen4_asm_12x1m
     bli_zgemmsup_cv_zen4_asm_8x(4/3/2/1)
     bli_zgemmsup_cv_zen4_asm_4x(4/3/2/1)
     bli_zgemmsup_cv_zen4_asm_2x(4/3/2/1)

The above kernels are tested with different combinations of parameters such as storage, alpha, beta, transpose, and dimensions.
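
As a rough illustration of what "different combinations of parameters" means in practice, the sketch below builds the Cartesian product of a few parameter lists. The `GemmCase` and `Dims` types, the character encodings, and `make_cases` are hypothetical names chosen for this example; in a GoogleTest-based suite, `::testing::Combine` with `::testing::Values` typically plays this role instead of hand-written loops.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using zcomplex = std::complex<double>;

struct Dims { std::size_t m, n, k; };

// One test point. Field names and encodings are illustrative only.
struct GemmCase {
    char storage;  // 'r' = row-major C, 'c' = column-major C
    char transa;   // 'n' = no transpose, 't' = transpose
    char transb;
    Dims dims;
    zcomplex alpha;
    zcomplex beta;
};

// Builds every combination of the given parameter values.
std::vector<GemmCase> make_cases(const std::vector<char>& storages,
                                 const std::vector<char>& transposes,
                                 const std::vector<Dims>& dims,
                                 const std::vector<zcomplex>& alphas,
                                 const std::vector<zcomplex>& betas) {
    const std::size_t total = storages.size() * transposes.size() * transposes.size() *
                              dims.size() * alphas.size() * betas.size();
    std::vector<GemmCase> cases;
    cases.reserve(total);
    for (char st : storages)
        for (char ta : transposes)
            for (char tb : transposes)
                for (const Dims& d : dims)
                    for (const zcomplex& al : alphas)
                        for (const zcomplex& be : betas)
                            cases.push_back({st, ta, tb, d, al, be});
    assert(cases.size() == total);
    return cases;
}
```

Each generated `GemmCase` would then drive one kernel invocation. Special values such as alpha or beta equal to zero or one are worth including in the lists, since optimized kernels often take separate code paths for them.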

DGEMM: Minor updates to the DGEMM micro kernel (buffer allocation, comment section, order of passing arguments).

AMD-Internal: [CPUPL-4426]

Change-Id: I9d6ab24278450f57d13589ad89151a4acc641f08
2024-01-31 10:30:57 -05:00

AOCL-BLAS library

AOCL-BLAS is AMD's optimized version of BLAS targeted for AMD EPYC and Ryzen CPUs. It is developed as a fork of BLIS (https://github.com/flame/blis), which is developed by members of the Science of High-Performance Computing (SHPC) group in the Institute for Computational Engineering and Sciences at The University of Texas at Austin and other collaborators (including AMD). All known features and functionalities of BLIS are retained and supported in the AOCL-BLAS library. AOCL-BLAS is regularly updated with improvements from the upstream repository.

AOCL-BLAS is optimized with the SSE2, AVX2, and AVX512 instruction sets, which are enabled based on the target Zen architecture using the dynamic dispatch feature. All prominent Level 3, Level 2, and Level 1 APIs are designed and optimized with specific code paths for different problem-size ranges, e.g., small, medium, and large sizes. These algorithms are designed and customized to exploit the architectural improvements of the target platform.
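
The selection logic described above can be pictured with a small sketch. Everything in it is an assumption made for illustration: `CpuFeatures`, `select_isa`, `select_size_path`, and especially the size thresholds are hypothetical and do not reflect how AOCL-BLAS actually detects hardware or classifies problem sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

enum class IsaPath { SSE2, AVX2, AVX512 };
enum class SizePath { Small, Medium, Large };

// Hypothetical feature flags; a real library queries the CPU at run time.
struct CpuFeatures {
    bool avx2;
    bool avx512;
};

// Prefer the widest instruction set that the CPU reports as available.
IsaPath select_isa(const CpuFeatures& f) {
    if (f.avx512) return IsaPath::AVX512;
    if (f.avx2) return IsaPath::AVX2;
    return IsaPath::SSE2;
}

// Classify a problem by its largest dimension. The limits below are
// placeholders chosen for this example, not AOCL-BLAS thresholds.
SizePath select_size_path(std::size_t m, std::size_t n, std::size_t k) {
    const std::size_t small_limit = 64;
    const std::size_t large_limit = 1024;
    const std::size_t largest = std::max(m, std::max(n, k));
    if (largest <= small_limit) return SizePath::Small;
    if (largest <= large_limit) return SizePath::Medium;
    return SizePath::Large;
}
```

For instance, `select_isa({true, false})` yields `IsaPath::AVX2`, and `select_size_path(32, 32, 32)` yields `SizePath::Small` under the placeholder limits.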

For detailed instructions on how to configure, build, install, and link against AOCL-BLAS on AMD CPUs, please refer to the AOCL User Guide on the AMD developer portal.

The upstream repository (https://github.com/flame/blis) contains further information on BLIS, including background information on BLIS design, usage examples, and a complete BLIS API reference.

AOCL-BLAS is developed and maintained by AMD. You can contact us by email at toolchainsupport@amd.com. You can also raise issues and suggestions on the GitHub repository at https://github.com/amd/blis/issues.

Description
BLAS-like Library Instantiation Software Framework
Readme BSD-3-Clause 72 MiB