amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Bhaskar Nallani 2ce47e6f5e Implemented optimal AVX512-variant of f32 LPGEMV

1. The 5-loop LPGEMM path is inefficient when A or B is a vector
   (i.e., m == 1 or n == 1).

2. An efficient implementation of lpgemv_rowvar_f32 is developed that
   accounts for the B matrix reorder when m = 1 and for post-ops fusion.

3. When m = 1, the algorithm divides the GEMM workload along the n
   dimension at a granularity of NR. Each thread works on A: 1 x k and
   B: k x (>=NR) and produces C: 1 x (>=NR). K is unrolled by 4, with a
   remainder loop.

4. When n = 1, the algorithm divides the GEMM workload along the m
   dimension at a granularity of MR. Each thread works on A: (>=MR) x k
   and B: k x 1 and produces C: (>=MR) x 1. When n = 1, reordering of B
   is avoided so that the work is processed efficiently in the n = 1
   kernel.

5. Fixed a few warnings raised when loading two f32 bias elements with
   _mm_load_sd through a float pointer. The pointer is now typecast to
   (const double *).
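
The m = 1 case in item 3 can be illustrated with a minimal scalar sketch. This is not the AVX512 kernel from the commit: the function names `partition_n` and `gemv_m1_range`, the row-major layout of B, and the way leftover NR blocks are handed to the first threads are all assumptions made for illustration. It shows only the two ideas the message names: splitting n across threads in whole NR blocks, and unrolling the k loop by 4 with a remainder loop.

```c
#include <stddef.h>

/* Hypothetical helper: give thread `tid` (of `nt`) a contiguous range of
   columns [*begin, *end), split in whole blocks of `nr` columns. Leftover
   blocks go to the first threads; the last range is clamped to n. */
static void partition_n(size_t n, size_t nr, size_t nt, size_t tid,
                        size_t *begin, size_t *end)
{
    size_t blocks = (n + nr - 1) / nr;      /* number of NR-wide blocks */
    size_t per    = blocks / nt;
    size_t extra  = blocks % nt;
    size_t first  = tid * per + (tid < extra ? tid : extra);
    size_t count  = per + (tid < extra ? 1 : 0);

    *begin = first * nr;
    *end   = (first + count) * nr;
    if (*begin > n) *begin = n;
    if (*end > n)   *end = n;
}

/* Hypothetical scalar stand-in for the m = 1 kernel: computes
   c[j] = sum_p a[p] * b[p * ldb + j] for j in [n_begin, n_end),
   with B stored row-major (an assumption of this sketch). */
static void gemv_m1_range(const float *a, const float *b, size_t ldb,
                          float *c, size_t k,
                          size_t n_begin, size_t n_end)
{
    for (size_t j = n_begin; j < n_end; ++j) {
        float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
        size_t p = 0;

        /* Main loop: k unrolled by 4 with independent accumulators. */
        for (; p + 4 <= k; p += 4) {
            acc0 += a[p]     * b[(p)     * ldb + j];
            acc1 += a[p + 1] * b[(p + 1) * ldb + j];
            acc2 += a[p + 2] * b[(p + 2) * ldb + j];
            acc3 += a[p + 3] * b[(p + 3) * ldb + j];
        }

        float acc = (acc0 + acc1) + (acc2 + acc3);

        /* Remainder loop: the last k % 4 terms. */
        for (; p < k; ++p)
            acc += a[p] * b[p * ldb + j];

        c[j] = acc;
    }
}
```

In a threaded caller, each thread would call `partition_n` with its own `tid` and then run `gemv_m1_range` on the returned range, so no two threads write the same element of C. The n = 1 case in item 4 follows the same pattern with the roles of m and n swapped and MR as the block size.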

AMD-Internal: [SWLCSG-2391, SWLCSG-2353]
Change-Id: If1d0b8d59e0278f5f16b499de1d629e63da5b599
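
Item 5 of the commit message above concerns a pointer-type warning. The following is a minimal sketch of that pattern, not the code from the commit: the function name `load_two_f32` and the non-SSE2 fallback are assumptions added for illustration. `_mm_load_sd` is declared to take a `const double *`, so passing a `const float *` directly draws an incompatible-pointer warning; the explicit cast silences it while still loading 64 bits, that is, two adjacent floats.

```c
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: load bias[0] and bias[1] into out[0] and out[1]
   and zero out[2] and out[3]. */
static void load_two_f32(const float *bias, float out[4])
{
#if defined(__SSE2__)
    /* One 64-bit load covers two f32 values; the upper 64 bits of the
       register are zeroed by _mm_load_sd. The cast matches the
       intrinsic's declared parameter type. */
    __m128d raw = _mm_load_sd((const double *)bias);
    __m128 v = _mm_castpd_ps(raw);   /* reinterpret bits, no conversion */
    _mm_storeu_ps(out, v);
#else
    /* Portable fallback for targets without SSE2 (assumption of this
       sketch, not part of the commit). */
    memcpy(out, bias, 2 * sizeof(float));
    out[2] = 0.0f;
    out[3] = 0.0f;
#endif
}
```

The cast changes only the pointer type seen by the compiler; the bytes loaded are the same two floats either way.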

2024-03-04 23:53:23 +05:30

addon

Implemented optimal AVX512-variant of f32 LPGEMV

2024-03-04 23:53:23 +05:30

aocl_dtl

fatal error: malloc.h: No such file or directory #785

2023-12-13 04:05:07 -05:00

bench

Implemented optimal AVX512-variant of f32 LPGEMV

2024-03-04 23:53:23 +05:30

blastest

CMake: Removing blatest-related targets for Windows/shared libs.

2024-01-10 05:06:41 -05:00

build

CMake: Modified flatten-headers.py file to fix issue observed with ninja on windows.

2024-02-29 15:42:02 +05:30

config

Fix gcc 7.5 compilation error for zen4 and above configs

2024-02-26 05:20:35 -05:00

docs

CMake: CMake is updated for Code Coverage

2024-02-07 06:12:51 -05:00

examples

Fixed double free() in level1v example (#482)

2021-03-01 16:06:56 -06:00

frame

Fix for build issue when Mixed Datatypes are disabled

2024-02-23 04:02:49 -05:00

gtestsuite

Added functionality tests for ?NRM2 micro-kernels

2024-03-04 04:11:25 -05:00

kernels

Implemented optimal AVX512-variant of f32 LPGEMV

2024-03-04 23:53:23 +05:30

mpi_test

Minor build system housekeeping.

2019-05-23 12:51:17 -05:00

ref_kernels

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

sandbox

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

test

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

testsuite

CMake: CMake is updated for Code Coverage

2024-02-07 06:12:51 -05:00

travis

Merge commit '5013a6cb' into amd-main

2023-11-10 13:05:12 -05:00

vendor

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

windows/tests

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

.appveyor.yml

Add comment about make checkblas on Windows

2021-07-07 15:44:11 -05:00

.dir-locals.el

Modify Emacs config

2019-10-02 10:16:22 +01:00

.gitignore

Updated Windows build system to pick AMD specific sources.

2022-05-17 18:09:20 +05:30

.travis.yml

Safelist 'master', 'dev', 'amd' branches.

2021-09-21 14:54:20 -05:00

blis.pc.in

drop CFLAGS in the generated pkgconfig file

2021-01-12 17:07:04 -08:00

CHANGELOG

CHANGELOG update (0.8.1)

2021-03-22 17:42:33 -05:00

CMakeLists.txt

CMake: Modified flatten-headers.py file to fix issue observed with ninja on windows.

2024-02-29 15:42:02 +05:30

common.mk

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

config_registry

Merge commit 'e366665c' into amd-main

2023-10-18 09:09:54 -04:00

configure

AOCL-BLIS changed to AOCL-BLAS

2024-01-25 04:31:25 -05:00

CONTRIBUTING.md

Minor changes to README.md and CONTRIBUTING.md.

2018-05-17 16:38:49 -05:00

CREDITS

Merge commit '5013a6cb' into amd-main

2023-11-10 13:05:12 -05:00

INSTALL

INSTALL file update.

2018-08-07 14:21:07 -05:00

LICENSE

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

Makefile

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

README.md

README File Update

2023-05-25 14:46:33 +00:00

RELEASING

Minor updates/elaborations to RELEASING file.

2020-04-06 15:01:53 -05:00

so_version

Version String Update

2023-08-08 07:27:41 -04:00

version

Version String Update

2023-08-08 07:27:41 -04:00

README.md

AOCL-BLAS library

AOCL-BLAS is AMD's optimized version of BLAS targeted for AMD EPYC and Ryzen CPUs. It is developed as a forked version of BLIS (https://github.com/flame/blis), which is developed by members of the Science of High-Performance Computing (SHPC) group in the Institute for Computational Engineering and Sciences at The University of Texas at Austin and other collaborators (including AMD). All known features and functionalities of BLIS are retained and supported in AOCL-BLAS library. AOCL-BLAS is regularly updated with the improvements from the upstream repository.

AOCL-BLAS is optimized with the SSE2, AVX2, and AVX512 instruction sets, which are enabled based on the target Zen architecture using the dynamic dispatch feature. All prominent Level 3, Level 2, and Level 1 APIs are designed and optimized with specific paths for different problem-size ranges, e.g., small, medium, and large sizes. These algorithms are designed and customized to exploit the architectural improvements of the target platform.

For detailed instructions on how to configure, build, install, and link against AOCL-BLAS on AMD CPUs, please refer to the AOCL User Guide on the AMD developer portal.

The upstream repository (https://github.com/flame/blis) contains further information on BLIS, including background information on BLIS design, usage examples, and a complete BLIS API reference.

AOCL-BLAS is developed and maintained by AMD. You can contact us by email at toolchainsupport@amd.com. You can also raise issues or suggestions on the GitHub repository at https://github.com/amd/blis/issues.

Languages

C 86.2%

C++ 9.7%

Fortran 1.9%

Makefile 0.8%

MATLAB 0.4%

Other 0.9%