amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-24 18:34:40 +00:00


Vignesh Balasubramanian 6165001658 Bugfix and optimizations for ?AXPBYV API

- Updated the existing code path for ?AXPBYV to
  reroute the inputs to the appropriate L1 kernel,
  based on the values of alpha and beta. This lets
  each special case perform only the compute and
  memory operations it actually needs.

- Updated the typed API interface for ?AXPBYV to include
  an early exit condition (when n is 0, or when alpha is
  0 and beta is 1). This layer now also queries the right
  kernel from the context, based on the input values of
  alpha and beta.

- Added the necessary L1 vector kernels (i.e., ?SETV, ?ADDV,
  ?SCALV, ?SCAL2V and ?COPYV) to be used as part of special
  case handling in ?AXPBYV.

- Moved the early return for negative increments from the
  ?SCAL2V kernels to their typed API interface.

- Updated the zen, zen2 and zen3 contexts to include function
  pointers for all these vector kernels.

- Updated the existing ?AXPBYV vector kernels to handle only
  the required computation. Additional cleanup was done to
  these kernels.

- Added accuracy and memory tests for AVX2 kernels of ?SETV,
  ?COPYV, ?ADDV, ?SCALV, ?SCAL2V, ?AXPYV and ?AXPBYV APIs.

- Updated the existing thresholds in ?AXPBYV tests for complex
  types, because every complex multiplication involves two
  mul ops and one add op. Also added test cases for API-level
  accuracy checks that include special cases of alpha and
  beta.

- Decomposed the reference call to ?AXPBYV into several other
  L1 BLAS APIs (for the case where the reference does not
  support its own ?AXPBYV API). The decomposition matches the
  exact operations that are done in BLIS based on the alpha
  and/or beta values. This ensures that we test for our own
  compliance.
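
The special-case routing described in the list above can be sketched in C. This is a minimal illustration, not the BLIS implementation: `kernel_id`, `pick_kernel` and `axpbyv_sketch` are hypothetical names, the exact mapping from (alpha, beta) to a kernel is an assumption inferred from the kernels the commit names, and the sketch covers only unit-stride, double-precision vectors for the operation y := beta*y + alpha*x.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the L1 vector kernels named in the commit. */
typedef enum {
    K_NONE,    /* early exit: nothing to compute        */
    K_SETV,    /* y := 0                                */
    K_SCALV,   /* y := beta * y                         */
    K_COPYV,   /* y := x                                */
    K_SCAL2V,  /* y := alpha * x                        */
    K_ADDV,    /* y := y + x                            */
    K_AXPYV,   /* y := y + alpha * x                    */
    K_AXPBYV   /* y := beta * y + alpha * x (general)   */
} kernel_id;

/* Assumed routing: choose the cheapest kernel for the given scalars. */
static kernel_id pick_kernel(size_t n, double alpha, double beta)
{
    /* Early exit stated in the commit: n == 0, or alpha == 0 and beta == 1. */
    if (n == 0 || (alpha == 0.0 && beta == 1.0)) return K_NONE;
    if (alpha == 0.0) return (beta == 0.0) ? K_SETV : K_SCALV;
    if (beta == 0.0)  return (alpha == 1.0) ? K_COPYV : K_SCAL2V;
    if (beta == 1.0)  return (alpha == 1.0) ? K_ADDV : K_AXPYV;
    return K_AXPBYV;
}

/* Applies the selected operation and returns which kernel was chosen. */
static kernel_id axpbyv_sketch(size_t n, double alpha, const double *x,
                               double beta, double *y)
{
    kernel_id k = pick_kernel(n, alpha, beta);
    if (k == K_NONE) return k;

    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        switch (k) {
        case K_SETV:   y[i] = 0.0;                       break;
        case K_SCALV:  y[i] *= beta;                     break;
        case K_COPYV:  y[i] = x[i];                      break;
        case K_SCAL2V: y[i] = alpha * x[i];              break;
        case K_ADDV:   y[i] += x[i];                     break;
        case K_AXPYV:  y[i] += alpha * x[i];             break;
        case K_AXPBYV: y[i] = beta * y[i] + alpha * x[i]; break;
        case K_NONE:   break;
        }
    }
    return k;
}
```

Each special case touches less memory or does less arithmetic than the general form: for example, the copy case never reads y, and the scale case never reads x.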

AMD-Internal: [CPUPL-4861]
Change-Id: Ia6d48f12f059f52b31c0bef6c75f47fd364952c6

2024-06-20 16:22:07 +05:30

addon

Implemented LPGEMV(n=1) for AVX2-INT8 variants

2024-06-18 12:09:18 +05:30

aocl_dtl

Support for DOTC in DOTV Bench and DTL updates

2024-04-04 12:27:53 +05:30

bench

CMake: Added logic to link openmp library given through OpenMP_libomp_LIBRARY cmake variable on linux.

2024-06-10 04:41:23 -04:00

blastest

CMake: Added logic to link openmp library given through OpenMP_libomp_LIBRARY cmake variable on linux.

2024-06-10 04:41:23 -04:00

build

BLIS: Implement zen5 sub-configuration in cmake

2024-04-15 07:40:50 -04:00

config

Bugfix and optimizations for ?AXPBYV API

2024-06-20 16:22:07 +05:30

docs

CMake: CMake is updated for Code Coverage

2024-02-07 06:12:51 -05:00

examples

Fixed double free() in level1v example (#482)

2021-03-01 16:06:56 -06:00

frame

Bugfix and optimizations for ?AXPBYV API

2024-06-20 16:22:07 +05:30

gtestsuite

Bugfix and optimizations for ?AXPBYV API

2024-06-20 16:22:07 +05:30

kernels

Bugfix and optimizations for ?AXPBYV API

2024-06-20 16:22:07 +05:30

mpi_test

Minor build system housekeeping.

2019-05-23 12:51:17 -05:00

ref_kernels

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

sandbox

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

test

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

testsuite

CMake: Added logic to link openmp library given through OpenMP_libomp_LIBRARY cmake variable on linux.

2024-06-10 04:41:23 -04:00

travis

Merge commit '5013a6cb' into amd-main

2023-11-10 13:05:12 -05:00

vendor

CMake: Added logic to link openmp library given through OpenMP_libomp_LIBRARY cmake variable on linux.

2024-06-10 04:41:23 -04:00

windows/tests

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

.appveyor.yml

Add comment about make checkblas on Windows

2021-07-07 15:44:11 -05:00

.dir-locals.el

Modify Emacs config

2019-10-02 10:16:22 +01:00

.gitignore

Updated Windows build system to pick AMD specific sources.

2022-05-17 18:09:20 +05:30

.travis.yml

Safelist 'master', 'dev', 'amd' branches.

2021-09-21 14:54:20 -05:00

blis.pc.in

drop CFLAGS in the generated pkgconfig file

2021-01-12 17:07:04 -08:00

CHANGELOG

CHANGELOG update (0.8.1)

2021-03-22 17:42:33 -05:00

CMakeLists.txt

CMake: Added logic to link openmp library given through OpenMP_libomp_LIBRARY cmake variable on linux.

2024-06-10 04:41:23 -04:00

CMakePresets.json

CMake: Introducing CMake presets to simplify CI jobs and development.

2024-03-08 05:52:04 -05:00

common.mk

Implemented JIT-based microkernel for bf16 datatype

2024-03-13 05:55:18 +05:30

config_registry

BLIS: Implement zen5 sub-configuration

2024-04-12 07:26:31 -04:00

configure

Implemented JIT-based microkernel for bf16 datatype

2024-03-13 05:55:18 +05:30

CONTRIBUTING.md

Minor changes to README.md and CONTRIBUTING.md.

2018-05-17 16:38:49 -05:00

CREDITS

Merge commit '5013a6cb' into amd-main

2023-11-10 13:05:12 -05:00

INSTALL

INSTALL file update.

2018-08-07 14:21:07 -05:00

LICENSE

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

Makefile

Implemented JIT-based microkernel for bf16 datatype

2024-03-13 05:55:18 +05:30

README.md

README File Update

2023-05-25 14:46:33 +00:00

RELEASING

Minor updates/elaborations to RELEASING file.

2020-04-06 15:01:53 -05:00

so_version

Updated version string from 4.1.1 to 4.2.1

2024-03-12 02:07:58 -04:00

version

Updated version string from 4.1.1 to 4.2.1

2024-03-12 02:07:58 -04:00

README.md

AOCL-BLAS library

AOCL-BLAS is AMD's optimized version of BLAS targeted for AMD EPYC and Ryzen CPUs. It is developed as a fork of BLIS (https://github.com/flame/blis), which is developed by members of the Science of High-Performance Computing (SHPC) group in the Institute for Computational Engineering and Sciences at The University of Texas at Austin and other collaborators (including AMD). All known features and functionalities of BLIS are retained and supported in the AOCL-BLAS library. AOCL-BLAS is regularly updated with the improvements from the upstream repository.

AOCL-BLAS is optimized for the SSE2, AVX2, and AVX512 instruction sets, which are enabled based on the target Zen architecture through the dynamic dispatch feature. All prominent Level 3, Level 2, and Level 1 APIs are designed and optimized with specific code paths for different problem-size ranges (e.g., small, medium, and large sizes). These algorithms are designed and customized to exploit the architectural improvements of the target platform.
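
As a rough illustration of the idea behind dynamic dispatch, the sketch below selects a kernel through a table of function pointers at run time. It is a simplified assumption, not AOCL-BLAS code: `isa_level`, `kernel_ctx`, `select_ctx` and both kernels are hypothetical names, the instruction-set level is passed in by the caller rather than detected from the CPU, and the "wide" kernel is a plain unrolled loop standing in for a vectorized one.

```c
#include <stddef.h>

/* Hypothetical instruction-set levels, mirroring those named above. */
typedef enum { ISA_SSE2, ISA_AVX2, ISA_AVX512 } isa_level;

/* Signature of an illustrative scaling kernel: x := alpha * x. */
typedef void (*scalv_fn)(size_t n, double alpha, double *x);

/* A context bundles the kernels chosen for one architecture. */
typedef struct { scalv_fn scalv; } kernel_ctx;

/* Baseline kernel: one element per iteration. */
static void scalv_scalar(size_t n, double alpha, double *x)
{
    for (size_t i = 0; i < n; ++i) x[i] *= alpha;
}

/* Stand-in for a vectorized kernel: four elements per iteration. */
static void scalv_wide(size_t n, double alpha, double *x)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i]     *= alpha;
        x[i + 1] *= alpha;
        x[i + 2] *= alpha;
        x[i + 3] *= alpha;
    }
    for (; i < n; ++i) x[i] *= alpha;  /* remainder elements */
}

static const kernel_ctx ctx_baseline = { scalv_scalar };
static const kernel_ctx ctx_wide     = { scalv_wide };

/* Pick a context once; later calls go through its function pointers. */
static const kernel_ctx *select_ctx(isa_level isa)
{
    switch (isa) {
    case ISA_AVX2:
    case ISA_AVX512: return &ctx_wide;
    case ISA_SSE2:
    default:         return &ctx_baseline;
    }
}
```

A caller would obtain the context once, for example `const kernel_ctx *ctx = select_ctx(ISA_AVX2);`, and then invoke `ctx->scalv(n, alpha, x)` without knowing which implementation sits behind the pointer.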

For detailed instructions on how to configure, build, install, and link against AOCL-BLAS on AMD CPUs, please refer to the AOCL User Guide on the AMD developer portal.

The upstream repository (https://github.com/flame/blis) contains further information on BLIS, including background information on BLIS design, usage examples, and a complete BLIS API reference.

AOCL-BLAS is developed and maintained by AMD. You can contact us by email at toolchainsupport@amd.com. You can also raise issues or suggestions on the GitHub repository at https://github.com/amd/blis/issues.

Languages

C 86.3%

C++ 9.5%

Fortran 1.9%

Makefile 0.8%

MATLAB 0.5%

Other 0.9%