amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-25 02:44:31 +00:00

vignbala f8218bb9f2 Compiler warnings when using masked loads

- Updated the AVX512 DOTXF kernels to use MASKZ loads
  instead of MASK loads when loading X vector in fringe
  case. This avoids compiler warnings of uninitialized
  vector as input to the intrinsic.

- The functionality will not change when using either MASK
  or MASKZ loads on X, since A matrix is loaded using MASKZ
  loads.

AMD-Internal: [CPUPL-4974]
Change-Id: I1ef98a1292352d0e905cc09cd5667acd883df827

2024-05-03 09:53:36 -04:00
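
The reasoning in the commit message above can be illustrated with a scalar model of the two load flavors. This is a minimal sketch in C, not the DOTXF kernel code: `maskz_load`, `mask_load`, `fringe_dot`, and the 8-lane double-precision layout are hypothetical names and assumptions chosen for illustration. They model the lane behavior of zero-masking and merge-masking loads (such as the `_mm512_maskz_loadu_pd` and `_mm512_mask_loadu_pd` intrinsics) without requiring AVX512 hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumption: one 512-bit register holding 8 doubles */

/* Scalar model of a zero-masking (MASKZ) load:
   lanes whose mask bit is clear are set to 0.0. */
static void maskz_load(double dst[LANES], uint8_t mask, const double *mem)
{
    for (int i = 0; i < LANES; ++i)
        dst[i] = ((mask >> i) & 1) ? mem[i] : 0.0;
}

/* Scalar model of a merge-masking (MASK) load:
   lanes whose mask bit is clear keep the value from src. */
static void mask_load(double dst[LANES], const double src[LANES],
                      uint8_t mask, const double *mem)
{
    for (int i = 0; i < LANES; ++i)
        dst[i] = ((mask >> i) & 1) ? mem[i] : src[i];
}

/* Hypothetical fringe-case dot product over the last n (< LANES) elements.
   A is always loaded with MASKZ; X uses MASKZ when x_zeroing is nonzero,
   otherwise MASK with x_passthrough supplying the masked-out lanes. */
static double fringe_dot(const double *a, const double *x, size_t n,
                         int x_zeroing, const double x_passthrough[LANES])
{
    assert(n > 0 && n < LANES);
    uint8_t mask = (uint8_t)((1u << n) - 1u); /* low n bits set */

    double va[LANES], vx[LANES];
    maskz_load(va, mask, a);
    if (x_zeroing)
        maskz_load(vx, mask, x);
    else
        mask_load(vx, x_passthrough, mask, x);

    /* Masked-out lanes of A are 0.0, so they contribute nothing
       as long as the matching X lanes are finite. */
    double sum = 0.0;
    for (int i = 0; i < LANES; ++i)
        sum += va[i] * vx[i];
    return sum;
}
```

With `a = {1, 2, 3}`, `x = {4, 5, 6}`, and `n = 3`, both variants return 32.0 even when every pass-through lane holds 99.0. The equivalence assumes finite pass-through values: a NaN or infinity in a masked-out X lane would survive the multiplication by zero.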

addon

CMake: Enable builds for both static and shared builds for Linux.

2024-03-14 10:32:51 -04:00

aocl_dtl

Support for DOTC in DOTV Bench and DTL updates

2024-04-04 12:27:53 +05:30

bench

Added support to benchmark AXPYV APIs

2024-04-08 00:06:54 -04:00

blastest

Updated return type of xerbla and xerbla_array APIs to void

2024-04-29 00:51:10 -04:00

build

BLIS: Implement zen5 sub-configuration in cmake

2024-04-15 07:40:50 -04:00

config

DGEMMT SUP Optimizations for AVX512

2024-05-03 05:11:11 -04:00

docs

CMake: CMake is updated for Code Coverage

2024-02-07 06:12:51 -05:00

examples

Fixed double free() in level1v example (#482)

2021-03-01 16:06:56 -06:00

frame

DGEMMT SUP Optimizations for AVX512

2024-05-03 05:11:11 -04:00

gtestsuite

GTestSuite: check stored value of INFO

2024-05-03 09:08:21 -04:00

kernels

Compiler warnings when using masked loads

2024-05-03 09:53:36 -04:00

mpi_test

Minor build system housekeeping.

2019-05-23 12:51:17 -05:00

ref_kernels

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

sandbox

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

test

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

testsuite

CMake: Update code to support blastest for ILP64 on windows

2024-03-27 12:02:26 -04:00

travis

Merge commit '5013a6cb' into amd-main

2023-11-10 13:05:12 -05:00

vendor

CMake: Update code to support blastest for ILP64 on windows

2024-03-27 12:02:26 -04:00

windows/tests

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

.appveyor.yml

Add comment about make checkblas on Windows

2021-07-07 15:44:11 -05:00

.dir-locals.el

Modify Emacs config

2019-10-02 10:16:22 +01:00

.gitignore

Updated Windows build system to pick AMD specific sources.

2022-05-17 18:09:20 +05:30

.travis.yml

Safelist 'master', 'dev', 'amd' branches.

2021-09-21 14:54:20 -05:00

blis.pc.in

drop CFLAGS in the generated pkgconfig file

2021-01-12 17:07:04 -08:00

CHANGELOG

CHANGELOG update (0.8.1)

2021-03-22 17:42:33 -05:00

CMakeLists.txt

BLIS: Implement zen5 sub-configuration in cmake

2024-04-15 07:40:50 -04:00

CMakePresets.json

CMake: Introducing CMake presets to simplify CI jobs and development.

2024-03-08 05:52:04 -05:00

common.mk

Implemented JIT-based microkernel for bf16 datatype

2024-03-13 05:55:18 +05:30

config_registry

BLIS: Implement zen5 sub-configuration

2024-04-12 07:26:31 -04:00

configure

Implemented JIT-based microkernel for bf16 datatype

2024-03-13 05:55:18 +05:30

CONTRIBUTING.md

Minor changes to README.md and CONTRIBUTING.md.

2018-05-17 16:38:49 -05:00

CREDITS

Merge commit '5013a6cb' into amd-main

2023-11-10 13:05:12 -05:00

INSTALL

INSTALL file update.

2018-08-07 14:21:07 -05:00

LICENSE

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

Makefile

Implemented JIT-based microkernel for bf16 datatype

2024-03-13 05:55:18 +05:30

README.md

README File Update

2023-05-25 14:46:33 +00:00

RELEASING

Minor updates/elaborations to RELEASING file.

2020-04-06 15:01:53 -05:00

so_version

Updated version string from 4.1.1 to 4.2.1

2024-03-12 02:07:58 -04:00

version

Updated version string from 4.1.1 to 4.2.1

2024-03-12 02:07:58 -04:00

README.md

AOCL-BLAS library

AOCL-BLAS is AMD's optimized version of BLAS targeted at AMD EPYC and Ryzen CPUs. It is developed as a fork of BLIS (https://github.com/flame/blis), which is developed by members of the Science of High-Performance Computing (SHPC) group in the Institute for Computational Engineering and Sciences at The University of Texas at Austin and other collaborators (including AMD). All known features and functionality of BLIS are retained and supported in the AOCL-BLAS library. AOCL-BLAS is regularly updated with improvements from the upstream repository.

AOCL-BLAS is optimized with the SSE2, AVX2, and AVX512 instruction sets, which are enabled based on the target Zen architecture through the dynamic dispatch feature. All prominent Level 3, Level 2, and Level 1 APIs are designed and optimized with specific code paths for different size ranges, e.g., small, medium, and large sizes. These algorithms are designed and customized to exploit the architectural improvements of the target platform.

For detailed instructions on how to configure, build, install, and link against AOCL-BLAS on AMD CPUs, please refer to the AOCL User Guide on the AMD developer portal.

The upstream repository (https://github.com/flame/blis) contains further information on BLIS, including background information on BLIS design, usage examples, and a complete BLIS API reference.

AOCL-BLAS is developed and maintained by AMD. You can contact us by email at toolchainsupport@amd.com. You can also raise issues or suggestions on the GitHub repository at https://github.com/amd/blis/issues.

Languages

C 86.3%

C++ 9.5%

Fortran 1.9%

Makefile 0.8%

MATLAB 0.5%

Other 0.9%