amd/blis

mirror of https://github.com/amd/blis.git synced 2026-07-17 09:07:31 +00:00


Harihara Sudhan S 03fa660792 Optimized xGEMV for non-unit stride X vector

- In GEMV variant 1, the input matrix A is in row-major order, so the
  X vector must have unit stride for the operation to be vectorized.
- When the X vector has non-unit stride, the GEMV operation inside the
  kernel is still vectorized by packing the input X vector into a
  temporary buffer with unit stride. Currently, the packing is done
  using SCAL2V.
- For DGEMV, the X vector is scaled by alpha as part of packing.
  For CGEMV and ZGEMV, alpha is passed as 1 while packing.
- The temporary buffer is released once the GEMV operation
  is complete.
- In DGEMV variant 1, moved problem decomposition for the Zen
  architecture to the DOTXF kernel.
- Removed the flag-check-based kernel dispatch logic from DGEMV. Kernels
  are now picked from the context on non-AVX machines. On AVX machines,
  the kernel(s) to be dispatched are assigned to the function pointer
  in the unf_var layer.

AMD-Internal: [CPUPL-3475]
Change-Id: Icd9fd91eccd831f1fcb9fbf0037fcbbc2e34268e
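
The packing approach described in this commit message can be sketched in plain C as follows. This is a minimal illustration, not the AOCL-BLAS implementation: `pack_scal2v` and `gemv_rowmajor_packed` are hypothetical names, the strides are assumed to be positive, and simple scalar loops stand in for the vectorized SCAL2V and DOTXF kernels.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not a BLIS API): copy a strided vector into a
   contiguous buffer while scaling, i.e. xp[i] = alpha * x[i * incx].
   Assumes incx > 0. */
static void pack_scal2v(size_t n, double alpha,
                        const double *x, size_t incx, double *xp)
{
    for (size_t i = 0; i < n; ++i)
        xp[i] = alpha * x[i * incx];
}

/* Hypothetical sketch: y := beta * y + alpha * A * x, where A is an
   m x n row-major matrix with leading dimension lda. The strided x is
   packed (and pre-scaled by alpha) into a unit-stride temporary so the
   inner dot-product loop reads contiguous memory.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int gemv_rowmajor_packed(size_t m, size_t n, double alpha,
                         const double *a, size_t lda,
                         const double *x, size_t incx,
                         double beta, double *y, size_t incy)
{
    double *xp = NULL;

    if (n > 0) {
        xp = malloc(n * sizeof *xp);
        if (xp == NULL)
            return -1;
        pack_scal2v(n, alpha, x, incx, xp);
    }

    for (size_t i = 0; i < m; ++i) {
        const double *row = a + i * lda;
        double dot = 0.0;

        /* Unit-stride loop over both operands; this is the loop a
           compiler or hand-written kernel can vectorize. */
        for (size_t j = 0; j < n; ++j)
            dot += row[j] * xp[j];

        y[i * incy] = beta * y[i * incy] + dot;
    }

    /* Release the temporary buffer once the operation is complete. */
    free(xp);
    return 0;
}
```

Scaling by alpha during packing, as the commit describes for DGEMV, means the inner loop needs no extra multiply per row. A caller with `incx == 1` would normally skip the packing step entirely; the sketch always packs to keep the example short.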

2023-08-08 01:01:22 -04:00

addon

BF16 Downscale and Performance fix for bf16 API

2023-05-18 10:02:56 -04:00

aocl_dtl

Added NT in DTL logs for GEMMT, TRSM and NRM2

2023-07-27 05:15:08 -04:00

bench

Adding nrm2 target for benchmarking on Windows.

2023-07-10 14:03:05 -04:00

blastest

Code cleanup: No newline at end of file

2023-04-21 10:02:48 -04:00

build

Code cleanup: No newline at end of file

2023-04-21 10:02:48 -04:00

config

Level-3 triangular routines now use different block sizes and kernels.

2023-07-26 01:26:11 -04:00

docs

Doxygen document generation from cmake build

2023-05-25 07:41:40 -04:00

examples

Fixed double free() in level1v example (#482)

2021-03-01 16:06:56 -06:00

frame

Optimized xGEMV for non-unit stride X vector

2023-08-08 01:01:22 -04:00

gtestsuite

Updating nrm2 GTestSuite testing

2023-07-28 05:03:00 -04:00

kernels

Optimized xGEMV for non-unit stride X vector

2023-08-08 01:01:22 -04:00

mpi_test

Minor build system housekeeping.

2019-05-23 12:51:17 -05:00

ref_kernels

Level-3 triangular routines now use different block sizes and kernels.

2023-07-26 01:26:11 -04:00

sandbox

Code cleanup: No newline at end of file

2023-04-21 10:02:48 -04:00

test

Removing omp library linking to static multithreaded library build.

2023-07-13 06:54:02 -04:00

testsuite

Removing omp library linking to static multithreaded library build.

2023-07-13 06:54:02 -04:00

travis

Makefile cleanup

2021-07-02 01:20:01 -04:00

vendor

Code cleanup: dos2unix file conversion

2023-04-21 08:41:16 -04:00

windows/tests

Code cleanup: No newline at end of file

2023-04-21 10:02:48 -04:00

.appveyor.yml

make unix friendly archives on appveyor (#310)

2019-04-27 17:56:02 -05:00

.dir-locals.el

Modify Emacs config

2019-10-02 10:16:22 +01:00

.gitignore

Updated Windows build system to pick AMD specific sources.

2022-05-17 18:09:20 +05:30

.travis.yml

Update do_sde.sh (#330)

2019-08-21 17:40:24 -05:00

blis.pc.in

drop CFLAGS in the generated pkgconfig file

2021-01-12 17:07:04 -08:00

CHANGELOG

CHANGELOG update (0.8.1)

2021-03-22 17:42:33 -05:00

CMakeLists.txt

Removing omp library linking to static multithreaded library build.

2023-07-13 06:54:02 -04:00

common.mk

Zen4 compilation flag updates to support low precision gemm.

2022-09-29 08:19:40 -04:00

config_registry

Enabled AVX-512 kernels for Zen4 config

2022-06-03 06:34:35 +00:00

configure

BLIS cpuid: distinguish submodels within a microarchitecture

2023-04-20 10:47:44 -04:00

CONTRIBUTING.md

Minor changes to README.md and CONTRIBUTING.md.

2018-05-17 16:38:49 -05:00

CREDITS

Fixed out-of-bounds bug in sup s6x16m haswell kernel.

2022-07-31 21:10:58 +05:30

INSTALL

INSTALL file update.

2018-08-07 14:21:07 -05:00

LICENSE

Updated version and copyright notice.

2022-05-17 18:10:39 +05:30

Makefile

Fixed Compilation Fails when configured with --disable-blas

2023-03-23 06:11:52 -04:00

README.md

README File Update

2023-05-25 14:46:33 +00:00

RELEASING

Minor updates/elaborations to RELEASING file.

2020-04-06 15:01:53 -05:00

so_version

Updated blis library version string to 4.0.1

2022-11-24 10:35:34 +05:30

version

Updated blis library version string to 4.0.1

2022-11-24 10:35:34 +05:30

README.md

AOCL-BLAS library

AOCL-BLAS is AMD's optimized version of BLAS targeted for AMD EPYC and Ryzen CPUs. It is developed as a fork of BLIS (https://github.com/flame/blis), which is developed by members of the Science of High-Performance Computing (SHPC) group in the Institute for Computational Engineering and Sciences at The University of Texas at Austin and other collaborators (including AMD). All known features and functionality of BLIS are retained and supported in the AOCL-BLAS library. AOCL-BLAS is regularly updated with improvements from the upstream repository.

AOCL-BLAS is optimized with the SSE2, AVX2, and AVX512 instruction sets, which are enabled based on the target Zen architecture using the dynamic dispatch feature. All prominent Level 3, Level 2, and Level 1 APIs are designed and optimized with specific paths targeting different size ranges, e.g., small, medium, and large sizes. These algorithms are designed and customized to exploit the architectural improvements of the target platform.
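
As a rough illustration of the dynamic dispatch idea, the sketch below selects a kernel through a function pointer at run time. It is not the AOCL-BLAS mechanism: the `isa_level` enum, the `dot_generic` and `dot_unrolled4` kernels, and `select_dot_kernel` are all hypothetical, and a four-way unrolled loop stands in for a real SIMD kernel. A real library would determine the instruction-set level by querying the CPU, which is omitted here.

```c
#include <stddef.h>

/* Hypothetical instruction-set levels; a real library would derive
   this from CPU feature detection, which this sketch does not do. */
typedef enum { ISA_GENERIC, ISA_AVX2, ISA_AVX512 } isa_level;

/* Common signature shared by every dot-product kernel. */
typedef double (*dot_fn)(size_t n, const double *x, const double *y);

/* Reference kernel: plain scalar loop. */
double dot_generic(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Stand-in for an optimized kernel: four independent accumulators,
   followed by a scalar loop for the remaining elements. */
double dot_unrolled4(size_t n, const double *x, const double *y)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    double sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Pick a kernel once, based on the detected level; callers then
   invoke it through the returned function pointer. */
dot_fn select_dot_kernel(isa_level isa)
{
    return (isa == ISA_GENERIC) ? dot_generic : dot_unrolled4;
}
```

A caller would typically run the selection once at initialization, store the returned pointer, and reuse it for every subsequent call, so the per-call cost is a single indirect function call.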

For detailed instructions on how to configure, build, install, and link against AOCL-BLAS on AMD CPUs, please refer to the AOCL User Guide on the AMD developer portal.

The upstream repository (https://github.com/flame/blis) contains further information on BLIS, including background information on BLIS design, usage examples, and a complete BLIS API reference.

AOCL-BLAS is developed and maintained by AMD. You can contact us by email at toolchainsupport@amd.com. You can also raise issues or suggestions on the GitHub repository at https://github.com/amd/blis/issues.

Languages

C 86.3%

C++ 9.5%

Fortran 1.9%

Makefile 0.8%

MATLAB 0.5%

Other 0.9%