blis/bench at 32bbd966521c0de437cfb63f87fc7d72ecfa43b6 - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-13 10:35:38 +00:00

mkadavil 5e510727a9 Masked load/store to replace copy macros in u8s8s32 micro-kernels.

-As part of an earlier optimization, the memcpy calls in the k fringe
((k % 4) != 0 case, needed to use the vpdpbusd instruction) and the n
fringe (n < 16, beta scaling and C store) were replaced with copy macros
specialized for fewer than 4 and fewer than 16 elements respectively.
Further analysis showed that masked load/broadcast and masked store
perform better on average than the copy macros. The copy macros contain
more if conditions, which cause more branching and therefore
performance variation. Code generation for the copy macros also varied
considerably across compilers because of the extra conditional code.
-This change replaces the copy macros entirely with masked
load/broadcast/store. Performance was observed to be better and less
prone to variation for the k fringe and n fringe (< 16) cases.
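
The n fringe idea described above can be sketched in portable C. This is a scalar model of masked load and masked store semantics, not the code from the u8s8s32 micro-kernels, which this page does not show. All names here (fringe_mask, masked_load_i32, masked_store_i32, n_fringe_update, LANES) are hypothetical, and the 16-lane width assumes int32 elements in a 512-bit register. A real kernel would presumably use AVX-512 masked intrinsics such as _mm512_maskz_loadu_epi32 and _mm512_mask_storeu_epi32 in place of the loops.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* assumed: int32 lanes in one 512-bit register */

/* Build a lane mask with the low n_rem bits set (0 <= n_rem <= LANES).
   The mask is computed arithmetically, so no per-size branch is needed. */
static uint16_t fringe_mask(int n_rem)
{
    assert(n_rem >= 0 && n_rem <= LANES);
    return (uint16_t)((1u << n_rem) - 1u);
}

/* Scalar model of a zero-masking load: lanes whose mask bit is set are
   read from src; all other lanes become 0 and src is not read there. */
static void masked_load_i32(int32_t dst[LANES], uint16_t mask,
                            const int32_t *src)
{
    for (int i = 0; i < LANES; ++i)
        dst[i] = ((mask >> i) & 1u) ? src[i] : 0;
}

/* Scalar model of a masked store: only lanes whose mask bit is set are
   written; memory behind the other lanes is left untouched. */
static void masked_store_i32(int32_t *dst, uint16_t mask,
                             const int32_t src[LANES])
{
    for (int i = 0; i < LANES; ++i)
        if ((mask >> i) & 1u)
            dst[i] = src[i];
}

/* n fringe update for one row of C: c[j] = beta * c[j] + acc[j] for
   j < n_rem. Elements at or beyond n_rem are neither read nor written. */
static void n_fringe_update(int32_t *c, const int32_t acc[LANES],
                            int32_t beta, int n_rem)
{
    uint16_t mask = fringe_mask(n_rem);
    int32_t c_reg[LANES];

    masked_load_i32(c_reg, mask, c);
    for (int i = 0; i < LANES; ++i)
        c_reg[i] = beta * c_reg[i] + acc[i];
    masked_store_i32(c, mask, c_reg);
}
```

Calling n_fringe_update(c, acc, beta, n_rem) with n_rem = 5 touches only c[0] through c[4]. The same code path handles every n_rem below 16, whereas a copy macro specialized by size would branch on the remainder.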

AMD-Internal: [CPUPL-3173]
Change-Id: I73e6e65302ecf02e1397541b4a32b2a536f19503

2023-04-13 09:17:26 -04:00

bench_aocl_gemm

Masked load/store to replace copy macros in u8s8s32 micro-kernels.

2023-04-13 09:17:26 -04:00

bench_amaxv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_axpbyv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_copyv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_dotv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_gemm.c

Integrated 32x6 DGEMM kernel for zen4 and its related changes are added.

2023-01-19 23:11:36 +05:30

bench_gemmt.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_gemv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_ger.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_nrm2.c

Adding AVX2 support for DNRM2

2022-09-20 06:05:01 -04:00

bench_scalv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_swapv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_syrk.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_trsm.c

Fixed Bug in bench_trsm.c

2022-07-25 15:38:30 +00:00

bench_trsv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

CMakeLists.txt

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

inputamaxv.txt

Bench addition for amaxv API

2021-06-04 17:45:04 +05:30

inputaxpbyv.txt

Optimized AXPBYV Kernel using AVX2 Intrinsics

2022-01-05 04:19:11 -05:00

inputcopy.txt

Added bench utility for copyv API

2021-06-09 12:29:49 +05:30

inputdotv.txt

Added bench utility for dotv and scalv APIs.

2021-05-21 10:00:32 +05:30

inputgemm.txt

AOCL DTL - Added thread and execution time details in logs

2021-11-12 08:58:54 +05:30

inputgemmt.txt

Added bench app for syrk - input is a log file generated from AOCL_DTL

2021-05-11 14:57:51 +05:30

inputgemv.txt

Fixed crash issue in bench utility for gemv API

2021-05-19 14:21:09 +05:30

inputger.txt

Added bench utility for ger API.

2021-05-19 14:05:01 +05:30

inputnrm2.txt

Adding AVX2 support for DNRM2

2022-09-20 06:05:01 -04:00

inputscalv.txt

Added bench utility for dotv and scalv APIs.

2021-05-21 10:00:32 +05:30

inputswap.txt

Added bench utility for swapv API

2021-06-09 17:05:00 +05:30

inputsyrk.txt

Added bench app for syrk - input is a log file generated from AOCL_DTL

2021-05-11 14:57:51 +05:30

inputtrsm.txt

Trsm bench utility mismatch DTL logs and bench

2021-11-12 08:58:52 +05:30

inputtrsv.txt

Bench trsv logging error

2021-06-08 11:54:55 +05:30

Makefile

Adding AVX2 support for DNRM2

2022-09-20 06:05:01 -04:00