blis/bench at 758d68467ff5221aa5947d64e18f00fc15069021 - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-07-18 09:37:52 +00:00

mkadavil 3870792e62 Low precision gemm s32 downscale optimization.

-The post-operation attributes are moved to a new struct,
lpgemm_post_op_attr, and an object of this struct is passed to the
low precision gemm kernels in place of multiple parameters.
-The u8s8s32s8 api (downscale api) performs poorly when the k
value is small (k < KC). Two scenarios are observed here:
a. beta = 0: Currently, for the downscale api, a temporary buffer is
used to accumulate intermediate s32 output, so that it can be used
in later iterations of the pc loop (k dim). The store to this buffer
can be avoided if k < KC. Here intermediate accumulation
is not required, since after the first iteration of the pc loop
the output can be downscaled and stored.
b. beta != 0: In this case the existing values of the original s8 C
output matrix need to be converted to s32 and beta scaled. Currently
the s8 values are converted to s32 and stored in a temporary buffer in
the pc loop (5 loop algorithm) in blocks of mxNC. This temporary buffer
is passed to the micro kernel, which applies beta scaling to it.
However, the mxNC block copy is costly and can be avoided if a new
condition is introduced for beta scaling in the micro kernel, whereby
the original s8 data is loaded directly into a register (instead of
from the temporary buffer), converted to s32 and beta scaled.

AMD-Internal: [CPUPL-2884]
Change-Id: Id9b4650d500e1b553e48c4f1e4c902b3f553211c

2023-01-10 13:15:22 +05:30
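
The k < KC path described in the commit message above can be sketched in scalar C. This is only an illustration under stated assumptions, not the kernel code from the commit: the function names, the row-major layout, the integer beta, the float scale factor and the truncating downscale are all hypothetical, and the real micro kernels are vectorized. It shows a single pass over k in which no temporary s32 buffer is written: when beta != 0 the original s8 value of C is loaded, widened to s32 and beta scaled directly, and the result is downscaled and stored at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical downscale step: scale an s32 accumulator and saturate
 * it to the s8 range. The float-to-int cast truncates toward zero;
 * the rounding mode is an assumption of this sketch. */
static int8_t downscale_s8(int32_t acc, float scale)
{
    float v = (float)acc * scale;
    if (v > 127.0f)  return INT8_MAX;
    if (v < -128.0f) return INT8_MIN;
    return (int8_t)v;
}

/* Hypothetical scalar stand-in for one tile of a u8s8s32s8 gemm when
 * the whole k dimension fits in a single pc-loop iteration (k < KC).
 * a is m x k (u8), b is k x n (s8), c is m x n (s8), all row-major. */
static void tile_u8s8s32s8(size_t m, size_t n, size_t k,
                           const uint8_t *a, const int8_t *b,
                           int8_t *c, int32_t beta, float scale)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* Accumulate in s32, kept in a local (a register). */
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p)
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];

            /* beta != 0: load the original s8 value of C, widen it to
             * s32 and beta scale it, with no temporary s32 copy of C. */
            if (beta != 0)
                acc += beta * (int32_t)c[i * n + j];

            /* Only one pc iteration, so the result is final and can be
             * downscaled and stored straight away. */
            c[i * n + j] = downscale_s8(acc, scale);
        }
    }
}
```

For example, with a 1x2 A = {1, 2}, a 2x1 B = {3, 4}, C = {10}, beta = 2 and scale = 0.5, the accumulator is 11 + 2*10 = 31 and the stored value is 15 under the truncating downscale assumed here.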

bench_aocl_gemm

Low precision gemm s32 downscale optimization.

2023-01-10 13:15:22 +05:30

bench_amaxv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_axpbyv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_copyv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_dotv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_gemm.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_gemmt.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_gemv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_ger.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_nrm2.c

Adding AVX2 support for DNRM2

2022-09-20 06:05:01 -04:00

bench_scalv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_swapv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_syrk.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

bench_trsm.c

Fixed Bug in bench_trsm.c

2022-07-25 15:38:30 +00:00

bench_trsv.c

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

CMakeLists.txt

AOCL-WINDOWS: Added the windows build system to build bench folder on windows.

2022-06-27 22:32:39 -04:00

inputamaxv.txt

Bench addition for amaxv API

2021-06-04 17:45:04 +05:30

inputaxpbyv.txt

Optimized AXPBYV Kernel using AVX2 Intrinsics

2022-01-05 04:19:11 -05:00

inputcopy.txt

Added bench utility for copyv API

2021-06-09 12:29:49 +05:30

inputdotv.txt

Added bench utility for dotv and scalv APIs.

2021-05-21 10:00:32 +05:30

inputgemm.txt

AOCL DTL - Added thread and execution time details in logs

2021-11-12 08:58:54 +05:30

inputgemmt.txt

Added bench app for syrk - input is a log file generated from AOCL_DTL

2021-05-11 14:57:51 +05:30

inputgemv.txt

Fixed crash issue in bench utility for gemv API

2021-05-19 14:21:09 +05:30

inputger.txt

Added bench utility for ger API.

2021-05-19 14:05:01 +05:30

inputnrm2.txt

Adding AVX2 support for DNRM2

2022-09-20 06:05:01 -04:00

inputscalv.txt

Added bench utility for dotv and scalv APIs.

2021-05-21 10:00:32 +05:30

inputswap.txt

Added bench utility for swapv API

2021-06-09 17:05:00 +05:30

inputsyrk.txt

Added bench app for syrk - input is a log file generated from AOCL_DTL

2021-05-11 14:57:51 +05:30

inputtrsm.txt

Trsm bench utility mismatch DTL logs and bench

2021-11-12 08:58:52 +05:30

inputtrsv.txt

Bench trsv logging error

2021-06-08 11:54:55 +05:30

Makefile

Adding AVX2 support for DNRM2

2022-09-20 06:05:01 -04:00