blis/bench at aaa9c1ac09c79e60dd3f908c75dbf66274ab438e - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-25 19:04:32 +00:00

Bhaskar Nallani 2ce47e6f5e Implemented optimal AVX512-variant of f32 LPGEMV

1. The 5-loop LPGEMM path is inefficient when A or B is a vector
   (i.e., m == 1 or n == 1).

2. An efficient implementation of lpgemv_rowvar_f32 is developed. It
   accounts for the B matrix reorder when m = 1 and for post-ops fusion.

3. When m = 1, the algorithm divides the GEMM workload along the n
   dimension at a granularity of NR. Each thread works on A: 1xk and
   B: kx(>=NR) and produces C: 1x(>=NR). K is unrolled by 4, with a
   remainder loop for the leftover iterations.

4. When n = 1, the algorithm divides the GEMM workload along the m
   dimension at a granularity of MR. Each thread works on A: (>=MR)xk
   and B: kx1 and produces C: (>=MR)x1. When n = 1, reordering of B is
   avoided so that B can be processed efficiently in the n = 1 kernel.

5. Fixed a few warnings raised when loading 2 f32 bias elements with
   _mm_load_sd through a float pointer. The pointer is now typecast to
   (const double *).
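
The m = 1 partitioning in item 3 can be sketched in plain C as below. This
is a scalar illustration only, not the AVX512 kernel from this commit: the
names split_n_by_nr, gemv_m1_range and gemv_m1 are hypothetical, B is
assumed to be row-major and not reordered, and the loop over tid stands in
for real threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BLIS function): gives thread `tid` of `nt`
 * a contiguous column range [*start, *end) of an n-wide problem. Work is
 * handed out in whole NR-wide blocks; only the final block can be partial. */
static void split_n_by_nr(size_t n, size_t nr, size_t nt, size_t tid,
                          size_t *start, size_t *end)
{
    assert(nr > 0 && nt > 0 && tid < nt);
    size_t blocks = (n + nr - 1) / nr;   /* number of NR-wide blocks */
    size_t base = blocks / nt;           /* blocks every thread gets */
    size_t extra = blocks % nt;          /* first `extra` threads get one more */
    size_t first = tid * base + (tid < extra ? tid : extra);
    size_t count = base + (tid < extra ? 1 : 0);
    *start = first * nr;
    *end = (first + count) * nr;
    if (*start > n) *start = n;          /* clamp ranges that pass the edge */
    if (*end > n) *end = n;
}

/* Computes c[j] = sum over p of a[p] * b[p*ldb + j] for j in [j0, j1).
 * The k loop is unrolled by 4 and followed by a remainder loop. */
static void gemv_m1_range(const float *a, const float *b, size_t ldb,
                          float *c, size_t k, size_t j0, size_t j1)
{
    for (size_t j = j0; j < j1; ++j) {
        float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
        size_t p = 0;
        for (; p + 4 <= k; p += 4) {     /* main loop: 4 k-iterations per pass */
            acc0 += a[p]     * b[(p)     * ldb + j];
            acc1 += a[p + 1] * b[(p + 1) * ldb + j];
            acc2 += a[p + 2] * b[(p + 2) * ldb + j];
            acc3 += a[p + 3] * b[(p + 3) * ldb + j];
        }
        float acc = (acc0 + acc1) + (acc2 + acc3);
        for (; p < k; ++p)               /* remainder loop: k % 4 iterations */
            acc += a[p] * b[p * ldb + j];
        c[j] = acc;
    }
}

/* Sequential stand-in for the threaded driver: each iteration of the
 * tid loop plays the role of one thread working on its own column range. */
static void gemv_m1(const float *a, const float *b, size_t ldb, float *c,
                    size_t k, size_t n, size_t nr, size_t nt)
{
    for (size_t tid = 0; tid < nt; ++tid) {
        size_t j0, j1;
        split_n_by_nr(n, nr, nt, tid, &j0, &j1);
        gemv_m1_range(a, b, ldb, c, k, j0, j1);
    }
}
```

Under the same assumptions, the n = 1 case in item 4 would split the m
dimension in blocks of MR in the same way, with each range producing a
slice of the C column instead of the C row.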

AMD-Internal: [SWLCSG-2391, SWLCSG-2353]
Change-Id: If1d0b8d59e0278f5f16b499de1d629e63da5b599

2024-03-04 23:53:23 +05:30

bench_aocl_gemm

Implemented optimal AVX512-variant of f32 LPGEMV

2024-03-04 23:53:23 +05:30

bench_amaxv.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_axpbyv.c

Code cleanup: No newline at end of file

2023-04-21 10:02:48 -04:00

bench_copyv.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_dotv.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_gemm_pack_compute.c

Code cleanup: No newline at end of file

2023-11-22 17:11:10 -05:00

bench_gemm.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_gemmt.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_gemv.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_ger.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_nrm2.c

Adding AVX2 support for DNRM2

2022-09-20 06:05:01 -04:00

bench_scalv.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_swapv.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_syrk.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_trsm.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

bench_trsv.c

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

CMakeLists.txt

CMake: Added cmake for bench

2024-02-06 06:50:34 -05:00

inputamaxv.txt

Bench addition for amaxv API

2021-06-04 17:45:04 +05:30

inputaxpbyv.txt

Optimized AXPBYV Kernel using AVX2 Intrinsics

2022-01-05 04:19:11 -05:00

inputcopy.txt

Added bench utility for copyv API

2021-06-09 12:29:49 +05:30

inputdotv.txt

Added bench utility for dotv and scalv APIs.

2021-05-21 10:00:32 +05:30

inputgemm.txt

AOCL DTL - Added thread and execution time details in logs

2021-11-12 08:58:54 +05:30

inputgemmpackcompute.txt

Code cleanup: No newline at end of file

2023-11-22 17:11:10 -05:00

inputgemmt.txt

Added bench app for syrk - input is a log file generated from AOCL_DTL

2021-05-11 14:57:51 +05:30

inputgemv.txt

Fixed crash issue in bench utility for gemv API

2021-05-19 14:21:09 +05:30

inputger.txt

Added bench utility for ger API.

2021-05-19 14:05:01 +05:30

inputnrm2.txt

Code cleanup: No newline at end of file

2023-04-21 10:02:48 -04:00

inputscalv.txt

Added bench utility for dotv and scalv APIs.

2021-05-21 10:00:32 +05:30

inputswap.txt

Added bench utility for swapv API

2021-06-09 17:05:00 +05:30

inputsyrk.txt

Added bench app for syrk - input is a log file generated from AOCL_DTL

2021-05-11 14:57:51 +05:30

inputtrsm.txt

Trsm bench utility mismatch between DTL logs and bench

2021-11-12 08:58:52 +05:30

inputtrsv.txt

Bench trsv logging error

2021-06-08 11:54:55 +05:30

Makefile

BLAS Extension API - ?gemm_compute()

2023-10-16 08:18:52 -04:00