blis/addon at 46965dfc57f9549131d3c92e52978c5d1ab2f6ce - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

mkadavil 63ee4c5e4c Remove memcpy usage in u8s8s32 lpgemm micro kernels.

-Currently, memcpy is used in the u8s8s32 micro-kernels for copying in the
k fringe loop ((k % 4) != 0) and in the NR' < 16 fringe kernels. However,
for small k/n dimensions, the memcpy invocation has high overhead.
-This is fixed by replacing memcpy with a macro-based copy routine
optimized specifically for the sizes encountered in the fringe cases
(k < 4, NR' < 16).
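
To illustrate the idea behind this change, here is a minimal scalar sketch of a macro-based copy for lengths below 16 bytes. It is not the code from the commit: the names SMALL_COPY_LT16, COPY_CHUNK_ and small_copy_matches_memcpy are hypothetical, and the real micro-kernels may use a different scheme. The sketch only shows how a small, bounded copy can be expanded inline instead of going through a library call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a fixed number of bytes and advance both
 * pointers. The trip count is a compile-time constant at every use. */
#define COPY_CHUNK_(d, s, len)                            \
    do {                                                  \
        for (size_t i_ = 0; i_ < (len); ++i_)             \
            (d)[i_] = (s)[i_];                            \
        (d) += (len);                                     \
        (s) += (len);                                     \
    } while (0)

/* Hypothetical macro (not taken from the BLIS sources): copy n bytes,
 * 0 <= n < 16, by testing each bit of n. At most four fixed-size chunks
 * (8, 4, 2, 1 bytes) are copied, all expanded inline at the call site. */
#define SMALL_COPY_LT16(dst, src, n)                      \
    do {                                                  \
        uint8_t *d_ = (uint8_t *)(dst);                   \
        const uint8_t *s_ = (const uint8_t *)(src);       \
        size_t n_ = (size_t)(n);                          \
        if (n_ & 8) COPY_CHUNK_(d_, s_, 8);               \
        if (n_ & 4) COPY_CHUNK_(d_, s_, 4);               \
        if (n_ & 2) COPY_CHUNK_(d_, s_, 2);               \
        if (n_ & 1) COPY_CHUNK_(d_, s_, 1);               \
    } while (0)

/* Returns 1 if the macro matches memcpy for every length 0..15,
 * including leaving the bytes past the copied length untouched. */
static int small_copy_matches_memcpy(void)
{
    uint8_t src[16], a[16], b[16];

    for (size_t i = 0; i < 16; ++i)
        src[i] = (uint8_t)(i * 7 + 1);

    for (size_t n = 0; n < 16; ++n) {
        memset(a, 0xAA, sizeof a);
        memset(b, 0xAA, sizeof b);
        SMALL_COPY_LT16(a, src, n);
        memcpy(b, src, n);
        if (memcmp(a, b, sizeof a) != 0)
            return 0;
    }
    return 1;
}
```

The bit-test layout is one possible design choice: it avoids a variable-length loop, so the compiler can unroll each chunk. Whether this beats memcpy depends on the compiler and target, so it should be measured rather than assumed.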

AMD-Internal: [CPUPL-3008]
Change-Id: I376bab0aac325832e42e370b291614e5fd5272dc

2023-02-16 05:52:19 -05:00

aocl_gemm

Remove memcpy usage in u8s8s32 lpgemm micro kernels.

2023-02-16 05:52:19 -05:00

gemmd

Added support for addons.

2022-03-31 12:03:27 +05:30