Mirror of https://github.com/amd/blis.git (synced 2026-05-13 10:35:38 +00:00)
Softmax is often used as the last activation function in a neural network: softmax(x_i) = exp(x_i) / (exp(x_0) + exp(x_1) + ... + exp(x_n)). This step happens after the final low-precision GEMM computation, so it is useful to have softmax functionality that can be invoked as part of the lpgemm workflow.

To support this, a new API, aocl_softmax_f32, is introduced as part of aocl_gemm. It computes the element-wise softmax of a matrix/vector of floats. The API invokes ISA-specific vectorized micro-kernels (vectorized only when incx = 1), and a cntx-based mechanism (similar to lpgemm_cntx) is used to dispatch to the appropriate kernel.

AMD-Internal: [CPUPL-3247]
Change-Id: If15880360947435985fa87b6436e475571e4684a