mirror of https://github.com/amd/blis.git
Softmax is often used as the last activation function in a neural network: softmax(x_i) = exp(x_i) / (exp(x_0) + exp(x_1) + ... + exp(x_n)). This step happens after the final low-precision GEMM computation, so it helps to have softmax functionality that can be invoked as part of the LPGEMM workflow. To support this, a new API, aocl_softmax_f32, is introduced as part of aocl_gemm. This API computes the element-wise softmax of a matrix/vector of floats. It invokes ISA-specific vectorized micro-kernels (vectorized only when incx = 1), and a cntx-based mechanism (similar to lpgemm_cntx) is used to dispatch to the appropriate kernel.

AMD-Internal: [CPUPL-3247]
Change-Id: If15880360947435985fa87b6436e475571e4684a
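The commit does not show the aocl_softmax_f32 signature, so the snippet below is only a minimal scalar sketch of the softmax formula from the commit message, with the stride parameter incx that the kernels key off (vectorized kernels fire only when incx = 1). The function name softmax_f32_ref and the max-subtraction step (standard practice to keep expf from overflowing) are assumptions, not part of the library:

```c
#include <math.h>
#include <stddef.h>

/* Reference element-wise softmax over n floats read/written with stride
 * incx. Hypothetical helper, NOT the aocl_softmax_f32 API itself. */
static void softmax_f32_ref(const float *x, float *y, size_t n, size_t incx)
{
    /* Subtract the running max before exponentiating so that expf()
     * of large inputs does not overflow (assumed, standard practice). */
    float xmax = x[0];
    for (size_t i = 1; i < n; ++i)
        if (x[i * incx] > xmax) xmax = x[i * incx];

    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        y[i * incx] = expf(x[i * incx] - xmax);
        sum += y[i * incx];
    }
    for (size_t i = 0; i < n; ++i)
        y[i * incx] /= sum;   /* softmax(x_i) = exp(x_i) / sum_j exp(x_j) */
}
```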
34 lines
618 B
Plaintext
f32_softmax 1 1
f32_softmax 2 1
f32_softmax 4 1
f32_softmax 21 1
f32_softmax 64 1
f32_gelu_tanh 1 1
f32_gelu_tanh 2 1
f32_gelu_tanh 8 1
f32_gelu_tanh 16 1
f32_gelu_tanh 21 1
f32_gelu_tanh 64 1
f32_gelu_tanh 1029 1
f32_gelu_erf 1 1
f32_gelu_erf 2 1
f32_gelu_erf 8 1
f32_gelu_erf 16 1
f32_gelu_erf 21 1
f32_gelu_erf 64 1
f32_gelu_erf 1029 1
f32_gelu_tanh 1 9
f32_gelu_tanh 2 9
f32_gelu_tanh 8 9
f32_gelu_tanh 16 1024
f32_gelu_tanh 21 1024
f32_gelu_tanh 64 1024
f32_gelu_tanh 1029 512
f32_gelu_erf 1 9
f32_gelu_erf 2 9
f32_gelu_erf 8 9
f32_gelu_erf 16 1024
f32_gelu_erf 21 1024
f32_gelu_erf 64 1024
f32_gelu_erf 1029 512
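Each line above appears to encode one bench case as three columns: an operation name, a vector length n, and a stride incx (the incx = 1 cases exercise the vectorized path mentioned in the commit message). A minimal sketch, assuming that three-column format; the filename and all identifiers here are illustrative, not from the BLIS bench sources:

```c
#include <stdio.h>

/* Hypothetical driver that reads "<op_name> <n> <incx>" per line. */
int main(void)
{
    char op[64];
    long n, incx;
    FILE *fp = fopen("softmax_gelu_input.txt", "r"); /* assumed filename */
    if (!fp) return 1;
    while (fscanf(fp, "%63s %ld %ld", op, &n, &incx) == 3) {
        /* One bench case per line: run `op` on n floats with stride incx. */
        printf("op=%s n=%ld incx=%ld\n", op, n, incx);
    }
    fclose(fp);
    return 0;
}
```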