Details:
- Disabled the intrinsics code of the f32obf16 pack function
  for gcc < 11.2, since those compiler versions do not support
  the instructions used in the kernels.
- Added an early-return check to the WOQ APIs when compiling
  with gcc < 11.2.
- Fixed the code that checks whether JIT kernels are generated
  inside the batch_gemm API for the bf16 datatype.
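The compiler gate described above can be illustrated with a small sketch. The function names below are hypothetical; the real check is done at build time with preprocessor macros (__GNUC__ / __GNUC_MINOR__), not at runtime:

```python
# Illustrative sketch of the gcc >= 11.2 gate, with hypothetical names.

def gcc_supports_bf16_intrinsics(major, minor):
    """True for gcc >= 11.2, where the bf16 pack intrinsics compile."""
    return (major, minor) >= (11, 2)

def woq_api(major, minor):
    """Hypothetical WOQ entry point: early-return when the compiler
    cannot generate the required instructions."""
    if not gcc_supports_bf16_intrinsics(major, minor):
        return None  # early return instead of calling unsupported kernels
    return "run_woq_kernel"
```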
AMD Internal: [CPUPL-6327]
Change-Id: I0a017c67eb9d9d22a14e095e435dc397e265fb0a
Description:
1. Previously, the bias data type was determined solely by the
   output data type.
2. An option is added to the pre-ops structure to select the bias
   data type (s8/s32/bf16) irrespective of the storage data type
   in the u8s8s32/s8s8s32 APIs.
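The selection logic can be sketched as follows. Field and type names here are illustrative assumptions, not the library's actual pre-ops struct members:

```python
# Hypothetical sketch of bias-type selection for the u8s8s32/s8s8s32 paths.

# Default behaviour (and previously the only behaviour): the bias type
# follows the output data type.
DEFAULT_BIAS_TYPE = {"s32": "s32", "bf16": "bf16"}

def select_bias_type(output_type, preops_bias_type=None):
    """An explicit s8/s32/bf16 choice in the pre-ops structure wins
    over the output-type-based default."""
    if preops_bias_type is not None:
        if preops_bias_type not in ("s8", "s32", "bf16"):
            raise ValueError("bias type must be s8, s32 or bf16")
        return preops_bias_type
    return DEFAULT_BIAS_TYPE[output_type]
```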
AMD-Internal: SWLCSG-3302
Change-Id: I3c465fe428672d2d58c1c60115c46d2d5b11f0f4
Details:
- The batch matmul performs a series of matmuls, processing
more than one GEMM problem at once.
- Introduced a new parameter, batch_size, for the user to
  indicate the number of GEMM problems in a batch/group.
- This operation supports processing GEMM problems with
  different parameters, including dims, post-ops, storage
  schemes, etc.
- This operation is optimized for problems where all the
  GEMMs in a batch are of the same size and shape.
- For now, threads are distributed equally among the GEMM
  problems irrespective of their dimensions, which yields
  better performance for batches of identical GEMMs but is
  sub-optimal for batches of non-identical GEMMs.
- Optimizations for batches with non-identical GEMMs are in
  progress.
- Added bench and input files for batch_matmul.
- Added logger functionality for batch_matmul APIs.
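The equal thread distribution described above can be sketched as below. This is a hypothetical helper, not the library's internal partitioning code:

```python
def distribute_threads(n_threads, batch_size):
    """Split n_threads equally among batch_size GEMM problems,
    irrespective of each problem's dimensions; leftover threads go
    to the first few problems one at a time."""
    base, rem = divmod(n_threads, batch_size)
    return [base + (1 if i < rem else 0) for i in range(batch_size)]
```

With identical GEMMs every problem finishes at the same time; with non-identical GEMMs this dimension-blind split leaves threads on small problems idle, which is the sub-optimal case noted above.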
AMD-Internal: [SWLCSG-2944]
Change-Id: I83e26c1f30a5dd5a31139f6706ac74be0aa6bd9a
- As it stands, the buffer type in the matrix add|mul post-ops is
  expected to be the same as the output C matrix type. This
  limitation is now removed: the user can specify the buffer type
  by setting the stor_type attribute in the add|mul post-op
  struct. As of now, int8, int32, bfloat16, and float types are
  supported for the buffer in the s32 micro-kernels. The same
  support is also added for the bf16 micro-kernels, with bfloat16
  and float supported for now.
- Additionally, the values (from the buffer) were previously
  added/multiplied as-is to the output registers while performing
  the matrix add|mul post-ops. Support is added for scaling these
  values before using them in the post-ops. Both scalar and
  vector scale_factors are supported.
- The bias_stor_type attribute is renamed to stor_type in the
  bias post-ops.
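A sketch of the relaxed buffer typing, with hypothetical names (the real conversion happens on vector registers inside the micro-kernels):

```python
# Buffer stor_types accepted per micro-kernel family, as described above.
SUPPORTED_STOR_TYPES = {
    "s32": ("int8", "int32", "bfloat16", "float"),
    "bf16": ("bfloat16", "float"),
}

def matrix_add_postop(acc_row, buf_row, stor_type, kernel="s32"):
    """Add post-op buffer values (stored as stor_type) to the
    accumulator row, converting each element to the accumulator
    domain first."""
    if stor_type not in SUPPORTED_STOR_TYPES[kernel]:
        raise ValueError(f"{stor_type} buffer unsupported in {kernel} kernels")
    return [a + float(b) for a, b in zip(acc_row, buf_row)]
```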
AMD-Internal: [SWLCSG-3319]
Change-Id: I4046ab84481b02c55a71ebb7038e38aec840c0fa
- Currently, the values (from the buffer) are added/multiplied
  as-is to the output registers while performing the matrix
  add/mul post-ops. Support is added for scaling these values
  before using them in the post-ops. Both scalar and vector
  scale_factors are supported.
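The scalar vs. vector scale_factor behaviour can be sketched as below (hypothetical helper; in the library the scaling is applied to the register values inside the kernels):

```python
def scale_buffer(buf_row, scale):
    """Scale post-op buffer values before the add/mul.
    `scale` may be a scalar (applied to every element) or a vector
    (one factor per element)."""
    if isinstance(scale, (int, float)):
        return [v * scale for v in buf_row]
    if len(scale) != len(buf_row):
        raise ValueError("vector scale_factor length must match the row")
    return [v * s for v, s in zip(buf_row, scale)]
```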
AMD-Internal: [SWLCSG-3181]
Change-Id: Ifdb7160a1ea4f5ecccfa3ef31ecfed432898c14d
Details:
- The batch matmul performs a series of matmuls, processing
more than one GEMM problem at once.
- Introduced a new parameter, batch_size, for the user to
  indicate the number of GEMM problems in a batch/group.
- This operation supports processing GEMM problems with
  different parameters, including dims, post-ops, storage
  schemes, etc.
- This operation is optimized for problems where all the
  GEMMs in a batch are of the same size and shape.
- For now, threads are distributed equally among the GEMM
  problems irrespective of their dimensions, which yields
  better performance for batches of identical GEMMs but is
  sub-optimal for batches of non-identical GEMMs.
- Optimizations for batches with non-identical GEMMs are in
  progress.
- Added bench and input files for batch_matmul.
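A minimal pure-Python reference for grouped GEMMs with differing dimensions, assuming a hypothetical batch interface (the real API takes arrays of per-problem parameters rather than Python lists):

```python
def matmul(A, B):
    """Naive reference GEMM: C = A @ B for row-major nested lists."""
    k = len(B)
    return [[sum(A[i][p] * B[p][j] for p in range(k))
             for j in range(len(B[0]))] for i in range(len(A))]

def batch_matmul(batch):
    """Process batch_size GEMM problems, each possibly with different
    dims; each entry is an (A, B) pair."""
    return [matmul(A, B) for A, B in batch]
```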
AMD-Internal: [SWLCSG-2944]
Change-Id: Idc59db5b8c5794bf19f6f86bcb8455cd2599c155