-A light-weight mechanism/framework to log input details and a
stringified version of the post-ops structure is added to LPGEMM.
Additionally the runtime of the API is also logged.
The logging framework logs to a file with filename following the format
aocl_gemm_log_<PID>_<TID>.txt.
-To enable this feature, the AOCL_LPGEMM_LOGGER_SUPPORT=1 macro needs to
be defined when compiling BLIS (with aocl_gemm addon enabled) by passing
CFLAGS="-DAOCL_LPGEMM_LOGGER_SUPPORT=1" to ./configure. Additionally
AOCL_ENABLE_LPGEMM_LOGGER=1 has to be exported in the environment during
LPGEMM runtime.
AMD-Internal: [SWLCSG-3280]
Change-Id: I30bfb35b2dc412df70044601b335938fc9f49cfb
Details:
- The batch matmul performs a series of matmuls, processing
more than one GEMM problem at once.
- Introduced a new parameter called batch_size for the user
to indicate number of GEMM problems in a batch/group.
- This operation supports processing GEMM problems with
different parameters including dims,post-ops,stor-schemes etc.,
- This operation is optimized for problems where all the
GEMMs in a batch are of same size and shape.
- For now, the threads are distributed among different GEMM
problems equally irrespective of their dimensions which
leads to better performance for batches with identical GEMMs
but performs sub-optimally for batches with non-identical GEMMs.
- Optimizations for batches with non-identical GEMMs is in progress.
- Added bench and input files for batch_matmul.
AMD-Internal: [SWLCSG-2944]
Change-Id: Idc59db5b8c5794bf19f6f86bcb8455cd2599c155
-This API supports applying element wise operations (eg: post-ops) on a
bfloat16 input matrix to get an output matrix of the same(bfloat16) or
upscaled data type (float).
-Benchmarking/testing framework for the same is added.
AMD Internal: SWLCSG-2947
Change-Id: I43f1c269be1a1997d4912d8a3a97be5e5f3442d2