When bli_trsm() API is called, we make sure the "side" argument is "side_t" and
not f77_char and argument is passed by value and not by its address.
Change-Id: I5a616eb054c034be2d67640b8ab3b9615706a8c9
- Low precision gemm sets thread meta data (lpgemm_thrinfo_t) to NULL
when compiled without open mp threading support. Subsequently the code
is executed as if it is single-threaded. However, when B matrix needs
to be packed, communicators are required (irrespective of single or
multi-threaded), and the code accesses lpgemm_thrinfo_t for the same
without NULL check. This results in seg fault.
For the fix, a non-open mp thread decorator layer is added, which
creates a placeholder lpgemm_thrinfo_t object with a communicator before
invoking the 5 loop algorithm. This object will be used for packing.
- Makefile for compilation of aocl_gemm bench.
AMD-Internal: [CPUPL-2304]
Change-Id: Id505235c8421792240b84f93942ca62dac78f3dc
1. Added the checks in .c files of the bench folder to read the input parameters from the given input files on windows using fscanf.
Change-Id: Ie0497696304d318f345a646ab0ce3ba84debd4e2
- Multi-Threaded int8 GEMM (Input - uint8_t, int8_t, Output - int32_t).
AVX512_vnni based micro-kernel for int8 gemm. Paralellization supported
along m and n dimensions.
- Multi-Threaded B matrix reorder support for sgemm. Reordering B matrix
is packing entire B matrix upfront before sgemm. It allows sgemm to
take advantage of packed B matrix without incurring packing costs during
runtime.
- Makefile updates to addon make rules to compile avx512 code for
selected files in addon folder.
- CPU features query enhancements to check for AVX512_VNNI flag.
- Bench for int8 gemm and sgemm with B matrix reorder. Supports
performance mode for benchmarking and accuracy mode for testing code
correctness.
AMD-Internal: [CPUPL-2102]
Change-Id: I8fb25f5c2fbd97d756f95b623332cb29e3b8d182
Description:
1. Decision logic to choose optimal number of threads for
given input dgemm dimensions under aocl dynamic feature
were retuned based on latest code.
2. Updated code in few file to avoid compilation warnings.
3. Added a min check for nt in bli_sgemv_var1_smart_threading
function
AMD-Internal: [ CPUPL-2100 ]
Change-Id: I2bc70cc87c73505dd5d2bdafb06193f664760e02
Details :
- SUP Threshold change for native vs SUP
- Improved the ST performances for sizes n<800
- Introduce PACKB in SUP to improve ST performance between 320<n<800
- 16T SUP Tuning for n<1600.
AMD-Internal: [CPUPL-1981]
Change-Id: Ie59afa4d31570eb0edccf760c088deaa2e10cdda
Details:
- Intrinsic implementation of axpbyv for AVX2
- Bench written for axpbyv
- Added definitions in zen contexts
AMD-Internal: [CPUPL-1963]
Change-Id: I9bc21a6170f5c944eb6e9e9f0e994b9992f8b539
-- Added number of threads used in DTL logs
-- Added support for timestamps in DTL traces
-- Added time taken by API at BLAS layer in the DTL logs
-- Added GFLOPS achieved in DTL logs
-- Added support to enable/disable execution time and
gflops printing for individual API's. We may not want
it for all API's. Also it will help us migrate API's
to execution time and gflops logs in stages.
-- Updated GEMM bench to match new logs
-- Refactored aocldtl_blis.c to remove code duplication.
-- Clean up logs generation and reading to use spaces
consistently to separate various fields.
-- Updated AOCL_gettid() to return correct thread id
when using pthreads.
AMD-Internal: [CPUPL-1691]
Change-Id: Iddb8a3be2a5cd624a07ccdbf5ae0695799d8ae8e
Details:
- BLIS has reserved rs = cs = 1 case only for 1x1 scalars.
- For vectors, even though rs = cs = 1 is a valid input, BLIS
adjusts the strides to satisfy the error checking.
- For an mxn matrix, if m > 1 and n = 1, BLIS sets cs = m
to indicate that this is a column vector stored in column major
order. Similarly BLIS sets rs = n in case of m = 1 and n > 1.
- So determining storage-scheme based on row-stride could lead to
errors if one of the matrices becomes vector.
- Modified bench files to determine storage scheme based on
stor_scheme character instead of checking row-strides.
Change-Id: Id2dc0ea11f0e549ce8e49eb2c393442b33851527
Details
- Passing enum rather than char for uplo, transa, and diaga
- Deleting log file, and other temp files, merged in the codebase from amax
AOCL-Internal: [CPUPL-1591]
Change-Id: Ife85a388b45659aa608a552d18a65fe828b046b2
Details:
- To determine whether matrices are col-stored, we were checking
ldc == 1. This is incorrect as a matrix can be col-stored with ldc = 1
if dimension is 1.
- Modified the condition to check row_stride instead of col stride.
if row-stride != 1, we can assume that matrices are not col-stored
and ignore those inputs by printing an error message.
Change-Id: Id4d5b971104eb11cbcdd6d22c5c620febefd3a87
When op(A) or op(B) = transpose - the leading dimensions of these matrices altered.
Commented out the statements "if(transa) lda = ..." similarly for matrix B and corrected this
mistake in both column and row storages.
Provide a provision to call BLIS interfaces when row-major inputs are used.
Change-Id: Id2041af219a64567471c14190f283274d1df2f7f
- Added bench utility for dotv and scalv API's
- Corrected logging for scalv to handle complex types
- Corrected logging to remove transpose field from dotv logs
AOCL-Internal: [CPUPL-1577]
Change-Id: Ieb29e773309de1520c7fa5b79b97c943d894ba07
- incx and incy was not considered while allocating
memory for x and y vectors.
- Updated test data set
AMD-Internal: [CPUPL-1578]
Change-Id: I374a75aaa66f951f0f8353434d94c135d09b2f05
1. Updated lda, ldb based on trans flags
2. Updated deriving storage type using leading dimension
2. Cleanup and alignment
3. Included transpose and row major cases in inputgemm.txt
Change-Id: I25f5cd522eb64f212445d98f4682132bf5a330b6
Fix reading input parameters
Interchange the reading of n and k, first n appears and then k appears in the logs.
Added comments to explain the format of the input gemmt log.
Change-Id: I44c6081d4449ba210728bc089c4215d5eef18834
Verifying the valid values of m, n, k, lda, ldb and ldc is removed.
Since the bench app is run on logs collected from AOCL traces.
The correct way of checking should consider transpose parameter and storage order.
Change-Id: If0fbf733c2650c6f328661293eb99d062685d638