18 Commits

Author SHA1 Message Date
Vignesh Balasubramanian
8abb37a0ad Update to AOCL-BLAS bench application for logging outputs
- Updated the format specifiers to have a leading space,
  so that the fields are properly delimited in the
  output file.

- Further updated every source file to have a leading space
  in its format strings occurring after the macros.
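The sketch below shows why the leading space matters. It assumes a hypothetical `BENCH_INT_FS` macro that expands to the `<inttypes.h>` conversion for a 64-bit integer; the real AOCL-BLAS macro names are not shown in this log. When two fields are printed back to back with no separator they merge into one token in the output file, and a space in front of each field keeps them apart.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a bench format macro (not the real AOCL name):
 * the conversion for a 64-bit integer, with no leading space of its own. */
#define BENCH_INT_FS PRId64

/* Fields written with no separator run together: 10, 20 -> "1020". */
int format_undelimited(char *buf, size_t len, int64_t m, int64_t n)
{
    return snprintf(buf, len, "%" BENCH_INT_FS "%" BENCH_INT_FS, m, n);
}

/* A leading space before every field keeps them apart: 10, 20 -> " 10 20". */
int format_delimited(char *buf, size_t len, int64_t m, int64_t n)
{
    return snprintf(buf, len, " %" BENCH_INT_FS " %" BENCH_INT_FS, m, n);
}
```

A whitespace-splitting parser reads the second output back as two fields, but reads the first as the single value 1020.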

AMD-Internal: [CPUPL-5895]
Change-Id: If856f55363bb811de0be6fdd1d7bbc8ec5c76c15
2025-02-06 22:59:59 +05:30
Vignesh Balasubramanian
445327f255 Bugfix for AOCL-BLAS bench application
- Bug : When our library was configured with a native
        BLIS integer size of 32, the bench application
        would crash or read invalid values when parsing
        the input file. This was caused by mismatched
        format specifiers, which we had hard-coded in the
        Makefile.

- Fix : Defined a header that sets the format specifiers
        as macros matching the integer size with which the
        library is configured and built. Every source file
        used for benchmarking is expected to include this
        header.
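The following is a sketch of such a header under stated assumptions. `BENCH_INT_SIZE`, `bench_int_t`, `BENCH_INT_SCN`, `BENCH_INT_PRI` and `parse_mnk` are hypothetical names, not the ones used in the actual commit. The key idea is that one preprocessor switch selects both the integer type and the scanf/printf conversions. A 32-bit build therefore can no longer pair an `int32_t` variable with a 64-bit specifier.

```c
/* Hypothetical sketch of a bench format header, not the AOCL-BLAS file. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed configure-time switch (32 or 64) mirroring the library's
 * integer width; defaults to 64 if the build does not set it. */
#ifndef BENCH_INT_SIZE
#define BENCH_INT_SIZE 64
#endif

#if BENCH_INT_SIZE == 32
typedef int32_t bench_int_t;
#define BENCH_INT_SCN SCNd32   /* conversion for fscanf/sscanf */
#define BENCH_INT_PRI PRId32   /* conversion for printf/fprintf */
#else
typedef int64_t bench_int_t;
#define BENCH_INT_SCN SCNd64
#define BENCH_INT_PRI PRId64
#endif

/* Parse "m n k" from one input line. Because the specifier always matches
 * the width of bench_int_t, sscanf never writes past the variables. */
int parse_mnk(const char *line, bench_int_t *m, bench_int_t *n, bench_int_t *k)
{
    return sscanf(line, "%" BENCH_INT_SCN " %" BENCH_INT_SCN " %" BENCH_INT_SCN,
                  m, n, k);
}
```

Compiling with `-DBENCH_INT_SIZE=32` switches the type and both specifier macros together, which is exactly the coupling a hard-coded Makefile specifier lacks.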

AMD-Internal: [CPUPL-5895]
Change-Id: I9718c36a1a9fe3eba4d5da419823c16097902d89
2025-01-29 03:25:57 -05:00
Edward Smyth
82bdf7c8c7 Code cleanup: Copyright notices
- Standardize formatting (spacing etc).
- Add full copyright to cmake files (excluding .json)
- Correct copyright and disclaimer text for frame and
  zen, skx and a couple of other kernels to cover all
  contributors, as is commonly used in other files.
- Fixed some typos and missing lines in copyright
  statements.

AMD-Internal: [CPUPL-4415]
Change-Id: Ib248bb6033c4d0b408773cf0e2a2cda6c2a74371
2024-08-05 15:35:08 -04:00
Edward Smyth
591a3a7395 Code cleanup: file formats and permissions
- Remove execute file permission from source and make files.
- dos2unix conversion.
- Add missing eol at end of files.

Also update .gitignore to not exclude build directory but to
exclude any build_* created by cmake builds.

AMD-Internal: [CPUPL-4415]
Change-Id: I5403290d49fe212659a8015d5e94281fe41eb124
2024-08-05 11:52:33 -04:00
srikanth pogula
1d7f6d414f Bench APPs - change in Print statement for more params
> Made changes to the print statements in the bench files
  to print all the params of the individual APIs.

  > Ex: removing tabs and adding the Func param
    "Dt\t n\t incx\t incy\t gflops\n" --> "Func Dt n incx incy gflops\n"

  > Ex: adding func, incx, incy params
    "dt_ch, n, alpha_r, alpha_i, beta_r, beta_i, gflops"
    --> "tmp, dt_ch, n, alpha_r, alpha_i, incx, beta_r, beta_i, incy, gflops"

Change-Id: Ib5d151d7472d3f88c13a85a615a447dfa5e6b528
2024-07-11 02:04:19 -04:00
Varaganti, Kiran
2ac24d1f9c Avoided Extra copy of "c" matrix
Initialized c_save instead of "c", and removed the copy of c to c_save.
At the start of every n_repeats iteration, c_save is copied back to c anyway,
so initializing c_save directly avoids an extra copy of "c" to c_save before calling
GEMM. For very large sizes, matrix initialization takes a considerable amount of time;
this overhead is now reduced.
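The reordering can be sketched as follows. The sketch assumes a hypothetical `fake_gemm` in place of the real timed GEMM call and uses an arbitrary initialization pattern. Only `c_save` is initialized, and `c` is restored from it at the top of each repeat. No separate copy of c to c_save is needed before the loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the timed GEMM call: it adds 1 to every element
 * of c, so running it twice on the same c would give a different result. */
static void fake_gemm(double *c, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        c[i] += 1.0;
}

/* Initialize only c_save, then restore c from c_save at the start of every
 * repeat. Returns the number of c_save -> c copies performed. */
size_t run_repeats(double *c, double *c_save, size_t len, int n_repeats)
{
    size_t copies = 0;
    for (size_t i = 0; i < len; ++i)
        c_save[i] = (double)(i % 7);          /* assumed initialization pattern */

    for (int r = 0; r < n_repeats; ++r) {
        memcpy(c, c_save, len * sizeof *c);   /* restore the pristine input */
        ++copies;
        fake_gemm(c, len);                    /* the timed region in a real bench */
    }
    return copies;
}
```

Each repeat therefore starts from identical input. Compared with initializing `c` and then copying it into `c_save`, the total number of copies drops by one.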

Change-Id: I2c6ffe169e991607314897cb0c1fbfc0d74ef179
2024-07-09 00:54:03 -04:00
Edward Smyth
ed5010d65b Code cleanup: AMD copyright notice
Standardize format of AMD copyright notice.

AMD-Internal: [CPUPL-3519]
Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0
2023-11-23 08:54:31 -05:00
Kiran Varaganti
201db7883c Integrated the 32x6 DGEMM kernel for zen4, along with related changes.
Details:
- Now AOCL BLIS uses the AVX-512 32x6 DGEMM kernel for the native code path.
  Thanks to Moore, Branden <Branden.Moore@amd.com> for suggesting and
  implementing these optimizations.
- In the initial version of the 32x6 DGEMM kernel, to broadcast elements of packed B
  we loaded two elements into an xmm register, broadcast into a zmm from that xmm, and then
  used vpermilpd(xmm) to get the next element. This logic is replaced with a direct broadcast
  from memory: since the elements of packed B are stored contiguously, the first broadcast
  fetches the cache line and subsequent broadcasts are faster. We use two registers for
  broadcasts and interleave the broadcast operations with FMAs to hide memory latency.
- Native dTRSM uses the 16x14 dgemm kernel, so we need to override the default blkszs
  (MR, NR, ...) when executing trsm. We call bli_zen4_override_trsm_blkszs(cntx_local) on a
  local cntx_t object for the double data type as well, in bli_trsm_front() and
  bli_trsm_xx_ker_var2, xx = {ll,lu,rl,ru}. Renamed "BLIS_GEMM_AVX2_UKR" to
  "BLIS_GEMM_FOR_TRSM_UKR", and in bli_cntx_init_zen4() replaced the dgemm kernel for TRSM
  with the 16x14 dgemm kernel.
- New packm kernels - 16xk, 24xk and 32xk are added.
- New 32xk packm reference kernel is added in bli_packm_cxk_ref.c and it is
  enabled for zen4 config (bli_dpackm_32xk_zen4_ref() )
- Copyright year updated for modified files.
- Cleaned up code for the "zen" config: removed unused packm kernel declarations from kernels/zen/bli_kernels.h.
- [SWLCSG-1374], [CPUPL-2918]
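For orientation, here is a scalar C sketch of what a 32x6 DGEMM microkernel computes on packed panels, with matching 32xk and kx6 packing routines. This is not the AOCL kernel, which uses AVX-512 instructions. The function names are hypothetical, and the loops only model the data layout. Packed A keeps each column of 32 elements contiguous, which is four 512-bit zmm vectors of 8 doubles. Packed B keeps each row of 6 elements contiguous, so successive broadcasts read neighbouring addresses.

```c
#include <stddef.h>

enum { MR = 32, NR = 6 };  /* register-tile shape named in the commit */

/* Pack an MR x k block of column-major A (leading dimension lda) so the MR
 * elements of each column are contiguous: ap[p*MR + i] = A(i, p). */
void pack_a_32xk(size_t k, const double *a, size_t lda, double *ap)
{
    for (size_t p = 0; p < k; ++p)
        for (size_t i = 0; i < MR; ++i)
            ap[p * MR + i] = a[i + p * lda];
}

/* Pack a k x NR block of column-major B so each row of NR elements is
 * contiguous: bp[p*NR + j] = B(p, j). */
void pack_b_kx6(size_t k, const double *b, size_t ldb, double *bp)
{
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            bp[p * NR + j] = b[p + j * ldb];
}

/* Scalar reference for C(32x6) += Ap * Bp, C column-major with ldc.
 * In each k-step a vector kernel would load the 32 Ap values into four zmm
 * registers and broadcast each of the 6 Bp values from memory before the
 * FMAs; here both steps are plain loops. */
void dgemm_32x6_ref(size_t k, const double *ap, const double *bp,
                    double *c, size_t ldc)
{
    double acc[NR][MR] = {{0.0}};
    for (size_t p = 0; p < k; ++p) {
        const double *a_col = ap + p * MR;
        const double *b_row = bp + p * NR;
        for (size_t j = 0; j < NR; ++j) {
            const double bj = b_row[j];      /* the "broadcast" of one B element */
            for (size_t i = 0; i < MR; ++i)
                acc[j][i] += a_col[i] * bj;  /* the FMA */
        }
    }
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i)
            c[i + j * ldc] += acc[j][i];
}
```

The real kernel keeps the 32x6 accumulator tile in 24 zmm registers rather than in a local array. The packing layout is what makes the direct memory broadcasts described above cache-friendly.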

Change-Id: I576282382504b72072a6db068eabd164c8943627
2023-01-19 23:11:36 +05:30
Chandrashekara K R
ff2ee0ae3f AOCL-WINDOWS: Added the Windows build system to build the bench folder on Windows.
1. Added checks in the .c files of the bench folder to read the input parameters from the given input files on Windows using fscanf.

Change-Id: Ie0497696304d318f345a646ab0ce3ba84debd4e2
2022-06-27 22:32:39 -04:00
Dipal M Zambare
8f310c3384 AOCL DTL - Added thread and execution time details in logs
-- Added number of threads used in DTL logs
-- Added support for timestamps in DTL traces
-- Added time taken by API at BLAS layer in the DTL logs
-- Added GFLOPS achieved in DTL logs
-- Added support to enable/disable execution time and
   gflops printing for individual APIs. We may not want
   it for all APIs; it also helps us migrate APIs to
   execution time and gflops logs in stages.
-- Updated GEMM bench to match new logs
-- Refactored aocldtl_blis.c to remove code duplication.
-- Cleaned up log generation and reading to use spaces
   consistently to separate the various fields.
-- Updated AOCL_gettid() to return the correct thread id
   when using pthreads.
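DTL logs report GFLOPS alongside execution time. The conventional GFLOPS figure for real GEMM counts one multiply and one add per (i, j, p) term, giving 2*m*n*k flops, divided by the measured time. `dgemm_gflops` is a hypothetical helper. This log does not show the exact formula DTL uses, and complex types, for example, are usually counted differently.

```c
#include <stdint.h>

/* Conventional flop count for real GEMM C = alpha*A*B + beta*C:
 * one multiply and one add per (i, j, p) triple, i.e. 2*m*n*k. */
double dgemm_gflops(int64_t m, int64_t n, int64_t k, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;  /* guard against a zero or bogus timer reading */
    return (2.0 * (double)m * (double)n * (double)k) / (seconds * 1.0e9);
}
```

For example, a 1000 x 1000 x 1000 DGEMM that takes one second corresponds to 2 GFLOPS.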

AMD-Internal: [CPUPL-1691]
Change-Id: Iddb8a3be2a5cd624a07ccdbf5ae0695799d8ae8e
2021-11-12 08:58:54 +05:30
Meghana Vankadari
1944de1cfa Fixed a bug in Level-3 bench files
Details:
- BLIS has reserved rs = cs = 1 case only for 1x1 scalars.
- For vectors, even though rs = cs = 1 is a valid input, BLIS
  adjusts the strides to satisfy the error checking.
- For an mxn matrix, if m > 1 and n = 1, BLIS sets cs = m
  to indicate that this is a column vector stored in column major
  order. Similarly BLIS sets rs = n in case of m = 1 and n > 1.
- So determining the storage scheme from the row stride could lead to
  errors if one of the matrices is a vector.
- Modified the bench files to determine the storage scheme from the
  stor_scheme character instead of checking row strides.
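The following is a small model of the failure. It assumes the stride adjustment behaves exactly as described in this commit message; the helpers are hypothetical and are not BLIS code. A 1 x 4 column-major matrix with ldc = 1 starts with rs = cs = 1. After the adjustment, rs becomes 4, so a stride-based test wrongly reports the matrix as not column-stored. The stor_scheme character still gives the right answer.

```c
#include <stdbool.h>

/* Model of the adjustment described above (not BLIS source): when
 * rs == cs == 1 for a non-scalar vector, the stride along the unit
 * dimension is replaced so the object passes stride checks. */
void adjust_vector_strides(long m, long n, long *rs, long *cs)
{
    if (*rs == 1 && *cs == 1) {
        if (m > 1 && n == 1)
            *cs = m;      /* m x 1 column vector */
        else if (m == 1 && n > 1)
            *rs = n;      /* 1 x n row vector */
    }
}

/* Old approach: infer column storage from a unit row stride. */
bool col_stored_by_stride(long rs)
{
    return rs == 1;
}

/* Fixed approach: trust the storage character read from the input file. */
bool col_stored_by_char(char stor_scheme)
{
    return stor_scheme == 'c' || stor_scheme == 'C';
}
```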

Change-Id: Id2dc0ea11f0e549ce8e49eb2c393442b33851527
2021-06-22 10:38:11 +05:30
Meghana Vankadari
3804e301c9 Fixed a bug in Level-3 bench files where ldc = 1
Details:
- To determine whether matrices are col-stored, we were checking
  ldc == 1. This is incorrect, as a matrix can be col-stored with ldc = 1
  if its dimension is 1.
- Modified the condition to check the row stride instead of the column stride.
  If the row stride != 1, we can assume that the matrices are not col-stored
  and ignore those inputs, printing an error message.

Change-Id: Id4d5b971104eb11cbcdd6d22c5c620febefd3a87
2021-06-01 10:57:18 +05:30
Kiran Varaganti
492f54fb5e Fix a bug in bench_gemm.c
When op(A) or op(B) = transpose, the leading dimensions of these matrices were being altered.
Commented out the statements "if(transa) lda = ..." (and similarly for matrix B), correcting this
mistake for both column and row storage.
Added a provision to call BLIS interfaces when row-major inputs are used.

Change-Id: Id2041af219a64567471c14190f283274d1df2f7f
2021-05-24 12:59:28 +05:30
Nallani Bhaskar
a59796ef16 Updated leading dimensions for transpose case in gemm bench
1. Updated lda, ldb based on trans flags
2. Updated how the storage type is derived, using the leading dimension
3. Cleanup and alignment
4. Included transpose and row-major cases in inputgemm.txt

Change-Id: I25f5cd522eb64f212445d98f4682132bf5a330b6
2021-05-14 15:26:20 +05:30
Nageshwar Singh
a88cb82cec Revert "Adding trans h support in bench_gemm.c"
This reverts commit 791903b31c.

Change-Id: I24403cced67ea9e851adb58a8bf01a3e17bb4e85
2021-05-07 04:11:30 -04:00
Nageshwar Singh
791903b31c Adding trans h support in bench_gemm.c
Change-Id: If340d515c38a593df26d5075e29685ef044601a5
2021-03-02 02:33:06 +05:30
Kiran Varaganti
80a516382e Fixed wrong dimensions check in bench/bench_gemm.c application
Removed the verification of valid values of m, n, k, lda, ldb and ldc,
since the bench app is run on logs collected from AOCL traces.
A correct check would have to consider the transpose parameter and storage order.
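Here is a sketch of such a check, written against the standard BLAS/CBLAS rules for leading dimensions; the function names are hypothetical and do not come from bench_gemm.c. op(A) is m x k, so the stored A is m x k without a transpose and k x m with one. Column-major storage bounds the stored row count, and row-major storage bounds the stored column count.

```c
#include <stdbool.h>
#include <stdint.h>

static int64_t max1(int64_t x) { return x > 1 ? x : 1; }

/* Smallest valid lda for op(A) of size m x k. Any trans other than 'n'/'N'
 * (e.g. 't' or 'c') means A is stored as k x m. */
int64_t min_lda(char stor, char trans, int64_t m, int64_t k)
{
    bool col_major = (stor == 'c' || stor == 'C');
    bool no_trans  = (trans == 'n' || trans == 'N');
    int64_t rows = no_trans ? m : k;   /* dimensions of A as stored */
    int64_t cols = no_trans ? k : m;
    return col_major ? max1(rows) : max1(cols);
}

bool lda_is_valid(char stor, char trans, int64_t m, int64_t k, int64_t lda)
{
    return lda >= min_lda(stor, trans, m, k);
}
```

For example, `min_lda('c', 't', 100, 20)` is 20, since the stored matrix is 20 x 100. The same rule applies to B, where op(B) is k x n, and to C, which is m x n and never transposed.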

Change-Id: If0fbf733c2650c6f328661293eb99d062685d638
2020-11-20 20:39:20 +05:30
Kiran Varaganti
60642d98a3 Benchmark using AOCL Logs as input
Added a benchmark application for gemm whose input is a log file generated by
AOCL DTL from BLIS.

Change-Id: I2ac7a3c48d5a37c5b24ec0f0cff7e7886dad0b99
2020-11-06 14:31:53 +05:30