amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-12 18:15:37 +00:00

Author	SHA1	Message	Date
Vignesh Balasubramanian	8abb37a0ad	Update to AOCL-BLAS bench application for logging outputs - Updated the format specifiers to have a leading space, in order to delimit the outputs appropriately in the output file. - Further updated every source file to have a leading space in its format string occurring after the macros. AMD-Internal: [CPUPL-5895] Change-Id: If856f55363bb811de0be6fdd1d7bbc8ec5c76c15	2025-02-06 22:59:59 +05:30
Vignesh Balasubramanian	445327f255	Bugfix for AOCL-BLAS bench application - Bug : When configuring our library with the native BLIS integer size being 32, the bench application would crash or read an invalid value when parsing the input file. This is because of a mismatch of format specifier, that we hardset in the Makefile. - Fix : Defined a header that sets the format specifiers as macros with the right matching, based on how we configure and build the library. It is expected to include this header in every source file for benchmarking. AMD-Internal: [CPUPL-5895] Change-Id: I9718c36a1a9fe3eba4d5da419823c16097902d89	2025-01-29 03:25:57 -05:00
Edward Smyth	82bdf7c8c7	Code cleanup: Copyright notices - Standardize formatting (spacing etc). - Add full copyright to cmake files (excluding .json) - Correct copyright and disclaimer text for frame and zen, skx and a couple of other kernels to cover all contributors, as is commonly used in other files. - Fixed some typos and missing lines in copyright statements. AMD-Internal: [CPUPL-4415] Change-Id: Ib248bb6033c4d0b408773cf0e2a2cda6c2a74371	2024-08-05 15:35:08 -04:00
Edward Smyth	591a3a7395	Code cleanup: file formats and permissions - Remove execute file permission from source and make files. - dos2unix conversion. - Add missing eol at end of files. Also update .gitignore to not exclude build directory but to exclude any build_* created by cmake builds. AMD-Internal: [CPUPL-4415] Change-Id: I5403290d49fe212659a8015d5e94281fe41eb124	2024-08-05 11:52:33 -04:00
srikanth pogula	1d7f6d414f	Bench APPs - change in Print statement for more params >Made changes in the print statements in bench files to print all the params of the individual APIs > Ex : removing tab & adding Func param "Dt\t n\t incx\t incy\t gflops\n" --> "Func Dt n incx incy gflops\n" > Ex : adding func, incx, incy params "dt_ch, n, alpha_r, alpha_i, beta_r, beta_i, gflops" --> "tmp, dt_ch, n, alpha_r, alpha_i, incx, beta_r, beta_i, incy, gflops" Change-Id: Ib5d151d7472d3f88c13a85a615a447dfa5e6b528	2024-07-11 02:04:19 -04:00
Varaganti, Kiran	2ac24d1f9c	Avoided Extra copy of "c" matrix Initialized c_save instead of "c" and then removed copying c to c_save, because at the start of every n_repeats iteration we copy c_save back to c. Therefore, if we initialize c_save, we can avoid the extra copy of "c" to c_save before calling GEMM. For very large sizes, matrix initialization takes a considerable amount of time; this can be reduced now. Change-Id: I2c6ffe169e991607314897cb0c1fbfc0d74ef179	2024-07-09 00:54:03 -04:00
Edward Smyth	ed5010d65b	Code cleanup: AMD copyright notice Standardize format of AMD copyright notice. AMD-Internal: [CPUPL-3519] Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0	2023-11-23 08:54:31 -05:00
Kiran Varaganti	201db7883c	Integrated 32x6 DGEMM kernel for zen4 and its related changes are added. Details: - Now AOCL BLIS uses AVX512 - 32x6 DGEMM kernel for native code path. Thanks to Moore, Branden <Branden.Moore@amd.com> for suggesting and implementing these optimizations. - In the initial version of 32x6 DGEMM kernel, to broadcast elements of B packed we perform load into xmm (2 elements), broadcast into zmm from xmm and then to get the next element, we do vpermilpd(xmm). This logic is replaced with direct broadcast from memory, since the elements of Bpack are stored contiguously, the first broadcast fetches the cacheline and then subsequent broadcasts happen faster. We use two registers for broadcast and interleave broadcast operation with FMAs to hide any memory latencies. - Native dTRSM uses 16x14 dgemm - therefore we need to override the default blkszs (MR,NR,..) when executing trsm. we call bli_zen4_override_trsm_blkszs(cntx_local) on a local cntx_t object for double data-type as well in the function bli_trsm_front(), bli_trsm_xx_ker_var2, xx = {ll,lu,rl,ru}. Renamed "BLIS_GEMM_AVX2_UKR" to "BLIS_GEMM_FOR_TRSM_UKR" and in the bli_cntx_init_zen4() we replaced dgemm kernel for TRSM with 16x14 dgemm kernel. - New packm kernels - 16xk, 24xk and 32xk are added. - New 32xk packm reference kernel is added in bli_packm_cxk_ref.c and it is enabled for zen4 config (bli_dpackm_32xk_zen4_ref() ) - Copyright year updated for modified files. - cleaned up code for "zen" config - removed unused packm kernels declaration in kernels/zen/bli_kernels.h - [SWLCSG-1374], [CPUPL-2918] Change-Id: I576282382504b72072a6db068eabd164c8943627	2023-01-19 23:11:36 +05:30
Chandrashekara K R	ff2ee0ae3f	AOCL-WINDOWS: Added the windows build system to build bench folder on windows. 1. Added the checks in .c files of the bench folder to read the input parameters from the given input files on windows using fscanf. Change-Id: Ie0497696304d318f345a646ab0ce3ba84debd4e2	2022-06-27 22:32:39 -04:00
Dipal M Zambare	8f310c3384	AOCL DTL - Added thread and execution time details in logs -- Added number of threads used in DTL logs -- Added support for timestamps in DTL traces -- Added time taken by API at BLAS layer in the DTL logs -- Added GFLOPS achieved in DTL logs -- Added support to enable/disable execution time and gflops printing for individual API's. We may not want it for all API's. Also it will help us migrate API's to execution time and gflops logs in stages. -- Updated GEMM bench to match new logs -- Refactored aocldtl_blis.c to remove code duplication. -- Clean up logs generation and reading to use spaces consistently to separate various fields. -- Updated AOCL_gettid() to return correct thread id when using pthreads. AMD-Internal: [CPUPL-1691] Change-Id: Iddb8a3be2a5cd624a07ccdbf5ae0695799d8ae8e	2021-11-12 08:58:54 +05:30
Meghana Vankadari	1944de1cfa	Fixed a bug in Level-3 bench files Details: - BLIS has reserved rs = cs = 1 case only for 1x1 scalars. - For vectors, even though rs = cs = 1 is a valid input, BLIS adjusts the strides to satisfy the error checking. - For an mxn matrix, if m > 1 and n = 1, BLIS sets cs = m to indicate that this is a column vector stored in column major order. Similarly BLIS sets rs = n in case of m = 1 and n > 1. - So determining storage-scheme based on row-stride could lead to errors if one of the matrices becomes vector. - Modified bench files to determine storage scheme based on stor_scheme character instead of checking row-strides. Change-Id: Id2dc0ea11f0e549ce8e49eb2c393442b33851527	2021-06-22 10:38:11 +05:30
Meghana Vankadari	3804e301c9	Fixed a bug in Level-3 bench files where ldc = 1 Details: - To determine whether matrices are col-stored, we were checking ldc == 1. This is incorrect as a matrix can be col-stored with ldc = 1 if dimension is 1. - Modified the condition to check row_stride instead of col stride. if row-stride != 1, we can assume that matrices are not col-stored and ignore those inputs by printing an error message. Change-Id: Id4d5b971104eb11cbcdd6d22c5c620febefd3a87	2021-06-01 10:57:18 +05:30
Kiran Varaganti	492f54fb5e	Fix a bug in bench_gemm.c When op(A) or op(B) = transpose, the leading dimensions of these matrices were altered. Commented out the statements "if(transa) lda = ..." and similarly for matrix B, and corrected this mistake in both column and row storages. Added a provision to call BLIS interfaces when row-major inputs are used. Change-Id: Id2041af219a64567471c14190f283274d1df2f7f	2021-05-24 12:59:28 +05:30
Nallani Bhaskar	a59796ef16	Updated leading dimensions for transpose case in gemm bench 1. Updated lda, ldb based on trans flags 2. Updated deriving storage type using leading dimension 3. Cleanup and alignment 4. Included transpose and row major cases in inputgemm.txt Change-Id: I25f5cd522eb64f212445d98f4682132bf5a330b6	2021-05-14 15:26:20 +05:30
Nageshwar Singh	a88cb82cec	Revert "Adding trans h support in bench_gemm.c" This reverts commit `791903b31c`. Change-Id: I24403cced67ea9e851adb58a8bf01a3e17bb4e85	2021-05-07 04:11:30 -04:00
Nageshwar Singh	791903b31c	Adding trans h support in bench_gemm.c Change-Id: If340d515c38a593df26d5075e29685ef044601a5	2021-03-02 02:33:06 +05:30
Kiran Varaganti	80a516382e	Fixed wrong dimensions check in bench/bench_gemm.c application Verifying the valid values of m, n, k, lda, ldb and ldc is removed, since the bench app is run on logs collected from AOCL traces. The correct way of checking should consider transpose parameter and storage order. Change-Id: If0fbf733c2650c6f328661293eb99d062685d638	2020-11-20 20:39:20 +05:30
Kiran Varaganti	60642d98a3	Benchmark using AOCL Logs as input Added benchmark application for gemm - input is a log file generated from AOCL DTL from BLIS. Change-Id: I2ac7a3c48d5a37c5b24ec0f0cff7e7886dad0b99	2020-11-06 14:31:53 +05:30
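
Commits 445327f255 and 8abb37a0ad above describe moving the bench format specifiers into a header so that they match the configured integer size, and giving each specifier a leading space. The header itself is not shown in this log, so the following is only a minimal sketch of that idea. Every name in it (BENCH_INT_SIZE, bench_int_t, BENCH_INT_SCN, BENCH_INT_PRI, bench_parse_dims, bench_format_dims) is hypothetical and is not taken from the AOCL-BLAS sources.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* HYPOTHETICAL macro: stands in for however the library records its
 * configured integer size (32 or 64 bits). */
#ifndef BENCH_INT_SIZE
#define BENCH_INT_SIZE 64
#endif

/* Pick the integer type and its matching scanf/printf specifiers together,
 * so a 32-bit build can never be parsed with a 64-bit specifier. */
#if BENCH_INT_SIZE == 32
typedef int32_t bench_int_t;
#define BENCH_INT_SCN SCNd32
#define BENCH_INT_PRI PRId32
#else
typedef int64_t bench_int_t;
#define BENCH_INT_SCN SCNd64
#define BENCH_INT_PRI PRId64
#endif

/* Parse three dimensions from one input line.
 * Returns the number of fields that sscanf converted (3 on success). */
static int bench_parse_dims(const char *line,
                            bench_int_t *m, bench_int_t *n, bench_int_t *k)
{
    return sscanf(line,
                  " %" BENCH_INT_SCN " %" BENCH_INT_SCN " %" BENCH_INT_SCN,
                  m, n, k);
}

/* Write three dimensions with a leading space before each field, so the
 * fields stay delimited when appended to other output.
 * Returns the value reported by snprintf. */
static int bench_format_dims(char *buf, size_t len,
                             bench_int_t m, bench_int_t n, bench_int_t k)
{
    return snprintf(buf, len,
                    " %" BENCH_INT_PRI " %" BENCH_INT_PRI " %" BENCH_INT_PRI,
                    m, n, k);
}
```

Compiling the same file with -DBENCH_INT_SIZE=32 switches the type and both specifiers in one place, which is the property the bugfix commit describes.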

18 Commits
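
Several of the commits listed above (a59796ef16, 492f54fb5e, 3804e301c9, 1944de1cfa) concern how the gemm bench derives leading dimensions and storage order, ending with the decision to read the storage scheme from an explicit character rather than infer it from strides. The sketch below illustrates the standard BLAS/CBLAS rule for the smallest valid leading dimension under that approach. It is not the bench code: the function name, the integer type, and the character values 'R'/'C' for the storage scheme are assumptions made for illustration.

```c
#include <assert.h>

/* HYPOTHETICAL integer type; the real bench uses the library's own type. */
typedef long dim_sketch_t;

/* Smallest valid leading dimension for an operand X, where op(X) is
 * rows x cols.
 *   stor_scheme: 'R'/'r' = row-major, anything else = column-major
 *                (ASSUMED encoding, not taken from the bench sources).
 *   trans:       'T'/'t'/'C'/'c' = op(X) is a (conjugate) transpose of X.
 * The stored matrix has swapped dimensions when transposed. Column-major
 * storage needs ld >= stored rows, row-major needs ld >= stored columns,
 * and ld is never less than 1. */
static dim_sketch_t min_leading_dim(char stor_scheme, char trans,
                                    dim_sketch_t rows, dim_sketch_t cols)
{
    int transposed = (trans == 'T' || trans == 't' ||
                      trans == 'C' || trans == 'c');
    dim_sketch_t stored_rows = transposed ? cols : rows;
    dim_sketch_t stored_cols = transposed ? rows : cols;
    int row_major = (stor_scheme == 'R' || stor_scheme == 'r');
    dim_sketch_t ld = row_major ? stored_cols : stored_rows;
    return ld > 1 ? ld : 1;
}
```

For a 5 x 1 operand this gives 5 in column-major and 1 in row-major storage. A leading dimension of 1 therefore does not identify the storage order on its own, which is consistent with the commits' move to an explicit storage character.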