amd/blis - blis - Public git mirror


mirror of https://github.com/amd/blis.git synced 2026-06-29 02:37:05 +00:00

Author	SHA1	Message	Date
S, Hari Govind	e097346658	Implemented Multithreading Support and Optimization of DGEMV API (#10) - Implemented a multithreading framework for the DGEMV API on Zen architectures. Architecture-specific AOCL-dynamic logic determines the optimal number of threads for improved performance. - The condition check for the value of beta is optimized by utilizing masked operations. The mask value is set based on the value of beta, and the masked operations are applied when the vector y is loaded or scaled with beta. AMD-Internal: [CPUPL-6746]	2025-06-17 12:39:48 +05:30
Chandrashekara K R	ae698be825	Updated version string from 5.0.1 to 5.1.1	2025-06-13 11:24:50 +05:30
Hari Govind S	29f30c7863	Optimisation for DCOPY API - Introduced a new assembly kernel that copies data from source to destination, working from the front and the back of the vector at the same time. This kernel provides better performance for larger input sizes. - Added a wrapper function responsible for selecting the kernel used by the DCOPYV API to handle the given input for the zen5 architecture. - Updated the AOCL-dynamic threshold for the DCOPYV API in zen4 and zen5 architectures. - New unit-tests were included in the gtestsuite for the new kernel. AMD-Internal: [CPUPL-6650] Change-Id: Ie2af88b8e97196b6aa02c089e59247742002f568	2025-04-28 05:58:21 -04:00
Vignesh Balasubramanian	b4b0887ca4	Additional optimizations to ZGEMM SUP and Tiny codepaths (ZEN4 and ZEN5) - Added a set of AVX512 fringe kernels (using masked loads and stores) in order to avoid rerouting to the GEMV typed API interface (when m = 1). This ensures uniform performance across the main and fringe cases when the calls are multithreaded. - Further tuned the thresholds to decide between ZGEMM Tiny, Small SUP and Native paths for ZEN4 and ZEN5 architectures (in case of parallel execution). This accounts for additional combinations of the input dimensions. - Moved the call to Tiny-ZGEMM before the BLIS object creation, since this code-path operates on raw buffers. - Added the necessary test-cases for functional and memory testing of the newly added kernels. AMD-Internal: [CPUPL-6378][CPUPL-6661] Change-Id: I9af73d1b6ef82b26503d4fc373111132aee3afd6	2025-04-23 00:56:58 -04:00
Vignesh Balasubramanian	c4b84601da	AVX512 optimizations for CGEMM (rank-1 kernel) - Implemented an AVX512 rank-1 kernel that is expected to handle column-major storage schemes of A, B and C (without transposition) when k = 1. - This kernel is single-threaded, and acts as a direct call from the BLAS layer for its compatible inputs. - Defined custom BLAS and BLIS_IMPLI layers for CGEMM (instead of using the macro definition), in order to integrate the call to this kernel at runtime (based on the corresponding architecture and input constraints). - Added unit-tests for functional and memory testing of the kernel. - Updated the ZEN5 context to include the AVX512 CGEMM SUP kernels, with their cache-blocking parameters. AMD-Internal: [CPUPL-6498] Change-Id: I42a66c424325bd117ceb38970726a05e2896a46b	2025-03-06 20:14:05 +05:30
Vignesh Balasubramanian	07df9f471e	AVX512 optimizations for CGEMM (SUP) - Implemented the following AVX512 SUP column-preferential kernels (m-variant) for CGEMM: Main kernel: 24x4m. Fringe kernels: 24x3m, 24x2m, 24x1m, 16x4, 16x3, 16x2, 16x1, 8x4, 8x3, 8x2, 8x1, fx4, fx3, fx2, fx1 (where 0 < f < 8). - Utilized the packing kernel to pack A when handling inputs with the CRC storage scheme. This in turn handles RRC with operation transpose in the framework layer. - Added C prefetching to the main kernel, and updated the cache-blocking parameters for ZEN4 and ZEN5 contexts. - Added decision logic to choose between SUP and Native AVX512 code-paths for ZEN4 and ZEN5 architectures. - Updated the testing interface for complex GEMMSUP to accept the kernel dimension (MR) as a parameter, in order to set the appropriate panel stride for functional and memory testing. Also updated the existing instantiators to send their kernel dimensions as a parameter. - Added unit tests for functional and memory testing of these newly added kernels. AMD-Internal: [CPUPL-6498] Change-Id: Ie79d3d0dc7eed7edf30d8d4f74b888135f31d6b4	2025-03-06 06:03:39 -05:00
Hari Govind S	8998839c71	Optimisation of DGEMV Transpose Case for unit stride - Included a new code section to handle input with a non-unit-strided y vector for the dgemv transpose case, and removed the same handling from the respective kernels to avoid repeated branching caused by condition checks within the 'for' loop. - The check for beta equal to zero in the primary kernels is moved outside the 'for' loop to avoid repeated branching. - The '_mm512_reduce_pd' operations in the primary kernel are replaced by a series of operations that reduce the number of instructions required to reduce the 8 registers. - Changed the naming convention for DGEMV transpose kernels. - Modified the unit kernel test to avoid y increment for dgemv transpose kernels during the test. AMD-Internal: [CPUPL-6565] Change-Id: I1ac516d6b8f156ac53ac9f6eb18badd50e152e05	2025-03-06 05:15:58 -05:00
Vignesh Balasubramanian	99770558bb	AVX512 optimizations for CGEMM (Native) - Implemented the following AVX512 native computational kernels for CGEMM: Row-preferential: 4x24. Column-preferential: 24x4. - The implementations use a common set of macros, defined in a separate header, since they differ only in which matrix is chosen for load/broadcast operations. - Added the associated AVX512 based packing kernels, packing 24xk and 4xk panels of input. - Registered the column-preferential kernel (24x4) in ZEN4 and ZEN5 contexts. Further updated the cache-blocking parameters. - Removed redundant BLIS object creation and its contingencies in the native micro-kernel testing interface (for complex types). Added the required unit-tests for memory and functionality checks of the new kernels. AMD-Internal: [CPUPL-6498] Change-Id: I520ff17dba4c2f9bc277bf33ba9ab4384408ffe1	2025-02-28 03:18:24 -05:00
Edward Smyth	eee3fe1b54	GTestSuite: nested parallelism tests Optionally enable parallelism inside gtestsuite so that we can check BLIS functions perform correctly when nested parallelism is in operation. Enable with: cmake ... -DOPENMP_NESTED={0,1,2,1diff} where in gtestsuite - 0 is the default choice with no parallelism. - 1 and 2 are simple nested parallelism. - 1diff has one level of parallelism setting different numbers of threads to be used by BLIS and reference library calls from each gtestsuite thread. Note: OMP_NUM_THREADS must be set appropriately to enable or disable parallelism at each level in the test programs as desired. OMP_NUM_THREADS will also define the parallelism used within the BLIS library (if it is multithreaded), unless BLIS-specific ways of specifying parallelism have been used. If a BLIS-specific parallelism option has been set, the same mechanism will be used in the 1diff option to vary the number of threads in BLIS per application thread. AMD-Internal: [CPUPL-3902] Change-Id: I89f9edb4125c64ef03e025a9f6ccb84960ba8771	2025-02-07 08:49:25 -05:00
Edward Smyth	1f0fb05277	Code cleanup: Copyright notices (2) More changes to standardize copyright formatting and correct years for some files modified in recent commits. AMD-Internal: [CPUPL-5895] Change-Id: Ie95d599710c1e0605f14bbf71467ca5f5352af12	2025-02-07 05:41:44 -05:00
Arnav Sharma	5a4739d288	DGEMV NO_TRANSPOSE Optimizations and Unit Tests - Added 32x3n n-biased kernels to directly handle the cases where n = 3, which were previously handled by the primary n-biased 32x8n kernel. - Modified the n-biased fringe kernels to further handle the smaller m-fringe cases. Thus, the kernels now handle the following ranges of m for any value of n: - 16x8n : m = [16, 31] - 8x8n : m = [8, 15] - m_leftx8n : m = [1, 7] - Updated the function pointer map for n-biased kernels with added granularity to invoke the smaller fringe cases directly on the basis of the m-dimension. - Added micro-kernel unit tests for all the dgemv_n kernels. AMD-Internal: [CPUPL-6231] Change-Id: Ibe88848c2c1bbb65b3e79fbc90a2800dc15f5119	2025-02-06 18:52:32 +05:30
Shubham Sharma	f8c83fedb6	Added new ZTRSM small code path for ZEN5 - Added new ZTRSM kernels for right and left variants. - Kernel dimensions are 12x4. - 12x4 ZGEMM SUP kernels are used internally for solving the GEMM subproblem. - These kernels do not support conjugate transpose. - Only column major inputs are supported. - Tuned thresholds to pick the efficient code path for ZEN5. AMD-Internal: [CPUPL-6356] Change-Id: I33ba3d337b0fcd972ca9cfe4668cb23d2b279b6e	2025-02-06 18:01:10 +05:30
Hari Govind S	106a2b1fe1	Gtestsuite: UKR test for GEMV kernels - Added support for gemv kernels unit test in gtestsuite. - Added micro-kernel tests and memory tests for DGEMV transpose case kernels. AMD-Internal: [CPUPL-5835] Change-Id: I7d2d3cdbfea436f6c9b2cce9f2e85bfc5c51f201	2025-01-24 05:09:33 -05:00
Edward Smyth	0ae5a0492f	GTestSuite: fix to ukr tests for dgemm avx512 8x24 kernels - Restore test for old bli_dgemm_zen4_asm_8x24 kernel, so that we can test this if linking with older AOCL versions. - Move K_bli_dgemm_avx512_asm_8x24 definition from AOCL_42 list to AOCL_50 list. AMD-Internal: [CPUPL-4500] Change-Id: Id522f4bc5b89e86f77c4e1d26c75e261736ab450	2025-01-10 12:33:15 -05:00
harsdave	54b46ec1ed	Enhance 24x8 DGEMM SUP/Tiny Kernel Performance with Optimized Loops and Edge Kernels This patch introduces comprehensive optimizations to the DGEMM kernel, focusing on loop efficiency and edge kernel performance. The following technical improvements have been implemented: 1. IR Loop Optimization: - The IR loop has been re-implemented in hand-written assembly to eliminate the overhead associated with `begin_asm` and `end_asm` calls, resulting in more efficient execution. 2. JR Loop Integration: - The JR loop is now incorporated into the micro kernel. This integration avoids the repetitive overhead of stack frame management for each JR iteration, thereby enhancing loop performance. 3. Kernel Decomposition Strategy: - The m dimension is decomposed into specific sizes: 20, 18, 17, 16, 12, 11, 10, 9, 8, 4, 2, and 1. - For remaining cases, masked variants of edge kernels are utilized to handle the decomposition efficiently. 4. Interleaved Scaling by Alpha: - Scaling by the alpha factor is interleaved with load instructions to optimize the instruction pipeline and reduce latency. 5. Efficient Mask Preparation: - Masks are prepared within inline assembly code only at points where masked load-store operations are necessary, minimizing unnecessary overhead. 6. Broadcast Instruction Optimization: - In edge kernels where each FMA (Fused Multiply-Add) operation requires a broadcast without subsequent reuse, the broadcast instruction is replaced with `mem_1to8`. - This allows the compiler to optimize by assigning separate vector registers for broadcasting, thus avoiding dependency chains and improving execution efficiency. 7. C Matrix Update Optimization: - During the update of the C matrix in edge kernels, columns are pre-loaded into multiple vector registers. This approach breaks dependency chains during FMA operations following the scaling by alpha, thereby mitigating performance bottlenecks and enhancing throughput. These optimizations collectively improve the performance of the DGEMM kernel, particularly in handling edge cases and reducing overhead in critical loops. The changes are expected to yield significant performance gains in matrix multiplication operations. This patch also changes the tiny GEMM interface: it adds a lightweight interface for calling kernels and removes calls to the AVX2 DGEMM kernels, since the AVX512 DGEMM kernels are used for all sizes on zen4 and zen5. On zen4 and zen5, when A is transposed (CRC, RRC), the tiny kernel cannot handle such inputs, so they are routed to the gemm_small path. AMD-Internal: [CPUPL-6054] Change-Id: I57b430f9969ca39aa111b54fa169e4225b900c4a	2024-12-13 00:03:00 -05:00
Shubham Sharma	f2320a1fef	Enabled DGEMM row major kernel for ZEN4 - Merged ZEN4 and ZEN5 DGEMM 8x24 kernel. - Replaced 32x6 kernel with 8x24. Now same kernel is used for ZEN4 and ZEN5. - Blocksizes have been tuned for genoa only. - DGEMM kernel for DTRSM native code path is replaced with 8x24 kernel. - Enabled alpha scaling during packing for ZEN4. - ZEN4 8x24 kernel has been removed. AMD-Internal: [CPUPL-5912] Change-Id: I89a16a7e3355af037d21d453aabf53c5ecccb754	2024-11-29 08:18:48 +00:00
Edward Smyth	971c890fc6	GTestSuite: Select ukr tests by BLIS version Add definitions in the gtestsuite header to list available kernels by AOCL BLIS version. Check these definitions in ukr test programs to avoid missing symbol errors when testing with an older version of BLIS. Currently AOCL_41, AOCL_42, AOCL_50 and AOCL_DEV are supported, with AOCL_DEV inferred from the version being later than the value of AOCL_BLAS_LATEST_VERSION set in CMakeLists.txt. Thanks to Eleni Vlachopoulou for the cmake functionality to automatically detect the version from the library. AMD-Internal: [CPUPL-4500] Change-Id: I40ffd3d3789324fbb1dabfbf5e1dd4e0c94d54d9	2024-11-15 10:07:29 -05:00
Edward Smyth	6330ac6a52	GTestSuite: Misc changes - Correct matsize and NumericalComparison functions for tests with first matrix dimension <= 0. - BLAS1: - Fix for BLAS vs CBLAS differences in amaxv IIT_ERS tests. - Threshold adjustments in ddotxf and zaxpy. - Break axpyv and scalv into separate executables for each data type. - BLAS2: - Threshold adjustments in symv and hemv. - Break ger into separate executables for each data type. - UKR: - Break gemm and trsm ukr test into separate executables for each data type. - Threshold adjustments in daxpyf - Disable {z,c}trsm ukr tests when BLIS_INT_ELEMENT_TYPE is used, as matrix generator is not currently suitable for this. AMD-Internal: [CPUPL-4500] Change-Id: I1d9e7acc11025f1478b8b511c14def5517ef0ae6	2024-09-19 10:17:36 -04:00
Edward Smyth	8d4881c4fd	GTestSuite: add option to test blis_impl layer Add BLAS_TEST_IMPL option for TEST_INTERFACE to test the wrapper layer underneath BLAS and CBLAS interfaces. This is particularly useful if building a BLIS library with these interfaces disabled, e.g. ./configure --disable-blas amdzen or cmake . -DENABLE_BLAS=OFF -DBLIS_CONFIG_FAMILY=amdzen The ?_blis_impl wrappers should have the same arguments as the BLAS interfaces, thus we define TEST_BLAS_LIKE as an additional definition for convenience when selecting tests and options in the C++ files. AMD-Internal: [CPUPL-5650] Change-Id: I0275a387563f3efc2b40029950c8569956f2df7b	2024-09-16 09:53:56 -04:00
Edward Smyth	89f52a6df5	Code cleanup: spelling corrections Corrections for spelling and other mistakes in code comments and doc files. AMD-Internal: [CPUPL-4500] Change-Id: I33e28932b0e26bbed850c55602dee12fd002da7f	2024-08-05 16:18:51 -04:00
Edward Smyth	b964308e50	GTestSuite: option to check input arguments Add tests to check input arguments have not been modified by BLIS routine. These tests add a large runtime overhead, so they are disabled by default. To enable them, configure gtestsuite with: cmake -DTEST_INPUT_ARGS=ON ... and run desired tests as normal. Also: - Correct testinghelpers::chktrans to handle upper case values of argument trns. - Change testinghelpers::matsize to return size 0 if m, n or leading dimension are 0, or if leading dimension is too small. AMD-Internal: [CPUPL-4379] Change-Id: I9494af800f9383195272ce99f622104a38fd0ed8	2024-08-05 09:58:17 -04:00
Edward Smyth	75f21182bd	GTestSuite: IIT and ERS test improvements Various improvements: - Where appropriate, test both: - with nullptr for suitable arguments that should never be touched. - with all arguments correct except the one we want to test, to check we are not returning early because another argument is a nullptr. - Test incorrect values for order argument in CBLAS calls. - Test early exits with limited data changes, e.g. set C to 0 or scale C in GEMM when alpha = 0. - Bugfix in gemmt test when alpha is 0 and beta is 1. - Use reference library gemmt for comparison when library is not netlib BLAS. AMD-Internal: [CPUPL-4500] Change-Id: Ibde7eaba5a484a87674044ca44855c6f6ee4ff4b	2024-07-31 15:36:01 -04:00
Edward Smyth	b90e12dfa4	GTestSuite: copyright notice Standardize format of copyright notice. AMD-Internal: [CPUPL-4500] Change-Id: I6bde64c15ff639492dd0de95423c660112a37e2c	2024-07-26 15:34:41 -04:00
Edward Smyth	ea286cf6f6	GTestSuite: whitespace at end of lines Unnecessary whitespace (spaces, tabs) at the end of lines has been removed. AMD-Internal: [CPUPL-4500] Change-Id: Ice5f5504232cb22460c14ac47e6a3a43309cba22	2024-07-26 12:12:56 -04:00
Edward Smyth	4183efa722	GTestSuite: No newline at end of file Add missing newline at the end of these files. AMD-Internal: [CPUPL-4500] Change-Id: I835cc73de0008b66ae3cf77fbb3daa1c8fcaaa7f	2024-07-26 11:42:57 -04:00
Vignesh Balasubramanian	6165001658	Bugfix and optimizations for ?AXPBYV API - Updated the existing code-path for ?AXPBYV to reroute the inputs to the appropriate L1 kernel, based on the alpha and beta value. This is done in order to utilize sensible optimizations with regard to the compute and memory operations. - Updated the typed API interface for ?AXPBYV to include an early exit condition (when n is 0, or when alpha is 0 and beta is 1). Further updated this layer to query the right kernel from context, based on the input values of alpha and beta. - Added the necessary L1 vector kernels (i.e., ?SETV, ?ADDV, ?SCALV, ?SCAL2V and ?COPYV) to be used as part of special case handling in ?AXPBYV. - Moved the early return with negative increments from ?SCAL2V kernels to its typed API interface. - Updated the zen, zen2 and zen3 context to include function pointers for all these vector kernels. - Updated the existing ?AXPBYV vector kernels to handle only the required computation. Additional cleanup was done to these kernels. - Added accuracy and memory tests for AVX2 kernels of ?SETV, ?COPYV, ?ADDV, ?SCALV, ?SCAL2V, ?AXPYV and ?AXPBYV APIs. - Updated the existing thresholds in ?AXPBYV tests for complex types. This is due to the fact that every complex multiplication involves two mul ops and one add op. Further added test-cases for API level accuracy check, that includes special cases of alpha and beta. - Decomposed the reference call to ?AXPBYV into several other L1 BLAS APIs (in case the reference does not support its own ?AXPBYV API). The decomposition is done to match the exact operations that are done in BLIS based on alpha and/or beta values. This ensures that we test for our own compliance. AMD-Internal: [CPUPL-4861] Change-Id: Ia6d48f12f059f52b31c0bef6c75f47fd364952c6	2024-06-20 16:22:07 +05:30
Arnav Sharma	91bdf9a3eb	Gtestsuite: Bugfix for DOTXF, Changes to AXPYF - Fixed bug in ddotxf generic tests where the parameters lda_inc and inca were being read incorrectly. - Fixed bug in dotxf test wherein the y vector was being generated with length m instead of b. - Corrected function signatures to use type gtint_t instead of gint_t. - Updated the tests to use conjugate values of type char and convert to conj_t type only while invoking BLIS tests for both DOTXF and AXPYF. AMD-Internal: [CPUPL-5117] Change-Id: I0ef7af429057583a1cbf34827802e72401181caf	2024-06-07 15:05:10 +05:30
Edward Smyth	bc7d2df832	GTestSuite: misc corrections 2 - Correct value of alpha in ger ERS test. - rename ERS_IIT.cpp files to match naming convention used for other APIs. - Change all cases of gint_t to gtint_t except for dotxf, which is fixed in another commit. - Add TEST_UPPERCASE_ARGS to imatcopy and omatcopy{2} headers. - Corrected typo. AMD-Internal: [CPUPL-4500] Change-Id: I8844bb8c5941785e64daa9df5569092c19f91838	2024-05-20 03:51:21 -04:00
Edward Smyth	782e009b66	GTestSuite: check data that should just be set is not read 2 Correction to commit `8657e661fc` to allocate matrix or vector correctly when special read-only case occurs. Also define a set_matrix generator for symmetric matrices to only set upper or lower triangle to the supplied value, while setting the unused elements to a large value to help catch incorrect access to those elements. AMD-Internal: [CPUPL-4548] Change-Id: I22b3a20e2ce8be70eb27179247cd47fdb2d87b9d	2024-05-15 11:56:16 -04:00
Edward Smyth	8657e661fc	GTestSuite: check data that should just be set is not read Some BLAS routines do not require matrices or vectors to be initialized in certain use cases. For example, in GEMM when beta=zero, C is set rather than updated, thus input values of C should not be used. In these cases set the initial values of such matrices or vectors to an extreme value, to help detect if these are incorrectly being read. The extreme value can be NaN or Inf. The default is Inf, change it by running cmake ... -DEXT_VALUE=NaN AMD-Internal: [CPUPL-4548] Change-Id: I4a665363779d2496b8247f6357e970b7f23cd1eb	2024-05-10 06:29:03 -04:00
vignbala	ca6276d52b	Accuracy and memory testing of AVX512 ?SETV, ZAXPYV and ZAXPYF kernels - Added accuracy and memory tests for AVX2 and AVX512 ?SETV kernels, AVX512 ZAXPYV kernel and AVX512 ZAXPYF kernels, with fuse-factors 2, 4 and 8. - Cleanup of the code-section that declares and defines the reference compute for AXPYF operation. Corrected the type mismatch with the arguments that reference AXPYV would expect (this is used to decompose AXPYF as part of reference). Ensured usage of GTestSuite's internal alias for integer types. - Updated the API level testsuite and testing interface for AXPYF, based on the cleanup done to the reference code. AMD-Internal: [CPUPL-4974] Change-Id: I71de6c09d3877cd3dd1eaa20ab4f90e7c33eb1e1	2024-05-09 00:24:02 -04:00
Arnav Sharma	89a06cf252	Gtestsuite: Unit Tests for ZDOTV AVX512 Kernel - Updated DOTV Gtestsuite interface to invoke C/ZDOTC when conjx='c' and testing interface is either BLAS or CBLAS. - Added ukr tests for bli_zdotv_zen4_asm_avx512( ... ) and bli_zdotv_zen_int_avx512( ... ) kernels. AMD-Internal: [CPUPL-5011] Change-Id: I32fb69027a35d9ea92f997a095d412c8242a4b68	2024-05-08 09:20:31 -04:00
Edward Smyth	82e628b833	GTestSuite: seg faults in data generator Following a recent change to the data generators to allow a stride to be specified (`60cc23f3d3`), seg faults can occur if m<=0 for column storage or n<=0 for row storage. Prevent this by having separate code paths to handle these scenarios. AMD-Internal: [CPUPL-4500] Change-Id: I23ed8b2dccaaca140e2ddfda45bcdb4c888d5708	2024-05-01 05:46:52 -04:00
Edward Smyth	c2d4f1d7a5	GTestSuite: Avoid infinite recursion in generators Previous commit introduced an infinite recursion problem in generators for symmetric matrices. This was reported as a compiler warning by gcc 12.2 but not by gcc 11.4. AMD-Internal: [CPUPL-4862] Change-Id: I8642b81a62f0643b5a9ebedb4fcc83b25542de1b	2024-04-04 19:46:18 +05:30
Vignesh Balasubramanian	60cc23f3d3	Test-case development for ?IMATCOPY and ?OMATCOPY2 APIs - Added test-cases to verify the functional behaviour of the BLAS-extension API ?imatcopy_() and ?omatcopy2_(). The test-cases cover the following categories for the supported datatypes : - Functional and memory testing. - Negative parameter testing with invalid inputs. - Early return scenarios. - Exception value testing. - Updated functions in testinghelpers to include strides in addition to leading-dimension, when initializing a matrix. The default value for stride is set as 1. - Implemented functions to load the reference symbol, based on the choice of the reference library. The function definition is overloaded due to different API standards being exposed by different libraries. - Code cleanup of files for ?OMATCOPY API. AMD-Internal: [CPUPL-4862] Change-Id: If63b348f517e2cde1fe48f3a195808b33a91c312	2024-04-04 16:26:20 +05:30
Nimmy Krishnan	9226641585	Gtestsuite: Added overflow and underflow tests for dgemm - Added overflow and underflow tests for dgemm. These tests cause floating-point overflow and underflow by feeding values close to DBL_MAX and DBL_MIN into the matrices (DBL_MAX = 1.7976931348623158e+308, DBL_MIN = 2.2250738585072014e-308). When computations produce magnitudes outside the range [DBL_MIN, DBL_MAX], an overflow or underflow condition occurs. Two new arguments are added to the test_gemm routine, over_under and input_range: over_under = 0 indicates overflow and over_under = 1 indicates underflow; input_range = -1 indicates values within the overflow or underflow limits, input_range = 0 indicates values very close to DBL_MIN or DBL_MAX, and input_range = 1 indicates values beyond DBL_MIN or DBL_MAX. - New file: dgemm_ovr_undr.cpp. The overflow and underflow tests, dgemm_overflow and dgemm_underflow, are called from this file, which uses the cfloat header for the DBL_MIN and DBL_MAX values. Signed-off-by: Nimmy Krishnan <nimmy.krishnan@amd.com> AMD-Internal: [CPUPL-4492] Change-Id: I4bbd519abacc56f322c73d6c0187ed6e1abbbf2b	2024-04-01 11:21:10 +00:00
Vignesh Balasubramanian	70b57cd16f	Test-case development for ?OMATCOPY APIs - Added test-cases to verify the functional behaviour of the BLAS-extension API ?omatcopy_(). The test-cases cover the following categories for the supported datatypes : - Functional and memory testing. - Negative parameter testing with invalid inputs. - Early return scenarios. - Exception value testing. - Implemented a function to load the reference symbol, based on the choice of the reference library. The function definition is overloaded due to different API standards being exposed by different libraries. AMD-Internal: [CPUPL-4810][SWLCSG-2706] Change-Id: I8dcaeeaa36d392b752eb0685e32583a12ddc4220	2024-03-27 23:34:47 -04:00
Edward Smyth	21e66b667d	GTestSuite: Misc corrections - Handle -0.0 separately in get_value_string() - Avoid unused variable warning when not TEST_BLIS_TYPED in subv_evt_testing.cpp - Remove unused variables in dgemm_ukernel.cpp - Remove unnecessary local copies of greenzone1 in test programs now that greenzone_1 and greenzone_2 will not overlap. - Protect tests of haswell kernels by ifdef on BLIS_KERNELS_HASWELL rather than BLIS_KERNELS_ZEN. - Added GTEST_ALLOW_UNINSTANTIATED_PARAMETERIZED_TEST statements in TRSM kernel tests. - Correct descriptions of trsm and trmm operations. - Correct typos. AMD-Internal: [CPUPL-4500] Change-Id: If8520347e417785e6aa953a0c8a65d4f5f3c1591	2024-03-25 09:32:50 -04:00
Mangala V	220c7bb627	Gtestsuite: Added test for ?SWAPV - Added API tests - Added Invalid input test cases (IIT). - Added memory testing for SWAPV API. - Added micro kernel testing for single and double precision - Added reference swapv functionality in testinghelpers - Added binary comparison method for two vectors with different increments in check_error.h AMD-Internal: [CPUPL-4814] Change-Id: I32bcca51b4e998d51ede70869035da76a7f6dbca	2024-03-25 22:56:31 +05:30
Shubham Sharma	9c40473a96	GTestSuite: Added Tests for DTRSV - Added API tests for DTRSV. - Added Extreme Value Test cases (EVT) for DTRSV. - Tests for various combinations of INFs and NANs for X vector and B matrix are added. - Added Invalid input test cases (IIT). - Added memory testing for DTRSV kernels. - Fixed a bug in alphax function where scaling of a vector with a scalar was not handled correctly when incx was negative. AMD-Internal: [CPUPL-4715] Change-Id: I84c873e98f845e05b11860e7ef6083d1184489b4	2024-03-20 01:05:57 -04:00
Eleni Vlachopoulou	068c2f6ba6	GTestSuite: Generic changes so that GTestSuite builds and runs as expected. - Updating printing functionality for vectors and matrices. - Adding macro definition checks so that GTestSuite builds successfully for shared libraries on zen3. - Casting integers so that code builds for ILP64. AMD-Internal: [CPUPL-4500] Change-Id: I03afd08d5ad8ae50193d9559cf4ab8fc1d08753c	2024-03-14 23:47:33 +05:30
Edward Smyth	d61a74ec8f	GTestSuite: Allow fp values in test names Modifications to testinghelpers::get_value_string() to allow floating point values (e.g. for alpha and beta) to be used in generating test names. Values will be generated in the form 1p3 or m2p4, or 3p0_4p5i for complex data. One decimal place is currently enabled but this can be increased if needed. This helps prevent duplicate test name errors when the list of values for alpha or beta includes e.g. 1.0 and 1.3. Also add support in testinghelpers::get_value_string() for variables of type gtint_t. AMD-Internal: [CPUPL-4500] Change-Id: Icc8ca3c3cfacd7d46fffefee5a6e05452f704d4e	2024-03-14 11:01:58 -04:00
Harsh Dave	51e1bfc1f1	Added dotxf reference implementation for gtestsuite - dotxf is a BLIS-specific kernel that performs several dotxv operations fused together (their number is the fuse factor) to speed up the computation. - A dotxf reference function is therefore implemented for gtestsuite, in which the dotxf result is compared against the result of looping over the dotxv function. AMD-Internal: [CPUPL-4764] Change-Id: I342dab066ceb1710649e54bb73afc5a23e2a8177	2024-03-14 05:05:49 -04:00
Harsh Dave	f10d6eced6	Added axpyf reference implementation for gtestsuite - axpyf is a BLIS-specific kernel that performs several axpy operations fused together (their number is the fuse factor) to speed up the computation. - An axpyf reference function is therefore implemented for gtestsuite, in which the axpyf result is compared against the result of looping over the axpy function. AMD-Internal: [CPUPL-4763] Change-Id: I4713fd0b0d9e9cf688c9aaa82ac0e6ae07a05989	2024-03-14 02:54:36 -04:00
Shubham Sharma	9968821ed9	GTestSuite: Added tests for STRSM - Added API tests for STRSM. - Added Extreme Value Test cases (EVT) for STRSM. - Tests for various combinations of (+/-) INFs and NANs in A and B matrix are added. - Added micro kernel testing - Added unit tests for small and native path kernels. - Added memory testing for STRSM kernels. - Edited the protected buffer in memory testing to make sure that greenzone1 and greenzone2 do not intersect. AMD-Internal: [CPUPL-4640] Change-Id: Ic48590d3b4ad12c4f2f6beaec2e1106a7aaa5213	2024-02-29 23:40:17 -05:00
Arnav Sharma	98b28368d8	Functional Tests for ZSCALV and ZDSCALV - Updated test_scalv and ref_scalv templates for SCALV gtestsuite to support unit-tests for mixed precision SCALV. - Added unit-tests for the following kernels: ZSCALV - bli_zscalv_zen_int( ... ) ZDSCALV - bli_zdscalv_zen_int10( ... ) - bli_zdscalv_zen_int_avx512( ... ) - Also, added API level unit-tests for the following cases: - Unit Positive Increments - Non-Unit Positive Increments - Updated comments in DSCALV unit-tests with the correct kernel name. AMD-Internal: [CPUPL-4624] Change-Id: I96db8d3612687be07cd0e638a3119d41c3641ce8	2024-02-28 12:21:05 +05:30
Edward Smyth	936a0a29df	GTestSuite: BLAS2 thresholds Modify thresholds to reflect number of operations that accumulate results into each output element. Different limits are set for early return and special cases. Constants are still subject to experimentation and change. AMD-Internal: [CPUPL-4378] Change-Id: Ic4540a2f1f6cd6380228b6a2884ac62850d6d8c6	2024-02-27 11:52:48 -05:00
Arnav Sharma	38af5752c4	Simplified and Fixed gtestsuite get_value_string - Simplified the get_value_string( ... ) for complex types. AMD-Internal: [CPUPL-4653] Change-Id: I5bf8f6fe5753d0037b52bc4e31f87ad27b5d2c1c	2024-02-27 10:37:41 -05:00
Shubham Sharma	de92fb0680	Added Memory testing for DTRSM - Added framework for memory testing. - Out-of-bounds reads and writes can be detected in both C and assembly. - Added memory tests for DTRSM. - Test methodology: - Use Linux's protected pages to mark some memory before and after the required buffer as protected. - Set the first and last page_size bytes as read, write and execute protected (red_zones). - If any part of the code tries to read/write in the redzones, a SIGSEGV signal will be generated, which can be used to detect an out-of-bounds read or write. - Page protection can only be set per page. If the required size for the buffer is not a multiple of pagesize, we have to allocate more memory than required in order to make sure the start and end of the redzones align with page boundaries. - Override malloc(size) to allocate 'buffer_size+(2*pagesize)' where buffer_size = minimum size such that buffer_size > 'size' and buffer_size is a multiple of pagesize. - Use the first and last page_size bytes of the allocated buffer as redzones, use the first 'size' bytes of the middle buffer as the first greenzone and the last 'size' bytes as the second greenzone. - Call the test code once with the first greenzone and then with the second greenzone. Greenzones are surrounded by redzones; if the test code reads/writes before or after the greenzones, it will be detected. |_____________________________________________________| | red_zone1 | green_zone1 greenzone_2 | red_zone2| |_____________________________________________________| AMD-Internal: [CPUPL-4403] Change-Id: Ic5c22a9adf8f833c77510686eee886485e894354	2024-02-19 23:41:28 -05:00
Edward Smyth	00accfb3b1	GTestSuite: option to test with threshold = zero Add cmake option to override thresholds and set them all to zero. In this case we don't switch to binary comparison as we want the error to be calculated and printed. This functionality is intended for: - Helping to determine or alter thresholds. - To compare different max errors between different reference libraries. - To test when we expect identical results, e.g. some comparisons of BLIS vs BLIS. To simplify coding, this is implemented by setting epsilon to zero in the testinghelpers function. AMD-Internal: [CPUPL-4400] Change-Id: I2cf021e0cc24c62e7600ba80fd810f3aa55a6ea5	2024-02-08 11:06:25 -05:00
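Commit de92fb0680 describes fencing a test buffer with protected pages. The following is a minimal C sketch of that red-zone/green-zone layout using the POSIX `mmap` and `mprotect` calls, assuming a Linux or other POSIX system that provides `MAP_ANONYMOUS`. The type and function names (`guarded_buf`, `guarded_alloc`, `guarded_free`) are hypothetical illustrations, not the gtestsuite API, and the sketch rounds the usable region up to the smallest page multiple that holds `size` bytes.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: usable bytes fenced by two PROT_NONE pages. */
typedef struct {
    unsigned char *base;   /* start of the whole mapping                  */
    size_t total;          /* bytes mapped, including both red zones      */
    unsigned char *green1; /* 'size' bytes right after red zone 1         */
    unsigned char *green2; /* 'size' bytes ending right before red zone 2 */
} guarded_buf;

/* Returns 0 on success, -1 if mapping or protecting pages fails. */
static int guarded_alloc(guarded_buf *g, size_t size)
{
    assert(size > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* Protection works per page, so round the usable region up. */
    size_t body = ((size + page - 1) / page) * page;
    g->total = body + 2 * page;

    void *p = mmap(NULL, g->total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    g->base = (unsigned char *)p;

    /* First and last pages become red zones: any access raises SIGSEGV. */
    if (mprotect(g->base, page, PROT_NONE) != 0 ||
        mprotect(g->base + page + body, page, PROT_NONE) != 0) {
        munmap(p, g->total);
        return -1;
    }
    g->green1 = g->base + page;               /* underrun hits red zone 1 */
    g->green2 = g->base + page + body - size; /* overrun hits red zone 2  */
    return 0;
}

static void guarded_free(guarded_buf *g)
{
    munmap(g->base, g->total);
}
```

A test would run the code under test once with `green1` and once with `green2` as its buffer: an underrun from `green1` or an overrun from `green2` touches a `PROT_NONE` page and the process receives `SIGSEGV`.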

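Commit d61a74ec8f describes encoding floating-point test parameters in test names as `1p3`, `m2p4`, or `3p0_4p5i`. A small C sketch of that encoding follows, assuming one decimal place via `%.1f` and a plain character substitution. `value_to_name` and `complex_to_name` are hypothetical helpers, not `testinghelpers::get_value_string()` itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: format with one decimal place, then make the text
   safe for a test name by mapping '-' -> 'm' and '.' -> 'p'. */
static void value_to_name(double v, char *out, size_t len)
{
    snprintf(out, len, "%.1f", v);
    for (char *c = out; *c != '\0'; ++c) {
        if (*c == '-') *c = 'm';
        else if (*c == '.') *c = 'p';
    }
}

/* Hypothetical: complex value as <real>_<imag>i. */
static void complex_to_name(double re, double im, char *out, size_t len)
{
    char re_s[32], im_s[32];
    value_to_name(re, re_s, sizeof re_s);
    value_to_name(im, im_s, sizeof im_s);
    snprintf(out, len, "%s_%si", re_s, im_s);
}
```

With `%.1f`, `-0.0` comes out as `m0p0`; commit 21e66b667d notes that the real helper handles -0.0 separately, which this sketch does not attempt.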

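Commit 6165001658 reroutes ?AXPBYV inputs (y := beta*y + alpha*x) to simpler L1 kernels depending on alpha and beta. The commit does not spell out the exact mapping, so the routing below is an assumption derived from the algebra alone, using the kernels the commit lists (?SETV, ?ADDV, ?SCALV, ?SCAL2V, ?COPYV, ?AXPYV). The enum and function names are hypothetical and cover real scalars only.

```c
#include <assert.h>

/* Hypothetical routing targets for y := beta*y + alpha*x. */
typedef enum {
    OP_NONE,   /* alpha == 0, beta == 1: y unchanged, return early */
    OP_SETV,   /* alpha == 0, beta == 0: y := 0                    */
    OP_SCALV,  /* alpha == 0:            y := beta*y               */
    OP_COPYV,  /* alpha == 1, beta == 0: y := x                    */
    OP_SCAL2V, /* beta == 0:             y := alpha*x              */
    OP_ADDV,   /* alpha == 1, beta == 1: y := y + x                */
    OP_AXPYV,  /* beta == 1:             y := y + alpha*x          */
    OP_AXPBYV  /* general case                                     */
} axpby_route;

static axpby_route choose_axpby_route(double alpha, double beta)
{
    if (alpha == 0.0) {
        if (beta == 1.0) return OP_NONE;
        if (beta == 0.0) return OP_SETV;
        return OP_SCALV;
    }
    if (beta == 0.0) return (alpha == 1.0) ? OP_COPYV : OP_SCAL2V;
    if (beta == 1.0) return (alpha == 1.0) ? OP_ADDV : OP_AXPYV;
    return OP_AXPBYV;
}
```

One consequence of routing this way is that an operand multiplied by a zero scalar is never read, so a NaN or Inf in it does not propagate. That is why a reference built from one general axpby formula could disagree with the library, and why the commit decomposes the reference call the same way.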
75 Commits
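Commit 8657e661fc states that when beta is zero the output is set, not updated, so its old values must not be read. One plausible reading of the masked-beta change in e097346658 is the same rule applied to y in DGEMV. The following scalar C sketch of a column-major `y := beta*y + alpha*A*x` uses a flag where an AVX-512 kernel would use a zero load mask. `dgemv_n_sketch` is a hypothetical name, and the real kernels are vectorized and threaded.

```c
#include <assert.h>
#include <math.h> /* INFINITY, used when seeding y in the usage check */
#include <stddef.h>

/* y := beta*y + alpha*A*x, A is m x n, column-major, leading dim lda.
   When beta == 0, y is written but never read, like a zero mask. */
static void dgemv_n_sketch(size_t m, size_t n, double alpha,
                           const double *a, size_t lda,
                           const double *x, double beta, double *y)
{
    const int read_y = (beta != 0.0); /* stands in for the load mask */
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j)
            acc += a[i + j * lda] * x[j];
        y[i] = (read_y ? beta * y[i] : 0.0) + alpha * acc;
    }
}
```

Seeding y with `INFINITY` before a beta = 0 call, as the gtestsuite EXT_VALUE option does by default, exposes an accidental read: 0 * Inf is NaN, so an unguarded `beta * y[i]` would corrupt the result.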