amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-25 02:44:31 +00:00

Author	SHA1	Message	Date
Arnav Sharma	91bdf9a3eb	Gtestsuite: Bugfix for DOTXF, Changes to AXPYF - Fixed bug in ddotxf generic tests where the parameters lda_inc and inca were being read incorrectly. - Fixed bug in dotxf test wherein the y vector was being generated with length m instead of b. - Corrected function signatures to use type gtint_t instead of gint_t. - Updated the tests to use conjugate values of type char and convert to conj_t type only while invoking BLIS tests for both DOTXF and AXPYF. AMD-Internal: [CPUPL-5117] Change-Id: I0ef7af429057583a1cbf34827802e72401181caf	2024-06-07 15:05:10 +05:30
Edward Smyth	7829a7cf85	GTestSuite: test name consistency changes 6 Improve consistency in test names across different APIs: - Improve consistency of TEST_P part of test names. - Rename _evt_testing.cpp and nrm2_extreme.cpp files to _evt.cpp to match other APIs. - Standardize naming of IIT_ERS files. Also: - Restore trsv IIT_ERS file which was misnamed in commit `a2beef3255` - Tidy ukr gemm tests to be more consistent with each other and move threshold setting to individual TEST_P functions to allow different adjustments to be made. - Similarly make trsm tests more consistent. - Tidy naming of is_memory_test variable. AMD-Internal: [CPUPL-4500] Change-Id: I0af1fc9973b02187b19a7c2488eed1b829cfdc2f	2024-06-05 11:26:16 -04:00
Nallani Bhaskar	1b79f35e6d	Updated store to avoid warning in gcc-10 Description: - _mm512_storeu_epi8 and _mm512_storeu_epi16 intrinsic instructions are not available in gcc-10 - Replaced the above intrinsics with _mm512_storeu_si512 Change-Id: I2878780b7acd040ccf45e571d486ff8c2388088c	2024-05-30 22:22:50 +05:30
mkadavil	cd032225ca	BF16 bias support for bf16bf16f32ob16. -As it stands the bf16bf16f32ob16 API expects bias array to be of type float. However actual use case requires the usage of bias array of bf16 type. The bf16 micro-kernels are updated to work with bf16 bias array by upscaling it to float type and then using it in the post-ops workflow. -Corrected register usage in bf16 JIT generator for bf16bf16f32ob16 API when k > KC. AMD-Internal: [SWLCSG-2604] Change-Id: I404e566ff59d1f3730b569eb8bef865cb7a3b4a1	2024-05-23 04:48:20 +05:30
Nallani Bhaskar	29db6eb42b	Added transB in all AVX512 based int8 APIs Description: --Added support for transB in u8s8s32o<s32|s8> and s8s8s32o<s32|s8> APIs --Updated the bench_lpgemm by adding options to support transpose of B matrix --Updated data_gen_script.py in lpgemm bench according to latest input format. AMD-Internal: [SWLCSG-2582] Change-Id: I4a05cc390ae11440d6ff86da281dbafbeb907048	2024-05-23 03:46:13 +05:30
Edward Smyth	d9c269786a	GTestSuite: bli_zdscalv isn't created by BLIS BLIS includes the BLAS and CBLAS interfaces for zdscal but not the BLIS typed interface bli_zdscalv. Thus, when TEST_INTERFACE=BLIS_TYPED is defined, disable tests for zdscal. AMD-Internal: [CPUPL-4671] Change-Id: I397454c83e272f9e775e37e00533002576041a93	2024-05-21 15:24:02 -04:00
Vignesh Balasubramanian	947811a429	Bugfix for ?OMATCOPY2 and ?IMATCOPY APIs - Updated the parameter check for leading dimensions in the functions handling transpose case of matrix A. - Updated the logic to perform ?IMATCOPY operation. The new logic uses an auxiliary buffer to copy and scale in place, if and when needed. This is done in order to avoid overwriting any subsequent reads that might follow(specifically in case of having different leading dimensions for reading and writing). - Updated xerbla_() to throw memory allocation failure based on INFO parameter being -10. This value is specific to its use-case in ?IMATCOPY, where it is set to -10. - Updated the Extreme Value Tests(EVT) logger for ?IMATCOPY for uniformity. - Cleaned up the files to follow coding conventions. AMD-Internal: [CPUPL-4862][SWLCSG-2706] Change-Id: I34dfa2bcb66b821315e11f7ab2139c41a79ef780	2024-05-21 11:13:28 +05:30
Eleni Vlachopoulou	25bfd0a982	GTestSuite: Fix so that std::max works properly on Windows. AMD-Internal: [CPUPL-4500] Change-Id: I73d55dd3040daf6f8aec94799cf7f3f0cc2bddc0	2024-05-20 15:59:16 +01:00
Edward Smyth	bc7d2df832	GTestSuite: misc corrections 2 - Correct value of alpha in ger ERS test. - rename ERS_IIT.cpp files to match naming convention used for other APIs. - Change all cases of gint_t to gtint_t except for dotxf, which is fixed in another commit. - Add TEST_UPPERCASE_ARGS to imatcopy and omatcopy{2} headers. - Corrected typo. AMD-Internal: [CPUPL-4500] Change-Id: I8844bb8c5941785e64daa9df5569092c19f91838	2024-05-20 03:51:21 -04:00
Eleni Vlachopoulou	e98d58b657	GTestSuite: Adjusting thresholds. -Adding multiplier for complex APIs. -Updating for trmv and trsv to reflect multiplication with alpha. AMD-Internal: [CPUPL-4500] Change-Id: I17361da5afa5d1e219b4c8a14542e2b216a7ea58	2024-05-17 09:11:59 -04:00
Edward Smyth	a69dc3669e	GTestSuite: test name consistency changes 5 Improve consistency in test names across different APIs. In this commit, standardize leading dimensions (lda, ldb, ldc) in test names. Also some misc tidying changes. AMD-Internal: [CPUPL-4500] Change-Id: Icbc82d0b9a3420ddfdb4f418396f9e56ab1765ab	2024-05-16 08:51:01 -04:00
Edward Smyth	782e009b66	GTestSuite: check data that should just be set is not read 2 Correction to commit `8657e661fc` to allocate matrix or vector correctly when special read-only case occurs. Also define a set_matrix generator for symmetric matrices to only set upper or lower triangle to the supplied value, while setting the unused elements to a large value to help catch incorrect access to those elements. AMD-Internal: [CPUPL-4548] Change-Id: I22b3a20e2ce8be70eb27179247cd47fdb2d87b9d	2024-05-15 11:56:16 -04:00
Edward Smyth	1f60b7c366	Export some BLIS internal symbols 2 Export more symbols for BLIS kernels so that AOCL libFLAME optimizations can call them directly. AMD-Internal: [CPUPL-5044] Change-Id: I45392b8a2a14ac2816141521b90b7ddb1216c733	2024-05-15 06:59:56 -04:00
Mangala V	64d9c96d45	ZGEMMT SUP: AVX512 GEMMT code for Upper variant 1. Enabled AVX512 path for - Upper variant - Different storage schemes for upper and lower variant 2. Modified mask value to handle all fringe cases correctly AMD_Internal: [CPUPL-5091] Change-Id: I4bf8aca24c1b87fff606deb05918b8e6216b729e	2024-05-15 13:08:32 +05:30
Shubham Sharma	b4bc71f3ac	Bug fix IN DAXPYF MT and Code Cleanup - Fixed bug in DAXPYF MT kernel when incx != inca. - Added AOCL Dynamic function for 1f kernels. - Moved all DOTXF and AXPYF kernels into one file. AMD-Internal: [CPUPL-4880] Change-Id: I7d9f44625bc42fad4a9e5b218ecc382efdf22cbe	2024-05-14 06:44:10 -04:00
Shubham Sharma	f4b06547fd	Enabled DGEMMT SUP optimized code for upper variant - Enabled DGEMMT SUP upper kernels in AVX512 code path. - Enabled use of optimized kernels for all the storages supported by optimized kernels. AMD-Internal: [CPUPL-4881] Change-Id: Id4486610dacaabc405fbc35b2588607c6508705e	2024-05-14 05:23:51 -04:00
Meghana Vankadari	3a8b9270e7	Implemented lpgemv for AVX512-INT8 variants - Implemented optimized lpgemv for both m == 1 and n == 1 cases. - Fixed few bugs in LPGEMV for bf16 and f32 datatypes. - Fixed few bugs in JIT-based implementation of LPGEMM for BF16 datatype. AMD-Internal: [SWLCSG-2354] Change-Id: I245fd97c8f160b148656f782d241f86097a0cf38	2024-05-14 01:55:49 +05:30
Edward Smyth	b2ed1000b3	GTestSuite: test name consistency changes 4 Improve consistency in test names across different APIs. Various changes in this patch: - Explicitly cast char variables to std::string when adding to test name. Adding the char directly was causing errors in name generation. - Use template version of print function in zdscalv and remove print function zdscalvGenericTestPrint. - Remove unused print function ztrsvPrint. - Eliminate some differences in gemm ukr print functions. - Remove extraneous API name labels in ukr axpyf and setv. - Make ukr/trsm/test_trsm_ukr.h more consistent with other files. AMD-Internal: [CPUPL-4500] Change-Id: Ib8092de216712586fe4ec0ae91698d0c1aaffd54	2024-05-13 11:11:02 -04:00
mkadavil	ec67289601	SWISH post-op support for BF16 JIT based kernels. SWISH post-op computes swish(x) = x / (1 + exp(-1 * alpha * x)). SiLU = SWISH with alpha = 1. Adding the support for swish in JIT based BF16 kernels. AMD-Internal: [SWLCSG-2387] Change-Id: I9eea0c801f5f067a5cfbd2941bc991708b86e45e	2024-05-13 01:50:32 -04:00
Edward Smyth	a94d2ddf44	GTestSuite: test name consistency changes 3 Improve consistency in test names across different APIs. In this commit, standardize storage, side, uplo, trans diag and conj in test names. AMD-Internal: [CPUPL-4500] Change-Id: Ifcdb6e9f684b134841d86087218d7aefd9cabe63	2024-05-10 08:35:19 -04:00
Hari Govind S	61d0f3b873	Additional optimisations on COPYV API - Reduced number of jump operations in AVX512 assembly kernel for SCOPYV, DCOPYV and ZCOPYV. - Fixed memory test failure for bli_zcopyv_zen_int_avx512 kernel. - Replaced existing AVX2 COPYV intrinsic kernels in bli_cntx_init_zen5.c with AVX512 assembly kernels. Change-Id: Idc11601b526d6d82cfbdf63af2fd331918b31159	2024-05-10 07:22:04 -04:00
Edward Smyth	8657e661fc	GTestSuite: check data that should just be set is not read Some BLAS routines do not require matrices or vectors to be initialized in certain use cases. For example, in GEMM when beta=zero, C is set rather than updated, thus input values of C should not be used. In these cases set the initial values of such matrices or vectors to an extreme value, to help detect if these are incorrectly being read. The extreme value can be NaN or Inf. The default is Inf; change it by running cmake ... -DEXT_VALUE=NaN AMD-Internal: [CPUPL-4548] Change-Id: I4a665363779d2496b8247f6357e970b7f23cd1eb	2024-05-10 06:29:03 -04:00
Hari Govind S	92847ae912	Gtestsuite: Memory testing for SCOPYV, DCOPYV and ZCOPYV APIs - Utilized the memory testing feature in GTestsuite to update the testing interfaces for micro-kernel testing of SCOPY, DCOPY and ZCOPY APIs. Change-Id: I3d6905f33b000b8d5e60727aa896bd869f4f441f	2024-05-09 12:10:17 -04:00
Shubham Sharma	f36468a9e9	Enabled vectorized division code in ZTRSM - Existing vectorized code was disabled because of the failures observed in matlab tests. - The issue is caused by underflow during division when diagonal elements of A matrix are very small. - When diagonal is very small (4E-324 in case of matlab), squaring the diagonal during division causes the square to be rounded off to zero. - Fix is to normalise (ar) and (ai) by dividing (ar) and (ai) by max(ar, ai), this will make either (ar) or (ai) 1, and hence reduce the likelihood of underflow. AMD-Internal: [CPUPL-5052] Change-Id: Iff7893fdcb92907a12e6af8e102a92637a13ce4f	2024-05-09 01:35:39 -04:00
vignbala	ca6276d52b	Accuracy and memory testing of AVX512 ?SETV, ZAXPYV and ZAXPYF kernels - Added accuracy and memory tests for AVX2 and AVX512 ?SETV kernels, AVX512 ZAXPYV kernel and AVX512 ZAXPYF kernels, with fuse-factors 2, 4 and 8. - Cleanup of the code-section that declares and defines the reference compute for AXPYF operation. Corrected the type mismatch with the arguments that reference AXPYV would expect (this is used to decompose AXPYF as part of reference). Ensured usage of GTestSuite's internal alias for integer types. - Updated the API level testsuite and testing interface for AXPYF, based on the cleanup done to the reference code. AMD-Internal: [CPUPL-4974] Change-Id: I71de6c09d3877cd3dd1eaa20ab4f90e7c33eb1e1	2024-05-09 00:24:02 -04:00
Edward Smyth	a2beef3255	GTestSuite: break up long running tests Test programs for key APIs like GEMM take a long time to run, and even to generate the list of test cases. Break into separate test programs for different data types to enable these to run in parallel (at gtest level). In this patch we break up GEMM, TRSM, GEMV and TRSV. AMD-Internal: [CPUPL-4500] Change-Id: I21363b050d30e0402d5a1e8cbeaed2ebcc87aaeb	2024-05-08 13:36:38 -04:00
Edward Smyth	62c886feee	Export some BLIS internal symbols AOCL libFLAME optimizations directly call some internal BLIS symbols. Export them to enable this to work with the BLIS shared library. AMD-Internal: [CPUPL-5044] Change-Id: Icb62dcb51e12d72dde8434593ab17de3c227c93d	2024-05-08 12:51:32 -04:00
Arnav Sharma	cb27fad49c	ZSCALV AVX512 Kernel - Implemented ZSCALV kernel utilizing AVX512 intrinsics. - Gtestsuite: Added ukr tests for the new kernel. AMD-Internal: [CPUPL-5012] Change-Id: I75c7f4448ddd60b0f9afa53936eed37f5f99eeb2	2024-05-08 11:55:13 -04:00
Arnav Sharma	89a06cf252	Gtestsuite: Unit Tests for ZDOTV AVX512 Kernel - Updated DOTV Gtestsuite interface to invoke C/ZDOTC when conjx='c' and testing interface is either BLAS or CBLAS. - Added ukr tests for bli_zdotv_zen4_asm_avx512( ... ) and bli_zdotv_zen_int_avx512( ... ) kernels. AMD-Internal: [CPUPL-5011] Change-Id: I32fb69027a35d9ea92f997a095d412c8242a4b68	2024-05-08 09:20:31 -04:00
eseswari	e0b172174e	Added testcases for axpyv api * Functional tests are covered for saxpyv and zaxpyv. * As part of functional large size of m, stride greater than m, scalar combinations(including special cases), Zero increment tests are added for saxpyv and zaxpyv. Signed-off-by: eseswari <sangadala.eswari@amd.com> AMD-Internal: CPUPL-4413 Change-Id: I61473357680cb0f394e6e653796ec31110895fa4	2024-05-08 08:44:45 -04:00
Arnav Sharma	1dbeee4d19	ZDOTV AVX512 Kernel with MT Support - Added AVX512 kernel for ZDOTV. - Multithreaded both ZDOTC and ZDOTU with AOCL_DYNAMIC support. AMD-Internal: [CPUPL-5011] Change-Id: I56df9c07ab3b8df06267a99835b088dcada81bd8	2024-05-08 04:54:05 -04:00
eseswari	dd10c6dc5b	Added testcases for copyv API * As part of functional test cases, large size of m, stride greater than m,scalar combinations, Zero increment tests are added for ?copyv. Signed-off-by: eseswari <sangadala.eswari@amd.com> AMD-Internal: CPUPL-4412 Change-Id: I9fa74c147975bbe21263aaf48190170c6ed0a8fd	2024-05-08 04:41:43 -04:00
Eleni Vlachopoulou	7787d5af1a	GTestSuite: Updating CMake system to create executables depending on the directory structure. - Before the system was assuming 3 levels in the directory structure and was creating corresponding targets. - Now the system looks into the subdirectories of testsuite and creates a target for each subdirectory that has at least one cpp file. - Also deleted a directory that seems duplicate and was breaking builds. AMD-Internal: [CPUPL-4500] Change-Id: I03ca362b09783f1c7c5f37ab420d8ca2c2b45e2e	2024-05-08 03:46:14 -04:00
Arnav Sharma	b1d69180f9	Updated DOTV DTL in bla_dot.c - Updated DOTV DTL entry to include conjugate parameter. AMD-Internal: [CPUPL-5059] Change-Id: Id66be02fc06ff2faa18325dffe76559af2c6a5cf	2024-05-08 01:46:17 -04:00
Mangala V	e6cc2a3e22	ZGEMMT SUP Optimizations for AVX512 Existing Design: - GEMM AVX2 kernel performs computation and updates temporary C buffer - Portion of temporary C buffer is copied to output C buffer based on UPLO parameter - For diagonal blocks, using GEMM kernels is not efficient New Design: Implemented in current patch when UPLO='L' - GEMMT kernel used for computation, temporary buffer is not required. - Only required elements are computed using mask load store for all fringe cases - Exception: AVX2 code path is used when storage format is RRC, CRR, CRC - AOCL-Dynamic is added based on dimension - Check for AVX platform is added in SUP interface; it returns to native implementation if hardware does not support AVX platform - SUP ref_var2m is expanded for dcomplex datatype to avoid condition check which exists for double datatype AMD_Internal: [CPUPL-5006] Change-Id: I3e21404b732b8f2df9cbdba394303752fdf36286	2024-05-07 23:00:29 +05:30
Meghana Vankadari	1072770c63	Implemented LPGEMV for bf16 datatype 1. The 5 LOOP LPGEMM path is inefficient when A or B is a vector (i.e., m == 1 or n == 1). 2. An efficient implementation is developed considering the B matrix reorder in case of m = 1 and post-ops fusion. 3. When m = 1 the algorithm divides the GEMM workload in the n dimension at a granularity of NR. Each thread works on A:1xk B:kx(>=NR) and produces C = 1x(>=NR). K is unrolled by 4 along with a remainder loop. 4. When n = 1 the algorithm divides the GEMM workload in the m dimension at a granularity of MR. Each thread works on A:(>=MR)xk B:kx1 and produces C = (>=MR)x1. When n = 1, reordering of B is avoided to process the n = 1 case efficiently in one kernel. AMD-Internal: [SWLCSG-2355] Change-Id: I7497dad4c293587cbc171a5998b9f2817a4db880	2024-05-06 23:55:15 +05:30
Kiran Varaganti	fd61c69778	Fixed bug in omatcopy for when trans="t" Thanks to Zhenyu Zhu ajz34 for pointing out this bug. When trans="t" (or "conjugate transpose" in the case of complex data-types), ldb should be greater than or equal to cols. The buggy code checked it against "rows". Fixed this bug. Some minor code formatting is also done. [CPUPL-4810][SWLCSG-2706] Change-Id: Ie796d25a361b2ba72eda80e8c5867d6352af901f	2024-05-06 12:57:38 -04:00
Shubham Sharma	be34169001	Fixed Matlab Failure in ZTRSM - In AVX512 ZTRSM kernel, vectorized division code is causing failures in matlab. - The logic is identical in reference C code and intrinsics code, but intrinsics code is causing failure - Replaced optimized intrinsics code with C code. AMD-Internal: [CPUPL-5052] Change-Id: Iea184330b22c46d979867b870486066ef980eb84	2024-05-06 06:56:45 -04:00
mkadavil	118e955a22	SWISH post-op support for all LPGEMM APIs. SWISH post-op computes swish(x) = x / (1 + exp(-1 * alpha * x)). SiLU = SWISH with alpha = 1. AMD-Internal: [SWLCSG-2387] Change-Id: I55f50c74a8583a515f7ea58fa0878ccbcdd6cc26	2024-05-06 06:05:11 -04:00
Meghana Vankadari	75b9d46a40	Fix in LPGEMM for variable BLIS-int size - Modified all structs that are passed to JIT-generated code to use integer of type uint64_t rather than dim_t so that functionality is not affected when size of BLIS-internal integer is modified during configure time. Change-Id: Ib81c088072badf13da4ca73be2d4af4551b713d8	2024-05-06 02:56:47 -04:00
Shubham Sharma	7553abad8e	Fixed compilation error with AOCC in TRSV - Added a {} around zen4 switch case to avoid AOCC error. - The error arises because in C a declaration is not a statement and therefore cannot be labeled, so the compiler is not able to create a label for the jump. AMD-Internal: [CPUPL-4880] Change-Id: Icfeedafd80bf9a955e430ca967b6a93dcbbf075e	2024-05-03 21:08:38 +05:30
vignbala	f8218bb9f2	Compiler warnings when using masked loads - Updated the AVX512 DOTXF kernels to use MASKZ loads instead of MASK loads when loading X vector in fringe case. This avoids compiler warnings of uninitialized vector as input to the intrinsic. - The functionality will not change when using either MASK or MASKZ loads on X, since A matrix is loaded using MASKZ loads. AMD-Internal: [CPUPL-4974] Change-Id: I1ef98a1292352d0e905cc09cd5667acd883df827	2024-05-03 09:53:36 -04:00
Edward Smyth	0a830626b2	GTestSuite: check stored value of INFO Check internal value of INFO for BLAS2 and BLAS3 routines using the bli_info_get_info_value() function added in AOCL 4.2. If testing a BLIS library that does not have this, use cmake ... -DCAN_TEST_INFO_VALUE=OFF AMD-Internal: [CPUPL-4993] Change-Id: Ida5d252b0f6727793ebfb74bb160e8cb96b61b74	2024-05-03 09:08:21 -04:00
Shubham Sharma	b70347d0d4	DGEMMT SUP Optimizations for AVX512 - In DGEMMT SUP AVX2 code path, triangular kernels are added in order to avoid temporary C buffer. - Since these kernels did not exist for AVX512, AVX2 kernels were being used in GEMMT. - AVX512 triangular GEMM kernel has been added to make sure that AVX512 kernels can be used without creating a temporary buffer. - This kernel is added only for Lower variant of GEMMT; for upper variant of DGEMMT, temporary C buffer is created, full GEMM kernel is called on temporary C and triangular region from temporary C is copied to C buffer. AMD-Internal: [CPUPL-4881] Change-Id: Id70645f79ae078ab9a7006e83d328505f1fae8a9	2024-05-03 05:11:11 -04:00
Shubham Sharma	b9e21e8701	Added ZTRSM AVX512 small code path - Kernel dimensions are 4x4. - Two kernels are implemented, Right Upper and Right lower. - In case of Left variants of TRSM, transpose is induced so that Right variant kernels can be used. - No packing is performed in these kernels. - Changes are made in the threshold to pick ZTRSM small code path. - BLIS_INLINE is removed from signature of "TRSMSMALL_KER_PROT". - These kernels do not support "ENABLE_TRSM_PREINVERSION". - Newly added kernels do not support conjugate transpose. - Added multithreading to ZTRSM small code path. AMD-Internal: [CPUPL-4324] Change-Id: I683b1d5239593e54f433e7f27497d72dfbd9141c	2024-05-03 05:10:41 -04:00
Shubham Sharma	1d983e6124	Added AVX512 kernels for DAXPYF and DDOTXF - Added DAXPYF and DDOTXF AVX512 kernels. - Fuse factor for ddotxf kernel is 8. - 2 DAXPYF kernels are added, with fuse factor 8 and 32. - Multithreading is also added to the DAXPYf kernel with fuse factor 32. - These kernels are internally used by TRSM. - Added changes in TRSV to call these kernels in ZEN4 AMD-Internal: [CPUPL-4880] Change-Id: I12850de974b437bbca07677b68bc3d6a35858770	2024-05-03 05:10:22 -04:00
Vignesh Balasubramanian	4e2966f9b0	AVX512 optimizations for ZGEMV API with transpose case - Implemented AVX512 kernels for handling the calls to ZGEMV with transpose to A matrix. - This includes the set of ZDOTXF and ZDOTXV kernels. ZDOTXF kernels include those with fuse-factor 8 (main kernel), 4 and 2(fringe kernels). - Updated the bli_zgemv_unf_var1( ... ) function to update the function pointers to these kernels, based on the configuration. AMD-Internal: [CPUPL-4974] Change-Id: I313ae0abe9dc119de849da42f9825b71f11b1fda	2024-05-03 04:38:52 -04:00
Vignesh Balasubramanian	53cb83d0cc	AVX512 optimizations for ZGEMV API with no-transpose case - Implemented AVX512 kernels for handling the calls to ZGEMV with no-transpose to A matrix. - This includes the ZAXPYF, ZAXPYV and ZSETV kernels. The set of ZAXPYF kernels include those with fuse-factor 8 (main kernel), 4 and 2(fringe kernels). - Updated the bli_zgemv_unf_var2( ... ) function to set the function pointers to these kernels, based on the configuration. Further added the call to ZSETV at this layer in case beta is 0. AMD-Internal: [CPUPL-4974] Change-Id: Iee4b724719e49023138bb16479765be44d677cd9	2024-05-03 07:04:47 +00:00
Eleni Vlachopoulou	edbbbe8791	GTestSuite: Templatizing printing function for test name. - Using a template class for the printing operator that depends on the type. - Use a macro to denote which interface is being tested. AMD-Internal: [CPUPL-4500] Change-Id: I453c4ef4842c354064f49ff32ec4bf42920cc17c	2024-05-02 12:00:17 -04:00
Edward Smyth	82e628b833	GTestSuite: seg faults in data generator Following a recent change to the data generators to allow a stride to be specified (`60cc23f3d3`), seg faults can occur if m<=0 for column storage or n<=0 for row storage. Prevent this by having separate code paths to handle these scenarios. AMD-Internal: [CPUPL-4500] Change-Id: I23ed8b2dccaaca140e2ddfda45bcdb4c888d5708	2024-05-01 05:46:52 -04:00
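
Commit f36468a9e9 in the log above describes an underflow in complex division: when the divisor's components are as small as 4E-324, squaring them rounds to zero. The commit's fix scales (ar) and (ai) by their maximum before squaring. The following is a minimal Python sketch of that idea, not the BLIS kernel itself. The function names `naive_div` and `scaled_div` are hypothetical, and the sketch assumes the scale factor is the larger absolute value of the two components.

```python
def naive_div(br, bi, ar, ai):
    """Divide (br + i*bi) by (ar + i*ai) using the textbook formula.

    The denominator ar*ar + ai*ai underflows to 0.0 when ar and ai are
    tiny (for example 4e-324). Python then raises ZeroDivisionError;
    C code would instead produce Inf or NaN.
    """
    denom = ar * ar + ai * ai
    return ((br * ar + bi * ai) / denom, (bi * ar - br * ai) / denom)


def scaled_div(br, bi, ar, ai):
    """Same division, with the divisor normalised before squaring.

    Assumption: scale by max(|ar|, |ai|), so the larger component
    becomes +/-1 and the squared sum lies in [1, 2].
    The divisor must be non-zero, otherwise s is 0.0.
    """
    s = max(abs(ar), abs(ai))
    ar_s, ai_s = ar / s, ai / s
    denom = ar_s * ar_s + ai_s * ai_s  # cannot underflow to zero
    re = (br * ar_s + bi * ai_s) / denom / s
    im = (bi * ar_s - br * ai_s) / denom / s
    return re, im


if __name__ == "__main__":
    tiny = 4e-324  # parsed as the smallest positive subnormal double
    print(tiny * tiny)                             # 0.0
    print(scaled_div(2 * tiny, 0.0, tiny, 0.0))    # (2.0, 0.0)
```

Dividing by the scale factor last keeps the intermediate values close to the magnitude of the numerator, which is why the tiny divisor no longer destroys the result.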

3316 Commits
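
Commits ec67289601 and 118e955a22 define the SWISH post-op as swish(x) = x / (1 + exp(-1 * alpha * x)), with SiLU being the alpha = 1 case. A scalar Python sketch of that formula is given here for reference; it is an illustration only and does not reflect how the LPGEMM or JIT kernels vectorise it. The names `swish` and `silu` are hypothetical, and the two-branch evaluation is an assumption made here to avoid overflow in `exp`, not something stated in the commits.

```python
import math


def swish(x, alpha=1.0):
    """swish(x) = x / (1 + exp(-alpha * x)).

    For alpha * x < 0 the algebraically equal form
    x * exp(t) / (1 + exp(t)) is used so that exp() never
    receives a large positive argument.
    """
    t = alpha * x
    if t >= 0.0:
        return x / (1.0 + math.exp(-t))
    e = math.exp(t)  # t < 0, so e is in [0, 1)
    return x * e / (1.0 + e)


def silu(x):
    """SiLU is SWISH with alpha = 1."""
    return swish(x, 1.0)


if __name__ == "__main__":
    for v in (-2.0, 0.0, 2.0):
        print(v, swish(v, alpha=1.0), silu(v))
```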