amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-22 01:18:18 +00:00

Author	SHA1	Message	Date
Edward Smyth	9500cbee63	Code cleanup: spelling corrections Corrections for some spelling mistakes in comments. AMD-Internal: [CPUPL-3519] Change-Id: I9a82518cde6476bc77fc3861a4b9f8729c6380ba	2023-11-09 00:16:30 -05:00
Arnav Sharma	dd1cf23090	Gtestsuite Update for Pack and Compute Extension APIs - Pack and compute are now compared against GEMM operation of reference library when MKL is not used as a reference. - For the case where both A and B are unpacked, the reference GEMM is invoked with a unit-alpha scalar. - If MKL is used as reference, then these APIs are compared against pack and compute operations of MKL. - Updated description in ref_gemm_compute.cpp to reflect this behavior. AMD-Internal: [CPUPL-4084] Change-Id: Id0521c9cad8743a7ae471a7f3c547ceb67191f86	2023-11-03 09:45:42 -04:00
Arnav Sharma	44dfc7a515	Fix for gemm_compute BLAS Check - BLAS compute checks updated to properly check for rs_c and cs_c. - Updated BLAS compute checks to skip validity check if m==1 or n==1. For the same reason, added a check just before to validate rs_c and cs_c are greater than or equal to 1. - Added tiny size tests to gtestsuite as a sanity check. - Also updated the Invalid Input Tests to test for the updated checks. AMD-Internal: [CPUPL-4140] Change-Id: I984339ec7909778b58409ffcdbeed4ee33f28cfb	2023-11-03 09:41:16 -04:00
Vignesh Balasubramanian	84faccdd7d	Enabling the vectorized path for SNRM2_ - Enabled the vectorized AVX-2 code-path for SNRM2_. The framework queries the architecture ID and calls the vectorized kernel based on the architecture support. - In case of not having the architecture support, we use the default path based on the sumsqv method. AMD-Internal: [CPUPL-3277] Change-Id: Ic60c0782dec0b7eb09fac21818eb625e57b1d14f	2023-11-03 17:45:56 +05:30
Arnav Sharma	c1612f6838	Gtestsuite Framework and Unit Tests for Pack and Compute Extension APIs - Added framework for unit testing of BLAS and CBLAS interfaces for the Pack and Compute Extension APIs. - These test the integrated functionality of the trio of ?gemm_pack_get_size(), ?gemm_pack() and ?gemm_compute() APIs. - Note: Only MKL can be used as reference for now. AMD-Internal: [CPUPL-3560] Change-Id: I801654447a716da06c9ccf9db01d553817871571	2023-10-16 09:35:42 -04:00
Vignesh Balasubramanian	81161066e5	Multithreading the DNRM2 and DZNRM2 API - Updated the bli_dnormfv_unb_var1( ... ) and bli_znormfv_unb_var1( ... ) function to support multithreaded calls to the respective computational kernels, if and when the OpenMP support is enabled. - Added the logic to distribute the job among the threads such that only one thread has to deal with fringe case(if required). The remaining threads will execute only the AVX-2 code section of the computational kernel. - Added reduction logic post parallel region, to handle overflow and/or underflow conditions as per the mandate. The reduction for both the APIs involve calling the vectorized kernel of dnormfv operation. - Added changes to the kernel to have the scaling factors and thresholds prebroadcasted onto the registers, instead of broadcasting every time on a need basis. - Non-unit stride cases are packed to be redirected to the vectorized implementation. In case the packing fails, the input is handled by the fringe case loop in the kernel. - Added the SSE implementation in bli_dnorm2fv_unb_var1_avx2( ... ) and bli_dznorm2fv_unb_var1_avx2( ... ) kernels, to handle fringe cases of size = 2 ( and ) size = 1 or non-unit strides respectively. AMD-Internal: [CPUPL-3916][CPUPL-3633] Change-Id: Ib9131568d4c048b7e5f2b82526145622a5e8f93d	2023-10-16 07:26:27 -04:00
Harsh Dave	7a4f84fbac	Optimized dgemm for tiny input sizes. - This commit focuses on enhancing the performance of dgemm for matrices with very small dimensions. - blis_dgemm_tiny function re-uses dgemm sup kernels, bypassing the conventional SUP framework code path. The SUP framework code path requires the creation and initialization of blis objects, accessing all the needed meta-information from objects, and querying contexts, which adds a performance penalty when computing on matrices with very small dimensions. - To avoid this performance penalty, blis_dgemm_tiny function implements lightweight support code so that it can re-use dgemm SUP kernels in such a way that it operates directly on input buffers. It avoids the framework overhead of creating and initializing blis objects, context initialization, and accessing other large framework data structures. - blis_dgemm_tiny function checks for threshold condition to match before picking the kernel. For zen, zen2, zen3 architecture tiny kernel is invoked for any shape as long as m < 8 and k <= 1500 or m < 1000 and n <= 24 and k <= 1500. While for zen4 as long as dimensions are less than 1500 for m,n,k tiny kernel is invoked. - blis_dgemm_tiny function supports single threaded computation as of now. AMD-Internal: [CPUPL-3574] Change-Id: Ife66d35b51add4fccbeebd29911e0c957e59a05f	2023-10-16 05:52:49 -04:00
Vignesh Balasubramanian	a6a67fea2d	ZAXPBYV optimizations for handling unit and non-unit strides - Updated the bli_zaxpbyv_zen_int( ... ) kernel's computational logic. The kernel performs two different sets of compute based on the value of alpha, for both unit and non-unit strides. There are no constraints on beta scaling of the 'y' vector. - Updated the logic to support 'x' conjugate in the computation. The kernel supports conjugate/no conjugate operation through the usage of _mm256_fmsubadd_pd( ... ) and _mm256_addsub_pd( ... ) intrinsics. - Updated the early return condition in the kernel to adhere to the standard compliance. - Updated the scalar computation with vector computation(using 128 bit registers), in case of dealing with a single element(fringe case) in unit-stride or vectors with non-unit strides. A single dcomplex element occupies 128 bits in memory, thereby providing scope for this optimization. - Added accuracy and extreme value testing with sufficient sizes and initializations, to test the required main and fringe cases of the computation. AMD-Internal: [CPUPL-3623] Change-Id: I7ae918856e7aba49424162290f3e3d592c244826	2023-10-12 06:31:08 -04:00
jagar	5d578684ea	GtestSuite: Update in source code to make it compatible on MSVC(windows) AMD-Internal: [CPUPL-2732] Change-Id: Ifd9372bf9b0f00c2bf24442ea8519bfcf4e5db5b	2023-10-09 04:43:29 -04:00
jagar	712a84d50f	Gtestsuite: Update in cmake to search reflib in given path AMD-Internal: [CPUPL-2732] Change-Id: Ide2b98a95f81f394c7c01cc3a3b5ae6fa0403a82	2023-10-05 05:39:27 -04:00
jagar	29711dd5a3	Gtestsuite: Updated testings_basics.* to print matrix/vector name AMD-Internal: [CPUPL-2732] Change-Id: I89b4ffc97ea852e66f42b82058af67c16144fbf6	2023-09-26 08:27:19 -04:00
Vignesh Balasubramanian	32104c400c	GTestSuite : Designing test cases for ZGEMM - Designed test cases for unit testing of ZGEMM compute kernel for handling inputs when k == 1. The design uses value-parameterized testing for checking accuracy, and verifying the mandate in case of exception values on the inputs/output. - The design uses type-parameterized testing for verifying BLAS standard for invalid input cases, and also for early return scenarios. - Added the function template set_ev_mat( ... ) as part of testinghelpers. This function is used as a helper for inducing exception values onto indices specified as arguments to the test_gemm( ... ) interface. - Abstracted the function definition of getValueString( ... ) from the NRM2 testing interface to testinghelpers(renamed as get_value_string( ... ) for naming consistency), in order to use it as a helper function across all APIs in case of exception value testing. AMD-Internal: [CPUPL-3823] Change-Id: I0fea21f9c8759bbbdc88ba0a016202753e28f2a7	2023-09-08 17:36:57 +05:30
Eleni Vlachopoulou	a6641dec0b	Updating GTestSuite CMake system to enable testing BLIS libraries on Windows. - Renaming ELEMENT_TYPE to BLIS_ELEMENT_TYPE, since the first is defined on a Windows header. - Updating refCBLAS object to have different implementation depending on the platform. - Removing dlfcn.h from all reference headers since it's linux specific and adding it conditionally on a higher level. - Changes on all CMakeLists.txt files to enable building on Windows. AMD-Internal: [CPUPL-2732] Change-Id: I6e35656a3779b35dc815a2409cf84c22dd27f3e7	2023-08-29 16:11:22 +05:30
Eleni Vlachopoulou	fa77d0415a	Updating nrm2 GTestSuite testing - Adding default template parameter for the type of the returned value from nrm2. - Bugfix on NaN/Inf comparator for scalars. - Tuning sizes of vector x to exercise the different paths for vectorized and scalar code. - Adding wrong parameters and extreme value testing. - Adding tests for overflow and underflow using max and min representable numbers for vectorized and scalar code. AMD-Internal: [CPUPL-2732] Change-Id: Ice8ee65095ecaa7b30ebd5f90ed2a890178533db	2023-07-28 05:03:00 -04:00
jagar	fb6f1380b2	Gtestsuite:Added util functions - Functions to print matrix and vector elements. - Functions to convert matrix to symmetric, hermitian triangular matrix and set diagonal elements in matrix. AMD-Internal: [CPUPL-2732] Change-Id: I1ffa5289329cbb8a9581bf545bdd157801cf5baa	2023-06-27 16:33:57 +05:30
jagar	003d1e9ae6	GTestSuite: Using ELEMENT_TYPE to specify generation of random numbers in tests. Since random numbers are specified from ELEMENT_TYPE and we never generate tests for both integer and floating point numbers at the same time, we update code as described below: - random vector/matrix generators are updated to use ELEMENT_TYPE as a default parameter. - ::testing::Values(ELEMENT_TYPE) is removed from all test generators. AMD-Internal: [CPUPL-2732] Change-Id: Ibc6b05044502f541c9e8a7687931b1ca2903fb0c	2023-06-21 11:30:15 -04:00
Edward Smyth	7e50ba669b	Code cleanup: No newline at end of file Some text files were missing a newline at the end of the file. One has been added. Also correct file format of windows/tests/inputs.yaml, which was missed in commit `0f0277e104` AMD-Internal: [CPUPL-2870] Change-Id: Icb83a4a27033dc0ff325cb84a1cf399e953ec549	2023-04-21 10:02:48 -04:00
Edward Smyth	0f0277e104	Code cleanup: dos2unix file conversion Source and other files in some directories were a mixture of Unix and DOS file formats. Convert all relevant files to Unix format for consistency. Some Windows-specific files remain in DOS format. AMD-Internal: [CPUPL-2870] Change-Id: Ic9a0fddb2dba6dc8bcf0ad9b3cc93774a46caeeb	2023-04-21 08:41:16 -04:00
Eleni Vlachopoulou	ea484f38e6	BLIS GTestSuite fixes for ILP64. - Adding doc regarding option setting for INT64 in README. - Bugfix on template instantiation on helper function. Updated to use gtint_t instead of int. AMD-Internal: [CPUPL-2732] Change-Id: Ia52407a1ef3fdd06e905c2e3d4aa5befb80e82d6	2023-04-19 03:41:55 -04:00
jagar	a77402968c	GTestsuite: Updates in CmakeLists.txt to check libraries Updated the CmakeLists.txt to check whether the specified libraries are present or abort cmake building AMD-Internal: [CPUPL-2732] Change-Id: I90115217c228430095aa53a82dc26d16935b320f	2023-04-14 08:56:41 -04:00
jagar	f164c7fe70	Added GTestSuite helper functions - Functions to convert to cblas enums from char. - Functions to print matrix and vector elements. - Functions to set matrix and vector elements with the given value. AMD-Internal: [CPUPL-2732] Change-Id: I1046b9578c8456e89eddba4a4e8577016b9361ca	2023-04-12 09:03:08 -04:00
Eleni Vlachopoulou	e8392fedb8	GTestSuite fix on trsm tests. - Fixing thresholds to be more appropriate. - Updating the way random entries of A and B are generated so that A is diagonally dominant and the algorithm doesn't diverge. AMD-Internal: [CPUPL-2732] Change-Id: I6d5691d744ecc623f66c45e94461bd88625d7179	2023-04-11 20:01:21 +05:30
jagar	1d5c1e5803	Code coverage support in gtestsuite framework - Tools used for code coverage are: Gcov and Lcov. - We need to use macros specified by gcov during compilation of blis and gtestsuite. - Lcov will generate coverage reports in html format. AMD-Internal: [CPUPL-2732] Change-Id: I17b30b4a322b8771f2d6a4ba28986cf0ccf3fba6	2023-04-10 07:48:15 -04:00
Eleni Vlachopoulou	fa024b82ad	Adding helper functionality for wrong input testing in GTestSuite. - Added a header with correct default values to be used in tests. - Updated README to include information on how to test for wrong parameters and some explanation on how lda increments work. AMD-Internal: [CPUPL-2732] Change-Id: I4f540d46013ffe91b4acb30da2b437251c09d3bc	2023-04-06 13:32:29 -04:00
Eleni Vlachopoulou	bf3f5cafa8	BLIS GTestSuite Updates: - Fix in README.md. - Updating abs overload for scomplex and dcomplex to avoid overflow by using std::abs. - Updating comparators to take into account NaNs and Infs when measuring error. AMD-Internal: [CPUPL-2732] Change-Id: I8c12bacd9d63b2e914d0a79f337f7525dc16b733	2023-04-05 06:11:34 -05:00
jagar	f9adfa8ee4	Updated CmakeLists.txt to remove cmake generated files cmake generated files and executables are cleaned within build directory by "make distclean" command. Change-Id: I4fd5193e92958122ff10ecc634b42096f3b3716e	2023-04-05 06:11:16 -04:00
Eleni Vlachopoulou	58f85bb8f1	Adding copyright notice in gtestsuite files. Change-Id: I5097831eb7a46c56a4a2a32da4d3ee69c8b36cb5	2023-03-29 09:01:48 -04:00
Eleni Vlachopoulou	04e091fdca	BLIS GTestSuite: Link OpenMP if we test serial BLIS, but MKL is used as a reference. Change-Id: Iacafa5ecf74622fa5e1180a81305cf7a23d79055	2023-03-28 04:43:58 -04:00
Eleni Vlachopoulou	155a64e734	Introducing upgraded BLIS GTestSuite. Key features: - able to test both static and dynamic libraries - able to test BLAS, CBLAS and BLIS-typed interface - can use any CBLAS library for reference results - can build and/or run tests depending on the BLAS level or a specific API AMD-Internal: [CPUPL-2732] Change-Id: Ibe0d7938e06081526bbc54d3182ac7d17affdaf6	2023-03-21 03:17:51 +05:30
Eleni Vlachopoulou	88e549e7bd	Using CMake as the build system for both Linux and Windows: - GoogleTest headers removed. GoogleTest gets fetched at configuration time. - BLIS headers removed. A BLIS installation path is required at configuration time. - Windows has been temporarily disabled. AMD-Internal: [CPUPL-2732] Change-Id: I9e55c8e43b2733f96cd8b6e5449d79623decad5c	2022-12-13 19:09:23 +05:30
jagar	cff29bde76	Added gtestsuite folder into blis repo Moved blis gtestsuite from lib-confscript to blis repo (branch: amd-main) Change-Id: If7ad391eef66bac6d26cf5223e6043d52b746072	2022-12-07 23:57:13 -05:00
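
Commits 84faccdd7d and 81161066e5 mention a default path "based on the sumsqv method" and reduction logic that guards against overflow and underflow. As an illustration of the general idea only, the sketch below accumulates a scaled sum of squares in the style of the classic reference BLAS/LAPACK approach. It is not the BLIS kernel: the name `sumsq_scaled` and its signature are hypothetical, and it assumes a non-negative stride.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a BLIS function.
 * Computes scale and ssq such that
 *     scale^2 * ssq == x[0]^2 + x[incx]^2 + ... (n terms)
 * without squaring any element directly, so large or small inputs
 * do not overflow or underflow in the intermediate sum.
 * The 2-norm would then be scale * sqrt(ssq).
 */
static void sumsq_scaled(size_t n, const double *x, size_t incx,
                         double *scale, double *ssq)
{
    assert(n == 0 || x != NULL); /* assumption: caller passes a valid vector */

    double s = 0.0; /* largest magnitude seen so far */
    double q = 1.0; /* sum of squares relative to s */

    for (size_t i = 0; i < n; ++i) {
        double v = x[i * incx];
        double a = (v < 0.0) ? -v : v; /* |v| without needing libm */

        if (a == 0.0)
            continue; /* zeros contribute nothing */

        if (s < a) {
            /* New maximum: rescale the running sum to the new scale. */
            double r = s / a;
            q = 1.0 + q * r * r;
            s = a;
        } else {
            double r = a / s;
            q += r * r;
        }
    }

    *scale = s;
    *ssq = q;
}
```

With this representation, a vector such as {1e200, 1e200} yields scale = 1e200 and ssq = 2, whereas a naive sum of squares would overflow to infinity before the square root is taken.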

31 Commits
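
The message of commit 7a4f84fbac describes the size thresholds under which the tiny dgemm path is chosen. A minimal sketch of that dispatch condition follows, assuming the zen/zen2/zen3 rule groups as (m < 8 and k <= 1500) or (m < 1000 and n <= 24 and k <= 1500); the commit message does not state the grouping explicitly, so this is one reading of it. The enum `tiny_arch` and the function `use_tiny_dgemm` are hypothetical names for illustration and are not BLIS identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical architecture tag; BLIS uses its own architecture IDs. */
typedef enum {
    TINY_ARCH_ZEN,
    TINY_ARCH_ZEN2,
    TINY_ARCH_ZEN3,
    TINY_ARCH_ZEN4,
    TINY_ARCH_OTHER
} tiny_arch;

/*
 * Returns true when, under the thresholds quoted in the commit message,
 * a dgemm of shape m x n x k would take the tiny path.
 */
static bool use_tiny_dgemm(tiny_arch arch, long m, long n, long k)
{
    assert(m >= 0 && n >= 0 && k >= 0); /* assumption: dimensions already validated */

    switch (arch) {
    case TINY_ARCH_ZEN:
    case TINY_ARCH_ZEN2:
    case TINY_ARCH_ZEN3:
        /* Assumed grouping of the "and"/"or" conditions in the message. */
        return (m < 8 && k <= 1500) ||
               (m < 1000 && n <= 24 && k <= 1500);
    case TINY_ARCH_ZEN4:
        /* "dimensions are less than 1500 for m,n,k" */
        return m < 1500 && n < 1500 && k < 1500;
    default:
        /* The message names no thresholds for other architectures. */
        return false;
    }
}
```

Under this reading, a 500 x 100 x 100 problem would take the tiny path on zen4 but not on zen3, because n exceeds 24 and m is not below 8.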