amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-25 02:44:31 +00:00

Author	SHA1	Message	Date
Shubham Sharma	de92fb0680	Added Memory testing for DTRSM - Added framework for memory testing. - Out-of-bounds reads and writes can be detected in both C and assembly. - Added memory tests for DTRSM. - Test methodology: - Use Linux's protected pages to mark some memory before and after the required buffer as protected. - Set the first and last page_size bytes as read, write and execute protected (red_zones). - If any part of the code tries to read or write in the redzones, a SIGSEGV signal is generated, which can be used to detect an out-of-bounds read or write. - Page protection can only be set per page. If the required buffer size is not a multiple of pagesize, we have to allocate more memory than required in order to make sure the start and end of the redzones align with page boundaries. - Override malloc(size) to allocate 'buffer_size+(2*pagesize)', where buffer_size is the minimum size such that buffer_size > 'size' and buffer_size is a multiple of pagesize. - Use the first and last page_size bytes of the allocated buffer as redzones, the first 'size' bytes of the middle buffer as the first greenzone and the last 'size' bytes as the second greenzone. - Call the test code once with the first greenzone and then with the second greenzone. Greenzones are surrounded by redzones; if the test code reads or writes before or after the greenzones, it will be detected. Layout: | red_zone1 | green_zone1 green_zone2 | red_zone2 | AMD-Internal: [CPUPL-4403] Change-Id: Ic5c22a9adf8f833c77510686eee886485e894354	2024-02-19 23:41:28 -05:00
Edward Smyth	1bd9f0c856	Define symbol dzgemm_blis_impl for non-zen configurations Non-zen configurations will use frame/compat/bla_gemm.c rather than frame/compat/bla_gemm_amd.c. In the former, change dzgemm definition to have dzgemm_blis_impl and optional dzgemm_ wrapper, as in the AMD version. AMD-Internal: [CPUPL-4082] Change-Id: I66caff56e033bda8bb4ff2d60a16f7e52af122ea	2024-02-19 05:24:39 -05:00
mangala v	41b19ba6e6	Gtestsuite: ZGEMM API testing Functionality testing for the APIs below is carried out with various input ranges and values. The interface invokes the listed APIs in the sequence below if the condition is satisfied. List of APIs - Condition SCALM : alpha = 0 GEMV : m = 1 or n = 1 Small ST : ((m0 * k0) <= 16384) || ((n0 * k0) <= 16384) SUP AVX2 : (m || n || k) <= 128 SUP AVX512 : (m || k) <= 128 || n <= 110 Native : Default path, if the APIs above don't support the given input values AMD-Internal: [CPUPL-4426] Change-Id: I40cd30a11592e4e553e09f0d81153abf0bf0b002	2024-02-16 15:49:36 +05:30
mkadavil	01b7f8c945	Matrix Add post-operation support for integer(s16|s32) LPGEMM APIs. -This post-operation computes C = (beta*C + alpha*A*B) + D, where D is a matrix with the same dimensions and data type as the C matrix. -For clang compilers (including aocc), -march=znver1 is not enabled for zen kernels. Have updated CKVECFLAGS to capture the same. AMD-Internal: [SWLCSG-2424] Change-Id: Ie369f7ea5c80ab69eea3f3e03a8d9546e14f5c09	2024-02-12 23:51:36 +05:30
Edward Smyth	00accfb3b1	GTestSuite: option to test with threshold = zero Add cmake option to override thresholds and set them all to zero. In this case we don't switch to binary comparison as we want the error to be calculated and printed. This functionality is intended for: - Helping to determine or alter thresholds. - To compare different max errors between different reference libraries. - To test when we expect identical results, e.g. some comparisons of BLIS vs BLIS. To simplify coding, this is implemented by setting epsilon to zero in the testinghelpers function. AMD-Internal: [CPUPL-4400] Change-Id: I2cf021e0cc24c62e7600ba80fd810f3aa55a6ea5	2024-02-08 11:06:25 -05:00
Edward Smyth	f3cff28838	GTestSuite: option to test upper case character arguments Add cmake option to convert all character arguments to upper case to check compliance. AMD-Internal: [CPUPL-4499] Change-Id: Ic18416d78f63b999a78253463cc15c32f7d444f4	2024-02-08 08:53:26 -05:00
jagar	099b9863cb	CMake: CMake is updated for Code Coverage CMakeLists.txt is updated to generate a code coverage report in HTML format just by configuring cmake with -DENABLE_COVERAGE=ON. This is supported only on Linux with the gcc compiler. Usage: cmake .. -DENABLE_COVERAGE=ON AMD-Internal: [CPUPL-2748] Change-Id: I9b36b6cc3f1f97b53e1c4ee62948a017418e3d41	2024-02-07 06:12:51 -05:00
Vignesh Balasubramanian	b210417a59	Exception Value Testing(EVT) for DAXPY and DAXPBY APIs - Added test cases to verify the compliance of DAXPY and DAXPBY APIs, through Exception Value Testing(EVT). This is done by inducing exception values in the input operands. The induction is controlled by the user, through indices given as part of the parameterized test-cases. - Various combinations of zeros, NaNs and +/-Infs have been used to verify the compliance against the standard. These combinations help in determining whether the exception value has to be propagated, or handled separately. - Updated the daxpyvGenericTestPrint logger for uniformity across the testing categories. - Added test cases for bli_daxpyv_zen_int10( ... ) micro kernel testing to cover the loops iterating in blocks of 52 and 16 respectively. AMD-Internal: [CPUPL-4402] Change-Id: Ida6cf5e08727b4c3cb87c93bfec6be76361cfaea	2024-02-07 12:14:58 +05:30
jagar	40b1af4c3f	CMake:Added cmake for bench CMakelists.txt is added in bench. Steps are provided to build for different targets. AMD-Internal: [CPUPL-2748] Change-Id: I58027f4e42d1323cafb151224c45868bc8337ff4	2024-02-06 06:50:34 -05:00
Harsh Dave	abc414f2ec	API level testing of DGEMM kernels - Added API level tests for avx512 and avx2 k1 kernels, tiny, small, sup and native DGEMM kernels for various value of storage, M, N, K, alpha, beta AMD-Internal: [CPUPL-4404] Change-Id: Ieadf407601a8efc5a2c0956d08d791dcfa69e44b	2024-02-06 16:30:13 +05:30
Arnav Sharma	92aeab1710	Early Return Scenario (ERS) tests for ?SCALV, ?DOTV and ?ASUMV - ERS tests have been added for the above APIs as per the BLAS compliance standards. - Following are the standard tests added: ?SCALV - n <= 0 - incx <= 0 - alpha == 1 ?DOTV - n <= 0 ?ASUMV - n <= 0 - incx <= 0 - Invalid Input Tests are not required for these APIs. - Updated the micro-kernel test files to include the new macros generated for enabling and disabling architecture specific tests. - Updated the function calls for mixed-precision typed_asumv tests. AMD-Internal: [CPUPL-4406] Change-Id: Ib34b2f39809d93075ae1168682b3ef2380e03a5a	2024-02-05 11:48:52 -05:00
Eleni Vlachopoulou	58b63f149f	CMake: Updating message when generating blis.h/cblas.h. Change-Id: I7be7fe31a392c77311664cff4bba3b65c4cc7e4e	2024-02-05 11:18:29 -05:00
Edward Smyth	ee91b032ab	GTestSuite: Ensure all elements are initialized in generators Rather than relying on implicit initialization of arrays, ensure all elements are explicitly set. Array elements that are not supposed to be altered by the BLAS or BLIS API are set to a large magnitude value to aid identification of incorrect usage. This includes: - Intervening elements in vectors when incx/incy > 1. - Extra elements in column/row when lda > matrix size. - Also set unused upper/lower values in triangular matrices to similar large magnitude value. AMD-Internal: [CPUPL-4430] Change-Id: Id5e8c1a4e80687f5f462e6b5aa2accac0ab8ec21	2024-02-05 10:29:56 -05:00
Shubham Sharma	d5cd5836b1	Fixed DGEMM 8x24 kernel for beta zero - Column stride is not taken into consideration in current implementation when writing to C buffer if beta is zero and C is column major stored. - Fixed C storage in case of column major stored C when beta is zero in 8x24 DGEMM kernel. AMD-Internal: [CPUPL-4404] Change-Id: I5b8dfce962995e3238cf902b5a09dd1bf90002a8	2024-02-05 06:57:06 -05:00
mangala v	aa5731eba7	Gtestsuite: Updated SGemm test scenario 1. Earlier tests were taking long time for initialisation and running Hence removed testcases which is already covered as part of another scenario 2. Added two category of tests: a. Tests to cover all sizes of m, n, k for bli_sgemmsup_rv_zen_asm_6x16m kernel b. Tests to cover various alpha and beta values for above kernel With current update building and running takes less than 2 minutes. Change-Id: I1479a8ca960c04d4642857fdc7949458646dafb7	2024-02-05 04:22:21 -05:00
Shubham Sharma	fc91932b4a	Fixed out of bounds read in DTRSM small kernels - In 3x1 fringe case in [RLN/RUT] kernel, 4 double precision floats are being read instead of 3 doubles. - Fixed the code to read only 3 double. AMD-Internal: [CPUPL-4403] Change-Id: If0afb155efefabe13487cf322d479981f1838aa2	2024-02-02 10:31:12 +05:30
mangala v	0659a647e0	Gtestsuite: Micro Kernel Testing of ZGEMM API Summary: - Aims to perform accuracy testing of ZGEMM micro kernel. - Blis kernel is called directly from gtestuite framework. - Micro kernel is invoked with required input, output parameters. - No objects are created to call micro kernel. - No framework code would be invoked in this method. Below AVX2 & AVX512 Micro kernels are being tested using gtestsuite Native Kernels: - AVX2: bli_zgemm_haswell_asm_3x4 bli_zgemm_zen_asm_2x6(Required for TRSM computation) - AVX512: bli_zgemm_zen4_asm_12x4 bli_zgemm_zen4_asm_4x12(Required for TRSM computation) SUP Kernels: - AVX2 Kernels: bli_zgemmsup_rd_zen_asm_3x4m bli_zgemmsup_rd_zen_asm_3x2m bli_zgemmsup_rd_zen_asm_3x4n bli_zgemmsup_rd_zen_asm_2x4n bli_zgemmsup_rd_zen_asm_(2/1)x4 bli_zgemmsup_rd_zen_asm_(2/1)x2 bli_zgemmsup_rv_zen_asm_(2/1)x4 bli_zgemmsup_rv_zen_asm_(2/1)x2 bli_zgemmsup_rv_zen_asm_3x4m bli_zgemmsup_rv_zen_asm_3x2m bli_zgemmsup_rv_zen_asm_3x4n bli_zgemmsup_rv_zen_asm_2x4n bli_zgemmsup_rv_zen_asm_1x4n bli_zgemmsup_rv_zen_asm_3x2 - AVX512 kernels: bli_zgemmsup_cv_zen4_asm_12x4m bli_zgemmsup_cv_zen4_asm_12x3m bli_zgemmsup_cv_zen4_asm_12x2m bli_zgemmsup_cv_zen4_asm_12x1m bli_zgemmsup_cv_zen4_asm_8x(4/3/2/1) bli_zgemmsup_cv_zen4_asm_4x(4/3/2/1) bli_zgemmsup_cv_zen4_asm_2x(4/3/2/1) Above kernels are tested with different combination of parameters such as storage, alpha, beta, transpose & dimensions. DGEMM: Minor update in DGEMM micro kernel (Buffer allocation, comment section, order of passing arguments) AMD-Internal: [CPUPL-4426] Change-Id: I9d6ab24278450f57d13589ad89151a4acc641f08	2024-01-31 10:30:57 -05:00
Eleni Vlachopoulou	b9a808e5d8	GTestSuite: Updating datagenerators helper functions. - Moved function definitions in the header to avoid explicit template instantiations. - Templatized from and to bounds to enable combinations of integer or floating-point values. - Used an enum class for the element type instead of a char to make it more robust, since chars are cast to integers. Now we should be getting better error messages if there is a mismatch. - Deleted argument for datatypes that was a leftover from the past. Default argument is used instead. Change-Id: I3f95d73f03028de46324b310826edca8057e561d	2024-01-31 07:08:25 -05:00
eashdash	ef134dc49f	Added Trans A feature for all INT8 LPGEMM APIs 1. Added Trans A feature to handle column major inputs for A matrix. 2. Trans A is enabled by on-the-go pack of A matrix. 3. The on-the-go pack of A converts a column storage MCxKC block of A into row storage MCxKC block as LPGEMM kernels are row major kernels. 4. New pack routines are added for conversion of A matrix from column major storage to row major storage. 5. LPGEMM Cntx is updated with pack kernel function pointers. 6. Packing of A matrix: - Converts column major input A to row major in blocks of MCxKC with newly added pack A functions when cs_a > 1. 7. Pack routines are added for AVX512 and AVX2 INT8 LPGEMM APIs. 8. Trans A feature is now supported in: 1. u8s8s32os32/os8 2. u8s8s16os16/os8/ou8 3. s8s8s32os32/os8 4. s8s8s16os16/os8 AMD-Internal: SWLCSG-2582 Change-Id: I7ce331545525a9a09f3853280615b55fcf2edabf	2024-01-30 03:40:56 -05:00
Vignesh Balasubramanian	ddec0c1de0	Negative parameter testing for ?COPY, ?AXPY and ?AXPBY APIs - As per the standard compliance, the ?copy(), ?axpy() and ?axpby() APIs do not require invalid input testing(IIT) with respect to the input parameters they receive, as part of BLAS and CBLAS calls. - Thus, test-cases have been added to verify early return scenarios (ERS) as per the compliance. The testsuite is type-parameterized, since the compliance for early return cases is the same across the datatypes. - Updated the conditional directives in micro-kernel(ukr) test files to include the new set of macros generated as part of the buildsystem in GTestsuite. - Updated the conditional macro to enable the appropriate code section for compilation of ref_axpbyv(), based on our choice of reference library when building GTestsuite. AMD-Internal: [CPUPL-4402] Change-Id: Ibea2bc34469b008f4d4558ce359717c08b92e978	2024-01-29 06:31:18 -05:00
Kiran Varaganti	63be4c8ce4	AOCL-BLIS changed to AOCL-BLAS AOCL-BLIS replaced with AOCL-BLAS at various places like "configure", "CMakeLists.txt" and documentation files. Change-Id: I75c3fbe8a1abc91828eeacb25672fd7bc905d226	2024-01-25 04:31:25 -05:00
Eleni Vlachopoulou	e4ac153a3e	GTestSuite: Set macros for kernel testing depending on hardware capabilities. - During configuration, CMake system detects if AVX2, AVX512, AVX512VNNI or AVX512BF16 is supported and sets up a macro. - Those macros need to be used in addition to BLIS_KERNELS_ZEN* to build/run only those tests supported by a specific architecture. Change-Id: I60adc57d3a570f7bdd6dc834e2562da6bfb52bcc	2024-01-22 08:04:12 -05:00
Shubham Sharma	c1a3dbadf1	Micro-kernel testing of DTRSM kernels - Added unit tests for avx512 and avx2 native path DTRSM kernels for various value of storage, stride, K, alpha, ldc. AMD-Internal: [CPUPL-4403] Change-Id: I42b1f08aa98c73af39a6e3bd94049965e7c51ae9	2024-01-22 06:24:17 -05:00
Shubham Sharma	006b86c22f	Added tests for DTRSM - Added API tests for DTRSM. - Added Extreme Value Test cases (EVT) for DTRSM. - Tests for various combinations of INFs and NANs in A and B matrix are added. - Added Invalid input test cases (IIT). - Added tests to check for cases where inputs are not blas compliant. AMD-Internal: [CPUPL-4403] Change-Id: Id8af1f1ec65a4e5bc7abba4e86df2756bce6cd42	2024-01-22 06:23:57 -05:00
Harsh Dave	156bc734f0	Micro-kernel testing of DGEMM kernels - Added unit tests for avx512 and avx2 native and sup path DGEMM kernels for various value of storage, M, N K, alpha, beta, ldc. AMD-Internal: [CPUPL-4404] Change-Id: I33a8098b6a20b55c9f1f1bcffa6812bd792890b1	2024-01-22 05:39:45 -05:00
Arnav Sharma	823e8bfb2d	Functional Testing for DDOTV, DSCALV and DASUMV - Added unit-tests for the following kernels: DDOTV - bli_ddotv_zen_int( ... ) - bli_ddotv_zen_int10( ... ) - bli_ddotv_zen_int_avx512( ... ) DSCALV - bli_dscalv_zen_int( ... ) - bli_dscalv_zen_int10( ... ) - bli_dscalv_zen_int_avx512( ... ) - Added API level unit-tests for the following cases: - Unit Positive Increments - Non-Unit Positive Increments - Negative Increments - Added gtestsuite framework for (s/d/sc/dz)ASUMV. AMD-Internal: [CPUPL-4406] Change-Id: I086c51c563fecc7a7e67791c4c4eee8b56c5417b	2024-01-19 07:05:11 -05:00
Edward Smyth	05be482203	GTestSuite: Threshold comparison Changes to threshold comparison: - Use error <= threshold as measure of success rather than error < threshold. - Report error compared to epsilon as well as absolute value. - Correct typo. AMD-Internal: [CPUPL-4378] Change-Id: I58e718504ee863294dcdd6bd3cd7637de2638dbc	2024-01-19 05:05:10 -05:00
Vignesh Balasubramanian	476ae9359c	Functionality testing of DAXPBYV, DAXPYV and DCOPYV APIs - Implemented a design to allow isolation of micro-kernels to compare against the standard reference as part of GTestSuite. The design requires the kernel address to be passed as one of the values from the instantiator. This is further sent to the testing interface, which makes the call to the micro-kernel directly. - The testing interface is templatized with both the datatype and the function-pointer type. This interface makes the direct call to the micro-kernel(address passed as a parameter), in addition to the call to the reference API. - Added unit tests to cover the functionality testing of the following kernels : - bli_daxpbyv_zen_int10( ... ) and bli_daxpbyv_zen_int( ... ) - bli_daxpyv_zen_int10( ... ), bli_daxpyv_zen_int10( ... ) and bli_daxpyv_zen_int_avx512( ... ). - bli_dcopyvv_zen_int( ... ). Further added dummy tests for bli_saxpbyv_zen_int10( ... ), bli_saxpbyv_zen_int( ... ) and bli_zaxpbyv_zen_int( ... ) kernels to verify the templatized testing interface. - Added API level test-cases, to verify the functionality of DAXPY and DAXPBY APIs. These tests cover unit increments, negative increments(BLAS/CBLAS) and non-unit positive increments. Furthermore, for DAXPY an instantiator tests with sizes corresponding to the AOCL_DYNAMIC thresholds, since it is multithreaded at the framework level. - Updated the API-level tests for ZAXPBY to allow negative increment testing only if GTestsuite is not configured for native BLIS typed interface as reference. AMD-Internal: [CPUPL-4402] Change-Id: I86b3b52d0737075897a9e9bc5e8d9654f75072fc	2024-01-19 01:53:34 -05:00
Edward Smyth	f93ccb0cea	BLIS: zen5 cpuid and arch changes Implement initial support for Zen5 systems: - Detect new Zen5 AVXVNNI, AVX512VP2INTERSECT, MOVDIRI and MOVDIR64B instructions. - Assume for now that Zen5 will use Zen4 code path. BLIS_ARCH_TYPE=zen5 will therefore function as an alias for BLIS_ARCH_TYPE=zen4, but different hardware model will still be detected. AMD-Internal: [CPUPL-3518] Change-Id: I00fb413d743f152a5412ace3e740df1fd39a1600	2024-01-17 11:41:15 -05:00
mkadavil	864170f5cb	Scalar value support for zero-point and scale-factor. -As it stands, in LPGEMM, users are expected to pass an array of values with length the same as N dimension as inputs for zero point or scale factor. However at times, a single scalar value is used as zero point or scale factor for the entire downscaling operation. The mandate to pass an array requires the user to allocate extra memory and fill it with the scalar value so as to be used in downscaling. This limitation is lifted as part of this commit, and now scalar values can be passed as zero point or scale factor. -LPGEMM bench enhancements along with new input format to improve readability as well as flexibility. AMD-Internal: [SWLCSG-2581] Change-Id: Ibd0d89f03e1acadd099382dffcabfec324ceb50f	2024-01-12 04:37:35 +05:30
Eleni Vlachopoulou	29e4ce644d	CMake: Removing blatest-related targets for Windows/shared libs. Due to the way dlls resolve internal symbols, calling custom function xerbla_() is not possible. Because of that the targets result in errors which are independent of BLIS library. Testing Windows/static version will suffice. To enable make check and make test targets, we throw warnings for the Windows/shared cases and not depend on checkblas. AMD-Internal: [CPUPL-2748] Change-Id: Iaa93399dec5781277ee94611074f5ed4e70bcb37	2024-01-10 05:06:41 -05:00
jagar	1821c2142b	CMake:Fix in testsuite cmake to work for static-st on linux CMakeLists.txt is updated in blis/testsuite to make it work for static single thread version of BLIS. AMD-Internal: [CPUPL-2748] Change-Id: I004e19d4ddbf9cb94d6d23699893a2f684a3fb35	2024-01-09 09:13:03 -05:00
Meghana Vankadari	6567df7b12	bf16bf16f32o<bf16|f32> Fix for scaling issue when transA is enabled. Details: - LPGEMM uses bli_pba_acquire_m with BLIS_BUFFER_FOR_A_BLOCK to checkout memory when A matrix needs to be packed. This multi-threaded lock overhead becomes prominent when m/n dimensions are relatively small, even when k is large. In order to address this, bli_pba_acquire_m is used with BLIS_BUFFER_FOR_GEN_USE for LPGEMM. For *GEN_USE, the memory is allocated using aligned malloc instead of checking out from memory pool. Experiments have shown malloc costs to be far lower than memory pool guarded by locks, especially for higher thread count. - Deleted few unnecessary instructions from packing kernels. - Replaced bench_input.txt with lesser number of inputs. AMD-Internal: [CPUPL-4329] Change-Id: I5982a0a4df9dc72fab0cffab795c23822d5c8774	2023-12-21 04:53:32 +05:30
Edward Smyth	98006ea422	fatal error: malloc.h: No such file or directory #785 Replace include of non-standard header malloc.h by stdlib.h to fix issue reported on upstream BLIS github. https://github.com/flame/blis/issues/785 AMD-Internal: [CPUPL-4307] Change-Id: I4ac5cb3164fe7050bba6579b08cc2d3ff412ccba	2023-12-13 04:05:07 -05:00
mkadavil	89bb999afa	Aligning rntm_t struct to 64 byte address to address performance issues. Non aligned rntm_t struct can potentially have its first/last cache line shared with other objects in memory. This could affect performance depending on how much the shared cache lines are used. rntm_t struct is aligned to 64 bytes to workaround this issue. Change-Id: Id0956fca771be062ada9f81e8cd75ac1f290fd8e	2023-12-12 03:39:04 -05:00
jagar	4c77ef4953	GTestsuite:Search library in user specified-path In Gtestsuite CMakeLists.txt, find_library() will search user-mentioned library in default system paths first then in user specified paths. To avoid this CMake is updated to search the user mentioned library in user specified path and ignore searching in default path. AMD-Internal: [CPUPL-4284] Change-Id: Ia99cf59eb39deac4110d3d733f17548d432dde64	2023-12-11 00:20:48 -05:00
Vignesh Balasubramanian	8693c996ac	Fixing coverity issues on SNRM2_ and SCNRM2_ - The bli_snormfv_unb_var1( ... ) and bli_cnormfv_unb_var1( ... ) functions posed an uninitialized pointer read coverity issue, due to the local rntm_t object being declared as part of the function scope, but initialized only on a need basis(i.e, when attempting to pack x vector if incx != 1). - The fix was to have the declaration and initialization inside the case where incx != 1, thereby making the scope of the rntm_t and mem_t objects more stringent. - This required an additional condition to call the kernel in case of unit stride. AMD-Internal: [CPUPL-4278] Change-Id: I763b1d4920532557749d8943f12b6df626aa5372	2023-12-06 23:56:09 +05:30
Shubham Sharma	054d4fde82	Changed threshold for ZTRSM small code path - Changed the threshold for using ZTRSM small code path when multithreading is enabled. - Very skinny matrices are not taken into consideration in existing threshold tuning. AMD-Internal: [CPUPL-4267] Change-Id: I4294ec58a8535af7a9d618ae8f0d86407b66f341	2023-12-01 02:05:28 -05:00
Edward Smyth	48444d4316	cblas.h: Correct order of including other header files Include bli_config.h before bli_system.h in ./frame/compat/cblas/src/cblas.h so that BLIS_ENABLE_SYSTEM is defined correctly before it is needed. This copies the change to ./frame/include/blis.h made in `1f527a93b9` (via merge `c6f3340125`). Also standardize some comments and formatting between blis.h and cblas.h AMD-Internal: [CPUPL-4251] Change-Id: Ie5cab646367f15003c25fa126344b02640d9106e	2023-11-24 11:46:17 -05:00
mangala v	70343cba5b	Gtestsuite: Updated sgemm testcase for sup Updated sgemm testcase to handle multiple values of alpha and beta for different input sizes. Added sgemm testcase to cover m, n, k dimensions up to a size of at least 20 in steps of 1. Change-Id: Id10ba3d7a05154b171511ef11ea76297494672cd	2023-11-24 10:42:22 -05:00
Eleni Vlachopoulou	79ad303902	GTestSuite: Clean-up on build system. - and a small bugfix so that it works again on Windows. Change-Id: I986b81d74d0f00c55eee497712aed5b268211d5f	2023-11-24 04:59:36 -05:00
Bhaskar Nallani	21d6ab6a21	Improved thread balancing for aocl_gemm f32 API Description: 1. Updated the thread partition logic for aocl_gemm_f32f32f32of32 for m<MR, n<NR cases and also balanced threads in the m and n directions such that each thread gets an equal amount of work and no thread is spawned without any work. 2. Disabled dynamic enabling of packing of A and B matrices for smaller sizes for genoa architecture. AMD-Internal: [SWLCSG-2353 , SWLCSG-2391] Change-Id: I03b2c50e592c2e9d336ea84c0e0394af63a34cec	2023-11-24 03:45:44 -05:00
mkadavil	2676ac8249	LPGEMM s32 micro-kernel updates to fix gcc10.2 compilation issue. Some AVX512 intrinsics(eg: _mm_loadu_epi8) were introduced in later versions of gcc (11+) in addition to already existing masked intrinsic (eg: _mm_mask_loadu_epi8). In order to support compilation using gcc 10.2, either the masked intrinsic or other gcc 10.2 compatible intrinsic needs to be used (eg: _mm_loadu_si128) in LPGEMM <u|s>8s8os32 kernels. AMD-Internal: [SWLCSG-2542] Change-Id: I6cfedfdcb28711b19df63d162ab267f5eea8d2ef	2023-11-24 01:58:58 -05:00
Edward Smyth	ed5010d65b	Code cleanup: AMD copyright notice Standardize format of AMD copyright notice. AMD-Internal: [CPUPL-3519] Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0	2023-11-23 08:54:31 -05:00
Eleni Vlachopoulou	52fb555ea2	CMake: Improving how CMake system handles targets. - Instead of putting the built libraries in blis/bin directory, build them in the chosen build-cmake directory. - Install headers in <prefix>/include instead of <prefix>/include/blis. - Fix on some targets to match configure/make system. - Update documentation. AMD-Internal: [CPUPL-2748] Change-Id: I15553948209345dbee350e89965b6a3c72a4e340	2023-11-23 16:43:03 +05:30
Edward Smyth	50608f28df	BLIS: Missing clobbers (batch 7) Add missing clobbers in: - bli_gemmsup_rv_haswell kernels - spare copies of kernels in old, other and broken subdirectories - misc kernels for legacy platforms AMD-Internal: [CPUPL-3521] Change-Id: I7cdb7fd1cb29630d8b7fa914b1002a270dfe9ef5	2023-11-22 17:51:46 -05:00
Edward Smyth	f471615c66	Code cleanup: No newline at end of file Some text files were missing a newline at the end of the file. One has been added. AMD-Internal: [CPUPL-3519] Change-Id: I4b00876b1230b036723d6b56755c6ca844a7ffce	2023-11-22 17:11:10 -05:00
Edward Smyth	dc41fa3829	User selection of code path in single architecture builds User control over code path using AOCL_ENABLE_INSTRUCTIONS or BLIS_ARCH_TYPE only makes sense for fat binary builds. Thus this functionality is now disabled by default for single architecture builds. User can still override the default selections by using configure options --enable-blis-arch-type or --disable-blis-arch-type. Other changes: - include x86_64 family as using zen codepaths in cmake build system. - Update help and error messages to include AOCL_ENABLE_INSTRUCTIONS. AMD-Internal: [CPUPL-4202] Change-Id: I7aa5fcf89df8675bcc12d81f81781de647e0fcf8	2023-11-22 10:48:44 -05:00
mangala v	e0df20806a	Updated prefetching in SGEMM SUP (mask load/store) kernels 1. Prefetch only MR rows or rows required for fringe cases 2. Specify prefetching offset - the least column address supported by masked functions 3. Removed unnecessary prefetches in fringe case for mx4 kernels Updated gtestsuite for sgemm calls AMD_Internal: [CPUPL-4221] Change-Id: I1e2e7d3ebce37dc54a2f0a5c1c70ce0a6d4c8d6c	2023-11-21 06:31:47 -05:00
Harsh Dave	e91d23ff05	Re-implements ddotv edge kernel using masked instructions - This commit uses avx2 and avx512 masked load instructions for handling edge case where vector size is not exact multiple of avx2/avx512 vector register size. - Thanks to Shubham, Sharma <shubham.sharma3@amd.com> for avx512 ddotv kernel changes Change-Id: I998651eeb1083caf3308f1b45bd7d55b7974bcb4	2023-11-21 02:25:00 -05:00
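
The red-zone methodology described in commit de92fb0680 above can be illustrated with a minimal C sketch. This is not the GTestSuite implementation. The names guarded_buf, round_up_to_page, guarded_alloc and guarded_free are hypothetical, the sketch uses mmap/mprotect directly instead of overriding malloc, and it rounds the middle region up to a whole number of pages (buffer_size >= size), whereas the commit message states buffer_size > size.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type, not part of BLIS or its GTestSuite. */
typedef struct {
    unsigned char *base;   /* start of the mapping, i.e. red_zone1 */
    size_t total;          /* mapped bytes including both red zones */
    unsigned char *green1; /* first `size` bytes after red_zone1 */
    unsigned char *green2; /* last `size` bytes before red_zone2 */
} guarded_buf;

/* Round `size` up to a whole number of pages (at least one page). */
static size_t round_up_to_page(size_t size, size_t page)
{
    size_t pages = (size + page - 1) / page;
    return (pages == 0 ? 1 : pages) * page;
}

/* Map red_zone1 | middle | red_zone2 and make both red zones inaccessible.
 * Returns 0 on success, -1 on failure. */
static int guarded_alloc(guarded_buf *g, size_t size)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0) return -1;
    size_t page = (size_t)ps;
    size_t middle = round_up_to_page(size, page);
    size_t total = middle + 2 * page;

    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;

    unsigned char *base = (unsigned char *)p;
    /* PROT_NONE removes read, write and execute access, so any touch of a
     * red zone raises SIGSEGV. */
    if (mprotect(base, page, PROT_NONE) != 0 ||
        mprotect(base + page + middle, page, PROT_NONE) != 0) {
        munmap(p, total);
        return -1;
    }

    assert(size <= middle);
    g->base = base;
    g->total = total;
    g->green1 = base + page;                 /* catches reads/writes before the buffer */
    g->green2 = base + page + middle - size; /* catches reads/writes past the buffer */
    return 0;
}

/* Release the whole mapping, red zones included. */
static void guarded_free(guarded_buf *g)
{
    munmap(g->base, g->total);
}
```

A test would run the kernel once on green1 and once on green2. An access to green1[-1] or green2[size] lands in a protected page and faults, which a SIGSEGV handler can then report as an out-of-bounds access.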

3184 Commits