amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-25 19:04:32 +00:00

Author	SHA1	Message	Date
Shubham Sharma	c1a3dbadf1	Micro-kernel testing of DTRSM kernels - Added unit tests for avx512 and avx2 native path DTRSM kernels for various values of storage, stride, K, alpha, ldc. AMD-Internal: [CPUPL-4403] Change-Id: I42b1f08aa98c73af39a6e3bd94049965e7c51ae9	2024-01-22 06:24:17 -05:00
Shubham Sharma	006b86c22f	Added tests for DTRSM - Added API tests for DTRSM. - Added Extreme Value Test cases (EVT) for DTRSM. - Tests for various combinations of INFs and NANs in A and B matrix are added. - Added Invalid input test cases (IIT). - Added tests to check for cases where inputs are not blas compliant. AMD-Internal: [CPUPL-4403] Change-Id: Id8af1f1ec65a4e5bc7abba4e86df2756bce6cd42	2024-01-22 06:23:57 -05:00
Harsh Dave	156bc734f0	Micro-kernel testing of DGEMM kernels - Added unit tests for avx512 and avx2 native and sup path DGEMM kernels for various values of storage, M, N, K, alpha, beta, ldc. AMD-Internal: [CPUPL-4404] Change-Id: I33a8098b6a20b55c9f1f1bcffa6812bd792890b1	2024-01-22 05:39:45 -05:00
Arnav Sharma	823e8bfb2d	Functional Testing for DDOTV, DSCALV and DASUMV - Added unit-tests for the following kernels: DDOTV - bli_ddotv_zen_int( ... ) - bli_ddotv_zen_int10( ... ) - bli_ddotv_zen_int_avx512( ... ) DSCALV - bli_dscalv_zen_int( ... ) - bli_dscalv_zen_int10( ... ) - bli_dscalv_zen_int_avx512( ... ) - Added API level unit-tests for the following cases: - Unit Positive Increments - Non-Unit Positive Increments - Negative Increments - Added gtestsuite framework for (s/d/sc/dz)ASUMV. AMD-Internal: [CPUPL-4406] Change-Id: I086c51c563fecc7a7e67791c4c4eee8b56c5417b	2024-01-19 07:05:11 -05:00
Edward Smyth	05be482203	GTestSuite: Threshold comparison Changes to threshold comparison: - Use error <= threshold as measure of success rather than error < threshold. - Report error compared to epsilon as well as absolute value. - Correct typo. AMD-Internal: [CPUPL-4378] Change-Id: I58e718504ee863294dcdd6bd3cd7637de2638dbc	2024-01-19 05:05:10 -05:00
Vignesh Balasubramanian	476ae9359c	Functionality testing of DAXPBYV, DAXPYV and DCOPYV APIs - Implemented a design to allow isolation of micro-kernels to compare against the standard reference as part of GTestSuite. The design requires the kernel address to be passed as one of the values from the instantiator. This is further sent to the testing interface, which makes the call to the micro-kernel directly. - The testing interface is templatized with both the datatype and the function-pointer type. This interface makes the direct call to the micro-kernel(address passed as a parameter), in addition to the call to the reference API. - Added unit tests to cover the functionality testing of the following kernels : - bli_daxpbyv_zen_int10( ... ) and bli_daxpbyv_zen_int( ... ) - bli_daxpyv_zen_int10( ... ), bli_daxpyv_zen_int10( ... ) and bli_daxpyv_zen_int_avx512( ... ). - bli_dcopyvv_zen_int( ... ). Further added dummy tests for bli_saxpbyv_zen_int10( ... ), bli_saxpbyv_zen_int( ... ) and bli_zaxpbyv_zen_int( ... ) kernels to verify the templatized testing interface. - Added API level test-cases, to verify the functionality of DAXPY and DAXPBY APIs. These tests cover unit increments, negative increments(BLAS/CBLAS) and non-unit positive increments. Furthermore, for DAXPY an instantiator tests with sizes corresponding to the AOCL_DYNAMIC thresholds, since it is multithreaded at the framework level. - Updated the API-level tests for ZAXPBY to allow negative increment testing only if GTestsuite is not configured for native BLIS typed interface as reference. AMD-Internal: [CPUPL-4402] Change-Id: I86b3b52d0737075897a9e9bc5e8d9654f75072fc	2024-01-19 01:53:34 -05:00
Edward Smyth	f93ccb0cea	BLIS: zen5 cpuid and arch changes Implement initial support for Zen5 systems: - Detect new Zen5 AVXVNNI, AVX512VP2INTERSECT, MOVDIRI and MOVDIR64B instructions. - Assume for now that Zen5 will use Zen4 code path. BLIS_ARCH_TYPE=zen5 will therefore function as an alias for BLIS_ARCH_TYPE=zen4, but different hardware model will still be detected. AMD-Internal: [CPUPL-3518] Change-Id: I00fb413d743f152a5412ace3e740df1fd39a1600	2024-01-17 11:41:15 -05:00
mkadavil	864170f5cb	Scalar value support for zero-point and scale-factor. -As it stands, in LPGEMM, users are expected to pass an array of values with length the same as N dimension as inputs for zero point or scale factor. However at times, a single scalar value is used as zero point or scale factor for the entire downscaling operation. The mandate to pass an array requires the user to allocate extra memory and fill it with the scalar value so as to be used in downscaling. This limitation is lifted as part of this commit, and now scalar values can be passed as zero point or scale factor. -LPGEMM bench enhancements along with new input format to improve readability as well as flexibility. AMD-Internal: [SWLCSG-2581] Change-Id: Ibd0d89f03e1acadd099382dffcabfec324ceb50f	2024-01-12 04:37:35 +05:30
Eleni Vlachopoulou	29e4ce644d	CMake: Removing blatest-related targets for Windows/shared libs. Due to the way dlls resolve internal symbols, calling custom function xerbla_() is not possible. Because of that the targets result in errors which are independent of BLIS library. Testing Windows/static version will suffice. To enable make check and make test targets, we throw warnings for the Windows/shared cases and not depend on checkblas. AMD-Internal: [CPUPL-2748] Change-Id: Iaa93399dec5781277ee94611074f5ed4e70bcb37	2024-01-10 05:06:41 -05:00
jagar	1821c2142b	CMake:Fix in testsuite cmake to work for static-st on linux CMakeLists.txt is updated in blis/testsuite to make it work for static single thread version of BLIS. AMD-Internal: [CPUPL-2748] Change-Id: I004e19d4ddbf9cb94d6d23699893a2f684a3fb35	2024-01-09 09:13:03 -05:00
Meghana Vankadari	6567df7b12	bf16bf16f32o<bf16\|f32> Fix for scaling issue when transA is enabled. Details: - LPGEMM uses bli_pba_acquire_m with BLIS_BUFFER_FOR_A_BLOCK to checkout memory when A matrix needs to be packed. This multi-threaded lock overhead becomes prominent when m/n dimensions are relatively small, even when k is large. In order to address this, bli_pba_acquire_m is used with BLIS_BUFFER_FOR_GEN_USE for LPGEMM. For *GEN_USE, the memory is allocated using aligned malloc instead of checking out from memory pool. Experiments have shown malloc costs to be far lower than memory pool guarded by locks, especially for higher thread count. - Deleted few unnecessary instructions from packing kernels. - Replaced bench_input.txt with lesser number of inputs. AMD-Internal: [CPUPL-4329] Change-Id: I5982a0a4df9dc72fab0cffab795c23822d5c8774	2023-12-21 04:53:32 +05:30
Edward Smyth	98006ea422	fatal error: malloc.h: No such file or directory #785 Replace include of non-standard header malloc.h by stdlib.h to fix issue reported on upstream BLIS github. https://github.com/flame/blis/issues/785 AMD-Internal: [CPUPL-4307] Change-Id: I4ac5cb3164fe7050bba6579b08cc2d3ff412ccba	2023-12-13 04:05:07 -05:00
mkadavil	89bb999afa	Aligning rntm_t struct to 64 byte address to address performance issues. Non aligned rntm_t struct can potentially have its first/last cache line shared with other objects in memory. This could affect performance depending on how much the shared cache lines are used. rntm_t struct is aligned to 64 bytes to workaround this issue. Change-Id: Id0956fca771be062ada9f81e8cd75ac1f290fd8e	2023-12-12 03:39:04 -05:00
jagar	4c77ef4953	GTestsuite:Search library in user specified-path In Gtestsuite CMakeLists.txt, find_library() will search user-mentioned library in default system paths first then in user specified paths. To avoid this CMake is updated to search the user mentioned library in user specified path and ignore searching in default path. AMD-Internal: [CPUPL-4284] Change-Id: Ia99cf59eb39deac4110d3d733f17548d432dde64	2023-12-11 00:20:48 -05:00
Vignesh Balasubramanian	8693c996ac	Fixing coverity issues on SNRM2_ and SCNRM2_ - The bli_snormfv_unb_var1( ... ) and bli_cnormfv_unb_var1( ... ) functions posed an uninitialized pointer read coverity issue, due to the local rntm_t object being declared as part of the function scope, but initialized only on a need basis(i.e, when attempting to pack x vector if incx != 1). - The fix was to have the declaration and initialization inside the case where incx != 1, thereby making the scope of the rntm_t and mem_t objects more stringent. - This required an additional condition to call the kernel in case of unit stride. AMD-Internal: [CPUPL-4278] Change-Id: I763b1d4920532557749d8943f12b6df626aa5372	2023-12-06 23:56:09 +05:30
Shubham Sharma	054d4fde82	Changed threshold for ZTRSM small code path - Changed the threshold for using ZTRSM small code path when multithreading is enabled. - Very skinny matrices are not taken into consideration in existing threshold tuning. AMD-Internal: [CPUPL-4267] Change-Id: I4294ec58a8535af7a9d618ae8f0d86407b66f341	2023-12-01 02:05:28 -05:00
Edward Smyth	48444d4316	cblas.h: Correct order of including other header files Include bli_config.h before bli_system.h in ./frame/compat/cblas/src/cblas.h so that BLIS_ENABLE_SYSTEM is defined correctly before it is needed. This copies the change to ./frame/include/blis.h made in `1f527a93b9` (via merge `c6f3340125`). Also standardize some comments and formatting between blis.h and cblas.h AMD-Internal: [CPUPL-4251] Change-Id: Ie5cab646367f15003c25fa126344b02640d9106e	2023-11-24 11:46:17 -05:00
mangala v	70343cba5b	Gtestsuite: Updated sgemm testcase for sup. Updated sgemm testcase to handle multiple values of alpha and beta for different input sizes. Added sgemm testcase to cover m, n, k dimensions up to at least size 20 in steps of 1. Change-Id: Id10ba3d7a05154b171511ef11ea76297494672cd	2023-11-24 10:42:22 -05:00
Eleni Vlachopoulou	79ad303902	GTestSuite: Clean-up on build system. - and a small bugfix so that it works again on Windows. Change-Id: I986b81d74d0f00c55eee497712aed5b268211d5f	2023-11-24 04:59:36 -05:00
Bhaskar Nallani	21d6ab6a21	Improved thread balancing for aocl_gemm f32 API Description: 1. Updated the thread partition logic for aocl_gemm_f32f32f32of32 for m<MR, n<NR cases and also balanced threads in the m and n directions such that each thread gets an equal amount of work and no thread is spawned without any work. 2. Disabled dynamic enabling of packing of a and b matrices for smaller sizes for genoa architecture. AMD-Internal: [SWLCSG-2353 , SWLCSG-2391] Change-Id: I03b2c50e592c2e9d336ea84c0e0394af63a34cec	2023-11-24 03:45:44 -05:00
mkadavil	2676ac8249	LPGEMM s32 micro-kernel updates to fix gcc10.2 compilation issue. Some AVX512 intrinsics(eg: _mm_loadu_epi8) were introduced in later versions of gcc (11+) in addition to already existing masked intrinsic (eg: _mm_mask_loadu_epi8). In order to support compilation using gcc 10.2, either the masked intrinsic or other gcc 10.2 compatible intrinsic needs to be used (eg: _mm_loadu_si128) in LPGEMM <u\|s>8s8os32 kernels. AMD-Internal: [SWLCSG-2542] Change-Id: I6cfedfdcb28711b19df63d162ab267f5eea8d2ef	2023-11-24 01:58:58 -05:00
Edward Smyth	ed5010d65b	Code cleanup: AMD copyright notice Standardize format of AMD copyright notice. AMD-Internal: [CPUPL-3519] Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0	2023-11-23 08:54:31 -05:00
Eleni Vlachopoulou	52fb555ea2	CMake: Improving how CMake system handles targets. - Instead of putting the built libraries in blis/bin directory, build them in the chosen build-cmake directory. - Install headers in <prefix>/include instead of <prefix>/include/blis. - Fix on some targets to match configure/make system. - Update documentation. AMD-Internal: [CPUPL-2748] Change-Id: I15553948209345dbee350e89965b6a3c72a4e340	2023-11-23 16:43:03 +05:30
Edward Smyth	50608f28df	BLIS: Missing clobbers (batch 7) Add missing clobbers in: - bli_gemmsup_rv_haswell kernels - spare copies of kernels in old, other and broken subdirectories - misc kernels for legacy platforms AMD-Internal: [CPUPL-3521] Change-Id: I7cdb7fd1cb29630d8b7fa914b1002a270dfe9ef5	2023-11-22 17:51:46 -05:00
Edward Smyth	f471615c66	Code cleanup: No newline at end of file Some text files were missing a newline at the end of the file. One has been added. AMD-Internal: [CPUPL-3519] Change-Id: I4b00876b1230b036723d6b56755c6ca844a7ffce	2023-11-22 17:11:10 -05:00
Edward Smyth	dc41fa3829	User selection of code path in single architecture builds User control over code path using AOCL_ENABLE_INSTRUCTIONS or BLIS_ARCH_TYPE only makes sense for fat binary builds. Thus this functionality is now disabled by default for single architecture builds. User can still override the default selections by using configure options --enable-blis-arch-type or --disable-blis-arch-type. Other changes: - include x86_64 family as using zen codepaths in cmake build system. - Update help and error messages to include AOCL_ENABLE_INSTRUCTIONS. AMD-Internal: [CPUPL-4202] Change-Id: I7aa5fcf89df8675bcc12d81f81781de647e0fcf8	2023-11-22 10:48:44 -05:00
mangala v	e0df20806a	Updated prefetching in SGEMM SUP (mask load/store) kernels 1. Prefetch only MR rows or rows required for fringe cases 2. Specify prefetching offset - the least column address supported by masked functions 3. Removed unnecessary prefetches in fringe case for mx4 kernels. Updated gtestsuite for sgemm calls. AMD_Internal: [CPUPL-4221] Change-Id: I1e2e7d3ebce37dc54a2f0a5c1c70ce0a6d4c8d6c	2023-11-21 06:31:47 -05:00
Harsh Dave	e91d23ff05	Re-implements ddotv edge kernel using masked instructions - This commit uses avx2 and avx512 masked load instructions for handling edge case where vector size is not exact multiple of avx2/avx512 vector register size. - Thanks to Shubham, Sharma <shubham.sharma3@amd.com> for avx512 ddotv kernel changes Change-Id: I998651eeb1083caf3308f1b45bd7d55b7974bcb4	2023-11-21 02:25:00 -05:00
Harsh Dave	c6ed490907	Fixed functionality failure in c/z trsm framework code. - For the inputs where either m or n is 1, based on right or left side, it invokes c/z scalv kernel and post that it scales the matrix post checking whether the input is blis conjugate transpose or not. - Previously the check condition was case sensitive *diaga = 'n', and as a result, it is always executing the "else" code-part. - Fixed the condition check. AMD-Internal: [CPUPL-4204] Change-Id: Iae2514c742ab17ac6c6e43036da095a74ad131c5	2023-11-19 09:58:46 -05:00
mangala v	3256a7b074	BugFix: Re-Designed SGEMM SUP kernel to use mask load/store instruction. Segfault was reported through nightly jenkins job. Issue was observed when running in MT mode. Issue was due to extra broadcast being used. Extra broadcast would access out-of-bounds memory on the input buffer. Cleaned up clobber list by removing unused registers. AMD_Internal: [CPUPL-4180] Change-Id: I1c8715b2850ef855328f2ef12f215987299bdb2b	2023-11-17 18:14:34 +05:30
Edward Smyth	6e020ecc01	Include bli_lang_defs.h in cblas.h Changes in commit `64a1f786d5` (via merge `c6f3340125`) included in ./frame/include/bli_type_defs.h a prototype that uses the C restrict keyword. When using C++ we need to provide a definition for this C language keyword. This is done in bli_lang_defs.h which was included in blis.h but not in cblas.h. AMD-Internal: [CPUPL-4188] Change-Id: I75d5f32599d18794331ff452e562eb42afb5ae93	2023-11-14 10:52:35 -05:00
Eleni Vlachopoulou	9c9bc20c9e	CMake: Adding more dependencies in the generation of blis.h and cblas.h - In case the build directory doesn't get cleaned between different configurations this should re-generate the headers correctly. AMD-Internal: [CPUPL-2748] Change-Id: I57cd03a9ae87d8ddfee64fe8b1a1ee9ea1b7ad3c	2023-11-10 15:38:16 -05:00
Edward Smyth	c6f3340125	Merge commit '5013a6cb' into amd-main * commit '5013a6cb': More edits and fixes to docs/FAQ.md. Fixed newly broken link to CREDITS in FAQ.md. More minor fixes to FAQ.md and Sandboxes.md. Updates to FAQ.md, Sandboxes.md, and README.md. Safelist 'master', 'dev', 'amd' branches. Re-enable and fix `fb93d24`. Reverted `fb93d24`. Re-enable and fix `8e0c425` (BLIS_ENABLE_SYSTEM). Removed last vestige of #define BLIS_NUM_ARCHS. Added new packm var3 to 'gemmlike'. Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell. Fix more copy-paste errors in the haswell gemmsup code. Do a fast test on OSX. [ci skip] Fix AArch64 tests and consolidate some other tests. Use C++ cross-compiler for ARM tests. Attempt to fix cxx-test for OOT builds. Updated travis-ci.org link in README.md to .com. Disabled (at least temporarily) commit `8e0c425`. Define BLIS_OS_NONE when using --disable-system. Updated stale calls to malloc_intl() in gemmlike. Blacklist clang10/gcc9 and older for 'armsve'. Add test to Travis using C++ compiler to make sure blis.h is C++-compatible. Moved lang defs from _macro_def.h to _lang_defs.h. Minor tweaks to gemmlike sandbox. Added local _check() code to gemmlike sandbox. README.md citation updates (e.g. BLIS7 bibtex). Tweaks to gemmlike to facilitate 3rd party mods. Whitespace tweaks. Add row- and column-strides for A/B in obj_ukr_fn_t. Clean up some warnings that show up on clang/OSX. Remove schema field on obj_t (redundant) and add new API functions. Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects. Disabled sanity check in bli_pool_finalize(). Implement proposed new function pointer fields for obj_t. AMD-Internal: [CPUPL-2698] Change-Id: I6fc33351fa824580cf4f25b63f0370383cd9422d	2023-11-10 13:05:12 -05:00
mangala v	3136e57a39	Fixed memory leak issue reported by ASAN in testsuite. Memory allocated for pointer chars_for_dt was not freed at the end of function in testsuite. Freeing up of the buffer fixed the issue. AMD-Internal: [CPUPL-3932] Change-Id: I432c3ff95d289159f02a871b6d4fff5ab252ea9e	2023-11-10 12:11:55 -05:00
Eleni Vlachopoulou	acdaa91786	Revert "CMake : placing include folder under blis directory." This reverts commit `59f1333883`. Reason for revert: Potentially breaking CI Change-Id: I65a92a96896091cb92cc534d5b458070524ab75a	2023-11-10 10:37:56 -05:00
jagar	59f1333883	CMake : placing include folder under blis directory. Updating cmake files to place include folder under blis directory in new cmake system on windows. AMD-Internal: [CPUPL-2748] Change-Id: I650cca95193f7c89b39648ac1bda1fa1093b1560	2023-11-10 04:14:23 -04:00
Mangala V	f6046784ce	Re-Designed SGEMM SUP kernel to use mask load/store instruction Added all fringe kernels with mask load store support Fringe kernels cover m direction from 5 to 1 and n direction from 15 to 1 for row storage format - New edge kernels that uses masked load-store instructions for handling corner cases. - Mask load-store instruction macros are added. vmaskmovps, VMASKMOVPS for masked load-store. - It improves performance by reducing branching overhead and by being more cache friendly. - Mask load-store is added only for row storage format AMD-Internal: [CPUPL-4041] Change-Id: I563c036c79bf8e476a8ebde37f8f6db751fb3456	2023-11-10 01:23:48 -05:00
Harsh Dave	77161c1e5d	Design change of DGEMM 6x8 native kernel. - Following optimizations are included for dgemm 6x8 native kernel. 1) Reorganized the C update and store to reduce register dependencies. 2) Moved the C prefetch to part-way through the kernel for efficiently prefetching C matrix at appropriate distance. 3) Offsetting A matrix, so that kernel can use a smaller instruction encoding, saving i-cache space. 4) Aligned the K iteration loop. - Thanks to Moore, Branden <Branden.Moore@amd.com> for these design changes of DGEMM 6x8 native kernels. - Additional change, reorganization of C update and store for beta zero case to facilitate out of order execution of storing of C matrix. Change-Id: I9d1ec8d39f1154b0f38b136bd6a04b05d7d1e6ba	2023-11-09 23:07:43 -05:00
Meghana Vankadari	77bd9a7f17	Added parameter checking for LPGEMM APIs Change-Id: I6ea89fd0d2516539e5a4e9cd8537570b23194d89	2023-11-09 21:50:55 -05:00
Vignesh Balasubramanian	bd0b50a077	Introduced fast-path to kernels in DNRM2_ and DZNRM2_ APIs - Added a conditional check to see if the vectorized kernels for DNRM2_ and DZNRM2_ can be called directly, without incurring any framework overhead. - The condition to satisfy this fast-path is for the size to be such that the ideal threads required is 1, with the vector having unit stride( so that packing at the framework-level can be avoided ). AMD-Internal: [CPUPL-4045] Change-Id: Ie37e86f802ada0e226dff88e74f0341e97ebfe28	2023-11-09 21:13:10 +05:30
Eashan Dash	e4e4fe55fb	Added Parameter Checks and DTL Trace for Extension APIs 1. Added input parameter checking for the extension APIs 1. gemm_pack_get_size API 2. gemm_pack API 2. Additionally added early returns for these APIs when m or n dimensions are 0. 3. Routines for input parameter check for all the 3 BLAS extension APIs - gemm_pack_get_size, gemm_pack and gemm_compute are defined in: frame/compat/check/bla_gemm_pack_compute_check.h 4. Added AOCL DTL TRACE for all the functions of 1. gemm_pack_get_size 2. gemm_pack 3. gemm_compute AMD-Internal: [CPUPL-3560] Change-Id: I4351b8494d888eae7e7431a7e1e23e442ffc8631	2023-11-09 18:53:59 +05:30
Eleni Vlachopoulou	75a4d2f72f	CMake: Adding new portable CMake system. - A completely new system, made to be closer to Make system. AMD-Internal: [CPUPL-2748] Change-Id: I83232786406cdc4f0a0950fb6ac8f551e5968529	2023-11-09 15:49:45 +05:30
Meghana Vankadari	0c12b72651	LPGEMM bench enhancements Details: - Moved the downscale & postop options from command line to input file. - Now the format of the input file is as follows: dt_in dt_out stor transa transb op_a op_b m n k lda ldb ldc postops - In case of no-postops, 'none' has to be passed in the place of postops. - Removed duplication of mat_mul_bench_main function for bf16 APIs. - Added a function called print_matrix for each datatype which can help in printing matrices while debugging. - Added printing of ref, computed and diff values while reporting failure. - Added new functions for memory allocation and freeing. The type of memory allocation is chosen based on the mode the bench is running in (performance or accuracy mode). Change-Id: Ia7d740c53035bc76e578a03869590c9f04396b72	2023-11-09 03:55:10 -05:00
mkadavil	ed052c6c44	Smart Threading for LPGEMM <u\|s>8s8s<16\|32>os<8\|16\|32> API. The LPGEMM micro-kernel operates on blocks of dimension MRxKC and KCxNR. Current LPGEMM design involves using all the available threads for computing the output. If the number of threads assigned along ic or jc direction is more than M/MR or N/NR blocks respectively, it could result in threads sleeping due to the lack of MR or NR blocks. This scenario is now handled by reducing the number of threads if there are threads without any work (MR or NR blocks). AMD-Internal: [SWLCSG-2354, SWLCSG-2389, SWLCSG-2267] Change-Id: I74819337c7a0d3ab05ea0e18bb42780f977ea8f6	2023-11-09 00:50:30 -05:00
Edward Smyth	d2ad268525	Compilation error when using pthreads on FreeBSD We are using pthread_self to get a thread id for use in the DTL tracing functionality to name individual output files per thread. This is not an appropriate use of pthread_self as its return type (pthread_t) is an opaque type that can vary between implementations. On Linux we haven't had a problem, as pthread_t is an unsigned long int. However on FreeBSD it is a pointer to an empty struct. The difference between this and the int type we used for its value within the BLIS code was causing a compile error. The best long term solution would be for pthread builds to maintain their own internal thread id. A mechanism to implement this has not yet been identified. In the meantime, we make the following changes as a stopgap: - Explicitly cast from pthread_t return value to our BLIS internal data type AOCL_TID. - Make AOCL_TID a long int rather than pid_t (i.e. an int) in pthread builds to match the sizes expected on both Linux and FreeBSD. AMD-Internal: [CPUPL-4167] Change-Id: Ia07ee8f97273cc3bab46f6bca1eeb7954320415b	2023-11-09 00:20:13 -05:00
Edward Smyth	9500cbee63	Code cleanup: spelling corrections Corrections for some spelling mistakes in comments. AMD-Internal: [CPUPL-3519] Change-Id: I9a82518cde6476bc77fc3861a4b9f8729c6380ba	2023-11-09 00:16:30 -05:00
Harsh Dave	75356d45e5	DGEMM improvement for very tiny sizes less than 24. - This commit helps improving performance for very small input by reducing framework check and routing all such inputs to bli_dgemm_tiny_6x8_kernel. It forces single threaded computation for such sizes. - It invokes bli_dgemm_tiny_6x8_kernel for ZEN, ZEN2, ZEN3 and ZEN4 code path. Except for the case AOCL_ENABLE_INSTRUCTIONS environment variable is set to avx512. In that case, such a small inputs are routed to bli_dgemm_tiny_24x8_kernel avx512 kernel. AMD-Internal: [CPUPL-1701] Change-Id: Idf59f4a8ee76ee8f2514a33be2b618e3ce02383e	2023-11-08 23:45:57 -05:00
Arnav Sharma	008b77e94d	BLAS Compliance: SCALV Early Returns - According to BLAS Standards, SCALV should return when incx .le. 0. - To make SCALV compliant to this, added an early return inside the BLAS layer, for the cases where incx <= 0. - Also, added early return for the case where alpha is a unit scalar. AMD-Internal: [CPUPL-3562] Change-Id: Id474fdd6ed9232226f5c5381d0398f43384e4a49	2023-11-08 06:36:02 -05:00
Vignesh Balasubramanian	5f9c8c6929	Bugfix : Fallback mechanism in SNRM2 and SCNRM2 kernels if packing fails - Abstracted packing from the vectorized kernels for SNRM2 and SCNRM2 to a layer higher. - Added a scalar loop to handle compute in case of non-unit strides. This loop ensures functionality in case packing fails at the framework level. AMD-Internal: [CPUPL-3633] Change-Id: I555aea519d7434d43c541bb0f661f81105135b98	2023-11-08 15:16:10 +05:30
mangala v	fa355c0049	Removed warning during compilation of gemv api for non-zen config - When configured for haswell config, "Warning unused variable 'zero'" was thrown during compilation. - Removed the zero variable, which is not being used. AMD-Internal: [CPUPL-3973] Change-Id: I45a1f16b4c50307b07148bba63ca5332c48648b8	2023-11-08 01:43:33 -05:00
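
The thread-reduction idea described in commit ed052c6c44 (Smart Threading for LPGEMM) can be illustrated with a small sketch. This is not the BLIS implementation: the function name `cap_thread_ways`, its parameters, and the MR/NR values used below are hypothetical and chosen only for illustration. It assumes the number of threads along each direction is simply capped at the number of MR or NR blocks available in that direction.

```python
def cap_thread_ways(m, n, mr, nr, ic_ways, jc_ways):
    """Cap the thread counts along the ic (M) and jc (N) directions.

    Hypothetical helper: a thread only has work if it owns at least one
    MR block (along M) or one NR block (along N), so the thread count in
    each direction is limited to the number of such blocks.
    """
    # Ceiling division using integers only; at least one block per direction.
    m_blocks = max(1, -(-m // mr))
    n_blocks = max(1, -(-n // nr))
    return min(ic_ways, m_blocks), min(jc_ways, n_blocks)


if __name__ == "__main__":
    # With m = 10 and MR = 6 there are only 2 MR blocks, so 8 requested
    # ic threads are reduced to 2; the jc direction has 16 NR blocks, so
    # the 4 requested jc threads are kept.
    print(cap_thread_ways(m=10, n=256, mr=6, nr=16, ic_ways=8, jc_ways=4))
```

For large M and N the requested thread counts pass through unchanged; the cap only takes effect when one dimension is small relative to MR or NR.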

3162 Commits