amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-04-19 15:18:52 +00:00

Author	SHA1	Message	Date
S, Hari Govind	08c757202d	Initialize mem_t structures safely and handle NULL communicator in threading - Explicitly initialize all fields of mem_t structures in bli_znormfv_unb_var1 and bli_dnormfv_unb_var1 to prevent undefined behavior when memory is not allocated. - Add a NULL check after bli_thread_broadcast() in bli_thrinfo_sup_create_for_cntl to ensure that the communicator is valid, and call bli_abort() if broadcast fails.	2025-09-17 14:10:37 +05:30
Smyth, Edward	ae6c7d86df	Tidying code - AMD specific BLAS1 and BLAS2 framework: changes to make variants more consistent with each other - Initialize kernel pointers to NULL where not immediately set - Fix code indentation and other whitespace changes in DTL code and addon/aocl_gemm/frame/s8s8s32/lpgemm_s8s8s32_sym_quant.c - Fix typos in DTL comments - Add missing newline at end of test/CMakeLists.txt - Standardize on using arch_id variable name AMD-Internal: [CPUPL-6579]	2025-09-16 14:52:54 +01:00
Smyth, Edward	509aa07785	Standardize Zen kernel names Naming of Zen kernels and associated files was inconsistent with BLIS conventions for other sub-configurations and between different Zen generations. Other anomalies existed, e.g. dgemmsup 24x column preferred kernels names with _rv_ instead of _cv_. This patch renames kernels and file names to address these issues. AMD-Internal: [CPUPL-6579]	2025-08-19 18:19:51 +01:00
Smyth, Edward	021f6bc960	GEMMTR full set of APIs Commit `eaa76dfe28` added LAPACK 3.12 GEMMTR interfaces as aliases to existing BLIS GEMMT. Here we add full set of Fortran upper case and no underscore API aliases and _blis_impl variants. AMD-Internal: [CPUPL-6581]	2025-08-12 10:24:24 +01:00
Smyth, Edward	49ae7db89a	Avoid including .c files (#40) Including a C file directly in another C file is not recommended, and some build systems (e.g. Bazel and Buck) do not allow .c files to include other .c files. This commit changes the tapi and oapi framework files that are included from the _ex and _ba file variants from .c filenames to .h filenames. AMD-Internal: [CPUPL-6784] Co-authored-by: Varaganti, Kiran <Kiran.Varaganti@amd.com>	2025-06-10 11:33:33 +05:30
Edward Smyth	a5f11a1540	Add blis_impl wrappers for matrix copy etc APIs (2) Previous commit on this (`e0b86c69af`) was incorrect and incomplete. Add additional changes to enable blis_impl layer for extension APIs for copying and transposing matrices. Change-Id: Ic707e3585acc1c0c554d7e00435464620a8c85dc	2025-04-07 08:54:54 -04:00
Edward Smyth	e0b86c69af	Add blis_impl wrappers for matrix copy etc APIs BLAS and BLIS extension APIs for copying and transposing matrices currently only have one interface option. This patch adds a blis_impl layer and makes the top level interface enabled only if BLIS_ENABLE_BLAS is enabled, as with standard BLAS interfaces. Change-Id: I1b6c668e8492305b16e8735b9ed83bea3c0d3b6c	2025-04-01 08:34:26 -04:00
Vignesh Balasubramanian	da6e9defcb	Dynamic selection of AVX2 or AVX512 DNRM2 kernels - Added a kernel selection logic based on the input dimension(runtime parameter), to choose between deploying AVX2 or AVX512 computational kernel for single-thread execution. - An empirical analysis was conducted to arrive at the thresholds, for ZEN4 and ZEN5 architectures. - Updated the fast-path threshold for ZEN4 to be in hand with the tipping points of its dynamic thread-setter(used when AOCL_DYNAMIC is enabled). AMD-Internal: [CPUPL-5937] Change-Id: I96d7f167658c9e25a0098c4c67e12e4ba673e228	2024-12-10 10:53:54 +05:30
Edward Smyth	711dce14d0	Export full set of _blis_impl interfaces The _blis_impl layer provide a BLAS-like API for use in builds where BLAS and CBLAS interfaces are not desirable. This patch generates interfaces in uppercase and with and without trailing underscores, to match what is generated for the regular BLAS interface. AMD-Internal: [CPUPL-5650] Change-Id: I3ba9d0992291b0977479ab479acb71e42277c7c2	2024-09-03 04:13:06 -04:00
Vignesh Balasubramanian	68c54297bd	Fixing compiler warnings when configuring BLIS without OpenMP - Adjusted the macro-guards for variables specific to multithreading, when BLIS is configured with OpenMP. - This included calling the single-threaded kernel directly if increment is 0 as well, since this would remove an unnecessary dependency on one of the variables used only when we enable OpenMP. - Further updated the condition to pack the vector, to avoid it when increment is 0. In this case, we directly call the kernel. AMD-Internal: [CPUPL-5480] Change-Id: I31a9c6e3ffc3c4f9d5b03ed8745919ad65c99c79	2024-07-25 10:29:33 -04:00
Vignesh Balasubramanian	02da190560	AVX512 optimizations for DNRM2 - Implemented bli_dnorm2fv_unb_var1_avx512( ... ) AVX512 computational kernel for DNRM2 API. - Updated the header to include this kernel signature, as well as the framework layer to use this function in case of ZEN4 and ZEN5 configurations. - Updated the tipping points for ideal thread setting in DNRM2 for ZEN5 micro-architecture. These thresholds are specific to the library's linkage to LLVM's OpenMP or GNU's OpenMp. - Further abstracted the AOCL-DYNAMIC logic to separate functions for ?NRM2 APIs that currently support it(namely, DNRM2 and ZNRM2). - Further updated the ?NRM2 framework to accommodate the necessary changes to invoke the newer AOCL-DYNAMIC functions and the AVX512 kernel, when needed. - Added micro-kernel and memory tests for this kernel in GTestsuite, to validate accuracy and out-of-bounds read and write. AMD-Internal: [CPUPL-5265] Change-Id: I4fc0d0f1e6906bf27d46562ca387c338cc4d2049	2024-06-24 08:50:36 -04:00
srigovin	2c838dadfb	Updated return type of xerbla and xerbla_array APIs to void Return type of xerbla and xerbla_array APIs is defined as int in BLIS, but according to netlib it should be void. Updated the definition and declaration accordingly. Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com> Change-Id: I3072ba76111189de5c5cf08df83ea154163dd34d	2024-04-29 00:51:10 -04:00
Edward Smyth	2450a1813b	BLIS: Implement zen5 sub-configuration Implement full support for zen5 as a separate BLIS sub-configuration and code path within amdzen configuration family. AMD-Internal: [CPUPL-3518] Change-Id: Iaa5096e0b83bf0f0c3fd1c41e601ccd29bda3c09	2024-04-12 07:26:31 -04:00
Vignesh Balasubramanian	8693c996ac	Fixing coverity issues on SNRM2_ and SCNRM2_ - The bli_snormfv_unb_var1( ... ) and bli_cnormfv_unb_var1( ... ) functions posed an uninitialized pointer read coverity issue, due to the local rntm_t object being declared as part of the function scope, but initialized only on a need basis(i.e, when attempting to pack x vector if incx != 1). - The fix was to have the declaration and initialization inside the case where incx != 1, thereby making the scope of the rntm_t and mem_t objects more stringent. - This required an additional condition to call the kernel in case of unit stride. AMD-Internal: [CPUPL-4278] Change-Id: I763b1d4920532557749d8943f12b6df626aa5372	2023-12-06 23:56:09 +05:30
Edward Smyth	ed5010d65b	Code cleanup: AMD copyright notice Standardize format of AMD copyright notice. AMD-Internal: [CPUPL-3519] Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0	2023-11-23 08:54:31 -05:00
Vignesh Balasubramanian	bd0b50a077	Introduced fast-path to kernels in DNRM2_ and DZNRM2_ APIs - Added a conditional check to see if the vectorized kernels for DNRM2_ and DZNRM2_ can be called directly, without incurring any framework overhead. - The condition to satisfy this fast-path is for the size to be such that the ideal threads required is 1, with the vector having unit stride( so that packing at the framework-level can be avoided ). AMD-Internal: [CPUPL-4045] Change-Id: Ie37e86f802ada0e226dff88e74f0341e97ebfe28	2023-11-09 21:13:10 +05:30
Eleni Vlachopoulou	75a4d2f72f	CMake: Adding new portable CMake system. - A completely new system, made to be closer to Make system. AMD-Internal: [CPUPL-2748] Change-Id: I83232786406cdc4f0a0950fb6ac8f551e5968529	2023-11-09 15:49:45 +05:30
Vignesh Balasubramanian	5f9c8c6929	Bugfix : Fallback mechanism in SNRM2 and SCNRM2 kernels if packing fails - Abstracted packing from the vectorized kernels for SNRM2 and SCNRM2 to a layer higher. - Added a scalar loop to handle compute in case of non-unit strides. This loop ensures functionality in case packing fails at the framework level. AMD-Internal: [CPUPL-3633] Change-Id: I555aea519d7434d43c541bb0f661f81105135b98	2023-11-08 15:16:10 +05:30
Arnav Sharma	8885510db2	Fix for Missing Symbols for gemm_pack_get_size - Symbols for gemm_pack_get_size were not being exported properly when BLIS was built as a shared library. - Correctly assigned the BLIS_EXPORT_BLAS macro to ?gemm_pack_get_size_ function declaration. - Added missing gemm_pack and gemm_pack_get_size macros to bli_macro_defs.h file. - Removed an unnecessary BLIS_EXPORT_BLAS macro from dgemm_compute function definition. - Updated bli_util_api_wrap with no underscore API wrappers for pack and compute set of BLAS Extension APIs: 1. ?gemm_pack_get_size 2. ?gemm_pack 3. ?gemm_compute AMD-Internal: [CPUPL-4083] Change-Id: I78cd7642c2fcbfdf02676e654a377ad2aa5295c1	2023-11-03 08:58:59 -04:00
Vignesh Balasubramanian	84faccdd7d	Enabling the vectorized path for SNRM2_ - Enabled the vectorized AVX-2 code-path for SNRM2_. The framework queries the architecture ID and calls the vectorized kernel based on the architecture support. - In case of not having the architecture support, we use the default path based on the sumsqv method. AMD-Internal: [CPUPL-3277] Change-Id: Ic60c0782dec0b7eb09fac21818eb625e57b1d14f	2023-11-03 17:45:56 +05:30
Vignesh Balasubramanian	81161066e5	Multithreading the DNRM2 and DZNRM2 API - Updated the bli_dnormfv_unb_var1( ... ) and bli_znormfv_unb_var1( ... ) function to support multithreaded calls to the respective computational kernels, if and when the OpenMP support is enabled. - Added the logic to distribute the job among the threads such that only one thread has to deal with fringe case(if required). The remaining threads will execute only the AVX-2 code section of the computational kernel. - Added reduction logic post parallel region, to handle overflow and/or underflow conditions as per the mandate. The reduction for both the APIs involve calling the vectorized kernel of dnormfv operation. - Added changes to the kernel to have the scaling factors and thresholds prebroadcasted onto the registers, instead of broadcasting every time on a need basis. - Non-unit stride cases are packed to be redirected to the vectorized implementation. In case the packing fails, the input is handled by the fringe case loop in the kernel. - Added the SSE implementation in bli_dnorm2fv_unb_var1_avx2( ... ) and bli_dznorm2fv_unb_var1_avx2( ... ) kernels, to handle fringe cases of size = 2 ( and ) size = 1 or non-unit strides respectively. AMD-Internal: [CPUPL-3916][CPUPL-3633] Change-Id: Ib9131568d4c048b7e5f2b82526145622a5e8f93d	2023-10-16 07:26:27 -04:00
Vignesh Balasubramanian	9828039030	Bugfix : Inversion of sign bit with early return in SNRM2_ - The bli_snormfv_unb_var1( ... ) function returns early in case of n = 1, and uses the blis macro bli_fabs( ... ) to set the norm to the absolute value of the element. - This macro inverts the sign bit even if the element is 0.0. A check is added to re-invert the sign bit in this case, so that the norm is set to 0.0 instead of -0.0. - Added the same early exit condition on bli_dnormfv_unb_var1( ... ) when n = 1. AMD-Internal: [CPUPL-3923] Change-Id: If7f5ae41d2acfe89b505549d28215dde319d8c33	2023-10-10 04:21:09 -04:00
Edward Smyth	bb4c158e63	Merge commit 'b683d01b' into amd-main * commit 'b683d01b': Use extra #undef when including ba/ex API headers. Minor preprocessor/header cleanup. Fixed typo in cpp guard in bli_util_ft.h. Defined eqsc, eqv, eqm to test object equality. Defined setijv, getijv to set/get vector elements. Minor API breakage in bli_pack API. Add err_t* "return" parameter to malloc functions. Always stay initialized after BLAS compat calls. Renamed membrk files/vars/functions to pba. Switch allocator mutexes to static initialization. AMD-Internal: [CPUPL-2698] Change-Id: Ied2ca8619f144d4b8a7123ac45a1be0dda3875df	2023-08-21 07:01:38 -04:00
Edward Smyth	7e50ba669b	Code cleanup: No newline at end of file Some text files were missing a newline at the end of the file. One has been added. Also correct file format of windows/tests/inputs.yaml, which was missed in commit `0f0277e104` AMD-Internal: [CPUPL-2870] Change-Id: Icb83a4a27033dc0ff325cb84a1cf399e953ec549	2023-04-21 10:02:48 -04:00
Mangala V	5dc8e3fbca	AOCL progress callback pointer update per thread Thanks to Moore, Branden <Branden.Moore@amd.com> for identifying the race condition and suggesting the changes to fix the same Existing Design: - AOCL progress callback pointer is a global pointer which is shared across all threads Existing Design challenges: - The callback function cannot safely disable the progress mechanism, as another thread may have already checked to see if the function pointer is set, and then re-reads the pointer upon invocation of the callback. If one thread sets the callback to NULL in this time, then the resulting thread will attempt to call the null pointer as a function pointer, leading to a segfault. New Design : - Each thread maintains a local copy of progress pointer AMD-Internal: [SWLCSG-1971] Change-Id: I282989805a4a2a8a759a7373b645f3569bf42ed4	2023-04-20 05:33:12 -04:00
Shubham Sharma	036da2e651	Fixed compilation errors for generic configuration - In gemmt and normf, #ifdef BLIS_KERNELS_* is added to make sure only compiled kernels are used. - In bal_copy and bla_swap, missing '\' is added. AMD-Internal: [CPUPL-2870] Change-Id: I83452dff761f60db6957f557321ce210ab72c037	2023-04-18 00:27:05 -04:00
Edward Smyth	1ac03e64b5	BLIS cpuid tidy and bugfix. Improvements to BLIS cpuid functionality: - Tidy names of avx support test functions, especially rename bli_cpuid_is_avx_supported() to bli_cpuid_is_avx2fma3_supported() to more accurately describe what it tests. - Fix bug in frame/base/bli_check.c related to changes in commit `6861fcae91` AMD-Internal: [CPUPL-3031] Change-Id: Iacd8fb0ffbd45288e536fc6314660709055ea2d5	2023-04-03 08:46:37 -04:00
Eleni Vlachopoulou	ad7a812db2	Remove quick return for zero increments. Details: - To be BLAS compliant, if increment is zero then iterate through the first element n times. - For n<=0, the correct result (0) is returned so we remove this extra check. This is checked on BLIS-typed interface level. AMD-Internal: [SWLCSG-1900] Change-Id: I098bb9560a790050018bc8d8c63b06bfbcc1aebd	2023-03-23 23:35:03 -04:00
Aayush Kumar	5bd2a777ba	Fixed Compilation Fails when configured with --disable-blas - Moved _blis_impl function declaration outside the BLIS_ENABLE_BLAS guard. - Changed Makefile to continue to compile bla_ files to get _blis_impl interfaces. - Modify CBLAS headers, bli_macro_defs.h and bli_util_api_wrap.{c,h} to add BLIS_ENABLE_CBLAS guards. - Comment out BLIS_ENABLE_BLAS guards in various headers and utility functions. - Define BLIS Fortran-style functions lsame_blis_impl and xerbla_blis_impl. New macros PASTE_LSAME and PASTE_XERBLA are used in bla_*_check headers and some other places to select whether to call lsame and xerbla, or the _blis_impl versions. - Defined various other missing _blis_impl functions. - In bli_util_api_wrap.c, only define any functions if BLIS_ENABLE_BLAS is defined, and only define the subroutine versions of functions like dot, nrm2, etc if BLIS_ENABLE_CBLAS is defined. - BLAS layer is needed if CBLAS layer is enabled. Changed header files build/bli_config.h.in and bli_blas.h, and configure program to help ensure consistency in generated blis.h header and configure output. Undefining BLIS_ENABLE_BLAS_DEFS appears to be broken in UTA BLIS too, thus BLIS_ENABLE_BLAS_DEFS is currently permanently defined. AMD-Internal: [CPUPL-3015] Change-Id: I7c0fe07db85781db46f2c690e174451860b37635	2023-03-23 06:11:52 -04:00
Edward Smyth	1617589d24	Add consistent NaN/Inf handling in sumsqv. (#668) Details: - Changed sumsqv implementation as follows: - If there is a NaN (either real or imaginary), then return a sum of NaN and unit scale. - Else, if there is an Inf (either real or imaginary), then return a sum of +Inf and unit scale. - Otherwise behave as normal. (cherry picked from commit `b861c71b50`) AMD-Internal: [SWLCSG-1900] Change-Id: Ic7ba9cad1fbaf11823b9ba96e72a4ddd973db5b6	2023-03-09 06:36:44 -05:00
Sireesha Sanga	540509f374	Enabling AVX2 path for SCNRM2	2023-01-13 10:27:54 +05:30
Eleni Vlachopoulou	758d68467f	Disabling AVX2 path for SNRM2 and SCNRM2. AMD-Internal: [CPUPL-2865] Change-Id: I09c67115801a6b9446c7930c54fc937bd17908a3	2023-01-12 09:58:28 -05:00
Meghana Vankadari	f39dba9fd8	Added dzgemm, DZGEMM, DZGEMM_ prototypes to wrapper file. AMD-Internal: [CPUPL-2199] Change-Id: Ied814cb7be60d30b8217ec42ac436b4e628ea6d2	2023-01-12 01:35:29 -05:00
Edward Smyth	82c2eb4e8e	Code cleanup and warnings fixes Corrections for some occurrences of: - Compiler warnings about initialization of float from double - Spelling mistakes in comments - Incorrect indentation of code and comments AMD-Internal: [CPUPL-2870] Change-Id: Icb68c789687bd0684844331d43071bfffecac9fc	2023-01-09 04:34:52 -05:00
Eleni Vlachopoulou	13aa3c8cd0	Adding AVX2 support for SNRM2 and SCNRM2 - For the cases where AVX2 is available, an optimized function is called, based on Blue's algorithm. The fallback method based on sumsqv is used otherwise. - Scaling is used to avoid overflow and underflow. - Works correctly for negative increments. AMD-Internal: [SWLCSG-1080] Change-Id: I6bf2f42652ba6b8a8631a0a9e6f6297d5b3ea5d9	2022-12-14 04:25:45 -05:00
Harihara Sudhan S	42d631bced	Copyright modification - Added copyright information to modified/newly created files missing them Change-Id: If4e73b680246d0363de09587d6dc54bee00ecd71	2022-10-14 12:43:35 +05:30
Eleni Vlachopoulou	863b73dfaf	Adding AVX2 support for DZNRM2 - For the cases where AVX2 is available, an optimized function is called, based on Blue's algorithm. The fallback method based on sumsqv is used otherwise. - Scaling is used to avoid overflow and underflow. - Works correctly for negative increments. - Cleaned up some white space in the AVX2 implementation for DNRM2. AMD-Internal: [CPUPL-2551] Change-Id: I0875234ea735540307168fe7efc3f10fe6c40ffc	2022-09-30 07:51:04 -04:00
Eleni Vlachopoulou	1c1a0027a8	Bugfix in DNRM2 AVX path Description: Enabled DNRM2 AVX path and fixed bug that caused numerical accuracy errors. AMD-Internal: [CPUPL-2576] Change-Id: Ic9fda9d9668bdfe233621f79db6acce518b4d10e	2022-09-29 04:46:04 -04:00
Nallani Bhaskar	1e5d98322d	Disabled DNRM2 AVX path temporarily Description: Disabled AVX2 optimized path for DNRM2 to avoid accuracy issues in netlib blas test. AMD-Internal: [CPUPL-2576] Change-Id: I0764725d4f6b1e4e0b5f60a255bc681bb698560e	2022-09-22 13:00:04 +05:30
Eleni Vlachopoulou	a5891f7ead	Adding AVX2 support for DNRM2 - For the cases where AVX2 is available, an optimized function is called, based on Blue's algorithm. The fallback method based on sumsqv is used otherwise. - Scaling is used to avoid overflow and underflow. - Works correctly for negative increments. AMD-Internal: [CPUPL-2551] Change-Id: I5d8976b29b5af463a8981061b2be907ea647123c	2022-09-20 06:05:01 -04:00
Dipal M Zambare	2cdeea3c66	CBLAS/BLAS interface decoupling for the level 1 APIs -In BLIS, the CBLAS interface is implemented as a wrapper around the BLAS interface. For example the CBLAS API ‘cblas_dscal’ internally invokes the BLAS API ‘dscal_’. -This coupling between CBLAS and BLAS interface prevents the end user from overriding them individually by the application or other libraries. -This change separates the CBLAS and BLAS implementation by adding an additional level of abstraction. The implementation of the API is moved to the new function which is invoked directly from the CBLAS and BLAS wrappers. AMD-Internal: [SWLCSG-1477] Change-Id: I0e80071398af29c9313296d2a92e61e3897ac28e	2022-09-19 21:50:29 +05:30
Dipal M Zambare	866e8de7bf	CBLAS/BLAS interface decoupling for the level 2 APIs -In BLIS, the CBLAS interface is implemented as a wrapper around the BLAS interface. For example the CBLAS API ‘cblas_dgemv’ internally invokes the BLAS API ‘dgemv_’. -This coupling between CBLAS and BLAS interface prevents the end user from overriding them individually by the application or other libraries. -This change separates the CBLAS and BLAS implementation by adding an additional level of abstraction. The implementation of the API is moved to the new function which is invoked directly from the CBLAS and BLAS wrappers. AMD-Internal: [SWLCSG-1477] Change-Id: Ie7cbbac86bbfa1075a5064b31b365e911f67786c	2022-09-15 17:51:05 +05:30
Dipal M Zambare	e18db8a172	CBLAS/BLAS interface decoupling for the level 3 APIs -In BLIS, the CBLAS interface is implemented as a wrapper around the BLAS interface. For example the CBLAS API ‘cblas_dgemm’ internally invokes the BLAS API ‘dgemm_’. -This coupling between CBLAS and BLAS interface prevents the end user from overriding them individually by the application or other libraries. -This change separates the CBLAS and BLAS implementation by adding an additional level of abstraction. The implementation of the API is moved to the new function which is invoked directly from the CBLAS and BLAS wrappers. AMD-Internal: [SWLCSG-1477] Change-Id: Id9e307154342d2c17b0ac6db580c36f1a9ee6409	2022-09-15 06:23:46 -04:00
Dipal M Zambare	61232d540c	AOCL progress callback hardening - BLIS uses callback function to report the progress of the operation. The callback is implemented in the user application and is invoked by BLIS. - Updated callback function prototype to make all arguments const. This will ensure that any attempt to write using callback’s argument is prevented at the compile time itself. AMD-Internal: [CPUPL-2504] Change-Id: I8ceb671242365d2a9155b485301cd8c75043e667	2022-09-14 15:32:10 +05:30
Dipal M Zambare	5c42afada8	Revert "CBLAS/BLAS interface decoupling for level 3 APIs" This reverts commit `d925ebeb06`. Change-Id: I2e842b29c1fedbe14bf913949cf978f3e7515ff3	2022-08-30 14:50:38 +05:30
Dipal M Zambare	7e42b3d2e0	Revert "CBLAS/BLAS interface decoupling for level 2 APIs" This reverts commit `192f5313a1`. Change-Id: I876cad90902970ebc61550f109eb0ce32539ea1c	2022-08-30 11:53:46 +05:30
Dipal M Zambare	6cff8b030e	Revert "CBLAS/BLAS interface decoupling for level 1 APIs" This reverts commit `95169ca806`. Change-Id: Ic441aca616be6f27c7f1ba64e4480edcc6b17632	2022-08-30 11:34:34 +05:30
Dipal M Zambare	40c71dd2e1	Revert "CBLAS/BLAS interface decoupling for swap api" This reverts commit `2beaa6a0e6`. Reverting it as it is planned for the next release. Change-Id: Ib9271acd0b5b4cfd10c8f8b7bbb6ef93a3d594ea	2022-08-30 10:10:06 +05:30
Edward Smyth	abf848ad12	Code cleanup and warnings fixes - Removed some additional compiler warnings reported by GCC 12.1 - Fixed a couple of typos in comments - frame/3/bli_l3_sup.c: routines were returning before final call to AOCL_DTL_TRACE_EXIT - frame/2/gemv/bli_gemv_unf_var1_amd.c: bli_multi_sgemv_4x2 is only defined in header file if BLIS_ENABLE_OPENMP is defined AMD-Internal: [CPUPL-2460] Change-Id: I2eacd5687f2548d8f40c24bd1b930859eefbbcde	2022-08-29 08:22:30 -04:00
jagar	2beaa6a0e6	CBLAS/BLAS interface decoupling for swap api - In BLIS the cblas interface is implemented as a wrapper around the blas interface. For example the CBLAS api ‘cblas_dgemm’ internally invokes BLAS API ‘dgemm_’. - If the end user wants to use different libraries for CBLAS and BLAS, the current implementation of BLIS doesn’t allow it. - This change separates the CBLAS and BLAS implementation by adding an additional level of abstraction. The implementation of the API is moved to the new function which is invoked directly from the CBLAS and BLAS wrappers. AMD-Internal: [SWLCSG-1477] Change-Id: I8d81072aaca739f175318b82f6510d386103c24b	2022-08-29 16:26:01 +05:30

141 Commits