141 Commits

Author SHA1 Message Date
S, Hari Govind
08c757202d Initialize mem_t structures safely and handle NULL communicator in threading
- Explicitly initialize all fields of mem_t structures in bli_znormfv_unb_var1 and bli_dnormfv_unb_var1 to prevent undefined behavior when memory is not allocated.
- Add a NULL check after bli_thread_broadcast() in bli_thrinfo_sup_create_for_cntl to ensure that the communicator is valid, and call bli_abort() if broadcast fails.
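
A minimal sketch of the two defensive patterns described above. It uses a hypothetical `mem_sketch_t` in place of BLIS's real `mem_t`, whose fields differ, and a hypothetical `require_non_null` helper standing in for the check added after `bli_thread_broadcast()`:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for mem_t; the real BLIS struct has other fields. */
typedef struct
{
    void*  buf;      /* NULL until a buffer is acquired */
    size_t size;     /* bytes held by buf               */
    int    is_alloc; /* nonzero only after acquisition  */
} mem_sketch_t;

/* Set every field explicitly so later checks never read indeterminate values. */
static void mem_sketch_init( mem_sketch_t* mem )
{
    mem->buf      = NULL;
    mem->size     = 0;
    mem->is_alloc = 0;
}

/* Safe to call whether or not a buffer was ever acquired. */
static void mem_sketch_release( mem_sketch_t* mem )
{
    if ( mem->is_alloc )
    {
        free( mem->buf );
        mem_sketch_init( mem );
    }
}

/* Treat a NULL result from a broadcast-like step as fatal instead of
   dereferencing it later (hypothetical helper; the commit calls bli_abort()). */
static void* require_non_null( void* p )
{
    if ( p == NULL ) abort();
    return p;
}
```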
2025-09-17 14:10:37 +05:30
Smyth, Edward
ae6c7d86df Tidying code
- AMD-specific BLAS1 and BLAS2 framework: changes to make variants
  more consistent with each other
- Initialize kernel pointers to NULL where not immediately set
- Fix code indentation and other whitespace issues in DTL
  code and addon/aocl_gemm/frame/s8s8s32/lpgemm_s8s8s32_sym_quant.c
- Fix typos in DTL comments
- Add missing newline at end of test/CMakeLists.txt
- Standardize on using arch_id variable name

AMD-Internal: [CPUPL-6579]
2025-09-16 14:52:54 +01:00
Smyth, Edward
509aa07785 Standardize Zen kernel names
Naming of Zen kernels and associated files was inconsistent with BLIS
conventions for other sub-configurations and between different Zen
generations. Other anomalies existed, e.g. dgemmsup 24x column-preferred
kernels were named with _rv_ instead of _cv_. This patch renames
kernels and files to address these issues.

AMD-Internal: [CPUPL-6579]
2025-08-19 18:19:51 +01:00
Smyth, Edward
021f6bc960 GEMMTR full set of APIs
Commit eaa76dfe28 added LAPACK 3.12 GEMMTR
interfaces as aliases to the existing BLIS GEMMT. Here we add the full set of
Fortran uppercase and no-underscore API aliases and _blis_impl variants.

AMD-Internal: [CPUPL-6581]
2025-08-12 10:24:24 +01:00
Smyth, Edward
49ae7db89a Avoid including .c files (#40)
Including a C file directly in another C file is not recommended, and some
build systems (e.g. Bazel and Buck) do not allow .c files to include other
.c files. This commit changes the tapi and oapi framework files that are
included from the _ex and _ba file variants from .c filenames to .h
filenames.

AMD-Internal: [CPUPL-6784]

Co-authored-by: Varaganti, Kiran <Kiran.Varaganti@amd.com>
2025-06-10 11:33:33 +05:30
Edward Smyth
a5f11a1540 Add blis_impl wrappers for matrix copy etc APIs (2)
Previous commit on this (e0b86c69af)
was incorrect and incomplete. Add additional changes to enable
blis_impl layer for extension APIs for copying and transposing
matrices.

Change-Id: Ic707e3585acc1c0c554d7e00435464620a8c85dc
2025-04-07 08:54:54 -04:00
Edward Smyth
e0b86c69af Add blis_impl wrappers for matrix copy etc APIs
BLAS and BLIS extension APIs for copying and transposing matrices
currently only have one interface option. This patch adds a
blis_impl layer and makes the top level interface enabled only if
BLIS_ENABLE_BLAS is enabled, as with standard BLAS interfaces.

Change-Id: I1b6c668e8492305b16e8735b9ed83bea3c0d3b6c
2025-04-01 08:34:26 -04:00
Vignesh Balasubramanian
da6e9defcb Dynamic selection of AVX2 or AVX512 DNRM2 kernels
- Added kernel selection logic based on the input
  dimension (a runtime parameter) to choose between
  the AVX2 and AVX512 computational kernels for
  single-threaded execution.

- An empirical analysis was conducted to arrive at the
  thresholds for the ZEN4 and ZEN5 architectures.

- Updated the fast-path threshold for ZEN4 to be in line
  with the tipping points of its dynamic thread-setter (used
  when AOCL_DYNAMIC is enabled).
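
The size-based selection above can be sketched as a single dispatch function. The threshold values below are placeholders, since the tuned numbers are not given in the commit message, and the direction of the choice (AVX2 for smaller sizes, AVX512 at or above the threshold) is an assumption. All names are hypothetical:

```c
#include <stddef.h>

typedef enum { ARCH_ZEN4, ARCH_ZEN5 } arch_sketch_t;
typedef enum { NORM_KERNEL_AVX2, NORM_KERNEL_AVX512 } norm_kernel_t;

/* Placeholder thresholds, purely illustrative. */
#define ZEN4_AVX512_MIN_N 4000
#define ZEN5_AVX512_MIN_N 1000

/* Pick the single-threaded kernel from the runtime vector length n.
   Assumption: AVX2 below the threshold, AVX512 at or above it. */
static norm_kernel_t select_norm_kernel( arch_sketch_t arch, size_t n )
{
    size_t min_n = ( arch == ARCH_ZEN4 ) ? ZEN4_AVX512_MIN_N : ZEN5_AVX512_MIN_N;
    return ( n >= min_n ) ? NORM_KERNEL_AVX512 : NORM_KERNEL_AVX2;
}
```

Keeping the decision in one function lets the per-architecture thresholds be retuned without touching either kernel.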

AMD-Internal: [CPUPL-5937]
Change-Id: I96d7f167658c9e25a0098c4c67e12e4ba673e228
2024-12-10 10:53:54 +05:30
Edward Smyth
711dce14d0 Export full set of _blis_impl interfaces
The _blis_impl layer provides a BLAS-like API for use in builds
where BLAS and CBLAS interfaces are not desirable. This patch
generates interfaces in uppercase and with and without trailing
underscores, to match what is generated for the regular BLAS
interface.
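
A minimal sketch of how one `_blis_impl` function can be exported under the four Fortran-style spellings using a token-pasting macro. The operation `dtwice` and the macro `GEN_BLAS_ALIASES` are hypothetical; BLIS's own generator macros are not shown in the commit:

```c
/* The one real implementation (hypothetical operation). */
double dtwice_blis_impl( const double* x )
{
    return 2.0 * *x;
}

/* Emit lowercase and uppercase entry points, each with and without a
   trailing underscore, all forwarding to the _blis_impl function. */
#define GEN_BLAS_ALIASES( lower, upper )                                  \
    double lower##_( const double* x ) { return lower##_blis_impl( x ); } \
    double lower(    const double* x ) { return lower##_blis_impl( x ); } \
    double upper##_( const double* x ) { return lower##_blis_impl( x ); } \
    double upper(    const double* x ) { return lower##_blis_impl( x ); }

GEN_BLAS_ALIASES( dtwice, DTWICE )
```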

AMD-Internal: [CPUPL-5650]
Change-Id: I3ba9d0992291b0977479ab479acb71e42277c7c2
2024-09-03 04:13:06 -04:00
Vignesh Balasubramanian
68c54297bd Fixing compiler warnings when configuring BLIS without OpenMP
- Adjusted the macro guards so that variables specific to
  multithreading are only used when BLIS is configured with OpenMP.

- This included also calling the single-threaded kernel directly
  when the increment is 0, which removes an unnecessary
  dependency on one of the variables used only
  when OpenMP is enabled.

- Further updated the packing condition so that the vector
  is not packed when the increment is 0. In this case, the
  kernel is called directly.
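
A sketch of the two changes above, with hypothetical helper names. It uses the compiler's standard `_OPENMP` macro for the guard; the macro BLIS actually tests may differ:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative helper: should a strided vector be copied into a contiguous
   buffer before a unit-stride kernel is called? */
static bool should_pack( ptrdiff_t incx )
{
    /* incx == 1: already contiguous.
       incx == 0: every read hits x[0], so packing would just copy one value
       n times; the kernel is called directly instead. */
    return incx != 1 && incx != 0;
}

/* Thread-related logic stays inside the guard, so a build without OpenMP
   has no unused variables to warn about. */
static int threads_for_call( ptrdiff_t incx, int nt_requested )
{
#ifdef _OPENMP
    /* A zero increment is always handled by a single thread. */
    return ( incx == 0 ) ? 1 : nt_requested;
#else
    ( void )incx;
    ( void )nt_requested;
    return 1;
#endif
}
```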

AMD-Internal: [CPUPL-5480]
Change-Id: I31a9c6e3ffc3c4f9d5b03ed8745919ad65c99c79
2024-07-25 10:29:33 -04:00
Vignesh Balasubramanian
02da190560 AVX512 optimizations for DNRM2
- Implemented bli_dnorm2fv_unb_var1_avx512( ... ) AVX512
  computational kernel for DNRM2 API.

- Updated the header to include this kernel signature, as well
  as the framework layer to use this function in case of ZEN4
  and ZEN5 configurations.

- Updated the tipping points for ideal thread setting in DNRM2
  for the ZEN5 micro-architecture. These thresholds are specific
  to whether the library is linked to LLVM's OpenMP or GNU's OpenMP.

- Further abstracted the AOCL-DYNAMIC logic into separate functions
  for the ?NRM2 APIs that currently support it (namely, DNRM2 and ZNRM2).

- Further updated the ?NRM2 framework to accommodate the necessary
  changes to invoke the newer AOCL-DYNAMIC functions and the AVX512
  kernel, when needed.

- Added micro-kernel and memory tests for this kernel in GTestsuite,
  to validate accuracy and out-of-bounds read and write.

AMD-Internal: [CPUPL-5265]
Change-Id: I4fc0d0f1e6906bf27d46562ca387c338cc4d2049
2024-06-24 08:50:36 -04:00
srigovin
2c838dadfb Updated return type of xerbla and xerbla_array APIs to void
The return type of the xerbla and xerbla_array APIs was defined as int in BLIS, but according to netlib it should be void. Updated the definition and declaration accordingly.

Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: I3072ba76111189de5c5cf08df83ea154163dd34d
2024-04-29 00:51:10 -04:00
Edward Smyth
2450a1813b BLIS: Implement zen5 sub-configuration
Implement full support for zen5 as a separate BLIS sub-configuration
and code path within amdzen configuration family.

AMD-Internal: [CPUPL-3518]
Change-Id: Iaa5096e0b83bf0f0c3fd1c41e601ccd29bda3c09
2024-04-12 07:26:31 -04:00
Vignesh Balasubramanian
8693c996ac Fixing coverity issues on SNRM2_ and SCNRM2_
- The bli_snormfv_unb_var1( ... ) and bli_cnormfv_unb_var1( ... )
  functions triggered a Coverity uninitialized-pointer-read issue,
  because the local rntm_t object was declared at function
  scope but initialized only when needed (i.e., when
  attempting to pack the x vector if incx != 1).

- The fix was to move the declaration and initialization inside
  the incx != 1 case, thereby narrowing the scope of the rntm_t
  and mem_t objects.

- This required an additional condition to call the kernel in case
  of unit stride.

AMD-Internal: [CPUPL-4278]
Change-Id: I763b1d4920532557749d8943f12b6df626aa5372
2023-12-06 23:56:09 +05:30
Edward Smyth
ed5010d65b Code cleanup: AMD copyright notice
Standardize format of AMD copyright notice.

AMD-Internal: [CPUPL-3519]
Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0
2023-11-23 08:54:31 -05:00
Vignesh Balasubramanian
bd0b50a077 Introduced fast-path to kernels in DNRM2_ and DZNRM2_ APIs
- Added a conditional check to see if the vectorized kernels
  for DNRM2_ and DZNRM2_ can be called directly, without
  incurring any framework overhead.

- The condition for this fast path is that the size is
  such that the ideal number of threads is 1, and the vector has
  unit stride (so that packing at the framework level can be avoided).
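
The fast-path condition can be sketched as follows. The size-to-thread-count mapping in `ideal_threads_sketch` is invented for illustration; the real AOCL_DYNAMIC tipping points are architecture-specific and are not listed in the commit:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mapping from problem size to an "ideal" thread count. */
static int ideal_threads_sketch( size_t n, int nt_max )
{
    int nt;
    if      ( n < 4000 )  nt = 1;
    else if ( n < 40000 ) nt = 4;
    else                  nt = 8;
    return ( nt < nt_max ) ? nt : nt_max;
}

/* Call the kernel directly only when one thread suffices and the vector is
   contiguous, so neither threading nor packing overhead is incurred. */
static bool use_fast_path( size_t n, ptrdiff_t incx, int nt_max )
{
    return ideal_threads_sketch( n, nt_max ) == 1 && incx == 1;
}
```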

AMD-Internal: [CPUPL-4045]
Change-Id: Ie37e86f802ada0e226dff88e74f0341e97ebfe28
2023-11-09 21:13:10 +05:30
Eleni Vlachopoulou
75a4d2f72f CMake: Adding new portable CMake system.
- A completely new system, designed to be closer to the Make system.

AMD-Internal: [CPUPL-2748]
Change-Id: I83232786406cdc4f0a0950fb6ac8f551e5968529
2023-11-09 15:49:45 +05:30
Vignesh Balasubramanian
5f9c8c6929 Bugfix : Fallback mechanism in SNRM2 and SCNRM2 kernels if packing fails
- Moved packing out of the vectorized kernels for SNRM2 and SCNRM2
  into the layer above.

- Added a scalar loop to handle compute in case of non-unit strides.
  This loop ensures functionality in case packing fails at the
  framework level.
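
A simplified sketch of packing at the framework level with a scalar fallback when packing fails. The packing buffer is also declared only inside the non-unit-stride branch, which matches the narrower scoping described in the later Coverity fix. The function names are hypothetical, the "kernel" is a plain loop returning a sum of squares rather than a scaled norm, and `incx >= 0` is assumed:

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a vectorized unit-stride kernel (here a plain loop). */
static float sumsq_unit_stride( size_t n, const float* x )
{
    float s = 0.0f;
    for ( size_t i = 0; i < n; ++i ) s += x[ i ] * x[ i ];
    return s;
}

/* Framework-level wrapper: packing happens here, above the kernel. */
static float sumsq_any_stride( size_t n, const float* x, ptrdiff_t incx )
{
    if ( incx == 1 )
        return sumsq_unit_stride( n, x );

    /* Packing state exists only in this branch. */
    float* packed = malloc( n * sizeof *packed );
    if ( packed == NULL )
    {
        /* Packing failed: fall back to a scalar strided loop. */
        float s = 0.0f;
        for ( size_t i = 0; i < n; ++i )
        {
            float v = x[ ( ptrdiff_t )i * incx ];
            s += v * v;
        }
        return s;
    }

    for ( size_t i = 0; i < n; ++i ) packed[ i ] = x[ ( ptrdiff_t )i * incx ];
    float s = sumsq_unit_stride( n, packed );
    free( packed );
    return s;
}
```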

AMD-Internal: [CPUPL-3633]
Change-Id: I555aea519d7434d43c541bb0f661f81105135b98
2023-11-08 15:16:10 +05:30
Arnav Sharma
8885510db2 Fix for Missing Symbols for gemm_pack_get_size
- Symbols for gemm_pack_get_size were not being exported properly when
  BLIS was built as a shared library.
- Correctly assigned the BLIS_EXPORT_BLAS macro to ?gemm_pack_get_size_
  function declaration.
- Added missing gemm_pack and gemm_pack_get_size macros to
  bli_macro_defs.h file.
- Removed an unnecessary BLIS_EXPORT_BLAS macro from dgemm_compute
  function definition.
- Updated bli_util_api_wrap with no underscore API wrappers for pack and
  compute set of BLAS Extension APIs:
  1. ?gemm_pack_get_size
  2. ?gemm_pack
  3. ?gemm_compute

AMD-Internal: [CPUPL-4083]
Change-Id: I78cd7642c2fcbfdf02676e654a377ad2aa5295c1
2023-11-03 08:58:59 -04:00
Vignesh Balasubramanian
84faccdd7d Enabling the vectorized path for SNRM2_
- Enabled the vectorized AVX-2 code-path for SNRM2_. The
  framework queries the architecture ID and calls the
  vectorized kernel based on the architecture support.

- In case of not having the architecture support, we use
  the default path based on the sumsqv method.

AMD-Internal: [CPUPL-3277]
Change-Id: Ic60c0782dec0b7eb09fac21818eb625e57b1d14f
2023-11-03 17:45:56 +05:30
Vignesh Balasubramanian
81161066e5 Multithreading the DNRM2 and DZNRM2 API
- Updated the bli_dnormfv_unb_var1( ... ) and
  bli_znormfv_unb_var1( ... ) functions to support
  multithreaded calls to the respective computational
  kernels when OpenMP support is enabled.

- Added logic to distribute the work among the threads such
  that only one thread has to deal with the fringe case (if required).
  The remaining threads execute only the AVX2 code section
  of the computational kernel.

- Added reduction logic after the parallel region to handle overflow
  and/or underflow conditions as required. The reduction
  for both APIs involves calling the vectorized kernel of
  the dnormfv operation.

- Changed the kernel to pre-broadcast the scaling factors and
  thresholds onto registers, instead of broadcasting them
  each time they are needed.

- Non-unit stride cases are packed so that they can be redirected to the
  vectorized implementation. If packing fails, the
  input is handled by the fringe-case loop in the kernel.

- Added an SSE implementation to the bli_dnorm2fv_unb_var1_avx2( ... )
  and bli_dznorm2fv_unb_var1_avx2( ... ) kernels, to handle fringe
  cases of size 2, and of size 1 or non-unit strides, respectively.

AMD-Internal: [CPUPL-3916][CPUPL-3633]
Change-Id: Ib9131568d4c048b7e5f2b82526145622a5e8f93d
2023-10-16 07:26:27 -04:00
Vignesh Balasubramanian
9828039030 Bugfix : Inversion of sign bit with early return in SNRM2_
- The bli_snormfv_unb_var1( ... ) function returns early in
  case of n = 1, and uses the blis macro bli_fabs( ... ) to
  set the norm to the absolute value of the element.

- This macro inverts the sign bit even if the element is 0.0.
  A check is added to re-invert the sign bit in this case, so
  that the norm is set to 0.0 instead of -0.0.

- Added the same early exit condition on bli_dnormfv_unb_var1( ... )
  when n = 1.

AMD-Internal: [CPUPL-3923]
Change-Id: If7f5ae41d2acfe89b505549d28215dde319d8c33
2023-10-10 04:21:09 -04:00
Edward Smyth
bb4c158e63 Merge commit 'b683d01b' into amd-main
* commit 'b683d01b':
  Use extra #undef when including ba/ex API headers.
  Minor preprocessor/header cleanup.
  Fixed typo in cpp guard in bli_util_ft.h.
  Defined eqsc, eqv, eqm to test object equality.
  Defined setijv, getijv to set/get vector elements.
  Minor API breakage in bli_pack API.
  Add err_t* "return" parameter to malloc functions.
  Always stay initialized after BLAS compat calls.
  Renamed membrk files/vars/functions to pba.
  Switch allocator mutexes to static initialization.

AMD-Internal: [CPUPL-2698]
Change-Id: Ied2ca8619f144d4b8a7123ac45a1be0dda3875df
2023-08-21 07:01:38 -04:00
Edward Smyth
7e50ba669b Code cleanup: No newline at end of file
Some text files were missing a newline at the end of the file.
One has been added.

Also corrected the file format of windows/tests/inputs.yaml, which
was missed in commit 0f0277e104.

AMD-Internal: [CPUPL-2870]
Change-Id: Icb83a4a27033dc0ff325cb84a1cf399e953ec549
2023-04-21 10:02:48 -04:00
Mangala V
5dc8e3fbca AOCL progress callback pointer update per thread
Thanks to Moore, Branden <Branden.Moore@amd.com> for identifying the
race condition and suggesting the fix.

Existing Design:
- The AOCL progress callback pointer is a global pointer shared
  across all threads.

Existing Design challenges:
- The callback function cannot safely disable the progress mechanism:
  another thread may have already checked whether the function
  pointer is set, and then re-reads the pointer when invoking
  the callback. If one thread sets the callback to NULL in between,
  the other thread will attempt to call the null pointer as a
  function pointer, leading to a segfault.

New Design:
- Each thread maintains a local copy of the progress pointer.
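
One way to realize the "local copy" idea, sketched with C11 atomics: each invocation reads the shared pointer exactly once, then tests and calls that copy, so a concurrent reset to NULL cannot cause a NULL call. The callback type, its const-qualified parameters, and all function names are hypothetical; the real AOCL prototype differs, and the commit does not say exactly when the local copy is taken:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical callback type with const-qualified parameters. */
typedef void ( *progress_cb_ft )( const char* const api,
                                  const long        progress,
                                  const int         thread_id );

/* Shared registration slot, written by the application. */
static _Atomic( progress_cb_ft ) g_progress_cb = NULL;

static void set_progress_cb( progress_cb_ft cb )
{
    atomic_store( &g_progress_cb, cb );
}

/* Called by worker threads: one load into a local, then test and call the
   local. Clearing the global afterwards cannot affect this call. */
static void report_progress( const char* api, long progress, int thread_id )
{
    progress_cb_ft local_cb = atomic_load( &g_progress_cb );
    if ( local_cb != NULL )
        local_cb( api, progress, thread_id );
}

/* Example callback for single-threaded demonstration: counts invocations. */
static int g_calls = 0;
static void counting_cb( const char* const api, const long progress,
                         const int thread_id )
{
    ( void )api;
    ( void )progress;
    ( void )thread_id;
    ++g_calls;
}
```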

AMD-Internal: [SWLCSG-1971]

Change-Id: I282989805a4a2a8a759a7373b645f3569bf42ed4
2023-04-20 05:33:12 -04:00
Shubham Sharma
036da2e651 Fixed compilation errors for generic configuration
- In gemmt and normf, #ifdef BLIS_KERNELS_* is added
  to make sure only compiled kernels are used.
- In bla_copy and bla_swap, a missing '\' is added.

AMD-Internal: [CPUPL-2870]
Change-Id: I83452dff761f60db6957f557321ce210ab72c037
2023-04-18 00:27:05 -04:00
Edward Smyth
1ac03e64b5 BLIS cpuid tidy and bugfix.
Improvements to BLIS cpuid functionality:
- Tidy names of AVX support test functions, in particular rename
  bli_cpuid_is_avx_supported() to bli_cpuid_is_avx2fma3_supported()
  to more accurately describe what it tests.
- Fix bug in frame/base/bli_check.c related to changes in commit
  6861fcae91

AMD-Internal: [CPUPL-3031]
Change-Id: Iacd8fb0ffbd45288e536fc6314660709055ea2d5
2023-04-03 08:46:37 -04:00
Eleni Vlachopoulou
ad7a812db2 Remove quick return for zero increments.
Details:
- To be BLAS compliant, if the increment is zero, iterate over the first element n times.
- For n <= 0, the correct result (0) is already returned, so this extra check is removed. This case is checked at the BLIS-typed interface level.

AMD-Internal: [SWLCSG-1900]
Change-Id: I098bb9560a790050018bc8d8c63b06bfbcc1aebd
2023-03-23 23:35:03 -04:00
Aayush Kumar
5bd2a777ba Fixed Compilation Fails when configured with --disable-blas
- Moved *_blis_impl function declaration outside the BLIS_ENABLE_BLAS
  guard.
- Changed Makefile to continue to compile bla_ files to get
  *_blis_impl interfaces.
- Modify CBLAS headers, bli_macro_defs.h and bli_util_api_wrap.{c,h}
  to add BLIS_ENABLE_CBLAS guards.
- Comment out BLIS_ENABLE_BLAS guards in various headers and utility
  functions.
- Define BLIS Fortran-style functions lsame_blis_impl and
  xerbla_blis_impl. New macros PASTE_LSAME and PASTE_XERBLA are
  used in bla_*_check headers and some other places to select
  whether to call lsame and xerbla, or the _blis_impl versions.
- Defined various other missing _blis_impl functions.
- In bli_util_api_wrap.c, only define any functions if
  BLIS_ENABLE_BLAS is defined, and only define the subroutine
  versions of functions like dot, nrm2, etc if BLIS_ENABLE_CBLAS
  is defined.
- BLAS layer is needed if CBLAS layer is enabled. Changed header
  files build/bli_config.h.in and bli_blas.h, and configure
  program to help ensure consistency in generated blis.h header
  and configure output.

Undefining BLIS_ENABLE_BLAS_DEFS appears to be broken in UTA BLIS
too, thus BLIS_ENABLE_BLAS_DEFS is currently permanently defined.

AMD-Internal: [CPUPL-3015]

Change-Id: I7c0fe07db85781db46f2c690e174451860b37635
2023-03-23 06:11:52 -04:00
Edward Smyth
1617589d24 Add consistent NaN/Inf handling in sumsqv. (#668)
Details:
- Changed sumsqv implementation as follows:
  - If there is a NaN (either real or imaginary), then return a sum of
    NaN and unit scale.
  - Else, if there is an Inf (either real or imaginary), then return a
    sum of +Inf and unit scale.
  - Otherwise behave as normal.

(cherry picked from commit b861c71b50)

AMD-Internal: [SWLCSG-1900]
Change-Id: Ic7ba9cad1fbaf11823b9ba96e72a4ddd973db5b6
2023-03-09 06:36:44 -05:00
Sireesha Sanga
540509f374 Enabling AVX2 path for SCNRM2
2023-01-13 10:27:54 +05:30
Eleni Vlachopoulou
758d68467f Disabling AVX2 path for SNRM2 and SCNRM2.
AMD-Internal: [CPUPL-2865]
Change-Id: I09c67115801a6b9446c7930c54fc937bd17908a3
2023-01-12 09:58:28 -05:00
Meghana Vankadari
f39dba9fd8 Added dzgemm, DZGEMM, DZGEMM_ prototypes to wrapper file.
AMD-Internal: [CPUPL-2199]
Change-Id: Ied814cb7be60d30b8217ec42ac436b4e628ea6d2
2023-01-12 01:35:29 -05:00
Edward Smyth
82c2eb4e8e Code cleanup and warnings fixes
Corrections for some occurrences of:
- Compiler warnings about initialization of float from double
- Spelling mistakes in comments
- Incorrect indentation of code and comments

AMD-Internal: [CPUPL-2870]
Change-Id: Icb68c789687bd0684844331d43071bfffecac9fc
2023-01-09 04:34:52 -05:00
Eleni Vlachopoulou
13aa3c8cd0 Adding AVX2 support for SNRM2 and SCNRM2
- For the cases where AVX2 is available, an optimized function is called,
  based on Blue's algorithm. The fallback method based on sumsqv is used
  otherwise.

- Scaling is used to avoid overflow and underflow.

- Works correctly for negative increments.

AMD-Internal: [SWLCSG-1080]

Change-Id: I6bf2f42652ba6b8a8631a0a9e6f6297d5b3ea5d9
2022-12-14 04:25:45 -05:00
Harihara Sudhan S
42d631bced Copyright modification
- Added copyright information to modified/newly created
  files missing them

Change-Id: If4e73b680246d0363de09587d6dc54bee00ecd71
2022-10-14 12:43:35 +05:30
Eleni Vlachopoulou
863b73dfaf Adding AVX2 support for DZNRM2
- For the cases where AVX2 is available, an optimized function is called,
based on Blue's algorithm. The fallback method based on sumsqv is used
otherwise.

- Scaling is used to avoid overflow and underflow.

- Works correctly for negative increments.

- Cleaned up some white space in the AVX2 implementation for DNRM2.

AMD-Internal: [CPUPL-2551]
Change-Id: I0875234ea735540307168fe7efc3f10fe6c40ffc
2022-09-30 07:51:04 -04:00
Eleni Vlachopoulou
1c1a0027a8 Bugfix in DNRM2 AVX path
Description:
Enabled DNRM2 AVX path and fixed bug that caused numerical accuracy
errors.

AMD-Internal: [CPUPL-2576]
Change-Id: Ic9fda9d9668bdfe233621f79db6acce518b4d10e
2022-09-29 04:46:04 -04:00
Nallani Bhaskar
1e5d98322d Disabled DNRM2 AVX path temporarily
Description:
Disabled AVX2 optimized path for DNRM2 to avoid accuracy
issues in netlib blas test.

AMD-Internal: [CPUPL-2576]
Change-Id: I0764725d4f6b1e4e0b5f60a255bc681bb698560e
2022-09-22 13:00:04 +05:30
Eleni Vlachopoulou
a5891f7ead Adding AVX2 support for DNRM2
- For the cases where AVX2 is available, an optimized function is called,
based on Blue's algorithm. The fallback method based on sumsqv is used
otherwise.

- Scaling is used to avoid overflow and underflow.

- Works correctly for negative increments.

AMD-Internal: [CPUPL-2551]
Change-Id: I5d8976b29b5af463a8981061b2be907ea647123c
2022-09-20 06:05:01 -04:00
Dipal M Zambare
2cdeea3c66 CBLAS/BLAS interface decoupling for the level 1 APIs
- In BLIS, the CBLAS interface is implemented as a wrapper around
  the BLAS interface. For example, the CBLAS API ‘cblas_dscal’
  internally invokes the BLAS API ‘dscal_’.

- This coupling between the CBLAS and BLAS interfaces prevents the end
  user from overriding them individually via the application or
  other libraries.

- This change separates the CBLAS and BLAS implementations by adding
  an additional level of abstraction. The implementation of the
  API is moved to a new function which is invoked directly from
  the CBLAS and BLAS wrappers.
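
A minimal sketch of the decoupled layout described above: both public wrappers call a shared implementation, and the CBLAS wrapper no longer calls the BLAS symbol. The operation `dtoyscal` and every function name here are hypothetical, and the implementation handles positive strides only for brevity:

```c
#include <stddef.h>

/* Shared implementation (hypothetical operation: scale x by alpha). */
static void dtoyscal_impl( int n, double alpha, double* x, int incx )
{
    if ( n <= 0 || incx <= 0 ) return;   /* simplification: positive strides only */
    for ( int i = 0; i < n; ++i )
        x[ ( ptrdiff_t )i * incx ] *= alpha;
}

/* BLAS-style wrapper: Fortran convention, every argument passed by pointer. */
void dtoyscal_( const int* n, const double* alpha, double* x, const int* incx )
{
    dtoyscal_impl( *n, *alpha, x, *incx );
}

/* CBLAS-style wrapper: calls the implementation directly rather than
   dtoyscal_, so an application can replace either symbol on its own. */
void cblas_dtoyscal( int n, double alpha, double* x, int incx )
{
    dtoyscal_impl( n, alpha, x, incx );
}
```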

AMD-Internal: [SWLCSG-1477]
Change-Id: I0e80071398af29c9313296d2a92e61e3897ac28e
2022-09-19 21:50:29 +05:30
Dipal M Zambare
866e8de7bf CBLAS/BLAS interface decoupling for the level 2 APIs
- In BLIS, the CBLAS interface is implemented as a wrapper around
  the BLAS interface. For example, the CBLAS API ‘cblas_dgemv’
  internally invokes the BLAS API ‘dgemv_’.

- This coupling between the CBLAS and BLAS interfaces prevents the end
  user from overriding them individually via the application or
  other libraries.

- This change separates the CBLAS and BLAS implementations by adding
  an additional level of abstraction. The implementation of the
  API is moved to a new function which is invoked directly from
  the CBLAS and BLAS wrappers.

AMD-Internal: [SWLCSG-1477]
Change-Id: Ie7cbbac86bbfa1075a5064b31b365e911f67786c
2022-09-15 17:51:05 +05:30
Dipal M Zambare
e18db8a172 CBLAS/BLAS interface decoupling for the level 3 APIs
- In BLIS, the CBLAS interface is implemented as a wrapper around
  the BLAS interface. For example, the CBLAS API ‘cblas_dgemm’
  internally invokes the BLAS API ‘dgemm_’.

- This coupling between the CBLAS and BLAS interfaces prevents the end
  user from overriding them individually via the application or
  other libraries.

- This change separates the CBLAS and BLAS implementations by adding
  an additional level of abstraction. The implementation of the
  API is moved to a new function which is invoked directly from
  the CBLAS and BLAS wrappers.

AMD-Internal: [SWLCSG-1477]
Change-Id: Id9e307154342d2c17b0ac6db580c36f1a9ee6409
2022-09-15 06:23:46 -04:00
Dipal M Zambare
61232d540c AOCL progress callback hardening
- BLIS uses a callback function to report the progress of the
  operation. The callback is implemented in the user application
  and is invoked by BLIS.

- Updated callback function prototype to make all arguments const.
  This will ensure that any attempt to write using callback’s
  argument is prevented at the compile time itself.

AMD-Internal: [CPUPL-2504]
Change-Id: I8ceb671242365d2a9155b485301cd8c75043e667
2022-09-14 15:32:10 +05:30
Dipal M Zambare
5c42afada8 Revert "CBLAS/BLAS interface decoupling for level 3 APIs"
This reverts commit d925ebeb06.

Change-Id: I2e842b29c1fedbe14bf913949cf978f3e7515ff3
2022-08-30 14:50:38 +05:30
Dipal M Zambare
7e42b3d2e0 Revert "CBLAS/BLAS interface decoupling for level 2 APIs"
This reverts commit 192f5313a1.

Change-Id: I876cad90902970ebc61550f109eb0ce32539ea1c
2022-08-30 11:53:46 +05:30
Dipal M Zambare
6cff8b030e Revert "CBLAS/BLAS interface decoupling for level 1 APIs"
This reverts commit 95169ca806.

Change-Id: Ic441aca616be6f27c7f1ba64e4480edcc6b17632
2022-08-30 11:34:34 +05:30
Dipal M Zambare
40c71dd2e1 Revert "CBLAS/BLAS interface decoupling for swap api"
This reverts commit 2beaa6a0e6.

Reverting it as it is planned for the next release.

Change-Id: Ib9271acd0b5b4cfd10c8f8b7bbb6ef93a3d594ea
2022-08-30 10:10:06 +05:30
Edward Smyth
abf848ad12 Code cleanup and warnings fixes
- Removed some additional compiler warnings reported by GCC 12.1
- Fixed a couple of typos in comments
- frame/3/bli_l3_sup.c: routines were returning before final call
  to AOCL_DTL_TRACE_EXIT
- frame/2/gemv/bli_gemv_unf_var1_amd.c: bli_multi_sgemv_4x2 is
  only defined in header file if BLIS_ENABLE_OPENMP is defined

AMD-Internal: [CPUPL-2460]
Change-Id: I2eacd5687f2548d8f40c24bd1b930859eefbbcde
2022-08-29 08:22:30 -04:00
jagar
2beaa6a0e6 CBLAS/BLAS interface decoupling for swap api
-   In BLIS, the CBLAS interface is implemented as a wrapper around
    the BLAS interface. For example, the CBLAS API ‘cblas_dgemm’
    internally invokes the BLAS API ‘dgemm_’.
-   If the end user wants to use different libraries for CBLAS
    and BLAS, the current implementation of BLIS doesn’t allow it.
-   This change separates the CBLAS and BLAS implementations by adding
    an additional level of abstraction. The implementation of the
    API is moved to a new function which is invoked directly from
    the CBLAS and BLAS wrappers.

AMD-Internal: [SWLCSG-1477]

Change-Id: I8d81072aaca739f175318b82f6510d386103c24b
2022-08-29 16:26:01 +05:30