Commit Graph

3207 Commits

Author SHA1 Message Date
jagar
394eee90f6 CMake: CMake is updated to support AddressSanitizer
CMakeLists.txt is updated to support ASAN to find
memory-related errors in the BLIS library. ASAN is enabled
by configuring cmake with the following option.

$ cmake .. -DENABLE_ASAN=ON

ASAN is supported only on Linux with the clang compiler.
The default redzone size is 16 bytes and the maximum
redzone size is 2048 bytes.

$ ASAN_OPTIONS=redzone=2048 <exe>

AMD-Internal: [CPUPL-2748]
Change-Id: I0b70af5c41cf5c68602150daeb67d7432bbe5cb8
2024-03-05 23:19:22 -05:00
Arnav Sharma
dac5de195d Early Return Scenario tests for Mixed Precision SCALV
- Updated existing ERS and IIT test framework in SCALV to handle mixed
  precision types (CSSCAL/ZDSCAL).

AMD-Internal: [CPUPL-4673]
Change-Id: I72399675e4e5b8a3e16d81d747db73a3c88ce1ef
2024-03-05 09:59:49 -05:00
Harsh Dave
e7246cca78 GTestsuite: SGEMM micro-kernel, API level and memory testing
- Added micro-kernel and API level tests for avx512 and avx2 small, sup
  and native SGEMM kernels for various values of storage,
  M, N, K, alpha, beta

- Added memory testing for sgemm kernels

AMD-Internal: [CPUPL-4681]

Change-Id: I72f94960e7c497ae75da872412eee69c23637348
2024-03-05 04:09:36 -05:00
Bhaskar Nallani
2ce47e6f5e Implemented optimal AVX512-variant of f32 LPGEMV
1. The 5 LOOP LPGEMM path is inefficient when A or B is a vector
   (i.e, m == 1 or n == 1).

2. An efficient implementation of lpgemv_rowvar_f32 is developed
   considering the b matrix reorder in case of m=1 and post-ops fusion.

3. When m = 1 the algorithm divides the GEMM workload in the n dimension
   intelligently at a granularity of NR. Each thread works on A:1xk
   B:kx(>=NR) and produces C = 1x(>=NR). K is unrolled by 4 along with
   a remainder loop.

4. When n = 1 the algorithm divides the GEMM workload in the m dimension
   intelligently at a granularity of MR. Each thread works on A:(>=MR)xk
   B:kx1 and produces C = (>=MR)x1. When n = 1 reordering of B is avoided
   so that it can be processed efficiently in the n = 1 kernel.

5. Fixed a few warnings raised when loading 2 f32 bias elements with
   _mm_load_sd through a float pointer. Typecast to (const double *).
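
The m = 1 partitioning in point 3 can be sketched roughly as below. This is only an illustration in Python, not the library's code: the name `partition_n` and the even-split policy are assumptions, and only the NR granularity is taken from the description above.

```python
def partition_n(n, nr, num_threads):
    """Split columns [0, n) into one half-open range per thread.

    Hypothetical even-split policy: every non-empty range starts on a
    multiple of nr, and only the range that reaches n may be narrower
    than a multiple of nr. Threads with no work get an empty range.
    """
    blocks = -(-n // nr)                       # ceil(n / nr) NR-wide blocks
    base, extra = divmod(blocks, num_threads)  # spread blocks over threads
    ranges, start = [], 0
    for t in range(num_threads):
        width = (base + (1 if t < extra else 0)) * nr
        end = min(start + width, n)            # clamp the last block to n
        ranges.append((start, end))
        start = end
    return ranges
```

For example, `partition_n(100, 16, 4)` gives three threads 32 columns each and the last thread the remaining 4.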

AMD-Internal: [SWLCSG-2391, SWLCSG-2353]
Change-Id: If1d0b8d59e0278f5f16b499de1d629e63da5b599
2024-03-04 23:53:23 +05:30
Vignesh Balasubramanian
deea4c611c Added functionality tests for ?NRM2 micro-kernels
- Added unit-test cases for the following AVX2 kernels:
   - bli_snorm2fv_unb_var1_avx2( ... )
   - bli_scnorm2fv_unb_var1_avx2( ... )
   - bli_dnorm2fv_unb_var1_avx2( ... )
   - bli_dznorm2fv_unb_var1_avx2( ... )

- Defined a templatized testing interface and function-pointer
  type. This is used as part of the test-fixture class and
  testsuite definitions, when writing the unit tests.

- The test cases cover the necessary range of values for the sizes
  to ensure code-coverage in the kernels.

- Further added memory tests for these kernels, to check for
  out-of-bounds reads/writes.

AMD-Internal: [CPUPL-4637]
Change-Id: I747ab104b947e87b5f8eda597256b7b8b6f7c2f2
2024-03-04 04:11:25 -05:00
Harsh Dave
a5ad1f55d1 Added memory test for DGEMM
- Added memory tests for DGEMM micro-kernels.

AMD-Internal: [CPUPL-4404]

Change-Id: If67aea77a33611cd02762f3e48e0e419cd390217
2024-03-04 10:08:39 +05:30
Shubham Sharma
01b2af0af3 GTestSuite: Added Tests for [C\Z]TRSM
- Added API tests for [C\Z]TRSM.
  - Added Extreme Value Test cases (EVT) for [C\Z]TRSM.
    - Tests for various combinations of INFs
       and NANs in A and B matrix are added.
  - Added Invalid input test cases (IIT).
  - Added micro kernel testing for ZTRSM
    - Added unit tests for small and native
      path kernels.
 - Added memory testing for ZTRSM
   kernels.

AMD-Internal: [CPUPL-4641]
Change-Id: I0db6b2c75b59821e1cde33532fb13400fab43412
2024-02-29 23:40:33 -05:00
Shubham Sharma
9968821ed9 GTestSuite: Added tests for STRSM
- Added API tests for STRSM.
  - Added Extreme Value Test cases (EVT) for STRSM.
    - Tests for various combinations of (+/-) INFs
       and NANs in A and B matrix are added.
  - Added micro kernel testing
    - Added unit tests for small and native
      path kernels.
 - Added memory testing for STRSM
   kernels.
 - Edited the protected buffer in memory testing to
   make sure that greenzone1 and greenzone2 do not
   intersect.

AMD-Internal: [CPUPL-4640]
Change-Id: Ic48590d3b4ad12c4f2f6beaec2e1106a7aaa5213
2024-02-29 23:40:17 -05:00
Chandrashekara K R
9f7e5b7dbf CMake: Modified flatten-headers.py file to fix issue observed with ninja on windows.
While building the BLIS library with the ninja generator on Windows, ninja
was observed to randomly add "|| '(set', 'FAIL_LINE=3&', 'goto', ':ABORT)'"
as extra arguments to add_custom_command. As a result, the flatten-headers
Python script failed to create the blis.h and cblas.h headers.
Modified the Python script to fix the above issue.

AMD-Internal: [CPUPL-2748]
Change-Id: I83b753d08e46f94b282176fcc661ce34e5eee3cf
2024-02-29 15:42:02 +05:30
Arnav Sharma
98b28368d8 Functional Tests for ZSCALV and ZDSCALV
- Updated test_scalv and ref_scalv templates for SCALV gtestsuite to
  support unit-tests for mixed precision SCALV.

- Added unit-tests for the following kernels:
  ZSCALV
	- bli_zscalv_zen_int( ... )

  ZDSCALV
	- bli_zdscalv_zen_int10( ... )
	- bli_zdscalv_zen_int_avx512( ... )

- Also, added API level unit-tests for the following cases:
	- Unit Positive Increments
	- Non-Unit Positive Increments

- Updated comments in DSCALV unit-tests with the correct kernel name.

AMD-Internal: [CPUPL-4624]
Change-Id: I96db8d3612687be07cd0e638a3119d41c3641ce8
2024-02-28 12:21:05 +05:30
Vignesh Balasubramanian
c73673839a Exception Value Testing(EVT) for SAXPY and ZAXPY APIs
- Added test cases to verify the compliance of SAXPY and ZAXPY
  APIs, through Exception Value Testing(EVT). This is done by
  inducing exception values in the input operands. The induction
  is controlled by the user, through indices given as part of the
  parameterized test-cases.

- Various combinations of zeros, NaNs and +/-Infs have been used to
  verify the compliance against the standard. These combinations
  help in determining whether the exception value has to be
  propagated, or handled separately.

- Updated the comments, class names and test-case loggers for
  uniformity.

- Added special cases of alpha and beta values to API level
  functionality tests, to check for any possible framework
  level optimizations against the standard.
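
The idea of inducing exception values at chosen indices can be sketched as follows. This is a minimal Python illustration with unit strides only; the helper names `ref_axpy`, `inject` and `same` are hypothetical and are not part of GTestSuite. The early return for alpha == 0 follows the reference BLAS AXPY routines, which is why an exception value in x is not always propagated.

```python
import math

def ref_axpy(n, alpha, x, y):
    """Reference y := alpha*x + y (unit strides), returning a new list.
    As in reference BLAS, n <= 0 or alpha == 0 leaves y untouched."""
    out = list(y)
    if n <= 0 or alpha == 0.0:
        return out
    for i in range(n):
        out[i] = alpha * x[i] + out[i]
    return out

def inject(vec, index, value):
    """Return a copy of vec with an exception value placed at index."""
    out = list(vec)
    out[index] = value
    return out

def same(a, b):
    """Element-wise equality that treats NaN as equal to NaN."""
    return len(a) == len(b) and all(
        (math.isnan(p) and math.isnan(q)) or p == q for p, q in zip(a, b))
```

With a NaN injected into x, a non-zero alpha propagates it to the matching element of y, while alpha == 0 returns y unchanged.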

AMD-Internal: [CPUPL-4655]
Change-Id: I3d817d44c6d239cbc61d146583707b3c8338de29
2024-02-27 23:14:48 -05:00
Edward Smyth
936a0a29df GTestSuite: BLAS2 thresholds
Modify thresholds to reflect number of operations that
accumulate results into each output element. Different
limits are set for early return and special cases.

Constants are still subject to experimentation and change.
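
One way to picture a threshold that scales with the number of accumulated operations is sketched below for GEMV. Everything here is an assumption made for illustration: the function name, the multiplier of 2.0, and the operation counts are hypothetical and are not the values used in GTestSuite.

```python
import sys

EPSILON = sys.float_info.epsilon  # double-precision machine epsilon

def gemv_threshold(m, n, alpha, beta, multiplier=2.0):
    """Hypothetical threshold for y := alpha*A*x + beta*y, A of size m x n."""
    if m == 0 or n == 0 or (alpha == 0 and beta == 1):
        return 0.0                         # early return: y must be unchanged
    if alpha == 0:
        return multiplier * EPSILON        # special case: y := beta*y only
    return multiplier * (n + 1) * EPSILON  # n accumulations plus beta*y
```

The point of the sketch is only that the limit grows with n in the general case and collapses to a tighter value for early-return and special cases.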

AMD-Internal: [CPUPL-4378]
Change-Id: Ic4540a2f1f6cd6380228b6a2884ac62850d6d8c6
2024-02-27 11:52:48 -05:00
Arnav Sharma
38af5752c4 Simplified and Fixed gtestsuite get_value_string
- Simplified the get_value_string( ... ) for complex types.

AMD-Internal: [CPUPL-4653]
Change-Id: I5bf8f6fe5753d0037b52bc4e31f87ad27b5d2c1c
2024-02-27 10:37:41 -05:00
mangala v
0ec3581940 Gtestsuite: Memory testing of ZGEMM micro kernels
- Testing out-of-bounds reads and writes of input and output matrices
  for SUP and Native micro kernels
- Protected buffers and the memory testing feature available in gtestsuite
  are used to validate memory errors

AMD-Internal: [CPUPL-4623]

Change-Id: I620fd3cd4eed1002e08b6233effb89b47beb073f
2024-02-27 19:19:43 +05:30
Vignesh Balasubramanian
53bbc7866f Added functionality and memory tests for SAXPY and ZAXPY kernels
- Added unit-test cases for bli_zaxpyv_zen_int5( ... ),
  bli_saxpyv_zen_int10( ... ) and bli_saxpyv_zen_int_avx512( ... )
  kernels.

- The test cases cover the necessary range of values for the sizes
  and the scaling factor(alpha), to ensure code-coverage and check
  for compliance with the standard.

- Further added memory tests for these kernels, to check for
  out-of-bounds reads/writes.

AMD-Internal: [CPUPL-4629]
Change-Id: If5e626ca2d0270e34dc2d951ae5c81f839a78ef0
2024-02-27 05:54:50 -05:00
Kiran Varaganti
0784679d4d Fix gcc 7.5 compilation error for zen4 and above configs
For gcc versions greater than or equal to 7.0, added AVX512 compiler flags
    in make_defs.mk and make_defs.cmake. The AVX512VNNI compiler flag is only
    supported from gcc version 8 onwards, so added another else condition
    for gcc versions greater than or equal to 7 that enables the AVX512 flags.
    This enables compilation of AVX512 assembly code paths with gcc 7.5.

Change-Id: I2cda00e578010db5e5a515b506c0b99f685307e0
2024-02-26 05:20:35 -05:00
Arnav Sharma
aacb5f6b3a Extreme Value Tests for DSCALV, DDOTV and DASUMV
- These tests explicitly include NaNs and (+/-)Infs in the input vector
  to verify the handling or propagation of NaNs and Infs according to
  the compliance.

AMD-Internal: [CPUPL-4406]
Change-Id: I3063805eb3fdfd58be3168b24cdb97de2c175c3c
2024-02-25 21:39:23 -05:00
Vignesh Balasubramanian
16aaafc8ec Added memory testing for DAXPY, DAXPBY and DCOPY kernels.
- Utilized the memory testing feature in GTestsuite
   to update the testing interfaces for micro-kernel
   testing of DAXPY, DAXPBY and DCOPY APIs.

 - The interface allocates memory using objects of
   ProtectedBuffer class, which define the redzones
   and greenzones as per the requirement.

 - Updated the test fixture classes, test-case loggers and
   the instantiators to use the new testing interface for
   memory testing.

 - Added special cases of alpha and beta values to API
   level functionality tests, to check for any possible
   framework level optimizations against the standard.

 - Code cleanup of ?_generic.cpp and ?_evt_testing.cpp
   files of DAXPY, DAXPBY and DCOPY APIs.

AMD-Internal: [CPUPL-4402]
Change-Id: Id945cabbbb42604d76a9e34269bff0f9f6712604
2024-02-23 09:06:08 -05:00
Arnav Sharma
970a655ee4 Fix for build issue when Mixed Datatypes are disabled
- Warning is raised for the implicit declaration of bli_gemm_md_is_ccr()
  when BLIS is configured with --disable-mixed-dt flag.

- Encapsulated the usage of bli_gemm_md_is_ccr( ... ) inside the
  BLIS_ENABLE_GEMM_MD macro.

AMD-Internal: [CPUPL-4630]
Change-Id: Icc59b1bcd3a21492daaaf6bcec80a5bf67012ace
2024-02-23 04:02:49 -05:00
mkadavil
d00e84ced3 Matrix Add post-operation support for float(bf16|f32) LPGEMM APIs.
-This post-operation computes C = (beta*C + alpha*A*B) + D, where D is
a matrix with dimensions and data type the same as that of C matrix.
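
The arithmetic of this post-operation can be written out as a small reference sketch. It is plain Python on nested lists, intended only to show the order of operations; the function name `gemm_matrix_add` is hypothetical and this is not the LPGEMM API.

```python
def gemm_matrix_add(alpha, a, b, beta, c, d):
    """Return (beta*C + alpha*A*B) + D for row-major nested lists.
    A is m x k, B is k x n, and C and D are both m x n."""
    m, k, n = len(a), len(b), len(b[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            # GEMM result first, then the matrix-add post-operation.
            row.append((beta * c[i][j] + alpha * acc) + d[i][j])
        out.append(row)
    return out
```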

AMD-Internal: [SWLCSG-2424]
Change-Id: I9464d1f514e3b04275fe93441489b4503a08937a
2024-02-23 02:02:33 -05:00
srpogula
4546e53ee0 Functionality testing & Early Return Scenario (ERS) tests for ?SUBV
- Added API level test-cases, to verify the functionality
  of ?SUBV APIs. These tests cover unit increments and
  non-unit positive increments for input params x or conj(x),
  vector length n, stride size of x, stride size of y

- ERS tests have been added for the ?SUBV APIs as per the BLIS
  compliance standards.

- Following are the standard tests added:
  ?SUBV
	- n <= 0

- Invalid Input Tests are not required for these APIs.

Change-Id: Ia300bce41d15105ad48143aa7e0943fb676d73b2
2024-02-22 04:36:35 -05:00
Harsh Dave
44173cacdf Added negative parameter tests for GEMM
- Added Invalid input test cases (IIT).
  - Added tests to check for cases where inputs
    are not BLAS-compliant.

AMD-Internal: [CPUPL-4404]
Change-Id: Ibbd7494b2fc6a9bebe93cd9d66be57b9b43f25f2
2024-02-21 21:43:58 +05:30
mangala v
9283783de2 Gtestsuite: DGEMM and ZGEMM EVT (exception value testing)
1. NAN and +/-INF are considered to be exception values.
2. Inserting NAN and +/- INF at random indices of Matrix A, B & C.
3. NAN and +/-INF are also passed as alpha, beta values
4. Even with these values present in matrices,
   the output should be compliant with the reference/standard solution

AMD-Internal: [CPUPL-4426]

Change-Id: Ibf0ad03ea1a3a2b63f2702a4dd6bbc8f9f116ddd
2024-02-20 12:17:40 -05:00
Shubham Sharma
de92fb0680 Added Memory testing for DTRSM
- Added framework for memory testing.
- Out of bound reads and writes can be
  detected in both C and assembly.
- Added memory tests for DTRSM.
- Test methodology:
  - Use linux's protected pages to set some memory
    before and after the required buffer as protected.
  - Set the first and last page_size bytes as
    read, write and execute protected (red_zones).
  - If any part of the code tries to read/write
    in the redzones, a SIGSEGV signal will be
    generated, which can be used to detect an
    out-of-bounds read or write.
  - Page protection can only be set per page.
    If the required size for the buffer is not a multiple
    of pagesize, we have to allocate more memory
    than required in order to make sure the start and
    end of the redzones align with page boundaries.
  - Override malloc(size) to allocate
    'buffer_size+(2*pagesize)' where buffer_size =
    minimum size such that buffer_size > 'size' and
    buffer_size is multiple of pagesize.
  - Use first and last page_size bytes of allocated
    buffer as redzones, use first 'size' of the middle
    buffer as first greenzone and last 'size' bytes as
    second greenzone.
  - Call test code once with first greenzone and then
    with second greenzone. Greenzones are surrounded
    by redzones; if test code reads/writes before or after
    greenzones, it will be detected.

   |_____________________________________________________|
   |  red_zone1 |  green_zone1    greenzone_2 | red_zone2|
   |_____________________________________________________|
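
The offsets implied by the methodology above can be worked out with a short sketch. This is Python used purely for the arithmetic: the function name `protected_layout` and the 4096-byte default page size are assumptions, and no memory is actually allocated or protected here. Ranges are half-open byte offsets from the start of the allocation.

```python
def protected_layout(size, page_size=4096):
    """Byte offsets of the redzones and greenzones for a request of `size`."""
    # Smallest multiple of page_size strictly greater than size,
    # following the description above.
    buffer_size = (size // page_size + 1) * page_size
    total = buffer_size + 2 * page_size
    mid_start = page_size                # end of the first redzone
    mid_end = page_size + buffer_size    # start of the second redzone
    return {
        "total": total,
        "red_zone1": (0, page_size),
        "green_zone1": (mid_start, mid_start + size),
        "green_zone2": (mid_end - size, mid_end),
        "red_zone2": (mid_end, total),
    }
```

green_zone1 starts right where red_zone1 ends, so an access before the buffer faults; green_zone2 ends right where red_zone2 starts, so an access past the end faults.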

AMD-Internal: [CPUPL-4403]
Change-Id: Ic5c22a9adf8f833c77510686eee886485e894354
2024-02-19 23:41:28 -05:00
Edward Smyth
1bd9f0c856 Define symbol dzgemm_blis_impl for non-zen configurations
Non-zen configurations will use frame/compat/bla_gemm.c rather than
frame/compat/bla_gemm_amd.c. In the former, change dzgemm definition
to have dzgemm_blis_impl and optional dzgemm_ wrapper, as in the
AMD version.

AMD-Internal: [CPUPL-4082]
Change-Id: I66caff56e033bda8bb4ff2d60a16f7e52af122ea
2024-02-19 05:24:39 -05:00
mangala v
41b19ba6e6 Gtestsuite: ZGEMM API testing
Functionality testing for the APIs below is carried out with various input ranges and values

The interface invokes the listed APIs in the sequence below if the condition is satisfied
 List of APIs  - Condition
   SCALM       : alpha = 0
   GEMV        : m = 1 or n = 1
   Small ST    : ((m0*k0) <= 16384) || ((n0*k0) <= 16384)
   SUP AVX2    : (m || n || k) <= 128
   SUP AVX512  : (m || k) <= 128  || n <= 110
   Native      : Default path, if the above APIs don't support
                 the given input values

AMD-Internal: [CPUPL-4426]
Change-Id: I40cd30a11592e4e553e09f0d81153abf0bf0b002
2024-02-16 15:49:36 +05:30
mkadavil
01b7f8c945 Matrix Add post-operation support for integer(s16|s32) LPGEMM APIs.
-This post-operation computes C = (beta*C + alpha*A*B) + D, where D is
a matrix with dimensions and data type the same as that of C matrix.
-For clang compilers (including aocc), -march=znver1 is not enabled for
zen kernels. Have updated CKVECFLAGS to capture the same.

AMD-Internal: [SWLCSG-2424]
Change-Id: Ie369f7ea5c80ab69eea3f3e03a8d9546e14f5c09
2024-02-12 23:51:36 +05:30
Edward Smyth
00accfb3b1 GTestSuite: option to test with threshold = zero
Add cmake option to override thresholds and set them all to zero.
In this case we don't switch to binary comparison as we want the
error to be calculated and printed. This functionality is intended for:
- Helping to determine or alter thresholds.
- To compare different max errors between different reference libraries.
- To test when we expect identical results, e.g. some comparisons of
  BLIS vs BLIS.

To simplify coding, this is implemented by setting epsilon to zero
in the testinghelpers function.

AMD-Internal: [CPUPL-4400]
Change-Id: I2cf021e0cc24c62e7600ba80fd810f3aa55a6ea5
2024-02-08 11:06:25 -05:00
Edward Smyth
f3cff28838 GTestSuite: option to test upper case character arguments
Add cmake option to convert all character arguments to upper
case to check compliance.

AMD-Internal: [CPUPL-4499]
Change-Id: Ic18416d78f63b999a78253463cc15c32f7d444f4
2024-02-08 08:53:26 -05:00
jagar
099b9863cb CMake: CMake is updated for Code Coverage
CMakeLists.txt is updated to generate a code coverage
report in HTML format just by configuring cmake with
-DENABLE_COVERAGE=ON. This is supported only on Linux
with the gcc compiler.

cmake .. -DENABLE_COVERAGE=ON

AMD-Internal: [CPUPL-2748]
Change-Id: I9b36b6cc3f1f97b53e1c4ee62948a017418e3d41
2024-02-07 06:12:51 -05:00
Vignesh Balasubramanian
b210417a59 Exception Value Testing(EVT) for DAXPY and DAXPBY APIs
- Added test cases to verify the compliance of DAXPY and DAXPBY
  APIs, through Exception Value Testing(EVT). This is done by
  inducing exception values in the input operands. The induction
  is controlled by the user, through indices given as part of the
  parameterized test-cases.

- Various combinations of zeros, NaNs and +/-Infs have been used to
  verify the compliance against the standard. These combinations
  help in determining whether the exception value has to be
  propagated, or handled separately.

- Updated the daxpyvGenericTestPrint logger for uniformity across
  the testing categories.

- Added test cases for  bli_daxpyv_zen_int10( ... ) micro kernel
  testing to cover the loops iterating in blocks of 52 and 16
  respectively.

AMD-Internal: [CPUPL-4402]
Change-Id: Ida6cf5e08727b4c3cb87c93bfec6be76361cfaea
2024-02-07 12:14:58 +05:30
jagar
40b1af4c3f CMake: Added cmake for bench
CMakeLists.txt is added in bench.
Steps are provided to build for different targets.

AMD-Internal: [CPUPL-2748]
Change-Id: I58027f4e42d1323cafb151224c45868bc8337ff4
2024-02-06 06:50:34 -05:00
Harsh Dave
abc414f2ec API level testing of DGEMM kernels
- Added API level tests for avx512 and avx2 k1 kernels,
  tiny, small, sup and native DGEMM kernels for various
  values of storage, M, N, K, alpha, beta

AMD-Internal: [CPUPL-4404]
Change-Id: Ieadf407601a8efc5a2c0956d08d791dcfa69e44b
2024-02-06 16:30:13 +05:30
Arnav Sharma
92aeab1710 Early Return Scenario (ERS) tests for ?SCALV, ?DOTV and ?ASUMV
- ERS tests have been added for the above APIs as per the BLAS
  compliance standards.

- Following are the standard tests added:
  ?SCALV
	- n <= 0
	- incx <= 0
	- alpha == 1
  ?DOTV
	- n <= 0
  ?ASUMV
	- n <= 0
	- incx <= 0
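
The ?SCALV early-return conditions listed above can be sketched as a small reference routine. This is an illustrative Python version; the name `ref_scalv` is hypothetical and it returns a new list instead of scaling in place. An ERS test would then check that x comes back unchanged for each condition.

```python
def ref_scalv(n, alpha, x, incx):
    """x := alpha*x over n elements spaced incx apart, as a new list.
    Returns x unchanged when n <= 0, incx <= 0 or alpha == 1."""
    out = list(x)
    if n <= 0 or incx <= 0 or alpha == 1:
        return out                 # early return: nothing is scaled
    for i in range(n):
        out[i * incx] *= alpha     # only the strided elements are touched
    return out
```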

- Invalid Input Tests are not required for these APIs.

- Updated the micro-kernel test files to include the new macros
  generated for enabling and disabling architecture specific tests.

- Updated the function calls for mixed-precision typed_asumv tests.

AMD-Internal: [CPUPL-4406]
Change-Id: Ib34b2f39809d93075ae1168682b3ef2380e03a5a
2024-02-05 11:48:52 -05:00
Eleni Vlachopoulou
58b63f149f CMake: Updating message when generating blis.h/cblas.h.
Change-Id: I7be7fe31a392c77311664cff4bba3b65c4cc7e4e
2024-02-05 11:18:29 -05:00
Edward Smyth
ee91b032ab GTestSuite: Ensure all elements are initialized in generators
Rather than relying on implicit initialization of arrays, ensure all
elements are explicitly set. Array elements that are not supposed
to be altered by the BLAS or BLIS API are set to a large magnitude
value to aid identification of incorrect usage. This includes:
- Intervening elements in vectors when incx/incy > 1.
- Extra elements in column/row when lda > matrix size.
- Also set unused upper/lower values in triangular matrices to
  similar large magnitude value.
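
For the strided-vector case, the idea can be sketched as follows. This is a Python illustration only: the sentinel value and the helper names are hypothetical and do not come from the GTestSuite generators, and only positive increments are handled.

```python
SENTINEL = -1.2345e38  # hypothetical large-magnitude marker value

def make_strided_vector(values, inc):
    """Place values inc apart and fill every intervening slot explicitly."""
    if not values:
        return []
    buf = [SENTINEL] * (1 + (len(values) - 1) * inc)
    for i, v in enumerate(values):
        buf[i * inc] = v
    return buf

def untouched_ok(buf, n, inc):
    """True if every slot between the n strided elements still holds SENTINEL."""
    used = {i * inc for i in range(n)}
    return all(buf[j] == SENTINEL for j in range(len(buf)) if j not in used)
```

After calling the routine under test, `untouched_ok` returning False would point at a write to an element the API should never have touched.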

AMD-Internal: [CPUPL-4430]
Change-Id: Id5e8c1a4e80687f5f462e6b5aa2accac0ab8ec21
2024-02-05 10:29:56 -05:00
Shubham Sharma
d5cd5836b1 Fixed DGEMM 8x24 kernel for beta zero
- Column stride is not taken into consideration in
  current implementation when writing to C buffer
  if beta is zero and C is column major stored.

- Fixed C storage in case of column major stored C
  when beta is zero in 8x24 DGEMM kernel.

AMD-Internal: [CPUPL-4404]
Change-Id: I5b8dfce962995e3238cf902b5a09dd1bf90002a8
2024-02-05 06:57:06 -05:00
mangala v
aa5731eba7 Gtestsuite: Updated SGemm test scenario
1. Earlier tests were taking a long time to initialise and run,
   hence removed testcases which are already covered as part of another
   scenario
2. Added two categories of tests:
   a. Tests to cover all sizes of m, n, k for
      bli_sgemmsup_rv_zen_asm_6x16m kernel
   b. Tests to cover various alpha and beta values for above kernel

With the current update, building and running takes less than 2 minutes.

Change-Id: I1479a8ca960c04d4642857fdc7949458646dafb7
2024-02-05 04:22:21 -05:00
Shubham Sharma
fc91932b4a Fixed out of bounds read in DTRSM small kernels
- In 3x1 fringe case in [RLN/RUT] kernel, 4 double
  precision floats are being read instead of 3 doubles.

- Fixed the code to read only 3 doubles.

AMD-Internal: [CPUPL-4403]
Change-Id: If0afb155efefabe13487cf322d479981f1838aa2
2024-02-02 10:31:12 +05:30
mangala v
0659a647e0 Gtestsuite: Micro Kernel Testing of ZGEMM API
Summary:
- Aims to perform accuracy testing of ZGEMM micro kernel.
- BLIS kernel is called directly from the gtestsuite framework.
- Micro kernel is invoked with required input, output parameters.
- No objects are created to call micro kernel.
- No framework code would be invoked in this method.

Below AVX2 & AVX512 Micro kernels are being tested using gtestsuite

Native Kernels:
 - AVX2: bli_zgemm_haswell_asm_3x4
         bli_zgemm_zen_asm_2x6(Required for TRSM computation)
 - AVX512: bli_zgemm_zen4_asm_12x4
           bli_zgemm_zen4_asm_4x12(Required for TRSM computation)

SUP Kernels:
- AVX2 Kernels:
      bli_zgemmsup_rd_zen_asm_3x4m
      bli_zgemmsup_rd_zen_asm_3x2m
      bli_zgemmsup_rd_zen_asm_3x4n
      bli_zgemmsup_rd_zen_asm_2x4n
      bli_zgemmsup_rd_zen_asm_(2/1)x4
      bli_zgemmsup_rd_zen_asm_(2/1)x2
      bli_zgemmsup_rv_zen_asm_(2/1)x4
      bli_zgemmsup_rv_zen_asm_(2/1)x2
      bli_zgemmsup_rv_zen_asm_3x4m
      bli_zgemmsup_rv_zen_asm_3x2m
      bli_zgemmsup_rv_zen_asm_3x4n
      bli_zgemmsup_rv_zen_asm_2x4n
      bli_zgemmsup_rv_zen_asm_1x4n
      bli_zgemmsup_rv_zen_asm_3x2

- AVX512 kernels:
     bli_zgemmsup_cv_zen4_asm_12x4m
     bli_zgemmsup_cv_zen4_asm_12x3m
     bli_zgemmsup_cv_zen4_asm_12x2m
     bli_zgemmsup_cv_zen4_asm_12x1m
     bli_zgemmsup_cv_zen4_asm_8x(4/3/2/1)
     bli_zgemmsup_cv_zen4_asm_4x(4/3/2/1)
     bli_zgemmsup_cv_zen4_asm_2x(4/3/2/1)

Above kernels are tested with different combination of parameters such as storage, alpha, beta, transpose & dimensions.

DGEMM: Minor update in DGEMM micro kernel (Buffer allocation, comment section, order of passing arguments)

AMD-Internal: [CPUPL-4426]

Change-Id: I9d6ab24278450f57d13589ad89151a4acc641f08
2024-01-31 10:30:57 -05:00
Eleni Vlachopoulou
b9a808e5d8 GTestSuite: Updating datagenerators helper functions.
- Moved function definitions in the header to avoid explicit template
  instantiations.
- Templatized from and to bounds to enable combinations of integer or
  floating-point values.
- Used an enum class for the element type instead of a char to make it
  more robust since chars get cast to integers. Now we should be
  getting better error messages if there is a mismatch.
- Deleted argument for datatypes that was a leftover from the past.
  Default argument is used instead.

Change-Id: I3f95d73f03028de46324b310826edca8057e561d
2024-01-31 07:08:25 -05:00
eashdash
ef134dc49f Added Trans A feature for all INT8 LPGEMM APIs
1. Added Trans A feature to handle column major inputs
   for A matrix.
2. Trans A is enabled by on-the-go pack of A matrix.
3. The on-the-go pack of A converts a column storage
   MCxKC block of A into row storage MCxKC block as
   LPGEMM kernels are row major kernels.
4. New pack routines are added for conversion of A matrix
   from column major storage to row major storage.
5. LPGEMM Cntx is updated with pack kernel function
   pointers.
6. Packing of A matrix:
   -  Converts column major input A to row major
      in blocks of MCxKC with newly added pack A
      functions when cs_a > 1.
7. Pack routines are added for AVX512 and AVX2
   INT8 LPGEMM APIs.
8. Trans A feature is now supported in:
   1. u8s8s32os32/os8
   2. u8s8s16os16/os8/ou8
   3. s8s8s32os32/os8
   4. s8s8s16os16/os8
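
The index mapping behind the on-the-go pack of A can be sketched as below. This is plain Python on a flat list and shows only the storage conversion; the function name `pack_a_to_row_major` is hypothetical, and the actual pack routines work on MCxKC blocks with AVX512 or AVX2 code.

```python
def pack_a_to_row_major(a, m, k, rs_a, cs_a):
    """Copy an m x k block from a strided flat buffer into a new flat list
    in row-major order, so element (i, p) lands at index i*k + p.
    For a column-major input, rs_a == 1 and cs_a is the column stride."""
    return [a[i * rs_a + p * cs_a] for i in range(m) for p in range(k)]
```

For a 2x3 matrix stored column-major as `[1, 4, 2, 5, 3, 6]`, calling it with `rs_a=1, cs_a=2` yields the row-major `[1, 2, 3, 4, 5, 6]`.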

AMD-Internal: SWLCSG-2582
Change-Id: I7ce331545525a9a09f3853280615b55fcf2edabf
2024-01-30 03:40:56 -05:00
Vignesh Balasubramanian
ddec0c1de0 Negative parameter testing for ?COPY, ?AXPY and ?AXPBY APIs
- As per the standard compliance, the ?copy(), ?axpy() and
  ?axpby() APIs do not require invalid input testing(IIT)
  with respect to the input parameters they receive, as part
  of BLAS and CBLAS calls.

- Thus, test-cases have been added to verify early return scenarios
  (ERS) as per the compliance. The testsuite is type-parameterized,
  since the compliance for early return cases is the same across the
  datatypes.

- Updated the conditional directives in micro-kernel(ukr) test files
  to include the new set of macros generated as part of the
  buildsystem in GTestsuite.

- Updated the conditional macro to enable the appropriate code
  section for compilation of ref_axpbyv(), based on our choice
  of reference library when building GTestsuite.

AMD-Internal: [CPUPL-4402]
Change-Id: Ibea2bc34469b008f4d4558ce359717c08b92e978
2024-01-29 06:31:18 -05:00
Kiran Varaganti
63be4c8ce4 AOCL-BLIS changed to AOCL-BLAS
AOCL-BLIS replaced with AOCL-BLAS at various places like "configure",
"CMakeLists.txt" and documentation files.

Change-Id: I75c3fbe8a1abc91828eeacb25672fd7bc905d226
2024-01-25 04:31:25 -05:00
Eleni Vlachopoulou
e4ac153a3e GTestSuite: Set macros for kernel testing depending on hardware capabilities.
- During configuration, CMake system detects if AVX2, AVX512, AVX512VNNI or AVX512BF16 is supported and sets up a macro.
- Those macros need to be used in addition to BLIS_KERNELS_ZEN* to build/run only those tests supported by a specific architecture.

Change-Id: I60adc57d3a570f7bdd6dc834e2562da6bfb52bcc
2024-01-22 08:04:12 -05:00
Shubham Sharma
c1a3dbadf1 Micro-kernel testing of DTRSM kernels
- Added unit tests for avx512 and avx2 native path
  DTRSM kernels for various values of storage, stride,
  K, alpha, ldc.

AMD-Internal: [CPUPL-4403]

Change-Id: I42b1f08aa98c73af39a6e3bd94049965e7c51ae9
2024-01-22 06:24:17 -05:00
Shubham Sharma
006b86c22f Added tests for DTRSM
- Added API tests for DTRSM.
- Added Extreme Value Test cases (EVT) for DTRSM.
  - Tests for various combinations of INFs
    and NANs in A and B matrix are added.
- Added Invalid input test cases (IIT).
  - Added tests to check for cases where inputs
    are not BLAS-compliant.

AMD-Internal: [CPUPL-4403]

Change-Id: Id8af1f1ec65a4e5bc7abba4e86df2756bce6cd42
2024-01-22 06:23:57 -05:00
Harsh Dave
156bc734f0 Micro-kernel testing of DGEMM kernels
- Added unit tests for avx512 and avx2 native and sup path
  DGEMM kernels for various values of storage, M, N,
  K, alpha, beta, ldc.

AMD-Internal: [CPUPL-4404]
Change-Id: I33a8098b6a20b55c9f1f1bcffa6812bd792890b1
2024-01-22 05:39:45 -05:00
Arnav Sharma
823e8bfb2d Functional Testing for DDOTV, DSCALV and DASUMV
- Added unit-tests for the following kernels:
  DDOTV
	- bli_ddotv_zen_int( ... )
	- bli_ddotv_zen_int10( ... )
	- bli_ddotv_zen_int_avx512( ... )
  DSCALV
	- bli_dscalv_zen_int( ... )
	- bli_dscalv_zen_int10( ... )
	- bli_dscalv_zen_int_avx512( ... )

- Added API level unit-tests for the following cases:
	- Unit Positive Increments
	- Non-Unit Positive Increments
	- Negative Increments

- Added gtestsuite framework for (s/d/sc/dz)ASUMV.

AMD-Internal: [CPUPL-4406]
Change-Id: I086c51c563fecc7a7e67791c4c4eee8b56c5417b
2024-01-19 07:05:11 -05:00
Edward Smyth
05be482203 GTestSuite: Threshold comparison
Changes to threshold comparison:
- Use error <= threshold as measure of success rather than
  error < threshold.
- Report error compared to epsilon as well as absolute value.
- Correct typo.
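
The effect of switching from a strict to a non-strict comparison is easiest to see when the threshold is zero. The sketch below is illustrative Python; the function names and the message format are hypothetical and are not taken from the testinghelpers code.

```python
def passes(error, threshold):
    """New measure of success: error <= threshold."""
    return error <= threshold

def passes_strict(error, threshold):
    """Previous measure of success: error < threshold."""
    return error < threshold

def describe(error, epsilon):
    """Report the absolute error and the same error as a multiple of epsilon."""
    return f"error = {error:.3e} ({error / epsilon:.2f} * epsilon)"
```

With `error == 0.0` and `threshold == 0.0`, `passes` succeeds while `passes_strict` fails, so an exact result can only pass a zero threshold under the non-strict comparison.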

AMD-Internal: [CPUPL-4378]
Change-Id: I58e718504ee863294dcdd6bd3cd7637de2638dbc
2024-01-19 05:05:10 -05:00