Updated compiler id in cmake related files from
CMAKE_CXX_COMPILER_ID to CMAKE_C_COMPILER_ID
AMD-Internal: [CPUPL-2748]
Change-Id: Ib0e2a2e3ec8fafeb423fe56b9842a93db0115371
CGEMM:
API: Functional testing of CGEMM
Covers different matrix sizes
Hence it covers both the SUP and native code paths
EVT: Insertion of exception values (EVs) such as NaN and +/-Inf in the matrices
EVs are inserted at user-provided indices of the input/output matrices
EVs are also passed as alpha and beta values
The output is expected to be compliant with the standard output
MEM: Checks for out-of-bounds reads or writes using protected pages
ZGEMM:
- Updated EVT tests for special case for alpha, beta when
imaginary component is 0
- Updated SUP & Native method to support C/Z datatype
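The EVT idea above can be sketched as follows. This is only an illustration that assumes a column-major matrix held in a std::vector; insert_exception_value and has_nan are hypothetical helpers and not gtestsuite functions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: overwrite element (i, j) of a column-major matrix
// with leading dimension ld by the exception value ev.
template <typename T>
void insert_exception_value(std::vector<T>& mat, std::size_t ld,
                            std::size_t i, std::size_t j, T ev)
{
    assert(i < ld);             // the row index must lie inside a column
    mat.at(i + j * ld) = ev;    // at() throws if the index is out of range
}

// Hypothetical helper: true if either component of z is NaN.
inline bool has_nan(const std::complex<float>& z)
{
    return std::isnan(z.real()) || std::isnan(z.imag());
}
```

A test would then run the API on the modified matrices and compare the result element by element against the reference output.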
AMD-Internal: [CPUPL-4712]
Change-Id: If8ba99998e0a494375a764bb7756d45147388965
- dotxf is a BLIS-specific kernel that performs the dotxv
operation on several vectors at a time (the fusing factor) to
speed up the computation.
- A dotxf reference function is implemented for gtestsuite, in
which the dotxf result is compared against the result obtained
by looping over the dotxv function.
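A minimal sketch of such a reference, assuming real double data, a column-major A of size m x b, positive strides and no conjugation. The names ref_dotxv and ref_dotxf are illustrative and are not the gtestsuite signatures.

```cpp
#include <cassert>
#include <cstddef>

// Reference dotxv for real data: returns beta * rho + alpha * (x . y).
inline double ref_dotxv(std::size_t m, double alpha,
                        const double* x, std::size_t incx,
                        const double* y, std::size_t incy,
                        double beta, double rho)
{
    double acc = 0.0;
    for (std::size_t i = 0; i < m; ++i)
        acc += x[i * incx] * y[i * incy];
    return beta * rho + alpha * acc;
}

// Reference dotxf: y := beta * y + alpha * A^T * x, computed as one dotxv
// per column of A (b columns, i.e. the fusing factor).
inline void ref_dotxf(std::size_t m, std::size_t b, double alpha,
                      const double* a, std::size_t lda,
                      const double* x, std::size_t incx,
                      double beta, double* y, std::size_t incy)
{
    assert(lda >= m);   // columns of A must not overlap
    for (std::size_t j = 0; j < b; ++j)
        y[j * incy] = ref_dotxv(m, alpha, a + j * lda, 1, x, incx,
                                beta, y[j * incy]);
}
```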
AMD-Internal: [CPUPL-4764]
Change-Id: I342dab066ceb1710649e54bb73afc5a23e2a8177
- Testcases with exception values such as nan and +/-inf.
- Randomly inserting nan, +/- inf in A,B or C matrix along with
alpha and beta with extreme values
AMD-Internal: [CPUPL-4681]
Change-Id: Ia92bcdb4519e9a0e4c6026e93b5e2e2f0e19b065
- axpyf is a BLIS-specific kernel that performs the axpy
operation on several vectors at a time (the fusing factor) to
speed up the computation.
- An axpyf reference function is implemented for gtestsuite, in
which the axpyf result is compared against the result obtained
by looping over the axpy function.
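A minimal sketch of the same approach for axpyf, assuming real double data, a column-major A of size m x b, positive strides and no conjugation. The names ref_axpyv and ref_axpyf are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Reference axpyv for real data: y := y + alpha * x.
inline void ref_axpyv(std::size_t m, double alpha,
                      const double* x, std::size_t incx,
                      double* y, std::size_t incy)
{
    for (std::size_t i = 0; i < m; ++i)
        y[i * incy] += alpha * x[i * incx];
}

// Reference axpyf: y := y + alpha * A * x, computed as one axpyv per
// column of A, each scaled by alpha * x[j].
inline void ref_axpyf(std::size_t m, std::size_t b, double alpha,
                      const double* a, std::size_t lda,
                      const double* x, std::size_t incx,
                      double* y, std::size_t incy)
{
    assert(lda >= m);   // columns of A must not overlap
    for (std::size_t j = 0; j < b; ++j)
        ref_axpyv(m, alpha * x[j * incx], a + j * lda, 1, y, incy);
}
```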
AMD-Internal: [CPUPL-4763]
Change-Id: I4713fd0b0d9e9cf688c9aaa82ac0e6ae07a05989
Details:
- Added new folder named JIT/ under addon/aocl_gemm/. This folder
will contain all the JIT related code.
- Modified lpgemm_cntx_init code to generate main and fringe kernels
for 6x64 bf16 microkernel and store function pointers to all the
generated kernels in a global function pointer array. This happens
only when the gcc version is < 11.2.
- When the gcc version is < 11.2, the microkernel uses the JIT-generated
kernels; otherwise, it uses the intrinsics-based implementation.
AMD-Internal: [SWLCSG-2622]
Change-Id: I16256c797b2546a8cd2049680001947346260461
- Added unit-test cases for verifying the accuracy of
bli_zaxpbyv_zen_int( ... ) kernel.
- The test cases cover the necessary range of values for the sizes
and the scaling factors(alpha and beta), to ensure code-coverage
and check for compliance with the standard.
- Added memory tests for these kernels, to check for
out-of-bounds reads/writes.
- Further updated the test-cases for exception value testing(EVT)
of ZAXPBY API. These test-cases verify the compliance against the
standard and help in determining whether the exception value has to
be propagated, or handled separately.
AMD-Internal: [CPUPL-4698]
Change-Id: If3c470c051f94393be3a1d444ed424f626ae6f5f
- Updated SCALV test template to handle mixed-precision datatypes.
- These tests explicitly induce NaNs and (+/-)Infs in the input vector
to verify that NaNs and Infs are handled or propagated in compliance
with the standard.
AMD-Internal: [CPUPL-4710]
Change-Id: Iab4b671677542f1137631060dc0592086acf874c
- Utilized the memory testing feature in gtestsuite to add memory tests
for D/Z/ZDSCALV kernels.
- Updated the test fixtures, loggers and instantiators to use the new
testing interface for memory testing.
AMD-Internal: [CPUPL-4700]
Change-Id: I13cad2271198423e7b0d361f6a5cccdc8b401183
- Utilized the memory testing feature in gtestsuite to add memory tests
for DDOTV micro-kernels.
- Updated the test fixtures, loggers and instantiators to use the new
testing interface for memory testing.
- Use --gtest_filter="*mem_test_disabled*" to disable memory tests or
--gtest_filter="*mem_test_enabled" to run only memory tests.
AMD-Internal: [CPUPL-4406]
Change-Id: I887a89f33ca43e504479702263b6c66ddd7937de
- Updated the existing benchmarking file for SCALV API, to include
support to call the BLAS and CBLAS mixed-precision SCALV, namely
cblas_csscalv(), csscalv_(), cblas_zdscalv(), zdscalv_().
- The input is expected to be given with the datatype 'ZD' and 'CS'
in order to benchmark the associated mixed-precision APIs.
AMD-Internal: [CPUPL-4722]
Change-Id: I4ab0fb19fe1949468cf707d0a857e8a1681addeb
Description
1. In the mr0=1 case, the accumulator register and the operand
registers of an fma instruction were swapped. Corrected
this copy-paste error.
2. Removed the fill-array call for c_ref in bench_lpgemm.c and
replaced it with a memcpy from the c buffer. The fill-array routine
now uses rand() to initialize data, so filling c_ref and c
separately can produce different values. The previous approach
worked only because the data was fixed (i=0 ... i%5).
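The second point can be sketched as below. fill_random is a hypothetical stand-in for the benchmark's fill routine, and the float element type is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the fill routine: small random values.
inline void fill_random(std::vector<float>& buf)
{
    for (float& v : buf)
        v = static_cast<float>(std::rand() % 5);
}

// Fill c once, then copy it into c_ref, so that both buffers start from
// identical data even though the fill is random.
inline void init_c_and_ref(std::vector<float>& c, std::vector<float>& c_ref)
{
    assert(c.size() == c_ref.size());
    fill_random(c);
    if (!c.empty())
        std::memcpy(c_ref.data(), c.data(), c.size() * sizeof(float));
}
```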
Change-Id: Ia513331ba49d28adc7bcdc0ec78d443abe66780b
- Added test cases to verify the compliance of ?SUBV APIs,
through Exception Value Testing(EVT). This is done by
inducing exception values in the input operands. The induction
is controlled by the user, through indices given as part of the
parameterized test-cases.
- Various combinations of zeros, NaNs and +/-Infs have been used to
verify the compliance against the standard.
Change-Id: If7ce582f2d0ab92acaf02215126f6e4caff3af8d
CMakeLists.txt is updated to support ASAN, to find
memory-related errors in the blis library. ASAN is enabled
by configuring cmake with the following option:
$ cmake .. -DENABLE_ASAN=ON
ASAN is supported only on Linux with the clang compiler.
The default redzone size is 16 bytes and the maximum
redzone size is 2048 bytes.
$ ASAN_OPTIONS=redzone=2048 <exe>
AMD-Internal: [CPUPL-2748]
Change-Id: I0b70af5c41cf5c68602150daeb67d7432bbe5cb8
- Updated existing ERS and IIT test framework in SCALV to handle mixed
precision types (CSSCAL/ZDSCAL).
AMD-Internal: [CPUPL-4673]
Change-Id: I72399675e4e5b8a3e16d81d747db73a3c88ce1ef
- Added micro-kernel and API level tests for avx512 and avx2 small, sup
and native SGEMM kernels for various values of storage,
M, N, K, alpha, beta
- Added memory testing for sgemm kernels
AMD-Internal: [CPUPL-4681]
Change-Id: I72f94960e7c497ae75da872412eee69c23637348
1. The 5-loop LPGEMM path is inefficient when A or B is a vector
(i.e., m == 1 or n == 1).
2. An efficient implementation of lpgemv_rowvar_f32 is developed,
taking into account the B matrix reorder in the m=1 case and post-ops fusion.
3. When m = 1, the algorithm divides the GEMM workload in the n dimension
at a granularity of NR. Each thread works on A:1xk and
B:kx(>=NR) and produces C=1x(>=NR). K is unrolled by 4, along with
a remainder loop.
4. When n = 1, the algorithm divides the GEMM workload in the m dimension
at a granularity of MR. Each thread works on A:(>=MR)xk and
B:kx1 and produces C = (>=MR)x1. When n=1, reordering of B is avoided
so that it can be processed efficiently in the n = 1 kernel.
5. Fixed a few warnings raised when loading 2 f32 bias elements with
_mm_load_sd through a float pointer, by typecasting it to (const double *).
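One possible way to split the n dimension at NR granularity for the m = 1 case is sketched below. This is an assumed even split of NR-wide panels across threads, not the partitioning logic used in lpgemv_rowvar_f32; Range and thread_range are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct Range { std::size_t begin; std::size_t end; };   // half-open [begin, end)

// Assign whole NR-wide column panels to threads; the first few threads
// take one extra panel when the panel count is not evenly divisible.
inline Range thread_range(std::size_t n, std::size_t nr,
                          std::size_t nthreads, std::size_t tid)
{
    assert(nr > 0 && nthreads > 0 && tid < nthreads);
    const std::size_t npanels = (n + nr - 1) / nr;      // last panel may be partial
    const std::size_t base    = npanels / nthreads;
    const std::size_t extra   = npanels % nthreads;
    const std::size_t first   = tid * base + std::min(tid, extra);
    const std::size_t count   = base + (tid < extra ? 1 : 0);
    return { std::min(first * nr, n), std::min((first + count) * nr, n) };
}
```

The same helper applies to the n = 1 case by passing m and MR instead of n and NR.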
AMD-Internal: [SWLCSG-2391, SWLCSG-2353]
Change-Id: If1d0b8d59e0278f5f16b499de1d629e63da5b599
- Added unit-test cases for the following AVX2 kernels:
- bli_snorm2fv_unb_var1_avx2( ... )
- bli_scnorm2fv_unb_var1_avx2( ... )
- bli_dnorm2fv_unb_var1_avx2( ... )
- bli_dznorm2fv_unb_var1_avx2( ... )
- Defined a templatized testing interface and function-pointer
type. This is used as part of the test-fixture class and
testsuite definitions, when writing the unit tests.
- The test cases cover the necessary range of values for the sizes
to ensure code-coverage in the kernels.
- Further added memory tests for these kernels, to check for
out-of-bounds reads/writes.
AMD-Internal: [CPUPL-4637]
Change-Id: I747ab104b947e87b5f8eda597256b7b8b6f7c2f2
- Added API tests for [C\Z]TRSM.
- Added Extreme Value Test cases (EVT) for [C\Z]TRSM.
- Tests for various combinations of INFs
and NANs in A and B matrix are added.
- Added Invalid input test cases (IIT).
- Added micro kernel testing for ZTRSM
- Added unit tests for small and native
path kernels.
- Added memory testing for ZTRSM
kernels.
AMD-Internal: [CPUPL-4641]
Change-Id: I0db6b2c75b59821e1cde33532fb13400fab43412
- Added API tests for STRSM.
- Added Extreme Value Test cases (EVT) for STRSM.
- Tests for various combinations of (+/-) INFs
and NANs in A and B matrix are added.
- Added micro kernel testing
- Added unit tests for small and native
path kernels.
- Added memory testing for STRSM
kernels.
- Edited the protected buffer in memory testing to
make sure that greenzone1 and greenzone2 do not
intersect.
AMD-Internal: [CPUPL-4640]
Change-Id: Ic48590d3b4ad12c4f2f6beaec2e1106a7aaa5213
While building the blis library with the ninja generator on Windows, ninja
was observed to randomly add "|| '(set', 'FAIL_LINE=3&', 'goto', ':ABORT)'"
as extra arguments to add_custom_command. Because of this, the flatten-headers
python script failed to create the blis.h and cblas.h headers.
Modified the python script to fix this issue.
AMD-Internal: [CPUPL-2748]
Change-Id: I83b753d08e46f94b282176fcc661ce34e5eee3cf
- Updated test_scalv and ref_scalv templates for SCALV gtestsuite to
support unit-tests for mixed precision SCALV.
- Added unit-tests for the following kernels:
ZSCALV
- bli_zscalv_zen_int( ... )
ZDSCALV
- bli_zdscalv_zen_int10( ... )
- bli_zdscalv_zen_int_avx512( ... )
- Also, added API level unit-tests for the following cases:
- Unit Positive Increments
- Non-Unit Positive Increments
- Updated comments in DSCALV unit-tests with the correct kernel name.
AMD-Internal: [CPUPL-4624]
Change-Id: I96db8d3612687be07cd0e638a3119d41c3641ce8
- Added test cases to verify the compliance of SAXPY and ZAXPY
APIs, through Exception Value Testing(EVT). This is done by
inducing exception values in the input operands. The induction
is controlled by the user, through indices given as part of the
parameterized test-cases.
- Various combinations of zeros, NaNs and +/-Infs have been used to
verify the compliance against the standard. These combinations
help in determining whether the exception value has to be
propagated, or handled separately.
- Updated the comments, class names and test-case loggers for
uniformity.
- Added special cases of alpha and beta values to API level
functionality tests, to check for any possible framework
level optimizations against the standard.
AMD-Internal: [CPUPL-4655]
Change-Id: I3d817d44c6d239cbc61d146583707b3c8338de29
Modify thresholds to reflect number of operations that
accumulate results into each output element. Different
limits are set for early return and special cases.
Constants are still subject to experimentation and change.
AMD-Internal: [CPUPL-4378]
Change-Id: Ic4540a2f1f6cd6380228b6a2884ac62850d6d8c6
- Tests out-of-bounds reads and writes of the input and output matrices
for the SUP and native micro-kernels.
- The protected buffers and memory testing feature available in gtestsuite
are used to detect memory errors.
AMD-Internal: [CPUPL-4623]
Change-Id: I620fd3cd4eed1002e08b6233effb89b47beb073f
- Added unit-test cases for bli_zaxpyv_zen_int5( ... ),
bli_saxpyv_zen_int10( ... ) and bli_saxpyv_zen_int_avx512( ... )
kernels.
- The test cases cover the necessary range of values for the sizes
and the scaling factor(alpha), to ensure code-coverage and check
for compliance with the standard.
- Further added memory tests for these kernels, to check for
out-of-bounds reads/writes.
AMD-Internal: [CPUPL-4629]
Change-Id: If5e626ca2d0270e34dc2d951ae5c81f839a78ef0
For gcc versions greater than or equal to 7.0, added the AVX512 compiler flags
in make_defs.mk and make_defs.cmake. The AVX512VNNI compiler flag is only
supported from gcc version 8 onwards, so another else condition was added
for gcc versions greater than or equal to 7 that enables the avx512 flags.
This enables compilation of the AVX512 assembly code paths with gcc version 7.5.
Change-Id: I2cda00e578010db5e5a515b506c0b99f685307e0
- These tests explicitly include NaNs and (+/-)Infs in the input vector
to verify that NaNs and Infs are handled or propagated in compliance
with the standard.
AMD-Internal: [CPUPL-4406]
Change-Id: I3063805eb3fdfd58be3168b24cdb97de2c175c3c
- Utilized the memory testing feature in GTestsuite
to update the testing interfaces for micro-kernel
testing of DAXPY, DAXPBY and DCOPY APIs.
- The interface allocates memory using objects of
ProtectedBuffer class, which define the redzones
and greenzones as per the requirement.
- Updated the test fixture classes, test-case loggers and
the instantiators to use the new testing interface for
memory testing.
- Added special cases of alpha and beta values to API
level functionality tests, to check for any possible
framework level optimizations against the standard.
- Code cleanup of ?_generic.cpp and ?_evt_testing.cpp
files of DAXPY, DAXPBY and DCOPY APIs.
AMD-Internal: [CPUPL-4402]
Change-Id: Id945cabbbb42604d76a9e34269bff0f9f6712604
- Warning is raised for the implicit declaration of bli_gemm_md_is_ccr()
when BLIS is configured with --disable-mixed-dt flag.
- Encapsulated the usage of bli_gemm_md_is_ccr( ... ) inside the
BLIS_ENABLE_GEMM_MD macro.
AMD-Internal: [CPUPL-4630]
Change-Id: Icc59b1bcd3a21492daaaf6bcec80a5bf67012ace
- This post-operation computes C = (beta*C + alpha*A*B) + D, where D is
a matrix with dimensions and data type the same as that of C matrix.
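The formula above can be illustrated with a naive reference. This sketch assumes row-major float matrices with A of size m x k, B of size k x n, and C and D of size m x n; ref_gemm_matadd is a hypothetical name and not part of the aocl_gemm API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference for C := (beta*C + alpha*A*B) + D, row-major storage.
inline void ref_gemm_matadd(std::size_t m, std::size_t n, std::size_t k,
                            float alpha, const std::vector<float>& a,
                            const std::vector<float>& b, float beta,
                            std::vector<float>& c, const std::vector<float>& d)
{
    assert(a.size() == m * k && b.size() == k * n);
    assert(c.size() == m * n && d.size() == m * n);   // D matches C in shape
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = (beta * c[i * n + j] + alpha * acc) + d[i * n + j];
        }
    }
}
```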
AMD-Internal: [SWLCSG-2424]
Change-Id: I9464d1f514e3b04275fe93441489b4503a08937a
- Added API level test-cases, to verify the functionality
of ?SUBV APIs. These tests cover unit increments and
non-unit positive increments for input params x or conj(x),
vector length n, stride size of x, stride size of y
- ERS tests have been added for the ?SUBV APIs as per the BLIS
compliance standards.
- Following are the standard tests added:
?SUBV
- n <= 0
- Invalid Input Tests are not required for these APIs.
Change-Id: Ia300bce41d15105ad48143aa7e0943fb676d73b2
- Added Invalid input test cases (IIT).
- Added tests to check for cases where inputs
are not blas compliant.
AMD-Internal: [CPUPL-4404]
Change-Id: Ibbd7494b2fc6a9bebe93cd9d66be57b9b43f25f2
1. NAN and +/-INF are considered to be exception values.
2. Inserting NAN and +/- INF at random indices of Matrix A, B & C.
3. NAN and +/-INF are also passed as alpha, beta values
4. Even with these values present in the matrices,
the output should be compliant with the reference/standard solution
AMD-Internal: [CPUPL-4426]
Change-Id: Ibf0ad03ea1a3a2b63f2702a4dd6bbc8f9f116ddd
- Added framework for memory testing.
- Out of bound reads and writes can be
detected in both C and assembly.
- Added memory tests for DTRSM.
- Test methodology:
- Use Linux's protected pages to mark some memory
before and after the required buffer as protected.
- Set the first and last page_size bytes as
read, write and execute protected (red_zones).
- If any part of the code tries to read or write
in the redzones, a SIGSEGV signal is
generated, which can be used to detect an
out-of-bounds read or write.
- Page protection can only be set per page.
If the required buffer size is not a multiple
of pagesize, more memory than required has to be
allocated to make sure that the start and
end of the redzones align with page boundaries.
- Override malloc(size) to allocate
'buffer_size+(2*pagesize)', where buffer_size is the
minimum size such that buffer_size > 'size' and
buffer_size is a multiple of pagesize.
- Use the first and last page_size bytes of the allocated
buffer as redzones, the first 'size' bytes of the middle
buffer as the first greenzone and the last 'size' bytes as
the second greenzone.
- Call the test code once with the first greenzone and then
with the second greenzone. The greenzones are surrounded
by redzones, so if the test code reads or writes before or after
the greenzones, it will be detected.
|________________________________________________________|
| red_zone1 | green_zone1        green_zone2 | red_zone2 |
|________________________________________________________|
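The layout described above can be sketched on Linux with mmap and mprotect. This is an illustration only: it uses mmap directly instead of overriding malloc, and ProtectedRegion, buffer_size_for, protected_alloc and protected_free are hypothetical names rather than the gtestsuite implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

struct ProtectedRegion {
    void*          base;     // start of the whole mapping (red_zone1)
    std::size_t    total;    // buffer_size + 2 * pagesize
    unsigned char* green1;   // first 'size' bytes after red_zone1
    unsigned char* green2;   // last 'size' bytes before red_zone2
};

// Smallest multiple of page that is strictly greater than size.
inline std::size_t buffer_size_for(std::size_t size, std::size_t page)
{
    assert(page > 0);
    return (size / page + 1) * page;
}

inline ProtectedRegion protected_alloc(std::size_t size)
{
    const std::size_t page        = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t buffer_size = buffer_size_for(size, page);
    const std::size_t total       = buffer_size + 2 * page;
    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return ProtectedRegion{nullptr, 0, nullptr, nullptr};
    unsigned char* p = static_cast<unsigned char*>(base);
    // Any access to the first or last page now raises SIGSEGV.
    if (mprotect(p, page, PROT_NONE) != 0 ||
        mprotect(p + page + buffer_size, page, PROT_NONE) != 0) {
        munmap(base, total);
        return ProtectedRegion{nullptr, 0, nullptr, nullptr};
    }
    return ProtectedRegion{base, total, p + page,
                           p + page + buffer_size - size};
}

inline void protected_free(ProtectedRegion& r)
{
    if (r.base != nullptr)
        munmap(r.base, r.total);
    r = ProtectedRegion{nullptr, 0, nullptr, nullptr};
}
```

Running the code under test once on green1 and once on green2 places the buffer directly against each redzone in turn, so an overrun on either side faults.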
AMD-Internal: [CPUPL-4403]
Change-Id: Ic5c22a9adf8f833c77510686eee886485e894354
Non-zen configurations will use frame/compat/bla_gemm.c rather than
frame/compat/bla_gemm_amd.c. In the former, change dzgemm definition
to have dzgemm_blis_impl and optional dzgemm_ wrapper, as in the
AMD version.
AMD-Internal: [CPUPL-4082]
Change-Id: I66caff56e033bda8bb4ff2d60a16f7e52af122ea
Functionality testing of the APIs below is carried out with various input ranges and values.
The interface invokes the listed APIs in the sequence below, when the corresponding condition is satisfied:
List of APIs - Condition
SCALM : alpha = 0
GEMV : m = 1 or n = 1
Small ST : ((m0*k0) <= 16384) || ((n0*k0) <= 16384)
SUP AVX2 : (m || n || k) <= 128
SUP AVX512 : (m || k) <= 128 || n <= 110
Native : Default path, if the above APIs do not support
the given input values
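The sequence above could be modelled as below when choosing test sizes for each path. This sketch reads "(m || n || k) <= 128" as "any of m, n, k is <= 128", which is an assumption about the shorthand; GemmPath and select_path are hypothetical names and this is not the BLIS dispatch code.

```cpp
#include <cassert>
#include <cstddef>

enum class GemmPath { Scalm, Gemv, SmallST, Sup, Native };

// Evaluate the conditions in the listed order; the first match wins.
// avx512 selects which of the two SUP thresholds applies.
inline GemmPath select_path(double alpha, std::size_t m, std::size_t n,
                            std::size_t k, bool avx512)
{
    assert(m > 0 && n > 0 && k > 0);
    if (alpha == 0.0)
        return GemmPath::Scalm;
    if (m == 1 || n == 1)
        return GemmPath::Gemv;
    if (m * k <= 16384 || n * k <= 16384)
        return GemmPath::SmallST;
    const bool sup = avx512 ? (m <= 128 || k <= 128 || n <= 110)
                            : (m <= 128 || n <= 128 || k <= 128);
    return sup ? GemmPath::Sup : GemmPath::Native;
}
```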
AMD-Internal: [CPUPL-4426]
Change-Id: I40cd30a11592e4e553e09f0d81153abf0bf0b002
- This post-operation computes C = (beta*C + alpha*A*B) + D, where D is
a matrix with dimensions and data type the same as that of C matrix.
- For clang compilers (including aocc), -march=znver1 was not enabled for
zen kernels. CKVECFLAGS has been updated to capture this.
AMD-Internal: [SWLCSG-2424]
Change-Id: Ie369f7ea5c80ab69eea3f3e03a8d9546e14f5c09
Add cmake option to override thresholds and set them all to zero.
In this case we don't switch to binary comparison as we want the
error to be calculated and printed. This functionality is intended for:
- Helping to determine or alter thresholds.
- To compare different max errors between different reference libraries.
- To test when we expect identical results, e.g. some comparisons of
BLIS vs BLIS.
To simplify coding, this is implemented by setting epsilon to zero
in the testinghelpers function.
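A minimal sketch of this behaviour, with the cmake option modelled as a plain bool for illustration; get_epsilon and within_threshold are hypothetical names and not the testinghelpers signatures.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// With zero_thresholds set, epsilon is 0, so every threshold becomes 0.
template <typename T>
T get_epsilon(bool zero_thresholds)
{
    return zero_thresholds ? T(0) : std::numeric_limits<T>::epsilon();
}

// The error is always computed and returned through 'error', so it can be
// printed even when the threshold is zero.
template <typename T>
bool within_threshold(T computed, T reference, T multiplier,
                      bool zero_thresholds, T& error)
{
    assert(multiplier >= T(0));
    error = std::abs(computed - reference);
    return error <= multiplier * get_epsilon<T>(zero_thresholds);
}
```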
AMD-Internal: [CPUPL-4400]
Change-Id: I2cf021e0cc24c62e7600ba80fd810f3aa55a6ea5
Add cmake option to convert all character arguments to upper
case to check compliance.
AMD-Internal: [CPUPL-4499]
Change-Id: Ic18416d78f63b999a78253463cc15c32f7d444f4
CMakeLists.txt is updated to generate a code coverage
report in HTML format just by configuring cmake with
-DENABLE_COVERAGE=ON. Code coverage is supported only on Linux
with the gcc compiler:
cmake .. -DENABLE_COVERAGE=ON
AMD-Internal: [CPUPL-2748]
Change-Id: I9b36b6cc3f1f97b53e1c4ee62948a017418e3d41
- Added test cases to verify the compliance of DAXPY and DAXPBY
APIs, through Exception Value Testing(EVT). This is done by
inducing exception values in the input operands. The induction
is controlled by the user, through indices given as part of the
parameterized test-cases.
- Various combinations of zeros, NaNs and +/-Infs have been used to
verify the compliance against the standard. These combinations
help in determining whether the exception value has to be
propagated, or handled separately.
- Updated the daxpyvGenericTestPrint logger for uniformity across
the testing categories.
- Added test cases for bli_daxpyv_zen_int10( ... ) micro kernel
testing to cover the loops iterating in blocks of 52 and 16
respectively.
AMD-Internal: [CPUPL-4402]
Change-Id: Ida6cf5e08727b4c3cb87c93bfec6be76361cfaea
CMakeLists.txt is added in bench.
Steps are provided to build for different targets.
AMD-Internal: [CPUPL-2748]
Change-Id: I58027f4e42d1323cafb151224c45868bc8337ff4
- Added API level tests for avx512 and avx2 k1 kernels,
tiny, small, sup and native DGEMM kernels for various
values of storage, M, N, K, alpha, beta
AMD-Internal: [CPUPL-4404]
Change-Id: Ieadf407601a8efc5a2c0956d08d791dcfa69e44b
- ERS tests have been added for the above APIs as per the BLAS
compliance standards.
- Following are the standard tests added:
?SCALV
- n <= 0
- incx <= 0
- alpha == 1
?DOTV
- n <= 0
?ASUMV
- n <= 0
- incx <= 0
- Invalid Input Tests are not required for these APIs.
- Updated the micro-kernel test files to include the new macros
generated for enabling and disabling architecture specific tests.
- Updated the function calls for mixed-precision typed_asumv tests.
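The ERS conditions listed above for ?SCALV can be illustrated with a small stand-alone check. ref_scalv here is a simplified real-valued stand-in written for this sketch, and scalv_leaves_x_unchanged is a hypothetical helper; neither is the gtestsuite code.

```cpp
#include <cassert>
#include <vector>

// Simplified scalv for real doubles: x := alpha * x, returning early for
// the ERS conditions n <= 0, incx <= 0 and alpha == 1.
inline void ref_scalv(long n, double alpha, double* x, long incx)
{
    if (n <= 0 || incx <= 0 || alpha == 1.0)
        return;
    for (long i = 0; i < n; ++i)
        x[i * incx] *= alpha;
}

// Returns true if a call with the given arguments leaves x untouched.
inline bool scalv_leaves_x_unchanged(long n, double alpha, long incx)
{
    std::vector<double> x = {1.0, 2.0, 3.0, 4.0};
    const std::vector<double> before = x;
    // Valid (non-ERS) calls must stay inside the 4-element buffer.
    assert(n <= 0 || incx <= 0 ||
           (n - 1) * incx < static_cast<long>(x.size()));
    ref_scalv(n, alpha, x.data(), incx);
    return x == before;
}
```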
AMD-Internal: [CPUPL-4406]
Change-Id: Ib34b2f39809d93075ae1168682b3ef2380e03a5a