In Gtestsuite CMakeLists.txt, find_library() will search
user-mentioned library in default system paths first then
in user specified paths. To avoid this CMake is updated
to search the user mentioned library in user specified
path and ignore searching in default path.
AMD-Internal: [CPUPL-4284]
Change-Id: Ia99cf59eb39deac4110d3d733f17548d432dde64
Standardize format of AMD copyright notice.
AMD-Internal: [CPUPL-3519]
Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0
(cherry picked from commit ed5010d65b)
Updated sgemm testcase to handle multiple values of alpha, beta for different input size
Added sgemm testcase to cover m,n,k dimension till 20 size atleast instepsize of 1
Change-Id: Id10ba3d7a05154b171511ef11ea76297494672cd
Some text files were missing a newline at the end of the file.
One has been added.
AMD-Internal: [CPUPL-3519]
Change-Id: I4b00876b1230b036723d6b56755c6ca844a7ffce
(cherry picked from commit f471615c66)
Segfault was reported through nightly jenkins job.
Issue was observed when running in MT mode.
Issue was due to extra broadcast being used.
Extra broadcast would access out of bound memory on input buffer
Cleaned up cobbler list by removing unused registers.
AMD_Internal: [CPUPL-4180]
Change-Id: I1c8715b2850ef855328f2ef12f215987299bdb2b
- Pack and compute are now compared against GEMM operation of reference
library when MKL is not used as a reference.
- For the case where both A and B are unpacked, the reference GEMM is
invoked with a unit-alpha scalar.
- If MKL is used as reference, then these APIs are compared against pack
and compute operations of MKL.
- Updated description in ref_gemm_compute.cpp to reflect this behavior.
AMD-Internal: [CPUPL-4084]
Change-Id: Id0521c9cad8743a7ae471a7f3c547ceb67191f86
- BLAS compute checks updated to properly check for rs_c and cs_c.
- Updated BLAS compute checks to skip validity check if m==1 or n==1.
For the same reason, added a check just before to validate rs_c and
cs_c are greater than or equal to 1.
- Added tiny size tests to gtestsuite as a sanity check.
- Also updated the Invalid Input Tests to test for the updated checks.
AMD-Internal: [CPUPL-4140]
Change-Id: I984339ec7909778b58409ffcdbeed4ee33f28cfb
- Enabled the vectorized AVX-2 code-path for SNRM2_. The
framework queries the architecture ID and calls the
vectorized kernel based on the architecture support.
- In case of not having the architecture support, we use
the default path based on the sumsqv method.
AMD-Internal: [CPUPL-3277]
Change-Id: Ic60c0782dec0b7eb09fac21818eb625e57b1d14f
- Added framework for unit testing of BLAS and CBLAS interfaces for the
Pack and Compute Extension APIs.
- These test the integrated functionality of the trio of
?gemm_pack_get_size(), ?gemm_pack() and ?gemm_compute() APIs.
- Note: Only MKL can be used as reference for now.
AMD-Internal: [CPUPL-3560]
Change-Id: I801654447a716da06c9ccf9db01d553817871571
- Updated the bli_dnormfv_unb_var1( ... ) and
bli_znormfv_unb_var1( ... ) function to support
multithreaded calls to the respective computational
kernels, if and when the OpenMP support is enabled.
- Added the logic to distribute the job among the threads such
that only one thread has to deal with fringe case(if required).
The remaining threads will execute only the AVX-2 code section
of the computational kernel.
- Added reduction logic post parallel region, to handle overflow
and/or underflow conditions as per the mandate. The reduction
for both the APIs involve calling the vectorized kernel of
dnormfv operation.
- Added changes to the kernel to have the scaling factors and
thresholds prebroadcasted onto the registers, instead of
broadcasting every time on a need basis.
- Non-unit stride cases are packed to be redirected to the
vectorized implementation. In case the packing fails, the
input is handled by the fringe case loop in the kernel.
- Added the SSE implementation in bli_dnorm2fv_unb_var1_avx2( ... )
and bli_dznorm2fv_unb_var1_avx2( ... ) kernels, to handle fringe
cases of size = 2 ( and ) size = 1 or non-unit strides respectively.
AMD-Internal: [CPUPL-3916][CPUPL-3633]
Change-Id: Ib9131568d4c048b7e5f2b82526145622a5e8f93d
- This commit focused on enhancing the performance of dgemm
for matrices for very small dimenstions.
- blis_dgemm_tiny function re-uses dgemm sup kernels, bypassing
the conventional SUP framework code path. As SUP framework code path
requires the creation and initilization of blis objects,
accessing all the needed meta-information from objects, querying contexts
which adds performance penaulty while computing for matrices with very
small dimensions.
- To avoid such performance penaulty blis_dgemm_tiny function implements
a lightweight support code so that it can re-use dgemm SUP kernels such a way
that it directly operates on input buffers. It avoids framework overhead of
creating and intializing blis objects, context intialization, accessing other
large framework data structures.
- blis_dgemm_tiny function checks for threshold condition to match before
picking the kernel. For zen, zen2, zen3 architecture tiny kernel is invoked
for any shape as long as m < 8 and k <= 1500 or m < 1000 and n <= 24 and k <=1500.
While for zen4 as long as dimensions are less than 1500 for m,n,k tiny kernel is
invoked.
-blis_dgemm_tiny function supports single threaded computation as of now.
AMD-Internal: [CPUPL-3574]
Change-Id: Ife66d35b51add4fccbeebd29911e0c957e59a05f
- Updated the bli_zaxpbyv_zen_int( ... ) kernel's computational
logic. The kernel performs two different sets of compute based
on the value of alpha, for both unit and non-unit strides. There
are no constraints on beta scaling of the 'y' vector.
- Updated the logic to support 'x' conjugate in the computation.
The kernel supports conjugate/no conjugate operation through the
usage of _mm256_fmsubadd_pd( ... ) and _mm256_addsub_pd( ... )
intrinsics.
- Updated the early return condition in the kernel to adhere to
the standard compliance.
- Updated the scalar computation with vector computation(using 128
bit registers), in case of dealing with a single element(fringe case)
in unit-stride or vectors with non-unit strides. A single dcomplex
element occupies 128 bits in memory, thereby providing scope for
this optimization.
- Added accuracy and extreme value testing with sufficient sizes
and initializations, to test the required main and fringe cases
of the computation.
AMD-Internal: [CPUPL-3623]
Change-Id: I7ae918856e7aba49424162290f3e3d592c244826
- Designed test cases for unit testing of ZGEMM compute
kernel for handling inputs when k == 1. The design
uses value-parameterized testing for checking accuracy,
and verifying the mandate in case of exception values
on the inputs/output.
- The design uses type-parameterized testing for verifying
BLAS standard for invalid input cases, and also for early
return scenarios.
- Added the function template set_ev_mat( ... ) as part of
testinghelpers. This function is used as a helper for
inducing exception values onto indices specified as
arguments to the test_gemm( ... ) interface.
- Abstracted the function definition of getValueString( ... )
from the NRM2 testing interface to testinghelpers(renamed
as get_value_string( ... ) for naming consistency), in order
to use it as a helper function across all APIs in case of
exception value testing.
AMD-Internal: [CPUPL-3823]
Change-Id: I0fea21f9c8759bbbdc88ba0a016202753e28f2a7
- Renaming ELEMENT_TYPE to BLIS_ELEMENT_TYPE, since the first is defined on a Windows header.
- Updating refCBLAS object to have different implementation depending on the platform.
- Removing dlfcn.h from all reference headers since it's linux specific and adding it conditionally on a higher level.
- Changes on all CMakeLists.txt files to enable building on Windows.
AMD-Internal: [CPUPL-2732]
Change-Id: I6e35656a3779b35dc815a2409cf84c22dd27f3e7
- Adding default template parameter for the type of the returned value from nrm2.
- Bugfix on NaN/Inf comparator for scalars.
- Tuning sizes of vector x to exercise the different paths for vectorized and scalar code.
- Adding wrong parameters and extreme value testing.
- Adding tests for overflow and underflow using max and min representable numbers for vectorized and scalar code.
AMD-Internal: [CPUPL-2732]
Change-Id: Ice8ee65095ecaa7b30ebd5f90ed2a890178533db
- Functions to print matrix and vector elements.
- Functions to convert matrix to symmetric, hermitian
triangular matrix and set diagonal elements in matrix.
AMD-Internal: [CPUPL-2732]
Change-Id: I1ffa5289329cbb8a9581bf545bdd157801cf5baa
Since random numbers are specified from ELEMENT_TYPE and we never generate tests for both integer and floating point numbers at the same time, we update code as described below:
- random vector/matrix generators are updated to use ELEMENT_TYPE as a default parameter.
- ::testing::Values(ELEMENT_TYPE) is removed from all test generators.
AMD-Internal: [CPUPL-2732]
Change-Id: Ibc6b05044502f541c9e8a7687931b1ca2903fb0c
Some text files were missing a newline at the end of the file.
One has been added.
Also correct file format of windows/tests/inputs.yaml, which
was missed in commit 0f0277e104
AMD-Internal: [CPUPL-2870]
Change-Id: Icb83a4a27033dc0ff325cb84a1cf399e953ec549
Source and other files in some directories were a mixture of
Unix and DOS file formats. Convert all relevant files to Unix
format for consistency. Some Windows-specific files remain in
DOS format.
AMD-Internal: [CPUPL-2870]
Change-Id: Ic9a0fddb2dba6dc8bcf0ad9b3cc93774a46caeeb
- Adding doc regarding option setting for INT64 in README.
- Bugfix on template instantiation on helper function. Updated to use gtint_t instead of int.
AMD-Internal: [CPUPL-2732]
Change-Id: Ia52407a1ef3fdd06e905c2e3d4aa5befb80e82d6
Updated the CmakeLists.txt to check whether the specified
libraries are present or abort cmake building
AMD-Internal: [CPUPL-2732]
Change-Id: I90115217c228430095aa53a82dc26d16935b320f
- Functions to convert to cblas enums from char.
- Functions to print matrix and vector elements.
- Functions to set matrix and vector elements with
the given value.
AMD-Internal: [CPUPL-2732]
Change-Id: I1046b9578c8456e89eddba4a4e8577016b9361ca
- Fixing thresholds to be more appropriate.
- Updating the way random entries of A and B are generated so that A is diagonally dominant and the algorithm doesn't diverge.
AMD-Internal: [CPUPL-2732]
Change-Id: I6d5691d744ecc623f66c45e94461bd88625d7179
- Tools used for code coverage are : Gcov and Lcov.
- We need to use macros specified by gcov during
compiliation of blis and gtestsuite.
- Locv will generate coverage reports in html format.
AMD-Internal: [CPUPL-2732]
Change-Id: I17b30b4a322b8771f2d6a4ba28986cf0ccf3fba6
- Added a header with correct default values to be used in tests.
- Updated README to include information on how to test for wrong parameters and some explanation on how lda increments work.
AMD-Internal: [CPUPL-2732]
Change-Id: I4f540d46013ffe91b4acb30da2b437251c09d3bc
- Fix in README.md.
- Updating abs overload for scomplex and dcomplex to avoid overflow by using std::abs.
- Updating comparators to take into account NaNs and Infs when measuring error.
AMD-Internal: [CPUPL-2732]
Change-Id: I8c12bacd9d63b2e914d0a79f337f7525dc16b733
cmake generated files and executables are cleaned within
build directory by "make distclean" command.
Change-Id: I4fd5193e92958122ff10ecc634b42096f3b3716e
Key features:
- able to test both static and dynamic libraries
- able to test BLAS, CBLAS and BLIS-typed interface
- can use any CBLAS library for reference results
- can build and/or run tests depending on the BLAS level or a specific API
AMD-Internal: [CPUPL-2732]
Change-Id: Ibe0d7938e06081526bbc54d3182ac7d17affdaf6
- GoogleTest headers removed. GoogleTest gets fetched at
configuration time.
- BLIS headers removed. A BLIS installation path is required at
configuration time.
- Windows has been temporarily disabled.
AMD-Internal: [CPUPL-2732]
Change-Id: I9e55c8e43b2733f96cd8b6e5449d79623decad5c