Details:
- Added SIMD code
- Processing 5 rows at a time in SIMD loop to improve performance
AMD-Internal: [CPUPL-1054]
Change-Id: I2ac93f25895dccfc42e14be0689e6d4e655d6a0a
Note that there is a known issue with the Intel compiler 19+, as explained
in https://github.com/flame/blis/issues/371.
The AMD version needs this support because some user applications
require ICC support.
AMD-Internal: [CPUPL-1223]
Change-Id: I86ddee068ae18bd940a5952d60960228d8100e97
When the library is built single-threaded with trace enabled, the test
applications in the test folder fail to compile. In the file aoclos.c, the
function AOCL_gettid() uses omp_get_thread_num() to get the thread ID, which is
only available when the OpenMP-based parallel BLIS library is generated. To fix
this, the single-thread case now returns zero for the thread ID; the OpenMP
function is used only when the BLIS_ENABLE_OPENMP macro is defined. However,
this is not a complete fix: if the library is built with pthreads,
AOCL_gettid() always returns 0, which is not the intended behaviour.
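The guard described above could look roughly like the sketch below. The name
sketch_gettid() is a hypothetical stand-in for AOCL_gettid(), not the actual
aoclos.c code; only BLIS_ENABLE_OPENMP and omp_get_thread_num() come from this
message.

```c
#ifdef BLIS_ENABLE_OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for AOCL_gettid(); not the actual aoclos.c code. */
static int sketch_gettid(void)
{
#ifdef BLIS_ENABLE_OPENMP
    /* OpenMP build: thread number within the current team. */
    return omp_get_thread_num();
#else
    /* Single-thread (and, for now, pthread) builds: always 0. */
    return 0;
#endif
}
```

With this shape, a pthread build still takes the #else branch, which is the
known gap mentioned above.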
Change-Id: I5b79ed57d27d0022d3dcab0e2a3a557c8e4ff8ee
Details:
- The amin API returns the index of the minimum absolute value in a vector.
- Added amin reference BLIS kernel.
- Added BLAS and CBLAS interfaces for amin.
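As an illustration of the amin semantics, here is a minimal reference loop for
doubles. The name ref_damin, the 0-based return value, the -1 result for an
empty vector and the first-occurrence tie rule are all assumptions made for
this sketch; they are not taken from the kernel added here.

```c
#include <stddef.h>

/* Absolute value without pulling in libm. */
static double abs_val(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical reference routine: 0-based index of the element with the
 * smallest absolute value. Returns -1 for an empty vector or zero stride.
 * Ties resolve to the first occurrence (an assumption). */
static long ref_damin(size_t n, const double *x, size_t incx)
{
    if (n == 0 || incx == 0)
        return -1;

    size_t best = 0;
    double best_abs = abs_val(x[0]);

    for (size_t i = 1; i < n; ++i) {
        double a = abs_val(x[i * incx]);
        if (a < best_abs) {
            best_abs = a;
            best = i;
        }
    }
    return (long)best;
}
```

For x = {3.0, -0.5, 2.0, 0.5} with unit stride, the sketch picks index 1,
since |-0.5| is the first smallest magnitude.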
AMD-Internal: [CPUPL-1155]
Change-Id: I89c1e37e86950a4582bba70a5d8fc70ac915bd3c
In the DTL_Trace() function, we call bli_init_auto() in order to enable trace
for BLAS/CBLAS APIs.
Added logging of input parameters for sgemv_ and dgemv_. Added function
tracing for the amax_, axpy_, gemv_ and dotv_ BLAS functions.
Change-Id: I4483c1e918c3c78946f7377b4b69eba6af4e925e
Details:
- Added framework optimizations for the BLAS and CBLAS interfaces caxpyv_ (cblas_caxpyv) and zaxpyv_ (cblas_zaxpyv).
- Added new axpyv AVX2 kernels for c and z data types for AMD EPYC family.
AMD-Internal: [CPUPL-1231]
Change-Id: I9bc0c21fef9da84533adcef76427977430b27ea7
Replaced the gettid() syscall with omp_get_thread_num() when
creating files for logging data. This ensures a one-to-one
mapping between the threads created in BLIS and the
IDs used to name trace and log files.
AMD-Internal: [CPUPL-1236]
Change-Id: I45b1721a7a9c855eeec43e7cbb5089f2a955ff72
Details:
- The kernel is called directly from the API call to avoid framework overhead for the complex float and complex double precisions.
- Added SIMD code for complex float and complex double, and unrolled the for loop 5 times to improve performance.
AMD-Internal: [CPUPL-1057]
Change-Id: I3b9d202398cacc0168882c9d6da2b450c27466a0
Details:
- Introduced a new macro 'BLIS_CONFIG_EPYC' to enable BLAS and CBLAS
framework optimizations for Zen family configurations.
- The macro needs to be defined in the family.h file of each respective arch
config.
- Moved zen2-specific optimized kernels to the zen folder so that they are
accessible to all Zen family architectures.
Change-Id: I8da2db6b7ab22ef350a01d86c214006e812eb06d
Fixed AOCL DTL logs printing incorrect alpha and beta values for single
precision. Added missing info such as data types, lower or upper triangular,
and the side parameter in the case of TRSM.
Cleaned up and formatted the files test_cabs1.c, test_axpbyv.c and
test_gemm.c. When dumping trsm parameters, replaced 'side' with
bli_is_right(side).
Change-Id: Ic81503ae696956eb074ec208f7109d1a394183d7
Details:
- Added debug trace support for DGEMMT and DTRSM APIs.
- Added log support for gemmt, trsm APIs.
- Modified gemm dump_sizes function to dump transpose parameters.
AMD-Internal: [CPUPL-1210]
Change-Id: Ice1effe27ec349203ce5def030a6b85b204bd91e
Details:
- The gemm_batch API computes a series of GEMMs for groups of general
matrices.
- Each group contains matrices with the same parameters.
- This API is part of the BLAS extension APIs.
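The grouping idea can be sketched as follows. This is a hedged illustration
only: group_sketch_t, ref_dgemm and ref_dgemm_batch are hypothetical names,
and the sketch assumes row-major, non-transposed, densely packed matrices. It
does not reflect the actual gemm_batch signature.

```c
#include <stddef.h>

/* Hypothetical per-group parameters; every GEMM in a group shares them. */
typedef struct {
    size_t m, n, k;     /* C is m x n, A is m x k, B is k x n */
    double alpha, beta;
    size_t count;       /* number of GEMMs in this group */
} group_sketch_t;

/* Plain row-major GEMM without transposition: C := alpha*A*B + beta*C. */
static void ref_dgemm(size_t m, size_t n, size_t k, double alpha,
                      const double *a, const double *b,
                      double beta, double *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}

/* Walk the groups; a[], b[] and c[] hold one pointer per GEMM,
 * laid out group after group. */
static void ref_dgemm_batch(size_t group_count, const group_sketch_t *groups,
                            const double *const *a, const double *const *b,
                            double *const *c)
{
    size_t idx = 0;
    for (size_t g = 0; g < group_count; ++g)
        for (size_t i = 0; i < groups[g].count; ++i, ++idx)
            ref_dgemm(groups[g].m, groups[g].n, groups[g].k,
                      groups[g].alpha, a[idx], b[idx],
                      groups[g].beta, c[idx]);
}
```

Keeping the shared parameters per group rather than per matrix means the
caller passes one small descriptor per group plus flat pointer arrays.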
AMD-Internal: [CPUPL-1184]
Change-Id: Ic23772830eb1d157da4db45158a039b0826419fd
Details:
- Added CBLAS extension cblas_?cabs1.
- Functionality: res = |Re(z)| + |Im(z)|, where z is a complex number and res is the sum of the absolute values of its real and imaginary parts.
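A minimal sketch of the formula above, for illustration. The type zsketch_t
and the name ref_dcabs1 are hypothetical stand-ins and are not the interface
added by this change.

```c
/* Hypothetical complex type used only for this illustration. */
typedef struct {
    double real;
    double imag;
} zsketch_t;

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* res = |Re(z)| + |Im(z)| */
static double ref_dcabs1(const zsketch_t *z)
{
    return abs_d(z->real) + abs_d(z->imag);
}
```

For z = 3 - 4i this gives 7, whereas the Euclidean modulus of the same value
is 5, so the two should not be confused.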
AMD-Internal: [CPUPL-1129]
Change-Id: I4a3c265c89527c8fd3060c5d2ed38b1953ce6343
Details:
- Corrected "#if" directive in line 89
- Commented out "#define print" to disable printing the vectors
Change-Id: I9ec3cbfb716540dd3e2264f5c3925d9e0c0c294a
Modified the Makefile in the test folder to enable calling BLAS interfaces for
BLIS as well. This is done by replacing -DBLIS with -DBLAS=\"aocl\" in the
makefile. Also added linking to the multi-threaded MKL library.
Change-Id: Iccf2ec99b48bb35da985b69218bc680f678ff7c9
Details:
- The axpby routines perform a vector-vector operation defined as
y = a*x + b*y where a, b are scalars and x, y are vectors.
- This API is part of the BLAS-like extension APIs.
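The operation defined above can be written as a short reference loop. The name
ref_daxpby and the unit-stride, double-precision signature are assumptions
made for this sketch.

```c
#include <stddef.h>

/* Hypothetical reference loop for y := a*x + b*y with unit stride. */
static void ref_daxpby(size_t n, double a, const double *x,
                       double b, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + b * y[i];
}
```

With a = 2, b = 0.5, x = {1, 2, 3} and y = {10, 20, 30}, y becomes
{7, 14, 21}.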
Change-Id: I17a53b03bba97de7ae1995a9f086084bd241bcdc
AMD-Internal: [CPUPL-1118]
Details:
- For GEMV, whenever beta = 0, we should not scale vector 'y' by beta;
instead, overwrite the 'y' vector with zeroes before carrying out the
operation.
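One practical consequence of this rule is that NaN or Inf values already
present in 'y' do not leak into the result, because 0 * NaN is NaN. The sketch
below shows the idea; ref_dgemv and its row-major, unit-stride layout are
hypothetical and not the actual kernel.

```c
#include <stddef.h>

/* Hypothetical reference: y := alpha*A*x + beta*y, with A stored as a dense
 * m x n row-major matrix and unit-stride vectors. */
static void ref_dgemv(size_t m, size_t n, double alpha, const double *a,
                      const double *x, double beta, double *y)
{
    for (size_t i = 0; i < m; ++i) {
        /* beta == 0: start from zero instead of computing beta * y[i], so
         * stale NaN/Inf values in y are discarded rather than propagated. */
        double acc = (beta == 0.0) ? 0.0 : beta * y[i];

        for (size_t j = 0; j < n; ++j)
            acc += alpha * a[i * n + j] * x[j];

        y[i] = acc;
    }
}
```

Calling this with beta = 0 on an uninitialized or NaN-filled 'y' yields only
alpha*A*x, which is the behaviour described above.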
Change-Id: I159afba6c6ac3b72b74718fab7a4f4ec293012c5
Some of the SUP kernels now use the rbp register.
This register was also used by the compiler to support
automatic function call tracing, which created
a conflict. The automatic call tracing feature is
removed for now. If needed, it can be enabled for
non-kernel code.
Change-Id: Ib7ad00875f501ee2ad552cbb2ecdc245002d63b7
AMD-Internal: [CPUPL-1135]
Corrections in bli_gemm_front.c, taken from both the public repo and the 2.2.1 branch.
AMD-Internal: [CPUPL-1067]
Change-Id: I4887ece6aa20bdfb87d97e7acebbe04cb9feea02
- Bug fix in the sgemmsup 1x16 kernel for beta zero with column-stored C:
the rcx register increment was missing, so 4 values
in the output were overwritten.
Change-Id: Ia3028040dce3e615f1db5a331498d86faadcf916