The znver3 flag is enabled when the compiler is AOCC Clang version 3.0
and the configuration is zen3.
Change-Id: Ie164f4d469bf3f8df31ccf8fed9f80dfc62efb39
AMD-Internal: [CPUPL-1353]
Details:
- When BLIS_CONFIG_EPYC is not defined, zdotc is defined twice.
- One definition is part of the macro-based code.
- The other definition is implemented as part of the framework optimizations.
- Modified bla_dot.c to choose the macro-based code for configs
other than the zen family.
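The guard described above can be sketched as follows. This is a minimal illustration, not the actual bla_dot.c code: GEN_DOTC and demo_zdotc are hypothetical names, and only BLIS_CONFIG_EPYC comes from this change.

```c
#include <complex.h>

/* Hypothetical stand-in for the macro-based code path: expands to a
   generic conjugated dot product, sum of conj(x[i]) * y[i]. */
#define GEN_DOTC(fname) \
    double complex fname(int n, const double complex *x, \
                         const double complex *y) \
    { \
        double complex rho = 0.0; \
        for (int i = 0; i < n; ++i) rho += conj(x[i]) * y[i]; \
        return rho; \
    }

#ifdef BLIS_CONFIG_EPYC
/* Zen family: an optimized definition would live here, so the macro
   must not be expanded for the same symbol (placeholder body). */
double complex demo_zdotc(int n, const double complex *x,
                          const double complex *y)
{
    double complex rho = 0.0;
    for (int i = 0; i < n; ++i) rho += conj(x[i]) * y[i];
    return rho;
}
#else
/* Every other config: the macro expansion is the only definition. */
GEN_DOTC(demo_zdotc)
#endif
```

Whichever branch is taken, the symbol is defined exactly once, which is the property the fix restores.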
AMD-Internal: [CPUPL-1348]
Change-Id: I9ef6a590a6199e173d38248c3fb72feddfb20922
Description:
[AMD Internal]: CPUPL-1336
Removed extra, unnecessary loads in dgemmsup kernels that were
accessing memory beyond the boundaries and causing segmentation
faults.
Kernels:
bli_dgemmsup_rd_haswell_asm_1x4
bli_dgemmsup_rv_haswell_asm_1x6
Change-Id: Idaeed36ebd9f13550943394a37e372b8d015b2d3
Added traces in the cblas layer for these APIs.
These test drivers didn't have calls for complex data
types; the drivers are updated to support them.
AMD-Internal: [CPUPL-1315]
Change-Id: Ia52ecca68ea17314315d626b57c46a2f5973985b
Fixed test driver code for her and her2.
Added support for complex and double complex data types in the test driver.
Change-Id: If65939e99d8cf77e0fb70561166d84bf67d0321d
AMD-Internal: [CPUPL-1326]
Removed the validation of m, n, k, lda, ldb and ldc, since the bench
app is run on logs collected from AOCL traces.
A correct check would have to consider the transpose parameter and storage order.
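For reference, a check that does account for the transpose parameter and storage order could look like the sketch below. It is not part of this change; demo_order and demo_lda_is_valid are hypothetical names, and the rule applied is the usual BLAS/CBLAS constraint on the leading dimension of A in gemm.

```c
#include <stdbool.h>

/* Hypothetical storage-order enum for this sketch only. */
enum demo_order { DEMO_ROW_MAJOR, DEMO_COL_MAJOR };

static int demo_max(int a, int b) { return a > b ? a : b; }

/* op(A) is m x k, so A as stored is m x k when not transposed and
   k x m when transposed. The leading dimension must cover the number
   of rows in column-major storage, or columns in row-major storage. */
bool demo_lda_is_valid(enum demo_order order, bool transa,
                       int m, int k, int lda)
{
    int rows = transa ? k : m;
    int cols = transa ? m : k;
    int min_ld = (order == DEMO_COL_MAJOR) ? rows : cols;
    return lda >= demo_max(1, min_ld);
}
```

The same pattern would apply to ldb (with k and n) and to ldc (with m and n, never transposed).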
Change-Id: If0fbf733c2650c6f328661293eb99d062685d638
Fixed test driver code for the her, her2, herk and her2k functions.
These functions support only complex and double complex data types; the test code is updated accordingly.
Change-Id: Iee7b79abda4a2959a265c420d23879bf47f2c38d
AMD-Internal: [CPUPL-1313]
Block sizes (MC, KC, NC) for DGEMM are determined at runtime
based on the following parameters:
- Single or multithreaded build
- Processor architecture (currently only zen3 is supported)
- Number of threads requested while running the library
Change-Id: Ia793484b77adb87486e630d0d3b4c7856ae52094
AMD-Internal: [CPUPL-660, CPUPL-661]
Included blis.h in aoclos.c in order to check whether BLIS was
built with OpenMP support.
AOCL-Internal: [CPUPL-1238]
Change-Id: I366da030266b9d7f2ad09dc722847a7d86b85933
Details:
- Enabled the native method for complex gemm.
- Performance needs to be measured on large datasets before enabling the induced method.
AMD-Internal: [CPUPL-1300]
Change-Id: I5444dd31e8b8e73da73f789da8b64276e8e40de8
Details:
- Added SIMD code
- Processing 5 rows at a time in SIMD loop to improve performance
AMD-Internal: [CPUPL-1054]
Change-Id: I2ac93f25895dccfc42e14be0689e6d4e655d6a0a
Note that there is a known issue with Intel 19+, as explained
in https://github.com/flame/blis/issues/371.
The AMD version needs this support because some user applications
need ICC support.
AMD-Internal: [CPUPL-1223]
Change-Id: I86ddee068ae18bd940a5952d60960228d8100e97
When the library is built single-threaded and trace is enabled, the test
applications in the test folder fail to compile. In aoclos.c, the function
AOCL_gettid() uses omp_get_thread_num() to get the thread id, which is only
available when the OpenMP-based parallel BLIS library is generated. To fix this,
the single-thread case now returns zero for the thread id, and the OpenMP function
is used only when the BLIS_ENABLE_OPENMP macro is defined. However, this is not a
complete fix: if the library is built with pthreads, AOCL_gettid() always returns 0,
which is not the intended behaviour.
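The fallback described above can be sketched as follows. demo_gettid is a hypothetical stand-in for AOCL_gettid(); only BLIS_ENABLE_OPENMP and omp_get_thread_num() are taken from the description, and the real function may differ.

```c
#ifdef BLIS_ENABLE_OPENMP
#include <omp.h>   /* only pulled in for OpenMP builds */
#endif

/* Returns the OpenMP thread number when OpenMP is enabled, and 0
   otherwise. A pthread build also lands in the fallback branch, which
   is the known limitation noted in this message. */
int demo_gettid(void)
{
#ifdef BLIS_ENABLE_OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}
```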
Change-Id: I5b79ed57d27d0022d3dcab0e2a3a557c8e4ff8ee
Details:
- The amin API returns the index of the element with the minimum absolute value in a vector.
- Added an amin reference BLIS kernel.
- Added BLAS and CBLAS interfaces for amin.
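A reference-style sketch of the amin semantics for real doubles is shown below. demo_damin is a hypothetical name, not the kernel added here. The zero-based result, the choice of the first match on ties, and the -1 return for invalid input are assumptions of this sketch; NaN handling is not addressed.

```c
#include <math.h>

/* Returns the zero-based position (in the logical vector, not the
   memory offset) of the first element with the smallest absolute
   value, or -1 when n <= 0 or incx <= 0. */
int demo_damin(int n, const double *x, int incx)
{
    if (n <= 0 || incx <= 0) return -1;

    int best = 0;
    double best_abs = fabs(x[0]);
    for (int i = 1; i < n; ++i) {
        double v = fabs(x[i * incx]);
        if (v < best_abs) {   /* strict '<' keeps the first minimum */
            best_abs = v;
            best = i;
        }
    }
    return best;
}
```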
AMD-Internal: [CPUPL-1155]
Change-Id: I89c1e37e86950a4582bba70a5d8fc70ac915bd3c
In the DTL_Trace() function, we call bli_init_auto() in order to enable trace for
BLAS/CBLAS APIs.
Added logging of input parameters for sgemv_ and dgemv_. Added function
tracing for the amax_, axpy_, gemv_ and dotv_ BLAS functions.
Change-Id: I4483c1e918c3c78946f7377b4b69eba6af4e925e
Details:
- Added Framework optimizations for BLAS and CBLAS interfaces for caxpyv_(cblas_caxpyv) and zaxpyv_ (cblas_zaxpyv).
- Added new axpyv AVX2 kernels for c and z data types for AMD EPYC family.
AMD-Internal: [CPUPL-1231]
Change-Id: I9bc0c21fef9da84533adcef76427977430b27ea7
Replaced the gettid() syscall with omp_get_thread_num() to
create files for logging the data. This ensures that there
is a one-to-one mapping between threads created in BLIS and
the IDs used to name trace and log files.
AMD-Internal: [CPUPL-1236]
Change-Id: I45b1721a7a9c855eeec43e7cbb5089f2a955ff72
Details:
- The kernel is called directly from the API call to avoid framework overhead for the complex float and complex double precisions.
- Added SIMD code for complex float and complex double, and unrolled the for loop 5 times to improve performance.
AMD-Internal: [CPUPL-1057]
Change-Id: I3b9d202398cacc0168882c9d6da2b450c27466a0
Details:
- Introduced a new macro 'BLIS_CONFIG_EPYC' to enable blas and cblas
framework optimizations for zen family configurations.
- The macro needs to be defined in family.h files of respective arch
configs.
- Moved zen2-specific optimized kernels to zen folder, in order to be
accessible to all zen family architectures.
Change-Id: I8da2db6b7ab22ef350a01d86c214006e812eb06d
Fixed AOCL DTL logs printing incorrect alpha and beta values for single
precision. Added missing information such as data types, lower or upper triangular,
and the side parameter in the case of TRSM.
Cleaned up and formatted the files test_cabs1.c, test_axpbyv.c and
test_gemm.c. When dumping trsm parameters, replaced 'side' with bli_is_right(side).
Change-Id: Ic81503ae696956eb074ec208f7109d1a394183d7
Details:
- Added debug trace support for DGEMMT and DTRSM APIs.
- Added log support for gemmt, trsm APIs.
- Modified gemm dump_sizes function to dump transpose parameters.
AMD-Internal: [CPUPL-1210]
Change-Id: Ice1effe27ec349203ce5def030a6b85b204bd91e
Details:
- The gemm_batch API computes a series of GEMM operations for groups of general
matrices.
- Each group contains matrices with the same parameters.
- This API is part of the BLAS extension APIs.
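The group semantics described above can be sketched with a naive loop. This is an illustration only: demo_dgemm and demo_dgemm_batch are hypothetical names, the argument list is an assumption rather than the signature added by this change, and only column-major, non-transposed operands are handled.

```c
/* Naive column-major C := alpha*A*B + beta*C, no transposes. */
static void demo_dgemm(int m, int n, int k, double alpha,
                       const double *a, int lda,
                       const double *b, int ldb,
                       double beta, double *c, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
}

/* Parameters (sizes, scalars, leading dimensions) are given once per
   group; the matrix pointer arrays hold one entry per matrix, laid out
   group after group. */
void demo_dgemm_batch(int group_count, const int *group_size,
                      const int *m, const int *n, const int *k,
                      const double *alpha,
                      const double **a, const int *lda,
                      const double **b, const int *ldb,
                      const double *beta,
                      double **c, const int *ldc)
{
    int idx = 0;  /* running index into the pointer arrays */
    for (int g = 0; g < group_count; ++g)
        for (int s = 0; s < group_size[g]; ++s, ++idx)
            demo_dgemm(m[g], n[g], k[g], alpha[g], a[idx], lda[g],
                       b[idx], ldb[g], beta[g], c[idx], ldc[g]);
}
```

Sharing parameters within a group means they are read once per group rather than once per matrix.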
AMD-Internal: [CPUPL-1184]
Change-Id: Ic23772830eb1d157da4db45158a039b0826419fd
Details:
- Added the cblas extension cblas_?cabs1.
- Functionality: res = |Re(z)| + |Im(z)|, where z is a complex number and res is the sum of the absolute values of its real and imaginary parts.
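The formula above can be written directly in C. demo_dcabs1 is a hypothetical name used for illustration; it is not the cblas_?cabs1 entry point itself, whose exact signature is not reproduced here.

```c
#include <complex.h>
#include <math.h>

/* res = |Re(z)| + |Im(z)| for a double-precision complex value. */
double demo_dcabs1(double complex z)
{
    return fabs(creal(z)) + fabs(cimag(z));
}
```

Note that this differs from the Euclidean modulus: for z = 3 - 4i it gives 7, whereas cabs(z) gives 5.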
AMD-Internal: [CPUPL-1129]
Change-Id: I4a3c265c89527c8fd3060c5d2ed38b1953ce6343
Details:
- Corrected "#if" directive in line 89
- Commented out "#define print" to disable printing the vectors
Change-Id: I9ec3cbfb716540dd3e2264f5c3925d9e0c0c294a