Commit Graph

2198 Commits

Author SHA1 Message Date
mkurumel
39f7a4eecf Add AOCL DTL logging.
Added logging for syr,syr2,syrk,syr2k,trmm,trmv,trsv.

	AMD-Internal: [CPUPL-1256]

Change-Id: I628ef5d48796cfc68ec68886b8c1b0555261b3d1
2020-11-09 14:40:43 +05:30
Madan mohan Manokar
bded7f9392 Log fix
Added function definition of her2k

Change-Id: Ia7ccd72772cdcafdcf5cb8a21c6746b13c70b158
AMD-Internal: [CPUPL-1249]
2020-11-09 10:33:08 +05:30
Mangala V
1578e8b874 Merge "Optimised AXPYF routine for complex float and complex double" into amd-staging-milan-3.0 2020-11-06 02:05:02 -05:00
managalv
aae48c2221 Optimised AXPYF routine for complex float and complex double
Details:
    - Added SIMD code
    - Processing 5 rows at a time in SIMD loop to improve performance

AMD-Internal: [CPUPL-1054]

Change-Id: I2ac93f25895dccfc42e14be0689e6d4e655d6a0a
2020-11-06 18:42:13 +05:30
Dipal Madhukar Zambare
1fb9e1d029 Merge "Re-enable support for Intel 19+ compiler." into amd-staging-milan-3.0 2020-11-06 02:00:34 -05:00
Kiran Varaganti
a15e531374 Merge "Benchmark using AOCL Logs as input" into amd-staging-milan-3.0 2020-11-06 01:51:38 -05:00
Kiran Varaganti
60642d98a3 Benchmark using AOCL Logs as input
Added benchmark application for gemm - input is a log file generated from AOCL
DTL from BLIS.

Change-Id: I2ac7a3c48d5a37c5b24ec0f0cff7e7886dad0b99
2020-11-06 14:31:53 +05:30
Dipal M Zambare
4347d2d823 Re-enable support for Intel 19+ compiler.
Note that there is a known issue with Intel 19+ as explained
in https://github.com/flame/blis/issues/371.

AMD version needs this support as some user applications
need ICC support.

AMD-Internal: [CPUPL-1223]

Change-Id: I86ddee068ae18bd940a5952d60960228d8100e97
2020-11-06 11:11:46 +05:30
Madan mohan Manokar
ec35717174 Logging
Added logging for her2k, her, herk, nrm2, symm, symv

Change-Id: Ib3af83b6f8aaafb69fb5d78e964c45504f74f79c
AMD-Internal: [CPUPL-1249]
2020-11-06 10:16:31 +05:30
managalv
68b4ff976b Added Function trace and Input logging for dotv and gemv
Change-Id: I992bd80b2322d6c387f609ecd70c1109c13f6254
AMD-Internal: [CPUPL-1274]
2020-11-06 03:53:43 +05:30
Nageshwar Singh
c40bb45bdf Added debug log and trace support for hemm
AMD-Internal: [CPUPL-1253]

Change-Id: I95e9a864800a09f24c94926936ada8ec8728f1a5
2020-11-05 18:31:41 +05:30
Nageshwar Singh
963277f8f9 Added debug log support for axpy, axpyb, amax, asum, hemv, her2
AMD-Internal: [CPUPL-1253]

Change-Id: I90cabed86a3796385656b34d368588500e9df71c
2020-11-03 20:44:12 +05:30
Meghana Vankadari
0775f09b41 Added debug trace and log support for copy and ger routines
Change-Id: Id7fb64c0a626b2f8f53e89ee7df4391693eb4f4c
2020-11-02 22:56:58 -05:00
Kiran Varaganti
65daaab6ac Fix Bug in DTL
When the library is built single-threaded and trace is enabled, the test
applications in the test folder fail to compile. In the file aoclos.c the function
AOCL_gettid() uses omp_get_thread_num() to get the thread id, which is only
available when the OpenMP-based parallel BLIS library is generated. To fix this, in
the single-threaded case we now return zero for the thread id; the OpenMP function
is used only when the BLIS_ENABLE_OPENMP macro is defined. However, this is not a
complete fix. If the library is built with pthreads, AOCL_gettid() always returns 0,
which is not the intended behaviour.

Change-Id: I5b79ed57d27d0022d3dcab0e2a3a557c8e4ff8ee
2020-11-02 12:05:09 +05:30
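
The conditional described in the "Fix Bug in DTL" message above can be sketched roughly as follows. This is only an illustration, not the code in aoclos.c: the name sketch_gettid is hypothetical, and only BLIS_ENABLE_OPENMP and omp_get_thread_num() are taken from the message.

```c
#ifdef BLIS_ENABLE_OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for AOCL_gettid(): returns the OpenMP thread number
   when the library is built with OpenMP, and a fixed 0 otherwise. */
static int sketch_gettid(void)
{
#ifdef BLIS_ENABLE_OPENMP
    return omp_get_thread_num();
#else
    /* Single-threaded builds take this branch. As the message notes, a
       pthread build would also land here and always report 0. */
    return 0;
#endif
}
```

Because the OpenMP header and call are both guarded by the same macro, a single-threaded build no longer references any OpenMP symbol.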
Madan mohan Manokar
cd9a751aa0 Trace and logging
Added logging for scal, swap.

Change-Id: Ie7ebf77eb8e0a961fe8cf9f42d99600e6daff8ff
AMD-Internal: [CPUPL-1249]
2020-10-30 19:12:19 +05:30
Madan Mohan Manokar
86f3d1a412 Merge "Trace and logging" into amd-staging-milan-3.0 2020-10-30 08:35:02 -04:00
Manideep Kurumella
1c29554b9e Merge "Trace and logging." into amd-staging-milan-3.0 2020-10-30 08:10:12 -04:00
Nageshwar Singh
33b5867db3 Merge "Trace and logging" into amd-staging-milan-3.0 2020-10-30 06:54:56 -04:00
mkurumel
6ab8f607d8 Trace and logging.
Added tracing for syr,syr2,syrk,syr2k,trmm,trmv,trsv.

	AMD-Internal: [CPUPL-1255]

Change-Id: Id04ff95d7c8fb5854440f79a14e47a5f40096ded
2020-10-30 11:54:39 +05:30
Madan mohan Manokar
2803e9b761 Trace and logging
Added function tracing for her2k, her, herk, nrm2, scal, swap, symm, symv.

Change-Id: I93a97b7000c632f550eab1317b3cafad8c539937
AMD-Internal: [CPUPL-1249]
2020-10-30 11:46:34 +05:30
Nageshwar Singh
5a83365a6c Trace and logging
Details:
   - Added function tracing for bla_asum, bla_axpby, bla_hemm, bla_hemv, bla_her2.

AMD-Internal: [CPUPL-1253]

Change-Id: I08b4cab46d167aceb8123c7f8b19e21a263fe2b8
2020-10-30 10:46:09 +05:30
bhaskarn
376ac5856b Added BLAS Extension APIs: CBLAS_?GEMM3M
AMD-Internal: [CPUPL-1151]

Induced 3M1 method is enabled for CGEMM3M and ZGEMM3M

Change-Id: I8276c5018340d0a45694551f48aad5b735819eae
2020-10-29 17:06:30 +05:30
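
For readers unfamiliar with the "3M" name in the GEMM3M commit above: such methods rely on the fact that a complex product can be formed with three real multiplications instead of four. The sketch below shows that identity on scalars only. The function name cmul_3m is hypothetical, and the details of the induced 3M1 method in BLIS are not given in the message, so this should not be read as its implementation.

```c
/* Multiply (ar + i*ai) by (br + i*bi) using three real multiplications.
   Generic illustration; the name and signature are hypothetical. */
static void cmul_3m(double ar, double ai, double br, double bi,
                    double *cr, double *ci)
{
    double t1 = ar * br;               /* product of the real parts      */
    double t2 = ai * bi;               /* product of the imaginary parts */
    double t3 = (ar + ai) * (br + bi); /* product of the sums            */

    *cr = t1 - t2;        /* real part: ar*br - ai*bi      */
    *ci = t3 - t1 - t2;   /* imaginary part: ar*bi + ai*br */
}
```

In a matrix setting the same identity is applied to the real and imaginary parts of the operands, so three real matrix products replace four, at the cost of extra additions and somewhat different rounding behaviour.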
Nageshwar Singh
dd5b38d221 Added BLIS, BLAS, and CBLAS interface for cblas?amin
Details:
      - Amin api returns index of minimum absolute value in a vector.
      - Added amin reference blis kernel.
      - Added blas and cblas interface for amin.

AMD-Internal: [CPUPL-1155]

Change-Id: I89c1e37e86950a4582bba70a5d8fc70ac915bd3c
2020-10-28 17:50:27 +05:30
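
As a rough illustration of what the amin commit above describes (the index of the minimum absolute value in a vector), a reference-style loop for real doubles might look like the following. The name amin_index, the 0-based return value, and the first-match tie rule are assumptions of this sketch; the message does not state the conventions of the actual interfaces.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical reference loop: 0-based index of the element with the
   smallest absolute value in a strided vector of n >= 1 elements.
   Ties resolve to the first occurrence. */
static size_t amin_index(size_t n, const double *x, size_t incx)
{
    assert(n > 0);

    size_t best = 0;
    double best_abs = fabs(x[0]);

    for (size_t i = 1; i < n; ++i) {
        double v = fabs(x[i * incx]);
        if (v < best_abs) {
            best_abs = v;
            best = i;
        }
    }
    return best;
}
```

For the vector {3.0, -0.5, 2.0, 0.5} with unit stride this returns 1, since -0.5 is the first element with the smallest magnitude.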
Kiran Varaganti
5534c56559 Trace and Logging
In the DTL_Trace() function we call bli_init_auto() in order to enable trace for
BLAS/CBLAS APIs.
Added logging of input parameters for sgemv_ and dgemv_. Added function
tracing for amax_, axpy_, gemv_ and dotv_ BLAS functions.

Change-Id: I4483c1e918c3c78946f7377b4b69eba6af4e925e
2020-10-28 13:43:47 +05:30
Nageshwar Singh
dbd7b28373 Development of AVX2 axpyv kernels for c and z datatypes.
Details
    - Added Framework optimizations for BLAS and CBLAS interfaces for caxpyv_(cblas_caxpyv) and zaxpyv_ (cblas_zaxpyv).
    - Added new axpyv AVX2 kernels for c and z data types for AMD EPYC family.

AMD-Internal: [CPUPL-1231]

Change-Id: I9bc0c21fef9da84533adcef76427977430b27ea7
2020-10-23 09:33:35 +05:30
Dipal M Zambare
e0e0760ed6 AOCLDTL: Corrected mapping between BLIS threads and trace files
Replaced the gettid() syscall with omp_get_thread_num() to
create files for logging the data. This ensures that there
is a one-to-one mapping between threads created in BLIS and
the IDs used to name trace and log files.

AMD-Internal: [CPUPL-1236]

Change-Id: I45b1721a7a9c855eeec43e7cbb5089f2a955ff72
2020-10-22 15:03:39 +05:30
Nageshwar Singh
b245ea9c65 cblas_?cabs1 test cblas header file bug
Details:
    - Added BLIS_ENABLE_CBLAS around cblas header file.

AMD-Internal: [CPUPL-1129]

Change-Id: I3baacd26aa96c8eeb753d95210817ffe9b2a3f85
2020-10-13 01:43:55 -04:00
managalv
90f30e4c37 Optimised dotv kernel by SIMD approach and by removing framework overhead
Details:
    - Kernel is called directly from API call to avoid framework overhead in case of complex float and complex double precisions.
    - Added SIMD code for complex float and complex double and unrolled for loop 5 times to improve performance

AMD-Internal: [CPUPL-1057]

Change-Id: I3b9d202398cacc0168882c9d6da2b450c27466a0
2020-10-13 18:59:31 +05:30
Meghana Vankadari
029ed033f1 Added decision logic for zgemmt
AMD-Internal: [CPUPL-1032]
Change-Id: I8ba1c66b06cd91a864b16a249b263b3694ac1d5e
2020-10-09 12:11:53 +05:30
Meghana Vankadari
016885348c Added CBLAS interface and test file for gemm_batch API
AMD-Internal: [CPUPL-1184]
Change-Id: Icc5c41429b0d92f1a66a955769cc0518ca4706ee
2020-10-07 06:55:08 -04:00
Meghana Vankadari
47744663d9 Enabling framework optimizations for zen family architectures.
Details:
- Introduced a new macro 'BLIS_CONFIG_EPYC' to enable blas and cblas
  framework optimizations for zen family configurations.
- The macro needs to be defined in family.h files of respective arch
  configs.
- Moved zen2-specific optimized kernels to zen folder, in order to be
  accessible to all zen family architectures.

Change-Id: I8da2db6b7ab22ef350a01d86c214006e812eb06d
2020-10-07 13:10:50 +05:30
Meghana Vankadari
74c9d3f36e Added decision logic for DGEMMT for zen2 configuration
AMD-Internal: [CPUPL-1044]
Change-Id: Ifc4b82dcfce5aa6770928010d430b0832c07cb41
2020-10-05 12:50:44 +05:30
Kiran Varaganti
aa56c36b82 Fixed Logs & code cleanup
Fixed AOCL DTL logs printing incorrect alpha and beta values for single
precision. Added missing info like data types, lower or upper triangular, and
the side parameter in the case of TRSM.
Code cleanup and formatting of the files test_cabs1.c, test_axpbyv.c and
test_gemm.c. In dumping trsm parameters, replaced 'side' with bli_is_right(side).

Change-Id: Ic81503ae696956eb074ec208f7109d1a394183d7
2020-10-04 22:17:31 +05:30
Meghana Vankadari
9a330f1754 Added debug trace and log support for gemmt and TRSM APIs
Details:
- Added debug trace support for DGEMMT and DTRSM APIs.
- Added log support for gemmt, trsm APIs.
- Modified gemm dump_sizes function to dump transpose parameters.

AMD-Internal: [CPUPL-1210]
Change-Id: Ice1effe27ec349203ce5def030a6b85b204bd91e
2020-10-02 12:31:47 +05:30
Meghana Vankadari
b5d0c81178 Added BLAS interface for gemm_batch API
Details:
- gemm_batch API computes a series of GEMM for groups of general
  matrices.
- Each group contains matrices with same parameters.
- This API is part of the BLAS extension APIs.

AMD-Internal: [CPUPL-1184]
Change-Id: Ic23772830eb1d157da4db45158a039b0826419fd
2020-09-24 23:52:31 -04:00
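
The grouping described in the gemm_batch commit above (a series of GEMMs, where every matrix in a group shares the same parameters) can be sketched with a naive reference driver. Everything here is hypothetical and simplified: the names ref_gemm and ref_gemm_batch, the argument order, and the restriction to column-major, non-transposed, tightly packed double matrices are assumptions of the sketch, not the signature of the added API.

```c
#include <stddef.h>

/* Naive column-major C = alpha*A*B + beta*C for one m x n x k problem
   (no transposes, leading dimensions equal to the row counts). */
static void ref_gemm(size_t m, size_t n, size_t k, double alpha,
                     const double *a, const double *b, double beta, double *c)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += a[i + p * m] * b[p + j * k];
            c[i + j * m] = alpha * acc + beta * c[i + j * m];
        }
    }
}

/* Hypothetical batch driver: group g contains group_size[g] problems that
   all share m[g], n[g], k[g], alpha[g] and beta[g]. The matrix pointer
   arrays a, b, c are flat across all groups. */
static void ref_gemm_batch(size_t group_count, const size_t *group_size,
                           const size_t *m, const size_t *n, const size_t *k,
                           const double *alpha, const double *beta,
                           const double **a, const double **b, double **c)
{
    size_t idx = 0; /* running index into the flat pointer arrays */
    for (size_t g = 0; g < group_count; ++g)
        for (size_t i = 0; i < group_size[g]; ++i, ++idx)
            ref_gemm(m[g], n[g], k[g], alpha[g], a[idx], b[idx],
                     beta[g], c[idx]);
}
```

Keeping the per-group parameters in short arrays while the matrix pointers stay flat means the caller describes each parameter set once, however many matrices use it.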
Nageshwar Singh
5243da5cec BLIS: CBLAS Extensions. cblas_?cabs1 : Absolute value of a complex number cabs1
Details:
   - added cblas extension cblas_?cabs1.
   - Functionality: res = |Re(z)| + |Im(z)|, where z is a complex number and res is a value containing the absolute value of z.

AMD-Internal: [CPUPL-1129]

Change-Id: I4a3c265c89527c8fd3060c5d2ed38b1953ce6343
2020-09-23 19:01:02 +05:30
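
The formula in the cabs1 commit above, res = |Re(z)| + |Im(z)|, is small enough to show directly. The struct type zcomplex and the function name cabs1_sketch are hypothetical and chosen only for this sketch; they are not the types or names used by the cblas_?cabs1 interface.

```c
#include <math.h>

/* Hypothetical complex type for the sketch. */
typedef struct {
    double real;
    double imag;
} zcomplex;

/* Sum of the absolute values of the real and imaginary parts. */
static double cabs1_sketch(zcomplex z)
{
    return fabs(z.real) + fabs(z.imag);
}
```

Note that this is not the Euclidean modulus: for z = 3 - 4i the function returns 7, whereas sqrt(3*3 + 4*4) is 5.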
Meghana Vankadari
80828f6fda Added few changes in test_axpbyv.c file
Details:
- Corrected "#if" directive in line 89
- Commented out "#define print" to disable printing the vectors

Change-Id: I9ec3cbfb716540dd3e2264f5c3925d9e0c0c294a
2020-09-16 23:48:01 -04:00
Kiran Varaganti
5a8bd9f41c Enable BLAS interface in BLIS
Modified Makefile in test folder to enable calling BLAS interfaces for BLIS as
well. This is possible by replacing -DBLIS with -DBLAS=\"aocl\" in the
makefile. Also added linking to multi-threaded MKL library.

Change-Id: Iccf2ec99b48bb35da985b69218bc680f678ff7c9
2020-09-16 23:26:44 +05:30
Meghana Vankadari
6c9bf36424 Added BLAS and CBLAS interfaces for axpby API
Details:
- The axpby routines perform a vector-vector operation defined as
  y = a*x + b*y where a, b are scalars and x, y are vectors.
- This API is part of BLAS-like extension APIs

Change-Id: I17a53b03bba97de7ae1995a9f086084bd241bcdc
AMD-Internal: [CPUPL-1118]
2020-09-16 09:49:31 +05:30
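
The operation named in the axpby commit above, y = a*x + b*y, can be written as a short reference loop. The function name axpby_ref and its argument order are assumptions made for illustration; they are not the BLAS or CBLAS signatures added by the commit.

```c
#include <stddef.h>

/* Hypothetical reference loop for y := a*x + b*y on strided real vectors. */
static void axpby_ref(size_t n, double a, const double *x, size_t incx,
                      double b, double *y, size_t incy)
{
    for (size_t i = 0; i < n; ++i)
        y[i * incy] = a * x[i * incx] + b * y[i * incy];
}
```

With x = {1, 2, 3}, y = {10, 20, 30}, a = 2 and b = 0.5, the vector y becomes {7, 14, 21}.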
Meghana Vankadari
43d90e3110 Handling beta=0 case separately for gemv inside bli_dgemv_zen_ref_c function
Details:
- For GEMV whenever beta = 0, we should not scale vector 'y' with beta;
  instead overwrite the 'y' vector with zeroes before carrying out the
  operation.

Change-Id: I159afba6c6ac3b72b74718fab7a4f4ec293012c5
2020-09-08 07:50:34 -04:00
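
One reason the beta = 0 handling in the gemv commit above matters is that multiplying by zero does not clear a NaN or Inf already stored in y, since 0 * NaN is NaN in IEEE arithmetic. The sketch below shows the idea on a naive column-major GEMV. The function name gemv_ref and its simplified signature (unit strides, no transpose) are assumptions; this is not the code of bli_dgemv_zen_ref_c.

```c
#include <stddef.h>

/* Hypothetical reference: y := alpha*A*x + beta*y for a column-major
   m x n matrix A with unit-stride vectors. */
static void gemv_ref(size_t m, size_t n, double alpha, const double *a,
                     const double *x, double beta, double *y)
{
    /* Prepare y. When beta is exactly zero, overwrite y with zeroes
       instead of scaling it, so stale NaN/Inf values cannot leak through. */
    for (size_t i = 0; i < m; ++i)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];

    /* Accumulate alpha*A*x column by column. */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            y[i] += alpha * a[i + j * m] * x[j];
}
```

If the first loop scaled unconditionally, a y that was never initialised by the caller could turn a valid result into NaN even though beta = 0 says its old contents should be ignored.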
dzambare
95bbdc12ab Added -fomit-frame-pointer in kernel options
Some of the SUP kernels now use the rbp register.
This register was also used by the compiler to support
automatic function call tracing, which was creating
a conflict. The automatic call tracing feature is
removed for now. If needed, it can be enabled for
non-kernel code.

Change-Id: Ib7ad00875f501ee2ad552cbb2ecdc245002d63b7
AMD-Internal: [CPUPL-1135]
2020-08-31 09:53:16 +05:30
Kiran Varaganti
339a9314c4 Fixed merges in testcpp/test.sh
Change-Id: I3c388448a342153f2ee9c5a6a7ae102ebd1c0ea0
2020-08-14 10:52:49 +05:30
Kiran Varaganti
fb0b4b57c1 Fixed missing changes
Corrections in bli_gemm_front.c, taken from both the public repo and the 2.2.1 branch
[CPUPL-1067]

Change-Id: I4887ece6aa20bdfb87d97e7acebbe04cb9feea02
2020-08-13 00:29:37 +05:30
bhaskarn
d186cfdf2e CPUPL-1074:
- Bug fix in the sgemmsup 1x16 kernel for beta = 0 with column-stored C:
       the rcx register increment was missing, so 4 values
       in the output were overwritten

Change-Id: Ia3028040dce3e615f1db5a331498d86faadcf916
2020-08-11 01:26:26 -04:00
Dipal M Zambare
7bbcae5a18 Fixed build issue in cpp testsuite.
This issue was caused by incorrect merging of cpp and testcpp files.

Change-Id: Idc40fbdaa55b6052a6a061d2d3e5cfae76b99916
AMD-Internal: [CPUPL-1067]
2020-08-10 12:17:09 +05:30
dzambare
3177db4888 Updated version number.
Change-Id: Iba3659b04f2d85ec7dc008ceb84da73c7c66530a
2020-08-06 15:00:46 +05:30
dzambare
f30c3c7766 mend
Change-Id: Iba3659b04f2d85ec7dc008ceb84da73c7c66530a
2020-08-06 14:29:24 +05:30
dzambare
267a959af1 Rebased amd-staging-milan-3.0 branch on master
-- Rebased on top of master commit # 6e522e5823
  -- Updated merged code to remove duplicated code added by auto-merging
  -- Updated merged code to rename bool_t type
  -- Updated merged code to rename bli_thread_obarrier
  -- Updated merged code to rename bli_thread_obroadcast

Change-Id: I39879f1ef3b42ecbe5808af3b559d88c36dbbf6c
AMD-Internal: [CPUPL-1067]
2020-08-06 10:09:29 +05:30
phakumar
c7a914411f BLIS library porting on to Windows:
GEMMT changes porting on to Windows

AMD Internal : [CPUPL-1061]

Change-Id: I587d1789cd29ea18b04f8ab43e5742b4d902067a
2020-08-06 10:09:29 +05:30
Mangala V
5b8c2bc9e2 Revert "CPUPL-1059: Failures seen in DGEMM SUP for specific size is fixed"
This reverts commit 725bf5aceb.

Reason for revert: <INSERT REASONING HERE>

Change-Id: I7dd6b84731f091c8b39080ed9321a708fa5f11d8
2020-08-06 10:09:29 +05:30