Commit Graph

2213 Commits

Author SHA1 Message Date
Dipal M Zambare
3c3160fb38 Fixed issue with dgemm performance regression
Using block size KC=256 for all sizes on zen3 architecture

Change-Id: I9fb571e014d152e9156e6549b4d2c0407f6733bb
AMD-Internal: [CPUPL-1401]
AOCL-3.0-rc6
2021-03-15 11:32:17 +05:30
Meghana Vankadari
fa6e3739c0 Fixing "Conflicting types for zdotc" error for zen family configs
AMD-Internal : [CPUPL-1383]
Change-Id: I592522c8cb677407a4a9d808267ee62dff1153c0
2021-03-15 11:32:17 +05:30
Dipal M Zambare
f8ab9f6317 Enabled znver3 flag for zen3 architecture
znver3 flag will be enabled if compiler is AOCC Clang version 3.0
and configuration is zen3

Change-Id: Ie164f4d469bf3f8df31ccf8fed9f80dfc62efb39
AMD-Internal: [CPUPL-1353]
2021-03-15 11:32:17 +05:30
Dipal M Zambare
14e21603c9 Update amd64 bundle configuration
The configuration is updated to

   - Enable EPYC architecture optimizations
   - Macros to override block sizes.

AMD-Internal : [CPUPL-1350]

Change-Id: Id712f9abe6e81c9ece2baaab9d965b405e72977a
2021-03-15 11:32:17 +05:30
Meghana Vankadari
f8168956e4 Correcting zdotc definition error for configs other than zen family
Details:
- when BLIS_CONFIG_EPYC is not defined, zdotc is defined twice.
  - One definition is part of macro based code.
  - Other definition is implemented as part of framework optimizations.
- Modified the bla_dot.c file to choose macro based code for configs
  other than zen family.

AMD-Internal: [CPUPL-1348]
Change-Id: I9ef6a590a6199e173d38248c3fb72feddfb20922
2021-03-15 11:32:17 +05:30
bhaskarn
99e381b02f Fix for segmentation crash in dgemmsup kernels
Description:

[AMD Internal]: CPUPL-1336

Removed extra/unnecessary loads in dgemmsup kernels which were
accessing memory beyond the boundaries and causing a segmentation
fault.

Kernels:
bli_dgemmsup_rd_haswell_asm_1x4
bli_dgemmsup_rv_haswell_asm_1x6

Change-Id: Idaeed36ebd9f13550943394a37e372b8d015b2d3
2021-03-15 11:32:17 +05:30
Kumar, Phani
53a33f1afb Cmake script changes and blis.h changes for amd-staging-milan-3.0
AMD Internal : [CPUPL-1083]

Change-Id: Ia29a1f328ee32e2aec59a7fc70c04400d6ee6580
2021-03-15 11:32:17 +05:30
Dipal M Zambare
a7c81d1298 Updated test drivers for dotv, scalv and swapv.
Added traces in cblas layer for these API's.
These test drivers didn't have calls for complex data
types, the drivers are updated to support them.

AMD-Internal : [CPUPL-1315]

Change-Id: Ia52ecca68ea17314315d626b57c46a2f5973985b
2021-03-15 11:32:17 +05:30
Meghana Vankadari
5884732df1 Modified log routines for gemm, gemmt and trsm
Details:
- Modified log routines to accept inputs from blas layer instead of
  oapi level.

AMD-Internal: [CPUPL-1332]
Change-Id: If33c3585af92e617910ae8f7d442d1275119bbfc
2021-03-15 11:32:17 +05:30
Madan mohan Manokar
3a1b63259d Test driver fix for her and her2
Fixed test driver code for her, her2
Support added to handle complex and double complex data type in test driver.

Change-Id: If65939e99d8cf77e0fb70561166d84bf67d0321d
AMD-Internal: [CPUPL-1326]
2021-03-15 11:32:17 +05:30
managalv
e2179ec69f Added debug log and trace for gemv and dotv for blis and cblas interface
AMD Internal: [CPUPL-1314]

Change-Id: I2708fd9c73419c968c8e02ff11545645dc639052
2021-03-15 11:32:17 +05:30
Kiran Varaganti
18fcfaef18 Fixed wrong dimensions check in bench/bench_gemm.c application
Removed the validity checks on m, n, k, lda, ldb and ldc, since the
bench app is run on logs collected from AOCL traces.
A correct check would have to consider the transpose parameter and storage order.

Change-Id: If0fbf733c2650c6f328661293eb99d062685d638
2021-03-15 11:32:17 +05:30
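A minimal sketch of what a transpose- and storage-aware check could look like, following the usual CBLAS leading-dimension convention. The function names `sketch_min_ld` and `sketch_gemm_dims_ok` are hypothetical and are not taken from bench/bench_gemm.c.

```c
#include <stdbool.h>

/* Hypothetical helper: smallest valid leading dimension for a matrix X,
 * where op(X) has shape rows x cols. Not part of bench_gemm.c. */
int sketch_min_ld(bool row_major, bool transposed, int rows, int cols)
{
    /* Shape of X as it is actually stored in memory. */
    int stored_rows = transposed ? cols : rows;
    int stored_cols = transposed ? rows : cols;

    /* Row-major strides over columns, column-major over rows. */
    int ld = row_major ? stored_cols : stored_rows;
    return ld > 1 ? ld : 1;
}

/* Hypothetical check for C = op(A) * op(B), with op(A) m x k and op(B) k x n. */
bool sketch_gemm_dims_ok(bool row_major, bool transa, bool transb,
                         int m, int n, int k, int lda, int ldb, int ldc)
{
    return m >= 0 && n >= 0 && k >= 0
        && lda >= sketch_min_ld(row_major, transa, m, k)
        && ldb >= sketch_min_ld(row_major, transb, k, n)
        && ldc >= sketch_min_ld(row_major, false, m, n);
}
```

The same lda can be valid or invalid depending on transa, which is why a check on raw m, n, k alone would reject legitimate logged calls.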
Madan mohan Manokar
a98922f236 Test driver fix for her, her2, herk and her2k
Fixed test driver code for her, her2, herk and her2k function.
The above functions support only complex and double complex data types; the test code is updated accordingly.

Change-Id: Iee7b79abda4a2959a265c420d23879bf47f2c38d
AMD-Internal: [CPUPL-1313]
2021-03-15 11:32:17 +05:30
satish kumar nuggu
3108dfc670 Added Blas interface for ?imatcopy, ?omatcopy, ?omatadd, ?omatcopy2
AMD-Internal: [CPUPL-1116]
Original review was in this commit http://gerrit-git.amd.com/c/cpulibraries/er/blis/+/428165.
Added new commit for transpose API's

Change-Id: I322389cc0be0aaccf82d1d0bb4476beea8694cd8
2021-03-15 11:32:17 +05:30
Dipal M Zambare
f9d06c74b5 Added dynamic block size selection logic for DGEMM.
Block sizes (MC, KC, NC) for DGEMM are determined at runtime
based on following parameters

    - Single or multithreaded build
    - Processor architecture (currently only zen3 is supported)
    - Number of threads requested while running the library

Change-Id: Ia793484b77adb87486e630d0d3b4c7856ae52094
AMD-Internal: [CPUPL-660, CPUPL-661]
2021-03-15 11:32:16 +05:30
Dipal M Zambare
f617a49e4c Corrected thread id generation in DTL for BLIS.
Added blis.h in aoclos.c in order to check if BLIS was
built with OpenMP support.

AOCL-Internal: [CPUPL-1238]

Change-Id: I366da030266b9d7f2ad09dc722847a7d86b85933
2021-03-15 11:32:16 +05:30
managalv
928c649458 Disable 3m1 method for complex GEMM
Details:
Native method is being enabled for complex gemm
Need to run performance for large dataset to enable induced method

AMD-Internal: [CPUPL-1300]

Change-Id: I5444dd31e8b8e73da73f789da8b64276e8e40de8
2021-03-15 11:32:16 +05:30
bhaskarn
0aec941586 Added bench application for trsm
Description:
     Added bench_trsm.c to read inputs from AOCL DTL logs to benchmark
     Added sample input file

Change-Id: I6806e42244bf775cbed457553ca07fb0222ef597
2021-03-15 11:32:16 +05:30
Madan mohan Manokar
fcaff122f2 Fixing the logs
fixing data type issue in logs.

Change-Id: I3b9fb2921fd9db57a734c7a2866b53f1b51adfdb
AMD-Internal: [CPUPL-1249]
2021-03-15 11:32:16 +05:30
mkurumel
cc0658b145 Add AOCL DTL logging.
Added logging for syr,syr2,syrk,syr2k,trmm,trmv,trsv.

AMD-Internal: [CPUPL-1256]

Change-Id: I628ef5d48796cfc68ec68886b8c1b0555261b3d1
2021-03-15 11:32:16 +05:30
Madan mohan Manokar
b8c93cfa8c Log fix
Added function definition of her2k

Change-Id: Ia7ccd72772cdcafdcf5cb8a21c6746b13c70b158
AMD-Internal: [CPUPL-1249]
2021-03-15 11:32:16 +05:30
managalv
134e4e278a Optimised AXPYF routine for complex float and complex double
Details:
    - Added SIMD code
    - Processing 5 rows at a time in SIMD loop to improve performance

AMD-Internal: [CPUPL-1054]

Change-Id: I2ac93f25895dccfc42e14be0689e6d4e655d6a0a
2021-03-15 11:32:16 +05:30
Kiran Varaganti
89d3cab0f3 Benchmark using AOCL Logs as input
Added benchmark application for gemm - input is a log file generated from AOCL
DTL from BLIS.

Change-Id: I2ac7a3c48d5a37c5b24ec0f0cff7e7886dad0b99
2021-03-15 11:32:16 +05:30
Dipal M Zambare
da801c6055 Re-enable support for Intel 19+ compiler.
Note that there is a known issue with Intel 19+ as explained
in https://github.com/flame/blis/issues/371.

AMD version needs this support as some user applications
need ICC support.

AMD-Internal: [CPUPL-1223]

Change-Id: I86ddee068ae18bd940a5952d60960228d8100e97
2021-03-15 11:32:16 +05:30
Madan mohan Manokar
7427f23763 Logging
Added logging for her2k, her, herk, nrm2, symm, symv

Change-Id: Ib3af83b6f8aaafb69fb5d78e964c45504f74f79c
AMD-Internal: [CPUPL-1249]
2021-03-15 11:32:16 +05:30
managalv
f7eb5c79ca Added Function trace and Input logging for dotv and gemv
Change-Id: I992bd80b2322d6c387f609ecd70c1109c13f6254
AMD-Internal: [CPUPL-1274]
2021-03-15 11:32:16 +05:30
Nageshwar Singh
67a7f5b0fb Added debug log and trace support for hemm
AMD-Internal: [CPUPL-1253]

Change-Id: I95e9a864800a09f24c94926936ada8ec8728f1a5
2021-03-15 11:32:16 +05:30
Nageshwar Singh
6a28e2e24e Added debug log support for axpy, axpyb, amax, asum, hemv, her2
AMD-Internal: [CPUPL-1253]

Change-Id: I90cabed86a3796385656b34d368588500e9df71c
2021-03-15 11:32:16 +05:30
Meghana Vankadari
b552ad6231 Added debug trace and log support for copy and ger routines
Change-Id: Id7fb64c0a626b2f8f53e89ee7df4391693eb4f4c
2021-03-15 11:32:16 +05:30
Kiran Varaganti
a13ee3a818 Fix Bug in DTL
When the library is built single-threaded and trace is enabled, the test
applications in the test folder fail to compile. In the file aoclos.c the function
AOCL_gettid() uses omp_get_thread_num() to get the thread id, which is only
available when the OpenMP-based parallel BLIS library is generated. To fix this, in
the single-thread case we now return zero for the thread id; the OpenMP function is used
only when the BLIS_ENABLE_OPENMP macro is defined. However, this is not a complete
fix. If the library is built with pthread, AOCL_gettid() always returns 0, which is
not the intended behaviour.

Change-Id: I5b79ed57d27d0022d3dcab0e2a3a557c8e4ff8ee
2021-03-15 11:32:16 +05:30
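The guard described in the DTL fix above (a13ee3a818) can be sketched roughly as follows. This is an illustration under assumptions, not the actual aoclos.c source: `sketch_gettid` is a hypothetical stand-in for AOCL_gettid(), and only the BLIS_ENABLE_OPENMP macro and omp_get_thread_num() come from the commit message.

```c
/* Only pull in the OpenMP header when the library is built with OpenMP. */
#ifdef BLIS_ENABLE_OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for AOCL_gettid(): returns the ID used to name
 * per-thread trace and log files. */
int sketch_gettid(void)
{
#ifdef BLIS_ENABLE_OPENMP
    /* OpenMP build: IDs map one-to-one onto OpenMP threads. */
    return omp_get_thread_num();
#else
    /* Single-threaded or pthread build: always 0. For pthread builds this
     * is the limitation the commit message calls out. */
    return 0;
#endif
}
```

Compiled without BLIS_ENABLE_OPENMP, the function has no OpenMP dependency at all, which is what lets the single-threaded test applications build.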
Madan mohan Manokar
dc04a622ff Trace and logging
Added logging for scal, swap.

Change-Id: Ie7ebf77eb8e0a961fe8cf9f42d99600e6daff8ff
AMD-Internal: [CPUPL-1249]
2021-03-15 11:32:16 +05:30
mkurumel
f77c555d13 Trace and logging.
Added tracing for syr,syr2,syrk,syr2k,trmm,trmv,trsv.

AMD-Internal: [CPUPL-1255]

Change-Id: Id04ff95d7c8fb5854440f79a14e47a5f40096ded
2021-03-15 11:32:16 +05:30
Madan mohan Manokar
fefc6c0cf3 Trace and logging
Added function tracing for her2k, her, herk, nrm2, scal, swap, symm, symv.

Change-Id: I93a97b7000c632f550eab1317b3cafad8c539937
AMD-Internal: [CPUPL-1249]
2021-03-15 11:32:16 +05:30
Nageshwar Singh
fa18c0198e Trace and logging
Details:
   - Added function tracing for bla_asum, bla_axpby, bla_hemm, bla_hemv, bla_her2.

AMD-Internal: [CPUPL-1253]

Change-Id: I08b4cab46d167aceb8123c7f8b19e21a263fe2b8
2021-03-15 11:32:16 +05:30
bhaskarn
711cc0ef35 Added BLAS Extension API's: CBLAS_?GEMM3M
AMD-Internal: [CPUPL-1151]

Induced 3M1 method is enabled for CGEMM3M and ZGEMM3M

Change-Id: I8276c5018340d0a45694551f48aad5b735819eae
2021-03-15 11:32:16 +05:30
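For context on the induced method mentioned above: 3M-style methods form a complex product from three real multiplications instead of four. The scalar sketch below shows only that arithmetic identity; the type `sketch_z` and function `sketch_mul_3m` are hypothetical, and the real 3M1 implementation applies the idea to real-valued matrix products, not to individual scalars.

```c
/* Hypothetical complex type for illustration only. */
typedef struct { double real, imag; } sketch_z;

/* Complex multiply using three real multiplications. */
sketch_z sketch_mul_3m(sketch_z x, sketch_z y)
{
    double t1 = x.real * y.real;                        /* ar * br */
    double t2 = x.imag * y.imag;                        /* ai * bi */
    double t3 = (x.real + x.imag) * (y.real + y.imag);  /* (ar + ai)(br + bi) */

    sketch_z r;
    r.real = t1 - t2;       /* ar*br - ai*bi */
    r.imag = t3 - t1 - t2;  /* ar*bi + ai*br */
    return r;
}
```

The saving comes at the cost of extra additions and different rounding behaviour, which is one reason such methods are usually selected based on measured performance.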
Nageshwar Singh
4b56cc94da Added BLIS, BLAS, and CBLAS interface for cblas?amin
Details:
      - The amin API returns the index of the element with the minimum absolute value in a vector.
      - Added amin reference blis kernel.
      - Added blas and cblas interface for amin.

AMD-Internal: [CPUPL-1155]

Change-Id: I89c1e37e86950a4582bba70a5d8fc70ac915bd3c
2021-03-15 11:32:16 +05:30
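As an illustration of the amin semantics described above, here is a plain reference loop for real double-precision input. The name `sketch_idamin`, the 0-based return value, the first-match tie-breaking and the -1 result for invalid input are all assumptions of this sketch, not the signature or behaviour of the BLIS kernel.

```c
/* Hypothetical reference loop: index of the first element of x with the
 * smallest absolute value. Returns -1 if n or incx is not positive. */
long sketch_idamin(long n, const double *x, long incx)
{
    if (n <= 0 || incx <= 0)
        return -1;

    long best = 0;
    double best_abs = x[0] < 0.0 ? -x[0] : x[0];

    for (long i = 1; i < n; ++i) {
        double v = x[i * incx];
        double a = v < 0.0 ? -v : v;
        /* Strict comparison keeps the earliest index on ties. */
        if (a < best_abs) {
            best_abs = a;
            best = i;
        }
    }
    return best;
}
```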
Kiran Varaganti
602b99a41d Trace and Logging
In the DTL_Trace() function we call bli_init_auto() in order to enable trace for
BLAS/CBLAS APIs.
Added logging of input parameters for sgemv_ and dgemv_. Added function
tracing for amax_, axpy_ gemv_ and dotv_ BLAS functions.

Change-Id: I4483c1e918c3c78946f7377b4b69eba6af4e925e
2021-03-15 11:32:15 +05:30
Nageshwar Singh
c3cbabf25b Development of AVX2 axpyv kernels for c and z datatypes.
Details
    - Added Framework optimizations for BLAS and CBLAS interfaces for caxpyv_(cblas_caxpyv) and zaxpyv_ (cblas_zaxpyv).
    - Added new axpyv AVX2 kernels for c and z data types for AMD EPYC family.

AMD-Internal: [CPUPL-1231]

Change-Id: I9bc0c21fef9da84533adcef76427977430b27ea7
2021-03-15 11:32:15 +05:30
Dipal M Zambare
87fbb867de AOCLDTL: Corrected mapping between BLIS threads and trace files
Replaced gettid() syscall with omp_get_thread_num() to
create files for logging the data. This will ensure that there
is one to one mapping between threads created in BLIS and
ID's used to name trace and log files.

AMD-Internal: [CPUPL-1236]

Change-Id: I45b1721a7a9c855eeec43e7cbb5089f2a955ff72
2021-03-15 11:32:15 +05:30
Nageshwar Singh
b76e6fc5d6 cblas_?cabs1 test cblas header file bug
Details:
    - Added BLIS_ENABLE_CBLAS around cblas header file.

AMD-Internal: [CPUPL-1129]

Change-Id: I3baacd26aa96c8eeb753d95210817ffe9b2a3f85
2021-03-15 11:32:15 +05:30
managalv
5716dd8cf9 Optimised dotv kernel by SIMD approach and by removing framework overhead
Details:
    - Kernel is called directly from API call to avoid framework overhead in case of complex float and complex double precisions.
    - Added SIMD code for complex float and complex double and unrolled for loop 5 times to improve performance

AMD-Internal: [CPUPL-1057]

Change-Id: I3b9d202398cacc0168882c9d6da2b450c27466a0
2021-03-15 11:32:15 +05:30
Meghana Vankadari
1c6cf5c891 Added decision logic for zgemmt
AMD-Internal: [CPUPL-1032]
Change-Id: I8ba1c66b06cd91a864b16a249b263b3694ac1d5e
2021-03-15 11:32:15 +05:30
Meghana Vankadari
9b3bb86ebd Added CBLAS interface and test file for gemm_batch API
AMD-Internal: [CPUPL-1184]
Change-Id: Icc5c41429b0d92f1a66a955769cc0518ca4706ee
2021-03-15 11:32:15 +05:30
Meghana Vankadari
8d1c8ef35a Enabling framework optimizations for zen family architectures.
Details:
- Introduced a new macro 'BLIS_CONFIG_EPYC' to enable blas and cblas
  framework optimizations for zen family configurations.
- The macro needs to be defined in family.h files of respective arch
  configs.
- Moved zen2-specific optimized kernels to zen folder, in order to be
  accessible to all zen family architectures.

Change-Id: I8da2db6b7ab22ef350a01d86c214006e812eb06d
2021-03-15 11:32:15 +05:30
Meghana Vankadari
a29536f21c Added decision logic for DGEMMT for zen2 configuration
AMD-Internal: [CPUPL-1044]
Change-Id: Ifc4b82dcfce5aa6770928010d430b0832c07cb41
2021-03-15 11:32:15 +05:30
Kiran Varaganti
4e7ea09e44 Fixed Logs & code cleanup
Fixed AOCL DTL logs printing incorrect alpha and beta values for single
precision. Added missing info like data types, lower or upper triangular and
side parameter in the case of TRSM.
Code cleanup and formatting the files test_cabs1.c, test_axpbyv.c and
test_gemm.c. In dumping trsm parameters replaced 'side' with bli_is_right(side).

Change-Id: Ic81503ae696956eb074ec208f7109d1a394183d7
2021-03-15 11:32:15 +05:30
Meghana Vankadari
a2ab035c33 Added debug trace and log support for gemmt and TRSM APIs
Details:
- Added debug trace support for DGEMMT and DTRSM APIs.
- Added log support for gemmt, trsm APIs.
- Modified gemm dump_sizes function to dump transpose parameters.

AMD-Internal: [CPUPL-1210]
Change-Id: Ice1effe27ec349203ce5def030a6b85b204bd91e
2021-03-15 11:32:15 +05:30
Meghana Vankadari
9f968bf78d Added BLAS interface for gemm_batch API
Details:
- gemm_batch API computes a series of GEMM for groups of general
  matrices.
- Each group contains matrices with same parameters.
- This API is part of the BLAS extension APIs.

AMD-Internal: [CPUPL-1184]
Change-Id: Ic23772830eb1d157da4db45158a039b0826419fd
2021-03-15 11:32:15 +05:30
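The grouping idea described above can be sketched with a naive loop: each group carries one set of parameters, and every matrix triple in that group is multiplied with them. Everything here is hypothetical (names, argument order, column-major storage, no transposes, tightly packed leading dimensions) and does not reflect the actual gemm_batch interface added in this commit.

```c
/* Naive column-major C = alpha * A * B + beta * C with packed storage. */
static void sketch_dgemm(int m, int n, int k, double alpha,
                         const double *a, const double *b,
                         double beta, double *c)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += a[i + p * m] * b[p + j * k];
            c[i + j * m] = alpha * acc + beta * c[i + j * m];
        }
}

/* Hypothetical grouped batch: group g holds group_size[g] problems that
 * all share m[g], n[g], k[g], alpha[g] and beta[g]. The a, b and c arrays
 * list the matrices of all groups back to back. */
void sketch_dgemm_batch(int group_count, const int *group_size,
                        const int *m, const int *n, const int *k,
                        const double *alpha, const double *beta,
                        const double **a, const double **b, double **c)
{
    int idx = 0; /* running index over all matrices in the batch */
    for (int g = 0; g < group_count; ++g)
        for (int s = 0; s < group_size[g]; ++s, ++idx)
            sketch_dgemm(m[g], n[g], k[g], alpha[g],
                         a[idx], b[idx], beta[g], c[idx]);
}
```

Sharing parameters per group keeps the argument lists short when many small problems have identical shapes.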
Nageshwar Singh
45c0dc1e5e BLIS: CBLAS Extensions. cblas_?cabs1 : Absolute value of a complex number cabs1
Details:
   - added cblas extension cblas_?cabs1.
   - Functionality: res = |Re(z)| + |Im(z)|, where z is a complex number and res holds the resulting cabs1 value of z.

AMD-Internal: [CPUPL-1129]

Change-Id: I4a3c265c89527c8fd3060c5d2ed38b1953ce6343
2021-03-15 11:32:15 +05:30
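The formula res = |Re(z)| + |Im(z)| from the commit above can be written out as a small sketch. The type `sketch_dcomplex` and function `sketch_dcabs1` are hypothetical names for illustration and are not the cblas_?cabs1 signature.

```c
/* Hypothetical double-precision complex type for illustration only. */
typedef struct { double real, imag; } sketch_dcomplex;

/* Absolute value of a real number without pulling in the math library. */
static double sketch_abs(double v)
{
    return v < 0.0 ? -v : v;
}

/* cabs1-style measure: |Re(z)| + |Im(z)|. Note that this is not the
 * Euclidean modulus sqrt(Re(z)^2 + Im(z)^2). */
double sketch_dcabs1(sketch_dcomplex z)
{
    return sketch_abs(z.real) + sketch_abs(z.imag);
}
```

For z = 3 - 4i this gives 7, whereas the Euclidean modulus would be 5.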
Meghana Vankadari
ac217b642d Added few changes in test_axpbyv.c file
Details:
- Corrected "#if" directive in line 89
- Commented out "#define print" to disable printing the vectors

Change-Id: I9ec3cbfb716540dd3e2264f5c3925d9e0c0c294a
2021-03-15 11:32:15 +05:30