Commit Graph

2227 Commits

Author SHA1 Message Date
Madan mohan Manokar
3ab9104dae Handling zgemm real(+/-1) alpha and beta
1. Improved performance when zgemm's alpha and beta are real and equal to +/-1.
2. Changed bli_zgemmsup_rv_zen_asm_3x4n.
3. Changed bli_zgemmsup_rv_zen_asm_3x4m.
4. Changed bli_zgemm_haswell_asm_3x4.

Change-Id: Ic14d8507b264c24a8748febf6bc73eb60e476430
AMD-Internal: [CPUPL-1352]
2021-02-10 02:58:58 -05:00
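The reasoning behind this special case can be sketched in plain Python. This is an illustration only, not the assembly kernels named in the commit: the function name `update_c` is hypothetical, and it models a single element of C := beta*C + alpha*(A*B). When a scaling factor is real and equal to +1 or -1, the complex multiply collapses to a copy or a negation.

```python
def update_c(c, ab, alpha, beta):
    """Return beta*c + alpha*ab for one complex element.

    Hypothetical helper: `c` is the existing element of C and `ab` is the
    already-computed element of the product A*B.
    """
    def scale(value, factor):
        # A real +1 or -1 needs no complex multiply: copy or negate.
        if factor == 1:
            return value
        if factor == -1:
            return -value
        # General case: full complex scaling.
        return factor * value

    return scale(c, beta) + scale(ab, alpha)
```

A complex multiply costs four real multiplies and two real adds, so skipping it for both alpha and beta removes most of the arithmetic in the update step.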
managalv
1ff4981203 Modified blas interface of TRSM to call TRSV whenever m=1 or n=1.
Case 1: Call TRSV when matrices C & B are vectors & A is a matrix,
        i.e. when n = 1 for the left side and when m = 1 for the right side.
Case 2: Divide B/A when matrices C & B are vectors & A is a scalar (diagonal element),
        i.e. when m = 1 for the left side and when n = 1 for the right side.
For the right side, transpose the complete operation; change upper to lower and
vice versa when A is being transposed.

Change-Id: Ie87e4a263c287ba554832ccc56b629f982e3ac4c
2021-02-08 19:02:25 +05:30
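The dispatch described in this commit can be sketched as a small decision function. This is a hedged illustration, not the BLIS source: the function name `choose_trsm_path` and the returned labels are hypothetical, and checking Case 1 before Case 2 when m = n = 1 is an assumption, since the commit message does not state the order.

```python
def choose_trsm_path(side, m, n):
    """Pick a code path for a TRSM call where B is m x n.

    side: 'L' (A is m x m) or 'R' (A is n x n).
    Returns 'trsv', 'scalar_divide' or 'trsm' (hypothetical labels).
    """
    if side not in ("L", "R"):
        raise ValueError("side must be 'L' or 'R'")

    # Left side: B is a vector when n == 1, A is a scalar when m == 1.
    # Right side: the roles of m and n are swapped.
    vector_dim, scalar_dim = (n, m) if side == "L" else (m, n)

    if vector_dim == 1:
        return "trsv"            # Case 1: B is a vector, A is a matrix
    if scalar_dim == 1:
        return "scalar_divide"   # Case 2: A is a single diagonal element
    return "trsm"                # general case
```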
Madan mohan Manokar
f1ea1f1d34 Adaptive zgemm
1. 3m1 chosen for (m<=128) & (68>n<=128) & (k<=128)
2. Default blis3.1 path for rest of the sizes.

Change-Id: I1e50dece013e72a67f1162faef5cbeb9bfbbc23a
AMD-Internal: [CPUPL-1352]
2021-02-03 12:43:57 +05:30
Meghana Vankadari
2e7cf8d82f Added 16x4 AXPYF kernel for zen2 config
Details:
- Added a new AXPYF kernel with fuse_factor = 4 and iter_unroll = 4.
- Modified blas interface of GEMM to call GEMV whenever m=1 or n=1.

Change-Id: I3f5acd37b009f53cf63f462cec79fd3e73676dbc
2021-02-02 21:22:44 +05:30
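For readers unfamiliar with AXPYF, the operation fuses several AXPY updates into one pass: y := y + alpha * A * x, where A has as many columns as the fuse factor. The sketch below is a scalar Python model under stated assumptions: the function name `axpyf` mirrors the operation name but is not the kernel from this commit, conjugation options are omitted, and no SIMD or unrolling is modelled.

```python
def axpyf(alpha, a_cols, x, y):
    """Update y in place with y := y + alpha * A * x.

    a_cols holds the columns of A (one list per column); the number of
    columns plays the role of the fuse factor (4 in the commit above).
    """
    fuse = len(a_cols)
    # Pre-scale x once so the inner loop is a plain multiply-add.
    scaled = [alpha * x[j] for j in range(fuse)]
    for i in range(len(y)):
        acc = y[i]                      # y[i] is loaded once ...
        for j in range(fuse):
            acc += scaled[j] * a_cols[j][i]
        y[i] = acc                      # ... and stored once
    return y
```

Compared with four separate AXPY calls, the fused form reads and writes each element of y once instead of four times.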
dzambare
48f2366b6f Updated BLIS version string to "AOCL BLIS X.x" format
AMD-Internal : [CPUPL-1394]

Change-Id: Ifebcb14d9eb064d231b831f5a1e151853ad5a009
2021-01-07 12:38:32 +05:30
Nagendra Prasad M
566f586547 Merge "Blis: DOTC Additional argument for Complex types when using FLANG" into amd-staging-milan-3.1
2020-12-21 06:03:11 -05:00
nprasadm
10ac4e2aba Blis: DOTC Additional argument for Complex types when using FLANG
Merged the changes done in UT Austin BLIS repo for DOTC Additional
argument.
Other modifications related to test application included.

Verified the above code changes through scalapack test applications 'xztrd', 'xctrd'

Change-Id: I7e16f3953db71890f9e8fbb0f7b363eaad899f62
Signed-off-by: Nagendra <Nagendra.PrasadM@amd.com>
AMD-Internal: [CPUPL-1323]
2020-12-16 14:03:10 +05:30
Kiran Varaganti
fc80892bb2 Improve sup GEMM performance (CCC - row prefer kernel)
In the column-storage (CCC) case where m is large and n & k are relatively small, row-preferred kernels are used,
and var1n sup kernels are called. However, block-panel var2m works better here.
After the induced transposition, n becomes m, which is large, and m becomes n, which is smaller.
The micropanels of the induced B are larger than the micropanels of the induced A, so var2m is a better option than var1n.
[CPUPL-1376]

Change-Id: I9214140d340ea4ac3edfefc31c465c926ba93326
2020-12-10 19:16:44 +05:30
Dipal M Zambare
66fd5e547a Update AMD copyright notice for current year.
Change-Id: I2ffd3d3306499922be15638d37c4d1e806acd36c
AMD-Internal: [CPUPL-1367]
2020-12-10 13:44:29 +05:30
Dipal M Zambare
38a8008cd8 Enabled znver3 flag for zen3 architecture
The znver3 flag is enabled if the compiler is AOCC Clang version 3.0
and the configuration is zen3.

Change-Id: Ie164f4d469bf3f8df31ccf8fed9f80dfc62efb39
AMD-Internal: [CPUPL-1353]
2020-12-04 12:28:22 +05:30
Meghana Vankadari
e083caf01d Merge "Correcting zdotc definition error for configs other than zen family" into amd-staging-milan-3.0
2020-12-01 06:20:10 -05:00
Dipal M Zambare
c2f63fcc54 Update amd64 bundle configuration
The configuration is updated to

   - Enable EPYC architecture optimizations
   - Macros to override block sizes.

AMD-Internal : [CPUPL-1350]

Change-Id: Id712f9abe6e81c9ece2baaab9d965b405e72977a
2020-12-01 14:37:13 +05:30
Meghana Vankadari
11b4cd8fc5 Correcting zdotc definition error for configs other than zen family
Details:
- when BLIS_CONFIG_EPYC is not defined, zdotc is defined twice.
  - One definition is part of macro based code.
  - Other definition is implemented as part of framework optimizations.
- Modified the bla_dot.c file to choose macro based code for configs
  other than zen family.

AMD-Internal: [CPUPL-1348]
Change-Id: I9ef6a590a6199e173d38248c3fb72feddfb20922
2020-12-01 13:33:59 +05:30
bhaskarn
91909c1562 Fix for segmentation crash in dgemmsup kernels
Description:

[AMD Internal]: CPUPL-1336

Removed extra/unnecessary loads in dgemmsup kernels that were
accessing memory beyond the boundaries and causing segmentation
faults.

Kernels:
bli_dgemmsup_rd_haswell_asm_1x4
bli_dgemmsup_rv_haswell_asm_1x6

Change-Id: Idaeed36ebd9f13550943394a37e372b8d015b2d3
2020-11-24 10:15:57 -05:00
Kumar, Phani
477fc41fff Cmake script changes and blis.h changes for amd-staging-milan-3.0
AMD Internal : [CPUPL-1083]

Change-Id: Ia29a1f328ee32e2aec59a7fc70c04400d6ee6580
2020-11-24 06:12:25 -05:00
Dipal M Zambare
0a3d94c9a2 Updated test drivers for dotv, scalv and swapv.
Added traces in the cblas layer for these APIs.
These test drivers didn't have calls for complex data
types; the drivers are updated to support them.

AMD-Internal : [CPUPL-1315]

Change-Id: Ia52ecca68ea17314315d626b57c46a2f5973985b
2020-11-24 10:26:32 +05:30
Meghana Vankadari
97753d8e6b Modified log routines for gemm, gemmt and trsm
Details:
- Modified log routines to accept inputs from blas layer instead of
  oapi level.

AMD-Internal: [CPUPL-1332]
Change-Id: If33c3585af92e617910ae8f7d442d1275119bbfc
2020-11-23 04:53:15 -05:00
Madan mohan Manokar
1d8fab0996 Test driver fix for her and her2
Fixed test driver code for her, her2
Added support for complex and double complex data types in the test driver.

Change-Id: If65939e99d8cf77e0fb70561166d84bf67d0321d
AMD-Internal: [CPUPL-1326]
2020-11-23 04:10:43 -05:00
Dipal Madhukar Zambare
22270aa9e4 Merge "Added debug log and trace for gemv and dotv for blis and cblas interface" into amd-staging-milan-3.0
2020-11-23 03:51:48 -05:00
managalv
fdc0e70cd8 Added debug log and trace for gemv and dotv for blis and cblas interface
AMD Internal: [CPUPL-1314]

Change-Id: I2708fd9c73419c968c8e02ff11545645dc639052
2020-11-23 19:55:21 +05:30
Kiran Varaganti
80a516382e Fixed wrong dimensions check in bench/bench_gemm.c application
The check for valid values of m, n, k, lda, ldb and ldc is removed,
since the bench app is run on logs collected from AOCL traces.
A correct check should consider the transpose parameter and the storage order.

Change-Id: If0fbf733c2650c6f328661293eb99d062685d638
2020-11-20 20:39:20 +05:30
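A dimension check that does take the transpose parameters and storage order into account could look like the sketch below. It follows the standard BLAS/CBLAS leading-dimension rules for GEMM and is not code from bench/bench_gemm.c; the function names and the 'C'/'R' order codes are hypothetical.

```python
def min_leading_dims(order, transa, transb, m, n, k):
    """Return the minimum (lda, ldb, ldc) for a GEMM call.

    order: 'C' for column-major, 'R' for row-major (hypothetical codes).
    transa/transb: 'N' for no transpose, anything else for (conj-)transpose.
    """
    # Shapes of A and B as stored in memory, before op() is applied.
    a_rows, a_cols = (m, k) if transa == "N" else (k, m)
    b_rows, b_cols = (k, n) if transb == "N" else (n, k)

    if order == "C":
        dims = (a_rows, b_rows, m)   # leading dimension = stored row count
    elif order == "R":
        dims = (a_cols, b_cols, n)   # leading dimension = stored column count
    else:
        raise ValueError("order must be 'C' or 'R'")
    return tuple(max(1, d) for d in dims)


def dims_valid(order, transa, transb, m, n, k, lda, ldb, ldc):
    """True if the sizes are non-negative and the leading dimensions fit."""
    if min(m, n, k) < 0:
        return False
    min_lda, min_ldb, min_ldc = min_leading_dims(order, transa, transb, m, n, k)
    return lda >= min_lda and ldb >= min_ldb and ldc >= min_ldc
```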
Madan Mohan Manokar
35d33bab6a Merge "Test driver fix for her, her2, herk and her2k" into amd-staging-milan-3.0
2020-11-19 05:47:31 -05:00
Madan mohan Manokar
38698f0dfd Test driver fix for her, her2, herk and her2k
Fixed test driver code for the her, her2, herk and her2k functions.
These functions support only complex and double complex data types; the test code is updated accordingly.

Change-Id: Iee7b79abda4a2959a265c420d23879bf47f2c38d
AMD-Internal: [CPUPL-1313]
2020-11-19 12:58:21 +05:30
satish kumar nuggu
17f994bd15 Added Blas interface for ?imatcopy, ?omatcopy, ?omatadd, ?omatcopy2
AMD-Internal: [CPUPL-1116]
Original review was in this commit http://gerrit-git.amd.com/c/cpulibraries/er/blis/+/428165.
Added new commit for transpose APIs

Change-Id: I322389cc0be0aaccf82d1d0bb4476beea8694cd8
2020-11-18 12:55:36 +05:30
Dipal M Zambare
ce99b1ecef Added dynamic block size selection logic for DGEMM.
Block sizes (MC, KC, NC) for DGEMM are determined at runtime
based on the following parameters:

    - Single or multithreaded build
    - Processor architecture (currently only zen3 is supported)
    - Number of threads requested while running the library

Change-Id: Ia793484b77adb87486e630d0d3b4c7856ae52094
AMD-Internal: [CPUPL-660, CPUPL-661]
2020-11-12 22:40:38 +05:30
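The selection logic described in this commit can be pictured as a lookup keyed on the three parameters. The sketch below is purely illustrative: the function name, the thread-count threshold and every block-size number are placeholders chosen for the example, not values taken from BLIS.

```python
from typing import NamedTuple


class BlockSizes(NamedTuple):
    mc: int
    kc: int
    nc: int


# Placeholder values for illustration only; NOT the values used by BLIS.
DEFAULT_SIZES = BlockSizes(mc=64, kc=256, nc=4096)
ZEN3_SINGLE_THREAD = BlockSizes(mc=128, kc=512, nc=4096)
ZEN3_FEW_THREADS = BlockSizes(mc=128, kc=256, nc=4096)
ZEN3_MANY_THREADS = BlockSizes(mc=64, kc=256, nc=2048)

# Hypothetical cut-over point between "few" and "many" threads.
MANY_THREADS_THRESHOLD = 16


def select_dgemm_block_sizes(multithreaded_build, arch, num_threads):
    """Choose (MC, KC, NC) at runtime from build type, arch and threads."""
    if arch != "zen3":
        # Only zen3 has dynamic selection; everything else keeps defaults.
        return DEFAULT_SIZES
    if not multithreaded_build or num_threads <= 1:
        return ZEN3_SINGLE_THREAD
    if num_threads < MANY_THREADS_THRESHOLD:
        return ZEN3_FEW_THREADS
    return ZEN3_MANY_THREADS
```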
Dipal M Zambare
6abd193144 Corrected thread id generation in DTL for BLIS.
Added blis.h in aoclos.c in order to check if BLIS was
built with OpenMP support.

AOCL-Internal: [CPUPL-1238]

Change-Id: I366da030266b9d7f2ad09dc722847a7d86b85933
2020-11-12 09:13:15 +05:30
managalv
5a57eeadfa Disable 3m1 method for complex GEMM
Details:
The native method is enabled for complex gemm.
Performance on large datasets needs to be measured before enabling the induced method.

AMD-Internal: [CPUPL-1300]

Change-Id: I5444dd31e8b8e73da73f789da8b64276e8e40de8
2020-11-11 19:57:59 +05:30
bhaskarn
008fe49df6 Added bench application for trsm
Description:
     Added bench_trsm.c to read inputs from AOCL DTL logs to benchmark
     Added sample input file

Change-Id: I6806e42244bf775cbed457553ca07fb0222ef597
2020-11-09 13:06:39 -05:00
Madan mohan Manokar
ac6dbdcdfb Fixing the logs
Fixed data type issue in logs.

Change-Id: I3b9fb2921fd9db57a734c7a2866b53f1b51adfdb
AMD-Internal: [CPUPL-1249]
2020-11-09 20:39:23 +05:30
mkurumel
39f7a4eecf Add AOCL DTL logging.
Added logging for syr, syr2, syrk, syr2k, trmm, trmv, trsv.

AMD-Internal: [CPUPL-1256]

Change-Id: I628ef5d48796cfc68ec68886b8c1b0555261b3d1
2020-11-09 14:40:43 +05:30
Madan mohan Manokar
bded7f9392 Log fix
Added function definition of her2k

Change-Id: Ia7ccd72772cdcafdcf5cb8a21c6746b13c70b158
AMD-Internal: [CPUPL-1249]
2020-11-09 10:33:08 +05:30
Mangala V
1578e8b874 Merge "Optimised AXPYF routine for complex float and complex double" into amd-staging-milan-3.0
2020-11-06 02:05:02 -05:00
managalv
aae48c2221 Optimised AXPYF routine for complex float and complex double
Details:
    - Added SIMD code
    - Processing 5 rows at a time in SIMD loop to improve performance

AMD-Internal: [CPUPL-1054]

Change-Id: I2ac93f25895dccfc42e14be0689e6d4e655d6a0a
2020-11-06 18:42:13 +05:30
Dipal Madhukar Zambare
1fb9e1d029 Merge "Re-enable support for Intel 19+ compiler." into amd-staging-milan-3.0
2020-11-06 02:00:34 -05:00
Kiran Varaganti
a15e531374 Merge "Benchmark using AOCL Logs as input" into amd-staging-milan-3.0
2020-11-06 01:51:38 -05:00
Kiran Varaganti
60642d98a3 Benchmark using AOCL Logs as input
Added benchmark application for gemm - input is a log file generated from AOCL
DTL from BLIS.

Change-Id: I2ac7a3c48d5a37c5b24ec0f0cff7e7886dad0b99
2020-11-06 14:31:53 +05:30
Dipal M Zambare
4347d2d823 Re-enable support for Intel 19+ compiler.
Note that there is a known issue with Intel 19+ as explained
in https://github.com/flame/blis/issues/371.

AMD version needs this support as some user applications
need ICC support.

AMD-Internal: [CPUPL-1223]

Change-Id: I86ddee068ae18bd940a5952d60960228d8100e97
2020-11-06 11:11:46 +05:30
Madan mohan Manokar
ec35717174 Logging
Added logging for her2k, her, herk, nrm2, symm, symv

Change-Id: Ib3af83b6f8aaafb69fb5d78e964c45504f74f79c
AMD-Internal: [CPUPL-1249]
2020-11-06 10:16:31 +05:30
managalv
68b4ff976b Added Function trace and Input logging for dotv and gemv
Change-Id: I992bd80b2322d6c387f609ecd70c1109c13f6254
AMD-Internal: [CPUPL-1274]
2020-11-06 03:53:43 +05:30
Nageshwar Singh
c40bb45bdf Added debug log and trace support for hemm
AMD-Internal: [CPUPL-1253]

Change-Id: I95e9a864800a09f24c94926936ada8ec8728f1a5
2020-11-05 18:31:41 +05:30
Nageshwar Singh
963277f8f9 Added debug log support for axpy, axpyb, amax, asum, hemv, her2
AMD-Internal: [CPUPL-1253]

Change-Id: I90cabed86a3796385656b34d368588500e9df71c
2020-11-03 20:44:12 +05:30
Meghana Vankadari
0775f09b41 Added debug trace and log support for copy and ger routines
Change-Id: Id7fb64c0a626b2f8f53e89ee7df4391693eb4f4c
2020-11-02 22:56:58 -05:00
Kiran Varaganti
65daaab6ac Fix Bug in DTL
When the library is built single-threaded and trace is enabled, the test
applications in the test folder fail to compile. In the file aoclos.c the function
AOCL_gettid() uses omp_get_thread_num() to get the thread id, which is only
available when the OpenMP-based parallel BLIS library is generated. To fix this, in
the single-thread case we now return zero for the thread id; the OpenMP function is used
only when the BLIS_ENABLE_OPENMP macro is defined. However, this is not a complete
fix: if the library is built with pthreads, AOCL_gettid() always returns 0, which is
not the intended behaviour.

Change-Id: I5b79ed57d27d0022d3dcab0e2a3a557c8e4ff8ee
2020-11-02 12:05:09 +05:30
Madan mohan Manokar
cd9a751aa0 Trace and logging
Added logging for scal, swap.

Change-Id: Ie7ebf77eb8e0a961fe8cf9f42d99600e6daff8ff
AMD-Internal: [CPUPL-1249]
2020-10-30 19:12:19 +05:30
Madan Mohan Manokar
86f3d1a412 Merge "Trace and logging" into amd-staging-milan-3.0
2020-10-30 08:35:02 -04:00
Manideep Kurumella
1c29554b9e Merge "Trace and logging." into amd-staging-milan-3.0
2020-10-30 08:10:12 -04:00
Nageshwar Singh
33b5867db3 Merge "Trace and logging" into amd-staging-milan-3.0
2020-10-30 06:54:56 -04:00
mkurumel
6ab8f607d8 Trace and logging.
Added tracing for syr, syr2, syrk, syr2k, trmm, trmv, trsv.

AMD-Internal: [CPUPL-1255]

Change-Id: Id04ff95d7c8fb5854440f79a14e47a5f40096ded
2020-10-30 11:54:39 +05:30
Madan mohan Manokar
2803e9b761 Trace and logging
Added function tracing for her2k, her, herk, nrm2, scal, swap, symm, symv.

Change-Id: I93a97b7000c632f550eab1317b3cafad8c539937
AMD-Internal: [CPUPL-1249]
2020-10-30 11:46:34 +05:30
Nageshwar Singh
5a83365a6c Trace and logging
Details:
   - Added function tracing for bla_asum, bla_axpby, bla_hemm, bla_hemv, bla_her2.

AMD-Internal: [CPUPL-1253]

Change-Id: I08b4cab46d167aceb8123c7f8b19e21a263fe2b8
2020-10-30 10:46:09 +05:30