288 Commits

Author SHA1 Message Date
Edward Smyth
1f0fb05277 Code cleanup: Copyright notices (2)
More changes to standardize copyright formatting and correct years
for some files modified in recent commits.

AMD-Internal: [CPUPL-5895]
Change-Id: Ie95d599710c1e0605f14bbf71467ca5f5352af12
2025-02-07 05:41:44 -05:00
Arnav Sharma
2f2741f4ab Fixed Bug in test_scalv
- Since bli_obj_length(&a) is being used to get the length of A vector,
  we need to initialize A as a vector of M x 1 dimension as the
  bli_obj_length(...) will return the M dimension of the object.
- The vector was being initialized with a dimension of 1 x N resulting
  in bli_obj_length(&a) always returning 1 as the length of the vector.

AMD-Internal: [CPUPL-6297]
Change-Id: Id0e79752f9b81c1573deda3dd32ef0fef10df50c
2025-01-14 23:53:53 -05:00
jagar
ef011c9d92 CMake : Added cmake support for "test" in blis.
command to build test
$ "cmake --build . --target test_blis"

AMD-Internal: [CPUPL-2748]
Change-Id: I088af68115f3d9fdd007203dd22a42f53478a44f
2024-08-06 19:34:57 +05:30
Edward Smyth
82bdf7c8c7 Code cleanup: Copyright notices
- Standardize formatting (spacing etc).
- Add full copyright to cmake files (excluding .json)
- Correct copyright and disclaimer text for frame and
  zen, skx and a couple of other kernels to cover all
  contributors, as is commonly used in other files.
- Fixed some typos and missing lines in copyright
  statements.

AMD-Internal: [CPUPL-4415]
Change-Id: Ib248bb6033c4d0b408773cf0e2a2cda6c2a74371
2024-08-05 15:35:08 -04:00
Edward Smyth
2ee46a3a3a Merge commit 'cfa3db3f' into amd-main
* commit 'cfa3db3f':
  Fixed bug in mixed-dt gemm introduced in e9da642.
  Removed support for 3m, 4m induced methods.
  Updated do_sde.sh to get SDE from GitHub.
  Disable SDE testing of old AMD microarchitectures.
  Fixed substitution bug in configure.
  Allow use of 1m with mixing of row/col-pref ukrs.

AMD-Internal: [CPUPL-2698]
Change-Id: I961f0066243cf26aeb2e174e388b470133cc4a5f
2024-07-08 06:09:11 -04:00
Edward Smyth
ed5010d65b Code cleanup: AMD copyright notice
Standardize format of AMD copyright notice.

AMD-Internal: [CPUPL-3519]
Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0
2023-11-23 08:54:31 -05:00
Edward Smyth
9500cbee63 Code cleanup: spelling corrections
Corrections for some spelling mistakes in comments.

AMD-Internal: [CPUPL-3519]
Change-Id: I9a82518cde6476bc77fc3861a4b9f8729c6380ba
2023-11-09 00:16:30 -05:00
Edward Smyth
f5505be9f3 Merge commit 'e366665c' into amd-main
* commit 'e366665c':
  Fixed stale API calls to membrk API in gemmlike.
  Fixed bli_init.c compile-time error on OSX clang.
  Fixed configure breakage on OSX clang.
  Fixed one-time use property of bli_init() (#525).
  CREDITS file update.
  Added Graviton2 Neoverse N1 performance results.
  Remove unnecesary windows/zen2 directory.
  Add vzeroupper to Haswell microkernels. (#524)
  Fix Win64 AVX512 bug.
  Add comment about make checkblas on Windows
  CREDITS file update.
  Test installation in Travis CI
  Add symlink to blis.pc.in for out-of-tree builds
  Revert "Always run `make check`."
  Always run `make check`.
  Fixed configure script bug. Details: - Fixed kernel list string substitution error by adding function substitute_words in configure script.   if the string contains zen and zen2, and zen need to be replaced with another string, then zen2   also be incorrectly replaced.
  Update POWER10.md
  Rework POWER10 sandbox
  Skip clearing temp microtile in gemmlike sandbox.
  Fix asm warning
  Sandbox header edits trigger full library rebuild.
  Add vhsubpd/vhsubpd.
  Fixed bugs in cpackm kernels, gemmlike code.
  Armv8A Rename Regs for Safe Darwin Compile
  Armv8A Rename Regs for Clang Compile: FP32 Part
  Armv8A Rename Regs for Clang Compile: FP64 Part
  Asm Flag Mingling for Darwin_Aarch64
  Added a new 'gemmlike' sandbox.
  Updated Fugaku (a64fx) performance results.
  Add explicit compiler check for Windows.
  Remove `rm-dupls` function in common.mk.
  Travis CI Revert Unnecessary Extras from 91d3636
  Adjust TravisCI
  Travis Support Arm SVE
  Added 512b SVE-based a64fx subconfig + SVE kernels.
  Replace bli_dlamch with something less archaic (#498)
  Allow clang for ThunderX2 config

AMD-Internal: [CPUPL-2698]
Change-Id: I561ca3959b7049a00cc128dee3617be51ae11bc4
2023-10-18 09:09:54 -04:00
Chandrashekara K R
7b78d93282 Removing omp library linking to static multithreaded library build.
Description: We have seen the library dependency issue when we are
linking the libomp.lib or libiomp5md.lib while building the library
for static multithreaded scenario. So we are removing the linking of
openmp library for static multithreaded blis library build. So that
user can link any openmp library(libomp.lib or libiomp5md.lib) while
building their applications by linking static multithreaded blis library.

AMD-Internal: [SWLCSG-2196]
Change-Id: I96722f3587ee555af12de664957c211c56fcf03d
2023-07-13 06:54:02 -04:00
Edward Smyth
7e50ba669b Code cleanup: No newline at end of file
Some text files were missing a newline at the end of the file.
One has been added.

Also correct file format of windows/tests/inputs.yaml, which
was missed in commit 0f0277e104

AMD-Internal: [CPUPL-2870]
Change-Id: Icb83a4a27033dc0ff325cb84a1cf399e953ec549
2023-04-21 10:02:48 -04:00
Edward Smyth
0f0277e104 Code cleanup: dos2unix file conversion
Source and other files in some directories were a mixture of
Unix and DOS file formats. Convert all relevant files to Unix
format for consistency. Some Windows-specific files remain in
DOS format.

AMD-Internal: [CPUPL-2870]
Change-Id: Ic9a0fddb2dba6dc8bcf0ad9b3cc93774a46caeeb
2023-04-21 08:41:16 -04:00
Harihara Sudhan S
42d631bced Copyright modification
- Added copyright information to modified/newly created
          files missing them

Change-Id: If4e73b680246d0363de09587d6dc54bee00ecd71
2022-10-14 12:43:35 +05:30
Dipal M Zambare
61232d540c AOCL progress callback hardening
- BLIS uses callback function to report the progress of the
  operation. The callback is implemented in the user application
  and is invoked by BLIS.

- Updated callback function prototype to make all arguments const.
  This will ensure that any attempt to write using callback’s
  argument is prevented at the compile time itself.

AMD-Internal: [CPUPL-2504]
Change-Id: I8ceb671242365d2a9155b485301cd8c75043e667
2022-09-14 15:32:10 +05:30
Chandrashekara K R
43c16d8e08 AOCL-Windows: Updating the blis windows build system.
1. Removed the libomp.lib hardcoded from cmake scripts and made it user configurable. By default libomp.lib is used as an omp library.
2. Added the STATIC_LIBRARY_OPTIONS property in set_target_properties cmake command to link omp library to build static-mt blis library.
3. Updated the blis_ref_kernel_mirror.py to give support for zen4 architecture.

AMD-Internal: CPUPL-1630
Change-Id: I54b04cde2fa6a1ddc4b4303f1da808c1efe0484a
2022-05-17 18:12:51 +05:30
Dipal M Zambare
e712ffe139 Added AOCL progress support for BLIS
-- AOCL libraries are used for lengthy computations which can go
     on for hours or days, once the operation is started, the user
     doesn’t get any update on current state of the computation.
     This (AOCL progress) feature enables user to receive a periodic
     update from the libraries.
  -- User registers a callback with the library if it is interested
     in receiving the periodic update.
  -- The library invokes this callback periodically with information
     about current state of the operation.
  -- The update frequency is statically set in the code, it can be
     modified as needed if the library is built from source.
  -- These feature is supported for GEMM and TRSM operations.

  -- Added example for GEMM and TRSM.
  -- Cleaned up and reformatted test_gemm.c and test_trsm.c to
     remove warnings and making indentation consistent across the
     file.

AMD-Internal: [CPUPL-2082]
Change-Id: I2aacdd8fb76f52e19e3850ee0295df49a8b7a90e
2022-05-17 18:10:39 +05:30
Chandrashekara K R
97fbff4b65 AOCL_Windows: Updated windows build system.
Updated the windows build system to link the user given openmp
library using -DOpenMP_libomp_LIBRARY=<Desired lib name> option
using command line or through cmake-gui application to build
blis library and its test applications. If user not given any
openmp library then by default openmp library will be
C:/Program Files/LLVM/lib/libomp.lib.

Change-Id: I07542c79454496f88e65e26327ad76a7f49c7a8c
2022-05-17 18:08:57 +05:30
Meghana Vankadari
c11fd5a8f6 Added functionality support for dzgemm
AMD-Internal: [SWLCSG-1012]
Change-Id: I2eac3131d2dcd534f84491289cbd3fe7fb7de3da
2022-05-17 18:01:55 +05:30
Chandrashekara K R
b3553c08fa AOCL-Windows: Updating the blis windows build system.
1. Removed the libomp.lib hardcoded from cmake scripts and made it user configurable. By default libomp.lib is used as an omp library.
2. Added the STATIC_LIBRARY_OPTIONS property in set_target_properties cmake command to link omp library to build static-mt blis library.
3. Updated the blis_ref_kernel_mirror.py to give support for zen4 architecture.

AMD-Internal: CPUPL-1630
Change-Id: I54b04cde2fa6a1ddc4b4303f1da808c1efe0484a
2021-12-22 14:47:15 +05:30
lcpu
30038af896 Reverted: To fix accuracy issues for complex datatypes
Details:
-- reverted cscalv,zscalv,ctrsm,ztrsm changes to address accuracy issues
observed by libflame and scalapack application testing.
-- AMD-Internal: [CPUPL-1906], [CPUPL-1914]

Change-Id: Ic364eacbdf49493dd3a166a66880c12ee84c2204
2021-11-12 08:58:57 +05:30
Nageshwar Singh
a263146a4c Optimized scalv for complex data-types c and z (cscalv and zscalv)
AMD-Internal: [CPUPL-1551]
Change-Id: Ie6855409d89f1edfd2a27f9e5f9efa6cd94bc0c9
2021-11-12 08:58:53 +05:30
Meghana Vankadari
9a5b15da68 Disabled dzgemm_ function call in test_gemm.c
Change-Id: I31da12cbaf50cf7fe44baf97de3c39896c4ccfb1
2021-11-12 08:58:52 +05:30
Field G. Van Zee
f065a8070f Removed support for 3m, 4m induced methods.
Details:
- Removed support for all induced methods except for 1m. This included
  removing code related to 3mh, 3m1, 4mh, 4m1a, and 4m1b as well as any
  code that existed only to support those implementations. These
  implementations were rarely used and posed code maintenance challenges
  for BLIS's maintainers going forward.
- Removed reference kernels for packm that pack 3m and 4m micropanels,
  and removed 3m/4m-related code from bli_cntx_ref.c.
- Removed support for 3m/4m from the code in frame/ind, then reorganized
  and streamlined the remaining code in that directory. The *ind(),
  *nat(), and *1m() APIs were all removed. (These additional API layers
  no longer made as much sense with only one induced method (1m) being
  supported.) The bli_ind.c file (and header) were moved to frame/base
  and bli_l3_ind.c (and header) and bli_l3_ind_tapi.h were moved to
  frame/3.
- Removed 3m/4m support from the code in frame/1m/packm.
- Removed 3m/4m support from trmm/trsm macrokernels and simplified some
  pointer arithmetic that was previously expressed in terms of the
  bli_ptr_inc_by_frac() static inline function (whose definition was
  also removed).
- Removed the following subdirectories of level-0 macro headers from
  frame/include/level0: ri3, rih, ri, ro, rpi. The level-0 scalar macros
  defined in these directories were used exclusively for 3m and 4m
  method codes.
- Simplified bli_cntx_set_blkszs() and bli_cntx_set_ind_blkszs() in
  light of 1m being the only induced method left within BLIS.
- Removed dt_on_output field within auxinfo_t and its associated
  accessor functions.
- Re-indexed the 1e/1r pack schemas after removing those associated with
  variants of the 3m and 4m methods. This leaves two bits unused within
  the pack format portion of the schema bitfield. (See bli_type_defs.h
  for more info.)
- Spun off the basic and expert interfaces to the object and typed APIs
  into separate files: bli_l3_oapi.c and bli_l3_oapi_ex.c; bli_l3_tapi.c
  and bli_l3_tapi_ex.c.
- Moved the level-3 operation-specific _check function calls from the
  operations' _front() functions to the corresponding _ex() function of
  the object API. (This change roughly maintains where the _check()
  functions are called in the call stack but lays the groundwork for
  future changes that may come to the level-3 object APIs.) Minor
  modifications to bli_l3_check.c to allow the check() functions to be
  called from the expert interface APIs.
- Removed support within the testsuite for testing the aforementioned
  induced methods, and updated the standalone test drivers in the 'test'
  directory so reflect the retirement of those induced methods.
- Modified the sandbox contract so that the user is obliged to define
  bli_gemm_ex() instead of bli_gemmnat(). (This change was made in light
  of the *nat() functions no longer existing.) Also updated the existing
  'power10' and 'gemmlike' sandboxes to come into compliance with the
  new sandbox rules.
- Updated BLISObjectAPI.md, BLISTypedAPI.md, Testsuite.md documentation
  to reflect the retirement of 3m/4m, and also modified Sandboxes.md to
  bring the document into alignment with new conventions.
- Updated various comments; removed segments of commented-out code.
2021-10-28 16:05:43 -05:00
Field G. Van Zee
cc9206df66 Added Graviton2 Neoverse N1 performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on a Graviton2
  Neoverse N1 server. Special thanks to Nicholai Tukanov for
  collecting these results via the Arm-HPC/AWS hackaton.
- Corrected what was supposed to be a temporary tweak to the legend
  labels in test/3/octave/plot_l3_perf.m.
2021-07-16 15:48:37 -05:00
Meghana Vankadari
cb3a40ab9d Added blas interface for dzgemm
- Added blas interface for dzgemm. This function will call
  native implementation of gemm.
- Mixed datatype support is already present in BLIS. But this
  implementation requires alpha_imag value to be 0.
- Modified test_gemm.c to support testing of dzgemm.

Change-Id: I496fffdede9f0f778b9a33b405eb6861c6dcc334
2021-06-27 09:34:18 -04:00
Chandrashekara KR
d7377f967c Merge "AOCL-Windows: Update BLIS build system" into amd-staging-milan-3.1 2021-06-17 08:49:55 -04:00
Chandrashekara K R
f94e3ad237 AOCL-Windows: Update BLIS build system
1. Added support in cmake scripts for linking libomp for blis multithreading build.
 2. Added ${CMAKE_CURRENT_SOURCE_DIR}/bli_axpyf_zen_int_6.c statement in blis\kernels\zen\1f cmake file to build newly added file.
 3. Added the new macros in blis/frame/include/bli_macro_defs.h for ENABLE_NO_UNDERSCORE_API support for gemm_batch and axpby API's.
 4. Modified the file open mode from binary to text mode in blis/testsuite/src/test_libblis.c file to avoid the line ending issue on different OS.
 5. Added the definition for the macro BLIS_DISABLE_TRSM_PREINVERSION in main CmakeLists.txt file.

AMD Internal : [CPUPL-1630]

Change-Id: Iba1b7b6d014a4317de7cbaf42f812cad20111e4f
2021-06-15 16:49:08 +05:30
Nageshwar Singh
6ca50e1b72 Added bench utility for copyv API
AOCL-Internal: [CPUPL-1591]
Change-Id: I00ddad565cb87cd9371d7b1df2b57394fef437e0
2021-06-09 12:29:49 +05:30
Nageshwar Singh
6842c2a30e Bench trsv logging error
Details
  - Passing enum rather than char for uplo, transa, and diaga
  - Deleting log file, and other temp files, merged in the codebase from amax

AOCL-Internal: [CPUPL-1591]
Change-Id: Ife85a388b45659aa608a552d18a65fe828b046b2
2021-06-08 11:54:55 +05:30
Nageshwar Singh
61b7584580 Bench addition for amaxv API
AOCL-Internal: [CPUPL-1591]
Change-Id: Ia9754dfed1a7302d5c267858f9005c8f64e28b46
2021-06-04 17:45:04 +05:30
Field G. Van Zee
82af05f54c Updated Fugaku (a64fx) performance results.
Details:
- Updated the performance graphs (pdfs and pngs) for the Fugaku/a64fx
  entry within Performance.md, and also updated the experiment details
  accordingly. Thanks to RuQing Xu for re-running the BLIS and SSL2
  experiments reflected in this commit.
- In Performance.md, added an English translation of the project name
  under which the Fugaku results were gathered, courtesy of RuQing Xu.
2021-05-25 15:25:08 -05:00
Dipal Madhukar Zambare
821fa267c9 Merge "Updated makefiles to fix issues introduced in merge" into amd-staging-milan-3.1 2021-05-05 23:42:15 -04:00
Dipal M Zambare
7454cca9e7 Updated makefiles to fix issues introduced in merge
- Updated Makefile to include DTL files in library build
   - Updated Makefile to include cpp header file installation
   - Updated test/makefile to include extra API added by AMD team.

AMD-Internal: [CPUPL-1559]
Change-Id: I249c6935d5ff5fb645f9deec7e0218575484be13
2021-05-05 14:59:15 +05:30
Nallani Bhaskar
f917d826b5 Updated test application to work with row major cblas
Details:
1. Fixed reading leading dimenstions in test_gemm.c based on row/col major
2. Reduced redundent code and adjusted alignment

Change-Id: I8ca8c81223386fc21c6cc7c1d8f8a2109c9f5343
2021-05-02 23:09:13 +05:30
lcpu
7401effc03 BLIS:merge:
Merge conflicts araised has been fixed while downstreaming BLIS code from master to milan-3.1 branch

Implemented an automatic reduction in the number of threads when the user requests parallelism via a single number (ie: the automatic way) and (a) that number of threads is prime, and (b) that number exceeds a minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which defaults to 11. If prime numbers are really desired, this feature may be suppressed by defining the macro BLIS_ENABLE_AUTO_PRIME_NUM_THREADS in the appropriate configuration family's bli_family_*.h. (Jeff Diamond)

Changed default value of BLIS_THREAD_RATIO_M from 2 to 1, which leads to slightly different automatic thread factorizations.

Enable the 1m method only if the real domain microkernel is not a reference kernel. BLIS now forgoes use of 1m if both the real and complex domain kernels are reference implementations.

Relocated the general stride handling for gemmsup. This fixed an issue whereby gemm would fail to trigger to conventional code path for cases that use general stride even after gemmsup rejected the problem. (RuQing Xu)

Fixed an incorrect function signature (and prototype) of bli_?gemmt(). (RuQing Xu)

Redefined BLIS_NUM_ARCHS to be part of the arch_t enum, which means it will be updated automatically when defining future subconfigs.

Minor code consolidation in all level-3 _front() functions.

Reorganized Windows cpp branch of bli_pthreads.c.

Implemented bli_pthread_self() and _equals(), but left them commented out (via cpp guards) due to issues with getting the Windows versions working. Thankfully, these functions aren't yet needed by BLIS.

Allow disabling of trsm diagonal pre-inversion at compile time via --disable-trsm-preinversion.

Fixed obscure testsuite bug for the gemmt test module that relates to its dependency on gemv.

AMD-internal-[CPUPL-1523]

Change-Id: I0d1df018e2df96a23dc4383d01d98b324d5ac5cd
2021-04-27 11:09:48 +05:30
satish kumar nuggu
c0d16f70e4 Added CBLAS API interface and memory alignment check
Details:
1. Added CBLAS API in test_gemm.c and test_trsm.c
2. Creating matrixes with memory aligned leading dimensions irrespective of dimension.
3. Shifting powers of 2 aligned memory to non-powers of 2 by adding extra cache line size at the end.

   AMD-Internal: [CPUPL-1500] [CPUPL-1450]

Change-Id: I4180cec29b62d0388b974abee3e9b699cce3af6a
2021-04-26 13:44:30 +05:30
Field G. Van Zee
ba3ba8da83 Minor updates and fixes to test/3/octave scripts.
Details:
- Fixed an issue where the wrong string was being passed in for the
  vendor legend string.
- Changed the graph in which the legends appear.
- Updates to runthese.m.
2021-04-06 18:39:58 -05:00
Kiran Varaganti
657f58b82b Fix test_dotv.c when complex-return=intel
When BLIS is built with --complex-return=intel, the zdotu_ and cdotu_ function prototype changes.
Now "return parameter" will become the first argument of these functions and these functions return void.
This fix addresses the change in the function declarations when --complex-return=intel is enabled.
Missing Trace and Log statements for this configuration are now  added.
[CPUPL-1376]

Change-Id: Ib420989da71839211c16088bf431a2ad775a3978
2021-04-06 15:38:12 +05:30
Field G. Van Zee
159ca6f01a Made test/3/octave scripts robust to missing data.
Details:
- Modified the octave scripts in test/3 so that the script does not
  choke when one or more of the expected OpenBLAS, Eigen, or vendor data
  files is missing. (The BLIS data set, however, must be complete.) When
  a file is missing, that data series is simply not included on that
  particular graph. Also factored out a lot of the redundant logic from
  plot_panel_4x5.m into a separate function in read_data.m.
2021-03-24 15:57:32 -05:00
nphaniku
b3628cdfd3 AOCL Windows: 3.1 BLIS changes
1. CMake script changes for build with Clang compiler.
 2. CMake script changes for build test and testsuite based on the lib type ST/MT
 3. CMake script changes for testcpp and blastest
 4. Added python scripts to support library build and testsuite build.

AMD Internal : [CPUPL-1422]

Change-Id: Ie34c3e60e9f8fbf7ea69b47fd1b50ee90099c898
2021-03-08 19:04:17 +05:30
Kumar, Phani
477fc41fff Cmake script changes and blis.h changes for amd-staging-milan-3.0
AMD Internal : [CPUPL-1083]

Change-Id: Ia29a1f328ee32e2aec59a7fc70c04400d6ee6580
2020-11-24 06:12:25 -05:00
Dipal M Zambare
0a3d94c9a2 Updated test drivers for dotv, scalv and swapv.
Added traces in cblas layer for these API's.
These test drivers didn't have calls for complex data
types, the drivers are updated to support them.

AMD-Internal : [CPUPL-1315]

Change-Id: Ia52ecca68ea17314315d626b57c46a2f5973985b
2020-11-24 10:26:32 +05:30
Madan mohan Manokar
1d8fab0996 Test driver fix for her and her2
Fixed test driver code for her, her2
Support added to handle complex and double complex data type in test driver.

Change-Id: If65939e99d8cf77e0fb70561166d84bf67d0321d
AMD-Internal: [CPUPL-1326]
2020-11-23 04:10:43 -05:00
Madan Mohan Manokar
35d33bab6a Merge "Test driver fix for her, her2, herk and her2k" into amd-staging-milan-3.0 2020-11-19 05:47:31 -05:00
Madan mohan Manokar
38698f0dfd Test driver fix for her, her2, herk and her2k
Fixed test driver code for her, her2, herk and her2k function.
Above functions supports only complex and double complex data type, test code is updated accordingly.

Change-Id: Iee7b79abda4a2959a265c420d23879bf47f2c38d
AMD-Internal: [CPUPL-1313]
2020-11-19 12:58:21 +05:30
satish kumar nuggu
17f994bd15 Added Blas interface for ?imatcopy, ?omatcopy, ?omatadd, ?omatcopy2
AMD-Internal: [CPUPL-1116]
Original review was in this commit http://gerrit-git.amd.com/c/cpulibraries/er/blis/+/428165.
Added new commit for transpose API's

Change-Id: I322389cc0be0aaccf82d1d0bb4476beea8694cd8
2020-11-18 12:55:36 +05:30
Field G. Van Zee
88ad841434 Squash-merge 'pr' into 'squash'. (#457)
Merged contributions from AMD's AOCL BLIS (#448).
  
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
  only the lower or upper triangle of a square matrix C. For now, only
  the conventional/large code path will be supported (in vanilla BLIS).
  This was accomplished by leveraging the existing variant logic for
  herk. However, some of the infrastructure to support a gemmtsup is
  included in this commit, including
  - A bli_gemmtsup() front-end, similar to bli_gemmsup().
  - A bli_gemmtsup_ref() reference handler function.
  - A bli_gemmtsup_int() variant chooser function (with variant calls
    commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatiblity layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
  wrapper to a set of polymorphic CBLAS-like function wrappers defined
  in another header (cblas.hh). These two headers are installed if
  running the 'install' target with INSTALL_HH is set to 'yes'. (Also
  added a set of unit tests that exercise blis.hh, although they are
  disabled for now because they aren't compatible with out-of-tree
  builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
  within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
  various minor updates to dotv and scalv kernels. Also added various
  sup kernels contributed by AMD to kernels/zen/3. However, these
  kernels are (for now) not yet used, in part because they caused
  AppVeyor clang failures, and also because I have not found time to
  review and vet them.
- Output the python found during configure into the definition of PYTHON
  in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
  to bli_gemm_front.c.
- Implemented explicit beta = 0 handling in for the sgemm ukernel in
  bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
  bug surfaced because the gemmt module verifies its computation using
  gemm with its beta parameter set to zero, which, on a cortexa15 system
  caused the gemm kernel code to unconditionally multiply the
  uninitialized C data by beta. The C matrix likely contained
  non-numeric values such as NaN, which then would have resulted in a
  false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
  in bli_l3_blocksize.c, was inadvertantly being defined in terms of
  helper functions meant for trmm. This bug was probably harmless since
  the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
  kernels/zen/3/bli_gemm_small.c since those macros are not used in
  vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
  accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers that facilitate compilation on
  Windows systems.
- Various whitespace changes.
2020-11-14 09:39:48 -06:00
managalv
aae48c2221 Optimised AXPYF routine for complex float and complex double
Details:
    - Added SIMD code
    - Processing 5 rows at a time in SIMD loop to improve performance

AMD-Internal: [CPUPL-1054]

Change-Id: I2ac93f25895dccfc42e14be0689e6d4e655d6a0a
2020-11-06 18:42:13 +05:30
managalv
68b4ff976b Added Function trace and Input logging for dotv and gemv
Change-Id: I992bd80b2322d6c387f609ecd70c1109c13f6254
AMD-Internal: [CPUPL-1274]
2020-11-06 03:53:43 +05:30
Meghana Vankadari
0775f09b41 Added debug trace and log support for copy and ger routines
Change-Id: Id7fb64c0a626b2f8f53e89ee7df4391693eb4f4c
2020-11-02 22:56:58 -05:00
Kiran Varaganti
65daaab6ac Fix Bug in DTL
When library is built as single thread and trace is enabled, the test
applications in test folder fail to compile. In the file aoclos.c the function
AOCL_gettid() uses "omp_get_thread_num() to get thread_id, which is only
enabled when OpenMP based parallel BLIS library is generated. To fix this in
single thread case we now return zero for thread id, openmp function is used
only when BLIS_ENABLE_OPENMP macro is defined. However this is not a complete
fix. If library is built with pthread, AOCL_gettid() always return 0, which is
not the intended behaviour.

Change-Id: I5b79ed57d27d0022d3dcab0e2a3a557c8e4ff8ee
2020-11-02 12:05:09 +05:30