2064 Commits

Author SHA1 Message Date
prangana
3620e472e3 Replace back major version number variable in Makefile
Change-Id: I0f902e32085058ec618d08470793f5e5e49719b3
2020-06-10 13:11:14 +05:30
Dipal M Zambare
305c744131 Added traces in dgemm and sgemm paths.
Added traces from the blas/cblas APIs down to the kernels for dgemm and sgemm.
Traces are disabled by default; users need to enable them
in their local workspace. See the aocl_dtl/aocldtlcf.h file.

AMD Internal : CPUPL-806

Change-Id: I83b310509fb1a599c114387192bcf882ef0480f9
2020-06-08 12:01:22 +05:30
Meghana
9fce1ec4a4 Optimized SGEMV kernel and changed BLAS interface call
Details:
- Optimized saxpyf kernel with fuse_factor=5 and iter_unroll=2.
- Modified framework files of sgemv to remove dependency on cntx
variable.
- Updated cntx_init file of zen2 to choose optimized kernels.
- Modified BLAS interface call for SGEMV to reduce framework overhead.
- Currently these changes are applicable for zen2 configuration.

Change-Id: Iabc36ae640e82e65f8764f3c6dee513ad64b22fd
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-707]
2020-06-04 02:49:08 -04:00
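
The role of the fuse factor in commit 9fce1ec4a4 can be illustrated with a small Python model. This is only a sketch, not the zen2 assembly kernel: it assumes that a fused axpy ("axpyf") updates y with several columns of A in one pass, and that gemv walks the columns in chunks of fuse_factor. The function names and the list-of-columns layout are hypothetical, and iter_unroll is not modelled.

```python
def axpyf(alpha, a_cols, x, y):
    """Fused axpy: y += alpha * sum_j x[j] * a_cols[j], in one pass over y."""
    for i in range(len(y)):
        acc = 0.0
        for col, xj in zip(a_cols, x):
            acc += col[i] * xj
        y[i] += alpha * acc


def gemv_via_axpyf(alpha, cols, x, beta, y, fuse_factor=5):
    """y := beta*y + alpha*A*x, with A given as a list of columns."""
    for i in range(len(y)):
        y[i] *= beta
    # Consume fuse_factor columns per call; the last chunk may be shorter.
    for j in range(0, len(cols), fuse_factor):
        axpyf(alpha, cols[j:j + fuse_factor], x[j:j + fuse_factor], y)
    return y


# A 2 x 7 matrix stored as 7 columns, so the second chunk holds only 2 columns.
cols = [[1.0, 2.0] for _ in range(7)]
x = [1.0] * 7
y = gemv_via_axpyf(2.0, cols, x, 1.0, [1.0, 1.0])
```

With a fuse factor of 5, each element of y is loaded and stored once per five columns rather than once per column, which is the usual motivation for fusing.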
Dipal Madhukar Zambare
8a367c993e Merge "Checking for zero dimension is moved to bli_gemm_xx call." into amd-staging-rome-2.2 2020-06-04 02:16:56 -04:00
Meghana
f4d2bb2fed Enabled AOCC specific flags for all versions of AOCC compiler
Change-Id: Icad0ff1c1858c1762792ba8f2c5c3e846909cbb5
2020-06-03 10:50:00 -04:00
dzambare
5d57d67cb3 Checking for zero dimension is moved to bli_gemm_xx call.
This will ensure early return in case full gemm processing is not needed.

Based on dimension which is found to be zero following actions will be taken:

If 'c' has a zero dimension, no further processing is required.
If alpha is zero or if 'a' or 'b' has a zero dimension, we
perform scalm operation instead of gemm. (c = beta*c)

Change-Id: Icc031944fc4e80138adf991974547f2d57ab570b
AMD-Internal: [CPUPL-904]
2020-06-03 16:50:11 +05:30
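
The early-return rules in commit 5d57d67cb3 can be sketched as follows. This is a Python model under stated assumptions, not the bli_gemm_xx code: matrices are row-major lists of lists, the function names are hypothetical, and the scalm step is assumed to scale c by beta, which is what gemm reduces to when the product term vanishes.

```python
def scalm(beta, c):
    """Scale every element of c in place: c := beta * c."""
    for row in c:
        for j in range(len(row)):
            row[j] *= beta


def gemm(alpha, a, b, beta, c):
    """c := beta*c + alpha*a*b; returns the name of the path taken."""
    m = len(c)
    n = len(c[0]) if m else 0
    k = len(b)  # inner dimension, taken from the row count of b
    if m == 0 or n == 0:
        return "noop"   # c is empty: nothing to compute
    if alpha == 0.0 or k == 0:
        scalm(beta, c)  # the product term contributes nothing
        return "scalm"
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = beta * c[i][j] + alpha * acc
    return "gemm"
```

Doing these checks at the entry point means the full blocking and packing machinery is never set up for the degenerate cases.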
managalv
b4e599ecc2 CPUPL-929: Improve Complex GEMM performance - Support all storage formats and non Transpose/Conjugate Matrices
A failure was seen in the libflame function FLASH_UDdate_UT_inc,
caused by typecasting a double complex pointer to a double pointer.

Change-Id: If6e2f4663575450a13a9a07dddd5622628f5c6b0
2020-06-02 22:27:54 +05:30
Nallani Bhaskar
6f01cd2c54 Fix for sblat3.x failure in make check
Details:
ymm registers, which hold 8 float values, were used where only 4 float values were needed.
Changed the registers from ymm to xmm in the required places. The failure shows up
only when the leading dimension is greater than the actual dimension.

Change-Id: I39f04eac18c4fa3a8c93048c977d6a83aa92b800
2020-06-01 17:04:59 +05:30
managalv
f7bc37ea32 CPUPL-929: Improve Complex GEMM performance - Support all storage formats and non Transpose/Conjugate Matrices
Details
Added Support of N SUP kernel for complex float and complex double
Removed prefetching in M SUP kernels for complex float and complex double
Removed all warnings

Change-Id: I05ffde0f0613681927fe7576db7f5f1a4486fd05
2020-06-01 06:24:12 -04:00
Kiran Varaganti
c8f3cec5f7 Merge "Code cleanup in 6xk DGEMM pack Kernel" into amd-staging-rome-2.2 2020-06-01 05:08:58 -04:00
Nallani Bhaskar
5e0ad13f8e Code Cleanup and replaced vzeroall with vxorps
Change-Id: I74c2cc2183a407aad86eab5c3285c33690de9abd
2020-06-01 10:14:06 +05:30
Nallani Bhaskar
2413c31672 CPUPL-923: Implemented dot Product Kernels in SGEMM SUP for transpose cases.
Details:
Added two new kernels bli_sgemmsup_rd_zen_asm_6x16m and bli_sgemmsup_rd_zen_asm_6x16n
to support dot product in Row Major (A * Transpose(B)) and in Column Major (Transpose(A) * B)

Change-Id: I264fd75c4c4b68fb7dc4fd229eaa44d09e9f3432
2020-05-31 22:37:03 +05:30
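
The dot-product formulation named in commit 2413c31672 can be illustrated in Python. This is a sketch of the idea only, not of bli_sgemmsup_rd_zen_asm_6x16m or bli_sgemmsup_rd_zen_asm_6x16n: it assumes both operands are available row by row, so that every element of C = A * Transpose(B) is the dot product of one row of A with one row of B. The helper names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def gemm_nt_dot(a_rows, b_rows):
    """C = A * Transpose(B), with A and B each given as a list of rows."""
    return [[dot(a_row, b_row) for b_row in b_rows] for a_row in a_rows]


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
b = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
c = gemm_nt_dot(a, b)  # 2 x 2 result
```

In this layout both vectors of each dot product are contiguous along the k dimension, which is presumably why a dot-product kernel suits the transpose cases.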
Kiran Varaganti
3ebd5f8aa0 Code cleanup in 6xk DGEMM pack Kernel
Removed conditional check if(*kappa_cast==0.0) in 6xk dgemm packing kernel

Change-Id: Ie543787133d303aeb2532e67b83d6ba96e3d558e
2020-05-31 21:41:45 +05:30
prangana
711f26129e Update AMD BLIS version to 2.2
Also updated Makefile to fix issue of multiple symbolic links being
created

Change-Id: Ie9a680cedd5c96fcd7f6af1ce0f849a58c3ed4d3
2020-05-31 21:37:32 +05:30
Kiran Varaganti
f8ddd48594 Code Clean-up in DGEMM packing kernels
Removed conditional check for (*kappa_cast == 1.0) because it's always 1.0 in
DGEMM packing kernels.
[CPUPL-636]

Change-Id: Ib04f2a3cdbb0f138036a8b0486d1dec073e40407
2020-05-30 21:55:29 +05:30
prangana
0c52aaefe1 Merge branch 'ref/heads/amd-staging-rome-2.2' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-staging-rome-2.2
Change-Id: I46acf48354ff73fb4eaeac255132d21095ea4d98
2020-05-30 10:31:10 +05:30
prangana
bb7eeec843 Change loop test expression in bli_packm_zen_int.c
PRAGMA SIMD loop has issues with test expression (k !=0)
Changed usage to (k > 0)

Change-Id: I50204dbd0194de43f0d6cdcbfc586bb16aa25968
2020-05-30 10:00:21 +05:30
Kiran Varaganti
739803a441 DGEMM Packing Kernels for Native DGEMM implementation
[CPUPL-858] Packing kernels for the dgemm 6x8 kernel are added explicitly
for the zen2 configuration. Apart from the generic packing kernels, which are
used by level-3 routines for all combinations of the input parameters, this
introduces DGEMM-specific packing kernels for the case where neither op(A) nor
op(B) is transposed. This lets us vectorize these packing kernels and eliminate
unnecessary conditional branches. The packing kernels are also optimized at the
boundary, which helps when the input matrix dimensions "m" and "n" are not
multiples of the register block sizes "MR" and "NR".
A typical DGEMM operation is C = beta*C + alpha*op(A)*op(B). Note that
the multiplication by alpha is handled inside the kernel, so in these dgemm
packing routines alpha is always considered to be 1.0. These routines are
"bli_dpackm_8xk_nn_zen" and "bli_dpackm_6xk_nn_zen". The generic packing
routines are "bli_dpackm_6xk_gen_zen" and "bli_dpackm_8xk_gen_zen". These
routines are enabled from "bli_cntx_init_zen2()" through
bli_cntx_set_packm_kers(). In this checkout the generic packing kernels are
enabled by default. A run-time mechanism to switch between these packing
kernels based on the DGEMM input parameters will be introduced later.

Change-Id: I079b4dce0757d558224cb8c55d024bfea6a4de91
2020-05-28 02:01:43 -04:00
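
What a packing kernel of the kind described in commit 739803a441 does can be sketched in Python. This is a minimal model, not bli_dpackm_6xk_nn_zen: it assumes a column-major, non-transposed A with leading dimension lda, a scaling factor fixed at 1.0 (so values are copied unchanged), and zero-filling of the rows that fall past m in the last panel. The function name and the default MR = 6 are illustrative.

```python
MR = 6  # register block size in the m dimension (assumed for illustration)


def pack_a(a, m, k, lda, mr=MR):
    """Pack a column-major m x k matrix into mr x k micro-panels.

    Inside a panel the mr values needed for one k-step are adjacent.
    Rows past m in the last panel are filled with zeros.
    """
    packed = []
    for i0 in range(0, m, mr):           # one micro-panel per mr rows
        for p in range(k):               # one column of the panel per k-step
            for i in range(i0, i0 + mr):
                packed.append(a[i + p * lda] if i < m else 0.0)
    return packed


# 4 x 2 matrix with lda = 5, packed with mr = 3: the second panel has one
# real row and two zero-filled rows.
a = [float(v) for v in range(10)]
panels = pack_a(a, m=4, k=2, lda=5, mr=3)
```

Because the edge panel is padded to the full block size, the compute kernel can run the same code on it as on interior panels.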
managalv
9b09dd7d6c CPUPL-929:Improve Complex GEMM performance
Added context for CCR format

Change-Id: I81ac1b882f176235b1c48f4952ec30f44c6b138c
2020-05-23 11:10:35 +05:30
managalv
154bedc785 CPUPL-929:Improve Complex GEMM performance
Removed print which was part of kernel

Change-Id: I288e0151ba8da8d6dd4415734c88ed3474ba3a5b
2020-05-22 14:39:12 +05:30
dzambare
8ce6e49a34 Added file and copyright header for aoclflist.c file.
AMD Internal: CPUPL-968

Change-Id: I1d0905fac62cac8224e38ae8d0912b9c5663799d
2020-05-22 12:54:18 +05:30
managalv
11570dbc14 CPUPL-929:Improve Complex GEMM performance
Updated BLIS_MC value and created SUP context for CCC storage format

Change-Id: I5032b29834ea545d7b5f7a9469bc5655c71b7fe5
2020-05-22 10:47:28 +05:30
Guodong Xu
72443e7173 avoid loading twice in armv8a gemm kernel (#403)
This bug happens at a corner case, when k_iter == 0 and we jump to
CONSIDERKLEFT.

In current design, first row/col. of a and b are loaded twice.

The fix is to rearrange a and b (first row/col.) loading instructions.

Change-Id: I4a985a3abf9b1e7a0ee29e17c7d39a4a27138c4c
Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
2020-05-21 12:37:53 +05:30
Field G. Van Zee
f973f00d94 Defined netlib equivalent of xerbla_array().
Details:
- Added a function definition for xerbla_array_(), which largely mirrors
  its netlib implementation. Thanks to Isuru Fernando for suggesting the
  addition of this function.

Change-Id: Ie9c619f5604e60a32edfda2db2b66f0c762581d3
2020-05-21 11:57:54 +05:30
Field G. Van Zee
994a2d8de5 Documented Perl prerequisite for build system.
Details:
- Added Perl to list of prerequisites for building BLIS. This is in part
  (and perhaps completely?) due to some substitution commands used at
  the end of configure that include '\n' characters that are not
  properly interpreted by the version of sed included on some versions
  of OS X. This new documentation addresses issue #398.
2020-05-21 11:56:45 +05:30
Guodong Xu
66ec22705b New kernel set for Arm SVE using assembly (#396)
This adds two kernels for the Arm SVE vector extensions.
1. a gemm  kernel for double at sizes 8x8.
2. a packm kernel for double at dimension 8xk.

To achieve best performance, variable length agnostic programming
is not used. Vector length (VL) of 256 bits is mandated in both kernels.
Kernels to support other VLs can be added later.

"SVE is a vector extension for AArch64 execution mode for the A64
instruction set of the Armv8 architecture. Unlike other SIMD architectures,
SVE does not define the size of the vector registers, but constrains into
a range of possible values, from a minimum of 128 bits up to a maximum of
2048 in 128-bit wide units. Therefore, any CPU vendor can implement the
extension by choosing the vector register size that better suits the
workloads the CPU is targeting. Instructions are provided specifically
to query an implementation for its register size, to guarantee that
the applications can run on different implementations of the ISA without
the need to recompile the code."  [1]

[1] https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning

Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
2020-05-21 11:56:45 +05:30
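
The vector-length arithmetic behind commit 66ec22705b can be worked through in a few lines of Python. The sketch uses only the figures quoted in the message (128 to 2048 bits in 128-bit units, a mandated VL of 256 bits, an 8x8 double-precision tile); the function and variable names are hypothetical.

```python
def sve_vector_lengths():
    """Vector lengths in bits that SVE permits: 128 to 2048 in 128-bit steps."""
    return list(range(128, 2048 + 1, 128))


def lanes(vl_bits, elem_bits=64):
    """Number of elem_bits-wide elements held by one vector register."""
    return vl_bits // elem_bits


VL = 256                                       # length mandated by the two kernels
doubles_per_vector = lanes(VL)                 # doubles in one 256-bit register
vectors_per_column = 8 // doubles_per_vector   # registers per 8-element column
```

Fixing VL at 256 bits makes these counts compile-time constants, which is the trade-off the message describes against vector-length-agnostic code.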
Yingbo Ma
562b9eeaaf Update KernelsHowTo.md (#395) 2020-05-21 11:56:45 +05:30
Field G. Van Zee
8e3f143439 Adding missing conjy to her2/syr2 in typed API doc.
Details:
- Fixed a missing argument (conjy) in the function signatures of
  bli_?her2() and bli_?syr2() in docs/BLISTypedAPI.md. Thanks to Robert
  van de Geijn for reporting this omission.

Change-Id: Ifd1e01d5d7f943db4b1d67b467eb57e4a5c44165
2020-05-21 11:56:36 +05:30
Field G. Van Zee
93023d0e02 README.md update to promote supmt dgemm.
Details:
- Updated the sup entry in the "What's New" section of the README.md
  file to promote the multithreaded dgemm sup feature introduced in
  c0558fd.
2020-05-21 11:55:32 +05:30
Field G. Van Zee
4907b32c2c CHANGELOG update (0.7.0) 2020-05-21 11:55:32 +05:30
Field G. Van Zee
052a3c589f ReleaseNotes.md update in advance of next version.
Details:
- Updated docs/ReleaseNotes.md in preparation for next version.

Change-Id: I6c3e0dbaebcb855dff9420196092da5cb0bcce89
2020-05-21 11:55:20 +05:30
Field G. Van Zee
27b2911ab3 Rename more bli_thread_obarrier(), _obroadcast().
Details:
- Renamed instances of bli_thread_obarrier() and bli_thread_obroadcast()
  that were made in the supmt-specific code committed to the 'amd'
  branch, which has now been merged with 'master'. Prior to the merge,
  'master' received commit c01d249, which applied these renamings to
  the existing, non-sup codebase.
2020-05-21 11:54:54 +05:30
Field G. Van Zee
4a5e76e15e Minor updates/elaborations to RELEASING file. 2020-05-21 11:54:54 +05:30
Satish Balay
d560d105d2 OSX: specify the full path to the location of libblis.dylib (#390)
* OSX: specify the full path to the location of libblis.dylib so that it can be found at runtime

Before this change:

Application gives runtime error [when linked with blis]
dyld: Library not loaded: libblis.3.dylib

balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
        libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

After this change:
balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
	/Users/balay/petsc/arch-darwin-c-debug/lib/libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)

* INSTALL_LIBDIR -> libdir as INSTALL_LIBDIR has DESTDIR

Co-Authored-By: Jed Brown <jed@jedbrown.org>

* CREDITS file update.

Co-authored-by: Jed Brown <jed@jedbrown.org>
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-05-21 11:54:54 +05:30
Field G. Van Zee
3597284b9d Updates, tweaks to runme.sh in test/1m4m.
Details:
- Made several updates to test/1m4m/runme.sh, including:
  - Added missing handling for 1m and 4m1a implementations when setting
    the BLIS_??_NT environment variables.
  - Added support for using numactl to run the test executables.
  - Several other cleanups.
2020-05-21 11:54:53 +05:30
Field G. Van Zee
b325f1ea62 Warn user when auto-detection returns 'generic'.
Details:
- Added logic to configure that causes the script to output a warning
  to the user if/when "./configure auto" is run and the underlying
  hardware feature detection code is unable to identify the hardware.
  In these cases, the auto-detect code will return 'generic', which
  is likely not what the user expected, and a flag will be set so that
  a message is printed at the end of the configure output. (Thankfully,
  we don't expect this scenario to play out very often.) Thanks to
  Devin Matthews for suggesting this fix #384.
2020-05-21 11:54:53 +05:30
Field G. Van Zee
9e76059f15 Renamed bli_thread_obarrier(), _obroadcast().
Details:
- Renamed two bli_thread_*() APIs:
    bli_thread_obarrier()   -> bli_thread_barrier()
    bli_thread_obroadcast() -> bli_thread_broadcast()
  The 'o' was a leftover from when thrcomm_t objects tracked both
  "inner" and "outer" communicators. They have long since been
  simplified to only support the latter, and thus the 'o' is
  superfluous.

Change-Id: If9ec9a2383dfb02e1cfc74918f87a1fabddbd55b
2020-05-21 11:54:37 +05:30
Field G. Van Zee
6a957d7247 List Gentoo under supported external packages.
Details:
- Add mention of Gentoo Linux under the list of external packages in
  the README.md file. Thanks to M. Zhou for maintaining this package.
2020-05-21 11:50:37 +05:30
Field G. Van Zee
c7faae9442 Merged test/sup, test/supmt into test/sup.
Details:
- Updated the Makefile, test_gemm.c, and runme.sh in test/sup to be able
  to compile and run both single-threaded and multithreaded experiments.
  This should help with maintenance going forward.
- Created a test/sup/octave_st directory of scripts (based on the
  previous test/sup/octave scripts) as well as a test/sup/octave_mt
  directory (based on the previous test/supmt/octave scripts). The
  octave scripts are slightly different and not easily mergeable, and
  thus for now I'll maintain them separately.
- Preserved the previous test/sup directory as test/sup/old/supst and
  the previous test/supmt directory as test/sup/old/supmt.

Change-Id: Ia230fc65185fd9a34eec714721004aa9e0bd40ed
2020-05-21 11:50:19 +05:30
Field G. Van Zee
6d369532e3 Updated sup[mt] Makefiles for variable dim ranges.
Details:
- Updated test/sup/Makefile and test/supmt/Makefile to allow specifying
  different problem size ranges for the drivers where one, two, or three
  matrix dimensions is large. This will facilitate the generation of
  more meaningful graphs, particularly when two dimensions are tiny.
2020-05-21 11:46:36 +05:30
Field G. Van Zee
2096f41aa6 Updates to octave scripts in test/sup[mt]/octave.
Details:
- Optimized scripts in test/sup/octave and test/supmt/octave for use
  with octave 5.2.0 on Ubuntu 18.04.
- Fixed stray 'end' keywords in gen_opsupnames.m and plot_l3sup_perf.m,
  which were not only unnecessary but also causing issues with versions
  5.x.
2020-05-21 11:46:35 +05:30
Field G. Van Zee
08709d4117 Removed sorting on LDFLAGS in common.mk (#373).
Details:
- Removed a line of code in common.mk that passed LDFLAGS through the
  sort function. The purpose was not to sort the contents, but rather
  to remove duplicates. However, there is valid syntax in a string of
  linker flags that, when sorted, yields different/broken behavior.
  So I've removed the line in common.mk that sorts LDFLAGS. Also, for
  future use, I've added a new function, rm-dupls, that removes
  duplicates without sorting. (This function was based on code from a
  stackoverflow thread that is linked to in the comments for that
  code.) Thanks to Isuru Fernando for reporting this issue (#373).

Change-Id: Ie355cc111fd2c6669f0c3088e8fa5dc7c407a3b9
2020-05-21 11:45:48 +05:30
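
The difference between sorting and order-preserving de-duplication, as discussed in commit 08709d4117, can be shown with a Python model. This is not the Makefile code from common.mk: rm_dupls below only mimics the stated behaviour of the new rm-dupls function, sort_dedup mimics GNU make's sort function (lexical sort plus duplicate removal), and the sample flags are made up.

```python
def rm_dupls(flags):
    """Drop repeated words, keeping the first occurrence of each in place."""
    return " ".join(dict.fromkeys(flags.split()))


def sort_dedup(flags):
    """Model of GNU make's $(sort ...): lexical sort plus duplicate removal."""
    return " ".join(sorted(set(flags.split())))


ldflags = "-L/opt/lib -lfoo -lbar -L/opt/lib"  # hypothetical linker flags
kept_order = rm_dupls(ldflags)
reordered = sort_dedup(ldflags)
```

Both results drop the repeated -L/opt/lib, but only rm_dupls leaves -lfoo ahead of -lbar; with order-sensitive linker arguments, the sorted form can change or break the link.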
Field G. Van Zee
b3c0309009 CHANGELOG update (0.6.1) 2020-05-21 11:42:04 +05:30
Field G. Van Zee
d6496d55cc ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.

Change-Id: I2aa6f944ce2584de85ae7b6921ff0193b3b7020a
2020-05-21 11:41:49 +05:30
Field G. Van Zee
51f87f3e42 Removed 'attic/windows' (to prevent confusion).
Details:
- Finally removed 'attic/windows' and its contents. This directory once
  contained "proto" Windows support for BLIS, but we've since moved on
  to (thanks to Isuru Fernando) providing Windows DLL support via
  AppVeyor's build artifacts. Furthermore, since 'windows' was the only
  subdirectory within 'attic', the directory path would show up in
  GitHub's listing at https://github.com/flame/blis, which probably led
  to someone being confused about how BLIS provides Windows support. I
  assume (but don't know for sure) that nobody is using these files, so
  this is admittedly a case of shoot first and ask questions later.
2020-05-21 11:41:10 +05:30
Field G. Van Zee
142df1b1e9 CREDITS file update. 2020-05-21 11:41:10 +05:30
Dave Love
291ee5f748 Fix parsing in vpu_count on workstation SKX (#351)
* Fix parsing in vpu_count on workstation SKX

* Document Skylake-X as Haswell for single FMA

* Update vpu_count for Skylake and Cascade Lake models

* Support printing the configuration selected, controlled by the environment

Intended particularly for diagnosing mis-selection of SKX through
unknown, or incorrect, number of VPUs.

* Move bli_log outside the cpp condition, and use it where intended

* Add Fixme comment (Skylake D)

* Mostly superficial edits to commits towards #351.

Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c
  to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that
  relates to single-VPU Skylake-Xs.

* Fix comment typos

Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-05-21 11:40:57 +05:30
Field G. Van Zee
99da76fd64 Fixed 'configure' breakage introduced in 6433831.
Details:
- Added a missing 'fi' (endif) keyword to a conditional block added in
  the configure script in commit 6433831.
2020-05-21 11:40:00 +05:30
Field G. Van Zee
38ecda47e7 Updated 1m draft article link in README.md. 2020-05-21 11:40:00 +05:30
Jeff Hammond
570d51483b blacklist ICC 18 for knl/skx due to test failures
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
2020-05-21 11:40:00 +05:30