Commit Graph

1659 Commits

Author SHA1 Message Date
Isuru Fernando
e5fc00a2e7 Add symbol export macro for all functions (#302)
* initial export of blis functions

* Regenerate def file for master

* restore bli_extern_defs exporting for now
2019-08-23 14:18:07 +05:30
Field G. Van Zee
56286b4729 Updated level-3 BLAS to call object API directly.
Details:
- Updated the BLAS compatibility layer for level-3 operations so that
  the corresponding BLIS object API is called directly rather than first
  calling the typed BLIS API. The previous code based on the typed BLIS
  API calls is still available in a deactivated cpp macro branch, which
  may be re-activated by #defining BLIS_BLAS3_CALLS_TAPI. (This does not
  yet correspond to a configure option. If it seems like people might
  want to toggle this behavior more regularly, a configure option can be
  added in the future.)
- Updated the BLIS typed API to statically "pre-initialize" objects via
  new initializor macros. Initialization is then finished via calls to
  static functions bli_obj_init_finish_1x1() and bli_obj_init_finish(),
  which are similar to the previously-called functions,
  bli_obj_create_1x1_with_attached_buffer() and
  bli_obj_create_with_attached_buffer(), respectively. (The BLAS
  compatibility layer updates mentioned above employ this new technique
  as well.)
- Transformed certain routines in bli_param_map.c--specifically, the
  ones that convert netlib-style parameters to BLIS equivalents--into
  static functions, now in bli_param_map.h. (The remaining three classes
  of conversion routines were left unchanged.)
- Added the aforementioned pre-initializor macros to bli_type_defs.h.
- Relocated bli_obj_init_const() and bli_obj_init_constdata() from
  bli_obj_macro_defs.h to bli_type_defs.h.
- Added a few macros to bli_param_macro_defs.h for testing domains for
  real/complexness and precisions for single/double-ness.
2019-08-23 14:18:07 +05:30
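As a rough illustration of the deactivated cpp macro branch described in 56286b4729, the sketch below picks one of two code paths at compile time. Only the macro name BLIS_BLAS3_CALLS_TAPI comes from the commit message; the functions are hypothetical stand-ins and are not BLIS functions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the two call paths; not BLIS functions. */
static inline const char *via_typed_api(void)  { return "typed API"; }
static inline const char *via_object_api(void) { return "object API"; }

/* Returns a label for the path a level-3 BLAS wrapper would take in this sketch. */
static inline const char *blas3_entry_path(void)
{
    const char *path;
#ifdef BLIS_BLAS3_CALLS_TAPI
    /* Older behavior, kept in a branch that is off by default. */
    path = via_typed_api();
#else
    /* Default after this commit: call the object API directly. */
    path = via_object_api();
#endif
    assert(strlen(path) > 0);
    return path;
}
```

In this sketch, building with -DBLIS_BLAS3_CALLS_TAPI would select the typed-API branch; without it, the object-API branch is compiled.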
Field G. Van Zee
84282bba54 Updates to 3m4m/matlab scripts.
Details:
- Minor updates to matlab graph-generating scripts.
- Added a plot_all.m script that is more of a scratchpad for copying and
  pasting function invocations into matlab to generate plots that are
  presently of interest to us.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
29d5bcb1c8 Changed unsafe-loop to unsafe-math optimizations.
Details:
- Changed -funsafe-loop-optimizations (re-)introduced in 7690855 for
  make_defs.mk files' CRVECFLAGS to -funsafe-math-optimizations (to
  account for a miscommunication in issue #300). Thanks to Dave Love
  for this suggestion and Jeff Hammond for his feedback on the topic.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
9a42c1a323 Restored -funsafe-loop-optimizations to subconfigs.
Details:
- Restored use of -funsafe-loop-optimizations in the definitions of
  CRVECFLAGS (when using gcc), but only for sub-configurations (and
  not configuration families such as amd64, intel64, and x86_64).
  This more or less reverts 5190d05 and 6cf1550.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
e62bdd4df1 Disable TBM, XOP, LWP instructions in AMD configs.
Details:
- Added -mno-tbm -mno-xop -mno-lwp to CKVECFLAGS in bulldozer,
  piledriver, steamroller, and excavator configurations to explicitly
  disable AMD's bulldozer-era TBM, XOP, and LWP instruction sets in an
  attempt to fix the invalid instruction error that has plagued Travis
  CI builds since 6a014a3. Thanks to Devin Matthews for pointing out
  that the offending instruction was part of TBM (issue #300).
- Restored -O3 to piledriver configuration's COPTFLAGS.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
e7b73bf1ed Reverted piledriver COPTFLAGS from -O3 to -O2.
Details:
- Debugging continues; changing COPTFLAGS for piledriver subconfig from
  -O3 to -O2, its original value prior to 6a014a3.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
176e4c6860 Removed -funsafe-loop-optimizations from all configs.
Details:
- Error persists. Removed -funsafe-loop-optimizations from all remaining
  sub-configurations.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
24adee071c Removed -funsafe-loop-optimizations from piledriver.
Details:
- Error persists; continuing debugging from bf0fb78c by removing
  -funsafe-loop-optimizations from piledriver configuration.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
7128d4b94b Removed -funsafe-loop-optimizations from families.
Details:
- Removed -funsafe-loop-optimizations from the configuration families
  affected by 6a014a3, specifically: intel64, amd64, and x86_64.
  This is part of an attempt to debug why the sde, as executed by
  Travis CI, is crashing via the following error:

    TID 0 SDE-ERROR: Executed instruction not valid for specified chip
    (ICELAKE): 0x9172a5: bextr_xop rax, rcx, 0x103
2019-08-23 14:18:07 +05:30
Field G. Van Zee
b7c4f1e305 Standardized optimization flags in make_defs.mk.
Details:
- Per Dave Love's recommendation in issue #300, this commit defines
    COPTFLAGS := -O3
  and
    CRVECFLAGS := $(CKVECFLAGS) -funsafe-loop-optimizations
  in the make_defs.mk for all Intel- and AMD-based configurations.
2019-08-23 14:18:07 +05:30
Meghana
fdce1a5648 changed gcc version check condition from 'ifeq' to 'if greater or equal'
Change-Id: Ie4c461867829bcc113210791bbefb9517e52c226
AOCL2.0 2.0
2019-07-24 15:04:41 +05:30
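The difference between an equality test and a greater-or-equal test on the compiler version, as in fdce1a5648, can be sketched in C. The helper names are hypothetical, the threshold of 9 is taken from the "gcc9.0" wording in a042db011d, and the znver1 fallback is an assumption; the real check lives in a make_defs.mk file, not in C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: choose a -march flag for zen2 from the gcc major version.
   Greater-or-equal form: every release from 9 onward gets the zen2 flag. */
static inline const char *zen2_march_flag(int gcc_major)
{
    assert(gcc_major > 0);
    /* -march=znver2 is available from GCC 9; the fallback value is an assumption. */
    return (gcc_major >= 9) ? "-march=znver2" : "-march=znver1";
}

/* Equality form: only version 9 itself matches, so newer compilers
   silently fall back to the older flag. */
static inline const char *zen2_march_flag_eq(int gcc_major)
{
    assert(gcc_major > 0);
    return (gcc_major == 9) ? "-march=znver2" : "-march=znver1";
}
```

With a major version of 10, the first helper returns the zen2 flag while the equality form does not, which is the kind of mismatch a greater-or-equal check avoids.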
Meghana
c9486e0c4f code to detect version of gcc and set flags accordingly for zen2
Change-Id: I29b0311d0000dee1a2533ee29941acf53f9e9f34
2019-07-24 09:45:17 +05:30
Meghana
dcc0ce12fd Added a global Makefile for AMD architectures in config/zen folder
This Makefile(amd_config.mk) has all the flags that are common to EPYC series

Change-Id: Ic02c60a8293ccdd37f0f292e631acd198e6895de
2019-07-22 17:12:01 +05:30
Meghana Vankadari
b84cee29f4 Merge "Added compiler flags for vanilla clang" into amd-staging-rome2.0 2019-07-08 02:03:07 -04:00
kdevraje
1f80858abf This check-in resolves the dgemm performance issue tracked in Jira ticket CPUPL 458. Because an #else was missed during integration, the code always followed the else path to get the block sizes.
Change-Id: I0084b5856c2513ab1066c08c15b5086db6532717
2019-07-05 16:05:11 +05:30
Meghana
c7dd6e6cd2 Added compiler flags for vanilla clang
Change-Id: I13c00b4c0d65bbda4c929848fd48b0ab611952ab
2019-07-04 09:32:51 +05:30
Meghana
2acd49b764 fix for test failures using AOCC 2.0
Change-Id: If44eaccc64bbe96bbbe1d32279b1b5773aba08d1
2019-07-01 15:44:07 +05:30
kdevraje
cac127182d Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis
with public repo commit id 565fa3853b.

Change-Id: I68b9824b110cf14df248217a24a6191b3df79d42
2019-06-24 14:05:54 +05:30
Kiran Devrajegowda
3a45ecb154 Merge "Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration; this is the same as release 1.3. It was originally added to improve DGEMM multithreaded scalability on Naples when the number of threads is greater than 16. It was deleted by mistake among the many changes made for the 2.0 release, so we are adding it back. Also includes code cleanup in bli_gemm_front.c." into amd-staging-rome2.0 2019-05-31 06:47:02 -04:00
Kiran Varaganti
b69fb0b74a Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration; this is the same as release 1.3. It was originally added to improve DGEMM multithreaded scalability on Naples when the number of threads is greater than 16. It was deleted by mistake among the many changes made for the 2.0 release, so we are adding it back. Also includes code cleanup in bli_gemm_front.c.
Change-Id: I9f5d8225254676a99c6f2b09a0825e545206d0fc
2019-05-31 15:14:22 +05:30
kdevraje
3f867c96ca When running HPL with pure MPI without DGEMM threading (single-threaded BLIS), setting this macro to 1 gives the best performance.
Change-Id: I24fd0bf99216f315e49f1c74c44c3feaffd7078d
2019-05-31 14:31:49 +05:30
kdevraje
13806ba3b0 This check-in updates the copyright information, which is changed to (start year) - 2019
Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041
2019-05-27 16:24:43 +05:30
Meghana
ee123f5358 Defined small matrix thresholds for TRSM for various cases for NAPLES and ROME
Updated copyright information for kernels/zen/bli_trsm_small.c file
Removed separate kernels for zen2 architecture
Instead added threshold conditions in zen kernels both for ROME and NAPLES

Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5
2019-05-27 15:36:44 +05:30
prangana
9d93a4caa2 update version 2.0 2019-05-24 17:59:13 +05:30
Meghana
e05171118c Implemented TRSM for small matrices for cases where A is on the right
Added separate kernels for zen and zen2

Change-Id: I6318ddc250cf82516c1aa4732718a35eae0c9134
2019-05-23 16:17:19 +05:30
kdevraje
02920f5c48 make checkblis fails for the matrix dimension check at the beginning, hence reverting it
Change-Id: Ibd2ee8c2d4914598b72003fbfc5845be9c9c1e87
2019-05-23 15:29:59 +05:30
kdevraje
84215022f2 Adding threshold condition to dgemm small matrix kernels, defining the constants in zen2 configuration
Change-Id: I53a58b5d734925a6fcb8d8bea5a02ddb8971fcd5
2019-05-23 14:33:47 +05:30
kdevraje
a3554eb1dc Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis to configure zen2
Change-Id: I97e17bca9716b80b862925f97bb513c07b4b0cae
2019-05-23 11:53:32 +05:30
kdevraje
ea082f8390 adding empty zen2 directory with .gitignore file
Change-Id: Ifa37cf54b2578aa19ad335372b44bca17043fe4b
2019-05-23 10:38:29 +05:30
Kiran Varaganti
b80bd5bcb2 config/zen/bli_cntx_init_zen.c: removed BLIS_ENABLE_ZEN_BLOCK_SIZES macro. We have different configurations for both zen and zen2
config/zen/bli_family_zen.h: deleted macro BLIS_ENABLE_ZEN_BLOCK_SIZES
config/zen/make_defs.mk: removed compiler flag -mno-avx256-split-unaligned-store
frame/base/bli_cpuid.c: ROME family is 17H but model # is from 0x30H.
test/test_gemm.c - commented out #define FILE_IN_OUT (some compilation error when BLIS is configured as amd64)
Now we can use a single configuration, ./configure amd64, which works for both ROME and Naples

Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2
2019-05-22 05:51:22 -04:00
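The family/model note in b80bd5bcb2 (ROME is family 17h with model numbers starting at 0x30) can be sketched as a decode of CPUID leaf 1 EAX followed by a range check. The function names are hypothetical and this is not the code in frame/base/bli_cpuid.c; no upper model bound is applied because the commit message does not give one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Display family from CPUID leaf 1 EAX: the extended family (bits 27:20)
   is added only when the base family (bits 11:8) is 0xF. */
static inline uint32_t cpuid_family(uint32_t eax)
{
    uint32_t base = (eax >> 8) & 0xF;
    uint32_t ext  = (eax >> 20) & 0xFF;
    return (base == 0xF) ? base + ext : base;
}

/* Display model: the extended model (bits 19:16) forms the high nibble
   when the base family is 0xF (or 0x6 on Intel parts). */
static inline uint32_t cpuid_model(uint32_t eax)
{
    uint32_t base_family = (eax >> 8) & 0xF;
    uint32_t base = (eax >> 4) & 0xF;
    uint32_t ext  = (eax >> 16) & 0xF;
    return (base_family == 0xF || base_family == 0x6) ? ((ext << 4) | base) : base;
}

/* Hypothetical check: family 17h with a model number of 0x30 or above. */
static inline bool looks_like_zen2(uint32_t family, uint32_t model)
{
    assert(model <= 0xFF);  /* a decoded model fits in 8 bits */
    return family == 0x17 && model >= 0x30;
}
```

For example, an EAX value of 0x00830F10 decodes to family 0x17 and model 0x31 under these rules, which this sketch would classify as zen2, while 0x00800F12 decodes to model 0x01 and would not match.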
Kiran Varaganti
a042db011d Modified make_defs.mk for zen2 to get compiled by gcc version less than gcc9.0
Change-Id: I8fcac30538ee39534c296932639053b47b9a2d43
2019-05-22 05:51:10 -04:00
Kiran Varaganti
a23f92594c config_registry: New AMD zen2 architecture configuration added.
frame/base/bli_arch.c: #ifdef BLIS_FAMILY_ZEN2 id = BLIS_ARCH_ZEN2; #endif added. zen2 is added in config_name[BLIS_NUM_ARCHS]
  frame/base/bli_cpuid.c : #ifdef BLIS_CONFIG_ZEN2 if ( bli_cpuid_is_zen2( family, model, features ) ) return BLIS_ARCH_ZEN2; #endif, defined new function bool bli_cpuid_is_zen2(...).
  frame/base/bli_cpuid.h : declared bli_cpuid_is_zen2(..).
  frame/base/bli_gks.c : #ifdef BLIS_CONFIG_ZEN2 bli_gks_register_cntx(BLIS_ARCH_ZEN2, bli_cntx_init_zen2, bli_cntx_init_zen2_ref, bli_cntx_init_zen2_ind); #endif
  frame/include/bli_arch_config.h : #ifdef BLIS_CONFIG_ZEN2 CNTX_INIT_PROTS(zen2) #endif #ifdef BLIS_FAMILY_ZEN2 #include "bli_family_zen2.h" #endif
  frame/include/bli_type_defs.h : added BLIS_ARCH_ZEN2 in arch_t enum. BLIS_NUM_ARCHS 20

Change-Id: I2a2d9b7266673e78a4f8543b1bfb5425b0aa7866
2019-05-22 05:28:16 -04:00
kdevraje
17b878b66d adding license same as in ut-austin-amd-branch
Change-Id: I6790768d2bf5d42369d304ef93e34701f95fbaff
2019-05-22 14:02:53 +05:30
kdevraje
df755848b8 Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis into rome2.0
Change-Id: Ie8aad1ab810f0f3c0b90ec67f9dd3dfb8dcc74cc
2019-05-22 13:30:07 +05:30
Nisanth M P
c72ae27ade Re-enabling the small matrix gemm optimization for target zen
Change-Id: I13872784586984634d728cd99a00f71c3f904395
2019-05-22 01:05:13 -04:00
sraut
ab0818af80 Review comments incorporated for small TRSM.
Change-Id: Ia64b7b2c0375cc501c2cb0be8a1af93111808cd9
2019-05-22 00:43:10 -04:00
Kiran Varaganti
ca4b33c001 Added compiler option (-mno-avx256-split-unaligned-store) in the file config/zen/make_defs.mk to improve performance of intrinsic codes, this flag ensures compiler generates 256-bit stores for the equivalent intrinsics code.
Change-Id: I8f8cd81a3604869df18d38bc42097a04f178d324
2019-04-24 15:02:39 +05:30
kdevraje
9d76688ad9 Fix for single rank crash with HPL application. When computing the offset into the C buffer, integer variables are used for the row and column indices, so the intermediate result overflows and a negative value gets added to the buffer pointer. When that negative value is large enough, it indexes the buffer out of range, resulting in a segmentation fault. Although the crash originates in the dgemm kernel, the same fix was added to the sgemm kernel as well.
Change-Id: I171119b0ec0dfbd8e63f1fcd6609a94384aabd27
2019-04-11 10:23:26 +05:30
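The overflow described in 9d76688ad9 can be sketched as follows, assuming a column-major C buffer and a 32-bit int. Both helpers are hypothetical and are not the kernel code; the second one reproduces the wrapped value in unsigned arithmetic so that the sketch itself avoids signed-overflow undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: element offset of C(i, j) in a column-major matrix
   with leading dimension ldc, computed in 64 bits. */
static inline int64_t c_offset_wide(int i, int j, int ldc)
{
    assert(i >= 0 && j >= 0 && ldc > 0);
    /* Widen each operand before multiplying so the product cannot wrap. */
    return (int64_t)j * (int64_t)ldc + (int64_t)i;
}

/* The value a 32-bit computation of the same offset would produce.
   The arithmetic is done on uint32_t (well defined, modulo 2^32); the final
   conversion back to int32_t is implementation-defined and wraps on common
   compilers. */
static inline int32_t c_offset_wrapped32(int i, int j, int ldc)
{
    assert(i >= 0 && j >= 0 && ldc > 0);
    uint32_t u = (uint32_t)j * (uint32_t)ldc + (uint32_t)i;
    return (int32_t)u;
}
```

With j = 50000 and ldc = 50000 the true offset is 2,500,000,000 elements, which exceeds INT32_MAX; the 32-bit version comes out negative on typical platforms, and adding that to a buffer pointer gives the out-of-range access the commit message describes.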
Kiran Varaganti
53842c7e7d Removed printing alpha and beta values
Change-Id: I49102db510311a30f6a936f9d843f35838f50d23
2019-03-22 13:57:14 +05:30
Kiran Varaganti
6805db45e3 Corrected setting alpha & beta values: alpha = -1 and beta = 1. bli_setc(-1.0, 0, &alpha) should be used rather than bli_setc(0.0, -1.0, &alpha). This is corrected now.
Change-Id: Ic1102dfd6b50ccf212386a1211c6f31e8d987ef9
2019-03-22 12:55:35 +05:30
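The fix in 6805db45e3 comes down to argument order: the two calls quoted there suggest the real part is passed first and the imaginary part second. A small sketch with a hypothetical scalar type and setter (not the BLIS API) shows how swapping them changes the result of a scaling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex scalar; not a BLIS type. */
typedef struct { double real; double imag; } zscalar;

/* Hypothetical setter using a (real, imag, destination) argument order. */
static inline void set_scalar(double real, double imag, zscalar *dst)
{
    assert(dst != NULL);
    dst->real = real;
    dst->imag = imag;
}

/* Complex product a * x. */
static inline zscalar zmul(zscalar a, zscalar x)
{
    zscalar r = { a.real * x.real - a.imag * x.imag,
                  a.real * x.imag + a.imag * x.real };
    return r;
}
```

Scaling the real value 2 by a scalar set with (-1.0, 0.0) gives -2, whereas a scalar set with (0.0, -1.0) gives -2i, so the swapped arguments would have computed a different operation than the intended alpha = -1.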
Kiran Varaganti
20153cd4b5 Modified test_gemm.c file in test folder
A macro 'FILE_IN_OUT' is defined to read input parameters from a csv file.
Format for input file:
Each line defines a gemm problem with following parameters: m k n cs_a cs_b cs_c
The operation always implemented is C = C - A*B and column-major format.
When macro is disabled - it reverts back to original implementation.
Usage: ./test_gemm_<mkl/blis/openblas>.x input.csv output.csv
GEMM is called through BLAS interface
For BLIS - the test application also prints either 'S' indicating small gemm routine or 'N' - conventional BLIS gemm
for MKL/OpenBLAS - ignore this character

Change-Id: I0924ef2c1f7bdea48d4cdb230b888e2af2c86a36
2019-03-21 16:23:53 +05:30
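The operation described in 20153cd4b5, C = C - A*B in column-major format with the parameters m k n cs_a cs_b cs_c, can be sketched as a naive reference loop. This is not the test driver's code: the function name is hypothetical, and treating cs_a, cs_b, and cs_c as column strides (leading dimensions) is an assumption based on the column-major wording.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine: C = C - A*B for column-major matrices.
   A is m x k, B is k x n, C is m x n; cs_* is the distance in elements
   between the starts of consecutive columns. */
static inline void ref_gemm_sub(int m, int k, int n,
                                const double *a, int cs_a,
                                const double *b, int cs_b,
                                double *c, int cs_c)
{
    assert(m >= 0 && k >= 0 && n >= 0);
    assert(cs_a >= m && cs_b >= k && cs_c >= m);

    for (int j = 0; j < n; ++j) {
        for (int p = 0; p < k; ++p) {
            /* B(p, j) is reused across the whole column update. */
            double bpj = b[(size_t)p + (size_t)j * (size_t)cs_b];
            for (int i = 0; i < m; ++i) {
                c[(size_t)i + (size_t)j * (size_t)cs_c] -=
                    a[(size_t)i + (size_t)p * (size_t)cs_a] * bpj;
            }
        }
    }
}
```

This corresponds to a gemm call with alpha = -1 and beta = 1. A loop like this is only useful for checking results on small inputs; it makes no attempt at blocking or vectorization.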
Kiran Varaganti
3a929a3d0b Fixed code merging: bli_gemm_small.c - missed conditional checks for L!=0 && K!=0. Now they are added. This fix is done to pass blastest
Change-Id: Idc9c9a04d2015a68a19553c437ecaf8f1584026c
2019-03-18 10:51:41 +05:30
Kiran Varaganti
7fe4474838 Disabled BLIS_ENABLE_ZEN_BLOCK_SIZES in bli_family_zen.h for ROME tuning
Change-Id: Iec47fcf51f4d4396afef1ce3958e58cf02c59a57
2019-03-06 16:23:31 +05:30
Kiran Varaganti
f5ed95ecd7 Merged BLIS Release 1.3
Modified config/zen/make_defs.mk, now CKVECFLAGS     := -mavx2 -mfpmath=sse -mfma -march=znver1

Change-Id: Ia0942d285a21447cd0c470de1bc021fe63e80d81
2019-03-05 15:03:57 +05:30
praveeng
b06244d98c Merge branch 'ut-austin-amd' of ssh://git.amd.com:29418/cpulibraries/er/blis into ut-austin-amd 2019-02-21 12:56:15 +05:30
praveeng
e938ff08ce deleted test.txt
Change-Id: I3871f5fe76e548bc29ec2733745b29964e829dd3
2019-02-21 12:49:16 +05:30
mkv
ed13ad465d added test file for initial commit 2019-02-21 12:49:16 +05:30
praveeng
4c7e668083 deleted test.txt
Change-Id: I3871f5fe76e548bc29ec2733745b29964e829dd3
2019-02-21 12:44:38 +05:30
mkv
95e070581c added test file for initial commit 2019-02-21 01:04:16 -05:00