Commit Graph

1634 Commits

Author SHA1 Message Date
kdevraje
3f867c96ca When running HPL with pure MPI without DGEMM threading (single-threaded BLIS), setting this macro to 1 gives the best performance.
Change-Id: I24fd0bf99216f315e49f1c74c44c3feaffd7078d
2019-05-31 14:31:49 +05:30
kdevraje
13806ba3b0 This check-in updates the copyright information, which is changed to (start year) - 2019
Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041
2019-05-27 16:24:43 +05:30
Meghana
ee123f5358 Defined small matrix thresholds for TRSM for various cases for NAPLES and ROME
Updated copyright information for kernels/zen/bli_trsm_small.c file
Removed separate kernels for zen2 architecture
Instead added threshold conditions in zen kernels both for ROME and NAPLES

Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5
2019-05-27 15:36:44 +05:30
prangana
9d93a4caa2 update version 2.0 2019-05-24 17:59:13 +05:30
Meghana
e05171118c Implemented TRSM for small matrices for cases where A is on the right
Added separate kernels for zen and zen2

Change-Id: I6318ddc250cf82516c1aa4732718a35eae0c9134
2019-05-23 16:17:19 +05:30
kdevraje
02920f5c48 make checkblis fails for the matrix dimension check at the beginning, hence reverting it
Change-Id: Ibd2ee8c2d4914598b72003fbfc5845be9c9c1e87
2019-05-23 15:29:59 +05:30
kdevraje
84215022f2 Adding threshold condition to dgemm small matrix kernels, defining the constants in zen2 configuration
Change-Id: I53a58b5d734925a6fcb8d8bea5a02ddb8971fcd5
2019-05-23 14:33:47 +05:30
kdevraje
a3554eb1dc Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis to configure zen2
Change-Id: I97e17bca9716b80b862925f97bb513c07b4b0cae
2019-05-23 11:53:32 +05:30
kdevraje
ea082f8390 adding empty zen2 directory with .gitignore file
Change-Id: Ifa37cf54b2578aa19ad335372b44bca17043fe4b
2019-05-23 10:38:29 +05:30
Kiran Varaganti
b80bd5bcb2 config/zen/bli_cntx_init_zen.c: removed BLIS_ENABLE_ZEN_BLOCK_SIZES macro. We have different configurations for zen and zen2
config/zen/bli_family_zen.h: deleted macro BLIS_ENABLE_ZEN_BLOCK_SIZES
config/zen/make_defs.mk: removed compiler flag -mno-avx256-split-unaligned-store
frame/base/bli_cpuid.c: ROME family is 17h, but its model numbers start from 0x30.
test/test_gemm.c: commented out #define FILE_IN_OUT (some compilation error when BLIS is configured as amd64)
Now we can use a single configuration, ./configure amd64, which works for both ROME and Naples

Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2
2019-05-22 05:51:22 -04:00
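Note on the family/model check mentioned above: a minimal sketch in C, not the actual bli_cpuid.c code. The function name looks_like_rome is hypothetical, and only the two conditions stated in the message (family 17h, model numbers starting at 0x30) are encoded; the message gives no upper model bound, so none is assumed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   Encodes just what the commit message states: ROME shares family
   0x17 with Naples, and is told apart by model numbers >= 0x30. */
static bool looks_like_rome(uint32_t family, uint32_t model)
{
    return family == 0x17 && model >= 0x30;
}
```

A real detection routine would presumably also check feature flags; that part is left out because the message does not describe it.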
Kiran Varaganti
a042db011d Modified make_defs.mk for zen2 so that it compiles with gcc versions older than 9.0
Change-Id: I8fcac30538ee39534c296932639053b47b9a2d43
2019-05-22 05:51:10 -04:00
Kiran Varaganti
a23f92594c config_registry: New AMD zen2 architecture configuration added.
frame/base/bli_arch.c: #ifdef BLIS_FAMILY_ZEN2 id = BLIS_ARCH_ZEN2; #endif added. zen2 is added in config_name[BLIS_NUM_ARCHS]
  frame/base/bli_cpuid.c : #ifdef BLIS_CONFIG_ZEN2 if ( bli_cpuid_is_zen2( family, model, features ) ) return BLIS_ARCH_ZEN2; #endif, defined new function bool bli_cpuid_is_zen2(...).
  frame/base/bli_cpuid.h : declared bli_cpuid_is_zen2(..).
  frame/base/bli_gks.c : #ifdef BLIS_CONFIG_ZEN2 bli_gks_register_cntx(BLIS_ARCH_ZEN2, bli_cntx_init_zen2, bli_cntx_init_zen2_ref, bli_cntx_init_zen2_ind); #endif
  frame/include/bli_arch_config.h : #ifdef BLIS_CONFIG_ZEN2 CNTX_INIT_PROTS(zen2) #endif #ifdef BLIS_FAMILY_ZEN2 #include "bli_family_zen2.h" #endif
  frame/include/bli_type_defs.h : added BLIS_ARCH_ZEN2 in arch_t enum. BLIS_NUM_ARCHS 20

Change-Id: I2a2d9b7266673e78a4f8543b1bfb5425b0aa7866
2019-05-22 05:28:16 -04:00
kdevraje
17b878b66d adding license same as in ut-austin-amd-branch
Change-Id: I6790768d2bf5d42369d304ef93e34701f95fbaff
2019-05-22 14:02:53 +05:30
kdevraje
df755848b8 Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis into rome2.0
Change-Id: Ie8aad1ab810f0f3c0b90ec67f9dd3dfb8dcc74cc
2019-05-22 13:30:07 +05:30
Nisanth M P
c72ae27ade Re-enabling the small matrix gemm optimization for target zen
Change-Id: I13872784586984634d728cd99a00f71c3f904395
2019-05-22 01:05:13 -04:00
sraut
ab0818af80 Review comments incorporated for small TRSM.
Change-Id: Ia64b7b2c0375cc501c2cb0be8a1af93111808cd9
2019-05-22 00:43:10 -04:00
Kiran Varaganti
ca4b33c001 Added compiler option (-mno-avx256-split-unaligned-store) in the file config/zen/make_defs.mk to improve performance of intrinsic codes, this flag ensures compiler generates 256-bit stores for the equivalent intrinsics code.
Change-Id: I8f8cd81a3604869df18d38bc42097a04f178d324
2019-04-24 15:02:39 +05:30
kdevraje
9d76688ad9 Fix for single-rank crash with the HPL application. When computing the offset into the C buffer, integer variables are used for the row and column indices, so the intermediate result overflows and a negative value gets added to the buffer pointer; when that negative value is large enough, it indexes the buffer out of range, resulting in a segmentation fault. Although the crash originates in the dgemm kernel, the same fix was added to the sgemm kernel as well.
Change-Id: I171119b0ec0dfbd8e63f1fcd6609a94384aabd27
2019-04-11 10:23:26 +05:30
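Note on the overflow described above: a minimal sketch in C, assuming a column-major C buffer with leading dimension ldc. Both function names are hypothetical and this is not the kernel code itself. The first helper widens the operands before multiplying; the second emulates what a 32-bit product would yield (using unsigned arithmetic, since signed overflow is undefined behavior in C).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of C(row, col), computed in 64 bits so
   the product col * ldc cannot wrap. */
static int64_t c_offset_wide(int row, int col, int ldc)
{
    assert(row >= 0 && col >= 0 && ldc >= 0);
    return (int64_t)col * (int64_t)ldc + (int64_t)row;
}

/* Hypothetical helper: the value a 32-bit two's-complement computation
   would produce for the same inputs. Only the low 32 bits are kept and
   then reinterpreted as a signed quantity. */
static int64_t c_offset_wrapped32(int row, int col, int ldc)
{
    uint64_t p = (uint64_t)(uint32_t)col * (uint64_t)(uint32_t)ldc
               + (uint64_t)(uint32_t)row;
    uint32_t u = (uint32_t)p; /* keep the low 32 bits */
    return (u <= (uint32_t)INT32_MAX) ? (int64_t)u
                                      : (int64_t)u - 4294967296LL;
}
```

With col = ldc = 50000 the true offset is 2500000000 elements, which does not fit in a 32-bit int; the wrapped value is negative, which matches the symptom in the message.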
Kiran Varaganti
53842c7e7d Removed printing alpha and beta values
Change-Id: I49102db510311a30f6a936f9d843f35838f50d23
2019-03-22 13:57:14 +05:30
Kiran Varaganti
6805db45e3 Corrected the setting of alpha and beta values (alpha = -1 and beta = 1): bli_setc(-1.0, 0, &alpha) should be used rather than bli_setc(0.0, -1.0, &alpha). This is corrected now.
Change-Id: Ic1102dfd6b50ccf212386a1211c6f31e8d987ef9
2019-03-22 12:55:35 +05:30
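Note on the argument order fixed above: a minimal sketch using standard C99 complex arithmetic rather than any BLIS call. The helpers make_scalar and gemm_scalar are hypothetical; they only illustrate why (real, imaginary) = (-1, 0) yields C - A*B while (0, -1) does not.

```c
#include <complex.h>

/* Hypothetical helper: build a complex scalar from (real, imaginary),
   mirroring the argument order discussed in the commit message. */
static double complex make_scalar(double re, double im)
{
    return re + im * I;
}

/* Hypothetical scalar stand-in for the update C := beta*C + alpha*A*B. */
static double complex gemm_scalar(double complex alpha, double complex a,
                                  double complex b, double complex beta,
                                  double complex c)
{
    return beta * c + alpha * a * b;
}
```

For a = 2, b = 3, c = 10 and beta = 1, alpha = make_scalar(-1.0, 0.0) gives 10 - 6 = 4, whereas alpha = make_scalar(0.0, -1.0) gives 10 - 6i.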
Kiran Varaganti
20153cd4b5 Modified test_gemm.c file in test folder
A macro 'FILE_IN_OUT' is defined to read input parameters from a csv file.
Format for input file:
Each line defines a gemm problem with following parameters: m k n cs_a cs_b cs_c
The operation always implemented is C = C - A*B and column-major format.
When macro is disabled - it reverts back to original implementation.
Usage: ./test_gemm_<mkl/blis/openblas>.x input.csv output.csv
GEMM is called through BLAS interface
For BLIS - the test application also prints either 'S' indicating small gemm routine or 'N' - conventional BLIS gemm
for MKL/OpenBLAS - ignore this character

Change-Id: I0924ef2c1f7bdea48d4cdb230b888e2af2c86a36
2019-03-21 16:23:53 +05:30
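Note on the input format above: a minimal sketch in C of reading one problem line. It assumes the six fields are whitespace-separated, as they are written in the message; if the file actually uses commas, the format string would need to change. The type gemm_problem_t and the function parse_gemm_line are hypothetical and not taken from test_gemm.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one line: m k n cs_a cs_b cs_c */
typedef struct {
    long m, k, n;          /* problem dimensions */
    long cs_a, cs_b, cs_c; /* column strides of A, B and C */
} gemm_problem_t;

/* Returns 1 if all six fields were read, 0 otherwise. */
static int parse_gemm_line(const char *line, gemm_problem_t *p)
{
    assert(line != NULL && p != NULL);
    int got = sscanf(line, "%ld %ld %ld %ld %ld %ld",
                     &p->m, &p->k, &p->n,
                     &p->cs_a, &p->cs_b, &p->cs_c);
    return got == 6;
}
```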
Kiran Varaganti
3a929a3d0b Fixed code merging: bli_gemm_small.c - missed conditional checks for L!=0 && K!=0. Now they are added. This fix is done to pass blastest
Change-Id: Idc9c9a04d2015a68a19553c437ecaf8f1584026c
2019-03-18 10:51:41 +05:30
Kiran Varaganti
7fe4474838 Disabled BLIS_ENABLE_ZEN_BLOCK_SIZES in bli_family_zen.h for ROME tuning
Change-Id: Iec47fcf51f4d4396afef1ce3958e58cf02c59a57
2019-03-06 16:23:31 +05:30
Kiran Varaganti
f5ed95ecd7 Merged BLIS Release 1.3
Modified config/zen/make_defs.mk, now CKVECFLAGS     := -mavx2 -mfpmath=sse -mfma -march=znver1

Change-Id: Ia0942d285a21447cd0c470de1bc021fe63e80d81
2019-03-05 15:03:57 +05:30
praveeng
b06244d98c Merge branch 'ut-austin-amd' of ssh://git.amd.com:29418/cpulibraries/er/blis into ut-austin-amd 2019-02-21 12:56:15 +05:30
praveeng
e938ff08ce deleted test.txt
Change-Id: I3871f5fe76e548bc29ec2733745b29964e829dd3
2019-02-21 12:49:16 +05:30
mkv
ed13ad465d added test file for initial commit 2019-02-21 12:49:16 +05:30
praveeng
4c7e668083 deleted test.txt
Change-Id: I3871f5fe76e548bc29ec2733745b29964e829dd3
2019-02-21 12:44:38 +05:30
mkv
95e070581c added test file for initial commit 2019-02-21 01:04:16 -05:00
Field G. Van Zee
6b83273126 Generalized ref kernels' pragma omp simd usage.
Details:
- Replaced direct usage of _Pragma( "omp simd" ) in reference kernels
  with PRAGMA_SIMD, which is defined as a function of the compiler being
  used in a new bli_pragma_macro_defs.h file. That definition is cleared
  when BLIS detects that the -fopenmp-simd command line option is
  unsupported. Thanks to Devin Matthews and Jeff Hammond for suggestions
  that guided this commit.
- Updated configure and bli_config.h.in so that the appropriate anchor
  is substituted in (when the corresponding pragma omp simd support is
  present).
2019-02-12 16:01:28 -06:00
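Note on PRAGMA_SIMD: a minimal sketch in C of the general idea, not the contents of bli_pragma_macro_defs.h. The guard symbol HYPOTHETICAL_HAVE_OMP_SIMD is invented here to stand in for whatever configure substitutes, and the per-compiler selection described in the message is omitted.

```c
/* HYPOTHETICAL_HAVE_OMP_SIMD is an assumed symbol, for illustration. */
#ifdef HYPOTHETICAL_HAVE_OMP_SIMD
  #define PRAGMA_SIMD _Pragma("omp simd")
#else
  #define PRAGMA_SIMD /* expands to nothing when unsupported */
#endif

/* Small constant-bound loop annotated through the macro, so the same
   source builds with or without -fopenmp-simd. */
static void axpy4(double alpha, const double *x, double *y)
{
    PRAGMA_SIMD
    for (int i = 0; i < 4; ++i)
        y[i] += alpha * x[i];
}
```

Because the macro expands to nothing in the fallback case, the loop body is unchanged and only the vectorization hint disappears.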
Field G. Van Zee
b1f5ce8622 Minor updates to scripts in test/mixeddt/matlab. 2019-02-05 17:38:50 -06:00
Devangi N. Parikh
38203ecd15 Added thunderx2 system in the mixeddt test scripts
Details:
 - Added thunderx2 (tx2) as a system in the runme.sh in test/mixeddt
2019-02-04 15:28:28 -05:00
Devangi N. Parikh
dfc91843ea Fixed gcc flags for thunderx2 subconfiguration
Details:
- Fixed -march flag. Thunderx2 is an armv8.1a architecture not armv8a.
2019-02-04 15:23:40 -05:00
Field G. Van Zee
c665eb9b88 Minor updates to docs, Makefiles.
Details:
- Changed all occurrences of
    micro-kernel -> microkernel
    macro-kernel -> macrokernel
    micro-panel  -> micropanel
  in all markdown documents in 'docs' directory. This change is being
  made since we've reached the point in adoption and acceptance of
  BLIS's insights where words such as "microkernel" are no longer new,
  and therefore now merit being unhyphenated.
- Updated "Implementation Notes" sections of KernelsHowTo.md, which
  still contained references to nonexistent cpp macros such as
  BLIS_DEFAULT_MR_? and BLIS_PACKDIM_MR_?.
- Added 'run-fast' and 'check-fast' targets to testsuite/Makefile.
- Minor updates to Testsuite.md, including suggesting use of
  'make check' and 'make check-fast' when running from the local
  testsuite directory.
- Added a comment to top-level Makefile explaining the purpose behind
  the TESTSUITE_WRAPPER variable, which at first glance appears to serve
  no purpose.
2019-01-28 16:22:23 -06:00
M. Zhou
1aa280d052 Amend OS detection for kFreeBSD. (#295) 2019-01-27 15:40:48 -06:00
Field G. Van Zee
fffc23bb35 CREDITS file update. 2019-01-25 13:35:31 -06:00
Field G. Van Zee
26c5cf495c Fixed bug in skx subconfig related to bdd46f9.
Details:
- Fixed code in the skx subconfiguration that became a bug after
  committing bdd46f9. Specifically, the bli_cntx_init_skx() function
  was overwriting default blocksizes for the scomplex and dcomplex
  microkernels despite the fact that only single and double real
  microkernels were being registered. This was not a problem prior to
  bdd46f9 since all microkernels used dynamically-queried (at runtime)
  register blocksizes for loop bounds. However, post-bdd46f9, this
  became a bug because the reference ukernels for scomplex and dcomplex
  were written with their register blocksizes hard-coded as constant
  loop bounds, which conflicted with the erroneous scomplex and dcomplex
  values that bli_cntx_init_skx() was setting in the context. The
  lesson here is that going forward, all subconfigurations must not set
  any blocksizes for datatypes corresponding to default/reference
  microkernels. (Note that a blocksize is left unchanged by the
  bli_cntx_set_blkszs() function if it was set to -1.)
2019-01-24 18:49:31 -06:00
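Note on the -1 convention mentioned above: a minimal sketch in C of a setter that leaves an entry untouched when the requested value is -1. The function set_blkszs and the array layout are hypothetical and much simpler than bli_cntx_set_blkszs(); the numbers used in the usage note are illustrative only.

```c
#include <assert.h>

#define NUM_DT 4 /* s, d, c, z */

/* Hypothetical setter: overwrite current[dt] only where the caller
   supplied a real value; -1 means "keep the existing blocksize". */
static void set_blkszs(int current[NUM_DT], const int requested[NUM_DT])
{
    assert(current != 0 && requested != 0);
    for (int dt = 0; dt < NUM_DT; ++dt) {
        if (requested[dt] != -1)
            current[dt] = requested[dt];
    }
}
```

A subconfiguration that registers only real-domain microkernels would pass, say, {16, 14, -1, -1}, so the scomplex and dcomplex entries keep the values that match the reference microkernels.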
Field G. Van Zee
180f8e42e1 Fixed undefined behavior trsm ukr bug in bdd46f9.
Details:
- Fixed a bug that manifested anytime a configuration was used in which
  optimized microkernels were registered and the trsm operation (or
  kernel) was invoked. The bug resulted from the optimized microkernels'
  register blocksizes conflicting with the hard-coded values--expressed
  in the form of constant loop bounds--used in the new reference trsm
  ukernels that were introduced in bdd46f9. The fix was easy: reverting
  back to the implementation that uses variable-bound loops, which
  amounted to changing an #if 0 to #if 1 (since I preserved the older
  implementation in the file alongside the new code based on constant-
  bound loops). It should be noted that this fix must be permanent,
  since the trsm kernel code with constant-bound loops can never work
  with gemm ukernels that use different register blocksizes.
2019-01-24 18:01:15 -06:00
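Note on constant-bound versus variable-bound loops: a minimal sketch in C of the mismatch described above. REF_MR and both functions are hypothetical; the point is only that a loop bound fixed at compile time ignores the register blocksize actually in effect at runtime.

```c
enum { REF_MR = 4 }; /* assumed hard-coded reference blocksize */

/* Constant-bound loop: always processes REF_MR rows, whatever the
   runtime blocksize is. Returns the number of rows processed. */
static int scale_rows_const(double *col, int mr_runtime)
{
    (void)mr_runtime; /* deliberately ignored */
    int n = 0;
    for (int i = 0; i < REF_MR; ++i) {
        col[i] *= 2.0;
        ++n;
    }
    return n;
}

/* Variable-bound loop: processes exactly mr_runtime rows. */
static int scale_rows_var(double *col, int mr_runtime)
{
    int n = 0;
    for (int i = 0; i < mr_runtime; ++i) {
        col[i] *= 2.0;
        ++n;
    }
    return n;
}
```

With a runtime blocksize of 6, the constant-bound version leaves the last two rows unprocessed, while the variable-bound version handles all six.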
Field G. Van Zee
bdd46f9ee8 Rewrote reference kernels to use #pragma omp simd.
Details:
- Rewrote level-1v, -1f, and -3 reference kernels in terms of simplified
  indexing annotated by the #pragma omp simd directive, which a compiler
  can use to vectorize certain constant-bounded loops. (The new kernels
  actually use _Pragma("omp simd") since the kernels are defined via
  templatizing macros.) Modest speedup was observed in most cases using
  gcc 5.4.0, which may improve with newer versions. Thanks to Devin
  Matthews for suggesting this via issue #286 and #259.
- Updated default blocksizes defined in ref_kernels/bli_cntx_ref.c to
  be 4x16, 4x8, 4x8, and 4x4 for single, double, scomplex and dcomplex,
  respectively, with a default row preference for the gemm ukernel. Also
  updated axpyf, dotxf, and dotxaxpyf fusing factors to 8, 6, and 4,
  respectively, for all datatypes.
- Modified configure to verify that -fopenmp-simd is a valid compiler
  option (via a new detect/omp_simd/omp_simd_detect.c file).
- Added a new header in which prefetch macros are defined according to
  which compiler is detected (via macros such as __GNUC__). These
  prefetch macros are not yet employed anywhere, though.
- Updated the year in copyrights of template license headers in
  build/templates and removed AMD as a default copyright holder.
2019-01-24 17:23:18 -06:00
Field G. Van Zee
63de2b0090 Prevent redef of ftnlen in blastest f2c_types.h.
Details:
- Guard typedef of ftnlen in f2c_types.h with a #ifndef HAVE_BLIS_H
  directive to prevent the redefinition of that type. Thanks to Jeff
  Diamond for reporting this compiler warning (and apologies for the
  delay in committing a fix).
2019-01-23 12:16:27 -06:00
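Note on the ftnlen guard: a minimal sketch in C of the pattern described above. The underlying type int is assumed purely for illustration; the message does not say what f2c_types.h actually uses.

```c
/* When blis.h has already been included (HAVE_BLIS_H defined), the
   typedef is skipped so the type is not defined a second time. */
#ifndef HAVE_BLIS_H
typedef int ftnlen; /* assumed type, for illustration only */
#endif

/* Hypothetical consumer of the type. */
static ftnlen twice(ftnlen len)
{
    return (ftnlen)(2 * len);
}
```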
Field G. Van Zee
eec2e183a7 Added escaping to '/' in os_name in configure.
Details:
- Add os_name to the list of variables into which the '/' character is
  escaped. This is meant to address (or at least make progress toward
  addressing) #293. Thanks to Isuru Fernando for spotting this as the
  potential fix, and also thanks to M. Zhou for the original report.
2019-01-21 12:12:18 -06:00
Field G. Van Zee
adf5c17f08 Formally registered thunderx2 subconfiguration.
Details:
- Added a separate subconfiguration for thunderx2, which now uses
  different optimization flags than cortexa57/cortexa53.
2019-01-18 15:14:45 -06:00
M. Zhou
094cfdf7df Port BLIS to GNU Hurd OS. (#294)
Prevent blis.h from misidentifying Hurd as OSX.
2019-01-18 12:46:13 -06:00
Field G. Van Zee
5d7d616e8e README.md update re: mixeddt TOMS paper. 2019-01-15 20:52:51 -06:00
Field G. Van Zee
58c7fb4788 Added more matlab scripts for mixeddt paper.
Details:
- Added a variant set of matlab scripts geared to producing plots that
  reflect performance data gathered with and without extra memory
  optimizations enabled. These scripts reside (for now) in
  test/mixeddt/matlab/wawoxmem.
2019-01-08 17:00:27 -06:00
Field G. Van Zee
34286eb914 Minor update to docs/HardwareSupport.md. 2019-01-08 11:41:20 -06:00
Field G. Van Zee
108b04dc5b Regenerated symbols in build/libblis-symbols.def.
Details:
- Reran ./build/regen-symbols.sh after running
  'configure --enable-cblas auto' to reflect removal of
  bli_malloc_pool() and bli_free_pool().
2019-01-07 20:16:31 -06:00
Field G. Van Zee
706cbd9d56 Minor tweaks/cleanups to bli_malloc.c, _apool.c.
Details:
- Removed malloc_ft and free_ft function pointer arguments from the
  interface to bli_apool_init() after deciding that there is no need to
  specify the malloc()/free() for blocks within the apool. (The apool
  blocks are actually just array_t structs.) Instead, we simply call
  bli_malloc_intl()/_free_intl() directly. This has the added benefit
  of allowing additional output when memory tracing is enabled via
  --enable-mem-tracing. Also made corresponding changes elsewhere in
  the apool API.
- Changed the inner pools (elements of the array_t within the apool_t)
  to use BLIS_MALLOC_POOL and BLIS_FREE_POOL instead of BLIS_MALLOC_INTL
  and BLIS_FREE_INTL.
- Disabled definitions of bli_malloc_pool() and bli_free_pool() since
  there are no longer any consumers of these functions.
- Very minor comment / printf() updates.
2019-01-07 18:28:19 -06:00
Minh Quan Ho
579145039d Initialize error messages at compile time (#289)
* Initialize error messages at compile time

- Assigning strings directly to the bli_error_string array, instead of
snprintf() at execution-time.

* Retired bli_error_init(), _finalize().

Details:
- Removed functions obviated by changes in 80e8dc6: bli_error_init(),
  bli_error_finalize(), and bli_error_init_msgs(), as well as calls to
  the former two in bli_init.c.

* Regenerated symbols in build/libblis-symbols.def.

Details:
- Reran ./build/regen-symbols.sh after running
  'configure --enable-cblas auto'.
2019-01-07 16:00:15 -06:00
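Note on compile-time error messages: a minimal sketch in C of assigning strings directly in a static array instead of filling it with snprintf() at startup. The enum values, message texts and the function error_to_string are all hypothetical and do not reproduce the BLIS error table.

```c
/* Hypothetical error codes, for illustration only. */
enum {
    ERR_SUCCESS = 0,
    ERR_FAILURE,
    ERR_NOT_YET_IMPLEMENTED,
    ERR_COUNT
};

/* The table is fully initialized by the compiler, so no init or
   finalize function is needed at runtime. */
static const char *const error_string[ERR_COUNT] = {
    [ERR_SUCCESS]             = "No error.",
    [ERR_FAILURE]             = "Generic failure.",
    [ERR_NOT_YET_IMPLEMENTED] = "Requested functionality not yet implemented.",
};

static const char *error_to_string(int code)
{
    if (code < 0 || code >= ERR_COUNT)
        return "Unknown error code.";
    return error_string[code];
}
```

Designated initializers keep each message next to its code, and the lookup stays valid even before any library initialization has run.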
Field G. Van Zee
aafbca086e Updated external package language in README.md.
Details:
- Updated/added comments about Fedora, OpenSUSE, and GNU Guix under the
  newly-renamed "External GNU/Linux packages" section. Thanks to Dave
  Love for providing these revisions.
2019-01-07 12:38:21 -06:00