1776 Commits

Author SHA1 Message Date
prangana
33648bbf31 CPP Test comparison util function fix
Change-Id: I6a9769efcef5f313eb318921275d37353df2b127
2019-11-21 15:57:41 +05:30
prangana
ba86a38143 Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-blis-cpp
Change-Id: I49bc3fa15e41fc287e1ca26c357edf144044943f
2019-11-21 10:04:24 +05:30
Meghana
5560f75c0c Modified makefiles for zen and zen2 to pick up compiler flags based on architecture and compiler versions
Change-Id: I443e47c38e0ffd12f4b303f546abd46d02aa31ca
2019-11-21 09:53:44 +05:30
prangana
3d20128aea Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-blis-cpp
Change-Id: I97a10ab7546d475474b0ff733bafb8248843c352
2019-11-21 00:54:16 +05:30
prangana
d63f9b7d7f checkcpp test rule in Makefile
Change-Id: If01fe55e258e563a96cd8da9ea93d21063b730c2
2019-11-21 00:52:47 +05:30
prangana
49c27040d1 Install CPP Template headers
Change-Id: Ib15dc9bda08d1f3fdc68e31520daee90a287357c
2019-11-20 21:52:22 +05:30
prangana
5f04fdd618 CPP Template test files update
Change-Id: Ia9637556b50b10cb4409e18f369a3e7fc35569fb
2019-11-20 21:32:37 +05:30
Devrajegowda, Kiran
b5475f527d Adding context initialisation for SUP kernels in zen2 architecture
Change-Id: I9de533abb039d0dff348728be51554cc53679d10
2019-11-12 13:59:26 +05:30
prangana
d21c726003 update version 2.1
Change-Id: I531fe8005f63ad138077320c3f0b03a05a7c7dd2
2019-10-30 15:33:37 +05:30
Kiran Varaganti
c3d4464f03 Removed an extra 'endif' statement that caused build failures for the zen configuration
Change-Id: Ia7f164209124ffae5c70e1ff7c3d131cd44b9294
2019-10-24 14:56:04 +05:30
Kiran Varaganti
97a4236c82 Fixed matrices not being initialized when input dimensions are read from a file. test_gemm.c now initializes matrices for file-based inputs as well.
Change-Id: I4c3625a51dcbf64c99f56f354dcd898e66035cb1
2019-10-24 13:57:55 +05:30
Devrajegowda, Kiran
4158e7fffe Added changes missed while rebasing Field's SUP code
Change-Id: I560b93c42901ca2bbd4c22e833f55ba884a38a50
2019-10-23 10:33:43 +05:30
Kiran Devrajegowda
b2479b1a6d Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-staging-rome2.1
Change-Id: I340e417fde52385deb3ee231e2c219214d4e278d
2019-10-07 11:19:32 +05:30
Chithra Sankar
574bdaeb48 Modified cblas.hh not to include cblas.h, as cblas.h is generated only after BLIS is installed with make install
2019-10-03 17:04:57 +05:30
Chithra Sankar
a000c617f9 test/Makefile reverted to correct version to retain copyright information
2019-10-03 14:46:52 +05:30
Chithra Sankar
95d6e2b1f1 test folder files reverted to previous commit
2019-10-03 14:23:16 +05:30
Chithra Sankar
9777b8e901 Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-blis-cpp
2019-10-01 15:48:43 +05:30
Chithra Sankar
851589c488 Return typename corrected in dot function
2019-09-30 16:42:41 +05:30
Chithra Sankar
be25ec0065 CPP implementation of dsdot included. Test application refactored to address review comments
Change-Id: Iec0b973c23a2825e61f2ec9da236b3aea327d98a
2019-09-20 11:52:55 +05:30
Kiran Varaganti
ea25ba255a Added the BLIS_ENABLE_ZEN_BLOCK_SIZES macro back to the zen configuration, matching release 1.3. It was originally added to improve DGEMM multithreaded scalability on Naples when the number of threads is greater than 16, but it was deleted by mistake among the many changes made for the 2.0 release. Also cleaned up code in bli_gemm_front.c
Change-Id: I6827b58d2dab1041fe182fef5a007b679ac4bb1f
2019-09-19 00:13:35 +05:30
Chithra Sankar
ce0b1caa7f Added Doxygen comments to all functions; addressed review comments; modified test application to use template functions
Change-Id: I920c335776bc4597af1c988b538e8dda706195fa
2019-09-05 12:17:45 +05:30
Chithra Sankar
c195d9a576 CPP template wrapper implementation done for all BLAS routines
Change-Id: I1692988b81971379f2ba2e8f7bd126c8ccccb926
2019-08-28 12:11:53 +05:30
kdevraje
c4368c66ed This check-in updates the copyright information to the range (start year) - 2019
Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041
2019-08-23 14:18:55 +05:30
Meghana
ec907c3f4b Defined small matrix thresholds for TRSM for various cases for NAPLES and ROME
Updated copyright information for kernels/zen/bli_trsm_small.c file
Removed separate kernels for zen2 architecture
Instead added threshold conditions in zen kernels both for ROME and NAPLES

Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5
2019-08-23 14:18:55 +05:30
Meghana
3f88a44779 Implemented TRSM for small matrices for cases where A is on the right
Added separate kernels for zen and zen2

Change-Id: I6318ddc250cf82516c1aa4732718a35eae0c9134
2019-08-23 14:18:55 +05:30
kdevraje
2e9b5c36d2 make checkblis fails because of the matrix dimension check at the beginning, hence reverting it
Change-Id: Ibd2ee8c2d4914598b72003fbfc5845be9c9c1e87
2019-08-23 14:18:55 +05:30
kdevraje
874aee6d84 Adding threshold condition to dgemm small matrix kernels, defining the constants in zen2 configuration
Change-Id: I53a58b5d734925a6fcb8d8bea5a02ddb8971fcd5
2019-08-23 14:18:55 +05:30
Kiran Varaganti
b5eb348734 config/zen/bli_cntx_init_zen.c: removed BLIS_ENABLE_ZEN_BLOCK_SIZES macro. We now have separate configurations for zen and zen2
config/zen/bli_family_zen.h: deleted macro BLIS_ENABLE_ZEN_BLOCK_SIZES
config/zen/make_defs.mk: removed compiler flag -mno-avx256-split-unaligned-store
frame/base/bli_cpuid.c: ROME is family 17H, but its model numbers start at 0x30.
test/test_gemm.c - commented out #define FILE_IN_OUT (it caused a compilation error when BLIS is configured as amd64)
Now a single configuration, ./configure amd64, works for both ROME and Naples

Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2
2019-08-23 14:18:55 +05:30
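The model-number rule recorded in this commit for frame/base/bli_cpuid.c (family 17H, models from 0x30 are ROME) can be sketched as a plain predicate. This is a hypothetical simplification, not BLIS's bli_cpuid_is_zen2(): the name is_zen2_sketch is invented here, and the real function also takes a features argument, which is omitted.

```c
#include <stdbool.h>

/* Hypothetical predicate (not the BLIS function): treat AMD family 0x17
   parts with model >= 0x30 as zen2/ROME and lower models as zen/Naples.
   The real bli_cpuid_is_zen2() also inspects CPU feature bits, omitted here. */
static bool is_zen2_sketch(unsigned family, unsigned model)
{
    return family == 0x17u && model >= 0x30u;
}
```

Keeping the family and model test in one predicate is what lets a single amd64 build pick the zen or zen2 context at run time.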
Kiran Varaganti
d605a19e42 config_registry: New AMD zen2 architecture configuration added.
frame/base/bli_arch.c: #ifdef BLIS_FAMILY_ZEN2 id = BLIS_ARCH_ZEN2; #endif added. zen2 is added in config_name[BLIS_NUM_ARCHS]
  frame/base/bli_cpuid.c : #ifdef BLIS_CONFIG_ZEN2 if ( bli_cpuid_is_zen2( family, model, features ) ) return BLIS_ARCH_ZEN2; #endif, defined new function bool bli_cpuid_is_zen2(...).
  frame/base/bli_cpuid.h : declared bli_cpuid_is_zen2(..).
  frame/base/bli_gks.c : #ifdef BLIS_CONFIG_ZEN2 bli_gks_register_cntx(BLIS_ARCH_ZEN2, bli_cntx_init_zen2, bli_cntx_init_zen2_ref, bli_cntx_init_zen2_ind); #endif
  frame/include/bli_arch_config.h : #ifdef BLIS_CONFIG_ZEN2 CNTX_INIT_PROTS(zen2) #endif #ifdef BLIS_FAMILY_ZEN2 #include "bli_family_zen2.h" #endif
  frame/include/bli_type_defs.h : added BLIS_ARCH_ZEN2 in arch_t enum. BLIS_NUM_ARCHS 20

Change-Id: I2a2d9b7266673e78a4f8543b1bfb5425b0aa7866
2019-08-23 14:18:55 +05:30
Kiran Varaganti
34c2c22ae8 Disabled BLIS_ENABLE_ZEN_BLOCK_SIZES in bli_family_zen.h for ROME tuning
Change-Id: Iec47fcf51f4d4396afef1ce3958e58cf02c59a57
2019-08-23 14:18:09 +05:30
Kiran Varaganti
016acd387c Merged BLIS Release 1.3
Modified config/zen/make_defs.mk, now CKVECFLAGS     := -mavx2 -mfpmath=sse -mfma -march=znver1

Change-Id: Ia0942d285a21447cd0c470de1bc021fe63e80d81
2019-08-23 14:18:09 +05:30
sraut
d6bb56d088 Fixed BLAS test failures of small matrix SYRK for single and double precision.
Details:
- SYRK for small matrices was implemented by reusing the small GEMM routine.
  This resulted in output being written to the full C matrix; since C is
  symmetric, the lower and upper triangles of C contained the same results.
  The BLAS SYRK API spec requires that only the lower or the upper triangle
  of C be written with results. This caused BLAS test failures, even though
  the BLIS testsuite passed the small SYRK operation.
- To fix BLAS test failures of small matrix SYRK, separate kernel routines are
  implemented for small SYRK for both single and double precision. The newly
  added small SYRK routines are in file kernels/zen/3/bli_syrk_small.c.
  Now the intermediate results of matrix C are written to a scratch buffer.
  Final results are written from scratch buffer to matrix C using SIMD
  copy to either the lower or the upper triangular part of matrix C.
- Source and header files frame/3/syrk/bli_syrk_front.c and
  frame/3/syrk/bli_syrk_front.h are changed to invoke new small SYRK routines.

Change-Id: I9cfb1116c93d150aefac673fca033952ecac97cb
2019-08-23 14:18:09 +05:30
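The final step of this fix, writing only one triangle of C from the scratch buffer, can be sketched as follows. This is a scalar, hypothetical helper (copy_triangle is not a BLIS function), assuming column-major storage with leading dimensions lds and ldc; the commit describes a SIMD copy, which this sketch does not attempt.

```c
#include <stddef.h>

/* Hypothetical scalar helper (not the BLIS kernel): copy only the lower
   (lower != 0) or upper triangle of an n x n column-major scratch result
   into C, leaving the other triangle of C untouched as BLAS SYRK requires. */
static void copy_triangle(int lower, size_t n,
                          const double *scratch, size_t lds,
                          double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        /* Column j: lower triangle is rows j..n-1, upper is rows 0..j. */
        size_t i_begin = lower ? j : 0;
        size_t i_end   = lower ? n : j + 1;
        for (size_t i = i_begin; i < i_end; ++i)
            c[i + j * ldc] = scratch[i + j * lds];
    }
}
```

Both triangles include the diagonal, so either choice writes the full diagonal of C.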
sraut
2752b51c37 Fix for a multi-instance performance issue on EPYC machines.
Issue: With the default values of mc, kc and nc in multi-instance mode, performance across the cores dips drastically.
Fix: After experimentation, found a different set of values (mc, kc and nc) that fits in the cache size, so that performance remains the same across all the cores.

Change-Id: I98265e3b7e61cd7602a0cc5596240e86c08c03fe
2019-08-23 14:18:09 +05:30
pradeeptrgit
1720efe630 Update version number to 1.2
Change-Id: Ibb31f6683cdecca6b218bc2f0c14701d7e92ebf3
2019-08-23 14:18:09 +05:30
Kiran V
d805fdf169 This is a fix to floating-point exception error for BLIS SGEMM with larger matrix sizes.
BUG No: CPUPL-197 fixed by Thangaraj Santanu
The bli_clock_min_diff() function in BLIS assumed that if the time taken was greater than 1 hour, the reading must be wrong. This is not true in general, whereas the other checks (time taken close to zero, or under a nanosecond) are of course valid.
gerrit review: http://git.amd.com:8080/#/c/118694/1/frame/base/bli_clock.c

Change-Id: I9dc313d7c5fdc20684f67a516bf3237de3e0694a
2019-08-23 14:18:09 +05:30
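A minimal sketch of the timing-minimum logic after this fix, assuming the patch simply drops the one-hour upper bound and keeps the lower-bound checks. The name clock_min_diff_sketch and its explicit time_now parameter are hypothetical (the real bli_clock_min_diff() reads the clock itself); they are used here so the logic can be exercised deterministically.

```c
/* Hypothetical sketch (not the BLIS source): keep the smaller of the
   previous minimum and the new elapsed time, but reject readings below
   one nanosecond, which also covers zero and negative values. There is
   no longer an upper bound, so runs longer than an hour are kept. */
static double clock_min_diff_sketch(double time_min, double time_start,
                                    double time_now)
{
    double time_min_prev = time_min;          /* previous minimum */
    double time_diff     = time_now - time_start;

    if (time_diff < time_min)
        time_min = time_diff;

    /* Reject implausibly small readings. */
    if (time_min < 1.0e-9)
        time_min = time_min_prev;

    return time_min;
}
```

For example, a 4000-second measurement is now kept rather than being discarded as a bad reading.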
sraut
73ddc58df0 Small TRSM optimization changes: 1) single-precision small TRSM kernels for the XAt=B case are further optimized for performance. 2) double-precision small TRSM kernels for the AX=B and XAt=B cases are implemented. 3) single-precision small TRSM kernels for AutX=B are implemented in intrinsics to improve on the current performance.
Change-Id: Ic9d67ae6d8522615257dde018903f049dcffa2cf
2019-08-23 14:18:09 +05:30
sraut
bc9dbce512 AMD Copyright information changed to 2018
Change-Id: Idfd11afd5d252f8063d0158680d24bf7e2854469
2019-08-23 14:18:09 +05:30
sraut
d56ca14589 small matrix trsm intrinsics optimization code for AX=B and XA'=B
Change-Id: I90123c4d9adbd314c867995cd19dc975150b448c
2019-08-23 14:18:09 +05:30
Nisanth M P
ca6f5b762d Re-enabling the small matrix gemm optimization for target zen
Change-Id: I13872784586984634d728cd99a00f71c3f904395
2019-08-23 14:18:09 +05:30
Nisanth M P
d9c0b8b4aa Re-enabling Zen optimized cache block sizes for config target zen
Change-Id: I8191421b876755b31590323c66156d4a814575f1
2019-08-23 14:18:09 +05:30
Field G. Van Zee
f85d3363ec CHANGELOG update (0.3.0)
Change-Id: Id038b00a62de51c9818ad249651ec5dc662f4415
2019-08-23 14:18:09 +05:30
Field G. Van Zee
9034c885fc Added "Education and Learning" ToC entry to README.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
99c7d15f1e Added "Education and Learning" section to README.
Details:
- Added a short section after the Intro of the README.md file titled
  "Education and Learning" that directs interested readers to the
  "LAFF-On Programming for High-Performance" massive open online course
  (MOOC) hosted via edX.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
06c5a5c4a9 Added test/1m4m driver directory.
Details:
- Added a new standalone test driver directory named '1m4m' that can
  build and run performance experiments for BLIS 1m, 4m1a, assembly,
  OpenBLAS, and the vendor library (MKL). This new driver directory
  was used to regenerate performance results for the 1m paper.
- Added alternate (commented-out) cache blocksizes to
  config/haswell/bli_cntx_init_haswell.c. These blocksizes tend to
  work well on a 12-core Intel Xeon E5-2650 v3.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
d44c42dce7 Updated haswell MC cache blocksizes.
Details:
- Updated the default MC cache blocksizes used by the haswell subconfig
  for both row-preferential (the default) and column-preferential
  microkernels.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
ba4a77177c Updated -march flags for sandybridge, haswell.
Details:
- Updated the '-march=corei7-avx' flag in the sandybridge subconfig
  to '-march=sandybridge' and the '-march=core-avx2' flag in the
  haswell subconfig to '-march=haswell'. The older flags were used
  by older versions of gcc and should have been updated to the newer
  forms a long time ago. (The older flags were clearly working, even
  though they are no longer documented in the gcc man page.)
2019-08-23 14:18:09 +05:30
Field G. Van Zee
0e3f0ce634 More updates to comments in testsuite modules.
Details:
- Updated most comments in testsuite modules that describe how the
  correctness test is performed so that it is clear whether the vector
  (normfv) or matrix (normfm) form of Frobenius norm is used.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
b3974dafac New cntx_t blksz "set" functions + misc tweaks.
Details:
- Defined two new static functions in bli_cntx.h:
    bli_cntx_set_blksz_def_dt()
    bli_cntx_set_blksz_max_dt()
  which developers may find convenient when experimenting with different
  values of cache blocksizes.
- Updated one- and two-socket multithreaded problem size range and
  increment values in test/3/Makefile.
- Changed default to column storage in test/3/test_gemm.c.
- Fixed typo in comment in testsuite/src/test_subm.c.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
7366bf25aa Fixed thrinfo_t printing bug for small problems.
Details:
- Fixed a bug in bli_l3_thrinfo_print_gemm_paths() and
  bli_l3_thrinfo_print_trsm_paths(), defined in bli_l3_thrinfo.c,
  whereby subnodes of the thrinfo_t tree are "dereferenced" near the
  beginning of the functions, which may lead to segfaults in certain
  situations where the thread tree was not fully formed because the
  matrix problem was too small for the level of parallelism specified.
  (That is, too small because some problems were assigned no work due
  to the smallest units in the m and n dimensions being defined by the
  register blocksizes mr and nr.) The fix requires several nested levels
  of if statements, and this is one of those few instances where use of
  goto statements results in (mostly) prettier code, especially in the
  case of _gemm_paths(). And while it wasn't necessary, I ported this
  goto usage to the loop body that prints the thrinfo_t work_id and
  comm_id values for each thread. Thanks to Nicholai Tukanov for helping
  to find this bug.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
66c43ca427 Updated BLASFEO results in PerformanceSmall.md.
Details:
- Updated the BLASFEO performance graphs shown in PerformanceSmall.md
  using a new commit of BLASFEO (2c9f312); updated PerformanceSmall.md
  accordingly.
- Updated test/sup/octave/plot_l3sup_perf.m so that the .m files
  containing the mpnpkp results do not need to be preprocessed in order
  to plot half the problem size range (i.e., up to 400 instead of the
  800 range of the other shape cases).
- Trivial updates to runme.m.
2019-08-23 14:18:09 +05:30