Commit Graph

27 Commits

Author SHA1 Message Date
Devrajegowda, Kiran
1c76723320 Block parameters tuning to improve sgemm performance on Rome
Details:
    - Tuned block sizes to get better performance for sgemm default path.

Change-Id: I892e8642fa2d03a07a6d53537131536e6b1b091e
Signed-off-by: Kiran N D <kiran.Devrajegowda@amd.com>
AMD-Internal: [CPUPL-832]
2020-04-21 07:34:12 -04:00
Meghana
b846059bcf Added opt kernels for SWAPV
Details:
-Added SIMD kernels for SWAPV for both single and double precisions.
-Modified cntx_init file for zen and zen2 configurations to choose opt kernels for
 SWAPV.
-Added test_swapv.c in test folder.
-Modified test/Makefile to include test_swapv.c

Change-Id: Ida786eec722e634aee0dacdd51c327823c80f01a
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-847]
2020-04-20 01:21:44 -05:00
Vasanthakumar Rajagopal
489d501f2e Merge "Execution and Debug trace support." into amd-staging-rome-2.2 2020-04-15 06:30:50 -04:00
Meghana
e56cf63a3f Optimized "bli_dotv_zen_int10" kernels
Details:
- Fixed issues in "bli_dotv_zen_int10" kernels and optimized them.
- Changed cntx_init file to choose "bli_dotv_zen_int10" kernel for dotv
 API call.

Change-Id: Iee8d7519f3a22a2d41166390be6047e9cb37557f
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-824]
2020-04-14 09:52:57 +05:30
dzambare
d40edf7dac Execution and Debug trace support.
Added support add debug logging, execution trace and decode.

Change-Id: I024bf6165daa9e23a62423f2401c0f1c5de459ba
AMD-Internal: [CPUPL-806]
2020-04-07 08:48:59 +05:30
Field G. Van Zee
1a284828d1 Support multithreading within the sup framework.
Details:
- Added multithreading support to the sup framework (via either OpenMP
  or pthreads). Both variants 1n and 2m now have the appropriate
  threading infrastructure, including data partitioning logic, to
  parallelize computation. This support handles all four combinations
  of packing on matrices A and B (neither, A only, B only, or both).
  This implementation tries to be a little smarter when automatic
  threading is requested (e.g. via BLIS_NUM_THREADS) in that it will
  recalculate the factorization in units of micropanels (rather than
  using the raw dimensions) in bli_l3_sup_int.c, when the final
  problem shape is known and after threads have already been spawned.
- Implemented bli_?packm_sup_var2(), which packs to conventional row-
  or column-stored matrices. (This is used for the rrc and crc storage
  cases.) Previously, copym was used, but that would no longer suffice
  because it could not be parallelized.
- Minor reorganization of packing-related sup functions. Specifically,
  bli_packm_sup_init_mem_[ab]() are called from within packm_sup_[ab]()
  instead of from the variant functions. This has the effect of making
  the variant functions more readable.
- Added additional bli_thrinfo_set_*() static functions to bli_thrinfo.h
  and inserted usage of these functions within bli_thrinfo_init(), which
  previously was accessing thrinfo_t fields via the -> operator.
- Renamed bli_partition_2x2() to bli_thread_partition_2x2().
- Added an auto_factor field to the rntm_t struct in order to track
  whether automatic thread factorization was originally requested.
- Added new test drivers in test/supmt that perform multithreaded sup
  tests, as well as appropriate octave/matlab scripts to plot the
  resulting output files.
- Added additional language to docs/Multithreading.md to make it clear
  that specifying any BLIS_*_NT variable, even if it is set to 1, will
  be considered manual specification for the purposes of determining
  whether to auto-factorize via BLIS_NUM_THREADS.
- Minor comment updates.
AMD-Internal: [CPUPL-713]

Change-Id: I9536648e7befac4d2dc17805e44ef34470961662
2020-03-13 01:09:29 -04:00
Meghana Vankadari
cc98047fd6 Made framework changes to initialize specific cache block sizes for TRSM.
Details:
-This commit addresses the performance optimization(single-thread and
 multi-thread) for DTRSM on zen2.
-This new optimization employs different MC, KC & NC values for TRSM than
 what is being used in other Level-3 routines like DGEMM.
-Changed TRSM framework code to choose these blocksizes for TRSM
 on zen family configurations.
-Added a new field called "trsm_blkszs" to cntx structure in order to
 store TRSM specific block sizes.
-Implemented routines to initialize, set and query the TRSM-specific
 block sizes.
-Defined a new macro "AOCL_BLIS_ZEN" in configure script.
 This macro is automatically defined for zen family architectures.
 It enables us to choose different cache block sizes for TRSM instead of common level-3 block sizes.

Change-Id: Id8557b1c962a316b1edecca9cd582675eaf35fe6
Signed-off-by: Meghana Vankadari <meghana.vankadari@amd.com>
AMD-Internal: [CPUPL-656]
2020-03-09 10:33:42 +05:30
Meghana Vankadari
62e00b4d64 Merge "Change in threshold condition for trsm_small kernels" into amd-staging-rome-rel-2.1 2019-12-17 23:54:01 -05:00
Meghana Vankadari
8eb264f78b Change in threshold condition for trsm_small kernels
Change-Id: I396e246b1639d300fcb94bdf7e5fa8bc8c87e994
2019-12-16 18:54:48 +05:30
Devrajegowda, Kiran
1fe8edbed0 "Merge Selective Packing code from amd branch flame/blis"
Change-Id: Ifbdf49735f56a66fbbc96dab6d3ca6069302daed
2019-12-16 14:48:53 +05:30
Nallani Bhaskar
af94ba29cf Added sup support for sgemm under zen and related frame work changes.
Change-Id: Ia7e88b96d3a3617e8d24754f50db081ffe2e9955
2019-12-04 10:56:10 +05:30
Devrajegowda, Kiran
85fa9e4107 resolved merge conflicts when merged with public repo master branch
Change-Id: Iad6ba809680ba5081cc9d7879794ef58cc8f8a40
2019-11-25 14:46:48 +05:30
Meghana
5560f75c0c Modified makefiles for zen and zen2 to pick up compiler flags based on architecture and compiler versions
Change-Id: I443e47c38e0ffd12f4b303f546abd46d02aa31ca
2019-11-21 09:53:44 +05:30
Devrajegowda, Kiran
b5475f527d Adding context initialisation for SUP kernels in zen2 architecture
Change-Id: I9de533abb039d0dff348728be51554cc53679d10
2019-11-12 13:59:26 +05:30
Devrajegowda, Kiran
4158e7fffe missed changes while rebasing field's SUP code
Change-Id: I560b93c42901ca2bbd4c22e833f55ba884a38a50
2019-10-23 10:33:43 +05:30
Field G. Van Zee
0016d541e6 Changed -march=znver2 to =znver1 for clang on zen2.
Details:
- In config/zen2/make_defs.mk, changed the -march= flag so that
  -march=znver1 is used instead of -march=znver2 when CC_VENDOR is
  clang. (The gcc branch attempts to differentiate between various
  versions, but the equivalent version cutoffs for clang are not
  yet known by us, so we have to use a single flag for all versions
  of clang. Hopefully -march=znver1 is new enough. If not, we'll
  fall back to -march=bdver4 -mno-fma4 -mno-tbm -mno-xop -mno-lwp.)
  This issue was discovered thanks to AppVeyor.
2019-10-11 11:09:44 -05:00
Field G. Van Zee
29b0e1ef4e Code review + tweaks to AMD's AOCL 2.0 PR (#349).
Details:
- NOTE: This is a merge commit of 'master' of git://github.com/amd/blis
  into 'amd-master' of flame/blis.
- Fixed a bug in the downstream value of BLIS_NUM_ARCHS, which was
  inadvertantly not incremented when the Zen2 subconfiguration was
  added.
- In bli_gemm_front(), added a missing conditional constraint around the
  call to bli_gemm_small() that ensures that the computation precision
  of C matches the storage precision of C.
- In bli_syrk_front(), reorganized and relocated the notrans/trans logic
  that existed around the call to bli_syrk_small() into bli_syrk_small()
  to minimize the calling code footprint and also to bring that code
  into stylistic harmony with similar code in bli_gemm_front() and
  bli_trsm_front(). Also, replaced direct accessing of obj_t fields with
  proper accessor static functions (e.g. 'a->dim[0]' becomes
  'bli_obj_length( a )').
- Added #ifdef BLIS_ENABLE_SMALL_MATRIX guard around prototypes for
  bli_gemm_small(), bli_syrk_small(), and bli_trsm_small(). This is
  strictly speaking unnecessary, but it serves as a useful visual cue to
  those who may be reading the files.
- Removed cpp macro-protected small matrix debugging code from
  bli_trsm_front.c.
- Added a GCC_OT_9_1_0 variable to build/config.mk.in to facilitate gcc
  version check for availability of -march=znver2, and added appropriate
  support to configure script.
- Cleanups to compiler flags common to recent AMD microarchitectures in
  config/zen/amd_config.mk, including: removal of -march=znver1 et al.
  from CKVECFLAGS (since the -march flag is added within make_defs.mk);
  setting CRVECFLAGS similarly to CKVECFLAGS.
- Cleanups to config/zen/bli_cntx_init_zen.c.
- Cleanups, added comments to config/zen/make_defs.mk.
- Cleanups to config/zen2/make_defs.mk, including making use of newly-
  added GCC_OT_9_1_0 and existing GCC_OT_6_1_0 to choose the correct
  set of compiler flags based on the version of gcc being used.
- Reverted downstream changes to test/test_gemm.c.
- Various whitespace/comment changes.
2019-10-11 10:24:24 -05:00
Kiran Varaganti
d605a19e42 config_registry: New AMD zen2 architecture configuration added.
frame/base/bli_arch.c: #ifdef BLIS_FAMILY_ZEN2 id = BLIS_ARCH_ZEN2; #endif added. zen2 is added in config_name[BLIS_NUM_ARCHS]
  frame/base/bli_cpuid.c : #ifdef BLIS_CONFIG_ZEN2 if ( bli_cpuid_is_zen2( family, model, features ) ) return BLIS_ARCH_ZEN2; #endif, defined new function bool bli_cpuid_is_zen2(...).
  frame/base/bli_cpuid.h : declared bli_cpuid_is_zen2(..).
  frame/base/bli_gks.c : #ifdef BLIS_CONFIG_ZEN2 bli_gks_register_cntx(BLIS_ARCH_ZEN2, bli_cntx_init_zen2, bli_cntx_init_zen2_ref, bli_cntx_init_zen2_ind); #endif
  frame/include/bli_arch_config.h : #ifdef BLIS_CONFIG_ZEN2 CNTX_INIT_PROTS(zen2) #endif #ifdef BLIS_FAMILY_ZEN2 #include "bli_family_zen2.h" #endif
  frame/include/bli_type_defs.h : added BLIS_ARCH_ZEN2 in arch_t enum. BLIS_NUM_ARCHS 20

Change-Id: I2a2d9b7266673e78a4f8543b1bfb5425b0aa7866
2019-08-23 14:18:55 +05:30
Meghana
fdce1a5648 changed gcc version check condition from 'ifeq' to 'if greater or equal'
Change-Id: Ie4c461867829bcc113210791bbefb9517e52c226
2019-07-24 15:04:41 +05:30
Meghana
c9486e0c4f code to detect version of gcc and set flags accordingly for zen2
Change-Id: I29b0311d0000dee1a2533ee29941acf53f9e9f34
2019-07-24 09:45:17 +05:30
Meghana
dcc0ce12fd Added a global Makefile for AMD architectures in config/zen folder
This Makefile(amd_config.mk) has all the flags that are common to EPYC series

Change-Id: Ic02c60a8293ccdd37f0f292e631acd198e6895de
2019-07-22 17:12:01 +05:30
kdevraje
3f867c96ca When running HPL with pure MPI without DGEMM Threading (Single Threaded BLIS ), making this macro 1 gives best performance.wq
Change-Id: I24fd0bf99216f315e49f1c74c44c3feaffd7078d
2019-05-31 14:31:49 +05:30
Meghana
ee123f5358 Defined small matrix thresholds for TRSM for various cases for NAPLES and ROME
Updated copyright information for kernels/zen/bli_trsm_small.c file
Removed separate kernels for zen2 architecture
Instead added threshold conditions in zen kernels both for ROME and NAPLES

Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5
2019-05-27 15:36:44 +05:30
kdevraje
84215022f2 Adding threshold condition to dgemm small matrix kernels, defining the constants in zen2 configuration
Change-Id: I53a58b5d734925a6fcb8d8bea5a02ddb8971fcd5
2019-05-23 14:33:47 +05:30
Kiran Varaganti
b80bd5bcb2 config/zen/bli_cntx_init_zen.c: removed BLIS_ENBLE_ZEN_BLOCK_SIZES macro. We have different configurations for both zen and zen2
config/zen/bli_family_zen.h: deleted macro BLIS_ENBLE_ZEN_BLOCK_SIZES
config/zen/make_defs.mk: removed compiler flag -mno-avx256-split-unaligned-store
frame/base/bli_cpuid.c: ROME family is 17H but model # is from 0x30H.
test/test_gemm.c - commented out #define FILE_IN_OUT (some compilation error when BLIS is configured as amd64)
Now we can use single configuration has ./configure amd64 - this will work both for ROME & Naples

Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2
2019-05-22 05:51:22 -04:00
Kiran Varaganti
a042db011d Modified make_defs.mk for zen2 to get compiled by gcc version less than gcc9.0
Change-Id: I8fcac30538ee39534c296932639053b47b9a2d43
2019-05-22 05:51:10 -04:00
Kiran Varaganti
a23f92594c config_registry: New AMD zen2 architecture configuration added.
frame/base/bli_arch.c: #ifdef BLIS_FAMILY_ZEN2 id = BLIS_ARCH_ZEN2; #endif added. zen2 is added in config_name[BLIS_NUM_ARCHS]
  frame/base/bli_cpuid.c : #ifdef BLIS_CONFIG_ZEN2 if ( bli_cpuid_is_zen2( family, model, features ) ) return BLIS_ARCH_ZEN2; #endif, defined new function bool bli_cpuid_is_zen2(...).
  frame/base/bli_cpuid.h : declared bli_cpuid_is_zen2(..).
  frame/base/bli_gks.c : #ifdef BLIS_CONFIG_ZEN2 bli_gks_register_cntx(BLIS_ARCH_ZEN2, bli_cntx_init_zen2, bli_cntx_init_zen2_ref, bli_cntx_init_zen2_ind); #endif
  frame/include/bli_arch_config.h : #ifdef BLIS_CONFIG_ZEN2 CNTX_INIT_PROTS(zen2) #endif #ifdef BLIS_FAMILY_ZEN2 #include "bli_family_zen2.h" #endif
  frame/include/bli_type_defs.h : added BLIS_ARCH_ZEN2 in arch_t enum. BLIS_NUM_ARCHS 20

Change-Id: I2a2d9b7266673e78a4f8543b1bfb5425b0aa7866
2019-05-22 05:28:16 -04:00