Commit Graph

195 Commits

Author SHA1 Message Date
Edward Smyth
c6f3340125 Merge commit '5013a6cb' into amd-main
* commit '5013a6cb':
  More edits and fixes to docs/FAQ.md.
  Fixed newly broken link to CREDITS in FAQ.md.
  More minor fixes to FAQ.md and Sandboxes.md.
  Updates to FAQ.md, Sandboxes.md, and README.md.
  Safelist 'master', 'dev', 'amd' branches.
  Re-enable and fix fb93d24.
  Reverted fb93d24.
  Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
  Removed last vestige of #define BLIS_NUM_ARCHS.
  Added new packm var3 to 'gemmlike'.
  Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
  Fix more copy-paste errors in the haswell gemmsup code.
  Do a fast test on OSX. [ci skip]
  Fix AArch64 tests and consolidate some other tests.
  Use C++ cross-compiler for ARM tests.
  Attempt to fix cxx-test for OOT builds.
  Updated travis-ci.org link in README.md to .com.
  Disabled (at least temporarily) commit 8e0c425.
  Define BLIS_OS_NONE when using --disable-system.
  Updated stale calls to malloc_intl() in gemmlike.
  Blacklist clang10/gcc9 and older for 'armsve'.
  Add test to Travis using C++ compiler to make sure blis.h is C++-compatible.
  Moved lang defs from _macro_def.h to _lang_defs.h.
  Minor tweaks to gemmlike sandbox.
  Added local _check() code to gemmlike sandbox.
  README.md citation updates (e.g. BLIS7 bibtex).
  Tweaks to gemmlike to facilitate 3rd party mods.
  Whitespace tweaks.
  Add row- and column-strides for A/B in obj_ukr_fn_t.
  Clean up some warnings that show up on clang/OSX.
  Remove schema field on obj_t (redundant) and add new API functions.
  Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
  Disabled sanity check in bli_pool_finalize().
  Implement proposed new function pointer fields for obj_t.

AMD-Internal: [CPUPL-2698]
Change-Id: I6fc33351fa824580cf4f25b63f0370383cd9422d
2023-11-10 13:05:12 -05:00
Edward Smyth
f5505be9f3 Merge commit 'e366665c' into amd-main
* commit 'e366665c':
  Fixed stale API calls to membrk API in gemmlike.
  Fixed bli_init.c compile-time error on OSX clang.
  Fixed configure breakage on OSX clang.
  Fixed one-time use property of bli_init() (#525).
  CREDITS file update.
  Added Graviton2 Neoverse N1 performance results.
  Remove unnecesary windows/zen2 directory.
  Add vzeroupper to Haswell microkernels. (#524)
  Fix Win64 AVX512 bug.
  Add comment about make checkblas on Windows
  CREDITS file update.
  Test installation in Travis CI
  Add symlink to blis.pc.in for out-of-tree builds
  Revert "Always run `make check`."
  Always run `make check`.
  Fixed configure script bug. Details: - Fixed kernel list string substitution error by adding function substitute_words in configure script.   if the string contains zen and zen2, and zen need to be replaced with another string, then zen2   also be incorrectly replaced.
  Update POWER10.md
  Rework POWER10 sandbox
  Skip clearing temp microtile in gemmlike sandbox.
  Fix asm warning
  Sandbox header edits trigger full library rebuild.
  Add vhsubpd/vhsubpd.
  Fixed bugs in cpackm kernels, gemmlike code.
  Armv8A Rename Regs for Safe Darwin Compile
  Armv8A Rename Regs for Clang Compile: FP32 Part
  Armv8A Rename Regs for Clang Compile: FP64 Part
  Asm Flag Mingling for Darwin_Aarch64
  Added a new 'gemmlike' sandbox.
  Updated Fugaku (a64fx) performance results.
  Add explicit compiler check for Windows.
  Remove `rm-dupls` function in common.mk.
  Travis CI Revert Unnecessary Extras from 91d3636
  Adjust TravisCI
  Travis Support Arm SVE
  Added 512b SVE-based a64fx subconfig + SVE kernels.
  Replace bli_dlamch with something less archaic (#498)
  Allow clang for ThunderX2 config

AMD-Internal: [CPUPL-2698]
Change-Id: I561ca3959b7049a00cc128dee3617be51ae11bc4
2023-10-18 09:09:54 -04:00
Edward Smyth
85f2bf6c4a Fix for x86_64 builds
Configuration x86_64 includes all Intel and AMD sub-configurations.
Fixes to enable this to work correctly again are:
- In config_registry use amdzen rather than amd64 in x86_64 family.
- Copy settings from config/amdzen/bli_family_amdzen.h to
  config/x86_64/bli_family_x86_64.h
- Modify configure to set enable_aocl_zen=yes for x86_64, but not
  for amd64_legacy.
- Add "if defined(BLIS_FAMILY_X86_64)" to frame/3/bli_l3_sup.c and
  frame/3/bli_l3_sup_int_amd.c so zen-specific code paths are
  enabled.

Note: sub-configurations knl and bulldozer use instructions that are
not supported on most x86_64 processors.

AMD-Internal: [CPUPL-3838]
Change-Id: I0bd8fd89ccd846f80e5491ef44ade7d409970b04
2023-10-09 07:24:21 -04:00
Edward Smyth
b531022bac BLIS cpuid: distinguish submodels within a microarchitecture
Incorporate a means of detecting submodels of a microarchitecture,
so that different optimizations e.g. block sizes or kernel choices
can be used. The details are as follows:
- Different models are currently only enabled for zen3 and zen4
  architectures (for server parts).
- There is a single enumeration (model_t) for all models for all
  architectures, but function bli_check_valid_model_id() should
  check the provided model_id against the suitable range within
  the enumeration for the provided arch_id.
- To enable the model_id to be used within the cntx setup functions,
  checking of a user specified value of BLIS_ARCH_TYPE against
  the enabled configurations is delayed to a separate function,
  bli_arch_check_id().
- Default selection based on hardware can be overridden using the
  BLIS_MODEL_TYPE environment variable. Valid values are:
    Genoa, Bergamo, Genoa-X, Milan, Milan-X
  Values are case-insensitive and -X can also be specified as _X or X
- Specifying an incorrect value for BLIS_MODEL_TYPE is not an error,
  but will result in the default option for that architecture being
  selected. This is different to specifying an incorrect value of
  BLIS_ARCH_TYPE, which is an error.
- The environment variable BLIS_MODEL_TYPE can be renamed using
  the --rename-blis-model-type argument to configure (or cmake
  equivalent), in a similar way to renaming BLIS_ARCH_TYPE with
  --rename-blis-arch-type.
- Configure option --disable-blis-arch-type will disable both
  BLIS_ARCH_TYPE and BLIS_MODEL_TYPE environment variables.
- Added code in bli_cpuid.c to detect L1, L2 and L3 cache sizes,
  currently only for AMD cpus. Functions are provided to query
  these from other parts of the code, namely:
    uint32_t bli_cpuid_query_{l1d,l1i,l2,l3}_cache_size()

AMD-Internal: [CPUPL-3033]
Change-Id: I37a3741abfd59a95e0e905d926c6ede9a0143702
2023-04-20 10:47:44 -04:00
Aayush Kumar
5bd2a777ba Fixed Compilation Fails when configured with --disable-blas
- Moved *_blis_impl function declaration outside the BLIS_ENABLE_BLAS
  guard.
- Changed Makefile to continue to compile bla_ files to get
  *_blis_impl interfaces.
- Modify CBLAS headers, bli_macro_defs.h and bli_util_api_wrap.{c,h}
  to add BLIS_ENABLE_CBLAS guards.
- Comment out BLIS_ENABLE_BLAS guards in various headers and utility
  functions.
- Define BLIS Fortran-style functions lsame_blis_impl and
  xerbla_blis_impl. New macros PASTE_LSAME and PASTE_XERBLA are
  used in bla_*_check headers and some other places to select
  whether to call lsame and xerbla, or the _blis_impl versions.
- Defined various other missing _blis_impl functions.
- In bli_util_api_wrap.c, only define any functions if
  BLIS_ENABLE_BLAS is defined, and only define the subroutine
  versions of functions like dot, nrm2, etc if BLIS_ENABLE_CBLAS
  is defined.
- BLAS layer is needed if CBLAS layer is enabled. Changed header
  files build/bli_config.h.in and bli_blas.h, and configure
  program to help ensure consistency in generated blis.h header
  and configure output.

Undefining BLIS_ENABLE_BLAS_DEFS appears to be broken in UTA BLIS
too, thus BLIS_ENABLE_BLAS_DEFS is currently permanently defined.

AMD-Internal: [CPUPL-3015]

Change-Id: I7c0fe07db85781db46f2c690e174451860b37635
2023-03-23 06:11:52 -04:00
Edward Smyth
82c2eb4e8e Code cleanup and warnings fixes
Corrections for some occurances of:
- Compiler warnings about initialization of float from double
- Spelling mistakes in comments
- Incorrect indentation of code and comments

AMD-Internal: [CPUPL-2870]
Change-Id: Icb68c789687bd0684844331d43071bfffecac9fc
2023-01-09 04:34:52 -05:00
Edward Smyth
6861fcae91 BLIS: Improve architecture selection at runtime
Make BLIS_ARCH_TYPE=0 be an error, so that incorrect meaningful names
will get an error rather than "skx" code path. BLIS_ARCH_TYPE=1 is
now "generic", so that it should be constant as new code paths are
added. Thus all other code path enum values have increased by 2.

Also added new options to BLIS configure program to allow:
1. BLIS_ARCH_TYPE functionality to be disabled, e.g.:

./configure --disable-blis-arch-type amdzen

2. Renaming the environment variable tested from "BLIS_ARCH_TYPE" to a
   specified value, e.g.:

./configure --rename-blis-arch-type=MY_NAME_FOR_ARCH_TYPE amdzen

On Windows, these can be enabled with e.g.:

cmake ... -DDISABLE_BLIS_ARCH_TYPE=ON

or

cmake ... -DRENAME_BLIS_ARCH_TYPE=MY_NAME_FOR_ARCH_TYPE

This implements changes 2 and 3 in the Jira ticket below.

AMD-Internal: [CPUPL-2235]
Change-Id: Ie42906bd909f9d83f00a90c5bef9c5bf3ef5adb4
2022-08-19 10:59:35 -04:00
Dipal M. Zambare
c85bbfdb50 Updated BLIS version string format
- Updated version string to match the recommended format
  “AOCL-BLIS 3.2.1 Build 20220727”.
- Fixed issues with include paths which was preventing compile time
  version sting definition passing via build commands.
- Removed version string determination based on git tag
  using ‘git describe’, version string will always be taken from
  the version file.

AMD-Internal: [CPUPL-2324]
Change-Id: Idc7edf1211f66d348ec3b5b43f2507c2b810f088
2022-08-12 05:53:35 +00:00
Field G. Van Zee
7a0ba4194f Added support for addons.
Details:
- Implemented a new feature called addons, which are similar to
  sandboxes except that there is no requirement to define gemm or any
  other particular operation.
- Updated configure to accept --enable-addon=<name> or -a <name> syntax
  for requesting an addon be included within a BLIS build. configure now
  outputs the list of enabled addons into config.mk. It also outputs the
  corresponding #include directives for the addons' headers to a new
  companion to the bli_config.h header file named bli_addon.h. Because
  addons may wish to make use of existing BLIS types within their own
  definitions, the addons' headers must be included sometime after that
  of bli_config.h (which currently is #included before bli_type_defs.h).
  This is why the #include directives needed to go into a new top-level
  header file rather than the existing bli_config.h file.
- Added a markdown document, docs/Addons.md, to explain addons, how to
  build with them, and what assumptions their authors should keep in
  mind as they create them.
- Added a gemmlike-like implementation of sandwich gemm called 'gemmd'
  as an addon in addon/gemmd. The code uses a 'bao_' prefix for local
  functions, including the user-level object and typed APIs.
- Updated .gitignore so that git ignores bli_addon.h files.

Change-Id: Ie7efdea366481ce25075cb2459bdbcfd52309717
2022-03-31 12:03:27 +05:30
Dipal M Zambare
f63f78d783 Removed Arch specific code from BLIS framework.
- Removed BLIS_CONFIG_EPYC macro
- The code dependent on this macro is handled in
  one of the three ways

  -- It is updated to work across platforms.
  -- Added in architecture/feature specific runtime checks.
  -- Duplicated in AMD specific files. Build system is updated to
      pick AMD specific files when library is built for any of the
     zen architecture

AMD-Internal: [CPUPL-1960]
Change-Id: I6f9f8018e41fa48eb43ae4245c9c2c361857f43b
2022-01-18 11:51:08 +05:30
Dipal M Zambare
5d287fdba0 Include LP64/ILP64 in BLIS binary name
Binary name will be chosen based on multi-threading and BLAS
  integer size configuration as given below.

  libblis-[mt]-lp64 - when configured to use 32 bit integers
  libblis-[mt]-ilp64 - when configured to use 64 bit integers

AMD-Internal: [CPUPL-1879]
Change-Id: I865023c63235a0a72bdfce7057b2cfb8158b1d87
2021-11-12 08:58:51 +05:30
Field G. Van Zee
2f7325b2b7 Blacklist clang10/gcc9 and older for 'armsve'.
Details:
- Prohibit use of clang 10.x and older or gcc 9.x and older for the
  'armsve' subconfiguration. Addresses issue #535.
2021-08-23 15:04:05 -05:00
Field G. Van Zee
c8728cfbd1 Fixed configure breakage on OSX clang.
Details:
- Accept either 'clang' or 'LLVM' in vendor string when greping for
  the version number (after determining that we're working with clang).
  Thanks to Devin Matthews for this fix.
2021-08-05 15:17:09 -05:00
Field G. Van Zee
69205ac266 CREDITS file update.
Details:
- Thanks to Chengguo Sun for submitting #515 (5ef7f68).
- Thanks to Andrew Wildman for submitting #519 (551c6b4).
- Whitespace update to configure (spaces to tabs).
2021-07-06 20:39:22 -05:00
Andrew Wildman
f648df4e55 Add symlink to blis.pc.in for out-of-tree builds 2021-07-06 16:35:12 -07:00
sunchengguo
ad6231cca3 Fixed configure script bug.
Details:
- Fixed kernel list string substitution error by adding function substitute_words in configure script.
  if the string contains zen and zen2, and zen need to be replaced with another string, then zen2
  also be incorrectly replaced.
2021-07-06 07:30:00 -04:00
Dipal M Zambare
d2313bb4e6 Update show config to include missing info.
-- Ignore aocl dynamic configuration if multithreading is disabled.
     AOCL Dynamic will also be disabled in this case.
  -- Added following configuration settings in showconfig output
     1. Complex return scheme
     2. TRSM preinversion status
     3. AOCL dynamic active status

AOCL-Internal: [CPUPL-1565]
Change-Id: Id5a31b233fc08dcd871de4a693aab0b2a5d9f1c4
2021-06-29 12:03:47 +05:30
Dipal M Zambare
fe3384b3c6 Enable AOCL Dynamic feature by default.
It can be disabled by configuration option --disable-aocl-dynamic.

AOCL-Internal: [CPUPL-1565]
Change-Id: I15ea5964dcd479f16dc9edc72957af3bcf4bc0e2
2021-06-22 14:17:52 +05:30
Devin Matthews
5feb04e233 Add explicit compiler check for Windows.
Check the C compiler for a predefined macro `_WIN32` to indicate (cross-)compilation for Windows. Fixes #463.
2021-05-23 18:46:56 -05:00
Dipal M Zambare
21130ebece Added configure option for AOCL Dynamic feature.
- AOCL Dynamic feature is added in BLIS which determines optimal
    number of threads for the current problem size.
  - This feature can be enabled/disabled by modifying the source
    code
  - This change adds support to enable/disable this feature during
    configuration time by adding a new option in configure script

AOCL-Internal : [CPUPL-1565]

Change-Id: I590693f793cabc44d27a7f815adc41631dd01bbe
2021-05-12 00:41:13 -04:00
lcpu
7401effc03 BLIS:merge:
Merge conflicts araised has been fixed while downstreaming BLIS code from master to milan-3.1 branch

Implemented an automatic reduction in the number of threads when the user requests parallelism via a single number (ie: the automatic way) and (a) that number of threads is prime, and (b) that number exceeds a minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which defaults to 11. If prime numbers are really desired, this feature may be suppressed by defining the macro BLIS_ENABLE_AUTO_PRIME_NUM_THREADS in the appropriate configuration family's bli_family_*.h. (Jeff Diamond)

Changed default value of BLIS_THREAD_RATIO_M from 2 to 1, which leads to slightly different automatic thread factorizations.

Enable the 1m method only if the real domain microkernel is not a reference kernel. BLIS now forgoes use of 1m if both the real and complex domain kernels are reference implementations.

Relocated the general stride handling for gemmsup. This fixed an issue whereby gemm would fail to trigger to conventional code path for cases that use general stride even after gemmsup rejected the problem. (RuQing Xu)

Fixed an incorrect function signature (and prototype) of bli_?gemmt(). (RuQing Xu)

Redefined BLIS_NUM_ARCHS to be part of the arch_t enum, which means it will be updated automatically when defining future subconfigs.

Minor code consolidation in all level-3 _front() functions.

Reorganized Windows cpp branch of bli_pthreads.c.

Implemented bli_pthread_self() and _equals(), but left them commented out (via cpp guards) due to issues with getting the Windows versions working. Thankfully, these functions aren't yet needed by BLIS.

Allow disabling of trsm diagonal pre-inversion at compile time via --disable-trsm-preinversion.

Fixed obscure testsuite bug for the gemmt test module that relates to its dependency on gemv.

AMD-internal-[CPUPL-1523]

Change-Id: I0d1df018e2df96a23dc4383d01d98b324d5ac5cd
2021-04-27 11:09:48 +05:30
Field G. Van Zee
8f39aea11f Merge branch 'dev' 2021-01-30 17:59:56 -06:00
Devin Matthews
874c3f04ec Update configure
Choose last sub-config in the kernel-to-config map if the config list doesn't contain the name of the kernel set. E.g. for "zen: skx knl haswell" pick "haswell" instead of "skx" which was chosen previously. Fixes #470.
2021-01-08 13:56:30 -06:00
dzambare
48f2366b6f Updated BLIS version string to "AOCL BLIS X.x" format
AMD-Internal : [CPUPL-1394]

Change-Id: Ifebcb14d9eb064d231b831f5a1e151853ad5a009
2021-01-07 12:38:32 +05:30
Field G. Van Zee
ed50c94738 Merge branch 'master' into dev 2021-01-04 14:31:44 -06:00
nprasadm
10ac4e2aba Blis: DOTC Additional argument for Complex types when using FLANG
Merged the changes done in UT Austin BLIS repo for DOTC Additional
argument.
Other modifications related to test application included.

Verifed the above code changes through scalapack test applications 'xztrd' , 'xctrd'

Change-Id: I7e16f3953db71890f9e8fbb0f7b363eaad899f62
Signed-off-by: Nagendra <Nagendra.PrasadM@amd.com>
AMD-Internal: [CPUPL-1323]
2020-12-16 14:03:10 +05:30
Isuru Fernando
21aa67e11c fix cc_vendor for crosstool-ng toolchains 2020-12-05 21:59:13 -06:00
Field G. Van Zee
7038bbaa05 Optionally disable trsm diagonal pre-inversion.
Details:
- Implemented a configure-time option, --disable-trsm-preinversion, that
  optionally disables the pre-inversion of diagonal elements of the
  triangular matrix in the trsm operation and instead uses division
  instructions within the gemmtrsm microkernels. Pre-inversion is
  enabled by default. When it is disabled, performance may suffer
  slightly, but numerical robustness should improve for certain
  pathological cases involving denormal (subnormal) numbers that would
  otherwise result in overflow in the pre-inverted value. Thanks to
  Bhaskar Nallani for reporting this issue via #461.
- Added preprocessor macro guards to bli_trsm_cntl.c as well as the
  gemmtrsm microkernels for 'haswell' and 'penryn' kernel sets pursuant
  to the aforementioned feature.
- Added macros to frame/include/bli_x86_asm_macros.h related to division
  instructions.
2020-12-04 16:08:15 -06:00
Field G. Van Zee
9bb23e6c2a Added support for systemless build (no pthreads).
Details:
- Added a configure option, --[enable|disable]-system, which determines
  whether the modest operating system dependencies in BLIS are included.
  The most notable example of this on Linux and BSD/OSX is the use of
  POSIX threads to ensure thread safety for when application-level
  threads call BLIS. When --disable-system is given, the bli_pthreads
  implementation is dummied out entirely, allowing the calling code
  within BLIS to remain unchanged. Why would anyone want to build BLIS
  like this? The motivating example was submitted via #454 in which a
  user wanted to build BLIS for a simulator such as gem5 where thread
  safety may not be a concern (and where the operating system is largely
  absent anyway). Thanks to Stepan Nassyr for suggesting this feature.
- Another, more minor side effect of the --disable-system option is that
  the implementation of bli_clock() unconditionally returns 0.0 instead
  of the time elapsed since some fixed point in the past. The reasoning
  for this is that if the operating system is truly minimal, the system
  function call upon which bli_clock() would normally be implemented
  (e.g. clock_gettime()) may not be available.
- Refactored preprocess-guarded code in bli_pthread.c and bli_pthread.h
  to remove redundancies.
- Removed old comments and commented #include of "bli_pthread_wrap.h"
  from bli_system.h.
- Documented bli_clock() and bli_clock_min_diff() in BLISObjectAPI.md
  and BLISTypedAPI.md, with a note that both are non-functional when
  BLIS is configured with --disable-system.
2020-11-16 15:55:45 -06:00
Field G. Van Zee
88ad841434 Squash-merge 'pr' into 'squash'. (#457)
Merged contributions from AMD's AOCL BLIS (#448).
  
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
  only the lower or upper triangle of a square matrix C. For now, only
  the conventional/large code path will be supported (in vanilla BLIS).
  This was accomplished by leveraging the existing variant logic for
  herk. However, some of the infrastructure to support a gemmtsup is
  included in this commit, including
  - A bli_gemmtsup() front-end, similar to bli_gemmsup().
  - A bli_gemmtsup_ref() reference handler function.
  - A bli_gemmtsup_int() variant chooser function (with variant calls
    commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatiblity layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
  wrapper to a set of polymorphic CBLAS-like function wrappers defined
  in another header (cblas.hh). These two headers are installed if
  running the 'install' target with INSTALL_HH is set to 'yes'. (Also
  added a set of unit tests that exercise blis.hh, although they are
  disabled for now because they aren't compatible with out-of-tree
  builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
  within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
  various minor updates to dotv and scalv kernels. Also added various
  sup kernels contributed by AMD to kernels/zen/3. However, these
  kernels are (for now) not yet used, in part because they caused
  AppVeyor clang failures, and also because I have not found time to
  review and vet them.
- Output the python found during configure into the definition of PYTHON
  in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
  to bli_gemm_front.c.
- Implemented explicit beta = 0 handling in for the sgemm ukernel in
  bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
  bug surfaced because the gemmt module verifies its computation using
  gemm with its beta parameter set to zero, which, on a cortexa15 system
  caused the gemm kernel code to unconditionally multiply the
  uninitialized C data by beta. The C matrix likely contained
  non-numeric values such as NaN, which then would have resulted in a
  false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
  in bli_l3_blocksize.c, was inadvertantly being defined in terms of
  helper functions meant for trmm. This bug was probably harmless since
  the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
  kernels/zen/3/bli_gemm_small.c since those macros are not used in
  vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
  accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers that facilitate compilation on
  Windows systems.
- Various whitespace changes.
2020-11-14 09:39:48 -06:00
Dipal M Zambare
4347d2d823 Re-enable support for Intel 19+ compiler.
Note that there is know issue with Intel 19+ as explained
in https://github.com/flame/blis/issues/371.

AMD version needs this support as some user applications
need ICC support.

AMD-Internal: [CPUPL-1223]

Change-Id: I86ddee068ae18bd940a5952d60960228d8100e97
2020-11-06 11:11:46 +05:30
Field G. Van Zee
2a0682f8e5 Implemented runtime subconfig selection (#451).
Details:
- Implemented support for the user manually overriding the automatic
  subconfiguration selection that happens at runtime. This override
  can be requested by setting the BLIS_ARCH_TYPE environment variable.
  The variable must be set to the arch_t id (as enumerated in
  bli_type_defs.h) corresponding to the desired subconfiguration. If a
  value outside this enumerated range is given, BLIS will abort with an
  error message. If the value is in the valid range but corresponds to a
  subconfiguration that was not activated at configure-time/compile-time,
  BLIS will abort with a (different) error message. Thanks to decandia50
  for suggesting this feature via issue #451.
- Defined a new function bli_gks_lookup_id to return the address of an
  internal data structure within the gks. If this address is NULL, then
  it indicates that the subconfig corresponding to the arch_t id passed
  into the function was not compiled into BLIS. This function is used
  in the second of the two abort scenarios described above.
- Defined the enumerated error code BLIS_UNINITIALIZED_GKS_CNTX, which
  is returned for the latter of the two abort scenarios mentioned above,
  along with a corresponding error message and a function to perform
  the error check.
- Added cpp macro branching to bli_env.c to support compilation of the
  auto-detect.x executable during configure-time. This cpp branch is
  similar to the cpp code already found in bli_arch.c and bli_cpuid.c.
- Cleaned up the auto_detect() function to facilitate easier maintenance
  going forward. Also added a convenient debug switch that outputs the
  compilation command for the auto-detect.x executable and exits.
2020-10-18 18:04:03 -05:00
Field G. Van Zee
97e87f2c9f Whitespace/comment updates to #434 PR. 2020-09-07 15:56:42 -05:00
Devin Matthews
7fdc0fc893 Add an option to change the complex return type.
ifort apparently does not return complex numbers in registers as in C/C++ (or gfortran), but instead creates a "hidden" first parameter for the return value. The option --complex-return=gnu|intel has been added, as well as a guess based on a provided FC if not specified (otherwise default to gnu). This option affects the signatures of cdotc, cdotu, zdotc, and zdotu, and a single library cannot be used with both GNU and Intel Fortran compilers. Fixes #433.
2020-08-06 14:09:23 -05:00
dzambare
9c7814da1c Added support for zen3 configuration
- User can now specify zen3 configuration,
      currently it reuses block sizes and kernels from zen2.
    - Auto configuration can detect and enable if zen3 config is needed
    - Added support for amd64 bundle which contains all zen platforms
    - Moved exiting amd bundle to amd64 legacy.

AMD-Internal: [CPUPL-500, CPUPL-1013]
Change-Id: I60b0b8abc6d2821c27ff0f5f6e032e889194b957
2020-07-22 18:24:26 +05:30
Field G. Van Zee
f973f00d94 Defined netlib equivalent of xerbla_array().
Details:
- Added a function definition for xerbla_array_(), which largely mirrors
  its netlib implementation. Thanks to Isuru Fernando for suggesting the
  addition of this function.

Change-Id: Ie9c619f5604e60a32edfda2db2b66f0c762581d3
2020-05-21 11:57:54 +05:30
Field G. Van Zee
b325f1ea62 Warn user when auto-detection returns 'generic'.
Details:
- Added logic to configure that causes the script to output a warning
  to the user if/when "./configure auto" is run and the underlying
  hardware feature detection code is unable to identify the hardware.
  In these cases, the auto-detect code will return 'generic', which
  is likely not what the user expected, and a flag will be set so that
  a message is printed at the end of the configure output. (Thankfully,
  we don't expect this scenario to play out very often.) Thanks to
  Devin Matthews for suggesting this fix #384.
2020-05-21 11:54:53 +05:30
Field G. Van Zee
99da76fd64 Fixed 'configure' breakage introduced in 6433831.
Details:
- Added a missing 'fi' (endif) keyword to a conditional block added in
  the configure script in commit 6433831.
2020-05-21 11:40:00 +05:30
Jeff Hammond
570d51483b blacklist ICC 18 for knl/skx due to test failures
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
2020-05-21 11:40:00 +05:30
Jeff Hammond
afc57adc1b blacklist Intel 19+
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
2020-05-21 11:40:00 +05:30
Field G. Van Zee
d51245e58b Add support for Intel oneAPI in configure.
Details:
- Properly select cc_vendor based on the output of invoking CC with the
  --version option, including cases where CC is the variant of clang
  that is included with Intel oneAPI. (However, we continue to treat
  the compiler as clang for other purposes, not icc.) Thanks to Ajay
  Panyala and Devin Matthews for reporting on this issue via #402.
2020-05-08 18:00:54 -05:00
dzambare
d40edf7dac Execution and Debug trace support.
Added support add debug logging, execution trace and decode.

Change-Id: I024bf6165daa9e23a62423f2401c0f1c5de459ba
AMD-Internal: [CPUPL-806]
2020-04-07 08:48:59 +05:30
Field G. Van Zee
c40a33190b Warn user when auto-detection returns 'generic'.
Details:
- Added logic to configure that causes the script to output a warning
  to the user if/when "./configure auto" is run and the underlying
  hardware feature detection code is unable to identify the hardware.
  In these cases, the auto-detect code will return 'generic', which
  is likely not what the user expected, and a flag will be set so that
  a message is printed at the end of the configure output. (Thankfully,
  we don't expect this scenario to play out very often.) Thanks to
  Devin Matthews for suggesting this fix #384.
2020-03-26 16:55:00 -05:00
Meghana Vankadari
cc98047fd6 Made framework changes to initialize specific cache block sizes for TRSM.
Details:
-This commit addresses the performance optimization(single-thread and
 multi-thread) for DTRSM on zen2.
-This new optimization employs different MC, KC & NC values for TRSM than
 what is being used in other Level-3 routines like DGEMM.
-Changed TRSM framework code to choose these blocksizes for TRSM
 on zen family configurations.
-Added a new field called "trsm_blkszs" to cntx structure in order to
 store TRSM specific block sizes.
-Implemented routines to initialize, set and query the TRSM-specific
 block sizes.
-Defined a new macro "AOCL_BLIS_ZEN" in configure script.
 This macro is automatically defined for zen family architectures.
 It enables us to choose different cache block sizes for TRSM instead of common level-3 block sizes.

Change-Id: Id8557b1c962a316b1edecca9cd582675eaf35fe6
Signed-off-by: Meghana Vankadari <meghana.vankadari@amd.com>
AMD-Internal: [CPUPL-656]
2020-03-09 10:33:42 +05:30
Field G. Van Zee
5ca1a3cfc1 Fixed 'configure' breakage introduced in 6433831.
Details:
- Added a missing 'fi' (endif) keyword to a conditional block added in
  the configure script in commit 6433831.
2020-01-06 12:29:12 -06:00
Jeff Hammond
6433831cc3 blacklist ICC 18 for knl/skx due to test failures
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
2020-01-03 17:51:05 -08:00
Jeff Hammond
af3589f1f9 blacklist Intel 19+
Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>
2020-01-03 17:51:05 -08:00
Devrajegowda, Kiran
c4047e491a Merge branch 'amd-blis-nov-mergetest' into amd-staging-rome2.1
Change-Id: I1e04592dd9494faa34555008dd1edbca8a092a44
2019-11-29 23:01:51 +05:30
Dipal M Zambare
37badee648 Updated build infra to use python detected by auto config.
Even though configure script check the availability of correct version
of python, this information is not passed to makefiles. This results
in python scripts getting involved without interpreter. This normally
works fine as the script used the path for shebang, however it doesn't
work if the command specified by shebang is alias.

This also causes confusion that even though configure has found the
python, we end up with python not found error during build.

This fix will pass the detected version of the python interpreter to
makefiles which solved both issues mentioned above.

Change-Id: Ic04da77601ff8ad2a461e9f2f936470109cda22c
2019-11-26 14:57:47 +05:30
Meghana Vankadari
764d6f4643 changed configure script to support AOCC
Change-Id: I86d2f36f42bc6cc7e6b950f4e85087753ce5bc40
2019-11-25 15:17:04 +05:30