Mirror of https://github.com/amd/blis.git
Synced 2026-05-11 09:39:59 +00:00
Merge branch 'amd' into rt
Details:
- Merged contributions made by AMD via 'amd' branch (see summary below). Special thanks to AMD for their contributions to date, especially with regard to intrinsic- and assembly-based kernels.
- Added column storage output cases to microkernels in bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with the extra cost of transposing the microtile in registers, this is much faster than using the general storage case when the underlying matrix is column-stored.
- Added s and d assembly-based zen gemmtrsm_u microkernels (including the column storage optimization mentioned above).
- Updated zen sub-configuration to reflect presence of new native kernels.
- Temporarily reverted zen sub-configuration's level-3 cache blocksizes to smaller haswell values.
- Temporarily disabled small matrix handling for zen configuration family in config/zen/bli_family_zen.h.
- Updated zen CFLAGS according to changes in 1e4365b.
- Updated haswell microkernels such that:
  - only one vzeroupper instruction is called prior to returning;
  - movapd/movupd are used in lieu of movaps/movups for double-real microkernels. (Note that single-real microkernels still use movaps/movups.)
- Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is now included via frame/include/bli_arch_config.h.
- Minor updates to bli_amaxv_ref.c (and to the inlined "test" implementation in testsuite/src/test_amaxv.c).
- Added early return for alpha == 0 in bli_dotxv_ref.c.
- Integrated changes from f07b176, including a fix for undefined behavior when executing the 1m method under certain conditions.
- Updated config_registry; haswell kernels are no longer needed for the zen sub-configuration.
- Tweaked marginal and pass thresholds for dotxf.
- Reformatted level-1v, -1f, and -3 amd kernels and inserted additional comments.
- Updated LICENSE file to explicitly mention that parts are copyright UT-Austin and AMD.
- Added AMD copyright to header templates in build/templates.
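The early return added to bli_dotxv_ref.c can be illustrated with a minimal sketch, assuming the dotxv operation rho := beta * rho + alpha * x^T y restricted to real doubles without conjugation. `dotxv_sketch` is a hypothetical name, not the BLIS API, and applying beta before the early return is an assumption about ordering; the commit message only states that an early return for alpha == 0 was added.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of dotxv for real doubles:
   rho := beta * rho + alpha * x^T y, with strides incx and incy
   (assumed positive here). */
static void dotxv_sketch(size_t n, double alpha,
                         const double *x, ptrdiff_t incx,
                         const double *y, ptrdiff_t incy,
                         double beta, double *rho)
{
    assert(rho != NULL);

    /* Apply beta first. beta == 0 overwrites rho, so a NaN or Inf
       already stored there does not propagate. */
    if (beta == 0.0) *rho = 0.0;
    else             *rho *= beta;

    /* Early return: with n == 0 or alpha == 0 the x^T y term
       contributes nothing, so x and y are never read. */
    if (n == 0 || alpha == 0.0) return;

    double dot = 0.0;
    for (size_t i = 0; i < n; ++i)
        dot += x[(ptrdiff_t)i * incx] * y[(ptrdiff_t)i * incy];

    *rho += alpha * dot;
}
```

Passing NULL for x and y with alpha == 0, as in `dotxv_sketch(3, 0.0, NULL, 1, NULL, 1, 3.0, &rho)`, is safe in this sketch only because of the early return.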
Summary of previous changes from 'amd' branch:
- Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and s and d assembly-based zen gemmtrsm_l microkernels (d6x8).
- Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv, and scalv, with extra-unrolling variants for axpyv and scalv.
- Added a small matrix handler to bli_gemm_front(), with the handler implemented in kernels/zen/3/bli_gemm_small_matrix.c.
- Added additional logic to sumsqv that first attempts to compute the sum of the squares via dotv(). If a floating-point overflow exception (FE_OVERFLOW) is raised, the previous (numerically conservative) code is used; otherwise, the result of dotv() is square-rooted and stored as the result. This new implementation is only enabled when FE_OVERFLOW is #defined; if the macro is not #defined, the previous implementation is used.
- Added axpyv and dotv standalone test drivers to the test directory.
- Added zen support to the old cpuid_x86.c driver in build/auto-detect/old.
- Added thread-local and __attribute__-related macros to bli_macro_defs.h.
@@ -214,9 +214,9 @@ CNTX_INIT_PROTS( generic )

// -- AMD64 architectures --

//#ifdef BLIS_KERNELS_ZEN
//#include "bli_kernels_zen.h"
//#endif
#ifdef BLIS_KERNELS_ZEN
#include "bli_kernels_zen.h"
#endif
//#ifdef BLIS_KERNELS_EXCAVATOR
//#include "bli_kernels_excavator.h"
//#endif

@@ -5,6 +5,7 @@
libraries.

Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2017, Advanced Micro Devices, Inc.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are

@@ -64,6 +65,49 @@
#endif


// -- BLIS Thread Local Storage Keyword --

// __thread for TLS is supported by GCC, CLANG, ICC, and IBMC.
// There is a small risk here as __GNUC__ can also be defined by some other
// compiler (other than ICC and CLANG which we know define it) that
// doesn't support __thread, as __GNUC__ is not quite unique to GCC.
// But the possibility of someone using such non-main-stream compiler
// for building BLIS is low.
#if defined(__GNUC__) || defined(__clang__) || defined(__ICC) || defined(__IBMC__)
#define BLIS_THREAD_LOCAL __thread
#else
#define BLIS_THREAD_LOCAL
#endif


// -- BLIS constructor/destructor function attribute --

// __attribute__((constructor/destructor)) is supported by GCC only.
// There is a small risk here as __GNUC__ can also be defined by some other
// compiler (other than ICC and CLANG which we know define it) that
// doesn't support this, as __GNUC__ is not quite unique to GCC.
// But the possibility of someone using such non-main-stream compiler
// for building BLIS is low.

#if defined(__ICC) || defined(__INTEL_COMPILER)
// ICC defines __GNUC__ but doesn't support this
#define BLIS_ATTRIB_CTOR
#define BLIS_ATTRIB_DTOR
#elif defined(__clang__)
// CLANG supports __attribute__, but its documentation doesn't
// mention support for constructor/destructor. Compiling with
// clang and testing shows that it does support.
#define BLIS_ATTRIB_CTOR __attribute__((constructor))
#define BLIS_ATTRIB_DTOR __attribute__((destructor))
#elif defined(__GNUC__)
#define BLIS_ATTRIB_CTOR __attribute__((constructor))
#define BLIS_ATTRIB_DTOR __attribute__((destructor))
#else
#define BLIS_ATTRIB_CTOR
#define BLIS_ATTRIB_DTOR
#endif


// -- Concatenation macros --

#define BLIS_FUNC_PREFIX_STR "bli"
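The two macro groups added to bli_macro_defs.h can be exercised with a small sketch like the following, assuming a GCC- or Clang-compatible compiler and POSIX threads (compile with -pthread on older glibc). The macro definitions are condensed from the diff; `calls_this_thread`, `lib_initialized`, `lib_init`, `lib_finalize`, `worker`, and `run_worker` are hypothetical names for illustration, not BLIS functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* The two macro groups from the diff, comments trimmed. */
#if defined(__GNUC__) || defined(__clang__) || defined(__ICC) || defined(__IBMC__)
#define BLIS_THREAD_LOCAL __thread
#else
#define BLIS_THREAD_LOCAL
#endif

#if defined(__ICC) || defined(__INTEL_COMPILER)
#define BLIS_ATTRIB_CTOR
#define BLIS_ATTRIB_DTOR
#elif defined(__clang__) || defined(__GNUC__)
#define BLIS_ATTRIB_CTOR __attribute__((constructor))
#define BLIS_ATTRIB_DTOR __attribute__((destructor))
#else
#define BLIS_ATTRIB_CTOR
#define BLIS_ATTRIB_DTOR
#endif

/* Hypothetical per-thread state: each thread gets its own copy. */
static BLIS_THREAD_LOCAL int calls_this_thread = 0;

/* Hypothetical global flag, set before main() by the constructor. */
static int lib_initialized = 0;

static void lib_init(void) BLIS_ATTRIB_CTOR;
static void lib_finalize(void) BLIS_ATTRIB_DTOR;

static void lib_init(void)     { lib_initialized = 1; }
static void lib_finalize(void) { lib_initialized = 0; }

/* Bumps only the calling thread's counter and returns its final value. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 5; ++i)
        ++calls_this_thread;
    return (void *)(intptr_t)calls_this_thread;
}

/* Runs worker() on a second thread; returns its count, or -1 on error. */
static int run_worker(void)
{
    pthread_t t;
    void *ret = NULL;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_join(t, &ret) != 0)
        return -1;
    return (int)(intptr_t)ret;
}
```

With the empty fallbacks, code that uses these macros still compiles on other compilers, but the counter becomes a single variable shared by all threads and lib_init() never runs automatically; the comments in the diff accept that risk for uncommon compilers.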