amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-21 00:48:56 +00:00

Author	SHA1	Message	Date
Edward Smyth	c6f3340125	Merge commit '5013a6cb' into amd-main * commit '5013a6cb': More edits and fixes to docs/FAQ.md. Fixed newly broken link to CREDITS in FAQ.md. More minor fixes to FAQ.md and Sandboxes.md. Updates to FAQ.md, Sandboxes.md, and README.md. Safelist 'master', 'dev', 'amd' branches. Re-enable and fix `fb93d24`. Reverted `fb93d24`. Re-enable and fix `8e0c425` (BLIS_ENABLE_SYSTEM). Removed last vestige of #define BLIS_NUM_ARCHS. Added new packm var3 to 'gemmlike'. Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell. Fix more copy-paste errors in the haswell gemmsup code. Do a fast test on OSX. [ci skip] Fix AArch64 tests and consolidate some other tests. Use C++ cross-compiler for ARM tests. Attempt to fix cxx-test for OOT builds. Updated travis-ci.org link in README.md to .com. Disabled (at least temporarily) commit `8e0c425`. Define BLIS_OS_NONE when using --disable-system. Updated stale calls to malloc_intl() in gemmlike. Blacklist clang10/gcc9 and older for 'armsve'. Add test to Travis using C++ compiler to make sure blis.h is C++-compatible. Moved lang defs from _macro_def.h to _lang_defs.h. Minor tweaks to gemmlike sandbox. Added local _check() code to gemmlike sandbox. README.md citation updates (e.g. BLIS7 bibtex). Tweaks to gemmlike to facilitate 3rd party mods. Whitespace tweaks. Add row- and column-strides for A/B in obj_ukr_fn_t. Clean up some warnings that show up on clang/OSX. Remove schema field on obj_t (redundant) and add new API functions. Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects. Disabled sanity check in bli_pool_finalize(). Implement proposed new function pointer fields for obj_t. AMD-Internal: [CPUPL-2698] Change-Id: I6fc33351fa824580cf4f25b63f0370383cd9422d	2023-11-10 13:05:12 -05:00
Edward Smyth	f5505be9f3	Merge commit 'e366665c' into amd-main * commit 'e366665c': Fixed stale API calls to membrk API in gemmlike. Fixed bli_init.c compile-time error on OSX clang. Fixed configure breakage on OSX clang. Fixed one-time use property of bli_init() (#525). CREDITS file update. Added Graviton2 Neoverse N1 performance results. Remove unnecesary windows/zen2 directory. Add vzeroupper to Haswell microkernels. (#524) Fix Win64 AVX512 bug. Add comment about make checkblas on Windows CREDITS file update. Test installation in Travis CI Add symlink to blis.pc.in for out-of-tree builds Revert "Always run `make check`." Always run `make check`. Fixed configure script bug. Details: - Fixed kernel list string substitution error by adding function substitute_words in configure script. if the string contains zen and zen2, and zen need to be replaced with another string, then zen2 also be incorrectly replaced. Update POWER10.md Rework POWER10 sandbox Skip clearing temp microtile in gemmlike sandbox. Fix asm warning Sandbox header edits trigger full library rebuild. Add vhsubpd/vhsubpd. Fixed bugs in cpackm kernels, gemmlike code. Armv8A Rename Regs for Safe Darwin Compile Armv8A Rename Regs for Clang Compile: FP32 Part Armv8A Rename Regs for Clang Compile: FP64 Part Asm Flag Mingling for Darwin_Aarch64 Added a new 'gemmlike' sandbox. Updated Fugaku (a64fx) performance results. Add explicit compiler check for Windows. Remove `rm-dupls` function in common.mk. Travis CI Revert Unnecessary Extras from `91d3636` Adjust TravisCI Travis Support Arm SVE Added 512b SVE-based a64fx subconfig + SVE kernels. Replace bli_dlamch with something less archaic (#498) Allow clang for ThunderX2 config AMD-Internal: [CPUPL-2698] Change-Id: I561ca3959b7049a00cc128dee3617be51ae11bc4	2023-10-18 09:09:54 -04:00
Edward Smyth	85f2bf6c4a	Fix for x86_64 builds Configuration x86_64 includes all Intel and AMD sub-configurations. Fixes to enable this to work correctly again are: - In config_registry use amdzen rather than amd64 in x86_64 family. - Copy settings from config/amdzen/bli_family_amdzen.h to config/x86_64/bli_family_x86_64.h - Modify configure to set enable_aocl_zen=yes for x86_64, but not for amd64_legacy. - Add "if defined(BLIS_FAMILY_X86_64)" to frame/3/bli_l3_sup.c and frame/3/bli_l3_sup_int_amd.c so zen-specific code paths are enabled. Note: sub-configurations knl and bulldozer use instructions that are not supported on most x86_64 processors. AMD-Internal: [CPUPL-3838] Change-Id: I0bd8fd89ccd846f80e5491ef44ade7d409970b04	2023-10-09 07:24:21 -04:00
Edward Smyth	b531022bac	BLIS cpuid: distinguish submodels within a microarchitecture Incorporate a means of detecting submodels of a microarchitecture, so that different optimizations e.g. block sizes or kernel choices can be used. The details are as follows: - Different models are currently only enabled for zen3 and zen4 architectures (for server parts). - There is a single enumeration (model_t) for all models for all architectures, but function bli_check_valid_model_id() should check the provided model_id against the suitable range within the enumeration for the provided arch_id. - To enable the model_id to be used within the cntx setup functions, checking of a user specified value of BLIS_ARCH_TYPE against the enabled configurations is delayed to a separate function, bli_arch_check_id(). - Default selection based on hardware can be overridden using the BLIS_MODEL_TYPE environment variable. Valid values are: Genoa, Bergamo, Genoa-X, Milan, Milan-X Values are case-insensitive and -X can also be specified as _X or X - Specifying an incorrect value for BLIS_MODEL_TYPE is not an error, but will result in the default option for that architecture being selected. This is different to specifying an incorrect value of BLIS_ARCH_TYPE, which is an error. - The environment variable BLIS_MODEL_TYPE can be renamed using the --rename-blis-model-type argument to configure (or cmake equivalent), in a similar way to renaming BLIS_ARCH_TYPE with --rename-blis-arch-type. - Configure option --disable-blis-arch-type will disable both BLIS_ARCH_TYPE and BLIS_MODEL_TYPE environment variables. - Added code in bli_cpuid.c to detect L1, L2 and L3 cache sizes, currently only for AMD cpus. Functions are provided to query these from other parts of the code, namely: uint32_t bli_cpuid_query_{l1d,l1i,l2,l3}_cache_size() AMD-Internal: [CPUPL-3033] Change-Id: I37a3741abfd59a95e0e905d926c6ede9a0143702	2023-04-20 10:47:44 -04:00
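The BLIS_MODEL_TYPE rules described in b531022bac (values are case-insensitive, `-X` may also be written `_X` or `X`, and an unrecognized value falls back to the default for the architecture instead of raising an error) can be sketched as below. This is an illustrative Python sketch, not the code in bli_cpuid.c; `normalize_model_type` and the `_MODELS` table are hypothetical names introduced here.

```python
# Hypothetical lookup table: normalized key -> canonical model name.
# The five model names are the ones listed in the commit message.
_MODELS = {
    "genoa": "Genoa",
    "bergamo": "Bergamo",
    "genoax": "Genoa-X",
    "milan": "Milan",
    "milanx": "Milan-X",
}


def normalize_model_type(value, default):
    """Map a user-supplied BLIS_MODEL_TYPE string to a canonical name.

    Unrecognized values are not an error: the caller-supplied default
    for the architecture is returned instead.
    """
    key = value.strip().lower()          # values are case-insensitive
    if key.endswith(("-x", "_x")):       # accept -X and _X as well as X
        key = key[:-2] + "x"
    return _MODELS.get(key, default)
```

For example, `normalize_model_type("GENOA_X", "Genoa")` yields `"Genoa-X"`, while an unknown string simply yields the default passed in.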
Aayush Kumar	5bd2a777ba	Fixed Compilation Fails when configured with --disable-blas - Moved _blis_impl function declaration outside the BLIS_ENABLE_BLAS guard. - Changed Makefile to continue to compile bla_ files to get _blis_impl interfaces. - Modify CBLAS headers, bli_macro_defs.h and bli_util_api_wrap.{c,h} to add BLIS_ENABLE_CBLAS guards. - Comment out BLIS_ENABLE_BLAS guards in various headers and utility functions. - Define BLIS Fortran-style functions lsame_blis_impl and xerbla_blis_impl. New macros PASTE_LSAME and PASTE_XERBLA are used in bla_*_check headers and some other places to select whether to call lsame and xerbla, or the _blis_impl versions. - Defined various other missing _blis_impl functions. - In bli_util_api_wrap.c, only define any functions if BLIS_ENABLE_BLAS is defined, and only define the subroutine versions of functions like dot, nrm2, etc if BLIS_ENABLE_CBLAS is defined. - BLAS layer is needed if CBLAS layer is enabled. Changed header files build/bli_config.h.in and bli_blas.h, and configure program to help ensure consistency in generated blis.h header and configure output. Undefining BLIS_ENABLE_BLAS_DEFS appears to be broken in UTA BLIS too, thus BLIS_ENABLE_BLAS_DEFS is currently permanently defined. AMD-Internal: [CPUPL-3015] Change-Id: I7c0fe07db85781db46f2c690e174451860b37635	2023-03-23 06:11:52 -04:00
Edward Smyth	82c2eb4e8e	Code cleanup and warnings fixes Corrections for some occurrences of: - Compiler warnings about initialization of float from double - Spelling mistakes in comments - Incorrect indentation of code and comments AMD-Internal: [CPUPL-2870] Change-Id: Icb68c789687bd0684844331d43071bfffecac9fc	2023-01-09 04:34:52 -05:00
Edward Smyth	6861fcae91	BLIS: Improve architecture selection at runtime Make BLIS_ARCH_TYPE=0 be an error, so that incorrect meaningful names will get an error rather than "skx" code path. BLIS_ARCH_TYPE=1 is now "generic", so that it should be constant as new code paths are added. Thus all other code path enum values have increased by 2. Also added new options to BLIS configure program to allow: 1. BLIS_ARCH_TYPE functionality to be disabled, e.g.: ./configure --disable-blis-arch-type amdzen 2. Renaming the environment variable tested from "BLIS_ARCH_TYPE" to a specified value, e.g.: ./configure --rename-blis-arch-type=MY_NAME_FOR_ARCH_TYPE amdzen On Windows, these can be enabled with e.g.: cmake ... -DDISABLE_BLIS_ARCH_TYPE=ON or cmake ... -DRENAME_BLIS_ARCH_TYPE=MY_NAME_FOR_ARCH_TYPE This implements changes 2 and 3 in the Jira ticket below. AMD-Internal: [CPUPL-2235] Change-Id: Ie42906bd909f9d83f00a90c5bef9c5bf3ef5adb4	2022-08-19 10:59:35 -04:00
Dipal M. Zambare	c85bbfdb50	Updated BLIS version string format - Updated version string to match the recommended format “AOCL-BLIS 3.2.1 Build 20220727”. - Fixed issues with include paths which were preventing the compile-time version string definition from being passed via build commands. - Removed version string determination based on git tag using ‘git describe’; the version string will always be taken from the version file. AMD-Internal: [CPUPL-2324] Change-Id: Idc7edf1211f66d348ec3b5b43f2507c2b810f088	2022-08-12 05:53:35 +00:00
Field G. Van Zee	7a0ba4194f	Added support for addons. Details: - Implemented a new feature called addons, which are similar to sandboxes except that there is no requirement to define gemm or any other particular operation. - Updated configure to accept --enable-addon=<name> or -a <name> syntax for requesting an addon be included within a BLIS build. configure now outputs the list of enabled addons into config.mk. It also outputs the corresponding #include directives for the addons' headers to a new companion to the bli_config.h header file named bli_addon.h. Because addons may wish to make use of existing BLIS types within their own definitions, the addons' headers must be included sometime after that of bli_config.h (which currently is #included before bli_type_defs.h). This is why the #include directives needed to go into a new top-level header file rather than the existing bli_config.h file. - Added a markdown document, docs/Addons.md, to explain addons, how to build with them, and what assumptions their authors should keep in mind as they create them. - Added a gemmlike-like implementation of sandwich gemm called 'gemmd' as an addon in addon/gemmd. The code uses a 'bao_' prefix for local functions, including the user-level object and typed APIs. - Updated .gitignore so that git ignores bli_addon.h files. Change-Id: Ie7efdea366481ce25075cb2459bdbcfd52309717	2022-03-31 12:03:27 +05:30
Dipal M Zambare	f63f78d783	Removed Arch specific code from BLIS framework. - Removed BLIS_CONFIG_EPYC macro - The code dependent on this macro is handled in one of the three ways -- It is updated to work across platforms. -- Added in architecture/feature specific runtime checks. -- Duplicated in AMD specific files. Build system is updated to pick AMD specific files when library is built for any of the zen architecture AMD-Internal: [CPUPL-1960] Change-Id: I6f9f8018e41fa48eb43ae4245c9c2c361857f43b	2022-01-18 11:51:08 +05:30
Dipal M Zambare	5d287fdba0	Include LP64/ILP64 in BLIS binary name Binary name will be chosen based on multi-threading and BLAS integer size configuration as given below. libblis-[mt]-lp64 - when configured to use 32 bit integers libblis-[mt]-ilp64 - when configured to use 64 bit integers AMD-Internal: [CPUPL-1879] Change-Id: I865023c63235a0a72bdfce7057b2cfb8158b1d87	2021-11-12 08:58:51 +05:30
Field G. Van Zee	2f7325b2b7	Blacklist clang10/gcc9 and older for 'armsve'. Details: - Prohibit use of clang 10.x and older or gcc 9.x and older for the 'armsve' subconfiguration. Addresses issue #535.	2021-08-23 15:04:05 -05:00
Field G. Van Zee	c8728cfbd1	Fixed configure breakage on OSX clang. Details: - Accept either 'clang' or 'LLVM' in vendor string when greping for the version number (after determining that we're working with clang). Thanks to Devin Matthews for this fix.	2021-08-05 15:17:09 -05:00
Field G. Van Zee	69205ac266	CREDITS file update. Details: - Thanks to Chengguo Sun for submitting #515 (`5ef7f68`). - Thanks to Andrew Wildman for submitting #519 (`551c6b4`). - Whitespace update to configure (spaces to tabs).	2021-07-06 20:39:22 -05:00
Andrew Wildman	f648df4e55	Add symlink to blis.pc.in for out-of-tree builds	2021-07-06 16:35:12 -07:00
sunchengguo	ad6231cca3	Fixed configure script bug. Details: - Fixed kernel list string substitution error by adding function substitute_words in configure script. if the string contains zen and zen2, and zen need to be replaced with another string, then zen2 also be incorrectly replaced.	2021-07-06 07:30:00 -04:00
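The bug fixed in ad6231cca3 is the usual substring-versus-word problem: replacing `zen` in a kernel list by plain string substitution also rewrites `zen2`. A minimal Python sketch of whole-word substitution over a space-separated list follows. It only illustrates the idea; the real `substitute_words` is a function in the configure script, and its exact implementation is not shown in this log.

```python
def substitute_words(text, old, new):
    """Replace only whole space-separated words equal to `old`."""
    return " ".join(new if word == old else word for word in text.split())


kernel_list = "zen zen2 haswell"

# Naive substring replacement also corrupts 'zen2'.
naive = kernel_list.replace("zen", "skx")

# Whole-word replacement leaves 'zen2' untouched.
fixed = substitute_words(kernel_list, "zen", "skx")
```

Here `naive` becomes `"skx skx2 haswell"`, whereas `fixed` is `"skx zen2 haswell"`.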
Dipal M Zambare	d2313bb4e6	Update show config to include missing info. -- Ignore aocl dynamic configuration if multithreading is disabled. AOCL Dynamic will also be disabled in this case. -- Added following configuration settings in showconfig output 1. Complex return scheme 2. TRSM preinversion status 3. AOCL dynamic active status AOCL-Internal: [CPUPL-1565] Change-Id: Id5a31b233fc08dcd871de4a693aab0b2a5d9f1c4	2021-06-29 12:03:47 +05:30
Dipal M Zambare	fe3384b3c6	Enable AOCL Dynamic feature by default. It can be disabled by configuration option --disable-aocl-dynamic. AOCL-Internal: [CPUPL-1565] Change-Id: I15ea5964dcd479f16dc9edc72957af3bcf4bc0e2	2021-06-22 14:17:52 +05:30
Devin Matthews	5feb04e233	Add explicit compiler check for Windows. Check the C compiler for a predefined macro `_WIN32` to indicate (cross-)compilation for Windows. Fixes #463.	2021-05-23 18:46:56 -05:00
Dipal M Zambare	21130ebece	Added configure option for AOCL Dynamic feature. - AOCL Dynamic feature is added in BLIS which determines optimal number of threads for the current problem size. - This feature can be enabled/disabled by modifying the source code - This change adds support to enable/disable this feature during configuration time by adding a new option in configure script AOCL-Internal : [CPUPL-1565] Change-Id: I590693f793cabc44d27a7f815adc41631dd01bbe	2021-05-12 00:41:13 -04:00
lcpu	7401effc03	BLIS:merge: Merge conflicts araised has been fixed while downstreaming BLIS code from master to milan-3.1 branch Implemented an automatic reduction in the number of threads when the user requests parallelism via a single number (ie: the automatic way) and (a) that number of threads is prime, and (b) that number exceeds a minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which defaults to 11. If prime numbers are really desired, this feature may be suppressed by defining the macro BLIS_ENABLE_AUTO_PRIME_NUM_THREADS in the appropriate configuration family's bli_family_*.h. (Jeff Diamond) Changed default value of BLIS_THREAD_RATIO_M from 2 to 1, which leads to slightly different automatic thread factorizations. Enable the 1m method only if the real domain microkernel is not a reference kernel. BLIS now forgoes use of 1m if both the real and complex domain kernels are reference implementations. Relocated the general stride handling for gemmsup. This fixed an issue whereby gemm would fail to trigger to conventional code path for cases that use general stride even after gemmsup rejected the problem. (RuQing Xu) Fixed an incorrect function signature (and prototype) of bli_?gemmt(). (RuQing Xu) Redefined BLIS_NUM_ARCHS to be part of the arch_t enum, which means it will be updated automatically when defining future subconfigs. Minor code consolidation in all level-3 _front() functions. Reorganized Windows cpp branch of bli_pthreads.c. Implemented bli_pthread_self() and _equals(), but left them commented out (via cpp guards) due to issues with getting the Windows versions working. Thankfully, these functions aren't yet needed by BLIS. Allow disabling of trsm diagonal pre-inversion at compile time via --disable-trsm-preinversion. Fixed obscure testsuite bug for the gemmt test module that relates to its dependency on gemv. AMD-internal-[CPUPL-1523] Change-Id: I0d1df018e2df96a23dc4383d01d98b324d5ac5cd	2021-04-27 11:09:48 +05:30
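The automatic thread-count reduction described in 7401effc03 can be sketched as below. The commit message states only the conditions (the requested count is prime and exceeds BLIS_NT_MAX_PRIME, default 11); it does not say by how much the count is reduced. Reducing by one is an assumption made for this sketch, and `adjust_thread_count` and `allow_primes` are hypothetical names, not BLIS identifiers.

```python
NT_MAX_PRIME = 11  # default of BLIS_NT_MAX_PRIME per the commit message


def is_prime(n):
    """Trial-division primality test, adequate for thread counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def adjust_thread_count(nt, max_prime=NT_MAX_PRIME, allow_primes=False):
    """Return the thread count to factorize for a single-number request.

    A prime count above the threshold cannot be split into a useful
    2D factorization, so it is reduced. ASSUMPTION: the reduction is
    by exactly one thread; the log does not state the amount.
    """
    if not allow_primes and nt > max_prime and is_prime(nt):
        return nt - 1
    return nt
```

Under these assumptions a request for 13 threads is handled as 12, while 11 (prime but not above the threshold) and 16 (not prime) are left unchanged.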
Field G. Van Zee	8f39aea11f	Merge branch 'dev'	2021-01-30 17:59:56 -06:00
Devin Matthews	874c3f04ec	Update configure Choose last sub-config in the kernel-to-config map if the config list doesn't contain the name of the kernel set. E.g. for "zen: skx knl haswell" pick "haswell" instead of "skx" which was chosen previously. Fixes #470.	2021-01-08 13:56:30 -06:00
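The selection rule from 874c3f04ec can be sketched as follows: given one entry of the kernel-to-config map, use the sub-config named after the kernel set if it appears in the list, and otherwise fall back to the last entry. This is a Python illustration of the rule as worded in the commit message, not the configure script's shell code; `choose_config_for_kernel` is a hypothetical name.

```python
def choose_config_for_kernel(kernel, configs):
    """Pick the sub-config that supplies settings for a kernel set.

    `configs` is the ordered list from a kernel-to-config map entry,
    e.g. ["skx", "knl", "haswell"] for the entry "zen: skx knl haswell".
    """
    if kernel in configs:
        return kernel
    return configs[-1]  # previously the first entry ("skx") was chosen


chosen = choose_config_for_kernel("zen", ["skx", "knl", "haswell"])
```

With the example from the commit message, `chosen` is `"haswell"`.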
dzambare	48f2366b6f	Updated BLIS version string to "AOCL BLIS X.x" format AMD-Internal : [CPUPL-1394] Change-Id: Ifebcb14d9eb064d231b831f5a1e151853ad5a009	2021-01-07 12:38:32 +05:30
Field G. Van Zee	ed50c94738	Merge branch 'master' into dev	2021-01-04 14:31:44 -06:00
nprasadm	10ac4e2aba	Blis: DOTC Additional argument for Complex types when using FLANG Merged the changes done in UT Austin BLIS repo for DOTC Additional argument. Other modifications related to test application included. Verified the above code changes through scalapack test applications 'xztrd', 'xctrd' Change-Id: I7e16f3953db71890f9e8fbb0f7b363eaad899f62 Signed-off-by: Nagendra <Nagendra.PrasadM@amd.com> AMD-Internal: [CPUPL-1323]	2020-12-16 14:03:10 +05:30
Isuru Fernando	21aa67e11c	fix cc_vendor for crosstool-ng toolchains	2020-12-05 21:59:13 -06:00
Field G. Van Zee	7038bbaa05	Optionally disable trsm diagonal pre-inversion. Details: - Implemented a configure-time option, --disable-trsm-preinversion, that optionally disables the pre-inversion of diagonal elements of the triangular matrix in the trsm operation and instead uses division instructions within the gemmtrsm microkernels. Pre-inversion is enabled by default. When it is disabled, performance may suffer slightly, but numerical robustness should improve for certain pathological cases involving denormal (subnormal) numbers that would otherwise result in overflow in the pre-inverted value. Thanks to Bhaskar Nallani for reporting this issue via #461. - Added preprocessor macro guards to bli_trsm_cntl.c as well as the gemmtrsm microkernels for 'haswell' and 'penryn' kernel sets pursuant to the aforementioned feature. - Added macros to frame/include/bli_x86_asm_macros.h related to division instructions.	2020-12-04 16:08:15 -06:00
Field G. Van Zee	9bb23e6c2a	Added support for systemless build (no pthreads). Details: - Added a configure option, --[enable\|disable]-system, which determines whether the modest operating system dependencies in BLIS are included. The most notable example of this on Linux and BSD/OSX is the use of POSIX threads to ensure thread safety for when application-level threads call BLIS. When --disable-system is given, the bli_pthreads implementation is dummied out entirely, allowing the calling code within BLIS to remain unchanged. Why would anyone want to build BLIS like this? The motivating example was submitted via #454 in which a user wanted to build BLIS for a simulator such as gem5 where thread safety may not be a concern (and where the operating system is largely absent anyway). Thanks to Stepan Nassyr for suggesting this feature. - Another, more minor side effect of the --disable-system option is that the implementation of bli_clock() unconditionally returns 0.0 instead of the time elapsed since some fixed point in the past. The reasoning for this is that if the operating system is truly minimal, the system function call upon which bli_clock() would normally be implemented (e.g. clock_gettime()) may not be available. - Refactored preprocess-guarded code in bli_pthread.c and bli_pthread.h to remove redundancies. - Removed old comments and commented #include of "bli_pthread_wrap.h" from bli_system.h. - Documented bli_clock() and bli_clock_min_diff() in BLISObjectAPI.md and BLISTypedAPI.md, with a note that both are non-functional when BLIS is configured with --disable-system.	2020-11-16 15:55:45 -06:00
Field G. Van Zee	88ad841434	Squash-merge 'pr' into 'squash'. (#457 ) Merged contributions from AMD's AOCL BLIS (#448). Details: - Added support for level-3 operation gemmt, which performs a gemm on only the lower or upper triangle of a square matrix C. For now, only the conventional/large code path will be supported (in vanilla BLIS). This was accomplished by leveraging the existing variant logic for herk. However, some of the infrastructure to support a gemmtsup is included in this commit, including - A bli_gemmtsup() front-end, similar to bli_gemmsup(). - A bli_gemmtsup_ref() reference handler function. - A bli_gemmtsup_int() variant chooser function (with variant calls commented out). - Added support for inducing complex domain gemmt via the 1m method. - Added gemmt APIs to the BLAS and CBLAS compatiblity layers. - Added gemmt test module to testsuite. - Added standalone gemmt test driver to 'test' directory. - Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md. - Added a C++ template header (blis.hh) containing a BLAS-inspired wrapper to a set of polymorphic CBLAS-like function wrappers defined in another header (cblas.hh). These two headers are installed if running the 'install' target with INSTALL_HH is set to 'yes'. (Also added a set of unit tests that exercise blis.hh, although they are disabled for now because they aren't compatible with out-of-tree builds.) These files now live in the 'vendor' top-level directory. - Various updates to 'zen' and 'zen2' subconfigurations, particularly within the context initialization functions. - Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and various minor updates to dotv and scalv kernels. Also added various sup kernels contributed by AMD to kernels/zen/3. However, these kernels are (for now) not yet used, in part because they caused AppVeyor clang failures, and also because I have not found time to review and vet them. 
- Output the python found during configure into the definition of PYTHON in build/config.mk (via build/config.mk.in). - Added early-return checks (A, B, or C with zero dimension; alpha = 0) to bli_gemm_front.c. - Implemented explicit beta = 0 handling in for the sgemm ukernel in bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent bug surfaced because the gemmt module verifies its computation using gemm with its beta parameter set to zero, which, on a cortexa15 system caused the gemm kernel code to unconditionally multiply the uninitialized C data by beta. The C matrix likely contained non-numeric values such as NaN, which then would have resulted in a false failure. - Fixed a bug whereby the implementation for bli_herk_determine_kc(), in bli_l3_blocksize.c, was inadvertantly being defined in terms of helper functions meant for trmm. This bug was probably harmless since the trmm code should have also done the right thing for herk. - Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in kernels/zen/3/bli_gemm_small.c since those macros are not used in vanilla BLIS. - Added cpp guard to definition of bli_mem_clear() in bli_mem.h to accommodate C++'s stricter type checking. - Added cpp guard to test/*.c drivers that facilitate compilation on Windows systems. - Various whitespace changes.	2020-11-14 09:39:48 -06:00
Dipal M Zambare	4347d2d823	Re-enable support for Intel 19+ compiler. Note that there is a known issue with Intel 19+ as explained in https://github.com/flame/blis/issues/371. The AMD version needs this support as some user applications need ICC support. AMD-Internal: [CPUPL-1223] Change-Id: I86ddee068ae18bd940a5952d60960228d8100e97	2020-11-06 11:11:46 +05:30
Field G. Van Zee	2a0682f8e5	Implemented runtime subconfig selection (#451 ). Details: - Implemented support for the user manually overriding the automatic subconfiguration selection that happens at runtime. This override can be requested by setting the BLIS_ARCH_TYPE environment variable. The variable must be set to the arch_t id (as enumerated in bli_type_defs.h) corresponding to the desired subconfiguration. If a value outside this enumerated range is given, BLIS will abort with an error message. If the value is in the valid range but corresponds to a subconfiguration that was not activated at configure-time/compile-time, BLIS will abort with a (different) error message. Thanks to decandia50 for suggesting this feature via issue #451. - Defined a new function bli_gks_lookup_id to return the address of an internal data structure within the gks. If this address is NULL, then it indicates that the subconfig corresponding to the arch_t id passed into the function was not compiled into BLIS. This function is used in the second of the two abort scenarios described above. - Defined the enumerated error code BLIS_UNINITIALIZED_GKS_CNTX, which is returned for the latter of the two abort scenarios mentioned above, along with a corresponding error message and a function to perform the error check. - Added cpp macro branching to bli_env.c to support compilation of the auto-detect.x executable during configure-time. This cpp branch is similar to the cpp code already found in bli_arch.c and bli_cpuid.c. - Cleaned up the auto_detect() function to facilitate easier maintenance going forward. Also added a convenient debug switch that outputs the compilation command for the auto-detect.x executable and exits.	2020-10-18 18:04:03 -05:00
Field G. Van Zee	97e87f2c9f	Whitespace/comment updates to #434 PR.	2020-09-07 15:56:42 -05:00
Devin Matthews	7fdc0fc893	Add an option to change the complex return type. ifort apparently does not return complex numbers in registers as in C/C++ (or gfortran), but instead creates a "hidden" first parameter for the return value. The option --complex-return=gnu\|intel has been added, as well as a guess based on a provided FC if not specified (otherwise default to gnu). This option affects the signatures of cdotc, cdotu, zdotc, and zdotu, and a single library cannot be used with both GNU and Intel Fortran compilers. Fixes #433.	2020-08-06 14:09:23 -05:00
dzambare	9c7814da1c	Added support for zen3 configuration - User can now specify the zen3 configuration; currently it reuses block sizes and kernels from zen2. - Auto configuration can detect when the zen3 config is needed and enable it - Added support for amd64 bundle which contains all zen platforms - Moved existing amd bundle to amd64 legacy. AMD-Internal: [CPUPL-500, CPUPL-1013] Change-Id: I60b0b8abc6d2821c27ff0f5f6e032e889194b957	2020-07-22 18:24:26 +05:30
Field G. Van Zee	f973f00d94	Defined netlib equivalent of xerbla_array(). Details: - Added a function definition for xerbla_array_(), which largely mirrors its netlib implementation. Thanks to Isuru Fernando for suggesting the addition of this function. Change-Id: Ie9c619f5604e60a32edfda2db2b66f0c762581d3	2020-05-21 11:57:54 +05:30
Field G. Van Zee	b325f1ea62	Warn user when auto-detection returns 'generic'. Details: - Added logic to configure that causes the script to output a warning to the user if/when "./configure auto" is run and the underlying hardware feature detection code is unable to identify the hardware. In these cases, the auto-detect code will return 'generic', which is likely not what the user expected, and a flag will be set so that a message is printed at the end of the configure output. (Thankfully, we don't expect this scenario to play out very often.) Thanks to Devin Matthews for suggesting this fix #384.	2020-05-21 11:54:53 +05:30
Field G. Van Zee	99da76fd64	Fixed 'configure' breakage introduced in `6433831`. Details: - Added a missing 'fi' (endif) keyword to a conditional block added in the configure script in commit `6433831`.	2020-05-21 11:40:00 +05:30
Jeff Hammond	570d51483b	blacklist ICC 18 for knl/skx due to test failures Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>	2020-05-21 11:40:00 +05:30
Jeff Hammond	afc57adc1b	blacklist Intel 19+ Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>	2020-05-21 11:40:00 +05:30
Field G. Van Zee	d51245e58b	Add support for Intel oneAPI in configure. Details: - Properly select cc_vendor based on the output of invoking CC with the --version option, including cases where CC is the variant of clang that is included with Intel oneAPI. (However, we continue to treat the compiler as clang for other purposes, not icc.) Thanks to Ajay Panyala and Devin Matthews for reporting on this issue via #402.	2020-05-08 18:00:54 -05:00
dzambare	d40edf7dac	Execution and Debug trace support. Added support add debug logging, execution trace and decode. Change-Id: I024bf6165daa9e23a62423f2401c0f1c5de459ba AMD-Internal: [CPUPL-806]	2020-04-07 08:48:59 +05:30
Field G. Van Zee	c40a33190b	Warn user when auto-detection returns 'generic'. Details: - Added logic to configure that causes the script to output a warning to the user if/when "./configure auto" is run and the underlying hardware feature detection code is unable to identify the hardware. In these cases, the auto-detect code will return 'generic', which is likely not what the user expected, and a flag will be set so that a message is printed at the end of the configure output. (Thankfully, we don't expect this scenario to play out very often.) Thanks to Devin Matthews for suggesting this fix #384.	2020-03-26 16:55:00 -05:00
Meghana Vankadari	cc98047fd6	Made framework changes to initialize specific cache block sizes for TRSM. Details: -This commit addresses the performance optimization(single-thread and multi-thread) for DTRSM on zen2. -This new optimization employs different MC, KC & NC values for TRSM than what is being used in other Level-3 routines like DGEMM. -Changed TRSM framework code to choose these blocksizes for TRSM on zen family configurations. -Added a new field called "trsm_blkszs" to cntx structure in order to store TRSM specific block sizes. -Implemented routines to initialize, set and query the TRSM-specific block sizes. -Defined a new macro "AOCL_BLIS_ZEN" in configure script. This macro is automatically defined for zen family architectures. It enables us to choose different cache block sizes for TRSM instead of common level-3 block sizes. Change-Id: Id8557b1c962a316b1edecca9cd582675eaf35fe6 Signed-off-by: Meghana Vankadari <meghana.vankadari@amd.com> AMD-Internal: [CPUPL-656]	2020-03-09 10:33:42 +05:30
Field G. Van Zee	5ca1a3cfc1	Fixed 'configure' breakage introduced in `6433831`. Details: - Added a missing 'fi' (endif) keyword to a conditional block added in the configure script in commit `6433831`.	2020-01-06 12:29:12 -06:00
Jeff Hammond	6433831cc3	blacklist ICC 18 for knl/skx due to test failures Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>	2020-01-03 17:51:05 -08:00
Jeff Hammond	af3589f1f9	blacklist Intel 19+ Signed-off-by: Jeff Hammond <jeff.r.hammond@intel.com>	2020-01-03 17:51:05 -08:00
Devrajegowda, Kiran	c4047e491a	Merge branch 'amd-blis-nov-mergetest' into amd-staging-rome2.1 Change-Id: I1e04592dd9494faa34555008dd1edbca8a092a44	2019-11-29 23:01:51 +05:30
Dipal M Zambare	37badee648	Updated build infra to use python detected by auto config. Even though the configure script checks the availability of the correct version of python, this information is not passed to the makefiles. This results in python scripts getting invoked without an interpreter. This normally works fine as the script uses the path from the shebang; however, it doesn't work if the command specified by the shebang is an alias. This also causes confusion: even though configure has found python, we end up with a python-not-found error during the build. This fix passes the detected version of the python interpreter to the makefiles, which solves both issues mentioned above. Change-Id: Ic04da77601ff8ad2a461e9f2f936470109cda22c	2019-11-26 14:57:47 +05:30
Meghana Vankadari	764d6f4643	changed configure script to support AOCC Change-Id: I86d2f36f42bc6cc7e6b950f4e85087753ce5bc40	2019-11-25 15:17:04 +05:30

195 Commits