amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-13 10:35:38 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	29d5bcb1c8	Changed unsafe-loop to unsafe-math optimizations. Details: - Changed -funsafe-loop-optimizations (re-)introduced in `7690855` for make_defs.mk files' CRVECFLAGS to -funsafe-math-optimizations (to account for a miscommunication in issue #300). Thanks to Dave Love for this suggestion and Jeff Hammond for his feedback on the topic.	2019-08-23 14:18:07 +05:30
Field G. Van Zee	9a42c1a323	Restored -funsafe-loop-optimizations to subconfigs. Details: - Restored use of -funsafe-loop-optimizations in the definitions of CRVECFLAGS (when using gcc), but only for sub-configurations (and not configuration families such as amd64, intel64, and x86_64). This more or less reverts `5190d05` and `6cf1550`.	2019-08-23 14:18:07 +05:30
Field G. Van Zee	e62bdd4df1	Disable TBM, XOP, LWP instructions in AMD configs. Details: - Added -mno-tbm -mno-xop -mno-lwp to CKVECFLAGS in bulldozer, piledriver, steamroller, and excavator configurations to explicitly disable AMD's bulldozer-era TBM, XOP, and LWP instruction sets in an attempt to fix the invalid instruction error that has plagued Travis CI builds since `6a014a3`. Thanks to Devin Matthews for pointing out that the offending instruction was part of TBM (issue #300). - Restored -O3 to piledriver configuration's COPTFLAGS.	2019-08-23 14:18:07 +05:30
Field G. Van Zee	e7b73bf1ed	Reverted piledriver COPTFLAGS from -O3 to -O2. Details: - Debugging continues; changing COPTFLAGS for piledriver subconfig from -O3 to -O2, its original value prior to `6a014a3`.	2019-08-23 14:18:07 +05:30
Field G. Van Zee	176e4c6860	Removed -funsafe-loop-optimizations from all configs. Details: - Error persists. Removed -funsafe-loop-optimizations from all remaining sub-configurations.	2019-08-23 14:18:07 +05:30
Field G. Van Zee	24adee071c	Removed -funsafe-loop-optimizations from piledriver. Details: - Error persists; continuing debugging from `bf0fb78c` by removing -funsafe-loop-optimizations from piledriver configuration.	2019-08-23 14:18:07 +05:30
Field G. Van Zee	7128d4b94b	Removed -funsafe-loop-optimizations from families. Details: - Removed -funsafe-loop-optimizations from the configuration families affected by `6a014a3`, specifically: intel64, amd64, and x86_64. This is part of an attempt to debug why the sde, as executed by Travis CI, is crashing via the following error: TID 0 SDE-ERROR: Executed instruction not valid for specified chip (ICELAKE): 0x9172a5: bextr_xop rax, rcx, 0x103	2019-08-23 14:18:07 +05:30
Field G. Van Zee	b7c4f1e305	Standardized optimization flags in make_defs.mk. Details: - Per Dave Love's recommendation in issue #300, this commit defines COPTFLAGS := -O3 and CRVECFLAGS := $(CKVECFLAGS) -funsafe-loop-optimizations in the make_defs.mk for all Intel- and AMD-based configurations.	2019-08-23 14:18:07 +05:30
Meghana	fdce1a5648	changed gcc version check condition from 'ifeq' to 'if greater or equal' Change-Id: Ie4c461867829bcc113210791bbefb9517e52c226	2019-07-24 15:04:41 +05:30
Meghana	c9486e0c4f	code to detect version of gcc and set flags accordingly for zen2 Change-Id: I29b0311d0000dee1a2533ee29941acf53f9e9f34	2019-07-24 09:45:17 +05:30
Meghana	dcc0ce12fd	Added a global Makefile for AMD architectures in config/zen folder This Makefile(amd_config.mk) has all the flags that are common to EPYC series Change-Id: Ic02c60a8293ccdd37f0f292e631acd198e6895de	2019-07-22 17:12:01 +05:30
Meghana Vankadari	b84cee29f4	Merge "Added compiler flags for vanilla clang" into amd-staging-rome2.0	2019-07-08 02:03:07 -04:00
kdevraje	1f80858abf	This check-in fixes the dgemm performance issue (Jira ticket CPUPL 458): because an #else was missed during integration, the code always followed the else path to get the block sizes. Change-Id: I0084b5856c2513ab1066c08c15b5086db6532717	2019-07-05 16:05:11 +05:30
Meghana	c7dd6e6cd2	Added compiler flags for vanilla clang Change-Id: I13c00b4c0d65bbda4c929848fd48b0ab611952ab	2019-07-04 09:32:51 +05:30
Meghana	2acd49b764	fix for test failures using AOCC 2.0 Change-Id: If44eaccc64bbe96bbbe1d32279b1b5773aba08d1	2019-07-01 15:44:07 +05:30
kdevraje	cac127182d	Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis with public repo commit id `565fa3853b`. Change-Id: I68b9824b110cf14df248217a24a6191b3df79d42	2019-06-24 14:05:54 +05:30
Kiran Devrajegowda	3a45ecb154	Merge "Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration; this is the same as release 1.3. It was originally added to improve DGEMM multithreaded scalability on Naples when the number of threads is greater than 16. It was deleted by mistake among the many changes made for the 2.0 release, so we are adding it back. Also, code cleanup in bli_gemm_front.c." into amd-staging-rome2.0	2019-05-31 06:47:02 -04:00
Kiran Varaganti	b69fb0b74a	Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration; this is the same as release 1.3. It was originally added to improve DGEMM multithreaded scalability on Naples when the number of threads is greater than 16. It was deleted by mistake among the many changes made for the 2.0 release, so we are adding it back. Also, code cleanup in bli_gemm_front.c. Change-Id: I9f5d8225254676a99c6f2b09a0825e545206d0fc	2019-05-31 15:14:22 +05:30
kdevraje	3f867c96ca	When running HPL with pure MPI without DGEMM threading (single-threaded BLIS), setting this macro to 1 gives the best performance. Change-Id: I24fd0bf99216f315e49f1c74c44c3feaffd7078d	2019-05-31 14:31:49 +05:30
kdevraje	13806ba3b0	This check in has changes w.r.t Copyright information, which is changed to (start year) - 2019 Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041	2019-05-27 16:24:43 +05:30
Meghana	ee123f5358	Defined small matrix thresholds for TRSM for various cases for NAPLES and ROME Updated copyright information for kernels/zen/bli_trsm_small.c file Removed separate kernels for zen2 architecture Instead added threshold conditions in zen kernels both for ROME and NAPLES Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5	2019-05-27 15:36:44 +05:30
kdevraje	84215022f2	Adding threshold condition to dgemm small matrix kernels, defining the constants in zen2 configuration Change-Id: I53a58b5d734925a6fcb8d8bea5a02ddb8971fcd5	2019-05-23 14:33:47 +05:30
Kiran Varaganti	b80bd5bcb2	config/zen/bli_cntx_init_zen.c: removed BLIS_ENABLE_ZEN_BLOCK_SIZES macro. We have different configurations for both zen and zen2 config/zen/bli_family_zen.h: deleted macro BLIS_ENABLE_ZEN_BLOCK_SIZES config/zen/make_defs.mk: removed compiler flag -mno-avx256-split-unaligned-store frame/base/bli_cpuid.c: ROME family is 17H but model # is from 0x30H. test/test_gemm.c - commented out #define FILE_IN_OUT (some compilation error when BLIS is configured as amd64) Now we can use a single configuration as ./configure amd64 - this will work both for ROME & Naples Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2	2019-05-22 05:51:22 -04:00
Kiran Varaganti	a042db011d	Modified make_defs.mk for zen2 to get compiled by gcc version less than gcc9.0 Change-Id: I8fcac30538ee39534c296932639053b47b9a2d43	2019-05-22 05:51:10 -04:00
Kiran Varaganti	a23f92594c	config_registry: New AMD zen2 architecture configuration added. frame/base/bli_arch.c: #ifdef BLIS_FAMILY_ZEN2 id = BLIS_ARCH_ZEN2; #endif added. zen2 is added in config_name[BLIS_NUM_ARCHS] frame/base/bli_cpuid.c : #ifdef BLIS_CONFIG_ZEN2 if ( bli_cpuid_is_zen2( family, model, features ) ) return BLIS_ARCH_ZEN2; #endif, defined new function bool bli_cpuid_is_zen2(...). frame/base/bli_cpuid.h : declared bli_cpuid_is_zen2(..). frame/base/bli_gks.c : #ifdef BLIS_CONFIG_ZEN2 bli_gks_register_cntx(BLIS_ARCH_ZEN2, bli_cntx_init_zen2, bli_cntx_init_zen2_ref, bli_cntx_init_zen2_ind); #endif frame/include/bli_arch_config.h : #ifdef BLIS_CONFIG_ZEN2 CNTX_INIT_PROTS(zen2) #endif #ifdef BLIS_FAMILY_ZEN2 #include "bli_family_zen2.h" #endif frame/include/bli_type_defs.h : added BLIS_ARCH_ZEN2 in arch_t enum. BLIS_NUM_ARCHS 20 Change-Id: I2a2d9b7266673e78a4f8543b1bfb5425b0aa7866	2019-05-22 05:28:16 -04:00
Kiran Varaganti	ca4b33c001	Added compiler option (-mno-avx256-split-unaligned-store) in the file config/zen/make_defs.mk to improve performance of intrinsic codes, this flag ensures compiler generates 256-bit stores for the equivalent intrinsics code. Change-Id: I8f8cd81a3604869df18d38bc42097a04f178d324	2019-04-24 15:02:39 +05:30
Kiran Varaganti	7fe4474838	Disabled BLIS_ENABLE_ZEN_BLOCK_SIZES in bli_family_zen.h for ROME tuning Change-Id: Iec47fcf51f4d4396afef1ce3958e58cf02c59a57	2019-03-06 16:23:31 +05:30
Kiran Varaganti	f5ed95ecd7	Merged BLIS Release 1.3 Modified config/zen/make_defs.mk, now CKVECFLAGS := -mavx2 -mfpmath=sse -mfma -march=znver1 Change-Id: Ia0942d285a21447cd0c470de1bc021fe63e80d81	2019-03-05 15:03:57 +05:30
Nicholai Tukanov	78bc0bc8b6	Power9 sub-configuration (#298 ) Formally registered power9 sub-configuration. Details: - Added and registered power9 sub-configuration into the build system. Thanks to Nicholai Tukanov and Devangi Parikh for these contributions. - Note: The sub-configuration does not yet have a corresponding architecture-specific kernel set registered, and so for now the sub-config is using the generic kernel set.	2019-02-14 13:29:02 -06:00
Devangi N. Parikh	dfc91843ea	Fixed gcc flags for thunderx2 subconfiguration Details: - Fixed -march flag. Thunderx2 is an armv8.1a architecture not armv8a.	2019-02-04 15:23:40 -05:00
Field G. Van Zee	26c5cf495c	Fixed bug in skx subconfig related to `bdd46f9`. Details: - Fixed code in the skx subconfiguration that became a bug after committing `bdd46f9`. Specifically, the bli_cntx_init_skx() function was overwriting default blocksizes for the scomplex and dcomplex microkernels despite the fact that only single and double real microkernels were being registered. This was not a problem prior to `bdd46f9` since all microkernels used dynamically-queried (at runtime) register blocksizes for loop bounds. However, post-bdd46f9, this became a bug because the reference ukernels for scomplex and dcomplex were written with their register blocksizes hard-coded as constant loop bounds, which conflicted with the erroneous scomplex and dcomplex values that bli_cntx_init_skx() was setting in the context. The lesson here is that going forward, all subconfigurations must not set any blocksizes for datatypes corresponding to default/reference microkernels. (Note that a blocksize is left unchanged by the bli_cntx_set_blkszs() function if it was set to -1.)	2019-01-24 18:49:31 -06:00
Field G. Van Zee	adf5c17f08	Formally registered thunderx2 subconfiguration. Details: - Added a separate subconfiguration for thunderx2, which now uses different optimization flags than cortexa57/cortexa53.	2019-01-18 15:14:45 -06:00
Field G. Van Zee	0645f239fb	Remove UT-Austin from copyright headers' clause 3. Details: - Removed explicit reference to The University of Texas at Austin in the third clause of the license comment blocks of all relevant files and replaced it with a more all-encompassing "copyright holder(s)". - Removed duplicate words ("derived") from a few kernels' license comment blocks. - Homogenized license comment block in kernels/zen/3/bli_gemm_small.c with format of all other comment blocks.	2018-12-04 14:31:06 -06:00
Field G. Van Zee	3c52725693	Renamed/moved l3 zen ukernels to haswell kernel set. Details: - Renamed the microkernels in kernels/zen/3 to kernels/haswell/3 and then updated the file contents to use the 'haswell' infix. - Updated bli_cntx_init_zen.c and bli_cntx_init_haswell.c according to above function renames. - Moved/updated the corresponding prototypes in bli_kernels_zen.h to bli_kernels_haswell.h. - Updated config_registry according to above changes. - NOTE: This rename reflects the fact that haswell microkernels are specifically written to overcome the floating-point latency for FMA instructions on Intel Haswell-like architectures, which can issue two FMA instructions per cycle. These ukernels happen to work fine on AMD Zen-based architectures. However, Zen only issues one FMA per cycle, which, while halving its floating-point throughput, gives it extra flexibility in the design of its microkernels--namely, mr and nr can be smaller and still overcome the floating-point latency for those single-issue cores. A smaller value of mr and nr allows for a larger value of kc, which may be useful in some situations. In the future, we may write such Zen-specific microkernels to take advantage of this additional flexibility.	2018-10-17 14:56:22 -05:00
Ye Luo	6722ec2181	Fix bgclang compilation on BGQ (#270 ) * Fix bgq kernels * Support bgq with bgclang	2018-10-17 11:26:00 -05:00
Field G. Van Zee	53a9ab1c85	Renamed thread auto-factorization macro constants. Details: - Renamed the following C preprocessor macros whose fallback/default values are specified within frame/include/bli_kernel_macro_defs.h: BLIS_DEFAULT_MR_THREAD_MAX -> BLIS_THREAD_MAX_IR BLIS_DEFAULT_NR_THREAD_MAX -> BLIS_THREAD_MAX_JR BLIS_DEFAULT_M_THREAD_RATIO -> BLIS_THREAD_RATIO_M BLIS_DEFAULT_N_THREAD_RATIO -> BLIS_THREAD_RATIO_N - Renamed the above cpp macro overrides within the knl, skx, and zen sub-configurations, as well as invocations of those macros in bli_rntm.c. - Moved config/zen/bli_kernel.h to an 'old' directory as it is no longer used by any code within BLIS.	2018-10-10 15:11:09 -05:00
Field G. Van Zee	e249a00a82	Imported skx dgemm ukernel from skx-redux branch. Details: - Added the new bli_dgemm_skx_asm_16x14.c microkernel from the skx-redux branch, along with appropriate blocksizes in bli_cntx_init_skx.c and a prototype in bli_kernels_skx.h. (Devin has not yet written the sgemm analogue, so for now we will continue using the older sgemm ukernel.) - Updated frame/include/bli_x86_asm_macros.h with a minor change that was present within the skx-redux branch.	2018-09-10 16:48:35 -05:00
Field G. Van Zee	cc2cca4f56	Merge branch 'dev'	2018-09-06 17:12:13 -05:00
Field G. Van Zee	fb81c7fc66	Defined cortexa53 sub-configuration. Details: - Added a new sub-configuration 'cortexa53', which is a mirror image of cortexa57 except that it will use slightly different compiler flags. Thanks to Mathieu Poumeyrol for making this suggestion after discovering that the compiler flags being used by cortexa57 were not working properly in certain OS X environments (the fix to which is currently pending in pull request #245).	2018-09-06 16:29:39 -05:00
Mathieu Poumeyrol	97965b0905	cortexa9 and cortexa53 travis build + qemu test (#245 )	2018-09-06 14:10:29 -05:00
Field G. Van Zee	4fa4cb0734	Trivial comment header updates. Details: - Removed four trailing spaces after "BLIS" that occurs in most files' commented-out license headers. - Added UT copyright lines to some files. (These files previously had only AMD copyright lines but were contributed to by both UT and AMD.) - In some files' copyright lines, expanded 'The University of Texas' to 'The University of Texas at Austin'. - Fixed various typos/misspellings in some license headers.	2018-08-29 18:06:41 -05:00
Field G. Van Zee	8e10cac5f3	Updates to CREDITS, RELEASING, config/README.md. Details: - Added individuals' github handles to CREDITS file. - Updated RELEASING, config/README.md files.	2018-07-27 14:45:35 -05:00
Field G. Van Zee	89e178ce38	Merge branch 'master' into dev	2018-07-04 17:51:16 -05:00
Isuru Fernando	14648e1376	Native windows support using clang (#227 ) * Add appveyor file * Build script * Remove fPIC for now * copy as * set CC and CXX * Change the order of immintrin.h * Fix testsuite header * Move testsuite defs to .c * Fix appveyor file * Remove fPIC again and fix strerror_r missing bug * Remove appveyor script * cd to blis directory * Fix sleep implementation * Add f2c_types_win.h * Fix f2c compilation * Remove rdp and rename appveyor.yml * Remove setenv declaration in test header * set CPICFLAGS to empty * Fix another immintrin.h issue * Escape CFLAGS and LDFLAGS * Fix more ?mmintrin.h issues * Build x86_64 in appveyor * override LIBM LIBPTHREAD AR AS * override pthreads in configure * Move windows definitions to bli_winsys.h * Fix LIBPTHREAD default value * Build intel64 in appveyor for now	2018-07-04 17:48:42 -05:00
Field G. Van Zee	195480beb5	Merge branch 'master' into dev	2018-06-25 13:24:21 -05:00
Field G. Van Zee	d4a22702c7	Set up haswell config for optional col-pref ukrs. Details: - Added two presently-disabled cpp blocks in bli_cntx_init_haswell.c to easily allow one to switch to a set of column-preferential gemm microkernels (in the haswell subconfiguration). The second column-preferring block sets the register blocksizes to their appropriate values. However, cache blocksizes are left unchanged, and therefore are likely suboptimal. This should be addressed later.	2018-06-19 14:54:57 -05:00
Field G. Van Zee	ed2c8aed84	Temporarily disabled small matrix handling on zen. Details: - Disabled small matrix handling in config/zen/bli_family_zen.h due to what appears to be a bug that manifests as failures in the single and double precision real level-3 BLAS test drivers (visible via out.sblat3 and out.dblat3). Thanks to Robin Christ for reporting this issue.	2018-06-18 11:49:34 -05:00
Field G. Van Zee	dbaf440540	Merge branch 'master' into dev	2018-06-11 12:37:04 -05:00
Field G. Van Zee	262a62e348	Fixed undefined ref in steamroller/excavator configs. Details: - Fixed erroneous calls to bli_cntx_init_piledriver_ref() in bli_cntx_init_steamroller() and bli_cntx_init_excavator(), which should have been to their respectively-named bli_cntx_init_*() functions instead. Thanks to qnerd for bringing these bugs to our attention.	2018-06-08 12:10:54 -05:00
Devin Matthews	850a8a46c0	Test all x86_64 configurations... (#212 ) Add custom SDE cpuid files. * Set up testing of all x86_64 architectures (except bulldozer) using SDE. * Update .travis.yml [ci skip] * Update do_testsuite.sh [ci skip] * Updated .travis.yml with my secret token. Details: - Replaced Devin's temporary secret token with my own, which is used by Travis when accessing the Intel SDE via Dropbox. * Work around CPUID dispatch in glibc/libm by patching ld.so. * Detect path of loader at runtime. * Attempt to make SDE run on Travis * Allow unpatched ld.so if we don't know how to patch it. I think this only happens for older glibc without the multi-arch stuff (e.g. Ubuntu 14.04 on Travis), but who knows? * Upgrade Travis to gcc-6 and binutils-2.26. * Try to get Travis to use the right assembler. * Apparently you need ld-2.26 too. * Try to also patch ld.so from Ubuntu 14.04. * Take the nuclear option. * Account for non-absolute dependencies in ldd output. * String manipulation fail. * Update patch-ld-so.py * Add Zen to SDE testing. * Removed dead variable from travis/do_testsuite.sh. Details: - Removed 'BLIS_ENABLE_TEST_OUTPUT=yes' from make invocations in travis/do_testsuite.sh. This variable is no longer present in the BLIS build system (if it ever was?), and therefore has no effect.	2018-05-29 13:51:21 -05:00

327 Commits