amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-14 03:02:08 +00:00

Author	SHA1	Message	Date
Marat Dukhan	1016383307	Fix Emscripten builds	2017-12-11 12:08:58 +05:30
Minh Quan HO	c09b30d115	set missing free_fp in bli_membrk_init for freeing GEN_USE buffers. The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp was not set in bli_membrk_init.	2017-12-11 12:08:58 +05:30
sthangar	997628ed97	Reducing the framework overhead of GEMV routines Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684	2017-12-11 12:08:58 +05:30
Kiran Varaganti	ee86906616	Improved efficiency of dGEMM for large matrices by reducing TLB load misses and, above all, L3 cache misses. This is achieved by changing the packed block sizes of matrices A and B. The optimum values are now MC_D = 510 and KC_D = 1024. Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4	2017-12-11 12:08:58 +05:30
Devin Matthews	7b933b90b1	Add new SSI acknowledgment	2017-12-11 12:08:21 +05:30
sthangar	3485abba4b	Checked in the small-matrix code to compute GEMM for the transposed-A case. Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462	2017-12-11 12:07:31 +05:30
Devin Matthews	de16beb83b	PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.	2017-12-11 12:07:31 +05:30
Devin Matthews	25d0e61854	Revert "Change PACKDIM_MR (double) for haswell to 8." This reverts commit `681eec913d`.	2017-12-11 12:07:31 +05:30
Devin Matthews	c5bdd84b35	Change PACKDIM_MR (double) for haswell to 8.	2017-12-11 12:07:31 +05:30
Field G. Van Zee	172789d562	Restored deleted lines from makefile fragments.	2017-12-11 12:07:31 +05:30
Devin Matthews	3ea9bd2c8e	Change to /bin/sh. All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.	2017-12-11 12:07:31 +05:30
Devin Matthews	49438409ee	Remove shebangs from makefiles.	2017-12-11 12:07:31 +05:30
J M Dieterich	497e264047	Fix if/else structure. Thanks to TravisCI.	2017-12-11 12:07:31 +05:30
J M Dieterich	835035c56a	Mark piledriver compilable w/ clang.	2017-12-11 12:06:40 +05:30
J M Dieterich	6cdb533472	Mark bulldozer compilable w/ clang.	2017-12-11 12:06:40 +05:30
J M Dieterich	a85697d622	Correct error message.	2017-12-11 12:06:40 +05:30
J M Dieterich	e0c64cad27	Indeed one can compile for carrizo also using clang.	2017-12-11 12:06:40 +05:30
J M Dieterich	4aafe0505d	A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash	2017-12-11 12:06:40 +05:30
Field G. Van Zee	abaeaa68ea	Fixed a bug in norm1v, norm1m. Details: - Fixed a bug that manifested as improperly-computed 1-norm for vectors and matrices. This is one of the few operations in BLIS that does not have its own test module within the testsuite, hence why it went undetected for so long. The bad 1-norms were being used to normalize matrices in the testsuite after initialization, which led to some matrices containing a combination of "large" and "small" values. This tended to push the residuals computed after each test away from zero. In some cases, they were off just enough for the testsuite to label the test a "failure". Many thanks to Jeff Hammond for reporting this bug. (Wonky details: the bug was due to improperly-defined level-0 scalar macros for abval2, an operation that computes the absolute square, or complex magnitude/modulus. Certain complex domain instances of abval2 were being incorrectly defined in terms of real-only solutions, leading to bad results. This level-0 operation forms the basis of norm1v/norm1m. absq2 was also affected, but almost nothing uses this operation.)	2017-12-11 12:05:22 +05:30
Devin Matthews	cc3107ae1c	Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123.	2017-12-11 12:05:22 +05:30
Field G. Van Zee	c8ab91f70d	Disable complex 3m/4m in testsuite by default. Details: - Disabled testsuite tests of all level-3 implementations based on 3m and 4m. This will improve testing runtime on Travis CI as well as for anyone manually running the testsuite using default test parameters. Thanks to Devin Matthews for suggesting this change.	2017-12-11 12:05:22 +05:30
Jeff Hammond	9700f0e578	allow KNL build without hbwmalloc.h (i.e. emulated). we want to be able to run BLIS KNL binaries on non-KNL machines via SDE. although it is possible to install an hbwmalloc implementation on such systems, it is easier not to, since the performance of SDE execution is not representative, so there is no reason to emulate HBW allocation.	2017-12-11 12:05:22 +05:30
Field G. Van Zee	17dcd5a33f	Fixed stray parentheses in README citations.	2017-12-11 12:05:22 +05:30
Field G. Van Zee	2910d44ff9	CHANGELOG update (0.2.2)	2017-12-11 12:05:22 +05:30
Field G. Van Zee	5ca3863220	Fixed a trsm1m bug that affected right-side cases. Details: - Fixed a bug introduced in `1c732d3` that affected trsm1m_r. The result was nondeterministic behavior (usually segmentation faults) for certain problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c which explicitly directed the virtual gemm micro-kernel to use temporary space if the storage preference of the [real domain] gemm ukernel did not match the storage of the output matrix C. In the context of gemm, this handling is not needed because agreement between the storage pref and the matrix is guaranteed by a high-level optimization in BLIS. However, this optimization is not applied to trsm because the storage of C is not necessarily the same as the storage of the micro-panels of B--both of which are updated by the micro-kernel during a trsm operation. Thus, the guarantee of storage/preference agreement is not in place for trsm, which means we must handle that case within the virtual gemm micro-kernel. - Comment updates and a minor macro change to bli_trsm*_cntx_init() for 3m1, 4m1a, and 1m.	2017-12-11 12:03:07 +05:30
Field G. Van Zee	1af0b09f5c	README.md update. Details: - Updated bibtex entries for 4th BLIS paper, and added entries for 5th and 6th BLIS papers.	2017-12-11 12:03:07 +05:30
Field G. Van Zee	db4a0bb8ba	Whitespace reformatting to armv8a kernels file. Details: - Updated formatting of function signature/header in kernels/armv8a/3/bli_gemm_opt_4x4.c.	2017-12-11 11:58:33 +05:30
Field G. Van Zee	e3eb01f6b9	Disabled experiment-related 1m code. Details: - Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was specifically inserted to facilitate the benchmarking of 1m block-panel and panel-block algorithms. - Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to reflect changes used/needed during benchmarking.	2017-12-11 11:58:33 +05:30
Field G. Van Zee	4f61528d56	Added 1m-specific APIs for bp, pb gemm algorithms. Details: - Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the body of bli_gemm_cntl_create() replaced with a call to the former. - Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now, bli_cntl_free() can check if the thread parameter is NULL, and if so, call the latter, and otherwise call the former. - Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in terms of bli_gemm1mxx_cntx_init(), which behaves the same as bli_gemm1m_cntx_init() did before, except that an extra bool parameter (is_pb) is used to support both bp and pb algorithms (including to support the anti-preference field described below). - Added support for "anti-preference" in context. The anti_pref field, when true, will toggle the boolean return value of routines such as bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of causing BLIS to transpose the operation to achieve disagreement (rather than agreement) between the storage of C and the micro-kernel output preference. This disagreement is needed for panel-block implementations, since they induce a transposition of the suboperation immediately before the macro-kernel is called, which changes the apparent storage of C. For now, anti-preference is used only with the pb algorithm for 1m (and not with any other non-1m implementation). - Defined new functions, bli_cntx_l3_ukr_eff_prefers_storage_of(), bli_cntx_l3_ukr_eff_dislikes_storage_of(), bli_cntx_l3_nat_ukr_eff_prefers_storage_of(), and bli_cntx_l3_nat_ukr_eff_dislikes_storage_of(), which are identical to their non-"eff" (effectively) counterparts except that they take the anti-preference field of the context into account. - Explicitly initialize the anti-pref field to FALSE in bli_gks_cntx_set_l3_nat_ukr_prefs(). - Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel in terms of the existing block-panel macro-kernel _ker_var2(). This technique requires inducing transposes on all operands and swapping the A and B. - Changed bli_obj_induce_trans() macro so that pack-related fields are also changed to reflect the induced transposition. - Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily specify the 1m algorithm (block-panel or panel-block). - Renamed the following cntx_t-related macros: bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block(); bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel(); bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel(); and updated all instantiations. Also updated the field names in the cntx_t struct. - Comment updates.	2017-12-11 11:58:33 +05:30
Field G. Van Zee	1d728ccb23	Implemented the 1m method. Details: - Implemented the 1m method for inducing complex domain matrix multiplication. 1m support has been added to all level-3 operations, including trsm, and is now the default induced method when native complex domain gemm microkernels are omitted from the configuration. - Updated _cntx_init() operations to take a datatype parameter. This was needed for the corresponding function for 1m (because 1m requires us to choose between column-oriented or row-oriented execution, which requires us to query the context for the storage preference of the gemm microkernel, which requires knowing the datatype) but I decided that it made sense for consistency to add the parameter to all other cntx initialization functions as well, even though those functions don't use the parameter. - Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take a second scalar for each blocksize entry. The semantic meaning of the two scalars now is that the first will scale the default blocksize while the second will scale the maximum blocksize. This allows scaling the two independently, and was needed to support 1m, which requires scaling for a register blocksize but not the register storage blocksize (ie: "packdim") analogue. - Deprecated bli_blksz_reduce_dt_to() and defined two new functions, bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing default and maximum blocksizes to some desired blocksize multiple. These functions are needed in the updated definitions of bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs(). - Added support for the 1e and 1r packing schemas to packm, including 1e/1r packing kernels. - Added a minor optimization to bli_gemm_ker_var2() that allows, under certain circumstances (specifically, real domain beta and row- or column-stored matrix C), the real domain macrokernel and microkernel to be called directly, rather than using the virtual microkernel via the complex domain macrokernel, which carries a slight additional amount of overhead. - Added 1m support to the testsuite. - Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified some code in test_gemm.c driver.	2017-12-11 11:55:31 +05:30
Jeff Hammond	0d1b90286e	never use libm with Intel compilers. Intel compilers include a highly optimized math library (libimf) that should be used instead of GNU libm. yes, this change is for ALL targets, including those that are not supported by the Intel compiler. there is no harm in doing this, and it is future-proof in the event that the Intel compilers support other architectures.	2017-12-11 11:52:25 +05:30
Field G. Van Zee	b150870397	Removed most "old" directories. Details: - Removed the vast majority of directories named "old", which contained deprecated code that I wasn't quite ready to jettison from the source tree.	2017-12-08 16:08:41 -06:00
Field G. Van Zee	270c65985d	Modified bli_getopt() for thread-safety. Details: - Changed the interface of bli_getopt() to take a new argument, a getopt_t struct, that stores the values of optarg, optind, opterr, and optopt, and updated the implementation accordingly. (Previously, these variables were assumed to be global.) - Added a function for initializing a getopt_t struct. - Changed test_libblis.c--currently the only consumer of bli_getopt()--to utilize the new getopt_t state object.	2017-12-08 15:21:18 -06:00
Field G. Van Zee	ce4d8fabc2	Merge branch 'master' of github.com:flame/blis	2017-12-07 17:36:44 -06:00
Field G. Van Zee	39be59f2a8	Replaced several macros with static function APIs. Details: - Reimplemented several sets of get/set-style preprocessor macros with static functions, including those in the following frame/base headers: auxinfo, cntl, mbool, mem, membrk, opid, and pool. A few headers in frame/thread were touched as well: mutex_*, thrcomm, and thrinfo.	2017-12-07 17:35:20 -06:00
dnp	e05a8dfa7c	Merge branch 'rt'	2017-12-06 16:45:24 -06:00
dnp	4423e33dc5	Adding SKX kernels and configuration.	2017-12-06 16:35:03 -06:00
Field G. Van Zee	79507337e1	Various checks to ensure that arch_t id is in range. Details: - Expanded checking of the arch_t id in bli_gks.c--either passed in from the caller or as returned from bli_arch_query_id()--against the expected range of id values. Thanks to Devangi Parikh for suggesting these additional sanity checks.	2017-12-06 16:21:35 -06:00
Field G. Van Zee	fde7c1126c	Added 'uninstall-old-headers' target to Makefile. Details: - Defined a new 'uninstall-old-headers' target that allows users of BLIS to uninstall no-longer-needed headers left over from previous installations. - Fixed the 'uninstall-old' target so that it will uninstall both .a and .so libraries. - Renamed 'uninstall-old' to 'uninstall-old-libs'. - Added 'uninstall-old' target (different from previous 'uninstall-old' target) that combines 'uninstall-old-libs' and 'uninstall-old-headers'.	2017-12-04 16:11:01 -06:00
Field G. Van Zee	d4ee770bde	Create/install monolithic cblas.h. Details: - When CBLAS is enabled at configure-time, BLIS now creates a monolithic cblas.h using the same flatten-header.sh script that was recently introduced for creating monolithic blis.h header files. The top-level Makefile will also install this cblas.h file into the install prefix alongside blis.h when the 'install' target is invoked. The two header files are compatible with one another. Regardless of whether the user's source #includes cblas.h, both blis.h and cblas.h, or just blis.h, the user will get the CBLAS function prototypes and enums, as expected.	2017-12-04 14:53:43 -06:00
Field G. Van Zee	52f9e6f1b6	Merge branch 'rt'	2017-12-01 12:28:09 -06:00
Field G. Van Zee	21360dd8e2	Fixed cntx_t packm query when ker_id > _NUM_PACKM_KERS. Details: - Fixed a subtle bug in bli_cntx_get_[un]packm_ker_dt() in which the function fails to return NULL when passed a kernel id argument that is equal to or beyond BLIS_NUM_[UN]PACKM_KERS. Instead, the function was attempting to index into the cntx_t's packm kernel array, which resulted in undefined behavior. Thanks to Devangi Parikh for finding this bug.	2017-11-29 14:11:34 -06:00
Field G. Van Zee	244a6f4e66	Fixed POSIX sed non-compliance in flatten-header.sh. Details: - Changed GNU usage of 'i' and 'a' sed commands used in flatten-header.sh to POSIX-compliant usage that will work on OS X's sed.	2017-11-28 17:48:48 -06:00
Field G. Van Zee	4507862167	Generate/compile with/install monolithic blis.h. Details: - Rewrote monolithify-header.sh (and renamed to flatten-header.sh) so that headers are inserted recursively. This improves performance by a factor of 3-4x. - Modified configure to create an 'include/<configname>' directory in which make can create a monolithic header. - Modified the top-level Makefile so that a monolithic header is generated unconditionally prior to compilation (stored in include/<configname>) and so that the single header is installed instead of the 450 or so header files that reside throughout the framework source tree. - Added "include//.h" to .gitignore file. - Removed some pnacl/emscripten leftovers that I intended to include in `a1caeba` (mostly in testsuite/Makefile). - Trivial comment changes to frame/include/bli_f2c.h.	2017-11-28 15:16:22 -06:00
Field G. Van Zee	1f30b1301b	Added missing framework support for x86_64 family. Details: - Added support for the x86_64 configuration family to bli_arch.c and bli_arch_config.h. Thanks to Johannes Dieterich for reporting this issue. - Bumped the default value for BLIS_SIMD_NUM_REGISTERS from 16 to 32 and the default value for BLIS_SIMD_SIZE from 32 to 64. This will support configuration families that include Skylake and newer processors without any support needed in the bli_family_*.h file. The semantics of these values have always been "maximum" and not exact values; comments in bli_kernel_macro_defs.h and the github wiki have been adjusted accordingly.	2017-11-25 16:54:26 -06:00
Field G. Van Zee	9f39806c4e	Fixed a bug in e31f0b3/b131b9a. Details: - Erroneously placed the "don't overwrite existing blocksize" logic in bli_blksz_init() rather than in bli_cntx_set_blkszs(). It belongs in the latter because that function copies blocksizes as-is from the blksz_t function argument to the appropriate field in the cntx_t. If the blksz_t was previously initialized selectively, based on the sign of the blocksize value passed into bli_blksz_init(), that just leaves some fields possibly uninitialized (with garbage values), which definitely will not work. - The aforementioned logic has been moved to bli_cntx_set_blkszs() via a new function bli_blksz_copy_if_pos(), which selectively copies only the blocksizes that are greater than zero.	2017-11-21 16:03:56 -06:00
Field G. Van Zee	b131b9a025	Updated configs to omit setting some blocksizes. Details: - Employ the new semantics of bli_blksz_init() in `e31f0b3` in various sub-configurations' bli_cntx_init_() functions by passing in 0 for register and cache blocksizes that correspond to gemm microkernel datatypes that were not registered, allowing the default values set by the bli_cntx_init_*_ref() function call to remain.	2017-11-21 14:30:26 -06:00
Field G. Van Zee	499a4c002f	Merge branch 'rt' of github.com:flame/blis into rt	2017-11-21 14:25:08 -06:00
Field G. Van Zee	e31f0b3e2d	Subtle update to bli_blksz_init*() API. Details: - Updated the semantics of bli_blksz_init() and bli_blksz_init_ed() so that non-positive blocksize values are ignored entirely. This provides an easy way to indicate that certain existing values should not be touched by the update. Thanks to Devangi Parikh for feedback that led to these changes.	2017-11-21 14:21:25 -06:00
Field G. Van Zee	6c3ba502a1	Added 'x86_64' sub-config directory. Details: - Added missing x86_64 configuration directory, which was intended to be part of `b7ca580`. - Added -Wfatal-errors compiler warning flag to all configurations so that compilation stops after the first error. - Changed the vectorization flags for intel64 configuration to be compatible with 'penryn', the oldest sub-config included in that family. - Changed the vectorization flags for penryn to target the 'core2' microarchitecture and ssse3.	2017-11-21 13:50:53 -06:00
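
The "copy only positive blocksizes" behavior described in commits `9f39806c4e` and `e31f0b3e2d` can be illustrated with a minimal sketch. The struct and function below (`sketch_blksz_t`, `sketch_blksz_copy_if_pos`) are hypothetical stand-ins written for this illustration; they are not the real BLIS `blksz_t` or `bli_blksz_copy_if_pos()` definitions, whose layout and signature are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one entry per datatype (s, d, c, z). Not the BLIS layout. */
#define SKETCH_NUM_DT 4

/* Hypothetical stand-in for a blocksize record holding default and
   maximum values for each datatype. */
typedef struct {
    long def[SKETCH_NUM_DT];
    long max[SKETCH_NUM_DT];
} sketch_blksz_t;

/* Copy each blocksize from src to dst only if it is greater than zero.
   Zero or negative entries in src mean "leave the existing value in dst
   untouched", so previously set defaults survive a partial update. */
static void sketch_blksz_copy_if_pos(const sketch_blksz_t *src,
                                     sketch_blksz_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < SKETCH_NUM_DT; ++i) {
        if (src->def[i] > 0) dst->def[i] = src->def[i];
        if (src->max[i] > 0) dst->max[i] = src->max[i];
    }
}
```

Under these assumptions, a caller would fill `dst` with reference defaults first and then pass a `src` that contains 0 for every datatype without a registered microkernel, so only the registered entries are overwritten.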

1886 Commits