amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Author	SHA1	Message	Date
Devin Matthews	8823f91a14	Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143.	2017-07-20 10:04:34 -05:00
Field G. Van Zee	1f1ec0db93	Updated ar option list used by all configurations. Details: - Dropped 'u' from the list of modifiers passed into the library archiver ar. Previously, "cru" was used, while now we employ only "cr". This change was prompted by a warning observed on Ubuntu 16.04: ar: `u' modifier ignored since `D' is the default (see `U') This caused me to realize that the default mode causes timestamps to be zero, and thus the 'u' option, which causes only changed object files to be inserted, is not applicable.	2017-07-19 15:40:48 -05:00
Field G. Van Zee	5caaba2d61	Added --force-version=STRING option to configure. Details: - Added an option to configure that allows the user to force an arbitrary version string at configure-time. The help text also now describes the usage information. - Changed the way the version string is communicated to the Makefile. Previously, it was read into the VERSION variable from the 'version' file via $(shell cat ...). Now, the VERSION variable is instead set in config.mk (via a configure-substituted anchor from config.mk.in).	2017-07-19 13:51:53 -05:00
Field G. Van Zee	13175c5fb7	Updated openmp/pthread barriers with GNU atomics. Details: - Updated the non-tree openmp and pthreads barriers defined in bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new implementation goes through the same motions as the previous codes, but protects its loads and increments with GNU atomic built-ins. These atomic statements take memory ordering parameters that allow us to specify just enough constraints for the barrier to work as intended on weakly-ordered hardware. The prior implementation was only guaranteed to work on systems with strongly-ordered memory. (Thanks to Devin Matthews for suggesting this change and his crash-course in atomics and memory ordering.) - Removed 'volatile' from structs' barrier field declarations in bli_thrcomm_.h. - Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields consistent with that of the _openmp.? files. - Updated other bli_thrcomm_ files to rename "communicator" variables to simply "comm".	2017-07-18 17:56:00 -05:00
Field G. Van Zee	0e58ba1b3a	Added API to set mt environment variables. Details: - Renamed bli_env_get_nway() -> bli_thread_get_env(). - Added bli_thread_set_env() to allow setting environment variables pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS. - Added the following convenience wrapper routines: bli_thread_get_jc_nt() bli_thread_get_ic_nt() bli_thread_get_jr_nt() bli_thread_get_ir_nt() bli_thread_get_num_threads() bli_thread_set_jc_nt() bli_thread_set_ic_nt() bli_thread_set_jr_nt() bli_thread_set_ir_nt() bli_thread_set_num_threads() - Added #include "errno.h" to bli_system.h. - This commit addresses issue #140. - Thanks to Chris Goodyer for inspiring these updates.	2017-07-17 19:03:22 -05:00
Field G. Van Zee	72c8b49bb8	Merge pull request #138 from hominhquan/membrk_set_free_fp Set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers	2017-07-12 14:58:12 -05:00
Minh Quan HO	ba7cada51a	set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is not set in bli_membrk_init	2017-07-07 10:52:05 +02:00
Devin Matthews	70cc825b55	Update LICENSE Remove totally unnecessary first 9 lines and hopefully get Github to recognize it as 3BSD [ci skip].	2017-06-06 21:58:21 -05:00
Devin Matthews	cf54c77bc7	Add new SSI acknowledgment	2017-06-06 20:23:17 -05:00
Field G. Van Zee	6e04f9df01	Restored deleted lines from makefile fragments.	2017-05-17 13:03:52 -05:00
Devin Matthews	ec5c0c0448	Change to /bin/sh. All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.	2017-05-17 12:29:44 -05:00
Devin Matthews	555ddc30d4	Remove shebangs from makefiles.	2017-05-17 12:27:14 -05:00
Devin Matthews	f26bd7f42e	Merge pull request #128 from iotamudelta/master Portability and clang	2017-05-17 11:58:41 -05:00
J M Dieterich	169fb05f22	Fix if/else structure. Thanks to TravisCI.	2017-05-16 23:11:22 -04:00
J M Dieterich	0579dfea0b	Restore version.	2017-05-16 22:58:07 -04:00
J M Dieterich	a75b05c23d	Mark piledriver compilable w/ clang.	2017-05-16 22:23:27 -04:00
J M Dieterich	7541d46e2b	Mark bulldozer compilable w/ clang.	2017-05-16 22:12:12 -04:00
J M Dieterich	91f897073e	Correct error message.	2017-05-16 22:06:59 -04:00
J M Dieterich	f5131e1e49	Indeed one can compile for carrizo also using clang.	2017-05-16 22:03:23 -04:00
J M Dieterich	5fa4e9439c	A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash	2017-05-16 21:50:49 -04:00
Tyler Michael Smith	cbf8710a1b	Merge pull request #127 from devinamatthews/fix_blis_nt_xx Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS	2017-05-08 11:21:20 -05:00
Field G. Van Zee	cf39d3ef3b	Fixed a bug in norm1v, norm1m. Details: - Fixed a bug that manifested as improperly-computed 1-norm for vectors and matrices. This is one of the few operations in BLIS that does not have its own test module within the testsuite, hence why it went undetected for so long. The bad 1-norms were being used to normalize matrices in the testsuite after initialization, which led to some matrices containing a combination of "large" and "small" values. This tended to push the residuals computed after each test away from zero. In some cases, they were off just enough for the testsuite to label the test a "failure". Many thanks to Jeff Hammond for reporting this bug. (Wonky details: the bug was due to improperly-defined level-0 scalar macros for abval2, an operation that computes the absolute square, or complex magnitude/modulus. Certain complex domain instances of abval2 were being incorrectly defined in terms of real-only solutions, leading to bad results. This level-0 operation forms the basis of norm1v/norm1m. absq2 was also affected, but almost nothing uses this operation.)	2017-05-05 15:06:56 -05:00
Devin Matthews	799485124f	Merge pull request #121 from jeffhammond/not-real-knl allow KNL build without hbwmalloc (i.e. emulated)	2017-05-04 10:52:09 -05:00
Devin Matthews	fdc66f12d4	Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123.	2017-05-04 10:36:04 -05:00
Field G. Van Zee	773a24efb2	Merge branch 'master' of github.com:flame/blis	2017-05-03 15:07:59 -05:00
Field G. Van Zee	dd58c9545c	Disable complex 3m/4m in testsuite by default. Details: - Disabled testsuite tests of all level-3 implementations based on 3m and 4m. This will improve testing runtime on Travis CI as well as for anyone manually running the testsuite using default test parameters. Thanks to Devin Matthews for suggesting this change.	2017-05-03 15:04:51 -05:00
Jeff Hammond	0df3541f54	allow KNL build without hbwmalloc.h (i.e. emulated) we want to be able to run BLIS KNL binaries on non-KNL machines via SDE. although it is possible to install hbwmalloc implementation on such systems, it is easier not to, since obviously the performance of SDE execution is not representative so there is no reason to emulate HBW allocation.	2017-05-02 19:35:38 -07:00
Field G. Van Zee	b88542591d	Merge pull request #107 from jeffhammond/intel-compilers-no-use-libm never use libm with Intel compilers	2017-05-02 19:22:41 -05:00
Field G. Van Zee	43007f7b65	Fixed stray parentheses in README citations.	2017-05-02 16:48:43 -05:00
Field G. Van Zee	a4f1d0b880	CHANGELOG update (0.2.2)	2017-05-02 16:38:43 -05:00
Field G. Van Zee	940a707ac7	Version file update (0.2.2) 0.2.2	2017-05-02 16:38:42 -05:00
Field G. Van Zee	d5a5e003ea	Fixed a trsm1m bug that affected right-side cases. Details: - Fixed a bug introduced in `1c732d3` that affected trsm1m_r. The result was nondeterministic behavior (usually segmentation faults) for certain problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c which explicitly directed the virtual gemm micro-kernel to use temporary space if the storage preference of the [real domain] gemm ukernel did not match the storage of the output matrix C. In the context of gemm, this handling is not needed because agreement between the storage pref and the matrix is guaranteed by a high-level optimization in BLIS. However, this optimization is not applied to trsm because the storage of C is not necessarily the same as the storage of the micro-panels of B--both of which are updated by the micro-kernel during a trsm operation. Thus, the guarantee of storage/preference agreement is not in place for trsm, which means we must handle that case within the virtual gemm micro-kernel. - Comment updates and a minor macro change to bli_trsm*_cntx_init() for 3m1, 4m1a, and 1m.	2017-05-02 15:48:30 -05:00
Field G. Van Zee	e80993e71f	Merge branch 'master' into 1m	2017-05-02 12:30:28 -05:00
Field G. Van Zee	ca3a792477	README.md update. Details: - Updated bibtex entries for 4th BLIS paper, and adds entries for 5th and 6th BLIS papers.	2017-05-02 12:09:39 -05:00
Field G. Van Zee	6e7de6ef84	Minor updates to test/3m4m. Details: - Updated initial problem size and increment in Makefile. - Updated code in test_gemm.c to correctly query kc from context.	2017-03-17 12:10:24 -05:00
Field G. Van Zee	f484c6cd43	Whitespace reformatting to armv8a kernels file. Details: - Updated formatting of function signature/header in kernels/armv8a/3/bli_gemm_opt_4x4.c.	2017-03-17 12:07:27 -05:00
Field G. Van Zee	a509fbd5ac	Merge branch 'master' into 1m	2017-02-21 17:06:16 -06:00
Field G. Van Zee	69b4846ae9	Disabled experiment-related 1m code. Details: - Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was specifically inserted to facilitate the benchmarking of 1m block-panel and panel-block algorithms. - Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to reflect changes used/needed during benchmarking.	2017-02-21 15:33:39 -06:00
Devin Matthews	513944e4a9	Merge pull request #118 from devinamatthews/master Handle k=0 correctly in KNL dgemm ukernel.	2017-02-20 10:04:33 -05:00
Devin Matthews	0e18f68cf1	Handle k=0 correctly in KNL dgemm ukernel.	2017-02-20 09:03:21 -06:00
Devin Matthews	8b462a0e8c	Merge pull request #117 from devinamatthews/master Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.	2017-02-19 23:03:03 -05:00
Devin Matthews	7d42fc0796	Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.	2017-02-19 21:10:55 -05:00
Field G. Van Zee	c362afc525	Added missing "level-0" BLAS [sd]cabs1_(). Details: - Fixed issue #115 by adding implementations for scabs1_() and dcabs1_() to the BLAS compatibility layer. Thanks to heroxbd for pointing out their absence.	2017-02-09 11:54:59 -06:00
Field G. Van Zee	018180c938	Fixed a minor bug in configure (issue #114). Details: - Fixed a bug in the configure script whereby a non-preferred value for --enable-threading would cause problems in common.mk vis-a-vis detecting which threading model was chosen. Thanks to heroxbd for reporting this issue.	2017-02-08 11:20:52 -06:00
Devin Matthews	ddf45e7177	Merge pull request #113 from devinamatthews/knl_thread_params Change default threading parameters for KNL.	2017-01-27 14:25:40 -06:00
Devin Matthews	78e1b16e16	Change default threading parameters for KNL.	2017-01-27 14:22:20 -06:00
Field G. Van Zee	1c732d3ddc	Added 1m-specific APIs for bp, pb gemm algorithms. Details: - Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the body of bli_gemm_cntl_create() replaced with a call to the former. - Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now, bli_cntl_free() can check if the thread parameter is NULL, and if so, call the latter, and otherwise call the former. - Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in terms of bli_gemm1mxx_cntx_init(), which behaves the same as bli_gemm1m_cntx_init() did before, except that an extra bool parameter (is_pb) is used to support both bp and pb algorithms (including to support the anti-preference field described below). - Added support for "anti-preference" in context. The anti_pref field, when true, will toggle the boolean return value of routines such as bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of causing BLIS to transpose the operation to achieve disagreement (rather than agreement) between the storage of C and the micro-kernel output preference. This disagreement is needed for panel-block implementations, since they induce a transposition of the suboperation immediately before the macro-kernel is called, which changes the apparent storage of C. For now, anti-preference is used only with the pb algorithm for 1m (and not with any other non-1m implementation). - Defined new functions, bli_cntx_l3_ukr_eff_prefers_storage_of() bli_cntx_l3_ukr_eff_dislikes_storage_of() bli_cntx_l3_nat_ukr_eff_prefers_storage_of() bli_cntx_l3_nat_ukr_eff_dislikes_storage_of() which are identical to their non-"eff" (effectively) counterparts except that they take the anti-preference field of the context into account. - Explicitly initialize the anti-pref field to FALSE in bli_gks_cntx_set_l3_nat_ukr_prefs(). - Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel in terms of the existing block-panel macro-kernel _ker_var2(). This technique requires inducing transposes on all operands and swapping the A and B. - Changed bli_obj_induce_trans() macro so that pack-related fields are also changed to reflect the induced transposition. - Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily specify the 1m algorithm (block-panel or panel-block). - Renamed the following cntx_t-related macros: bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block() bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel() bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel() and updated all instantiations. Also updated the field names in the cntx_t struct. - Comment updates.	2017-01-25 16:25:46 -06:00
Field G. Van Zee	a6ab91bc61	Merge pull request #111 from figual/master Fixed missing cntx argument in ARMv8 microkernels.	2016-11-30 09:26:58 -06:00
Francisco Igual	7f31a6307b	Fixed missing cntx argument in ARMv8 microkernels.	2016-11-27 14:40:47 +01:00
Field G. Van Zee	126482a3b6	Implemented the 1m method. Details: - Implemented the 1m method for inducing complex domain matrix multiplication. 1m support has been added to all level-3 operations, including trsm, and is now the default induced method when native complex domain gemm microkernels are omitted from the configuration. - Updated _cntx_init() operations to take a datatype parameter. This was needed for the corresponding function for 1m (because 1m requires us to choose between column-oriented or row-oriented execution, which requires us to query the context for the storage preference of the gemm microkernel, which requires knowing the datatype) but I decided that it made sense for consistency to add the parameter to all other cntx initialization functions as well, even though those functions don't use the parameter. - Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take a second scalar for each blocksize entry. The semantic meaning of the two scalars now is that the first will scale the default blocksize while the second will scale the maximum blocksize. This allows scaling the two independently, and was needed to support 1m, which requires scaling for a register blocksize but not the register storage blocksize (ie: "packdim") analogue. - Deprecated bli_blksz_reduce_dt_to() and defined two new functions, bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing default and maximum blocksizes to some desired blocksize multiple. These functions are needed in the updated definitions of bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs(). - Added support for the 1e and 1r packing schemas to packm, including 1e/1r packing kernels. - Added a minor optimization to bli_gemm_ker_var2() that allows, under certain circumstances (specifically, real domain beta and row- or column-stored matrix C), the real domain macrokernel and microkernel to be called directly, rather than using the virtual microkernel via the complex domain macrokernel, which carries a slight additional amount of overhead. - Added 1m support to the testsuite. - Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified some code in test_gemm.c driver.	2016-11-25 18:29:49 -06:00

823 Commits