amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	fd4ac636d9	Unimplemented kernels now call reference. Details: - Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that unimplemented kernel functions simply call the corresponding reference implementation. (Previously, these unimplemented functions would abort() with a "not yet implemented" message.)	2013-12-02 13:50:36 -06:00
Field G. Van Zee	e65c476284	Minor updates to packm_blk_var2.c and _blk_var3.c. Details: - Comment updates to packm_blk_var2.c and packm_blk_var3.c. - In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly instead of setm(), scal2m().	2013-11-19 10:05:35 -06:00
Field G. Van Zee	9e1d0d4bca	Added trsm_l, trsm_u ukernels for x86_64/core2. Details: - Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2). These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels that already existed in kernels/x86_64/core2-sse3/3.	2013-11-18 18:11:07 -06:00
Field G. Van Zee	85e7e02ea3	Merge branch 'master'. Forgot to git-pull.	2013-11-18 12:02:00 -06:00
Field G. Van Zee	67761e224c	Attempting to fix errors in bgq build. Details: - Removed restrict declaration from b_cast and c_cast from bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they are causing problems for xlc only in those two files and no other macro-kernels. - Fixed (hopefully) kernel function parameter type declarations in kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.	2013-11-18 11:57:40 -06:00
Field G. Van Zee	707200541d	Syntax error fix in x86_64/core2 gemmtrsm_u ukr.	2013-11-18 11:17:31 -06:00
Field G. Van Zee	bbe2b84a49	Updated Makefile in test, testsuite. Details: - Updated Makefiles in test and testsuite directories to use the new BLIS header installation directory scheme, which is to compile with -I<PREFIX>/include/blis instead of -I<PREFIX>/include.	2013-11-18 11:11:06 -06:00
Field G. Van Zee	9bd7fcfd43	Outer-to-inner 'restrict' fix in macro-kernels. Details: - Fixed sloppy placement of 'restrict' pointer declarations in level-3 macro-kernels. Previously, all restricted pointers were being declared at the outer-most function scope level. While this violates the C99 standard, very few of the compilers used with BLIS so far have seemed to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith for identifying this bug (and suggesting the fix).	2013-11-18 10:58:09 -06:00
Field G. Van Zee	50549a6a31	Changed header install directory to include/blis. Details: - Changed top-level Makefile so that headers are installed to $(INSTALL_PREFIX)/include/blis/. (Header directories are no longer named by version/configuration and then symlinked.) - Added uninstall targets, including uninstall-old to clean out old library archives. - Added GREP makefile definitions to all configurations' make_defs.mk.	2013-11-17 18:31:27 -06:00
Field G. Van Zee	d70733abdd	Added ARM kernels, configurations. Details: - Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15. Thanks to Francisco Igual for contributing these kernels and configurations.	2013-11-16 17:34:25 -06:00
Field G. Van Zee	d37c2cff62	Minor comment and Makefile changes. Details: - Added missing 'check-config' and 'check-make-defs' targets to testsuite/Makefile. - Removed unused 'test' target from top-level Makefile. - Comment changes to testsuite input files.	2013-11-13 10:47:11 -06:00
Field G. Van Zee	19885f893a	Updated some kernel comment headers. Details: - Updated bgq and piledriver comment headers to use BLIS copyright header instead of libflame.	2013-11-11 12:09:21 -06:00
Field G. Van Zee	1a4d698f42	CHANGELOG update (for 0.1.0).	2013-11-11 10:15:40 -06:00
Field G. Van Zee	089048d589	Added object wrappers to 1f test suite modules. Details: - Added missing object wrappers to level-1f test suite modules. This was only apparent if you were configuring with something other than the reference configuration. - Commented out object-wrappers in level-1f front-ends. These were not working as intended unless the reference configuration was selected, because most kernel sets, such as those in the template set, do not have object wrappers. - Whitespace changes to template micro-kernels. - Comment changes to template level-1f kernel headers. 0.1.0	2013-11-09 17:18:00 -06:00
Field G. Van Zee	9ef3752079	Updated template kernels wrt KernelsHowTo wiki. Details: - Merged latest state of KernelsHowTo wiki into template micro-kernels located in config/template/kernels/3.	2013-11-08 17:20:47 -06:00
Field G. Van Zee	376bbb59c8	Removed support for duplication. Details: - Removed support for duplication from the gemmtrsm/trsm micro-kernels and all framework code. - Updated test suite modules according to above changes.	2013-11-08 11:17:34 -06:00
Field G. Van Zee	68a5910974	Added comments to testsuite/input.operations. Details: - Added extensive comments to the top of testsuite/input.operations, which describe how to edit the file. - Removed input.operations.0 and input.operations.1. - Changed input.general to test all datatypes ("sdcz") by default.	2013-11-07 11:36:11 -06:00
Field G. Van Zee	a98f78b715	Changed dim_t and inc_t to be signed integers. Details: - Redefined dim_t and inc_t in terms of gint_t (instead of guint_t). This will facilitate interoperability with Fortran in the future. (Fortran does not support unsigned integers.) - Redefined many instances of stride-related macros so that they return or use the absolute value of the strides, rather than the raw strides which may now be signed. Added new macros bli_is_row_stored_f() and bli_is_col_stored_f(), which assume positive (forward-oriented) strides, and changed the packm_blk_var[23] variants to use these macros instead of the existing bli_is_row_stored(), bli_is_col_stored(). - Added/adjusted typecasting to various functions/macros, including bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-related macros in bli_param_macro_defs.h. - Redefined bli_convert_blas_incv() macro so that the BLAS compatibility layer properly handles situations where vector increments are negative. Thanks to Vladimir Sukharev for pointing out this issue. - Changed type of increment parameters in bli_adjust_strides() from dim_t to inc_t. Likewise in bli_check_matrix_strides(). - Defined bli_check_matrix_object(), which checks for negative strides. - Redefined bli_check_scalar_object() and bli_check_vector_object() so that they also check for negative stride. - Added instances of bli_check_matrix_object() to various operations' _check routines.	2013-11-06 15:32:47 -06:00
Field G. Van Zee	1f8afc3e08	Minor comment update to BLAS compat files.	2013-11-06 10:09:10 -06:00
Field G. Van Zee	1abbf768af	Fixed bugs in scalv and setv. Details: - Fixed bugs similar to those addressed in `cca1e1f51d`, whereby a segmentation fault may occur if beta is not the same type as the vector operand for scalv and setv. - Changed axpyv and scal2v front-ends in a similar fashion.	2013-11-04 15:50:00 -06:00
Field G. Van Zee	f5953259a1	Fixed a bug related to Hermitian matrix diagonals. Details: - Fixed a bug whereby BLIS assumed that the imaginary components of the diagonal elements of Hermitian matrices were already zero. This property is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks to Vladimir Sukharev for reporting this bug. - Minor comment updates to template kernels.	2013-11-04 14:43:55 -06:00
Field G. Van Zee	d70f2b089d	Added scaling to abval2s, sqrt2s macros. Details: - Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow and overflow from squaring the real and imaginary components. (This is the same technique used to fix recent bugs in invscals/invscaljs and inverts.)	2013-11-02 17:19:40 -05:00
Field G. Van Zee	c5b1ed9409	Added new dotxaxpyf variant 2. Details: - Added a new variant for dotxaxpyf that is based on dotxf and axpyf kernels. By default, this variant is not used by any other operation.	2013-11-01 10:28:04 -05:00
Field G. Van Zee	97f89fbcf2	Fixed bug in complex invscals. Details: - Fixed complex inversion in invscals and invscaljs whereby the imaginary component was being computed incorrectly. - Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar in inverts, invscals, and invscaljs. - Changed bli_abs() and bli_fabs() macro definitions to use "<=" operator instead of "<".	2013-11-01 10:16:39 -05:00
Field G. Van Zee	eda42a21d1	Defined missing symbols in bla_rotg.c. Details: - Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and d_abs() for completeness. Thanks to Vladimir Sukharev for reporting these bugs.	2013-10-31 18:00:44 -05:00
Field G. Van Zee	cca1e1f51d	Fixed bugs in scalm and setm. Details: - Fixed bugs in scalm and setm that resulted in segmentation faults when beta is not the same type as the matrix operand. Thanks to Vladimir Sukharev for reporting this bug. - Changed axpym and scal2m front-ends in a fashion similar to that of scalm and setm; namely, the alpha scalar is copy-cast to the type of the first matrix operand. - Changed the template and reference configurations' bli_config.h files so that the number of memory allocator blocks of A and B are set based on BLIS_MAX_NUM_THREADS. - Comment updates to bli_obj.c and variable rename in bla_nrm2.c.	2013-10-30 14:39:01 -05:00
Field G. Van Zee	2807013a47	Fixed over/under-flow in complex inversion. Details: - Fixed the complex bli_?inverts() macros, which were inverting elements in an "unsafe" manner, such that very large and very small values were unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for reporting this bug. - Comment update to bli_sumsqv_unb_var1.c. - Removed redundant bli_min() macro in bli_scalar_macro_defs.h. - Changed 1.0F to 1.0 for bli_drands() macro.	2013-10-24 14:32:20 -05:00
Field G. Van Zee	45a80c625f	Fixed parameter checking issue in BLAS syr[2]k. Details: - Fixed a minor parameter checking bug in the BLAS compatibility layer for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the trans parameter of either operation, it is (a) allowed, and (b) treated as 'T' (whereas previously it was disallowed). Thanks to Vladimir Sukharev for finding and reporting this bug.	2013-10-23 12:15:25 -05:00
Field G. Van Zee	a091a219bd	Minor fixes to piledriver configuration, ukernel. Details: - Applied a patch from Tyler that fixes minor staleness in the piledriver configuration and gemm micro-kernel. - Very minor changes to test suite input files.	2013-10-14 10:11:29 -05:00
Field G. Van Zee	dacdde27ae	Added Fran's Sandy Bridge kernels/configuration. Details: - Added a kernel directory for kernels developed by Francisco Igual for the Sandy Bridge architecture, including a dgemm ukernel coded with AVX intrinsics. - Added a configuration for Sandy Bridge using values supplied by Fran.	2013-10-11 11:37:19 -05:00
Field G. Van Zee	03106d650e	Fixed minor perf bug in gemm_ker_var2. Details: - Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not computed correctly (i.e., do not wrap around) at the edge cases. Thanks to Tze Meng for helping me identify this bug.	2013-10-11 10:40:38 -05:00
Field G. Van Zee	b053337387	Added fusing factors, MR/NR to test suite output. Details: - Updated the test suite driver (and modules where appropriate) so that the level-1f fusing factors are output along with the variable dimension. While this is not strictly necessary, since the fusing factors are output in the initial parameter summary, it allows extra reassurance to the user since the fusing factors appear alongside the variable dimension, which together give a complete picture of the problem size. Similar changes were made for outputting the register blocksizes when reporting results for the micro-kernel test modules.	2013-10-10 18:26:55 -05:00
Field G. Van Zee	be4833bd91	Added test suite modules for level-1f, 3 kernels. Details: - Added test modules in test suite for level-1f kernels and level-3 micro-kernels. (Duplication in the micro-kernels, for now, is NOT supported by these test modules.) - Added section override switches to test suite's input.operations file. - Added obj_t APIs for level-1f front-ends and their unblocked variants to facilitate the level-1f test modules. Also added front-end for dupl operation. - Added obj_t-based check routines for level-1f operations, which are called from the new front-ends mentioned above. - Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing factors as a function of datatype, which is needed by their respective test modules. - Whitespace changes to bli_kernel.h of all existing configurations.	2013-10-10 14:20:06 -05:00
Field G. Van Zee	680188d46b	Cleaned up old test drivers. Details: - Minor updates to old test drivers in preparation for our participation in ACM TOMS's replicated results initiative.	2013-10-10 13:23:37 -05:00
Field G. Van Zee	3690bdd4f9	More updates to level-1f kernels for core2-sse3. Details: - Changed types in function signatures to match new prototypes. Meant to include this in previous commit.	2013-10-10 11:45:33 -05:00
Field G. Van Zee	661d5120cd	Fixed outdated fusing factor macros in 1f kernels. Details: - Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor macros. Meant to include this in `5e54f46c`. Thanks to Fran for pointing this out.	2013-10-10 11:27:27 -05:00
Field G. Van Zee	73aa1e9f31	Added section overrides to test suite. Details: - Added new lines of input to the test suite's input.operations file, which allows the user to disable entire sections (levels) of tests. Before this change, the user had to manually disable each operation test's "master switch". (This is why input.operations.0 existed: to allow a more convenient starting point for someone who only wanted to test one or a few operations.)	2013-10-01 17:01:18 -05:00
Field G. Van Zee	5e54f46ccb	Added template implementations and other tweaks. Details: - Added a 'template' configuration, which contains stub implementations of the level 1, 1f, and 3 kernels with one datatype implemented in C for each, with lots of in-file comments and documentation. - Modified some variable/parameter names for some 1/1f operations. (e.g. renaming vector length parameter from m to n.) - Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files to bli_kernel.h. - Modified test suite to print out fusing factors for axpyf, dotxf, and dotxaxpyf, as well as the default fusing factor (which are all equal in the reference and template implementations). - Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these reference variants were implemented in terms of front-end routines rather than directly in terms of the kernels. (For example, axpy2v was implemented as two calls to axpyv rather than two calls to AXPYV_KERNEL.) - Changed the interface to dotxf so that it matches that of axpyf, in that A is assumed to be m x b_n in both cases, and for dotxf A is actually used as A^T. - Minor variable naming and comment changes to reference micro-kernels in frame/3/gemm/ukernels and frame/3/trsm/ukernels.	2013-09-30 12:58:18 -05:00
Field G. Van Zee	97aaf220a8	Added new kernels, configurations. Details: - Added various micro-kernels for the following architectures: Intel MIC, IBM BG/Q, IBM Power7, AMD Piledriver, and Loongson 3A; and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler, and Xianyi Zhang for contributing these kernels. - Added configurations corresponding to above architectures, and renamed "clarksville" configuration to "dunnington".	2013-09-17 10:51:36 -05:00
Field G. Van Zee	fe979c5a11	Removed default configuration behavior. Details: - Changed the configure script so that it no longer defaults to the reference configuration. This change is being made so that the developer has a firm awareness of which configuration is being used to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this suggested change.	2013-09-13 14:31:53 -05:00
Field G. Van Zee	da77e9614f	Minor improvements to static memory allocator. Details: - Expanded on cpp macro definitions from bli_mem.c and relocated them to a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded functionality includes computing the pool size for each datatype (using that datatype's cache blocksizes) and using the maximum to size the actual pool array. This addresses the somewhat common pitfall whereby a developer updates cache blocksizes in bli_kernel.h for only one datatype (say, single-precision real), while the memory pools are sized using the double-precision real values. Then, when the developer attempts to link to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with a message saying the static memory pool was exhausted. Clearly, this message is misleading when the pool was not sized properly to begin with. - Removed previously disabled code in bli_kernel_macro_defs.h that was meant to check for size consistency among the various cache blocksizes. (Obviously the memory pool size-based solution mentioned above is better.) - Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a reasonable place to put these constants, rather than further crowd up bli_config.h. - Updated testsuite driver to output memory pool sizes for A, B, and C. - Minor comment updates to bli_config.h. - Removed 'flame' configuration. It was beginning to get out-of-date, and I hadn't used it in months. We can always re-create it later.	2013-09-13 12:00:37 -05:00
Field G. Van Zee	631f347b7a	Added ESSL and Accelerate targets to test drivers. Details: - Added ESSL and Accelerate (OS X) targets to standalone test drivers' Makefile in "test" directory. Thanks to Jeff Hammond for suggesting / providing this patch.	2013-09-10 17:17:28 -05:00
Field G. Van Zee	7ae4d7a41d	Various changes to treatment of integers. Details: - Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be assigned values of 32, 64, or some other value. The former two result in defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter causes integers to be defined in terms of a default type (e.g. long int). - Updated bli_config.h in reference and clarksville configurations according to above changes. - Updated test drivers in test and testsuite to avoid type warnings associated with format specifiers not matching the types of their arguments to printf() and scanf(). - Inserted missing #include "bli_system.h" into blis.h (which was slated for inclusion in `d141f9eeb6`). - Added explicit typecasting of dim_t and inc_t to macros in bli_blas_macro_defs.h (which are used in BLAS compatibility layer). - Slight changes to CREDITS and INSTALL files. - Slight tweaks to Windows build system, mostly in the form of switching to Windows-style CRLF newlines for certain files.	2013-09-10 16:35:12 -05:00
Field G. Van Zee	068437736b	Fixed set-but-not-used compiler (gcc) warnings. Details: - Used void-casts of certain variables to appease gcc (and perhaps other compilers) when such variables are only used in the complex instances of the functions. Special thanks to Karl Rupp for suggesting a portable fix for these warnings.	2013-09-09 14:07:58 -05:00
Field G. Van Zee	6dc85f63dc	Small fix to Windows defs.mk makefile fragment. Details: - Commented out a !include statement that was attempting to include a version file that does not yet exist. For now, the version string is hard-coded into defs.mk.	2013-09-09 13:48:52 -05:00
Field G. Van Zee	d141f9eeb6	Added Windows build system. Details: - Added a 'windows' directory, which contains a Windows build system similar to that of libflame's. Thanks to Martin for getting this up and running. - Spun off system header #includes into bli_system.h, which is included in blis.h - Added a Windows section to bli_clock.c (similar to libflame's).	2013-09-09 13:09:16 -05:00
Field G. Van Zee	9b320e7406	Edited bli_?lamch.c to avoid Windows keyword. Details: - Renamed "small" variable to "smnum" to avoid collision with Windows type by the same name. This change is needed in advance of the upcoming Windows build system.	2013-09-09 11:04:46 -05:00
Field G. Van Zee	9013ad6ff2	Switched integer typedefs (again) to C types. Details: - Redefined gint_t and guint_t in terms of the standard C types long int and unsigned long int, respectively. - Changed testsuite default max problem size to 500. - Changed testsuite input.operations to use square problems for level-3 operation tests.	2013-09-04 13:36:07 -05:00
Field G. Van Zee	981a60cfa0	Falling back to 32-bit integers for dim_t, etc. Details: - In light of recent segfaulting issues when compiling on 32-bit systems, I've changed the default typedef for gint_t and guint_t from int64_t and uint64_t to int32_t and uint32_t, respectively. - Disabled 64-bit integers in the blas2blis layer for the reference configuration. - Added type sizes of gint_t, guint_t, and the four floating-point datatypes to introductory output of the testsuite.	2013-09-04 12:09:11 -05:00
Field G. Van Zee	b776ddcd43	Applied temp fix to typecasting bug in testsuite. Details: - Applied a temporary fix to the typecasting bug in the testsuite driver. The fix involves casting both numerator and denominator to unsigned long. This fix is more voodoo than science, as I can't be sure why it even works.	2013-09-03 21:58:07 -05:00
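
Several commits above (2807013a47, 97f89fbcf2, d70f2b089d) describe using scaling to keep complex inversion and absolute-value computations from over- or underflowing. The following is a minimal sketch of that general technique, not the actual bli_?inverts() macros: the type `zpair_t` and the function `invert_scaled()` are hypothetical names introduced only for illustration. The idea is to divide both components by s = max(|a|, |b|) before forming the denominator, so that a*a + b*b is never computed directly.

```c
#include <assert.h>

/* Hypothetical stand-in for a double-precision complex value;
   BLIS defines its own complex types. */
typedef struct { double real; double imag; } zpair_t;

/* Computes 1/(a + bi) with scaling.
   The textbook form (a - bi)/(a*a + b*b) squares both components,
   which overflows for very large inputs and underflows for very
   small ones. Here the components are first scaled by
   s = max(|a|, |b|), so the denominator becomes (a*a + b*b)/s
   and is formed without squaring an unscaled value. */
zpair_t invert_scaled(zpair_t x)
{
    double abs_r = x.real < 0.0 ? -x.real : x.real;
    double abs_i = x.imag < 0.0 ? -x.imag : x.imag;
    double s     = abs_r > abs_i ? abs_r : abs_i;

    /* Inverting zero is not handled in this sketch. */
    assert(s > 0.0);

    double ar = x.real / s;                 /* scaled real part, |ar| <= 1 */
    double ai = x.imag / s;                 /* scaled imag part, |ai| <= 1 */
    double d  = ar * x.real + ai * x.imag;  /* equals (a*a + b*b) / s      */

    zpair_t r = { ar / d, -ai / d };
    return r;
}
```

For an input such as 1e200 + 1e200i, the unscaled denominator a*a + b*b would overflow to infinity in IEEE double precision, whereas the scaled form keeps every intermediate value representable.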

203 Commits
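
Commit a98f78b715 above mentions that the BLAS compatibility layer must handle negative vector increments. As a rough illustration of the standard BLAS convention (and not of the bli_convert_blas_incv() macro itself), the sketch below copies a strided vector: when an increment is negative, logical element 0 lives at the far end of the buffer, at offset (n-1)*|inc|, and the traversal walks backward from there. The names `dim_sketch_t`, `inc_sketch_t`, and `copyv_blas_style()` are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for signed dimension and increment types. */
typedef long dim_sketch_t;
typedef long inc_sketch_t;

/* Copies n elements from x to y following the reference BLAS
   convention for negative increments: the starting element is
   shifted to offset (n-1)*(-inc), and the signed increment is
   then applied as usual. */
void copyv_blas_style(dim_sketch_t n,
                      const double *x, inc_sketch_t incx,
                      double *y, inc_sketch_t incy)
{
    assert(x && y);
    if (n <= 0) return;

    /* Offset of logical element 0 within each buffer. */
    dim_sketch_t x0 = (incx < 0) ? (n - 1) * (-incx) : 0;
    dim_sketch_t y0 = (incy < 0) ? (n - 1) * (-incy) : 0;

    for (dim_sketch_t i = 0; i < n; ++i)
        y[y0 + i * incy] = x[x0 + i * incx];
}
```

With x = {1, 2, 3}, incx = -1, and incy = 1, the call fills y with {3, 2, 1}. Signed increment types make the expression `i * incx` well defined for negative strides, which is one reason a signed inc_t is convenient here.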