amd/blis - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-13 02:25:39 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	fd3e32a5f4	Refined INSERT_GENTFUNC macro usage. Details: - Defined new INSERT_GENTFUNC macros so that the macro always takes exactly the number of arguments needed for the particular operation or variant being defined. Many operations were using INSERT_GENTFUNC macros that expected one auxiliary argument even though none were needed. Those instances have now been updated. Most of these instances were in the level-0 and -1v operations, as well as some operations defined in frame/util.	2014-03-20 13:59:48 -05:00
Field G. Van Zee	a3902750b9	Reorganized norm operations. Details: - Completely reoganized norm operations: - Renames: - fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm) - absumv -> norm1v (vector 1-norm) - New operations: - norm1m (matrix 1-norm) - normiv, normim (infinity-norm) - amaxv (BLAS-like absolute maximum value index) - asumv (BLAS-like absolute sum) - Deprecated absumm, as it did not correspond to any actual norm. (However, an inlined version now exists in the testsuite module for randm.)	2014-03-19 12:35:17 -05:00
Field G. Van Zee	c2b2ab6270	Deprecated panel stride alignment in bli_config.h. Details: - Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all configurations. It was already going unused in packm_init() since the recent 4m/3m commit. This setting was rarely, if ever, useful, and its existence only posed a potential risk for 4m/3m-based implementations. - Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h. - Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template micro-kernels.	2014-02-26 12:46:45 -06:00
Field G. Van Zee	fde5f1fdec	Added extensive support for configuration defaults. Details: - Standard names for reference kernels (levels-1v, -1f and 3) are now macro constants. Examples: BLIS_SAXPYV_KERNEL_REF BLIS_DDOTXF_KERNEL_REF BLIS_ZGEMM_UKERNEL_REF - Developers no longer have to name all datatype instances of a kernel with a common base name; [sdcz] datatype flavors of each kernel or micro-kernel (level-1v, -1f, or 3) may now be named independently. This means you can now, if you wish, encode the datatype-specific register blocksizes in the name of the micro-kernel functions. - Any datatype instances of any kernel (1v, 1f, or 3) that is left undefined in bli_kernel.h will default to the corresponding reference implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined, it will be defined to be BLIS_DGEMM_UKERNEL_REF. - Developers no longer need to name level-1v/-1f kernels with multiple datatype chars to match the number of types the kernel WOULD take in a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is sufficient, as in bli_daxpyv_opt(). - There is no longer a need to define an obj_t wrapper to go along with your level-1v/-1f kernels. The framework now prvides a _kernel() function which serves as the obj_t wrapper for whatever kernels are specified (or defaulted to) via bli_kernel.h - Developers no longer need to prototype their kernels, and thus no longer need to include any prototyping headers from within bli_kernel.h. The framework now generates kernel prototypes, with the proper type signature, based on the kernel names defined (or defaulted to) via bli_kernel.h. - If the complex datatype x (of [cz]) implementation of the gemm micro- kernel is left undefined by bli_kernel.h, but its same-precision real domain equivalent IS defined, BLIS will use a 4m-based implementation for the datatype x implementations of all level-3 operations, using only the real gemm micro-kernel.	2014-02-25 13:34:56 -06:00
Field G. Van Zee	6363a9f658	Added level-3 support for complex via 4m-/3m. Details: - Added the ability to induce complex domain level-3 operations via new virtual complex micro-kernels which are implemented via only real domain micro-kernels. Two new implementations are provided: 4m and 3m. 4m implements complex matrix multiplication in terms of four real matrix multiplications, where as 3m uses only three and thus is capable of even higher (than peak) performance. However, the 3m method has somewhat weaker numerical properties, making it less desirable in general. - Further refined packing routines, which were recently revamped, and added packing functionality for 4m and 3m. - Some modifications to trmm and trsm macro-kernels to facilitate indexing into micro-panels which were packed for 4m/3m virtual kernels. - Added 4m and 3m interfaces for each level-3 operation. - Various other minor changes to facilitate 4m/3m methods.	2014-02-19 17:00:52 -06:00
Field G. Van Zee	b7da57b282	Updated calls to packm_blk_var2() in testsuite. Details: - In ukernel testsuite modules, replaced calls to packm_blk_var2() with _var1(). Meant to include this in previous commit.	2014-02-11 10:28:23 -06:00
Field G. Van Zee	c255a293e2	Consolidated packm_blk_var2 and var3. Details: - Consolidated the functionality previously supported by packm_blk_var2() and packm_blk_var3() into a new variant, packm_blk_var1(). - Updates to packm_gen_cxk(), packm_herm_cxk.c(), and packm_tri_cxk() to accommodate above changes. - Removed packm_blk_var3() and retired packm_blk_var2() to frame/1m/packm/old. - Updated all level-3 _cntl_init() functions so that the new, more versatile packm_blk_var1 is used for all level-3 matrix packing.	2014-02-10 14:31:24 -06:00
Field G. Van Zee	6c80670287	Renamed enumerated type in testsuite and modules. Details: - Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and renamed all corresponding "impl" variables to "iface".	2014-02-07 11:27:15 -06:00
Field G. Van Zee	f8f67d7251	Typecast bli_getopt() return value in testsuite. Details: - In the test suite driver, inserted an explicit typecast of the return value of bli_getopt() prior parsing. The lack of typecast caused a problem on at least one system whereby a return value of -1 was interpreted as garbage character. Thanks to Francisco Igual for finding and submitting this fix.	2014-01-10 09:06:11 -06:00
Field G. Van Zee	12fa82ec12	Implemented bli_getopt(). Details: - Added bli_getopt.c and .h files to frame/base. These files implement a custom version of getopt(), which may be used to parse command line options passed into a program via argc/argv. I am implementing this function myself, as opposed to using the version available via unistd.h, for portability reasons, as the only requirements are string.h (which is available via the standard C library). - Modified test suite to allow the user to specify the file name (and/or path) to the parameters and operations input files: -g may be used to specify the general input file and -o to specify the operations input file). If -g or -o or both are not given, default filenames are assumed (as well as their existence in the current directory).	2014-01-08 16:09:26 -06:00
Field G. Van Zee	2cb13600f9	Updated year in copyright headers to 2014.	2014-01-03 12:29:13 -06:00
Field G. Van Zee	e3a6c7e776	Macroized conditionals for a2/b2 in macro-kernels. Details: - Replaced conditional expressions in macro-kernels related to computing the addresses a2 and b2 (a_next and b_next) with a preprocessor macro invocation, bli_is_last_iter(), that tests the same condition. - Updated gemm_ukr module to use auxinfo_t argument. - Whitespace changes in test suite ukr modules.	2013-12-19 16:29:31 -06:00
Field G. Van Zee	a0331fb10a	Introduced auxinfo_t argument to micro-kernels. Details: - Removed a_next and b_next arguments to micro-kernels and replaced them with a pointer to a new datatype, auxinfo_t, which is simply a struct that holds a_next and b_next. The struct may hold other auxiliary information that may be useful to a micro-kernel, such as micro-panel stride. Micro-kernels may access struct fields via accessor macros defined in bli_auxinfo_macro_defs.h. - Updated all instances of micro-kernel definitions, micro-kernel calls, as well as macro-kernels (for declaring and initializing the structs) according to above change.	2013-12-19 14:50:11 -06:00
Field G. Van Zee	5ad2ce7bf5	Minor x86_64 (core2) kernel fixes. Details: - Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels for x86_64/core2 were calling the wrong reference code (l instead of u). - Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf kernels. - Minor typecasting fix in testsuite/src/test_libblis.c. - Makefile updates.	2013-12-09 18:30:49 -06:00
Field G. Van Zee	b444489f10	Added new "attached" scalar representation. Details: - Added infrastructure to support a new scalar representation, whereby every object contains an internal scalar that defaults to 1.0. This facilitates passing scalars around without having to house them in separate objects. These "attached" scalars are stored in the internal atom_t field of the obj_t struct, and are always stored to be the same datatype as the object to which they are attached. Level-3 variants no longer take scalar arguments, however, level-3 internal back-ends stll do; this is so that the calling function can perform subproblems such as C := C - alpha * A * B on-the-fly without needing to change either of the scalars attached to A or B. - Removed scalar argument from packm_int(). - Observe and apply attached scalars in scalm_int(), and removed scalar from interface of scalm_unb_var1(). - Renamed the following functions (and corresponding invocations): bli_obj_init_scalar_copy_of() -> bli_obj_scalar_init_detached_copy_of() bli_obj_init_scalar() -> bli_obj_scalar_init_detached() bli_obj_create_scalar_with_attached_buffer() -> bli_obj_create_1x1_with_attached_buffer() bli_obj_scalar_equals() -> bli_obj_equals() - Defined new functions: bli_obj_scalar_detach() bli_obj_scalar_attach() bli_obj_scalar_apply_scalar() bli_obj_scalar_reset() bli_obj_scalar_has_nonzero_imag() bli_obj_scalar_equals() - Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c. - Renamed the following macros: bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1() bli_obj_is_scalar() -> bli_obj_is_1x1() - Defined new macros to set and copy internal scalars between objects: bli_obj_set_internal_scalar() bli_obj_copy_internal_scalar() - In level-3 internal back-ends, added conditional blocks where alpha and beta are checked for non-unit-ness. Those values for alpha and beta are applied to the scalars attached to aliases of A/B/C, as appropriate, before being passed into the variant specified by the control tree. - In level-3 blocked variants, pass BLIS_ONE into subproblems instead of alpha and/or beta. - In level-3 macro-kernels, changed how scalars are obtained. Now, scalars attached to A and B are multiplied together to obtain alpha, while beta is obtained directly from C. - In level-3 front-ends, removed old function calls meant to provide future support for mixed domain/precision. These can be added back later once that functionality is given proper treatment. Also, removed the creating of copy-casts of alpha and beta since typecasting of scalars is now implicitly handled in the internal back-ends when alpha and beta are applied to the attached scalars.	2013-12-03 16:08:30 -06:00
Field G. Van Zee	089048d589	Added object wrappers to 1f test suite modules. Details: - Added missing object wrappers to level-1f test suite modules. This was only apparent if you were configuring with something other than the reference configuration. - Commented out object-wrappers in level-1f front-ends. These were not working as intended the reference configuration was selected, because most kernel sets, such as those in the template set, do not have object wrappers. - Whitespace changes to template micro-kernels. - Comment changes to template level-1f kernel headers.	2013-11-09 17:18:00 -06:00
Field G. Van Zee	376bbb59c8	Removed support for duplication. Details: - Removed support for duplication from the gemmtrsm/trsm micro-kernels and all framework code. - Updated test suite modules according to above changes.	2013-11-08 11:17:34 -06:00
Field G. Van Zee	b053337387	Added fusing factors, MR/NR to test suite output. Details: - Updated the test suite driver (and modules where appropriate) so that the level-1f fusing factors are output along with the variable dimension. While this is not strictly necessary, since the fusing factors are output in the initial parameter summary, it allows extra reassurance to the user since the fusing factors appear alongside the variable dimension, which together give a complete picture of the problem size. Similar changes were made for outputting the register blocksizes when reporting results for the micro-kernel test modules.	2013-10-10 18:26:55 -05:00
Field G. Van Zee	be4833bd91	Added test suite modules for level-1f, 3 kernels. Details: - Added test modules in test suite for level-1f kernels and level-3 micro-kernels. (Duplication in the micro-kernels, for now, is NOT supported by these test modules.) - Added section override switches to test suite's input.operations file. - Added obj_t APIs for level-1f front-ends and their unblocked variants to facilitate the level-1f test modules. Also added front-end for dupl operation. - Added obj_t-based check routines for level-1f operations, which are called from the new front-ends mentioned above. - Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing factors as a function of datatype, which is needed by their respective test modules. - Whitespace changes to bli_kernel.h of all existing configurations.	2013-10-10 14:20:06 -05:00
Field G. Van Zee	73aa1e9f31	Added section overrides to test suite. Details: - Added new lines of input to the test suite's input.operations file, which allows the user to disable entire sections (levels) of tests. Before this change, the user had to manually disable each operation tests's "master switch". (This is why input.operations.0 existed: to allow a more convenient starting point for someone who only wanted to test one or a few operations.)	2013-10-01 17:01:18 -05:00
Field G. Van Zee	5e54f46ccb	Added template implementations and other tweaks. Details: - Added a 'template' configuration, which contains stub implementations of the level 1, 1f, and 3 kernels with one datatype implemented in C for each, with lots of in-file comments and documentation. - Modified some variable/parameter names for some 1/1f operations. (e.g. renaming vector length parameter from m to n.) - Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files to bli_kernel.h. - Modifed test suite to print out fusing factors for axpyf, dotxf, and dotxaxpyf, as well as the default fusing factor (which are all equal in the reference and template implementations). - Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these reference variants were implemented in terms of front-end routines rather that directly in terms of the kernels. (For example, axpy2v was implemented as two calls to axpyv rather than two calls to AXPYV_KERNEL.) - Changed the interface to dotxf so that it matches that of axpyf, in that A is assumed to be m x b_n in both cases, and for dotxf A is actually used as A^T. - Minor variable naming and comment changes to reference micro-kernels in frame/3/gemm/ukernels and frame/3/trsm/ukernels.	2013-09-30 12:58:18 -05:00
Field G. Van Zee	da77e9614f	Minor improvements to static memory allocator. Details: - Expanded on cpp macro definitions from bli_mem.c and relocated them to a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded functionality includes computing the pool size for each datatype (using that datatype's cache blocksizes) and using the maximum to size the actual pool array. This addresses the somewhat common pitfall whereby a developer updates cache blocksizes in bli_kernel.h for only one datatype (say, single-precision real), while the memory pools are sized using the double-precision real values. Then, when the developer attempts to link to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with a message saying the static memory pool was exhausted. Clearly, this message is misleading when the pool was not sized properly to begin with. - Removed previously disabled code in bli_kernel_macro_defs.h that was meant to check for size consistency among the various cache blocksizes. (Obviously the memory pool size-based solution mentioned above is better.) - Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a reasonable place to put these constants, rather than further crowd up bli_config.h. - Updated testsuite driver to output memory pool sizes for A, B, and C. - Minor comment updates to bli_config.h. - Removed 'flame' configuration. It was beginning to get out-of-date, and I hadn't used it in months. We can always re-create it later.	2013-09-13 12:00:37 -05:00
Field G. Van Zee	7ae4d7a41d	Various changes to treatment of integers. Details: - Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be assigned values of 32, 64, or some other value. The former two result in defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter causes integers to be defined in terms of a default type (e.g. long int). - Updated bli_config.h in reference and clarksville configurations according to above changes. - Updated test drivers in test and testsuite to avoid type warnings associated with format specifiers not matching the types of their arguments to printf() and scanf(). - Inserted missing #include "bli_system.h" into blis.h (which was slated for inclusion in `d141f9eeb6`). - Added explicit typecasting of dim_t and inc_t to macros in bli_blas_macro_defs.h (which are used in BLAS compatibility layer). - Slight changes to CREDITS and INSTALL files. - Slight tweaks to Windows build system, mostly in the form of switching to Windows-style CRLF newlines for certain files.	2013-09-10 16:35:12 -05:00
Field G. Van Zee	981a60cfa0	Falling back to 32-bit integers for dim_t, etc. Details: - In light of recent segfaulting issues when compiling on 32-bit systems, I've changed the default typedef for gint_t and guint_t from int64_t and uint64_t to int32_t and uint32_t, respectively. - Disabled 64-bit integers in the blas2blis layer for the reference configuration. - Added type sizes of gint_t, guint_t, and the four floating-point datatypes to introductory output of the testsuite.	2013-09-04 12:09:11 -05:00
Field G. Van Zee	b776ddcd43	Applied temp fix to typecasting bug in testsuite. Details: - Applied a temporary fix to the typecasting bug in the testsuite driver. The fix involves casting both numerator and denominator to unsigned long. This fix is more voodoo than science, as I can't be sure why it even works.	2013-09-03 21:58:07 -05:00
Field G. Van Zee	e8be081e68	Generalized matlab and file output in testsuite. Details: - Added a new option in input.general that allows outputting in matlab/octave format so that one can output in matlab format independently from outputting to files. - Adjusted input.operations according to above. - Added input.operations.0 and input.operations.1 with all options disabled and enabled, respectively.	2013-08-28 15:52:34 -05:00
Field G. Van Zee	4b7e7970f1	Migrated integer usage to stdint.h types. Details: - Changed the way bli_type_defs.h defines integer types so that dim_t, inc_t, doff_t, etc. are all defined in terms of gint_t (general signed integer) or guint_t (general unsigned integer). - Renamed Fortran types fchar and fint to f77_char and f77_int. - Define f77_int as int64_t if a new configuration variable, BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise. These types are defined in stdint.h, which is now included in blis.h. - Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed in terms of scomplex. - Renamed "char" type in f2c files to "character" and typedef'ed in terms of char. - Updated bla_amax() wrappers so that the return type is defined directly as f77_int, rather than letting the prototype-generating macro decide the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros, so I removed them. Also, changed the body of the wrapper so that a gint_t is passed into abmaxv, which is THEN typecast to an f77_int before returning the value. - Updated f2c code that accessed .r and .i fields of complex and doublecomplex types so that they use .real and .imag instead (now that we are using scomplex and dcomplex).	2013-07-08 15:20:34 -05:00
Field G. Van Zee	3725013985	Added experimental bli_gemm_ker_var5(). Details: - Added support for an experimental gemm macro-kernel incrementally packs one micro-panel of B at a time. This is useful for certain special cases of gemm where m is small. - Minor changes to default values of clarksville configuration. - Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we do not yet have any use (or implementation support) for block storage. - Comment update to bli_packm_init.c.	2013-07-08 11:24:18 -05:00
Field G. Van Zee	d1e81ddc84	Minor generalizing tweaks to trmm blk var1, var2.	2013-06-13 11:14:21 -05:00
Field G. Van Zee	05a657a6b9	Added beta == 0 optimization to x86_64 ukernel. Details: - Modified x86_64 gemm microkernel so that when beta is zero, C is not read from memory (nor scaled by beta). - Fixed minor bug in test suite driver when "Test all combinations of storage schemes?" switch is disabled, which would result in redundant tests being executed for matrix-only (e.g. level-1m, level-3) operations if multiple vector storage schemes were specified. - Restored debug flags as default in clarksville configuration.	2013-06-07 11:04:10 -05:00
Field G. Van Zee	f1aa6b81cc	Whitespace changes to old test drivers. Details: - Replaced tabs with four spaces in places where indention was already in place.	2013-06-06 13:36:06 -05:00
Field G. Van Zee	2d9c667f3c	Fixed x86_64 kernel bugs and other minor issues. Details: - Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in unaligned subpartitions. We were already going out of our way a bit to handle edge cases in the first iteration for blocked variants, and this was simply the unblocked-fused extension of that idea. - Fixed control tree handling in her/her2/syr/syr2 that was not taking into account how the choice of variant needed to be altered for upper-stored matrices (given that only lower-stored algorithms are explicitly implemented). - Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b() macros to provide inlined versions of bli_determine_blocksize_[fb]() for use by unblocked-fused variants. - Integrated new blocksize_dim macros into gemv/hemv unf variants for consistency with that of the bugfix for trmv/trsv (both of which now use the same macros). - Modified bli_obj_vector_inc() so that 1 is returned if the object is a vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain conditions (e.g. dotv_opt_var1), an invalid increment was returned, which was invalid only because the code was expecting 1 (for purposes of performing contiguous vector loads) but got a value greater than 1 because the column stride of the object (e.g. rho) was inflated for alignment purposes (albeit unnecessarily since there is only one element in the object). - Replaced some old invocations of set0 with set0s. - Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly. - Fixed increment bug in cleanup loop of gemm ukernel for x86_64. - Added safeguard to test modules so that testing a problem with a zero dimension does not result in a failure. - Tweaked handling of zero dimensions in level-2 and level-3 operations' internal back-ends to correctly handle cases where output operand still needs to be scaled (e.g. by beta, in the case of gemm with k = 0).	2013-05-24 16:28:10 -05:00
Field G. Van Zee	bc7c8005ce	Added option to disable err checking in testsuite. Details: - Added a new line to input.general that allows one to specify the error- checking level to use for each BLIS experiment. The only two levels supported for now are "no error checking" and "full error checking".	2013-04-25 17:16:59 -05:00
Field G. Van Zee	b6ef84fad1	Allow ldim of packed micro-panels != MR, NR. Details: - Made substantial changes throughout the framework to decouple the leading dimension (row or column stride) used within each packed micro-panel from the corresponding register blocksize. It appears advantageous on some systems to use, for example, packed micro-panels of A where the column stride is greater than MR (whereas previously it was always equal to MR). - Changes include: - Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding to use when packing micro-panels of A and B. - Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR where appropriate, instead of MR and NR. - Added pd field (panel dimension) to obj_t. - New interface to bli_packm_cntl_obj_create(). - Renamed bli_obj_packed_length()/_width() macros to bli_obj_padded_length()/_width(). - Removed local #defines for cache/register blocksizes in level-3 *_cntl.c. - Print out new cache and register blocksize extensions in test suite. - Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger blocksize for edge cases, which can improve performance at the margins.	2013-04-21 15:00:24 -05:00
Field G. Van Zee	19155a768d	Fixed overzealous type-checking in bli_getsc(). Details: - Relaxed type checking in getsc so that the input object could be a constant and not just a proper floating-point type. (If it is a constant, default to extracting the dcomplex values.) Thanks to Bryan Marker for reporting this bug. - Added definition for bli_is_constant() in bli_param_macro_defs.h - Comment updates to various level-0 scalar routines.	2013-04-16 11:24:03 -05:00
Field G. Van Zee	d9948c541c	Tweak to test suite function string construction. Details: - Fixed a minor bug in the way that the test suite would construct function name strings when the user anchored all parameters in input.operations. In this case, the test driver would mistake this situation for one where the operation simply had no parameters to begin with, and thus would not include the parameter string in the function string that is output for every result.	2013-04-15 10:21:26 -05:00
Field G. Van Zee	1a9f427b85	Added/renamed alignment constants to _config.h. Details: - Added new memory alignment constants: BLIS_HEAP_STRIDE_ALIGN_SIZE (previously assumed to be same as SYSTEM_MEM) BLIS_CONTIG_ADDR_ALIGN_SIZE (previously assumed to be same as PAGE_SIZE) BLIS_STACK_BUF_ALIGN_SIZE (previously not enforced) and renamed existing ones BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE to better convey what the alignment factor is used for (and what it is not used for). - Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1. - Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))) into macro-kernels to specify stack alignment of temporary buffers. - Modified test suite driver to output new constants. - Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now use bli_align_dim_to_size(), which takes a third argument (the desired alignment).	2013-04-12 15:25:54 -05:00
Field G. Van Zee	3be14c32f7	Updated information in testsuite output header. Details: - Added to the information that is echoed at the beginning of the test suite's output, and also re-labeled some existing information.	2013-04-06 12:54:45 -05:00
Field G. Van Zee	6684b73d55	Implemented amax operation and related changes. Details: - Implemented amax operation in BLIS. - Activated BLAS2BLIS routine mapping for new amax BLIS implementation. - Added integer support to [f]printv, [f]printm. - Added integer support to level-0 copys macros. - Updated printing of configuration information in test suite driver. - Comment changes to _config.h files. - Added comments to bla_dot.c to reminder reader what sdsdot()/dsdot() are used for.	2013-04-02 13:06:20 -05:00
Field G. Van Zee	fb68087f87	More memory alignment-related tweaks. Details: - Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE. - Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN. - Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment passed into posix_memalign() or equivalent. - Defined new function, bli_align_dim_to_cmem(), which applies the contiguous memory alignment (rather than the system/malloc alignment).	2013-03-26 15:10:16 -05:00
Field G. Van Zee	3a787cccaa	Renamed memory alignment macro constant. Details: - Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to BLIS_MEMORY_ALIGNMENT_SIZE.	2013-03-26 13:59:19 -05:00
Field G. Van Zee	b65cdc57d9	Migrated 'bl2' prefix to 'bli'. Details: - Changed all filename and function prefixes from 'bl2' to 'bli'. - Changed the "blis2.h" header filename to "blis.h" and changed all corresponding #include statements accordingly. - Fixed incorrect association for Fran in CREDITS file.	2013-03-24 20:01:49 -05:00
Field G. Van Zee	f469907503	Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_. Details: - Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word (e.g. "prefetch" instructions, which are different than the particular kind of prefetching/preloading referred to by this constant).	2013-03-22 15:20:15 -05:00
Field G. Van Zee	e99281a0f4	Fixed test suite flop formulas for ops with side. Details: - Fixed incorrect flop counts in test suite modules for hemm, symm, trmm, trmm3, and trsm. - Comment updates in herk macro-kernels.	2013-03-07 14:00:10 -06:00
Field G. Van Zee	722b66c7dc	Removed some calls to setv() in test modules. Details: - Removed calls to setv() in test modules whose sole purpose was to initialize vectors to zero to ensure that nan's and inf's would not taint the computation. Now that beta == zero semantics have been updated to clear the output operand (when beta is zero), rather than multiply against it, these setv() calls are no longer needed.	2013-02-14 10:18:00 -06:00
Field G. Van Zee	e6ac623a90	Properly implemented beta == 0 semantics. Details: - Changed name of set0 and set0_mxn macros to set0s and set0s_mxn, respectively. - Added code to the following operations that sets the output operand to zero if the corresponding scalar is zero (rather than performing the floating-point multiply, or in the case of setv, copying the value). This will prevent nan's and inf's from creeping into results from uninitialized memory. - axpy - dotxv - scalv - scal2v - setv - gemv - ger - hemv - her - her2 - gemm reference ukernels	2013-02-13 18:44:59 -06:00
Field G. Van Zee	1274e12437	Updated copyright headers from 2012 to 2013.	2013-02-11 14:37:47 -06:00
Field G. Van Zee	768fcebaa8	Added unified test suite, and many fixes. Details: - Added a highly configurable, unified test suite. - Removed DUPB configuration constant from bl2_kernel.h and macro-kernel header files. Now, instead, DUPB is computed as (NDUP != 1) within each macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into incorrectly when DUPB was set to FALSE but the NDUP was still non-unit. By encoding both pieces of information into one constant in _kernel.h, it seems somewhat less likely others will encounter this bug in the future. - Added level-2 cache blocksizes to _kernel.h for reference configuration, and defined blocksizes in _cntl.c files to these default values. - Changed semantics of her2k and syr2k such that these operations no longer expect the B matrix to already be conjugate-transposed (or just transposed for syr2k). However, these semantics are preserved for the internal mechanics of the implementations, including the internal back-end and all blocked variants. - Inserted checks for real-valued alpha and beta for herk/her2k and herk, respectively. - Relaxed general object structure constraints in _basic_check() for gemv, ger. - Changed her front-end to NOT copy-cast to real projection; instead, this is replaced by selecting either the real part or both parts within the unblocked algorithm implementation, depending on the value of conjh. - Added conjh to all _check routines for her so that the code knows when to verify that alpha has an imaginary component equal to zero (for her, but not syr). - Changed control tree for her to forgo packing. - Added unit diagonal support to fnormm. - Redefined real versions of abval2s macros in terms of fabs(), fabsf(). - Redefined complex versions of sqrt2s macros using the actual "complex square root" formula. - Created new level-0 object-based routines, suffixed with "sc" (for "scalar"). - Defined new level-1v, -1d, and -1m versions of add and sub operations (two-operand add and subtract). - Added new scalar macros: - getris: acquire real and imaginary components. - setris: set real and imaginary components. - addjs: addition with conjugated x. - subjs: subtraction with conjugated x. - Defined new utility operations: - absumv: element-wise sum of absolute values for vector elements. - absumm: element-wise sum of absolute values for matrix elements. - mkherm: convert existing matrix to Hermitian. - mksymm: convert existing matrix to symmetric. - mktrim: convert existing matrix to triangular. - Added various error checking routines. - Added bl2_clock_min_diff(), which is used to more cleanly measure the wall clock time of a code block. - Added general stride support to bl2_obj_alloc_buffer(). - Added bl2_obj_init_scalar(). - Updated parameter mapping in bl2_param_map.c. - Added support for queriable version string. - Fixed a bug in the her2k macro-kernels (which currently are simply implemented in terms of two invocations of herk) whereby beta was being applied to both the first and second rank-k updates, rather than only the first. - Fixed a bug in trmm/trsm whereby transpose and right side cases were not properly implemented due to erroneous assumptions regarding aliasing and root objects. - Fixed a bug in the upper triangular trsm macro-kernel in which the wrong MR x NR block of B was being updated. - Fixed a bug in the inverts macro in the double real case whereby the value was typecast to float before inversion. This affected non-unit cases of dtrsm. - Fixed a bug in the reference kernels for gemmtrsm whereby the minus one constant was being applied incorrectly. - Fixed a bug in the overall treatment of non-unit alpha for trsm. The code now mimics the rank-k strategy of gemm, whereby alpah is applied during the first iteration of variant 3, with BLIS_ONE passed in instead for subsequent iterations. This also required passing alpha into the macro- kernels as well as the fused gemmtrsm micro-kernels. - Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being called for blocks strictly above the diagonal. While this sounds good in theory, this cannot be done because gemm_ker_var2 expects row panels of A to be packed from top to bottom, while for trsm_u, A is actually packed from bottom to top due to the reverse (BR->TL) nature of the algorithm. - Fixed a bug in packm_cxk() whereby panel packings with unit panel dimensions were mishandled due to incorrect arguments to the copyv kernel. Also changed the copyv kernel invocation to scal2v so that these edge cases are properly handled when scaling is requested. - Fixed a bug in packv_int() whereby an uninitialized object is passed in instead of the source object. - Fixed a bug whereby level-2 code could allocate memory dynamically via bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed a potential future bug whereby a mem_t object that is actually no longer "allocated" from the static pool is mistaken for being allocated due to failure to NULLify the buffer when the block was most recently released. - Fixed a bug in bl2_acquire_mpart_*() whreby the uplo field was mistakenly toggled when the requested subpartition needed to be "reflected" due to it residing in an unstored region.	2013-02-11 13:20:44 -06:00

48 Commits