amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-12 10:05:38 +00:00

Author	SHA1	Message	Date
Tyler Smith	23d9eab354	Merge https://github.com/flame/blis	2014-03-20 16:54:35 -05:00
Field G. Van Zee	fd3e32a5f4	Refined INSERT_GENTFUNC macro usage. Details: - Defined new INSERT_GENTFUNC macros so that the macro always takes exactly the number of arguments needed for the particular operation or variant being defined. Many operations were using INSERT_GENTFUNC macros that expected one auxiliary argument even though none were needed. Those instances have now been updated. Most of these instances were in the level-0 and -1v operations, as well as some operations defined in frame/util.	2014-03-20 13:59:48 -05:00
Field G. Van Zee	a3902750b9	Reorganized norm operations. Details: - Completely reorganized norm operations: - Renames: - fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm) - absumv -> norm1v (vector 1-norm) - New operations: - norm1m (matrix 1-norm) - normiv, normim (infinity-norm) - amaxv (BLAS-like absolute maximum value index) - asumv (BLAS-like absolute sum) - Deprecated absumm, as it did not correspond to any actual norm. (However, an inlined version now exists in the testsuite module for randm.)	2014-03-19 12:35:17 -05:00
Tyler Smith	92233cf642	Some fixes to gemm thread info tree creation, Changed microkernel tests to use the new BLIS_PACKM_SINGLE_THREADED instead of BLIS_SINGLE_THREADED	2014-03-11 14:16:08 -05:00
Tyler Smith	2c158fb885	Merge https://github.com/flame/blis Conflicts: frame/1m/packm/bli_packm_blk_var1.c	2014-02-27 16:46:23 -06:00
Tyler Smith	01b125e815	First pass at adding parallelism to BLIS. Added a multithreading infrastructure that should be independent of multithreading implementation in the future. Currently, gemm blocked variants 1f and 2f, and packm variant blocked variant 1 is parallelized.	2014-02-27 11:55:45 -06:00
Field G. Van Zee	c2b2ab6270	Deprecated panel stride alignment in bli_config.h. Details: - Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all configurations. It was already going unused in packm_init() since the recent 4m/3m commit. This setting was rarely, if ever, useful, and its existence only posed a potential risk for 4m/3m-based implementations. - Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h. - Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template micro-kernels.	2014-02-26 12:46:45 -06:00
Field G. Van Zee	fde5f1fdec	Added extensive support for configuration defaults. Details: - Standard names for reference kernels (levels-1v, -1f and 3) are now macro constants. Examples: BLIS_SAXPYV_KERNEL_REF BLIS_DDOTXF_KERNEL_REF BLIS_ZGEMM_UKERNEL_REF - Developers no longer have to name all datatype instances of a kernel with a common base name; [sdcz] datatype flavors of each kernel or micro-kernel (level-1v, -1f, or 3) may now be named independently. This means you can now, if you wish, encode the datatype-specific register blocksizes in the name of the micro-kernel functions. - Any datatype instances of any kernel (1v, 1f, or 3) that is left undefined in bli_kernel.h will default to the corresponding reference implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined, it will be defined to be BLIS_DGEMM_UKERNEL_REF. - Developers no longer need to name level-1v/-1f kernels with multiple datatype chars to match the number of types the kernel WOULD take in a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is sufficient, as in bli_daxpyv_opt(). - There is no longer a need to define an obj_t wrapper to go along with your level-1v/-1f kernels. The framework now prvides a _kernel() function which serves as the obj_t wrapper for whatever kernels are specified (or defaulted to) via bli_kernel.h - Developers no longer need to prototype their kernels, and thus no longer need to include any prototyping headers from within bli_kernel.h. The framework now generates kernel prototypes, with the proper type signature, based on the kernel names defined (or defaulted to) via bli_kernel.h. - If the complex datatype x (of [cz]) implementation of the gemm micro- kernel is left undefined by bli_kernel.h, but its same-precision real domain equivalent IS defined, BLIS will use a 4m-based implementation for the datatype x implementations of all level-3 operations, using only the real gemm micro-kernel.	2014-02-25 13:34:56 -06:00
Field G. Van Zee	6363a9f658	Added level-3 support for complex via 4m-/3m. Details: - Added the ability to induce complex domain level-3 operations via new virtual complex micro-kernels which are implemented via only real domain micro-kernels. Two new implementations are provided: 4m and 3m. 4m implements complex matrix multiplication in terms of four real matrix multiplications, whereas 3m uses only three and thus is capable of even higher (than peak) performance. However, the 3m method has somewhat weaker numerical properties, making it less desirable in general. - Further refined packing routines, which were recently revamped, and added packing functionality for 4m and 3m. - Some modifications to trmm and trsm macro-kernels to facilitate indexing into micro-panels which were packed for 4m/3m virtual kernels. - Added 4m and 3m interfaces for each level-3 operation. - Various other minor changes to facilitate 4m/3m methods.	2014-02-19 17:00:52 -06:00
Field G. Van Zee	b7da57b282	Updated calls to packm_blk_var2() in testsuite. Details: - In ukernel testsuite modules, replaced calls to packm_blk_var2() with _var1(). Meant to include this in previous commit.	2014-02-11 10:28:23 -06:00
Field G. Van Zee	c255a293e2	Consolidated packm_blk_var2 and var3. Details: - Consolidated the functionality previously supported by packm_blk_var2() and packm_blk_var3() into a new variant, packm_blk_var1(). - Updates to packm_gen_cxk(), packm_herm_cxk.c(), and packm_tri_cxk() to accommodate above changes. - Removed packm_blk_var3() and retired packm_blk_var2() to frame/1m/packm/old. - Updated all level-3 _cntl_init() functions so that the new, more versatile packm_blk_var1 is used for all level-3 matrix packing.	2014-02-10 14:31:24 -06:00
Field G. Van Zee	6c80670287	Renamed enumerated type in testsuite and modules. Details: - Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and renamed all corresponding "impl" variables to "iface".	2014-02-07 11:27:15 -06:00
Field G. Van Zee	32cae66326	Fixed some instances of sloppy 'restrict' usage. Details: - Fixed some technical incorrectness with some usage of the 'restrict' keyword in the reference trsm micro-kernels. - Tweak to testsuite/Makefile that causes rebuild if libblis was touched.	2014-02-06 18:06:42 -06:00
Field G. Van Zee	f8f67d7251	Typecast bli_getopt() return value in testsuite. Details: - In the test suite driver, inserted an explicit typecast of the return value of bli_getopt() prior to parsing. The lack of typecast caused a problem on at least one system whereby a return value of -1 was interpreted as a garbage character. Thanks to Francisco Igual for finding and submitting this fix.	2014-01-10 09:06:11 -06:00
Field G. Van Zee	89c76a8a51	Allow building outside source distribution. Details: - Modified build system (mostly configure and top-level Makefile) so that a user can build a BLIS library outside of the top-level directory of the source distribution. - Added "test" target to Makefile so that the user can run "make test", which will compile, link, and run the testsuite binary. This works even if the build directory is externally located, thanks to the test suite binary's new -g and -o command-line options. Also, when creating the test suite via the top-level Makefile, the linking is against the local archive, in lib/<configname>, rather than at <install_prefix>/lib. - Modified testsuite/Makefile so that it links against the library built locally, in ../lib/<configname>. - Added "-lm" to LDFLAGS of most configurations' make_defs.mk. - Various other cleanups to build system.	2014-01-09 12:08:37 -06:00
Field G. Van Zee	12fa82ec12	Implemented bli_getopt(). Details: - Added bli_getopt.c and .h files to frame/base. These files implement a custom version of getopt(), which may be used to parse command line options passed into a program via argc/argv. I am implementing this function myself, as opposed to using the version available via unistd.h, for portability reasons, as the only requirements are string.h (which is available via the standard C library). - Modified test suite to allow the user to specify the file name (and/or path) to the parameters and operations input files: -g may be used to specify the general input file and -o to specify the operations input file). If -g or -o or both are not given, default filenames are assumed (as well as their existence in the current directory).	2014-01-08 16:09:26 -06:00
Field G. Van Zee	2cb13600f9	Updated year in copyright headers to 2014.	2014-01-03 12:29:13 -06:00
Field G. Van Zee	e3a6c7e776	Macroized conditionals for a2/b2 in macro-kernels. Details: - Replaced conditional expressions in macro-kernels related to computing the addresses a2 and b2 (a_next and b_next) with a preprocessor macro invocation, bli_is_last_iter(), that tests the same condition. - Updated gemm_ukr module to use auxinfo_t argument. - Whitespace changes in test suite ukr modules.	2013-12-19 16:29:31 -06:00
Field G. Van Zee	a0331fb10a	Introduced auxinfo_t argument to micro-kernels. Details: - Removed a_next and b_next arguments to micro-kernels and replaced them with a pointer to a new datatype, auxinfo_t, which is simply a struct that holds a_next and b_next. The struct may hold other auxiliary information that may be useful to a micro-kernel, such as micro-panel stride. Micro-kernels may access struct fields via accessor macros defined in bli_auxinfo_macro_defs.h. - Updated all instances of micro-kernel definitions, micro-kernel calls, as well as macro-kernels (for declaring and initializing the structs) according to above change.	2013-12-19 14:50:11 -06:00
Field G. Van Zee	5ad2ce7bf5	Minor x86_64 (core2) kernel fixes. Details: - Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels for x86_64/core2 were calling the wrong reference code (l instead of u). - Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf kernels. - Minor typecasting fix in testsuite/src/test_libblis.c. - Makefile updates.	2013-12-09 18:30:49 -06:00
Field G. Van Zee	b444489f10	Added new "attached" scalar representation. Details: - Added infrastructure to support a new scalar representation, whereby every object contains an internal scalar that defaults to 1.0. This facilitates passing scalars around without having to house them in separate objects. These "attached" scalars are stored in the internal atom_t field of the obj_t struct, and are always stored to be the same datatype as the object to which they are attached. Level-3 variants no longer take scalar arguments, however, level-3 internal back-ends still do; this is so that the calling function can perform subproblems such as C := C - alpha * A * B on-the-fly without needing to change either of the scalars attached to A or B. - Removed scalar argument from packm_int(). - Observe and apply attached scalars in scalm_int(), and removed scalar from interface of scalm_unb_var1(). - Renamed the following functions (and corresponding invocations): bli_obj_init_scalar_copy_of() -> bli_obj_scalar_init_detached_copy_of() bli_obj_init_scalar() -> bli_obj_scalar_init_detached() bli_obj_create_scalar_with_attached_buffer() -> bli_obj_create_1x1_with_attached_buffer() bli_obj_scalar_equals() -> bli_obj_equals() - Defined new functions: bli_obj_scalar_detach() bli_obj_scalar_attach() bli_obj_scalar_apply_scalar() bli_obj_scalar_reset() bli_obj_scalar_has_nonzero_imag() bli_obj_scalar_equals() - Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c. - Renamed the following macros: bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1() bli_obj_is_scalar() -> bli_obj_is_1x1() - Defined new macros to set and copy internal scalars between objects: bli_obj_set_internal_scalar() bli_obj_copy_internal_scalar() - In level-3 internal back-ends, added conditional blocks where alpha and beta are checked for non-unit-ness. Those values for alpha and beta are applied to the scalars attached to aliases of A/B/C, as appropriate, before being passed into the variant specified by the control tree. - In level-3 blocked variants, pass BLIS_ONE into subproblems instead of alpha and/or beta. - In level-3 macro-kernels, changed how scalars are obtained. Now, scalars attached to A and B are multiplied together to obtain alpha, while beta is obtained directly from C. - In level-3 front-ends, removed old function calls meant to provide future support for mixed domain/precision. These can be added back later once that functionality is given proper treatment. Also, removed the creating of copy-casts of alpha and beta since typecasting of scalars is now implicitly handled in the internal back-ends when alpha and beta are applied to the attached scalars.	2013-12-03 16:08:30 -06:00
Field G. Van Zee	bbe2b84a49	Updated Makefile in test, testsuite. Details: - Updated Makefiles in test and testsuite directories to use the new BLIS header installation directory scheme, which is to compile with -I<PREFIX>/include/blis instead of -I<PREFIX>/include.	2013-11-18 11:11:06 -06:00
Field G. Van Zee	d37c2cff62	Minor comment and Makefile changes. Details: - Added missing 'check-config' and 'check-make-defs' targets to testsuite/Makefile. - Removed unused 'test' target from top-level Makefile. - Comment changes to testsuite input files.	2013-11-13 10:47:11 -06:00
Field G. Van Zee	089048d589	Added object wrappers to 1f test suite modules. Details: - Added missing object wrappers to level-1f test suite modules. This was only apparent if you were configuring with something other than the reference configuration. - Commented out object-wrappers in level-1f front-ends. These were not working as intended when the reference configuration was selected, because most kernel sets, such as those in the template set, do not have object wrappers. - Whitespace changes to template micro-kernels. - Comment changes to template level-1f kernel headers.	2013-11-09 17:18:00 -06:00
Field G. Van Zee	376bbb59c8	Removed support for duplication. Details: - Removed support for duplication from the gemmtrsm/trsm micro-kernels and all framework code. - Updated test suite modules according to above changes.	2013-11-08 11:17:34 -06:00
Field G. Van Zee	68a5910974	Added comments to testsuite/input.operations. Details: - Added extensive comments to the top of testsuite/input.operations, which describe how to edit the file. - Removed input.operations.0 and input.operations.1. - Changed input.general to test all datatypes ("sdcz") by default.	2013-11-07 11:36:11 -06:00
Field G. Van Zee	a091a219bd	Minor fixes to piledriver configuration, ukernel. Details: - Applied a patch from Tyler that fixes minor staleness in the piledriver configuration and gemm micro-kernel. - Very minor changes to test suite input files.	2013-10-14 10:11:29 -05:00
Field G. Van Zee	b053337387	Added fusing factors, MR/NR to test suite output. Details: - Updated the test suite driver (and modules where appropriate) so that the level-1f fusing factors are output along with the variable dimension. While this is not strictly necessary, since the fusing factors are output in the initial parameter summary, it allows extra reassurance to the user since the fusing factors appear alongside the variable dimension, which together give a complete picture of the problem size. Similar changes were made for outputting the register blocksizes when reporting results for the micro-kernel test modules.	2013-10-10 18:26:55 -05:00
Field G. Van Zee	be4833bd91	Added test suite modules for level-1f, 3 kernels. Details: - Added test modules in test suite for level-1f kernels and level-3 micro-kernels. (Duplication in the micro-kernels, for now, is NOT supported by these test modules.) - Added section override switches to test suite's input.operations file. - Added obj_t APIs for level-1f front-ends and their unblocked variants to facilitate the level-1f test modules. Also added front-end for dupl operation. - Added obj_t-based check routines for level-1f operations, which are called from the new front-ends mentioned above. - Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing factors as a function of datatype, which is needed by their respective test modules. - Whitespace changes to bli_kernel.h of all existing configurations.	2013-10-10 14:20:06 -05:00
Field G. Van Zee	73aa1e9f31	Added section overrides to test suite. Details: - Added new lines of input to the test suite's input.operations file, which allows the user to disable entire sections (levels) of tests. Before this change, the user had to manually disable each operation tests's "master switch". (This is why input.operations.0 existed: to allow a more convenient starting point for someone who only wanted to test one or a few operations.)	2013-10-01 17:01:18 -05:00
Field G. Van Zee	5e54f46ccb	Added template implementations and other tweaks. Details: - Added a 'template' configuration, which contains stub implementations of the level 1, 1f, and 3 kernels with one datatype implemented in C for each, with lots of in-file comments and documentation. - Modified some variable/parameter names for some 1/1f operations. (e.g. renaming vector length parameter from m to n.) - Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files to bli_kernel.h. - Modifed test suite to print out fusing factors for axpyf, dotxf, and dotxaxpyf, as well as the default fusing factor (which are all equal in the reference and template implementations). - Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these reference variants were implemented in terms of front-end routines rather that directly in terms of the kernels. (For example, axpy2v was implemented as two calls to axpyv rather than two calls to AXPYV_KERNEL.) - Changed the interface to dotxf so that it matches that of axpyf, in that A is assumed to be m x b_n in both cases, and for dotxf A is actually used as A^T. - Minor variable naming and comment changes to reference micro-kernels in frame/3/gemm/ukernels and frame/3/trsm/ukernels.	2013-09-30 12:58:18 -05:00
Field G. Van Zee	da77e9614f	Minor improvements to static memory allocator. Details: - Expanded on cpp macro definitions from bli_mem.c and relocated them to a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded functionality includes computing the pool size for each datatype (using that datatype's cache blocksizes) and using the maximum to size the actual pool array. This addresses the somewhat common pitfall whereby a developer updates cache blocksizes in bli_kernel.h for only one datatype (say, single-precision real), while the memory pools are sized using the double-precision real values. Then, when the developer attempts to link to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with a message saying the static memory pool was exhausted. Clearly, this message is misleading when the pool was not sized properly to begin with. - Removed previously disabled code in bli_kernel_macro_defs.h that was meant to check for size consistency among the various cache blocksizes. (Obviously the memory pool size-based solution mentioned above is better.) - Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a reasonable place to put these constants, rather than further crowd up bli_config.h. - Updated testsuite driver to output memory pool sizes for A, B, and C. - Minor comment updates to bli_config.h. - Removed 'flame' configuration. It was beginning to get out-of-date, and I hadn't used it in months. We can always re-create it later.	2013-09-13 12:00:37 -05:00
Field G. Van Zee	7ae4d7a41d	Various changes to treatment of integers. Details: - Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be assigned values of 32, 64, or some other value. The former two result in defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter causes integers to be defined in terms of a default type (e.g. long int). - Updated bli_config.h in reference and clarksville configurations according to above changes. - Updated test drivers in test and testsuite to avoid type warnings associated with format specifiers not matching the types of their arguments to printf() and scanf(). - Inserted missing #include "bli_system.h" into blis.h (which was slated for inclusion in `d141f9eeb6`). - Added explicit typecasting of dim_t and inc_t to macros in bli_blas_macro_defs.h (which are used in BLAS compatibility layer). - Slight changes to CREDITS and INSTALL files. - Slight tweaks to Windows build system, mostly in the form of switching to Windows-style CRLF newlines for certain files.	2013-09-10 16:35:12 -05:00
Field G. Van Zee	9013ad6ff2	Switched integer typedefs (again) to C types. Details: - Redefined gint_t and guint_t in terms of the standard C types long int and unsigned long int, respectively. - Changed testsuite default max problem size to 500. - Changed testsuite input.operations to use square problems for level-3 operation tests.	2013-09-04 13:36:07 -05:00
Field G. Van Zee	981a60cfa0	Falling back to 32-bit integers for dim_t, etc. Details: - In light of recent segfaulting issues when compiling on 32-bit systems, I've changed the default typedef for gint_t and guint_t from int64_t and uint64_t to int32_t and uint32_t, respectively. - Disabled 64-bit integers in the blas2blis layer for the reference configuration. - Added type sizes of gint_t, guint_t, and the four floating-point datatypes to introductory output of the testsuite.	2013-09-04 12:09:11 -05:00
Field G. Van Zee	b776ddcd43	Applied temp fix to typecasting bug in testsuite. Details: - Applied a temporary fix to the typecasting bug in the testsuite driver. The fix involves casting both numerator and denominator to unsigned long. This fix is more voodoo than science, as I can't be sure why it even works.	2013-09-03 21:58:07 -05:00
Field G. Van Zee	9ee6e12537	Changed dimension spec for gemm in testsuite. Details: - Encountered a bizarre typecasting bug whereby the test suite was not computing the proper dimension from the problem size and dimension specification when the latter was set to -3. Will investigate. Thanks to Fran for finding this "bug".	2013-09-03 21:53:27 -05:00
Field G. Van Zee	e8be081e68	Generalized matlab and file output in testsuite. Details: - Added a new option in input.general that allows outputting in matlab/octave format so that one can output in matlab format independently from outputting to files. - Adjusted input.operations according to above. - Added input.operations.0 and input.operations.1 with all options disabled and enabled, respectively.	2013-08-28 15:52:34 -05:00
Field G. Van Zee	4b7e7970f1	Migrated integer usage to stdint.h types. Details: - Changed the way bli_type_defs.h defines integer types so that dim_t, inc_t, doff_t, etc. are all defined in terms of gint_t (general signed integer) or guint_t (general unsigned integer). - Renamed Fortran types fchar and fint to f77_char and f77_int. - Define f77_int as int64_t if a new configuration variable, BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise. These types are defined in stdint.h, which is now included in blis.h. - Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed in terms of scomplex. - Renamed "char" type in f2c files to "character" and typedef'ed in terms of char. - Updated bla_amax() wrappers so that the return type is defined directly as f77_int, rather than letting the prototype-generating macro decide the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros, so I removed them. Also, changed the body of the wrapper so that a gint_t is passed into abmaxv, which is THEN typecast to an f77_int before returning the value. - Updated f2c code that accessed .r and .i fields of complex and doublecomplex types so that they use .real and .imag instead (now that we are using scomplex and dcomplex).	2013-07-08 15:20:34 -05:00
Field G. Van Zee	3725013985	Added experimental bli_gemm_ker_var5(). Details: - Added support for an experimental gemm macro-kernel that incrementally packs one micro-panel of B at a time. This is useful for certain special cases of gemm where m is small. - Minor changes to default values of clarksville configuration. - Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we do not yet have any use (or implementation support) for block storage. - Comment update to bli_packm_init.c.	2013-07-08 11:24:18 -05:00
Field G. Van Zee	d1e81ddc84	Minor generalizing tweaks to trmm blk var1, var2.	2013-06-13 11:14:21 -05:00
Field G. Van Zee	05a657a6b9	Added beta == 0 optimization to x86_64 ukernel. Details: - Modified x86_64 gemm microkernel so that when beta is zero, C is not read from memory (nor scaled by beta). - Fixed minor bug in test suite driver when "Test all combinations of storage schemes?" switch is disabled, which would result in redundant tests being executed for matrix-only (e.g. level-1m, level-3) operations if multiple vector storage schemes were specified. - Restored debug flags as default in clarksville configuration.	2013-06-07 11:04:10 -05:00
Field G. Van Zee	f1aa6b81cc	Whitespace changes to old test drivers. Details: - Replaced tabs with four spaces in places where indention was already in place.	2013-06-06 13:36:06 -05:00
Field G. Van Zee	22b06cfcd2	Updated level-1/-1f [vector intrinsic] kernels. Details: - Updated level-1/-1f kernels so that non-unit and un-aligned cases are handled by reference implementation (rather than aborted). - Added -fomit-frame-pointer to default make_defs.mk for clarksville configuration. - Defined bli_offset_from_alignment() macro. - Minor edits to old test drivers.	2013-06-03 16:54:52 -05:00
Field G. Van Zee	2d9c667f3c	Fixed x86_64 kernel bugs and other minor issues. Details: - Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in unaligned subpartitions. We were already going out of our way a bit to handle edge cases in the first iteration for blocked variants, and this was simply the unblocked-fused extension of that idea. - Fixed control tree handling in her/her2/syr/syr2 that was not taking into account how the choice of variant needed to be altered for upper-stored matrices (given that only lower-stored algorithms are explicitly implemented). - Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b() macros to provide inlined versions of bli_determine_blocksize_[fb]() for use by unblocked-fused variants. - Integrated new blocksize_dim macros into gemv/hemv unf variants for consistency with that of the bugfix for trmv/trsv (both of which now use the same macros). - Modified bli_obj_vector_inc() so that 1 is returned if the object is a vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain conditions (e.g. dotv_opt_var1), an invalid increment was returned, which was invalid only because the code was expecting 1 (for purposes of performing contiguous vector loads) but got a value greater than 1 because the column stride of the object (e.g. rho) was inflated for alignment purposes (albeit unnecessarily since there is only one element in the object). - Replaced some old invocations of set0 with set0s. - Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly. - Fixed increment bug in cleanup loop of gemm ukernel for x86_64. - Added safeguard to test modules so that testing a problem with a zero dimension does not result in a failure. - Tweaked handling of zero dimensions in level-2 and level-3 operations' internal back-ends to correctly handle cases where output operand still needs to be scaled (e.g. by beta, in the case of gemm with k = 0).	2013-05-24 16:28:10 -05:00
Field G. Van Zee	bc7c8005ce	Added option to disable err checking in testsuite. Details: - Added a new line to input.general that allows one to specify the error- checking level to use for each BLIS experiment. The only two levels supported for now are "no error checking" and "full error checking".	2013-04-25 17:16:59 -05:00
Field G. Van Zee	b6ef84fad1	Allow ldim of packed micro-panels != MR, NR. Details: - Made substantial changes throughout the framework to decouple the leading dimension (row or column stride) used within each packed micro-panel from the corresponding register blocksize. It appears advantageous on some systems to use, for example, packed micro-panels of A where the column stride is greater than MR (whereas previously it was always equal to MR). - Changes include: - Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding to use when packing micro-panels of A and B. - Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR where appropriate, instead of MR and NR. - Added pd field (panel dimension) to obj_t. - New interface to bli_packm_cntl_obj_create(). - Renamed bli_obj_packed_length()/_width() macros to bli_obj_padded_length()/_width(). - Removed local #defines for cache/register blocksizes in level-3 *_cntl.c. - Print out new cache and register blocksize extensions in test suite. - Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger blocksize for edge cases, which can improve performance at the margins.	2013-04-21 15:00:24 -05:00
Field G. Van Zee	19155a768d	Fixed overzealous type-checking in bli_getsc(). Details: - Relaxed type checking in getsc so that the input object could be a constant and not just a proper floating-point type. (If it is a constant, default to extracting the dcomplex values.) Thanks to Bryan Marker for reporting this bug. - Added definition for bli_is_constant() in bli_param_macro_defs.h - Comment updates to various level-0 scalar routines.	2013-04-16 11:24:03 -05:00
Field G. Van Zee	d9948c541c	Tweak to test suite function string construction. Details: - Fixed a minor bug in the way that the test suite would construct function name strings when the user anchored all parameters in input.operations. In this case, the test driver would mistake this situation for one where the operation simply had no parameters to begin with, and thus would not include the parameter string in the function string that is output for every result.	2013-04-15 10:21:26 -05:00
Field G. Van Zee	1a9f427b85	Added/renamed alignment constants to _config.h. Details: - Added new memory alignment constants: BLIS_HEAP_STRIDE_ALIGN_SIZE (previously assumed to be same as SYSTEM_MEM) BLIS_CONTIG_ADDR_ALIGN_SIZE (previously assumed to be same as PAGE_SIZE) BLIS_STACK_BUF_ALIGN_SIZE (previously not enforced) and renamed existing ones BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE to better convey what the alignment factor is used for (and what it is not used for). - Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1. - Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))) into macro-kernels to specify stack alignment of temporary buffers. - Modified test suite driver to output new constants. - Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now use bli_align_dim_to_size(), which takes a third argument (the desired alignment).	2013-04-12 15:25:54 -05:00

64 Commits