amd/blis - blis - Public git mirror


mirror of https://github.com/amd/blis.git synced 2026-05-13 02:25:39 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	d70f2b089d	Added scaling to abval2s, sqrt2s macros. Details: - Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow and overflow from squaring the real and imaginary components. (This is the same technique used to fix recent bugs in invscals/invscaljs and inverts.)	2013-11-02 17:19:40 -05:00
Field G. Van Zee	c5b1ed9409	Added new dotxaxpyf variant 2. Details: - Added a new variant for dotxaxpyf that is based on dotxf and axpyf kernels. By default, this variant is not used by any other operation.	2013-11-01 10:28:04 -05:00
Field G. Van Zee	97f89fbcf2	Fixed bug in complex invscals. Details: - Fixed complex inversion in invscals and invscaljs whereby the imaginary component was being computed incorrectly. - Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar in inverts, invscals, and invscaljs. - Changed bli_abs() and bli_fabs() macro definitions to use "<=" operator instead of "<".	2013-11-01 10:16:39 -05:00
Field G. Van Zee	eda42a21d1	Defined missing symbols in bla_rotg.c Details: - Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and d_abs() for completeness. Thanks to Vladimir Sukharev for reporting these bugs.	2013-10-31 18:00:44 -05:00
Field G. Van Zee	cca1e1f51d	Fixed bugs in scalm and setm. Details: - Fixed bugs in scalm and setm that resulted in segmentation faults when beta is not the same type as the matrix operand. Thanks to Vladimir Sukharev for reporting this bug. - Changed axpym and scal2m front-ends in a fashion similar to that of scalm and setm; namely, the alpha scalar is copy-cast to the type of the first matrix operand. - Changed the template and reference configurations' bli_config.h files so that the number of memory allocator blocks of A and B is set based on BLIS_MAX_NUM_THREADS. - Comment updates to bli_obj.c and variable rename in bla_nrm2.c.	2013-10-30 14:39:01 -05:00
Field G. Van Zee	2807013a47	Fixed over/under-flow in complex inversion. Details: - Fixed the complex bli_?inverts() macros, which were inverting elements in an "unsafe" manner, such that very large and very small values were unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for reporting this bug. - Comment update to bli_sumsqv_unb_var1.c. - Removed redundant bli_min() macro in bli_scalar_macro_defs.h. - Changed 1.0F to 1.0 for bli_drands() macro.	2013-10-24 14:32:20 -05:00
Field G. Van Zee	45a80c625f	Fixed parameter checking issue in BLAS syr[2]k. Details: - Fixed a minor parameter checking bug in the BLAS compatibility layer for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the trans parameter of either operation, it is (a) allowed, and (b) treated as 'T' (whereas previously it was disallowed). Thanks to Vladimir Sukharev for finding and reporting this bug.	2013-10-23 12:15:25 -05:00
Field G. Van Zee	03106d650e	Fixed minor perf bug in gemm_ker_var2. Details: - Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not computed correctly (ie: do not wrap around) at the edge cases. Thanks to Tze Meng for helping me identify this bug.	2013-10-11 10:40:38 -05:00
Field G. Van Zee	be4833bd91	Added test suite modules for level-1f, 3 kernels. Details: - Added test modules in test suite for level-1f kernels and level-3 micro-kernels. (Duplication in the micro-kernels, for now, is NOT supported by these test modules.) - Added section override switches to test suite's input.operations file. - Added obj_t APIs for level-1f front-ends and their unblocked variants to facilitate the level-1f test modules. Also added front-end for dupl operation. - Added obj_t-based check routines for level-1f operations, which are called from the new front-ends mentioned above. - Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing factors as a function of datatype, which is needed by their respective test modules. - Whitespace changes to bli_kernel.h of all existing configurations.	2013-10-10 14:20:06 -05:00
Field G. Van Zee	5e54f46ccb	Added template implementations and other tweaks. Details: - Added a 'template' configuration, which contains stub implementations of the level 1, 1f, and 3 kernels with one datatype implemented in C for each, with lots of in-file comments and documentation. - Modified some variable/parameter names for some 1/1f operations. (e.g. renaming vector length parameter from m to n.) - Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files to bli_kernel.h. - Modified test suite to print out fusing factors for axpyf, dotxf, and dotxaxpyf, as well as the default fusing factor (which are all equal in the reference and template implementations). - Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these reference variants were implemented in terms of front-end routines rather than directly in terms of the kernels. (For example, axpy2v was implemented as two calls to axpyv rather than two calls to AXPYV_KERNEL.) - Changed the interface to dotxf so that it matches that of axpyf, in that A is assumed to be m x b_n in both cases, and for dotxf A is actually used as A^T. - Minor variable naming and comment changes to reference micro-kernels in frame/3/gemm/ukernels and frame/3/trsm/ukernels.	2013-09-30 12:58:18 -05:00
Field G. Van Zee	da77e9614f	Minor improvements to static memory allocator. Details: - Expanded on cpp macro definitions from bli_mem.c and relocated them to a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded functionality includes computing the pool size for each datatype (using that datatype's cache blocksizes) and using the maximum to size the actual pool array. This addresses the somewhat common pitfall whereby a developer updates cache blocksizes in bli_kernel.h for only one datatype (say, single-precision real), while the memory pools are sized using the double-precision real values. Then, when the developer attempts to link to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with a message saying the static memory pool was exhausted. Clearly, this message is misleading when the pool was not sized properly to begin with. - Removed previously disabled code in bli_kernel_macro_defs.h that was meant to check for size consistency among the various cache blocksizes. (Obviously the memory pool size-based solution mentioned above is better.) - Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a reasonable place to put these constants, rather than further crowd up bli_config.h. - Updated testsuite driver to output memory pool sizes for A, B, and C. - Minor comment updates to bli_config.h. - Removed 'flame' configuration. It was beginning to get out-of-date, and I hadn't used it in months. We can always re-create it later.	2013-09-13 12:00:37 -05:00
Field G. Van Zee	7ae4d7a41d	Various changes to treatment of integers. Details: - Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be assigned values of 32, 64, or some other value. The former two result in defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter causes integers to be defined in terms of a default type (e.g. long int). - Updated bli_config.h in reference and clarksville configurations according to above changes. - Updated test drivers in test and testsuite to avoid type warnings associated with format specifiers not matching the types of their arguments to printf() and scanf(). - Inserted missing #include "bli_system.h" into blis.h (which was slated for inclusion in `d141f9eeb6`). - Added explicit typecasting of dim_t and inc_t to macros in bli_blas_macro_defs.h (which are used in BLAS compatibility layer). - Slight changes to CREDITS and INSTALL files. - Slight tweaks to Windows build system, mostly in the form of switching to Windows-style CRLF newlines for certain files.	2013-09-10 16:35:12 -05:00
Field G. Van Zee	068437736b	Fixed set-but-not-used compiler (gcc) warnings. Details: - Used void-casts of certain variables to appease gcc (and perhaps other compilers) when such variables are only used in the complex instances of the functions. Special thanks to Karl Rupp for suggesting a portable fix for these warnings.	2013-09-09 14:07:58 -05:00
Field G. Van Zee	d141f9eeb6	Added Windows build system. Details: - Added a 'windows' directory, which contains a Windows build system similar to that of libflame's. Thanks to Martin for getting this up and running. - Spun off system header #includes into bli_system.h, which is included in blis.h - Added a Windows section to bli_clock.c (similar to libflame's).	2013-09-09 13:09:16 -05:00
Field G. Van Zee	9b320e7406	Edited bli_?lamch.c to avoid Windows keyword. Details: - Renamed "small" variable to "smnum" to avoid collision with Windows type by the same name. This change is needed in advance of the upcoming Windows build system.	2013-09-09 11:04:46 -05:00
Field G. Van Zee	9013ad6ff2	Switched integer typedefs (again) to C types. Details: - Redefined gint_t and guint_t in terms of the standard C types long int and unsigned long int, respectively. - Changed testsuite default max problem size to 500. - Changed testsuite input.operations to use square problems for level-3 operation tests.	2013-09-04 13:36:07 -05:00
Field G. Van Zee	981a60cfa0	Falling back to 32-bit integers for dim_t, etc. Details: - In light of recent segfaulting issues when compiling on 32-bit systems, I've changed the default typedef for gint_t and guint_t from int64_t and uint64_t to int32_t and uint32_t, respectively. - Disabled 64-bit integers in the blas2blis layer for the reference configuration. - Added type sizes of gint_t, guint_t, and the four floating-point datatypes to introductory output of the testsuite.	2013-09-04 12:09:11 -05:00
Field G. Van Zee	d352c746e5	Added single/real gemm micro-kernel for x86_64. Details: - Added a single-precision real gemm micro-kernel in kernels/x86_64/3/bli_gemm_opt_d4x4.c. - Adjusted the single-precision real register blocksizes in config/clarksville/bli_kernel.h to be 8x4. - Added a missing comment to bli_packm_blk_var2.c that was present in bli_packm_blk_var3.c	2013-08-27 13:41:46 -05:00
Field G. Van Zee	dedda523dc	Fixed bug in bli_acquire_mpart_t2b(), _l2r(). Details: - Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r() that caused incorrect partitioning when SUBPART0 was requested. This bug was introduced in `46d3d09d49`. Thanks to Bryan for isolating this bug. - Removed dupl kernels from kernels/x86_64/3 directory. - Uncommented beta == 0 optimization code in kernels/x86_64/3/bli_gemm_opt_d4x4.c.	2013-08-19 12:07:41 -05:00
Field G. Van Zee	12dbd2f334	Moved init_safe(), finalize_safe() to BLAS compat. Details: - Moved the bli_init_safe() and bli_finalize_safe() function calls from the BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-initializers in the BLIS layer wasn't buying us anything because the user could still call the library with uninitialized global scalar constants, for example. Thus, we will just have to live with the constraint that bli_init() MUST be called before calling ANY routine with a bli_ prefix. - Added the missing _init_safe() and finalize_safe() calls to the level-1 BLAS compatibility wrappers.	2013-08-08 14:39:35 -05:00
Field G. Van Zee	8abfe55f2a	Miscellaneous updates. Details: - Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to BLIS_CACHE_LINE_SIZE (typically 64). - Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-kernels. - Reformulated gemm_ker_var2 to look more like the other level-3 macro-kernels, in that the interior and edge-case handling is expressed once inside the loops in the n and m dimensions, rather than the edge-case handling being "unrolled" and expressed as distinct code regions. The previous macro-kernel now lives in retired form in the subdirectory other/bli_gemm_ker_var2.c.old. - Updated experimental gemm_ker_var5 according to above change. - Fixed bug in bli_her2k.c whereby incorrect transformations were being applied to optimize the macro-kernel access pattern on C when C is row-stored. - Various updates inside of test/exec_sizes.	2013-08-08 13:30:19 -05:00
Field G. Van Zee	1aa05736ff	Fixed bug in interface of bla_ger_check(). Details: - Fixed the misplaced lda parameter in the function signature of bla_ger_check(). Thanks to Tyler for finding this bug.	2013-08-07 12:27:04 -05:00
Field G. Van Zee	685aad2535	Fixed cpp guard typos in frame/compat/check files. Details: - Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this. - Fixed various syntax errors in the code that had yet to be compiled due to the aforementioned bug.	2013-08-06 12:25:51 -05:00
Field G. Van Zee	f4ec28e723	Added basic OpenMP-based gemm and packm files. Details: - Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2 into the following auxiliary files: frame/1m/packm/other/bli_packm_blk_var2.c and frame/3/gemm/other/bli_gemm_ker_var2.c. The routine in the first file uses a basic OpenMP parallel region to parallelize the packing of blocks of A and panels of B, while the second uses a similar parallel region to parallelize along the n dimension of the gemm macro-kernel.	2013-08-01 11:24:23 -05:00
Field G. Van Zee	f8980edf9c	Merge branch 'master' of https://code.google.com/p/blis	2013-07-26 11:14:27 -05:00
Field G. Van Zee	67a8b9498d	Added missing cpp kernel blocksize constraints. Details: - Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce constraints on the register blocksizes relative to the cache blocksizes. Thanks to Tyler for helping me stumble across this issue.	2013-07-26 11:12:37 -05:00
Field G. Van Zee	6e7e452343	Fixed minor warnings and misc issues. Details: - Fixed various warnings output by gcc 4.6.3-1, including removing some set-but-not-used variables and addressing some instances of typecasting of pointer types to integer types of different sizes.	2013-07-22 14:50:57 -05:00
Field G. Van Zee	03f6c35997	Tightened some macros that detect datatypes. Details: - Modified the definitions of some macros, such as bli_is_real(), so that the "special" bit is taken into account, thus differentiating BLIS_INT from BLIS_FLOAT. - Whitespace changes to bli_obj_macro_defs.h. - Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't being used.	2013-07-22 12:54:32 -05:00
Field G. Van Zee	0680916fdd	Added BLAS error checking to compatibility layer. Details: - Added frame/compat/check directory, which now houses companion _check() routines for each of the BLAS wrappers in frame/compat. These _check() routines are called from the compatibility wrappers and mimic the error-checking present in the netlib BLAS. - Edited bla_xerbla.c so that xerbla() translates the operation string to uppercase before printing. - Redefined util routines in frame/compat/f2c/util in terms of level0 macros. - Added prototypes for util routines, f2c routines, lsame(), and xerbla(). - Commented out prototypes in test/test_*.c since Fortran integers are now int64_t by default (and the prototypes that were present in the files used int). - Removed redundant #include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c, since blis.h was already being included. - Other minor changes to code in frame/compat/f2c.	2013-07-18 18:04:34 -05:00
Field G. Van Zee	4e80ad28c9	Added support for C99 complex types/arithmetic. Details: - Added support for C99 complex types to bli_type_defs.h and overloaded complex arithmetic to the scalar-level macros in include/level0. This includes a somewhat substantial reorganization and re-layering of much of the existing machinery present in the level0 macros. - Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files, commented-out by default, which optionally enables the use of built-in C99 complex types and arithmetic. - Minor changes to clarksville and reference configs' make_defs.mk files. - Removed an unused macro definition (bli_proj_dt_to_real_if_imag_eq0) from bli_param_macro_defs.h.	2013-07-18 17:53:31 -05:00
Field G. Van Zee	6072d7c848	Fixed bugs in trsm, trmm macro-kernels. Details: - Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling. - Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was incorrectly being adjusted upward by MR, instead of NR. The rl and ru trmm macro-kernels were updated in a similar fashion. - Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on diagoffb when recomputing k to skip a zero region below where the diagonal intersects the right side of the block. The corresponding trmm macro-kernel was also updated. - Fixed a bug in trsm_ru_ker_var2() where the adjustment of k (by NR) needed to be placed AFTER the block that recomputes k to skip the zero region (if present). The other three trsm macro-kernels, as well as the trmm macro-kernels, were updated in the same manner, for consistency. - Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was being updated to skip a zero region to the left of where the diagonal of A intersects the top edge of the block. - Comment updates to all trsm and trmm macro-kernels. - Comment updates to bli_packm_init.c.	2013-07-17 12:27:45 -05:00
Field G. Van Zee	47410a48f9	Added f2c'ed Givens rotation wrappers. Details: - Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic' along with other wrappers for which no BLIS implementation exists. - Added f2c-generated codes for applicable datatype flavors of rot, rotg, rotm, and rotmg operations.	2013-07-10 14:53:59 -05:00
Field G. Van Zee	aec12d90f5	Removed copynzv, copynzm and related codes. Details: - Removed copynzv and copynzm operation directories. These operations implemented a variation of copyv/m that, in the case of real source and complex destination operands, leaves the imaginary component untouched (rather than setting it to zero). I realize now that the special case(s) (e.g. gemm with real A and B but complex C) that I thought required this operation actually can be handled more simply. - Removed level0 scalar macros implementing copynzs, copynzjs.	2013-07-10 13:33:30 -05:00
Field G. Van Zee	b0a0a0f274	Added handling of restrict, stdint.h for non-C99. Details: - Removed the #include <stdint.h> from blis.h and inserted a cpp macro block in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise manually typedefs the types we need (which, for now, are unconditionally int64_t and uint64_t). - Moved basic typedefs to top of bli_type_defs.h, and comment changes. - Added cpp macro block to bli_macro_defs.h that #defines restrict as nothing for C++ and non-C99.	2013-07-09 17:15:38 -05:00
Field G. Van Zee	4b7e7970f1	Migrated integer usage to stdint.h types. Details: - Changed the way bli_type_defs.h defines integer types so that dim_t, inc_t, doff_t, etc. are all defined in terms of gint_t (general signed integer) or guint_t (general unsigned integer). - Renamed Fortran types fchar and fint to f77_char and f77_int. - Define f77_int as int64_t if a new configuration variable, BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise. These types are defined in stdint.h, which is now included in blis.h. - Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed in terms of scomplex. - Renamed "char" type in f2c files to "character" and typedef'ed in terms of char. - Updated bla_amax() wrappers so that the return type is defined directly as f77_int, rather than letting the prototype-generating macro decide the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros, so I removed them. Also, changed the body of the wrapper so that a gint_t is passed into abmaxv, which is THEN typecast to an f77_int before returning the value. - Updated f2c code that accessed .r and .i fields of complex and doublecomplex types so that they use .real and .imag instead (now that we are using scomplex and dcomplex).	2013-07-08 15:20:34 -05:00
Field G. Van Zee	3725013985	Added experimental bli_gemm_ker_var5(). Details: - Added support for an experimental gemm macro-kernel that incrementally packs one micro-panel of B at a time. This is useful for certain special cases of gemm where m is small. - Minor changes to default values of clarksville configuration. - Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we do not yet have any use (or implementation support) for block storage. - Comment update to bli_packm_init.c.	2013-07-08 11:24:18 -05:00
Field G. Van Zee	9915d667a7	Defined "total" blocksize query functions. Details: - Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query the default blocksize plus blocksize extension (using the type or the type of an object). - Comment update in bli_packm_cxk.c.	2013-07-07 13:28:39 -05:00
Field G. Van Zee	46d3d09d49	Consolidated lower/upper her[2]k blocked variants. Details: - Consolidated lower and upper blocked variants for herk and her2k, and renamed the resulting variants, according to the same changes recently made to trmm and trsm. - Implemented support for four new subpartition types: BLIS_SUBPART1T, BLIS_SUBPART1B, BLIS_SUBPART1L, and BLIS_SUBPART1R, which correspond to "merged" partitions that include the middle "1" partition as well as either the neighboring "0" or "2" partition. This is used to clean up code in herk/her2k var2 that attempts to partition away the strictly zero region above or below the diagonal of a matrix operand that is being marched through diagonally. - Added safeguards to herk macro-kernels that skip any leading or trailing zero region in the panel of C that is passed in. This is now needed given that herk/her2k var1 no longer partitions off this zero region before calling the macro-kernel (via bli_her[2]k_int()). - Updated comments and other whitespace changes to trmm/trsm macro-kernels.	2013-06-27 13:19:56 -05:00
Field G. Van Zee	02002ef6f3	Added row-storage optimizations for trmm, trsm. Details: - Implemented algorithmic optimizations for trmm and trsm whereby the right side case is now handled explicitly, rather than induced indirectly by transposing and swapping strides on operands. This allows us to walk through the output matrix with favorable access patterns no matter how it is stored, for all parameter combinations. - Renamed trmm and trsm blocked variants so that there is no longer a lower/upper distinction. Instead, we simply label the variants by which dimension is partitioned and whether the variant marches forwards or backwards through the corresponding partitioned operands. - Added support for row-stored packing of lower and upper triangular matrices (as provided by bli_packm_blk_var3.c). - Fixed a performance bug in bli_determine_blocksize_b() whereby the cache blocksize extensions (if non-zero) were not being used to appropriately size the first iteration (ie: the bottom/right edge case). - Updated comments in bli_kernel.h to indicate that both MC and NC must be whole multiples of MR AND NR. This is needed for the case of trsm_r where, in order to reuse existing left-side gemmtrsm fused micro-kernels, the packing of A (left-hand operand) and B (right-hand operand) is done with NR and MR, respectively (instead of MR and NR).	2013-06-24 17:08:14 -05:00
Field G. Van Zee	d1e81ddc84	Minor generalizing tweaks to trmm blk var1, var2.	2013-06-13 11:14:21 -05:00
Field G. Van Zee	08475e7c76	Various level-3 optimizations for row storage. Details: - Implemented remaining two cases within bli_packm_blk_var2(), which allow packing from a lower or upper-stored symmetric/Hermitian matrix to column panels (which are row-stored). Previously one could only pack to row panels (which are column-stored). - Implemented various optimizations in the level-3 front-ends that allow more favorable access through row-stored matrices for gemm, hemm, herk, her2k, symm, syrk, and syr2k. - Cleaned up code in level-3 front-ends that has to do with setting target and execution datatypes.	2013-06-11 12:18:39 -05:00
Field G. Van Zee	22b06cfcd2	Updated level-1/-1f [vector intrinsic] kernels. Details: - Updated level-1/-1f kernels so that non-unit and un-aligned cases are handled by reference implementation (rather than aborted). - Added -fomit-frame-pointer to default make_defs.mk for clarksville configuration. - Defined bli_offset_from_alignment() macro. - Minor edits to old test drivers.	2013-06-03 16:54:52 -05:00
Field G. Van Zee	0288c827d3	Updated ukernels for x86_64. Details: - Tweaked micro-kernels and configuration for clarksville. - Updated/cleaned up old test drivers in test directory. - Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced recently).	2013-06-01 08:02:23 -05:00
Field G. Van Zee	85a6d1c9a5	Replaced axpys usage with subs in trsv. Details: - Replaced instances of axpys with alpha equal to -1 with subs. - Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of sizeof(dcomplex).	2013-05-29 10:58:24 -05:00
Field G. Van Zee	2d9c667f3c	Fixed x86_64 kernel bugs and other minor issues. Details: - Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in unaligned subpartitions. We were already going out of our way a bit to handle edge cases in the first iteration for blocked variants, and this was simply the unblocked-fused extension of that idea. - Fixed control tree handling in her/her2/syr/syr2 that was not taking into account how the choice of variant needed to be altered for upper-stored matrices (given that only lower-stored algorithms are explicitly implemented). - Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b() macros to provide inlined versions of bli_determine_blocksize_[fb]() for use by unblocked-fused variants. - Integrated new blocksize_dim macros into gemv/hemv unf variants for consistency with that of the bugfix for trmv/trsv (both of which now use the same macros). - Modified bli_obj_vector_inc() so that 1 is returned if the object is a vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain conditions (e.g. dotv_opt_var1), an invalid increment was returned, which was invalid only because the code was expecting 1 (for purposes of performing contiguous vector loads) but got a value greater than 1 because the column stride of the object (e.g. rho) was inflated for alignment purposes (albeit unnecessarily since there is only one element in the object). - Replaced some old invocations of set0 with set0s. - Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly. - Fixed increment bug in cleanup loop of gemm ukernel for x86_64. - Added safeguard to test modules so that testing a problem with a zero dimension does not result in a failure. - Tweaked handling of zero dimensions in level-2 and level-3 operations' internal back-ends to correctly handle cases where output operand still needs to be scaled (e.g. by beta, in the case of gemm with k = 0).	2013-05-24 16:28:10 -05:00
Field G. Van Zee	d57ec42b34	Renamed _trans_status() macro. Details: - Mistakenly forgot to rename the _trans_status() macro and instances in previous commit.	2013-05-03 17:35:32 -05:00
Field G. Van Zee	9e2b227866	Renamed _set_trans(), _trans_status() macros. Details: - Renamed the following macros: bli_obj_set_trans() -> bli_obj_set_onlytrans() and bli_obj_trans_status() -> bli_obj_onlytrans_status(), to remove ambiguity as to which bits are read/updated.	2013-05-03 17:24:58 -05:00
Field G. Van Zee	2f8174509e	Unconditionally check memory pool(s) for errors. Details: - Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the memory pool is exhausted before checking out and returning a block, even if BLIS error checking has been disabled. These errors are useful because they likely indicate that BLIS was improperly configured for the code being run.	2013-05-01 15:06:30 -05:00
Field G. Van Zee	6bfa96f848	Absorbed blocksize extensions into main objects. Details: - Revamped some parts of commit `b6ef84fad1` by adding blocksize extension fields to the blksz_t object rather than have them as separate structs. - Updated all packm interfaces/invocations according to above change. - Generalized bli_determine_blocksize_?() so that edge case optimization happens if and only if cache blocksizes are created with non-zero extensions. - Updated comments in bli_kernel.h files to indicate that the edge case blocksize extension mechanism is now available for use.	2013-04-30 19:35:54 -05:00
Field G. Van Zee	096b366ddc	Use cntl trees that block in n dimension. Details: - Updated _cntl.c files for each level-3 operation to induce blocked algorithms that first partition in the n dimension with a blocksize of NC. Typically this is not an issue since only very large problems exceed that of NC. But developers often run very large problems, and so this extra blocking should be the default. - Removed some recently introduced but now unused macros from bli_param_macro_defs.h.	2013-04-25 16:43:43 -05:00


129 Commits
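Commit c5b1ed9409 adds a dotxaxpyf variant built from the dotxf and axpyf kernels. The sketch below shows that composition for real double-precision data only. It assumes the usual level-1f definitions (y := beta*y + alpha*A^T*w and z := z + alpha*A*x, with A of size m x b) and omits the conjugation parameters. All function names are hypothetical, and the real kernels' signatures differ.

```c
/* Hypothetical, real-only sketches (no conjugation parameters).
   A is m x b, column-major with leading dimension lda. */

/* dotxf: y := beta * y + alpha * A^T * x, where y has length b. */
static void dotxf_sketch(int m, int b, double alpha, const double *a, int lda,
                         const double *x, double beta, double *y)
{
    for (int j = 0; j < b; ++j) {
        double rho = 0.0;
        for (int i = 0; i < m; ++i)
            rho += a[i + j*lda] * x[i];          /* dot of column j with x */
        y[j] = beta * y[j] + alpha * rho;
    }
}

/* axpyf: z := z + alpha * A * x, where x has length b. */
static void axpyf_sketch(int m, int b, double alpha, const double *a, int lda,
                         const double *x, double *z)
{
    for (int j = 0; j < b; ++j)
        for (int i = 0; i < m; ++i)
            z[i] += alpha * a[i + j*lda] * x[j]; /* axpy with column j */
}

/* dotxaxpyf built from the two kernels above: two passes over A
   instead of one fused pass.
     y := beta * y + alpha * A^T * w
     z := z + alpha * A * x                                          */
static void dotxaxpyf_var2_sketch(int m, int b, double alpha,
                                  const double *a, int lda,
                                  const double *w, const double *x,
                                  double beta, double *y, double *z)
{
    dotxf_sketch(m, b, alpha, a, lda, w, beta, y);
    axpyf_sketch(m, b, alpha, a, lda, x, z);
}
```

A fused kernel can load each column of A once and apply it to both updates. This composed form reads A twice, trading memory bandwidth for the reuse of separately optimized dotxf and axpyf kernels.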
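Commits 2807013a47 and 97f89fbcf2 concern inverting a complex scalar without unnecessary overflow or underflow. The larger component magnitude (the quantity `bli_fmaxabs()` is used to choose) serves as the scaling factor. Below is a minimal C sketch of that idea. The `dcplx` type and `safe_cinv()` are hypothetical stand-ins, not the `bli_?inverts()` macros, and the caller is assumed never to pass zero.

```c
#include <math.h>

/* Stand-in for a double-precision complex type with .real/.imag fields. */
typedef struct { double real; double imag; } dcplx;

/* Hypothetical sketch of a scaled complex inversion, 1 / (a + i*b).
   Mathematically 1/(a + ib) = (a - ib) / (a^2 + b^2). Dividing the
   numerator and denominator by s = max(|a|, |b|) keeps every intermediate
   near the magnitude of the inputs, so a^2 + b^2 is never formed.
   Precondition: x is nonzero (otherwise s == 0 and 0/0 occurs). */
static dcplx safe_cinv(dcplx x)
{
    double aa = fabs(x.real);
    double ab = fabs(x.imag);
    double s  = (aa >= ab) ? aa : ab;       /* max(|a|, |b|) */

    double ar = x.real / s;                 /* a / s, magnitude <= 1 */
    double br = x.imag / s;                 /* b / s, magnitude <= 1 */
    double d  = x.real * ar + x.imag * br;  /* (a^2 + b^2) / s */

    dcplx r;
    r.real =  ar / d;                       /*  a / (a^2 + b^2) */
    r.imag = -br / d;                       /* -b / (a^2 + b^2) */
    return r;
}
```

With the textbook formula, 1e300 + 1e300i gives a^2 + b^2 = inf and therefore a result of 0. The scaled version returns approximately 5e-301 - 5e-301i. The imaginary part (its magnitude and its minus sign) is the easy one to get wrong, and it is the component that the 97f89fbcf2 fix concerns.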
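Commit eda42a21d1 defines local equivalents of libf2c's `r_sign()` and `d_sign()`. These implement Fortran's SIGN intrinsic, which returns the magnitude of the first argument with the sign of the second. The sketch below uses hypothetical `local_` names and the by-pointer calling convention that f2c-generated code uses.

```c
/* Hypothetical local equivalents of libf2c's d_sign() and r_sign():
   return |*a| carrying the sign of *b, where *b == 0 counts as positive.
   As in libf2c, the single-precision version also returns a double. */
static double local_d_sign(const double *a, const double *b)
{
    double x = (*a >= 0.0) ? *a : -*a;   /* |a| */
    return (*b >= 0.0) ? x : -x;
}

static double local_r_sign(const float *a, const float *b)
{
    double x = (*a >= 0.0f) ? *a : -*a;  /* |a|, promoted to double */
    return (*b >= 0.0f) ? x : -x;
}
```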
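Commit 45a80c625f makes the BLAS compatibility layer accept 'C' for the trans argument of [sd]syrk and [sd]syr2k and treat it as 'T'. The hypothetical helper below sketches that rule in isolation. It is not the actual check routine in frame/compat/check, and its return codes are an assumption made for illustration.

```c
#include <ctype.h>

/* Hypothetical sketch: for real datatypes, 'C' (conjugate transpose) is
   accepted and treated exactly like 'T', since conjugation is a no-op on
   real numbers. Matching is case-insensitive, as with lsame().
   Returns 0 for no transpose, 1 for transpose, and -1 for an illegal
   value (which a real wrapper would report, e.g. through xerbla()). */
static int real_syrk_trans(char trans)
{
    switch (toupper((unsigned char)trans)) {
        case 'N': return 0;
        case 'T':
        case 'C': return 1;
        default:  return -1;
    }
}
```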
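Commit da77e9614f sizes the static memory pools from the largest per-datatype requirement rather than from the double-precision blocksizes alone. The sketch below shows how such preprocessor arithmetic might look for blocks of A. Every `EX_` macro, every blocksize value, and the block count are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-datatype cache blocksizes. Single precision has been
   tuned upward while double precision was left alone, which is the
   scenario that exhausts a pool sized from double-precision values only. */
#define EX_MC_S 512
#define EX_KC_S 512
#define EX_MC_D 256
#define EX_KC_D 256
#define EX_MC_C 128
#define EX_KC_C 256
#define EX_MC_Z 128
#define EX_KC_Z 128

#define EX_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Bytes for one packed MC x KC block of A, per datatype.
   Complex elements are assumed to hold two reals. */
#define EX_BLOCK_A_S ((size_t)EX_MC_S * EX_KC_S * sizeof(float))
#define EX_BLOCK_A_D ((size_t)EX_MC_D * EX_KC_D * sizeof(double))
#define EX_BLOCK_A_C ((size_t)EX_MC_C * EX_KC_C * 2 * sizeof(float))
#define EX_BLOCK_A_Z ((size_t)EX_MC_Z * EX_KC_Z * 2 * sizeof(double))

/* Size every block for the most demanding datatype. */
#define EX_BLOCK_A_MAX EX_MAX(EX_MAX(EX_BLOCK_A_S, EX_BLOCK_A_D), \
                              EX_MAX(EX_BLOCK_A_C, EX_BLOCK_A_Z))

/* Hypothetical number of A blocks held by the pool. */
#define EX_NUM_A_BLOCKS 4
#define EX_POOL_A_SIZE  (EX_NUM_A_BLOCKS * EX_BLOCK_A_MAX)

static char pool_a[EX_POOL_A_SIZE];
```

With these values, a single-precision block of A needs 1048576 bytes, while a double-precision block needs 524288. A pool sized from the double-precision values alone would therefore be exhausted as soon as single-precision blocks were checked out.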
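Commit 7ae4d7a41d introduces a BLIS_INT_TYPE_SIZE setting that selects 32- or 64-bit integers, or a default C type for any other value. Commit 4b7e7970f1 defines dim_t, inc_t, doff_t, and related types in terms of gint_t. The block below sketches how such a selection could be written. It is not the literal contents of bli_config.h or bli_type_defs.h, and the default of 64 is an assumption.

```c
#include <stdint.h>

/* Configuration knob named in the commit; 64 is an assumed default here. */
#ifndef BLIS_INT_TYPE_SIZE
#define BLIS_INT_TYPE_SIZE 64
#endif

/* Select the general signed/unsigned integer types from the knob. */
#if   BLIS_INT_TYPE_SIZE == 32
  typedef int32_t           gint_t;
  typedef uint32_t          guint_t;
#elif BLIS_INT_TYPE_SIZE == 64
  typedef int64_t           gint_t;
  typedef uint64_t          guint_t;
#else
  /* Any other value falls back to a default C type. */
  typedef long int          gint_t;
  typedef unsigned long int guint_t;
#endif

/* Dimension, increment, and diagonal-offset types layered on gint_t. */
typedef gint_t dim_t;
typedef gint_t inc_t;
typedef gint_t doff_t;
```

Because the width of gint_t depends on configuration, a printf() specifier tied to one type (such as %d or %ld) can mismatch. Casting to a fixed type such as `long long` and printing with %lld avoids that class of warning.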
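Commit f4ec28e723 parallelizes the gemm macro-kernel along the n dimension with a basic OpenMP parallel region. The sketch below applies `#pragma omp parallel for` to a loop over NR-wide column panels of C. The function name is hypothetical, and a plain triple loop stands in for the micro-kernel call. When compiled without OpenMP support (for example, without `-fopenmp` on gcc), the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical sketch: C := C + A*B (column-major), with the loop over
   nr-wide column panels of C split across threads. Each iteration writes
   a disjoint set of columns of C, so no synchronization is needed beyond
   the implicit barrier at the end of the parallel loop. */
static void macro_kernel_omp(int m, int n, int k, int nr,
                             const double *a, int lda,
                             const double *b, int ldb,
                             double *c, int ldc)
{
    int n_panels = (n + nr - 1) / nr;   /* last panel may be narrower */

    #pragma omp parallel for
    for (int jp = 0; jp < n_panels; ++jp) {
        int j0 = jp * nr;
        int nb = (n - j0 < nr) ? n - j0 : nr;

        /* Stand-in for one micro-kernel call per panel of C. */
        for (int j = j0; j < j0 + nb; ++j)
            for (int p = 0; p < k; ++p)
                for (int i = 0; i < m; ++i)
                    c[i + j*ldc] += a[i + p*lda] * b[p + j*ldb];
    }
}
```

Parallelizing the packing of A and B, as the companion packm_blk_var2 change does, follows the same pattern: distribute independent blocks or panels across threads.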
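Commit 67a8b9498d adds preprocessor guards that enforce constraints between register and cache blocksizes. Commit 02002ef6f3 notes that MC and NC must each be whole multiples of both MR and NR, because right-side trsm packs A and B with NR and MR swapped. The fragment below illustrates that idea. The `EX_` names and values are made up and are not taken from bli_kernel.h.

```c
/* Hypothetical double-precision blocksizes, for illustration only. */
#define EX_MR_D    4
#define EX_NR_D    4
#define EX_MC_D  128
#define EX_NC_D 4096

/* Fail at compile time, rather than misbehave at run time, if a cache
   blocksize is not a whole multiple of BOTH register blocksizes. */
#if ( EX_MC_D % EX_MR_D ) != 0 || ( EX_MC_D % EX_NR_D ) != 0
  #error "MC must be a whole multiple of MR and NR."
#endif

#if ( EX_NC_D % EX_NR_D ) != 0 || ( EX_NC_D % EX_MR_D ) != 0
  #error "NC must be a whole multiple of NR and MR."
#endif
```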
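Commit b0a0a0f274 defines `restrict` away for C++ and pre-C99 compilers, where it is not a keyword. Below is a minimal sketch of such a guard, followed by a function that uses the qualifier. The function name is hypothetical.

```c
/* Make 'restrict' usable everywhere: it is a keyword in C99 and later,
   but not in C++ or C89/C90, where it is defined as nothing. */
#if defined(__cplusplus) || !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
  #define restrict
#endif

/* y := y + alpha * x. The restrict qualifiers promise the compiler that
   x and y do not overlap. The loop variable is declared separately so
   the function also compiles as C89. */
static void axpyv_sketch(int n, double alpha,
                         const double * restrict x,
                         double * restrict y)
{
    int i;
    for (i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

Under C99 and later, the guard does nothing, and the qualifier may let the compiler vectorize without runtime alias checks. Elsewhere, the code still compiles, just without that promise.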
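Commits 02002ef6f3 and 6bfa96f848 describe two refinements to choosing blocksizes. When marching backward, the first iteration should take the bottom/right edge case. A non-zero blocksize extension lets a small leftover be absorbed into a neighboring block. The function below is a hypothetical sketch that combines both rules. It is not the body of bli_determine_blocksize_b(), whose exact policy may differ.

```c
/* Hypothetical sketch: size of the next block when marching backward
   (bottom/right to top/left) through a dimension of length dim, of which
   i elements have already been processed. b is the default blocksize and
   b_ext is a (possibly zero) blocksize extension. */
static long blocksize_b_sketch(long i, long dim, long b, long b_ext)
{
    long left = dim - i;                 /* elements not yet processed */

    /* If everything left fits in one extended block, take it all. */
    if (left <= b + b_ext)
        return left;

    /* First iteration: handle the edge case now, so that every later
       block is a full b-sized block. */
    if (i == 0) {
        long r = left % b;
        if (r != 0 && r <= b_ext)
            return b + r;                /* absorb a small edge case */
        if (r != 0)
            return r;                    /* standalone edge-case block */
    }

    return b;
}
```

For dim = 560, b = 256, and b_ext = 64, this yields blocks of 304 and 256 instead of 48, 256, and 256. With b_ext = 0, a dimension of 300 splits into 44 followed by 256.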
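Commit 2d9c667f3c changes bli_obj_vector_inc() to return 1 for a 1 x 1 object. Such an object holds a single element, but its column stride may be padded for alignment. Below is a hypothetical sketch of the rule, assuming the object is known to be a vector (m == 1 or n == 1).

```c
/* Hypothetical sketch: element increment of a vector stored as an
   m x n matrix with row stride rs and column stride cs.
   Assumes m == 1 or n == 1. */
static long vector_inc_sketch(long m, long n, long rs, long cs)
{
    if (m == 1 && n == 1) return 1;   /* single element: report unit stride,
                                         ignoring any padded strides */
    if (m == 1)           return cs;  /* row vector: step across columns */
    return rs;                        /* column vector: step down rows */
}
```

Code that checks for an increment of 1 before taking a contiguous-load fast path (as in the dotv_opt_var1 case the commit mentions) then treats a 1 x 1 object as contiguous, whatever padding its strides carry.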