amd/blis - blis - Public git mirror


mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	03f6c35997	Tightened some macros that detect datatypes. Details: - Modified the definitions of some macros, such as bli_is_real(), so that the "special" bit is taken into account so that BLIS_INT is differentiated from BLIS_FLOAT. - Whitespace changes to bli_obj_macro_defs.h. - Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't being used.	2013-07-22 12:54:32 -05:00
Field G. Van Zee	b33e2f4443	CHANGELOG update (for 0.0.9).	2013-07-19 17:15:03 -05:00
Field G. Van Zee	0680916fdd	Added BLAS error checking to compatibility layer. Details: - Added frame/compat/check directory, which now houses companion _check() routines for each of the BLAS wrappers in frame/compat. These _check() routines are called from the compatibility wrappers and mimic the error-checking present in the netlib BLAS. - Edited bla_xerbla.c so that xerbla() translates the operation string to uppercase before printing. - Redefined util routines in frame/compat/f2c/util in terms of level0 macros. - Added prototypes for util routines, f2c routines, lsame(), and xerbla(). - Commented out prototypes in test/test_*.c since Fortran integers are now int64_t by default (and the prototypes that were present in the files used int). - Removed redundant #include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c, since blis.h was already being included. - Other minor changes to code in frame/compat/f2c. [tag: 0.0.9]	2013-07-18 18:04:34 -05:00
Field G. Van Zee	4e80ad28c9	Added support for C99 complex types/arithmetic. Details: - Added support for C99 complex types to bli_type_defs.h and overloaded complex arithmetic to the scalar-level macros in include/level0. This includes a somewhat substantial reorganization and re-layering of much of the existing machinery present in the level0 macros. - Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files, commented-out by default, which optionally enables the use of built-in C99 complex types and arithmetic. - Minor changes to clarksville and reference configs' make_defs.mk files. - Removed a macro definition from bli_param_macro_defs.h that was not being used (bli_proj_dt_to_real_if_imag_eq0).	2013-07-18 17:53:31 -05:00
Field G. Van Zee	6072d7c848	Fixed bugs in trsm, trmm macro-kernels. Details: - Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling. - Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was incorrectly being adjusted upward by MR, instead of NR. The rl and ru trmm macro-kernels were updated in a similar fashion. - Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on diagoffb when recomputing k to skip a zero region below where the diagonal intersects the right side of the block. The corresponding trmm macro-kernel was also updated. - Fixed a bug in trsm_ru_ker_var2() where the adjustment of k (by NR) needed to be placed AFTER the block that recomputes k to skip the zero region (if present). The other three trsm macro-kernels, as well as the trmm macro-kernels, were updated in the same manner, for consistency. - Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was being updated to skip a zero region to the left of where the diagonal of A intersects the top edge of the block. - Comment updates to all trsm and trmm macro-kernels. - Comment updates to bli_packm_init.c.	2013-07-17 12:27:45 -05:00
Field G. Van Zee	47410a48f9	Added f2c'ed Givens rotation wrappers. Details: - Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic' along with other wrappers for which no BLIS implementation exists. - Added f2c-generated codes for applicable datatype flavors of rot, rotg, rotm, and rotmg operations.	2013-07-10 14:53:59 -05:00
Field G. Van Zee	e5f90f3a8d	Removed copynz defs from bli_kernel.h files. Details: - Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each configuration. (Meant to include this in previous commit.)	2013-07-10 13:40:12 -05:00
Field G. Van Zee	aec12d90f5	Removed copynzv, copynzm and related codes. Details: - Removed copynzv and copynzm operation directories. These operations implemented a variation of copyv/m that, in the case of real source and complex destination operands, leaves the imaginary component untouched (rather than setting it to zero). I realize now that the special case(s) (e.g. gemm with real A and B but complex C) that I thought required this operation actually can be handled more simply. - Removed level0 scalar macros implementing copynzs, copynzjs.	2013-07-10 13:33:30 -05:00
Field G. Van Zee	b0a0a0f274	Added handling of restrict, stdint.h for non-C99. Details: - Removed the #include <stdint.h> from blis.h and inserted a cpp macro block in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise manually typedefs the types we need (which, for now, are unconditionally int64_t and uint64_t). - Moved basic typedefs to top of bli_type_defs.h, and comment changes. - Added cpp macro block to bli_macro_defs.h that #defines restrict as nothing for C++ and non-C99.	2013-07-09 17:15:38 -05:00
Field G. Van Zee	4b7e7970f1	Migrated integer usage to stdint.h types. Details: - Changed the way bli_type_defs.h defines integer types so that dim_t, inc_t, doff_t, etc. are all defined in terms of gint_t (general signed integer) or guint_t (general unsigned integer). - Renamed Fortran types fchar and fint to f77_char and f77_int. - Define f77_int as int64_t if a new configuration variable, BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise. These types are defined in stdint.h, which is now included in blis.h. - Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed in terms of scomplex. - Renamed "char" type in f2c files to "character" and typedef'ed in terms of char. - Updated bla_amax() wrappers so that the return type is defined directly as f77_int, rather than letting the prototype-generating macro decide the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros, so I removed them. Also, changed the body of the wrapper so that a gint_t is passed into abmaxv, which is THEN typecast to an f77_int before returning the value. - Updated f2c code that accessed .r and .i fields of complex and doublecomplex types so that they use .real and .imag instead (now that we are using scomplex and dcomplex).	2013-07-08 15:20:34 -05:00
Field G. Van Zee	3725013985	Added experimental bli_gemm_ker_var5(). Details: - Added support for an experimental gemm macro-kernel that incrementally packs one micro-panel of B at a time. This is useful for certain special cases of gemm where m is small. - Minor changes to default values of clarksville configuration. - Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we do not yet have any use (or implementation support) for block storage. - Comment update to bli_packm_init.c.	2013-07-08 11:24:18 -05:00
Field G. Van Zee	9915d667a7	Defined "total" blocksize query functions. Details: - Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query the default blocksize plus blocksize extension (using the type or the type of an object). - Comment update in bli_packm_cxk.c.	2013-07-07 13:28:39 -05:00
Field G. Van Zee	46d3d09d49	Consolidated lower/upper her[2]k blocked variants. Details: - Consolidated lower and upper blocked variants for herk and her2k, and renamed the resulting variants, according to the same changes recently made to trmm and trsm. - Implemented support for four new subpartition types: BLIS_SUBPART1T, BLIS_SUBPART1B, BLIS_SUBPART1L, and BLIS_SUBPART1R, which correspond to "merged" partitions that include the middle "1" partition as well as either the neighboring "0" or "2" partition. This is used to clean up code in herk/her2k var2 that attempts to partition away the strictly zero region above or below the diagonal of a matrix operand that is being marched through diagonally. - Added safeguards to herk macro-kernels that skip any leading or trailing zero region in the panel of C that is passed in. This is now needed given that herk/her2k var1 no longer partitions off this zero region before calling the macro-kernel (via bli_her[2]k_int()). - Updated comments and other whitespace changes to trmm/trsm macro-kernels.	2013-06-27 13:19:56 -05:00
Field G. Van Zee	02002ef6f3	Added row-storage optimizations for trmm, trsm. Details: - Implemented algorithmic optimizations for trmm and trsm whereby the right side case is now handled explicitly, rather than induced indirectly by transposing and swapping strides on operands. This allows us to walk through the output matrix with favorable access patterns no matter how it is stored, for all parameter combinations. - Renamed trmm and trsm blocked variants so that there is no longer a lower/upper distinction. Instead, we simply label the variants by which dimension is partitioned and whether the variant marches forwards or backwards through the corresponding partitioned operands. - Added support for row-stored packing of lower and upper triangular matrices (as provided by bli_packm_blk_var3.c). - Fixed a performance bug in bli_determine_blocksize_b() whereby the cache blocksize extensions (if non-zero) were not being used to appropriately size the first iteration (ie: the bottom/right edge case). - Updated comments in bli_kernel.h to indicate that both MC and NC must be whole multiples of MR AND NR. This is needed for the case of trsm_r where, in order to reuse existing left-side gemmtrsm fused micro-kernels, the packing of A (left-hand operand) and B (right-hand operand) is done with NR and MR, respectively (instead of MR and NR).	2013-06-24 17:08:14 -05:00
Field G. Van Zee	d1e81ddc84	Minor generalizing tweaks to trmm blk var1, var2.	2013-06-13 11:14:21 -05:00
Field G. Van Zee	0efb7974f1	CHANGELOG update.	2013-06-12 16:40:04 -05:00
Field G. Van Zee	5b641c3bab	Use separate CFLAGS for "kernels" directories. Details: - Added a new "special" directory type: any source code within directories named "kernels" will be compiled with a separate CFLAGS_KERNELS set of compiler flags. This allows the developer to specify a separate set of flags (e.g. optimization flags) for compiling kernels while maintaining a standard set for regular framework code. - Fixed a bug in the top-level Makefile that was causing "noopt" code to be compiled with the standard set of compilation flags. - Updated make_defs.mk in reference, flame, and clarksville configurations according to above changes. [tag: 0.0.8]	2013-06-12 16:02:12 -05:00
Field G. Van Zee	08475e7c76	Various level-3 optimizations for row storage. Details: - Implemented remaining two cases within bli_packm_blk_var2(), which allow packing from a lower or upper-stored symmetric/Hermitian matrix to column panels (which are row-stored). Previously one could only pack to row panels (which are column-stored). - Implemented various optimizations in the level-3 front-ends that allow more favorable access through row-stored matrices for gemm, hemm, herk, her2k, symm, syrk, and syr2k. - Cleaned up code in level-3 front-ends that has to do with setting target and execution datatypes.	2013-06-11 12:18:39 -05:00
Field G. Van Zee	05a657a6b9	Added beta == 0 optimization to x86_64 ukernel. Details: - Modified x86_64 gemm microkernel so that when beta is zero, C is not read from memory (nor scaled by beta). - Fixed minor bug in test suite driver when "Test all combinations of storage schemes?" switch is disabled, which would result in redundant tests being executed for matrix-only (e.g. level-1m, level-3) operations if multiple vector storage schemes were specified. - Restored debug flags as default in clarksville configuration.	2013-06-07 11:04:10 -05:00
Field G. Van Zee	f1aa6b81cc	Whitespace changes to old test drivers. Details: - Replaced tabs with four spaces in places where indention was already in place.	2013-06-06 13:36:06 -05:00
Field G. Van Zee	9feb4c23d2	Fixed unaligned handling in axpyf, dotxaxpyf. Details: - Fixed over-cautious handling of unaligned operands in vector intrinsic implementation of axpyf kernel. - Fixed over- and under-cautious handling of unaligned operands in vector intrinsic implementation of dotxaxpyf kernel.	2013-06-04 14:57:46 -05:00
Field G. Van Zee	22b06cfcd2	Updated level-1/-1f [vector intrinsic] kernels. Details: - Updated level-1/-1f kernels so that non-unit and un-aligned cases are handled by reference implementation (rather than aborted). - Added -fomit-frame-pointer to default make_defs.mk for clarksville configuration. - Defined bli_offset_from_alignment() macro. - Minor edits to old test drivers.	2013-06-03 16:54:52 -05:00
Field G. Van Zee	0288c827d3	Updated ukernels for x86_64. Details: - Tweaked micro-kernels and configuration for clarksville. - Updated/cleaned up old test drivers in test directory. - Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced recently).	2013-06-01 08:02:23 -05:00
Field G. Van Zee	85a6d1c9a5	Replaced axpys usage with subs in trsv. Details: - Replaced instances of axpys with alpha equal to -1 with subs. - Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of sizeof(dcomplex).	2013-05-29 10:58:24 -05:00
Field G. Van Zee	2d9c667f3c	Fixed x86_64 kernel bugs and other minor issues. Details: - Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in unaligned subpartitions. We were already going out of our way a bit to handle edge cases in the first iteration for blocked variants, and this was simply the unblocked-fused extension of that idea. - Fixed control tree handling in her/her2/syr/syr2 that was not taking into account how the choice of variant needed to be altered for upper-stored matrices (given that only lower-stored algorithms are explicitly implemented). - Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b() macros to provide inlined versions of bli_determine_blocksize_[fb]() for use by unblocked-fused variants. - Integrated new blocksize_dim macros into gemv/hemv unf variants for consistency with that of the bugfix for trmv/trsv (both of which now use the same macros). - Modified bli_obj_vector_inc() so that 1 is returned if the object is a vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain conditions (e.g. dotv_opt_var1), an invalid increment was returned, which was invalid only because the code was expecting 1 (for purposes of performing contiguous vector loads) but got a value greater than 1 because the column stride of the object (e.g. rho) was inflated for alignment purposes (albeit unnecessarily since there is only one element in the object). - Replaced some old invocations of set0 with set0s. - Added alpha parameter to gemmtrsm ukernels for x86_64 and used it accordingly. - Fixed increment bug in cleanup loop of gemm ukernel for x86_64. - Added safeguard to test modules so that testing a problem with a zero dimension does not result in a failure. - Tweaked handling of zero dimensions in level-2 and level-3 operations' internal back-ends to correctly handle cases where the output operand still needs to be scaled (e.g. by beta, in the case of gemm with k = 0).	2013-05-24 16:28:10 -05:00
Field G. Van Zee	d57ec42b34	Renamed _trans_status() macro. Details: - Mistakenly forgot to rename the _trans_status() macro and instances in previous commit.	2013-05-03 17:35:32 -05:00
Field G. Van Zee	9e2b227866	Renamed _set_trans(), _trans_status() macros. Details: - Renamed the following macros: bli_obj_set_trans() -> bli_obj_set_onlytrans() bli_obj_trans_status() -> bli_obj_onlytrans_status() to remove ambiguity as to which bits are read/updated.	2013-05-03 17:24:58 -05:00
Field G. Van Zee	2f8174509e	Unconditionally check memory pool(s) for errors. Details: - Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the memory pool is exhausted before checking out and returning a block, even if BLIS error checking has been disabled. These errors are useful because they likely indicate that BLIS was improperly configured for the code being run.	2013-05-01 15:06:30 -05:00
Field G. Van Zee	75405a2b83	CHANGELOG update.	2013-05-01 15:00:30 -05:00
Field G. Van Zee	6bfa96f848	Absorbed blocksize extensions into main objects. Details: - Revamped some parts of commit `b6ef84fad1` by adding blocksize extension fields to the blksz_t object rather than having them as separate structs. - Updated all packm interfaces/invocations according to above change. - Generalized bli_determine_blocksize_?() so that edge case optimization happens if and only if cache blocksizes are created with non-zero extensions. - Updated comments in bli_kernel.h files to indicate that the edge case blocksize extension mechanism is now available for use. [tag: 0.0.7]	2013-04-30 19:35:54 -05:00
Field G. Van Zee	bc7c8005ce	Added option to disable err checking in testsuite. Details: - Added a new line to input.general that allows one to specify the error-checking level to use for each BLIS experiment. The only two levels supported for now are "no error checking" and "full error checking".	2013-04-25 17:16:59 -05:00
Field G. Van Zee	096b366ddc	Use cntl trees that block in n dimension. Details: - Updated _cntl.c files for each level-3 operation to induce blocked algorithms that first partition in the n dimension with a blocksize of NC. Typically this is not an issue since only very large problems exceed that of NC. But developers often run very large problems, and so this extra blocking should be the default. - Removed some recently introduced but now unused macros from bli_param_macro_defs.h.	2013-04-25 16:43:43 -05:00
Field G. Van Zee	b6e24b23cb	Use PASTEMAC in macro-kernels (over MAC2 or MAC3). Details: - Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2 and PASTEMAC3) with those that only use a single type (PASTEMAC). - Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to accommodate above change. - Fixed comment typo in bli_config.h files. - Added .nfs* pattern to .gitignore.	2013-04-25 12:06:12 -05:00
Field G. Van Zee	df80acf517	Fixed computation of b_next in L3 macro-kernels. Details: - Restructured herk_l and herk_u macro-kernels in the image of trmm and trsm, in that the edge cases are captured by the main loop, rather than trying to have "cleanup" sections that result in four distinct parts (interior, bottom edge, right edge, bottom-right edge) of the code. - Fixed the way b_next was being computed in the non-gemm level-3 macro-kernels (herk, trmm, trsm). The way they are computed now matches that of gemm.	2013-04-23 19:43:23 -05:00
Field G. Van Zee	3671528cf8	Fixed minor bug in computing b_next in gemm.	2013-04-23 19:12:14 -05:00
Field G. Van Zee	db072a5b4a	Fixed rare edge case bug in herk_l macro-kernel. Details: - Fixed a potential bug in herk_l at the m_left edge case. If MR was chosen to be much larger than NR, then one could encounter edge cases in the MC dimension that fall entirely below the diagonal, which the previous implementation of the herk_l macro-kernel was not allowing for.	2013-04-23 17:49:10 -05:00
Field G. Van Zee	1dab11e37d	Updated x86 gemmtrsm ukernels to use alpha.	2013-04-23 17:17:11 -05:00
Field G. Van Zee	9d10d7dd9b	Added a_next, b_next arguments to micro-kernels. Details: - Added two more arguments to the gemm and gemmtrsm microkernels: the addresses of the next micro-panels of A and B. By passing these pointers into the micro-kernel, we allow the micro-kernel author to prefetch micro-panels of A and B as necessary (though this is completely optional; these addresses may also be safely ignored). - Updated all seven macro-kernels so that they compute and pass in a_next and b_next. Note that ONLY the gemm macro-kernel computes a_next and b_next with the precise semantics we want. I will go back and fix the other macro-kernels in the near future. - Added 'restrict' to various micro-kernels from which it was missing.	2013-04-23 16:00:18 -05:00
Field G. Van Zee	f3815dc84d	Added code for backward edge-case blocking. Disabled: - Edited bli_determine_blocksize_b() to include experimental (and currently disabled) code that computes extended blocks. - Updated comments related to above changes. - Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.	2013-04-23 11:12:33 -05:00
Field G. Van Zee	4fe1435f20	Updated dupl implementation to use PACKNR and NR. Details: - Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR explicitly to navigate b1, so that situations where PACKNR > NR are supported. - Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and frame/3/trsm/ukernels to kernels/c99/. - Updated clarksville and flame configurations.	2013-04-22 19:00:43 -05:00
Field G. Van Zee	2d6f9e8379	Disabled blocksize checks for memory pools. Details: - Temporarily disabled checks that ensure that enough memory will be allocated by the contiguous memory allocator for all types, given that the values for double precision real are the ones used to allocate the space. These checks can easily go awry in certain situations, especially if you are developing for only one datatype. So for now, they are probably more trouble than they are worth.	2013-04-21 15:10:34 -05:00
Field G. Van Zee	b6ef84fad1	Allow ldim of packed micro-panels != MR, NR. Details: - Made substantial changes throughout the framework to decouple the leading dimension (row or column stride) used within each packed micro-panel from the corresponding register blocksize. It appears advantageous on some systems to use, for example, packed micro-panels of A where the column stride is greater than MR (whereas previously it was always equal to MR). - Changes include: - Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding to use when packing micro-panels of A and B. - Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR where appropriate, instead of MR and NR. - Added pd field (panel dimension) to obj_t. - New interface to bli_packm_cntl_obj_create(). - Renamed bli_obj_packed_length()/_width() macros to bli_obj_padded_length()/_width(). - Removed local #defines for cache/register blocksizes in level-3 *_cntl.c. - Print out new cache and register blocksize extensions in test suite. - Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger blocksize for edge cases, which can improve performance at the margins.	2013-04-21 15:00:24 -05:00
Field G. Van Zee	59fca58dbe	Fixed bug in compatibility layer (her2k/syr2k). Details: - Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c and bla_syr2k.c, that caused incorrect computation to occur when the BLAS interface caller requests the [conjugate-]transpose case. Thanks to Bryan Marker for reporting the behavior that led to this bug.	2013-04-19 15:26:29 -05:00
Field G. Van Zee	09eacbd1ab	Changed old level3 test drivers to call front-ends. Details: - Changed old level-3 test drivers, in 'test' directory, to always call the front-end object API instead of the internal back-end with the locally defined control tree.	2013-04-18 19:39:13 -05:00
Field G. Van Zee	83e45de23e	Allow packm_init() to reacquire a too-small mem_t. Details: - Changed bli_packm_init() to react differently to a situation where a pack obj_t has an already-allocated mem_t entry that has a buffer that is smaller than what will be needed to hold the block/panel that now needs to be packed. Previously, this situation was treated with an abort() since I assumed something was horribly wrong. I have changed the code so that it now reacts by releasing the previous mem_t and re-acquires a new mem_t with the new information. (This change was done at the request of Bryan Marker to facilitate code generation via DxT.)	2013-04-18 18:33:03 -05:00
Field G. Van Zee	a699043417	Fixed bug in packing block of A for hemm/symm. Details: - Fixed a bug in bli_packm_blk_var2() that affected the packing functionality of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or symmetric matrix where the block of A being packed intersects the diagonal, but some of its micro-panels do not intersect the diagonal and lie completely in the unstored region. Thanks to Francisco Igual for reporting this bug. - Comment updates to both _blk_var2.c and _blk_var3.c.	2013-04-18 13:52:47 -05:00
Field G. Van Zee	c92e7590e1	Activated bli_packm_acquire_mpart_t2b(). Details: - Removed the overly-paranoid bli_abort() from the end of bli_packm_acquire_mpart_t2b(), to allow others to experiment with partitioning through packed blocks of A. Also, and more importantly, changed an earlier check that was causing an erroneous (but coincidentally redundant) abort(). Also, updated some of the comments in bli_packm_part.c.	2013-04-17 20:53:29 -05:00
Field G. Van Zee	bea579e9f0	Allow creation of "empty" objects. Details: - Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and modified bli_adjust_strides() to explicitly handle m = n = 0. - Updated bli_check_matrix_strides() to allow cases where m = n = 0.	2013-04-16 19:43:14 -05:00
Field G. Van Zee	7904e20f2e	Fixed "root" object bug in bli_her[2]k/syr[2]k. Details: - Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k, that manifested as the incorrect triangle being updated. It occurred when the user would pass in a matrix object that was correctly marked as symmetric/Hermitian and lower-stored, but whose root object was never marked as lower (or upper). We now alias and re-assign root status for matrix C within the front-ends. Note that trmm and trsm were already doing this, albeit for a slightly different reason (to allow the internal back-end to choose which algorithm to run--lower or upper--based on the uplo of the root object for both left and right side cases). Thanks to Bryan Marker for leading me to this bug.	2013-04-16 17:37:16 -05:00
Field G. Van Zee	19155a768d	Fixed overzealous type-checking in bli_getsc(). Details: - Relaxed type checking in getsc so that the input object could be a constant and not just a proper floating-point type. (If it is a constant, default to extracting the dcomplex values.) Thanks to Bryan Marker for reporting this bug. - Added definition for bli_is_constant() in bli_param_macro_defs.h - Comment updates to various level-0 scalar routines.	2013-04-16 11:24:03 -05:00
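Commit 03f6c35997 tightens bli_is_real() so that the "special" bit separates BLIS_INT from BLIS_FLOAT. The sketch below shows why that matters. It uses a hypothetical bit layout: DT_DOMAIN_BIT, DT_PREC_BIT, DT_SPECIAL_BIT and the DT_* values are assumptions for illustration, not BLIS's actual encoding. Under that layout, an integer type that shares its low bits with single-precision real passes a domain-only check but fails the tightened one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical datatype encoding (illustration only):
   bit 0 = domain    (0 = real, 1 = complex)
   bit 1 = precision (0 = single, 1 = double)
   bit 2 = "special" bit, set for non-floating-point types such as int */
enum {
    DT_DOMAIN_BIT  = 0x1,
    DT_PREC_BIT    = 0x2,
    DT_SPECIAL_BIT = 0x4
};

typedef enum {
    DT_FLOAT    = 0x0,                          /* real, single    */
    DT_SCOMPLEX = DT_DOMAIN_BIT,                /* complex, single */
    DT_DOUBLE   = DT_PREC_BIT,                  /* real, double    */
    DT_DCOMPLEX = DT_PREC_BIT | DT_DOMAIN_BIT,  /* complex, double */
    DT_INT      = DT_SPECIAL_BIT                /* same low bits as DT_FLOAT */
} datatype_t;

/* Loose check: inspects only the domain bit, so DT_INT also passes. */
static bool is_real_loose(datatype_t dt)
{
    return (dt & DT_DOMAIN_BIT) == 0;
}

/* Tightened check: the special bit must be clear as well. */
static bool is_real(datatype_t dt)
{
    return (dt & (DT_DOMAIN_BIT | DT_SPECIAL_BIT)) == 0;
}
```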
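Commit 0680916fdd makes xerbla() uppercase the operation name before printing. Below is a minimal sketch of that step, assuming the routine name arrives as a NUL-terminated C string. uppercase_name() and report_illegal_arg() are hypothetical helpers, and the message text is only loosely modeled on the netlib BLAS wording.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Copy 'name' into 'buf' in uppercase, truncating to fit buflen - 1
   characters and always NUL-terminating. */
static void uppercase_name(const char *name, char *buf, size_t buflen)
{
    size_t i = 0;
    assert(buflen > 0);
    for (; name[i] != '\0' && i + 1 < buflen; ++i)
        buf[i] = (char)toupper((unsigned char)name[i]);
    buf[i] = '\0';
}

/* Hypothetical xerbla()-style reporter for an illegal argument. */
static void report_illegal_arg(const char *name, int info)
{
    char upper[16];
    uppercase_name(name, upper, sizeof upper);
    printf(" ** On entry to %s parameter number %d had an illegal value\n",
           upper, info);
}
```

Casting to unsigned char before toupper() avoids undefined behavior for negative char values.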
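Commit aec12d90f5 removes copynzv/copynzm. These differ from copyv/copym only when a real source is copied into a complex destination. The sketch below puts the two behaviors side by side. It uses a hypothetical dcomplex_t with .real and .imag fields, and copy_r2c() and copynz_r2c() are illustrative names, not BLIS functions.

```c
#include <assert.h>

typedef struct { double real, imag; } dcomplex_t;  /* assumed layout */

/* copyv-style: real source into complex destination; the imaginary
   part of each destination element is set to zero. */
static void copy_r2c(int n, const double *x, dcomplex_t *y)
{
    for (int i = 0; i < n; ++i) {
        y[i].real = x[i];
        y[i].imag = 0.0;
    }
}

/* copynzv-style (the removed variant): imaginary parts are left untouched. */
static void copynz_r2c(int n, const double *x, dcomplex_t *y)
{
    for (int i = 0; i < n; ++i)
        y[i].real = x[i];
}
```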
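Commit b0a0a0f274 describes two preprocessor guards. The first includes <stdint.h> only for C++ and C99 and otherwise typedefs the 64-bit types by hand. The second defines restrict away outside C99. One way to write such guards is sketched below. The exact conditions BLIS uses may differ, the fallback assumes the compiler provides a 64-bit long long, and add_vectors() is a hypothetical consumer.

```c
#include <assert.h>

/* Use <stdint.h> where it is guaranteed; otherwise typedef manually. */
#if defined(__cplusplus) || (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
  #include <stdint.h>
#else
  /* Pre-C99 fallback: assumes long long is a 64-bit type on this compiler. */
  typedef long long          int64_t;
  typedef unsigned long long uint64_t;
#endif

/* 'restrict' is a C99 keyword; make it expand to nothing elsewhere. */
#if defined(__cplusplus) || !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
  #define restrict
#endif

/* y := y + x, with non-aliasing pointers declared via restrict. */
static void add_vectors(int64_t n, const double *restrict x, double *restrict y)
{
    int64_t i;  /* declared up front so the code is also valid C89 */
    for (i = 0; i < n; ++i)
        y[i] += x[i];
}
```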
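Commit 4b7e7970f1 selects the Fortran integer type at configuration time. It also has the amax wrapper compute in gint_t and only then narrow the result to f77_int. The sketch assumes gint_t is a 64-bit signed integer. idamax_sketch() is hypothetical and handles unit stride only (no incx). It returns the 1-based index that the BLAS i?amax routines return.

```c
#include <assert.h>
#include <stdint.h>

/* Build with -DBLIS_ENABLE_BLIS2BLAS_INT64 for 64-bit Fortran integers. */
#ifdef BLIS_ENABLE_BLIS2BLAS_INT64
typedef int64_t f77_int;
#else
typedef int32_t f77_int;
#endif

typedef int64_t gint_t;  /* assumed: general signed integer type */

/* Index of the first element of largest magnitude, 1-based (0 if n < 1).
   The search runs in gint_t; the cast to f77_int happens on return. */
static f77_int idamax_sketch(f77_int n, const double *x)
{
    gint_t i, best = 0;
    double best_abs;

    if (n < 1) return 0;
    best_abs = x[0] < 0.0 ? -x[0] : x[0];
    for (i = 1; i < n; ++i) {
        double a = x[i] < 0.0 ? -x[i] : x[i];
        if (a > best_abs) {  /* strict '>' keeps the first maximum */
            best_abs = a;
            best = i;
        }
    }
    return (f77_int)(best + 1);
}
```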
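Commit 05a657a6b9 changes the x86_64 gemm micro-kernel so that C is neither read nor scaled when beta is zero. The portable C sketch below applies the same rule but is not the assembly kernel. The packed layouts for A and B and the name gemm_ukr_sketch() are assumptions. A side effect of never reading C in that case is that NaN or Inf values already stored in C do not leak into the result.

```c
#include <assert.h>
#include <stddef.h>

/* C := beta*C + alpha*A*B for an mr x nr block of C with row stride rs_c
   and column stride cs_c. A is packed column-major (mr x k, ld = mr) and
   B is packed row-major (k x nr, ld = nr). */
static void gemm_ukr_sketch(int mr, int nr, int k, double alpha,
                            const double *a, const double *b, double beta,
                            double *c, ptrdiff_t rs_c, ptrdiff_t cs_c)
{
    for (int i = 0; i < mr; ++i) {
        for (int j = 0; j < nr; ++j) {
            double ab = 0.0;
            for (int p = 0; p < k; ++p)
                ab += a[i + p * mr] * b[j + p * nr];

            double *cij = &c[i * rs_c + j * cs_c];
            if (beta == 0.0)
                *cij = alpha * ab;                /* C is never read */
            else
                *cij = beta * (*cij) + alpha * ab;
        }
    }
}
```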
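Commit 22b06cfcd2 defines a bli_offset_from_alignment() macro so that kernels can detect unaligned operands and hand them to the reference code. The log does not show its definition. The sketch below is a hypothetical equivalent, OFFSET_FROM_ALIGNMENT(). It assumes a power-of-two alignment. It also assumes that converting a pointer to uintptr_t preserves its address bits, which holds on mainstream platforms but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes by which ptr lies past the previous multiple of align
   (align must be a power of two). */
#define OFFSET_FROM_ALIGNMENT(ptr, align) \
    ((uintptr_t)(ptr) & ((uintptr_t)(align) - 1))

/* A kernel could use this to choose between aligned vector loads and
   a reference fallback. */
static int is_aligned(const void *p, unsigned align)
{
    return OFFSET_FROM_ALIGNMENT(p, align) == 0;
}
```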
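Commit 2d9c667f3c makes bli_obj_vector_inc() return 1 for a 1 x 1 object, because a stride padded for alignment is meaningless when there is only one element. The sketch below uses a hypothetical, stripped-down descriptor. mat_desc_t and vector_inc() are illustrative and are not BLIS's obj_t or its macro.

```c
#include <assert.h>

/* Hypothetical matrix descriptor. */
typedef struct {
    long m, n;    /* dimensions */
    long rs, cs;  /* row and column strides */
} mat_desc_t;

/* Increment for walking a vector object as a 1-D array. */
static long vector_inc(const mat_desc_t *x)
{
    if (x->m == 1 && x->n == 1) return 1;  /* single element: stride irrelevant */
    if (x->m == 1) return x->cs;           /* row vector: step across columns */
    return x->rs;                          /* column vector: step down rows */
}
```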
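Commits 9915d667a7 and 6bfa96f848 describe a "total" blocksize (default plus extension). They also describe edge-case blocking that applies only when the extension is non-zero. The sketch below covers the forward case and assumes the rule is "take the whole remainder if it fits within the total blocksize". blocksize_fwd() is hypothetical, and the real bli_determine_blocksize_f() may differ in its details.

```c
#include <assert.h>

/* Size of the block starting at index i of a dimension of length dim.
   With b_ext == 0 this reduces to ordinary blocking by b_def. */
static long blocksize_fwd(long i, long dim, long b_def, long b_ext)
{
    long left = dim - i;
    assert(i >= 0 && i <= dim && b_def > 0 && b_ext >= 0);
    if (left <= b_def + b_ext)  /* remainder fits in the total blocksize */
        return left;            /* absorb it instead of leaving a sliver */
    return b_def;
}
```

For example, with dim = 260, b_def = 128, and b_ext = 16, the blocks are 128 and 132. Without the extension they are 128, 128, and 4.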
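Commit 9d10d7dd9b passes a_next and b_next to the micro-kernels so that their authors can prefetch. Commits 3671528cf8 and df80acf517 then fix how b_next is computed. The log does not spell out the exact semantics, so the helper below shows one plausible scheme, stated as an assumption. It assumes that micro-panels of A are visited fastest. After the last micro-panel of A, the next call uses the first A micro-panel again and the next B micro-panel, wrapping to the first at the very end. next_panels(), ps_a, and ps_b are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* For micro-tile (i, j) of an m_iter x n_iter sweep (i fastest), return
   the A and B micro-panels used by the following micro-kernel call.
   ps_a and ps_b are the element distances between packed micro-panels. */
static void next_panels(int i, int j, int m_iter, int n_iter,
                        const double *a, const double *b,
                        ptrdiff_t ps_a, ptrdiff_t ps_b,
                        const double **a_next, const double **b_next)
{
    assert(m_iter > 0 && n_iter > 0);
    if (i + 1 < m_iter) {
        /* Same B micro-panel, next A micro-panel. */
        *a_next = a + (ptrdiff_t)(i + 1) * ps_a;
        *b_next = b + (ptrdiff_t)j * ps_b;
    } else {
        /* Wrap A back to its first micro-panel; advance B (or wrap). */
        *a_next = a;
        *b_next = b + (ptrdiff_t)((j + 1 < n_iter) ? j + 1 : 0) * ps_b;
    }
}
```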
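Commit b6ef84fad1 decouples the leading dimension of a packed micro-panel (PACKMR) from the register blocksize MR. The sketch below packs a column-major block of A into micro-panels whose column stride is packmr >= mr. pack_a_sketch() and its argument order are hypothetical. Zero-filling both the padding rows and the short edge panel is a simplification chosen here.

```c
#include <assert.h>
#include <stddef.h>

/* Pack an m x k column-major block of A (leading dimension lda) into
   micro-panels of mr rows, each stored column-major with column stride
   packmr. Returns the panel stride (elements between micro-panels). */
static ptrdiff_t pack_a_sketch(int m, int k, const double *a, int lda,
                               int mr, int packmr, double *p)
{
    assert(mr > 0 && packmr >= mr);
    ptrdiff_t ps = (ptrdiff_t)packmr * k;

    for (int ib = 0; ib < m; ib += mr) {
        double *panel = p + (ptrdiff_t)(ib / mr) * ps;
        for (int l = 0; l < k; ++l) {
            for (int i = 0; i < packmr; ++i) {
                int row = ib + i;
                /* Rows beyond mr (padding) or beyond m (edge) become 0. */
                panel[i + (ptrdiff_t)l * packmr] =
                    (i < mr && row < m) ? a[row + (ptrdiff_t)l * lda] : 0.0;
            }
        }
    }
    return ps;
}
```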
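Commit 83e45de23e makes bli_packm_init() release and reacquire a buffer that is too small instead of calling abort(). The pattern is sketched below with a hypothetical mem_handle_t and mem_ensure(). Here malloc() and free() stand in for BLIS's memory pool.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mem_t-like handle. */
typedef struct {
    void  *buf;
    size_t size;
} mem_handle_t;

/* Ensure mem holds at least 'needed' bytes. A large-enough buffer is
   reused; a too-small one is released and a new one acquired.
   Returns 0 on success, -1 if allocation fails. */
static int mem_ensure(mem_handle_t *mem, size_t needed)
{
    if (mem->buf != NULL && mem->size >= needed)
        return 0;                 /* reuse the existing buffer */
    free(mem->buf);               /* release the too-small buffer */
    mem->buf  = malloc(needed);
    mem->size = (mem->buf != NULL) ? needed : 0;
    return (mem->buf != NULL) ? 0 : -1;
}
```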


141 Commits