Details:
- Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in
a segmentation fault if a column-stored matrix's starting address was
aligned, but its leading dimension was such that its second column was
unaligned. The micro-kernel assumed that aligned load instructions were
safe whenever the starting address was aligned, which is not the case
when a later column is unaligned. An extra condition that checks the
alignment of cs_c (i.e., the leading dimension in the column storage
case) has now been added. Thanks to Michael Lehn for
reporting this bug.
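
The condition described above can be sketched as follows. This is only an illustration, not the actual micro-kernel code: the helper name all_columns_aligned and its parameters are hypothetical, and it assumes double-precision elements with cs_c measured in elements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of BLIS): returns true only if every column
   of a column-stored matrix of doubles starts on an `align`-byte boundary.
   c is the address of the first element; cs_c is the column stride
   (leading dimension) in elements. */
bool all_columns_aligned(const double *c, size_t cs_c, size_t align)
{
    assert(align != 0);

    /* The starting address must be aligned ... */
    bool base_aligned = ((uintptr_t)c % align) == 0;

    /* ... and so must the byte distance between consecutive columns.
       Checking only the base address is the mistake described above. */
    bool stride_aligned = ((cs_c * sizeof(double)) % align) == 0;

    return base_aligned && stride_aligned;
}
```

A kernel would take the aligned-load path only when this returns true and fall back to unaligned loads otherwise.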
No longer requires OpenMP to compile
Define the following in bli_config.h in order to enable multithreading:
BLIS_ENABLE_MULTITHREADING
BLIS_ENABLE_OPENMP
Also fixes a bug with bli_get_range_weighted
Details:
- Relaxed the constraint in bli_obj_attach_buffer_check(), which required
the buffer address being attached to be non-NULL. This is acceptable
because the user was already able to create and use objects with NULL
buffers (via bli_obj_create_without_buffer(), which initializes the
buffer to NULL).
- Inserted calls to newly defined function, bli_check_object_buffer(),
into nearly all operations' _check() or _int_check() functions. This
allows BLIS to abort peacefully if a computational routine is called
with an object containing a NULL buffer. By contrast, under such
conditions, BLAS would typically fail with a segmentation fault.
- Within operation front-ends, moved the calls to _check()/_int_check()
so that zero dimensions are checked first (and if found, execution
returns with trivial or no computation). This resolves issue #7. Thanks
to Jack Poulson for reporting this bug.
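
A minimal sketch of the check ordering described above, using hypothetical stand-in types and names (sketch_obj_t, sketch_check_object_buffer, sketch_scal) rather than the real BLIS definitions. One deliberate difference: BLIS aborts on a failed check, whereas this sketch returns an error code so the behavior is easy to observe.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SKETCH_SUCCESS, SKETCH_NULL_BUFFER } sketch_err_t;

/* Hypothetical stand-in for a BLIS object: dimensions plus a buffer. */
typedef struct { size_t m, n; double *buf; } sketch_obj_t;

/* Stand-in for the buffer check: reject objects whose buffer is NULL. */
sketch_err_t sketch_check_object_buffer(const sketch_obj_t *a)
{
    assert(a != NULL);
    return a->buf == NULL ? SKETCH_NULL_BUFFER : SKETCH_SUCCESS;
}

/* Front-end for an in-place scale, a := alpha * a. */
sketch_err_t sketch_scal(double alpha, sketch_obj_t *a)
{
    assert(a != NULL);

    /* Zero dimensions are handled first: nothing to compute, so a NULL
       buffer is harmless here and must not trigger the check below. */
    if (a->m == 0 || a->n == 0) return SKETCH_SUCCESS;

    /* Only now verify that there is a buffer to operate on. */
    sketch_err_t err = sketch_check_object_buffer(a);
    if (err != SKETCH_SUCCESS) return err;

    for (size_t i = 0; i < a->m * a->n; ++i) a->buf[i] *= alpha;
    return SKETCH_SUCCESS;
}
```

With this ordering, an empty object created without a buffer passes through as a no-op, while a non-empty object with a NULL buffer is rejected before any memory access.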
Details:
- Added a new field to blksz_t objects that allows one to attach a
sub-object. Doing this allows us to associate a register blocksize with
any given cache blocksize. That way, the register blocksize can be
queried wherever the cache blocksize would normally be accessible
(e.g. a blocked algorithm).
- Modified bli_gemm_cntl.c (and 4m/3m variants) so that the register
blocksizes are attached to the cache blocksizes after they are created.
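
The idea of attaching a sub-object can be sketched like this. The struct layout and function names below are hypothetical and simplified to a single blocksize value; the real blksz_t is not shown in this log and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified blocksize object. */
typedef struct sketch_blksz_s {
    size_t                 v;    /* the blocksize value */
    struct sketch_blksz_s *sub;  /* attached sub-object, or NULL if none */
} sketch_blksz_t;

/* Attach a register blocksize to a cache blocksize. */
void sketch_blksz_attach_sub(sketch_blksz_t *cache, sketch_blksz_t *reg)
{
    assert(cache != NULL);
    cache->sub = reg;
}

/* Query the register blocksize through the cache blocksize.
   Returns 0 if no sub-object has been attached. */
size_t sketch_blksz_sub_value(const sketch_blksz_t *cache)
{
    assert(cache != NULL);
    return cache->sub != NULL ? cache->sub->v : 0;
}
```

Code that only holds the cache blocksize (for example, a blocked algorithm holding MC) can then reach the associated register blocksize (MR) without it being passed separately.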
Details:
- Changed level-2 _cntl.c files so that the blocksizes for gemv are
imported and used, rather than blocksizes being declared locally.
- Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as
4m/3m variants).
- Removed test/old/test_blis2.c.
Removed barrier after unpackm in all level3 blocked variants
Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)
Moved the enabling of the tree barriers into bli_config.h
Fed the default MR and NR for double precision into bli_get_range instead of the number 8
Fixed up some stuff in the thread info free functions
Disabled threading for TRSM so that it actually works when threading environment variables are set
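
To illustrate why the value passed to bli_get_range matters, here is a rough sketch of splitting a dimension among threads in multiples of a block factor. It assumes the argument in question acts as such a block factor (e.g. MR or NR); the function name, signature, and splitting rule are hypothetical and are not the actual bli_get_range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: assign thread `tid` of `n_threads` a half-open
   range [*start, *end) of a dimension of length n, such that every range
   begins on a multiple of the block factor bf. */
void sketch_get_range(size_t tid, size_t n_threads, size_t n, size_t bf,
                      size_t *start, size_t *end)
{
    assert(n_threads > 0 && tid < n_threads && bf > 0);
    assert(start != NULL && end != NULL);

    size_t n_blocks = (n + bf - 1) / bf;   /* last block may be partial */
    size_t per      = n_blocks / n_threads;
    size_t extra    = n_blocks % n_threads; /* first `extra` threads get one more */

    size_t first = tid * per + (tid < extra ? tid : extra);
    size_t count = per + (tid < extra ? 1 : 0);

    size_t s = first * bf;
    size_t e = (first + count) * bf;

    /* Clamp to n so the final (partial) block does not overrun. */
    *start = s < n ? s : n;
    *end   = e < n ? e : n;
}
```

If bf were hard-coded to 8 while the kernel's register blocksize differed, thread boundaries could fall in the middle of a micro-panel, which is presumably what passing the real MR and NR avoids.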
Details:
- Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels.
(However, there may be other instances of similar misuse elsewhere in
BLIS.) Thanks to Jeff Hammond for reporting this issue.
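
For reference, a generic axpyv-style routine with restrict used as intended might look like the sketch below. The log does not say what the misuse in the bgq kernels was, so this is not a reconstruction of the fix; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: y := y + alpha * x.
   The restrict qualifiers are a promise made by the caller that x and y
   do not overlap, which lets the compiler reorder and vectorize the loads
   and stores. Calling this with overlapping x and y is undefined
   behavior. */
void sketch_daxpyv(size_t n, double alpha,
                   const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

The qualifier belongs only on pointers for which the no-aliasing promise actually holds for every caller; where x and y may legitimately refer to the same vector, it should be omitted.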
Details:
- Added initialization statements to various macros used in level 1m and
1m-like operations. I wasn't able to reproduce the reported behavior,
so hopefully this takes care of it. Thanks to Jeff Hammond for the
report.
This also fixed a bug where barriers in the blocked variants were inserted after the inner packing routines,
but not after the outer packing routines.
As a result, computation could begin before, for instance, the block of B had finished being packed.
Also enabled weighted partitioning for herk, trmm
Fixed bug where multiple threads would try to modify the same state in the internal level 3 functions
Correctly computed a_next and b_next for gemm, herk macrokernels
a_next and b_next point to the current micropanels in trmm
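
As a rough illustration of weighted partitioning, the sketch below splits the columns of a lower-triangular n x n matrix so that each thread receives a similar number of stored elements rather than a similar number of columns. This assumes the weighting is by area of the triangular region; the function name and the splitting rule are hypothetical and are not taken from bli_get_range_weighted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: first column assigned to thread `tid` when the
   columns of a lower-triangular n x n matrix are split by area.
   Column j holds n - j stored elements. Thread t works on columns
   [sketch_weighted_start(t), sketch_weighted_start(t + 1)), so
   tid == n_threads is allowed and yields n. */
size_t sketch_weighted_start(size_t tid, size_t n_threads, size_t n)
{
    assert(n_threads > 0 && tid <= n_threads);

    size_t total  = n * (n + 1) / 2;          /* elements in the triangle */
    size_t target = total * tid / n_threads;  /* area owed to earlier threads */

    size_t area = 0, j = 0;
    while (j < n && area < target) {
        area += n - j;                        /* add column j's elements */
        ++j;
    }
    return j;
}
```

For n = 8 and two threads, an even split by columns gives the threads 26 and 10 elements, whereas this rule splits at column 3 and gives 21 and 15.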