Details:
- Fixed a bug in the way that the variadic bli_cntx_set_l3_nat_ukrs()
function was defined. This function is meant to take a microkernel id,
microkernel datatype, microkernel address, and microkernel preference
as arguments, and is typically called within the bli_cntx_init_*()
function defined within a sub-configuration for initializing an
appropriate context. The problem is with the final argument: the
microkernel preference. These preferences are actually boolean values,
0 or 1 (encoded as FALSE or TRUE). Since the variadic function does
not give the compiler any type information for any variadic arguments,
they are "promoted" in the course of internal (macroized) processing
according to default argument promotion rules. Thus, integer literals
such as 0 and 1 become int and floating-point literals (such as 0.0 or
1.0) become double. Previous to this commit, we indicated to va_arg()
that the ukernel preference was a 'bool_t', which is a typedef of
int64_t on 64-bit systems. On systems where int is defined as 64 bits,
no problems manifest since int is the same size as the type we passed
in to va_arg(), but on systems where int is 32 bits, the ukernel
preference could be misinterpreted as a garbage value. (This was
observed on a modern armv8 system.) The fix was to interpret the
bool_t value as int and then immediately typecast it to and store it
as a bool_t. Special thanks to Devangi Parikh for helping track down
this issue, including deciphering the use of va_arg() and its
byzantine treatment of types.
- Added explicit typecasts for all invocations of va_arg() in
bli_cntx.c.
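
A minimal sketch of the va_arg() pattern described above, not the actual
bli_cntx.c code. The names ex_bool_t and ex_read_prefs() are hypothetical
stand-ins; the only assumption carried over from the commit is that the
storage type is a 64-bit integer while callers pass plain 0/1 values,
which arrive as int.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit boolean storage type. */
typedef int64_t ex_bool_t;

/* Hypothetical variadic reader: expects n_prefs trailing arguments,
   each passed by the caller as a plain int (for example 0 or 1). */
void ex_read_prefs( ex_bool_t* prefs_out, int n_prefs, ... )
{
    va_list args;
    int     i;

    assert( prefs_out != NULL );

    va_start( args, n_prefs );
    for ( i = 0; i < n_prefs; ++i )
    {
        /* Fetch using the promoted type (int), then cast explicitly to
           the wider storage type. Asking va_arg() for a 64-bit type here
           would read more bytes than the caller supplied wherever int
           is 32 bits. */
        prefs_out[ i ] = ( ex_bool_t )va_arg( args, int );
    }
    va_end( args );
}
```

The key point is that the type named in va_arg() must match the type the
argument has after default argument promotion, not the type the callee
eventually wants to store.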
Details:
- Fixed a memory leak in the global kernel structure that amounted to
56 bytes per configured architecture (of which only 18 are presently
supported by BLIS). The leak would only manifest if BLIS was
initialized and then finalized before the application terminated.
Thanks to Devangi Parikh for helping track down this leak.
Details:
- Previously, bli_finalize_once()--which, like bli_init_once(), was
implemented in terms of pthread_once()--was using the same
pthread_once_t control object being used by bli_init(), thus
guaranteeing that it would never be called as long as BLIS had already
been initialized. This could manifest as a rather large memory leak to
any application that attempted to finalize BLIS midway through its
execution (since BLIS reserves several megabytes of storage for
packing buffers per thread used). The fix entailed giving each
function its own pthread_once_t object. Thanks to Devangi Parikh for
helping track down this very quiet bug.
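
The fix can be illustrated with a small sketch in which each once-only
function owns its own pthread_once_t control object. This is not the BLIS
source; every name below (ex_init_once(), ex_finalize_once(), and the
counters) is hypothetical and exists only to show the pattern.

```c
#include <assert.h>
#include <pthread.h>

/* One control object per once-only action. Sharing a single object
   between the two would make the second pthread_once() call a no-op. */
static pthread_once_t ex_init_ctl     = PTHREAD_ONCE_INIT;
static pthread_once_t ex_finalize_ctl = PTHREAD_ONCE_INIT;

/* Counters used only to observe how often each routine ran. */
static int ex_init_calls     = 0;
static int ex_finalize_calls = 0;

static void ex_do_init( void )     { ++ex_init_calls; }
static void ex_do_finalize( void ) { ++ex_finalize_calls; }

void ex_init_once( void )
{
    int rc = pthread_once( &ex_init_ctl, ex_do_init );
    assert( rc == 0 );
    ( void )rc;
}

void ex_finalize_once( void )
{
    int rc = pthread_once( &ex_finalize_ctl, ex_do_finalize );
    assert( rc == 0 );
    ( void )rc;
}

int ex_get_init_calls( void )     { return ex_init_calls; }
int ex_get_finalize_calls( void ) { return ex_finalize_calls; }
```

With separate control objects, calling ex_init_once() any number of times
and then ex_finalize_once() runs each underlying routine exactly once.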
Details:
- Changed the cleanmk target to delete makefile fragments from their new
home in obj/$(CONFIG_NAME). The old definition worked only because of
a typo (REFERKN_PATH instead of REFKERN_PATH), and only in the
non-verbose (V != 1) case.
Details:
- Disable sandbox-related obj directory creation, directory mirroring,
and makefile fragment generation when a sandbox is not enabled.
- Prevent various duplicate actions by configure (such as those
mentioned above for sandboxes).
Details:
- The docs/ConfigurationHowTo.md document was written with examples that
did not yet contain the skx sub-configuration, but the previous commit
included bli_arch.c code copied and pasted from a recent commit that
does support skx. To keep things consistent, I've removed skx from the
recently-added ConfigurationHowTo.md code snippet.
Details:
- Added missing language directing the reader to modify the config_name
string array in bli_arch.c when adding a new sub-configuration. Thanks
to Devangi Parikh for reporting this missing section.
Details:
- Fixed some stale code that was preventing the -p option to configure
from working as expected (though the --prefix option was unaffected).
This bug was most likely introduced in 7e5648c (May 7 2018).
Thanks to Dave Love for reporting this issue.
Details:
- Redefined the 'test' make target in the top-level Makefile so that the
final result ("everything passed" or "at least one failure") is echoed
to stdout. Note that 'check' is unchanged, and thus is now effectively
a fast version of 'test'.
- Updated docs/BuildSystem.md to reflect the above change.
Details:
- Fixed a linker error that occurred when attempting to compile and link
the testsuite and/or BLAS test drivers after having configured BLIS to
only generate a shared library (no static library). The chosen
solution involved:
(1) adding the local library path, $(BASE_LIB_PATH), to the search
paths for the shared library via the link option
-Wl,-rpath,$(BASE_LIB_PATH); and
(2) adding a local symlink to $(BASE_LIB_PATH) that uses the .so major
version number so that ld would find the shared library at
execution time.
Thanks to Sajid Ali for reporting this issue, to Devin Matthews for
pointing out the need for the -rpath option, and to Devangi Parikh for
helping Sajid isolate the problem.
- Added #include <ctype.h> to bli_system.h to avoid a compiler warning
resulting from using toupper() from bli_string.c without a prototype.
Thanks again to Sajid Ali, whose build log revealed this compiler
warning.
- Added '*.so.*' to .gitignore.
- CREDITS file update.
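
As a small illustration of the toupper() item above: the function is
declared in <ctype.h>, so any file that calls it should see that header
first. The helper ex_str_to_upper() below is hypothetical and is not the
code in bli_string.c.

```c
#include <assert.h>
#include <ctype.h>   /* declares toupper() */
#include <stddef.h>

/* Hypothetical helper: converts a NUL-terminated string to uppercase
   in place. */
void ex_str_to_upper( char* s )
{
    assert( s != NULL );

    for ( ; *s != '\0'; ++s )
    {
        /* Cast to unsigned char first, since toupper() expects a value
           representable as unsigned char (or EOF). */
        *s = ( char )toupper( ( unsigned char )*s );
    }
}
```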
Details:
- Removed a stray/accidental redefinition of axpyv and scal2v function
types in frame/1d/bli_l1d_ft.h (probably a copy/paste leftover during
development).
Details:
- Updated older _ft kernel type suffixes used within penryn level-1v
and -1f kernels to use the newer _ker_ft suffix that was introduced
in 0175483. (Thank you Travis CI.)
Details:
- Previously, most object API functions (_oapi.c) used a function
chooser macro that would expand out to an if-elseif-elseif-else
conditional that used a num_t datatype to call the appropriate
type-specific API (_tapi.c). This always felt a little hackish, and
would get in the way somewhat of adding support for new num_t datatypes
in the future. So, I've replaced that functionality with code that
queries a function pointer that is then typecast appropriately. This
model of function calling was already pervasive for kernels queried
from the cntx_t structure. It was also already in use in various other
functions, such as macrokernels, and this commit simply extends that
pattern.
- The above change required many new files, mostly header files, that
define the function types (mostly _ft.h) for the queriable functions
as well as some source files to define the function pointer arrays and
their corresponding query functions (_fpa.c). Various other function
types, mostly for kernel function types, were renamed to reduce the
potential for confusion with the function types for expert and basic
(non-expert) typed API functions.
- Removed definitions for all of the "bli_call_ft_*()" function chooser
macros from bli_misc_macro_defs.h.
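
The query-and-cast model described in this entry might look roughly like
the following sketch. All names here (ex_dt_t, ex_scal_ft, ex_scal_qfp(),
and so on) are hypothetical and only two datatypes are shown; the real
_ft.h and _fpa.c files define their own types and signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical datatype id, standing in for num_t. */
typedef enum { EX_FLOAT = 0, EX_DOUBLE = 1, EX_NUM_DT = 2 } ex_dt_t;

/* Generic function pointer type used for storage in the array. */
typedef void ( *ex_void_fp )( void );

/* Function type shared by the type-specific implementations. */
typedef void ( *ex_scal_ft )( int n, const void* alpha, void* x );

void ex_sscal( int n, const void* alpha, void* x )
{
    const float a  = *( const float* )alpha;
    float*      xp = ( float* )x;
    int         i;
    for ( i = 0; i < n; ++i ) xp[ i ] *= a;
}

void ex_dscal( int n, const void* alpha, void* x )
{
    const double a  = *( const double* )alpha;
    double*      xp = ( double* )x;
    int          i;
    for ( i = 0; i < n; ++i ) xp[ i ] *= a;
}

/* Function pointer array, indexed by datatype id. */
static ex_void_fp ex_scal_fpa[ EX_NUM_DT ] =
{
    ( ex_void_fp )ex_sscal,
    ( ex_void_fp )ex_dscal
};

/* Query function: returns the type-erased pointer for a datatype. */
ex_void_fp ex_scal_qfp( ex_dt_t dt )
{
    assert( ( int )dt >= 0 && ( int )dt < ( int )EX_NUM_DT );
    return ex_scal_fpa[ dt ];
}

/* Object-level entry point: query, cast back to the real function
   type, and call. No if/else-if chain over datatypes is needed. */
void ex_scal_obj( ex_dt_t dt, int n, const void* alpha, void* x )
{
    ex_scal_ft f = ( ex_scal_ft )ex_scal_qfp( dt );
    f( n, alpha, x );
}
```

Under this scheme, supporting an additional datatype means adding one
entry to the array rather than another branch in every chooser.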
Details:
- Removed the dtime (delta time, or wallclock time) column from the
matlab output of all test drivers in test, test/3m4m, test/studies.
This value was rarely (if ever) really needed and usually only served
to take up screen space.
- Updated format specifier in test/studies/skx to use %7.2f instead of
%6.3f.
- For the test drivers in the 'test' directory, added an initial line of
output that sets the last entry of the matlab matrix to zero in order to
induce a pre-allocation of the entire array of performance results.
Details:
- Changed the format specifier for the gflops column in the testsuite
output from %7.3f to %7.2f. This was done mainly to keep the output
aligned properly when the expected performance exceeded 1000 gflops.
Also, two decimal places still conveys plenty of precision for all
practical applications, including just eyeballing performance deltas
between two executions (let alone two implementations).
- Changed the format specifier for gflops in the test/3m4m drivers
from %6.3f to %7.2f (for the same reasons listed above).
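
The alignment effect of the two specifiers can be checked with a short
sketch. The helper names below are hypothetical; they simply report how
many characters each format produces for a given value.

```c
#include <assert.h>
#include <stdio.h>

/* Number of characters produced by the old specifier, %7.3f. */
int ex_width_3dec( double gflops )
{
    char buf[ 64 ];
    int  n = snprintf( buf, sizeof( buf ), "%7.3f", gflops );
    assert( n >= 0 );
    return n;
}

/* Number of characters produced by the new specifier, %7.2f. */
int ex_width_2dec( double gflops )
{
    char buf[ 64 ];
    int  n = snprintf( buf, sizeof( buf ), "%7.2f", gflops );
    assert( n >= 0 );
    return n;
}
```

For a value such as 1234.567, %7.3f needs eight characters and overflows
the seven-character field, while %7.2f still fits in seven, so the column
stays aligned.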
Details:
- Fixed a bug in the static functions bli_cntx_set_[packm/unpackm]_ker_dt(),
which were incorrectly calling bli_cntx_get_[packm/unpackm]_ker_dt() to get
the corresponding func_t.
Details:
- Added links and sandbox language to README.md.
- Adjusted some comments in high-level level-3 object functions to make
clear what bli_thread_init_rntm() does.
Details:
- Updated the typed and object APIs to include language on the rntm_t
parameters in the expert interfaces.
- Updated README to include link to object API.
Details:
- Modified a few sections to take advantage of a feature of markdown
that allows a bullet or enumeration to have multiple paragraphs. This
is a trial run to make sure the indentation looks good when rendered
in a web browser.
Details:
- Consolidated typed API function prototypes in bli_l1v_tapi.h by
leveraging identical function signatures between operations.
- Removed 'restrict' keyword since it is not actually present in the
function definitions.
Details:
- Filled in remaining section on object creation function reference
of BLISObjectAPI.md. All object management functions demonstrated as
part of the example code in examples/oapi are now documented, as well
as some other functions that are not shown in the example code.
- Updated various links (mostly in function index) to correctly point to
the object API reference instead of the typed API reference.
- Added documentation for getijm and setijm.
Details:
- Added explicit typecasting to various functions (mostly static
functions), primarily those in bli_param_macro_defs.h,
bli_obj_macro_defs.h, bli_cntx.h, bli_cntl.h, and a few other header
files.
- This change was prompted by feedback from Jacob Gorm Hansen, who
reported that #including "blis.h" from his application caused
gcc to output error messages (relating to types being returned
mismatching the declared return types) when used via the C++ compiler
front-end. This is the first pass of fixes, and we may need to
iterate with additional follow-up commits (#233).
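
One common source of such errors, shown here as an assumption rather than
as the specific case reported, is a function that returns an enum type
but computes its result with integer arithmetic. C converts the integer
implicitly, whereas C++ requires a cast. The types and names below
(ex_trans_t, ex_obj_t, ex_obj_trans()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum whose values double as bit flags. */
typedef enum { EX_NO_TRANSPOSE = 0x0, EX_TRANSPOSE = 0x8 } ex_trans_t;

/* Hypothetical object carrying a packed info bitfield. */
typedef struct { unsigned int info; } ex_obj_t;

/* The bitwise AND yields an unsigned int. Returning it directly from a
   function declared to return ex_trans_t compiles as C but is rejected
   by a C++ compiler, so the result is cast explicitly. */
ex_trans_t ex_obj_trans( const ex_obj_t* obj )
{
    assert( obj != NULL );
    return ( ex_trans_t )( obj->info & 0x8u );
}
```

The explicit cast leaves behavior unchanged under a C compiler while
making the same header acceptable to a C++ front-end.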