Details:
- Textually replaced nearly all non-comment instances of bool_t with the
C99 bool type. A few remaining instances, such as those in the files
bli_herk_x_ker_var2.c, bli_trmm_xx_ker_var2.c, and
bli_trsm_xx_ker_var2.c, were promoted to dim_t since they were being
used not for boolean purposes but to index into an array.
- This commit constitutes the third phase of a transition toward using
C99's bool instead of bool_t, which was raised in issue #420. The first
phase, which cleaned up various typecasts in preparation for using
bool as the basis for bool_t (instead of gint_t), was implemented by
commit a69a4d7. The second phase, which redefined the bool_t typedef
in terms of bool (from gint_t), was implemented by commit 2c554c2.
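The distinction drawn in the first bullet (a true/false flag versus a value used as an array subscript) can be sketched as follows. This is an illustration only: `dim_t` is typedef'd locally as a stand-in for the BLIS type, and `is_upper()`, `pick_offset()`, and the `offsets` table are hypothetical names that do not come from the BLIS sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for BLIS's dimension type (assumption: some signed integer type). */
typedef ptrdiff_t dim_t;

/* Hypothetical predicate: a genuine yes/no answer, so C99 bool fits. */
static bool is_upper(char uplo)
{
    return uplo == 'u' || uplo == 'U';
}

/* Hypothetical helper: the value selects an array element, so it is
   declared as an index type (dim_t) rather than as a boolean. */
static int pick_offset(char uplo)
{
    static const int offsets[2] = { 10, 20 };
    dim_t idx = is_upper(uplo) ? 1 : 0;
    return offsets[idx];
}
```

Declaring `idx` as `dim_t` records that it is an index, even though it only ever takes the values 0 and 1.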
* Fix parsing in vpu_count on workstation SKX
* Document Skylake-X as Haswell for single FMA
* Update vpu_count for Skylake and Cascade Lake models
* Support printing the configuration selected, controlled by the environment.
Intended particularly for diagnosing mis-selection of SKX through an
unknown, or incorrect, number of VPUs.
* Move bli_log outside the cpp condition, and use it where intended
* Add Fixme comment (Skylake D)
* Mostly superficial edits to commits towards #351.
Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c
to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that
relates to single-VPU Skylake-Xs.
* Fix comment typos
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
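A minimal sketch of environment-controlled logging of the selected configuration, under stated assumptions: the variable name `DEMO_ARCH_DEBUG` and the functions `demo_logging_enabled()` and `demo_log_selection()` are hypothetical and are not the names used in bli_arch.c or bli_cpuid.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical environment variable name, chosen for illustration only. */
#define DEMO_DEBUG_ENV "DEMO_ARCH_DEBUG"

/* Returns 1 if the variable is set to something other than "" or "0...". */
int demo_logging_enabled(void)
{
    const char* v = getenv(DEMO_DEBUG_ENV);
    return v != NULL && v[0] != '\0' && v[0] != '0';
}

/* Prints the selected sub-configuration to stderr when logging is enabled.
   Returns 1 if a message was printed and 0 otherwise. */
int demo_log_selection(const char* config_name)
{
    if (!demo_logging_enabled()) return 0;
    fprintf(stderr, "selected sub-configuration: %s\n", config_name);
    return 1;
}
```

Keeping the check in one small function means the message can be emitted from any detection path without repeating the getenv() logic.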
Details:
- Added export annotations to additional function prototypes in order to
accommodate the testsuite.
- Disabled calling bli_amaxv_check() from within the testsuite's
test_amaxv.c.
Details:
- After merging PR #303, at Isuru's request, I removed the use of
BLIS_EXPORT_BLIS from all function prototypes *except* those that we
potentially wish to be exported in shared/dynamic libraries. In other
words, I removed the use of BLIS_EXPORT_BLIS from all prototypes of
functions that can be considered private or for internal use only.
This is likely the last big modification along the path towards
implementing the functionality spelled out in issue #248. Thanks
again to Isuru Fernando for his initial efforts of sprinkling the
export macros throughout BLIS, which made removing them where
necessary relatively painless. Also, I'd like to thank Tony Kelman,
Nathaniel Smith, Ian Henriksen, Marat Dukhan, and Matthew Brett for
participating in the initial discussion in issue #37 that was later
summarized and restated in issue #248.
- CREDITS file update.
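The pattern of annotating only the public prototypes can be sketched as below. The macro `MYLIB_EXPORT` and both functions are hypothetical; the actual definition of BLIS_EXPORT_BLIS is not shown in this log and may differ. The sketch also assumes a GCC-compatible compiler building with hidden default visibility (for example, -fvisibility=hidden), since otherwise unannotated symbols are exported anyway.

```c
/* Hypothetical export macro; not the BLIS definition. */
#if defined(_WIN32)
  #define MYLIB_EXPORT __declspec(dllexport)
#elif defined(__GNUC__)
  #define MYLIB_EXPORT __attribute__((visibility("default")))
#else
  #define MYLIB_EXPORT
#endif

/* Public API: annotated so it stays visible in the shared library. */
MYLIB_EXPORT int mylib_public_add(int a, int b);

/* Internal helper: no annotation, so it can remain hidden. */
int mylib_private_scale(int x);

int mylib_private_scale(int x)
{
    return 2 * x;
}

MYLIB_EXPORT int mylib_public_add(int a, int b)
{
    return mylib_private_scale(a) + b;
}
```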
Details:
- Removed explicit reference to The University of Texas at Austin in the
third clause of the license comment blocks of all relevant files and
replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
with format of all other comment blocks.
Details:
- Removed four trailing spaces after "BLIS" that occur in most files'
commented-out license headers.
- Added UT copyright lines to some files. (These files previously had
only AMD copyright lines but were contributed to by both UT and AMD.)
- In some files' copyright lines, expanded 'The University of Texas' to
'The University of Texas at Austin'.
- Fixed various typos/misspellings in some license headers.
Details:
- Added logic to bli_arch.c that will call what was previously the body
of bli_arch_query_id() only once and then cache the value in a static
variable local to the file. (Previously, the arch_t associated with
the hardware/configuration was queried every time bli_arch_query_id()
was called, which was at least once per level-3 function call.) Thanks
to Devin Matthews for suggesting this feature via issue #175.
- Added -lpthread to the command line that compiles and links
build/detect/config/config_detect.c, which prints the string
identifying the detected configuration, since it is now needed due to
the new pthread_once() logic in bli_arch.c.
- Implementation note: I chose to implement this arch_t caching feature
via pthread_once(), using a separate pthread_once_t variable local to
the file, rather than calling bli_init_once(). The reason is that I
did not want to require bli_init() as a prerequisite to this function.
bli_init() already calls several sub-components, some of which make use
of bli_arch_query_id(), and therefore it would be easy to fall into a
circular self-init situation (which usually causes pthreads to hang
indefinitely).
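The caching approach described in the implementation note can be sketched as follows. This is not the code from bli_arch.c: the enum, the variable names, and the placeholder detection body are all hypothetical, and `detect_calls` exists only to make the run-once behavior observable.

```c
#include <pthread.h>

/* Hypothetical stand-in for arch_t. */
typedef enum { ARCH_GENERIC = 0, ARCH_HASWELL, ARCH_SKX } arch_id_t;

/* File-local cache plus its own once-guard, independent of any
   library-wide initialization. */
static arch_id_t      cached_id;
static pthread_once_t once_guard   = PTHREAD_ONCE_INIT;
static int            detect_calls = 0;

/* Runs at most once; stands in for the former body of the query function. */
static void detect_arch_once(void)
{
    detect_calls++;
    cached_id = ARCH_HASWELL; /* placeholder for real hardware detection */
}

/* Safe to call from any thread, any number of times, with no prior init. */
arch_id_t query_arch_id(void)
{
    pthread_once(&once_guard, detect_arch_once);
    return cached_id;
}
```

Because the guard is private to the file, the query function does not depend on a global initializer having run, which avoids the circular-initialization hazard described above. On some platforms the program must be linked with -lpthread.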
Details:
- Reimplemented the hardware detection functionality invoked when running
"./configure auto". Previously, detection was performed by a standalone
script in build/auto-detect that used CPUID. However, the script
attempted to enumerate all models for each microarchitecture supported.
The new approach recycles
the same code used for runtime hardware detection introduced in 2c51356.
This has two immediate benefits. First, it reduces and consolidates the
code required to detect microarchitectures via the CPUID instruction.
Second, it provides an indirect way of testing at configure-time the
code that is used to detect hardware at runtime. This code is (a) only
activated when targeting a configuration family (such as intel64 or
amd64) at configure-time and (b) somewhat difficult to test in
practice, since it relies on having access to older microarchitectures.
- The above change required placing conditional cpp macro blocks in
bli_arch.c and bli_cpuid.c which either #include "blis.h" or #include
a bare-bones set of headers that does not rely on the presence of a
bli_config.h header. This is needed because bli_config.h has not been
created yet when configure-time auto-detection takes place.
- Defined a new function in bli_arch.c, bli_arch_string(), which takes
an arch_t id and returns a pointer to a string that contains the
lowercase name of the corresponding microarchitecture. This function
is used by the auto-detection script to printf() the name of the
sub-configuration corresponding to the detected hardware.
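A function of the kind described in the last bullet might look roughly like this. The enum values, the name table, and the function names are hypothetical; the real arch_t and bli_arch_string() are defined in the BLIS sources and may be organized differently.

```c
#include <stdio.h>

/* Hypothetical stand-in for arch_t; the real enum has different members. */
typedef enum
{
    ARCH_SKX = 0,
    ARCH_HASWELL,
    ARCH_ZEN,
    ARCH_GENERIC,
    NUM_ARCHS
} arch_id_t;

/* Lowercase names, indexed by the enum value. */
static const char* arch_names[NUM_ARCHS] =
{
    "skx", "haswell", "zen", "generic"
};

/* Maps an id to its lowercase name; out-of-range ids yield "unknown". */
const char* arch_string(arch_id_t id)
{
    if ((int)id < 0 || (int)id >= (int)NUM_ARCHS) return "unknown";
    return arch_names[id];
}

/* What a configure-time detection program could do with the result. */
void print_detected(arch_id_t id)
{
    printf("%s\n", arch_string(id));
}
```

Printing only the bare name keeps the output easy for a configure script to capture and use as the sub-configuration name.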
Details:
- Added a "generic" configuration that leaves the default blocksizes and
kernels unchanged. This replaces the older "reference" configuration.
Updated auto-detect script and code accordingly.
- Added support for generic configuration to arch_t (bli_type_defs.h),
bli_gks_init() (bli_gks.c), and bli_arch_config.h.
- Moved bli_arch_query_id() to bli_arch.c (and prototype to bli_arch.h).
- Whitespace changes to configurations' make_defs.mk files.