Details:
- Removed support for all induced methods except for 1m. This included
removing code related to 3mh, 3m1, 4mh, 4m1a, and 4m1b as well as any
code that existed only to support those implementations. These
implementations were rarely used and posed code maintenance challenges
for BLIS's maintainers going forward.
- Removed reference kernels for packm that pack 3m and 4m micropanels,
and removed 3m/4m-related code from bli_cntx_ref.c.
- Removed support for 3m/4m from the code in frame/ind, then reorganized
and streamlined the remaining code in that directory. The *ind(),
*nat(), and *1m() APIs were all removed. (These additional API layers
no longer made as much sense with only one induced method (1m) being
supported.) The bli_ind.c file (and its header) was moved to frame/base,
and bli_l3_ind.c (and its header) and bli_l3_ind_tapi.h were moved to
frame/3.
- Removed 3m/4m support from the code in frame/1m/packm.
- Removed 3m/4m support from trmm/trsm macrokernels and simplified some
pointer arithmetic that was previously expressed in terms of the
bli_ptr_inc_by_frac() static inline function (whose definition was
also removed).
- Removed the following subdirectories of level-0 macro headers from
frame/include/level0: ri3, rih, ri, ro, rpi. The level-0 scalar macros
defined in these directories were used exclusively by the 3m and 4m
method code.
- Simplified bli_cntx_set_blkszs() and bli_cntx_set_ind_blkszs() in
light of 1m being the only induced method left within BLIS.
- Removed dt_on_output field within auxinfo_t and its associated
accessor functions.
- Re-indexed the 1e/1r pack schemas after removing those associated with
variants of the 3m and 4m methods. This leaves two bits unused within
the pack format portion of the schema bitfield. (See bli_type_defs.h
for more info.)
- Spun off the basic and expert interfaces to the object and typed APIs
into separate files: bli_l3_oapi.c and bli_l3_oapi_ex.c; bli_l3_tapi.c
and bli_l3_tapi_ex.c.
- Moved the level-3 operation-specific _check function calls from the
operations' _front() functions to the corresponding _ex() function of
the object API. (This change roughly maintains where the _check()
functions are called in the call stack but lays the groundwork for
future changes that may come to the level-3 object APIs.) Made minor
modifications to bli_l3_check.c to allow the _check() functions to be
called from the expert interface APIs.
- Removed support within the testsuite for testing the aforementioned
induced methods, and updated the standalone test drivers in the 'test'
directory to reflect the retirement of those induced methods.
- Modified the sandbox contract so that the user is obliged to define
bli_gemm_ex() instead of bli_gemmnat(). (This change was made in light
of the *nat() functions no longer existing.) Also updated the existing
'power10' and 'gemmlike' sandboxes to come into compliance with the
new sandbox rules.
- Updated the BLISObjectAPI.md, BLISTypedAPI.md, and Testsuite.md documentation
to reflect the retirement of 3m/4m, and also modified Sandboxes.md to
bring the document into alignment with new conventions.
- Updated various comments; removed segments of commented-out code.
Details:
- Updated FAQ.md to include two new questions, reordered an existing
question, and also removed an outdated and redundant question about
BLIS vs. AMD BLIS.
- Updated Sandboxes.md to use 'gemmlike' as its main example, along with
other smaller details.
- Added ARM as a funder to README.md.
Details:
- Added language to remind the reader to disable sup if the intended
behavior is for the sandbox implementation to handle all problem
sizes, even the smaller ones that would normally be handled by the
sup code path.
Details:
- The 'ref99' sandbox was broken by multiple refactorings and internal
API changes over the last two years. Rather than try to fix it, I've
replaced it with a much simpler version based on var2 of gemmsup.
Why not fix the previous implementation? It occurred to me that the
old implementation was trying to be a lightly simplified duplication
of what exists in the framework. Duplication aside, this sandbox
would have worked fine if it had been completely independent of the
framework code. The problem was that it was only partially
independent, with many call sites invoking a function in BLIS
rather than a duplicated/simplified version within the sandbox. (And
the reason I didn't make it fully independent to begin with was that
it seemed unnecessarily duplicative at the time.) Maintaining two
versions of the same implementation is problematic for obvious
reasons, especially when it wasn't even done properly to begin with.
This explains the reimplementation in this commit. The only catch is
that the newer implementation is single-threaded only and does not
perform any packing on either input matrix (A or B). Basically, it's
only meant to be a simple placeholder that shows how you could plug
in your own implementation. Thanks to Francisco Igual for reporting
this brokenness.
- Updated the three reference gemmsup kernels (defined in
ref_kernels/3/bli_gemmsup_ref.c) so that they properly handle
conjugation of A and/or B as requested via conja and/or conjb. The general storage kernel, which
is currently identical to the column-storage kernel, is used in the
new ref99 sandbox to provide basic support for all datatypes
(including scomplex and dcomplex).
- Minor updates to docs/Sandboxes.md, including adding the threading
and packing limitations to the Caveats section.
- Fixed a comment typo in bli_l3_sup_var1n2m.c (upon which the new
sandbox implementation is based).
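To illustrate what "properly handling conjugation via conja and/or conjb" means for a reference kernel that also does no packing and runs single-threaded, here is a minimal standalone sketch. It is not BLIS code: the name zgemm_ref_sketch is hypothetical, it uses C99 double complex in place of BLIS's dcomplex type, and its parameter list is only loosely modeled on the reference gemmsup kernels (the real kernels also take auxinfo_t and cntx_t arguments). Because every operand carries both a row stride and a column stride, the same loop nest covers row-, column-, and general-stored matrices.

```c
#include <complex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical reference kernel (not the BLIS API):
 *   C := beta * C + alpha * op(A) * op(B)
 * where op(X) is conj(X) when the corresponding flag is set, else X.
 * A is m x k, B is k x n, C is m x n. Each operand has its own row
 * stride (rs_*) and column stride (cs_*). No packing, no threading.
 */
static void zgemm_ref_sketch(bool conja, bool conjb,
                             ptrdiff_t m, ptrdiff_t n, ptrdiff_t k,
                             double complex alpha,
                             const double complex *a, ptrdiff_t rs_a, ptrdiff_t cs_a,
                             const double complex *b, ptrdiff_t rs_b, ptrdiff_t cs_b,
                             double complex beta,
                             double complex *c, ptrdiff_t rs_c, ptrdiff_t cs_c)
{
    for (ptrdiff_t i = 0; i < m; ++i) {
        for (ptrdiff_t j = 0; j < n; ++j) {
            /* Dot product of row i of op(A) with column j of op(B). */
            double complex ab = 0.0;
            for (ptrdiff_t l = 0; l < k; ++l) {
                double complex aval = a[i * rs_a + l * cs_a];
                double complex bval = b[l * rs_b + j * cs_b];
                /* Conjugate each operand only if its flag requests it. */
                if (conja) aval = conj(aval);
                if (conjb) bval = conj(bval);
                ab += aval * bval;
            }
            double complex *cij = &c[i * rs_c + j * cs_c];
            /* BLAS convention: beta == 0 overwrites C without reading it. */
            if (beta == 0.0)
                *cij = alpha * ab;
            else
                *cij = beta * (*cij) + alpha * ab;
        }
    }
}
```

For example, with 1x1 operands a = 1+2i and b = 3+4i, setting only conja computes (1-2i)(3+4i) = 11-2i, whereas leaving both flags clear computes (1+2i)(3+4i) = -5+10i.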