mirror of
https://github.com/amd/blis.git
synced 2026-05-11 09:39:59 +00:00
Details:
- Retrofitted a new data structure, known as a context, into virtually
all internal APIs for computational operations in BLIS. The structure
is now present within the type-aware APIs, as well as many supporting
utility functions that require information stored in the context. User-
level object APIs were unaffected and continue to be "context-free";
however, these APIs were duplicated/mirrored so that "context-aware"
APIs now also exist, differentiated with an "_ex" suffix (for "expert").
These new context-aware object APIs (along with the lower-level, type-
aware, BLAS-like APIs) take the address of a context as a last
parameter, after all other operands. Contexts, or specifically, cntx_t
object pointers, are passed all the way down the function stack into
the kernels and allow the code at any level to query information about
the runtime, such as kernel addresses and blocksizes, in a thread-
friendly manner--that is, one that allows thread-safety, even if the
original source of the information stored in the context changes at
run-time (see next bullet for more on this "original source" of info).
(Special thanks go to Lee Killough for suggesting the use of this kind
of data structure in discussions that transpired during the early
planning stages of BLIS, and also for suggesting such a perfectly
appropriate name.)
- Added a new API, in frame/base/bli_gks.c, to define a "global kernel
structure" (gks). This data structure and API will allow the caller to
initialize a context with the kernel addresses, blocksizes, and other
information associated with the currently active kernel configuration.
The currently active kernel configuration within the gks cannot be
changed (for now), and is initialized with the traditional cpp macros
that define kernel function names, blocksizes, and the like. However,
in the future, the gks API will be expanded to allow runtime management
of kernels and runtime parameters. The most obvious application of this
new infrastructure is the runtime detection of hardware (and the
implied selection of appropriate kernels). With contexts in place,
kernels may even be "hot swapped" at runtime within the gks. Once
execution enters a level-3 _front() function, the memory allocator will
be reinitialized on-the-fly, if necessary, to accommodate the new
kernels' blocksizes. If another application thread is executing with
another (previously loaded) kernel, it will finish in a deterministic
fashion because its kernel information was loaded into its context
before computation began, and also because the blocks it checked out
from the internal memory pools will be unaffected by the newer threads'
reinitialization of the allocator.
- Reorganized and streamlined the 'ind' directory, which contains much of
the code enabling use of induced methods for complex domain matrix
multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
those APIs' functionality is now mostly subsumed within the global
kernel structure.
- Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
that will reinitialize a memory pool if the necessary pool block size
has increased.
- Updated bli_mem.c to use bli_pool_reinit_if() instead of
bli_pool_reinit() in the definition of bli_mem_pool_init(), and placed
usage of contexts where appropriate to communicate cache and register
blocksizes to bli_mem_compute_pool_block_sizes().
- Simplified control trees now that much of the information resides in
the context and/or the global kernel structure:
- Removed blocksize object pointer (blksz_t*) fields from all control
tree node definitions and replaced them with blocksize id (bszid_t)
values instead, which may be passed into a context query routine in
order to extract the corresponding blocksize from the given context.
- Removed micro-kernel function pointer (func_t*) fields from all
control tree node definitions. Now, any code that needs these function
pointers can query them from the local context, as identified by a
level-3 micro-kernel id (l3ukr_t), level-1f kernel id (l1fkr_t), or
level-1v kernel id (l1vkr_t).
- Removed blksz_t object creation and initialization, as well as kernel
function object creation and initialization, from all operation-
specific control tree initialization files (bli_*_cntl.c), since this
information will now live in the gks and, secondarily, in the context.
- Removed blocksize multiples from blksz_t objects. Now, we track
blocksize multiples for each blocksize id (bszid_t) in the context
object.
- Removed the bool_t's that were required when a func_t was initialized.
These bools are meant to allow one to track the micro-kernel's storage
preferences (by rows or columns). This preference is now tracked
separately within the gks and contexts.
- Merged and reorganized many separate-but-related functions into single
files. This reorganization affects frame/0, 1, 1d, 1m, 1f, 2, 3, and
util directories, but has the most obvious effect of allowing BLIS
to compile noticeably faster.
- Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
in an attempt to reduce overhead for memory-bound operations. This
includes removal of default use of object-based variants for level-2
operations. Now, by default, level-2 operations will directly call a
low-level (non-object based) loop over a level-1v or -1f kernel.
- Converted many common query functions in bli_blksz.c (renamed from
bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
respective header files.
- Defined bli_mbool.c API to create and query "multi-bools", or
heterogeneous bool_t's (one for each floating-point datatype), in the
same spirit as blksz_t and func_t.
- Introduced two key parameters of the hardware: BLIS_SIMD_NUM_REGISTERS
and BLIS_SIMD_SIZE. These values are needed in order to compute a third
new parameter, which may be set indirectly via the aforementioned
macros or directly: BLIS_STACK_BUF_MAX_SIZE. This value is used to
statically allocate memory in macro-kernels and the induced methods'
virtual kernels to be used as temporary space to hold a single
micro-tile. These values are now output by the testsuite. The default
value of BLIS_STACK_BUF_MAX_SIZE is computed as
"2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
- Cleaned up top-level 'kernels' directory (for example, renaming the
embarrassingly misleading "avx" and "avx2" directories to "sandybridge"
and "haswell," respectively) and gave more consistent and meaningful
names to many kernel files (as well as updating their interfaces to
conform to the new context-aware kernel APIs).
- Updated the testsuite to query blocksizes from a locally-initialized
context for test modules that need those values: axpyf, dotxf,
dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
- Reformatted many function signatures into a standard format that will
more easily facilitate future API-wide changes.
- Updated many "mxn" level-0 macros (i.e., those used to inline double loops
for level-1m-like operations on small matrices) in frame/include/level0
to use more obscure local variable names in an effort to avoid variable
shadowing. (Thanks to Devin Matthews for pointing out these gcc warnings,
which are only output using -Wshadow.)
- Added a conj argument to setm, so that its interface now mirrors that
of scalm. The semantic meaning of the conj argument is to optionally
allow implicit conjugation of the scalar prior to being populated into
the object.
- Deprecated all type-aware mixed domain and mixed precision APIs. Note
that this does not preclude supporting mixed types via the object APIs,
where it produces absolutely zero API code bloat.
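
The snapshot behavior described in the first two bullets (a context is filled from the gks before computation begins, so later changes to the gks do not disturb work already in flight) can be illustrated with a minimal sketch. It is written in Python rather than C for brevity and is not BLIS code: the names Context, GlobalKernelStructure, init_context, swap, and run_operation are hypothetical, the blocksize numbers are made up, and the swap step models the "hot swap" that the commit message describes only as a future possibility.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Context:
    """Read-only snapshot of a kernel configuration (hypothetical stand-in for cntx_t)."""
    kernel_name: str
    blocksizes: MappingProxyType


class GlobalKernelStructure:
    """Hypothetical stand-in for the gks: holds the currently active configuration."""

    def __init__(self, kernel_name, blocksizes):
        self._kernel_name = kernel_name
        self._blocksizes = dict(blocksizes)

    def init_context(self):
        # Copy the active configuration so that later changes to the gks
        # cannot alter a context that has already been handed out.
        return Context(self._kernel_name, MappingProxyType(dict(self._blocksizes)))

    def swap(self, kernel_name, blocksizes):
        # Replace the active configuration; existing contexts are untouched.
        self._kernel_name = kernel_name
        self._blocksizes = dict(blocksizes)


def run_operation(cntx):
    # Code at any level queries the context it was given, never the gks.
    return f"{cntx.kernel_name}: MC={cntx.blocksizes['MC']}"


# Illustrative values only; they are not real BLIS blocksizes.
gks = GlobalKernelStructure("sandybridge", {"MC": 96, "KC": 256})
old_cntx = gks.init_context()                 # taken before the swap
gks.swap("haswell", {"MC": 144, "KC": 256})
new_cntx = gks.init_context()                 # taken after the swap
```

In this sketch, run_operation(old_cntx) keeps reporting the configuration that was active when old_cntx was created, which is the property the commit message relies on for deterministic completion of already-running threads.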
344 lines
10 KiB
Plaintext
# --------------------------------------------------------------------------
#
#  input.operations
#  BLIS test suite
#
#  This file contains input values that control which BLIS operations are
#  tested as well as how those test runs are parameterized. We will now
#  describe how each section or line type may be edited.
#
#  ENABLING/DISABLING ENTIRE SECTIONS
#    The values in the "Section overrides" section allow you to disable
#    all operations in a given "level". Enabling a level here by itself
#    does not enable every operation in that level; it simply means that
#    the individual switches for each operation (in that level) determine
#    whether or not the tests are executed. Use 1 to enable a section, or
#    0 to disable.
#
#  ENABLING/DISABLING INDIVIDUAL OPERATION TESTS
#    Given that an operation's section override switch is set to 1
#    (enabled), whether or not that operation will get tested is determined
#    by its local switch. For example, if the level-1v section override is
#    set to 1, and there is a 1 on the line marked "addv", then the addv
#    operation will be tested. Similarly, a 0 would cause addv to not be
#    tested. NOTE: You may ignore the lines marked "test sequential
#    front-end." These lines are for future use, to distinguish tests of
#    the sequential implementation from tests of the multithreaded
#    implementation. For now, BLIS does not contain separate APIs for
#    multithreaded execution, even though multithreading is supported.
#    So, these should be left set to 1.
#
#  CHANGING PROBLEM SIZE/SHAPES TESTED
#    The problem sizes tested by an operation are determined by the
#    dimension specifiers on the line marked "dimensions: <spec_labels>".
#    If, for example, <spec_labels> contains two dimension labels (e.g.
#    "m n"), then the line should begin with two dimension specifiers.
#    Dimension specifiers of -1 cause the corresponding dimension to be
#    bound to the problem size, which is determined by values set in
#    input.general. Positive values cause the corresponding dimension to
#    be fixed to that value and held constant.
#
#    Examples of dimension specifiers (where the dimensions are m and n):
#
#       -1 -1     Dimensions m and n grow with problem size (resulting in
#                 square matrices).
#       -1 150    Dimension m grows with problem size and n is fixed at
#                 150.
#       -1 -2     Dimension m grows with problem size and n grows
#                 proportional to half the problem size.
#
#  CHANGING PARAMETER COMBINATIONS TESTED
#    The parameter combinations tested by an operation are determined by
#    the parameter specifier characters on the line marked "parameters:
#    <param_labels>". If, for example, <param_labels> contains two
#    parameter labels (e.g. "transa conjx"), then the line should contain
#    two parameter specifier characters. The '?' specifier character
#    serves as a wildcard--it causes all possible values of that parameter
#    to be tested. A character such as 'n' or 't' causes only that value
#    to be tested.
#
#    Examples of parameter specifiers (where the parameters are transa
#    and conjx):
#
#       ??        All combinations of the transa and conjx parameters are
#                 tested: nn, nc, tn, tc, cn, cc, hn, hc.
#       ?n        conjx is fixed to "no conjugate" but transa is allowed
#                 to vary: nn, tn, cn, hn.
#       hc        Only the case where transa is "Hermitian-transpose" and
#                 conjx is "conjugate" is tested.
#
#    Here is a full list of the parameter types used by the various BLIS
#    operations along with their possible character encodings:
#
#       side:   l,r       left, right
#       uplo:   l,u       lower, upper
#       trans:  n,t,c,h   no transpose, transpose, conjugate, Hermitian-
#                         transpose (i.e. conjugate-transpose)
#       conj:   n,c       no conjugate, conjugate
#       diag:   n,u       non-unit diagonal, unit diagonal
#

# --- Section overrides ----------------------------------------------------

1        # Utility
1        # Level-1v kernels
1        # Level-1m
1        # Level-1f kernels
1        # Level-2
1        # Level-3 micro-kernels
1        # Level-3


# --- Utility --------------------------------------------------------------

1        # randv
1        #   test sequential front-end
-1       #   dimensions: m

1        # randm
1        #   test sequential front-end
-1 -1    #   dimensions: m n


# --- Level-1v -------------------------------------------------------------

1        # addv
1        #   test sequential front-end
-1       #   dimensions: m
?        #   parameters: conjx

1        # axpyv
1        #   test sequential front-end
-1       #   dimensions: m
?        #   parameters: conjx

1        # copyv
1        #   test sequential front-end
-1       #   dimensions: m
?        #   parameters: conjx

1        # dotv
1        #   test sequential front-end
-1       #   dimensions: m
??       #   parameters: conjx conjy

1        # dotxv
1        #   test sequential front-end
-1       #   dimensions: m
??       #   parameters: conjx conjy

1        # normfv
1        #   test sequential front-end
-1       #   dimensions: m

1        # scalv
1        #   test sequential front-end
-1       #   dimensions: m
?        #   parameters: conjbeta

1        # scal2v
1        #   test sequential front-end
-1       #   dimensions: m
?        #   parameters: conjx

1        # setv
1        #   test sequential front-end
-1       #   dimensions: m

1        # subv
1        #   test sequential front-end
-1       #   dimensions: m
?        #   parameters: conjx


# --- Level-1m -------------------------------------------------------------

1        # addm
1        #   test sequential front-end
-1 -2    #   dimensions: m n
?        #   parameters: transa

1        # axpym
1        #   test sequential front-end
-1 -1    #   dimensions: m n
?        #   parameters: transa

1        # copym
1        #   test sequential front-end
-1 -2    #   dimensions: m n
?        #   parameters: transa

1        # normfm
1        #   test sequential front-end
-1 -2    #   dimensions: m n

1        # scalm
1        #   test sequential front-end
-1 -2    #   dimensions: m n
?        #   parameters: conjbeta

1        # scal2m
1        #   test sequential front-end
-1 -2    #   dimensions: m n
?        #   parameters: transa

1        # setm
1        #   test sequential front-end
-1 -2    #   dimensions: m n

1        # subm
1        #   test sequential front-end
-1 -2    #   dimensions: m n
?        #   parameters: transa


# --- Level-1f kernels -----------------------------------------------------

1        # axpy2v
1        #   test sequential front-end
-1       #   dimensions: m
??       #   parameters: conjx conjy

1        # dotaxpyv
1        #   test sequential front-end
-1       #   dimensions: m
???      #   parameters: conjxt conjx conjy

1        # axpyf
1        #   test sequential front-end
-1       #   dimensions: m
??       #   parameters: conja conjx

1        # dotxf
1        #   test sequential front-end
-1       #   dimensions: m
??       #   parameters: conjat conjx

1        # dotxaxpyf
1        #   test sequential front-end
-1       #   dimensions: m
????     #   parameters: conjat conja conjw conjx


# --- Level-2 --------------------------------------------------------------

1        # gemv
1        #   test sequential front-end
-1 -2    #   dimensions: m n
??       #   parameters: transa conjx

1        # ger
1        #   test sequential front-end
-1 -2    #   dimensions: m n
??       #   parameters: conjx conjy

1        # hemv
1        #   test sequential front-end
-1       #   dimensions: m
???      #   parameters: uploa conja conjx

1        # her
1        #   test sequential front-end
-1       #   dimensions: m
??       #   parameters: uploc conjx

1        # her2
1        #   test sequential front-end
-1       #   dimensions: m
???      #   parameters: uploc conjx conjy

1        # symv
1        #   test sequential front-end
-1       #   dimensions: m
???      #   parameters: uploa conja conjx

1        # syr
1        #   test sequential front-end
-1       #   dimensions: m
??       #   parameters: uploc conjx

1        # syr2
1        #   test sequential front-end
-1       #   dimensions: m
???      #   parameters: uploc conjx conjy

1        # trmv
1        #   test sequential front-end
-1       #   dimensions: m
???      #   parameters: uploa transa diaga

1        # trsv
1        #   test sequential front-end
-1       #   dimensions: m
???      #   parameters: uploa transa diaga


# --- Level-3 micro-kernels ------------------------------------------------

1        # gemm
1        #   test sequential micro-kernel
-1       #   dimensions: k

1        # trsm
1        #   test sequential micro-kernel
?        #   parameters: uploa

1        # gemmtrsm
1        #   test sequential micro-kernel
-1       #   dimensions: k
?        #   parameters: uploa


# --- Level-3 --------------------------------------------------------------

1        # gemm
1        #   test sequential front-end
-1 -1 -1 #   dimensions: m n k
??       #   parameters: transa transb

1        # hemm
1        #   test sequential front-end
-1 -1    #   dimensions: m n
????     #   parameters: side uploa conja transb

1        # herk
1        #   test sequential front-end
-1 -1    #   dimensions: m k
??       #   parameters: uploc transa

1        # her2k
1        #   test sequential front-end
-1 -1    #   dimensions: m k
???      #   parameters: uploc transa transb

1        # symm
1        #   test sequential front-end
-1 -1    #   dimensions: m n
????     #   parameters: side uploa conja transb

1        # syrk
1        #   test sequential front-end
-1 -1    #   dimensions: m k
??       #   parameters: uploc transa

1        # syr2k
1        #   test sequential front-end
-1 -1    #   dimensions: m k
???      #   parameters: uploc transa transb

1        # trmm
1        #   test sequential front-end
-1 -1    #   dimensions: m n
????     #   parameters: side uploa transa diaga

1        # trmm3
1        #   test sequential front-end
-1 -1    #   dimensions: m n
????n    #   parameters: side uploa transa diaga transb

1        # trsm
1        #   test sequential front-end
-1 -1    #   dimensions: m n
????     #   parameters: side uploa transa diaga
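
The dimension specifier rules described in the file's header comments (-1 binds a dimension to the problem size, a positive value fixes it, -2 gives half the problem size) can be sketched as follows. This is an illustrative Python reading of those comments, not the testsuite's actual C code. The helper names parse_dim_line and resolve_dims are hypothetical, and treating a general -k as "problem size divided by k" with integer division is an assumption extrapolated from the -1 and -2 examples.

```python
def parse_dim_line(line):
    """Return the integer dimension specifiers that precede the '#' comment."""
    values, _, _ = line.partition("#")
    return [int(tok) for tok in values.split()]


def resolve_dims(specs, problem_size):
    """Map dimension specifiers to concrete dimensions for one problem size."""
    dims = []
    for spec in specs:
        if spec > 0:
            # Positive specifier: the dimension is fixed at that value.
            dims.append(spec)
        elif spec < 0:
            # Negative specifier: -1 tracks the problem size, -2 half of it.
            # Generalizing to -k and using integer division are assumptions.
            dims.append(problem_size // abs(spec))
        else:
            raise ValueError("a specifier of 0 is not described in the file")
    return dims


specs = parse_dim_line("-1 -2    # dimensions: m n")
m, n = resolve_dims(specs, 300)
```

With a problem size of 300, the specifiers "-1 -2" resolve to m = 300 and n = 150, and "-1 150" would keep n at 150 for every problem size.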
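
The parameter specifier wildcard described in the header comments can likewise be sketched in a few lines of Python. The value table is taken from the "full list of the parameter types" in the file. The function names param_type and expand_params are hypothetical, and inferring a label's type from its prefix (for example, "transa" is a trans parameter) is an assumption about the naming convention rather than something the file states.

```python
from itertools import product

# Character encodings listed in the header comments of input.operations.
PARAM_VALUES = {
    "side": "lr",
    "uplo": "lu",
    "trans": "ntch",
    "conj": "nc",
    "diag": "nu",
}


def param_type(label):
    """Infer the parameter type from a label such as 'transa' or 'conjx'."""
    for name in PARAM_VALUES:
        if label.startswith(name):
            return name
    raise ValueError(f"unknown parameter label: {label}")


def expand_params(spec, labels):
    """Expand a specifier string such as '?n' into every combination tested."""
    if len(spec) != len(labels):
        raise ValueError("need one specifier character per parameter label")
    choices = []
    for ch, label in zip(spec, labels):
        allowed = PARAM_VALUES[param_type(label)]
        if ch == "?":
            choices.append(allowed)   # wildcard: every value of this type
        elif ch in allowed:
            choices.append(ch)        # fixed: only this value
        else:
            raise ValueError(f"'{ch}' is not a valid value for {label}")
    return ["".join(combo) for combo in product(*choices)]


combos = expand_params("??", ["transa", "conjx"])
```

For the labels "transa conjx", the specifier "??" expands to the eight combinations listed in the header (nn, nc, tn, tc, cn, cc, hn, hc), and "?n" expands to nn, tn, cn, hn.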