Options to configure have been added for:
- Setting the internal BLIS and BLAS/CBLAS integer sizes.
- Enabling and disabling the BLAS and CBLAS layers.
Additionally, configure options which require defining macros (the above plus the threading model), write their macros to the automatically-generated bli_config.h file in the top-level build directory. The old bli_config.h files in the config dirs were removed, and any kernel-related macros (SIMD size and alignment etc.) were moved to bli_kernel.h. The Makefiles were also modified to find the new bli_config.h file.
Lastly, support for OMP in clang has been added (closes#56).
1) Add -- options.
2) Add -d/--enable-debug option to enable debugging symbols with and without optimization.
3) Allow user to specify CC at configure time, and determine vendor (gcc/icc/etc.). For now configurations enforce a particular vendor.
4) Add make V=[0,1] option to control build verbosity.
Details:
- Replaced the old memory allocator, which was based on statically-
allocated arrays, with one based on a new internal pool_t type, which,
combined with a new bli_pool_*() API, provides a new abstract data
type that implements the same memory pool functionality but with blocks
from the heap (ie: malloc() or equivalent). Hiding the details of the
pool in a separate API also allows for a much simpler bli_mem.c family
of functions.
- Added a new internal header, bli_config_macro_defs.h, which enables
sane defaults for the values previously found in bli_config. Those
values can be overridden by #defining them in bli_config.h the same
way kernel defaults can be overridden in bli_kernel.h. This file most
resembles what was previously a typical configuration's bli_config.h.
- Added a new configuration macro, BLIS_POOL_ADDR_ALIGN_SIZE, which
defaults to BLIS_PAGE_SIZE, to specify the alignment of individual
blocks in the memory pool. Also added a corresponding query routine to
the bli_info API.
- Deprecated (once again) the micro-panel alignment feature. Upon further
reflection, it seems that the goal of more predictable L1 cache
replacement behavior is outweighed by the harm caused by non-contiguous
micro-panels when k % kc != 0. I honestly don't think anyone will even
miss this feature.
- Changed bli_ukr_get_funcs() and bli_ukr_get_ref_funcs() to call
bli_cntl_init() instead of bli_init().
- Removed query functions from bli_info.c that are no longer applicable
given the dynamic memory allocator.
- Removed unnecessary definitions from configurations' bli_config.h files,
which are now pleasantly sparse.
- Fixed incorrect flop counts in addv, subv, scal2v, scal2m testsuite
modules. Thanks to Devangi Parikh for pointing out these
miscalculations.
- Comment, whitespace changes.
Details:
- Reverted some changes that were unintentionally included in the
previous commit (9526ce98). Thanks to Tony Kelman for pointing
this out. (Note: a few select changes were not reverted.)
Details:
- Redefined CFLAGS, CFLAGS_NOOPT, and CFLAGS_KERNELS so that CFLAGS_NOOPT
is defined first and then the other two are defined in terms of
CFLAGS_NOOPT. This textually cleans up the definitions and makes them a
little easier to read.
Details:
- Added a new section in bli_config.h files of all configurations for
enabling CBLAS support. (Currently, the default is for the CBLAS layer
to be disabled.)
- Added a directory, frame/compat/cblas, to house CBLAS source code. A
subdirectory 'f77_sub' holds subroutine wrappers corresponding to
subroutines found in CBLAS that allow calling some BLAS routines with
the return value passed as the last argument rather than as an actual
(function) return value. This was probably intended to allow CBLAS to
avoid the whole f2c debacle altogether. However, since BLIS does not
assume the presence of a Fortran compiler, we had to provide similar
routines in C.
- A script, integrate-cblas-tarball.sh, is included to streamline the
integration of future revisions of the CBLAS source code.
- The current tarball, cblas.tgz, that was used with the above script to
generate the present set of CBLAS source code is also included.
- Updated blis.h to include necessary CBLAS-related headers.
Details:
- Updated copyright headers to include "at Austin" in the name of the
University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
2012).
Details:
- Modified top-level Makefile to support building shared (dynamic)
libraries.
- Updated most configurations' make_defs.mk files to include necessary
compiler/linker flags needed by top-level Makefile.
- Note that by default, all configurations presently do NOT build
shared libraries. To enable, one must change the value of
BLIS_ENABLE_DYNAMIC_BUILD to 'yes'.
Details:
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
configurations. It was already going unused in packm_init() since the
recent 4m/3m commit. This setting was rarely, if ever, useful, and its
existence only posed a potential risk for 4m/3m-based implementations.
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
- Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
micro-kernels.
Details:
- Standard names for reference kernels (levels-1v, -1f and 3) are now
macro constants. Examples:
BLIS_SAXPYV_KERNEL_REF
BLIS_DDOTXF_KERNEL_REF
BLIS_ZGEMM_UKERNEL_REF
- Developers no longer have to name all datatype instances of a kernel
with a common base name; [sdcz] datatype flavors of each kernel or
micro-kernel (level-1v, -1f, or 3) may now be named independently.
This means you can now, if you wish, encode the datatype-specific
register blocksizes in the name of the micro-kernel functions.
- Any datatype instances of any kernel (1v, 1f, or 3) that is left
undefined in bli_kernel.h will default to the corresponding reference
implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
it will be defined to be BLIS_DGEMM_UKERNEL_REF.
- Developers no longer need to name level-1v/-1f kernels with multiple
datatype chars to match the number of types the kernel WOULD take in
a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
sufficient, as in bli_daxpyv_opt().
- There is no longer a need to define an obj_t wrapper to go along with
your level-1v/-1f kernels. The framework now prvides a _kernel()
function which serves as the obj_t wrapper for whatever kernels are
specified (or defaulted to) via bli_kernel.h
- Developers no longer need to prototype their kernels, and thus no
longer need to include any prototyping headers from within
bli_kernel.h. The framework now generates kernel prototypes, with the
proper type signature, based on the kernel names defined (or defaulted
to) via bli_kernel.h.
- If the complex datatype x (of [cz]) implementation of the gemm micro-
kernel is left undefined by bli_kernel.h, but its same-precision real
domain equivalent IS defined, BLIS will use a 4m-based implementation
for the datatype x implementations of all level-3 operations, using
only the real gemm micro-kernel.
Details:
- Updated (and fixed some errors in) the "Assumptions/assertions" comment
section of macro-kernels.
- Changed register blocksizes of reference configuration to MR = 8 and
NR = 4. It's always good for MR != NR in the reference configuration
since it may help uncover bugs related to non-square micro-kernels.
Details:
- Removed macro constant definitions related to incremental blocksizes
from all configurations' bli_kernel.h files. This change is minor and
is mostly a cleanup related to a previous commit.
Details:
- Modified build system (mostly configure and top-level Makefile) so that
a user can build a BLIS library outside of the top-level directory of
the source distribution.
- Added "test" target to Makefile so that the user can run "make test",
which will compile, link, and run the testsuite binary. This works even
if the build directory is externally located, thanks to the test suite
binary's new -g and -o command-line options. Also, when creating the
test suite via the top-level Makefile, the linking is against the
local archive, in lib/<configname>, rather than at <install_prefix>/lib.
- Modified testsuite/Makefile so that it links against the library built
locally, in ../lib/<configname>.
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
- Various other cleanups to build system.
Details:
- Changed top-level Makefile so that headers are installed to
$(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
named by version/configuration and then symlinked.)
- Added uninstall targets, including uninstall-old to clean out old
library archives.
- Added GREP makefile definitions to all configurations' make_defs.mk.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
and all framework code.
- Updated test suite modules according to above changes.
Details:
- Fixed bugs in scalm and setm that resulted in segmentation faults when
beta is not the same type as the matrix operand. Thanks to Vladimir
Sukharev for reporting this bug.
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
and setm; namely, the alpha scalar is copy-cast the type of the first
matrix operand.
- Changed the template and reference configurations' bli_config.h files
so that the number of memory allocator blocks of A and B are set based
on BLIS_MAX_NUM_THREADS.
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
Details:
- Added test modules in test suite for level-1f kernels and level-3
micro-kernels. (Duplication in the micro-kernels, for now, is NOT
supported by these test modules.)
- Added section override switches to test suite's input.operations file.
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
facilitate the level-1f test modules. Also added front-end for dupl
operation.
- Added obj_t-based check routines for level-1f operations, which are
called from the new front-ends mentioned above.
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
factors as a function of datatype, which is needed by their respective
test modules.
- Whitespace changes to bli_kernel.h of all existing configurations.
Details:
- Added a 'template' configuration, which contains stub implementations of the
level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
to bli_kernel.h.
- Modifed test suite to print out fusing factors for axpyf, dotxf, and
dotxaxpyf, as well as the default fusing factor (which are all equal
in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
reference variants were implemented in terms of front-end routines rather
that directly in terms of the kernels. (For example, axpy2v was implemented
as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
A is assumed to be m x b_n in both cases, and for dotxf A is actually used
as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
frame/3/gemm/ukernels and frame/3/trsm/ukernels.
Details:
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
functionality includes computing the pool size for each datatype (using
that datatype's cache blocksizes) and using the maximum to size the
actual pool array. This addresses the somewhat common pitfall whereby a
developer updates cache blocksizes in bli_kernel.h for only one datatype
(say, single-precision real), while the memory pools are sized using the
double-precision real values. Then, when the developer attempts to link
to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
a message saying the static memory pool was exhausted. Clearly, this
message is misleading when the pool was not sized properly to begin with.
- Removed previously disabled code in bli_kernel_macro_defs.h that was
meant to check for size consistency among the various cache blocksizes.
(Obviously the memory pool size-based solution mentioned above is better.)
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
reasonable place to put these constants, rather than further crowd up
bli_config.h.
- Updated testsuite driver to output memory pool sizes for A, B, and C.
- Minor comment updates to bli_config.h.
- Removed 'flame' configuration. It was beginning to get out-of-date, and
I hadn't used it in months. We can always re-create it later.
Details:
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
assigned values of 32, 64, or some other value. The former two result in
defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
causes integers to be defined in terms of a default type (e.g. long int).
- Updated bli_config.h in reference and clarksville configurations according
to above changes.
- Updated test drivers in test and testsuite to avoid type warnings associated
with format specifiers not matching the types of their arguments to printf()
and scanf().
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
inclusion in d141f9eeb6).
- Added explicit typecasting of dim_t and inc_t to macros in
bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
- Slight changes to CREDITS and INSTALL files.
- Slight tweaks to Windows build system, mostly in the form of switching to
Windows-style CRLF newlines for certain files.
Details:
- In light of recent segfaulting issues when compiling on 32-bit systems,
I've changed the default typedef for gint_t and guint_t from int64_t and
uint64_t to int32_t and uint32_t, respectively.
- Disabled 64-bit integers in the blas2blis layer for the reference
configuration.
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
to introductory output of the testsuite.
Details:
- Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
BLIS_CACHE_LINE_SIZE (typically 64).
- Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
kernels.
- Reformulated gemm_ker_var2 to look more like the other level-3 macro-
kernels, in that the interior and edge-case handling is expressed once
inside the loops in the n and m dimensions, rather than the edge-case
handling being "unrolled" and expressed as distinct code regions. The
previous macro-kernel now lives in retired form in the subdirectory
other/bli_gemm_ker_var2.c.old.
- Updated experimental gemm_ker_var5 according to above change.
- Fixed bug in bli_her2k.c whereby incorrect transformations were being
applied to optimize the macro-kernel accesses pattern on C when C is
row-stored.
- Various updates inside of test/exec_sizes.
Details:
- Added support for C99 complex types to bli_type_defs.h and overloaded
complex arithmetic to the scalar-level macros in include/level0. This
includes a somewhat substantial reorganization and re-layering of much
of the existing machinery present in the level0 macros.
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
commented-out by default, which optionally enables the use of built-in
C99 complex types and arithmetic.
- Minor changes to clarksville and reference configs' make_defs.mk files.
- Removed macro definitions from bli_param_macro_defs.h which was not being
used (bli_proj_dt_to_real_if_imag_eq0).
Details:
- Changed the way bli_type_defs.h defines integer types so that dim_t,
inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
integer) or guint_t (general unsigned integer).
- Renamed Fortran types fchar and fint to f77_char and f77_int.
- Define f77_int as int64_t if a new configuration variable,
BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
These types are defined in stdint.h, which is now included in blis.h.
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
in terms of scomplex.
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
of char.
- Updated bla_amax() wrappers so that the return type is defined directly
as f77_int, rather than letting the prototype-generating macro decide
the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
so I removed them. Also, changed the body of the wrapper so that a
gint_t is passed into abmaxv, which is THEN typecast to an f77_int
before returning the value.
- Updated f2c code that accessed .r and .i fields of complex and
doublecomplex types so that they use .real and .imag instead (now that
we are using scomplex and dcomplex).
Details:
- Implemented algorithmic optimizations for trmm and trsm whereby the right
side case is now handled explicitly, rather than induced indirectly by
transposing and swapping strides on operands. This allows us to walk through
the output matrix with favorable access patterns no matter how it is stored,
for all parameter combinations.
- Renamed trmm and trsm blocked variants so that there is no longer a
lower/upper distinction. Instead, we simply label the variants by which
dimension is partitioned and whether the variant marches forwards or
backwards through the corresponding partitioned operands.
- Added support for row-stored packing of lower and upper triangular matrices
(as provided by bli_packm_blk_var3.c).
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
blocksize extensions (if non-zero) were not being used to appropriately size
the first iteration (ie: the bottom/right edge case).
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
whole multiples of MR AND NR. This is needed for the case of trsm_r where,
in order to reuse existing left-side gemmtrsm fused micro-kernels, the
packing of A (left-hand operand) and B (right-hand operand) is done with
NR and MR, respectively (instead of MR and NR).
Details:
- Added a new "special" directory type: any source code within directories
named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
compiler flags. This allows the developer to specify a separate set of
flags (e.g. optimization flags) for compiling kernels while maintaining a
standard set for regular framework code.
- Fixed a bug in the top-level Makefile that was causing "noopt" code
to be compiled with the standard set of compilation flags.
- Updated make_defs.mk in reference, flame, and clarksville configurations
according to above changes.
Details:
- Revamped some parts of commit b6ef84fad1 by adding blocksize extension
fields to the blksz_t object rather than have them as separate structs.
- Updated all packm interfaces/invocations according to above change.
- Generalized bli_determine_blocksize_?() so that edge case optimization
happens if and only if cache blocksizes are created with non-zero
extensions.
- Updated comments in bli_kernel.h files to indicate that the edge case
blocksize extension mechanism is now available for use.
Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.
Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
explicitly so navigate b1 so that situations where PACKNR > NR are
supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.
Details:
- Made substantial changes throughout the framework to decouple the leading
dimension (row or column stride) used within each packed micro-panel from
the corresponding register blocksize. It appears advantageous on some
systems to use, for example, packed micro-panels of A where the column
stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
- Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
to use when packing micro-panels of A and B.
- Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
where appropriate, instead of MR and NR.
- Added pd field (panel dimension) to obj_t.
- New interface to bli_packm_cntl_obj_create().
- Renamed bli_obj_packed_length()/_width() macros to
bli_obj_padded_length()/_width().
- Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
- Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
blocksize for edge cases, which can improve performance at the margins.
Details:
- Added new memory alignment constants:
BLIS_HEAP_STRIDE_ALIGN_SIZE (previously assumed to be same as SYSTEM_MEM)
BLIS_CONTIG_ADDR_ALIGN_SIZE (previously assumed to be same as PAGE_SIZE)
BLIS_STACK_BUF_ALIGN_SIZE (previously not enforced)
and renamed existing ones
BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
to better convey what the alignment factor is used for (and what it is
not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
use bli_align_dim_to_size(), which takes a third argument (the desired
alignment).
Details:
- Removed the #define of _POSIX_C_SOURCE in bli_config.h (for both reference
and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
the compiler command line arguments in make_defs.mk (for both configs).
Thanks to Devin Matthews for suggesting this change.
Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
only manifests when BLIS is configured such that MR != NR. The bug involves
incorrectly detecting edge cases, which resulted in some parts of matrix C
potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
8 and 4, respectively, so that I can better stress the framework on a
day-to-day basis. (The fact that they were both equal to 4 for so long is
why I did not stumble upon this bug much sooner.)
Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
contain explicit loops over MR and NR, thus allowing them to be used
unmodified by developers who want to build a reference library with
custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macros files so that
single-char macros are also defined.
Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to reminder reader what sdsdot()/dsdot() are
used for.
Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
contiguous memory alignment (rather than the system/malloc alignment).
Details:
- Removed guard around #define for memory alignment size constant.
Memory alignment should always be enabled, and so this value should
always be defined.
Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
subsequent packed panel of A and B begins at an aligned address. (The
first panel is presumably aligned to system alignment because it is
aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
is used to allocate enough space for each block no matter what kind of
register blocking is used (even if register blocksize is unit and every
row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
and KC*NC result in identical footprints for all datatypes.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
(e.g. "prefetch" instructions, which are different than the particular
kind of prefetching/preloading referred to by this constant).
Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
allocator instantiates and initializes three separate memory pool objects,
each one associated with a separate array of contiguous memory blocks, each
block of fixed and uniform size. (The three pools are for allocating mc-by-kc
blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
objects use a stack structure internally to track which blocks in the region
have been "checked out" to a thread and which are still available. Critical
regions are now clearly marked and adaptable to parallel environments (e.g.
OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
of packed buffer is being allocated. The enumerated type for this argument
is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
bl2_align_dim_to_mult(). Turns out that typically we don't need to align
a dimension to the system alignment, since that value has to do with
starting addresses, whereas the values we are dealing with are unitless
dimensions.
Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
configuration directory (bl2_config.h, specifically) given that it can be
expected to be tweaked by some developers.