* commit '81e10346':
Alloc at least 1 elem in pool_t block_ptrs. (#560)
Fix insufficient pool-growing logic in bli_pool.c. (#559)
Arm SVE C/ZGEMM Fix FMOV 0 Mistake
SH Kernel Unused Eigher
Arm SVE C/ZGEMM Support *beta==0
Arm SVE Config armsve Use ZGEMM/CGEMM
Arm SVE: Update Perf. Graph
Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0
Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0
A64FX Config Use ZGEMM/CGEMM
Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg
Arm SVE Add SGEMM 2Vx10 Unindexed
Arm SVE ZGEMM Support Gather Load / Scatt. St.
Arm SVE Add ZGEMM 2Vx10 Unindexed
Arm SVE Add ZGEMM 2Vx7 Unindexed
Arm SVE Add ZGEMM 2Vx8 Unindexed
Update Travis CI badge
Armv8 Trash New Bulk Kernels
Enable testing 1m in `make check`.
Config ArmSVE Unregister 12xk. Move 12xk to Old
Revert __has_include(). Distinguish w/ BLIS_FAMILY_**
Register firestorm into arm64 Metaconfig
Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo
Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo
Add test for Apple M1 (firestorm)
Firestorm CPUID Dispatcher
Armv8 GEMMSUP Edge Cases Require Signed Ints
Make error checking level a thread-local variable.
Fix data race in testsuite.
Update .appveyor.yml
Firestorm Block Size Fixes
Armv8 Handle *beta == 0 for GEMMSUP ??r Case.
Move unused ARM SVE kernels to "old" directory.
Add an option to control whether or not to use @rpath.
Fix $ORIGIN usage on linux.
Arm micro-architecture dispatch (#344)
Use @path-based install name on MacOS and use relocatable RPATH entries for testsuite inaries.
Armv8 Handle *beta == 0 for GEMMSUP ?rc Case.
Armv8 Fix 6x8 Row-Maj Ukr
Apply patch from @xrq-phys.
Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.
bli_error: more cleanup on the error strings array
Arm SVE Exclude SVE-Intrinsic Kernels for GCC 8-9
Arm SVE: Correct PACKM Ker Name: Intrinsic Kers
Fix config_name in bli_arch.c
Arm Whole GEMMSUP Call Route is Asm/Int Optimized
Arm: DGEMMSUP `Macro' Edge Cases Stop Calling Ref
Header Typo
Arm: DGEMMSUP ??r(rv) Invoke Edge Size
Arm: DGEMMSUP ?rc(rd) Invoke Edge Size
Arm: Implement GEMMSUP Fallback Method
Arm64 Fix: Support Alpha/Beta in GEMMSUP Intrin
Added Apple Firestorm (A14/M1) Subconfig
Arm64 8x4 Kernel Use Less Regs
Armv8-A Supplimentary GEMMSUP Sizes for RD
Armv8-A Fix GEMMSUP-RD Kernels on GNU Asm
Armv8-A Adjust Types for PACKM Kernels
Armv8-A GEMMSUP-RD 6x8m
Armv8-A GEMMSUP-RD 6x8n
Armv8-A s/d Packing Kernels Fix Typo
Armv8-A Introduced s/d Packing Kernels
Armv8-A DGEMMSUP 6x8m Kernel
Armv8-A DGEMMSUP Adjustments
Armv8-A Add More DGEMMSUP
Armv8-A Add GEMMSUP 4x8n Kernel
Armv8-A Add Part of GEMMSUP 8x4m Kernel
Armv8A DGEMM 4x4 Kernel WIP. Slow
Armv8-A Add 8x4 Kernel WIP
AMD-Internal: [CPUPL-2698]
Change-Id: I194ff69356740bb36ca189fd1bf9fef02eec3803
Correction to commit 2450a1813b
to add -DBLIS_CONFIG_FAMILY=zen5 support in cmake.
AMD-Internal: [CPUPL-3518]
Change-Id: Iecff2b64d5d95960cecbbf98d5269133747b122e
- Updating gemm/cgemm_ukernel.cpp to cast integers so that gtestsuite works for ILP64.
- Updating BLIS cmake presets to be conditional on Windows and Linux.
- Updating GTestSuite cmake system to use environment variable to set BLIS_PATH and reference library.
- Add more cmake presets options in gtestsuite.
- Added BUILD_STATIC_LIBS option which is on by default, only on Linux.
- Added TEST_WITH_SHARED option which is off by default, only on Linux.
- If only shared or static lib is being built, that's the one that will be used for testing.
- If both are being built, TEST_WITH_SHARED determins which library wil be used for testing.
- Set linux workflows so that they build both static and shared libs, and use linux-static and linux-shared to denote which one should be used for testing.
- Set -fPIC for both static and shared builds to fix issues faced when building blis using AOCC 4.0.0 and gtestsuite using gcc 9.4.0.
AMD-Internal: [CPUPL-2748]
Change-Id: I4227bab97ff31ecddfe218e18499f33b4e4ee63e
Details:
- Added new folder named JIT/ under addon/aocl_gemm/. This folder
will contain all the JIT related code.
- Modified lpgemm_cntx_init code to generate main and fringe kernels
for 6x64 bf16 microkernel and store function pointers to all the
generated kernels in a global function pointer array. This happens
only when gcc version is < 11.2
- When gcc version < 11.2, microkernel uses JIT-generated kernels.
otherwise, microkernel uses the intrinsics based implementation.
AMD-Internal: [SWLCSG-2622]
Change-Id: I16256c797b2546a8cd2049680001947346260461
CMakelists.txt is updated to support ASAN to find
memory related errors in blis library. ASAN is enabled
by configuring cmake with the following option .
$ cmake .. -DENABLE_ASAN=ON
ASAN supports only on linux with clang compiler.
And redzone size default size is 16 bytes and maximum
redzone size is 2048 bytes.
$ ASAN_OPTIONS=redzone=2048 <exe>
AMD-Internal: [CPUPL-2748]
Change-Id: I0b70af5c41cf5c68602150daeb67d7432bbe5cb8
While build blis library using ninja generator on windows, observed
ninja is randomly adding "|| '(set', 'FAIL_LINE=3&', 'goto', ':ABORT)'"
as extra arguments for add_custom_command. Due to this flatten-headers
python script was failing to create blis.h and cblas.h headers.
Modified the python script to fix above issue.
AMD-Internal: [CPUPL-2748]
Change-Id: I83b753d08e46f94b282176fcc661ce34e5eee3cf
CMakelists.txt is Updated to generate code coverage
report in html format just by configuring cmake with
-DENABLE_COVERAGE=ON. Code supports only on linux
with gcc compiler
cmake .. -DENABLE_COVERAGE=ON
AMD-Internal: [CPUPL-2748]
Change-Id: I9b36b6cc3f1f97b53e1c4ee62948a017418e3d41
CMakeLists.txt is updated in blis/testsuite to make it work for
static single thread version of BLIS.
AMD-Internal: [CPUPL-2748]
Change-Id: I004e19d4ddbf9cb94d6d23699893a2f684a3fb35
Some text files were missing a newline at the end of the file.
One has been added.
AMD-Internal: [CPUPL-3519]
Change-Id: I4b00876b1230b036723d6b56755c6ca844a7ffce
User control over code path using AOCL_ENABLE_INSTRUCTIONS
or BLIS_ARCH_TYPE only makes sense for fat binary builds.
Thus this functionality is now disabled by default for
single architecture builds. User can still override the default
selections by using configure options --enable-blis-arch-type
or --disable-blis-arch-type.
Other changes:
- include x86_64 family as using zen codepaths in cmake build system.
- Update help and error messages to include AOCL_ENABLE_INSTRUCTIONS.
AMD-Internal: [CPUPL-4202]
Change-Id: I7aa5fcf89df8675bcc12d81f81781de647e0fcf8
* commit '5013a6cb':
More edits and fixes to docs/FAQ.md.
Fixed newly broken link to CREDITS in FAQ.md.
More minor fixes to FAQ.md and Sandboxes.md.
Updates to FAQ.md, Sandboxes.md, and README.md.
Safelist 'master', 'dev', 'amd' branches.
Re-enable and fix fb93d24.
Reverted fb93d24.
Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
Removed last vestige of #define BLIS_NUM_ARCHS.
Added new packm var3 to 'gemmlike'.
Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
Fix more copy-paste errors in the haswell gemmsup code.
Do a fast test on OSX. [ci skip]
Fix AArch64 tests and consolidate some other tests.
Use C++ cross-compiler for ARM tests.
Attempt to fix cxx-test for OOT builds.
Updated travis-ci.org link in README.md to .com.
Disabled (at least temporarily) commit 8e0c425.
Define BLIS_OS_NONE when using --disable-system.
Updated stale calls to malloc_intl() in gemmlike.
Blacklist clang10/gcc9 and older for 'armsve'.
Add test to Travis using C++ compiler to make sure blis.h is C++-compatible.
Moved lang defs from _macro_def.h to _lang_defs.h.
Minor tweaks to gemmlike sandbox.
Added local _check() code to gemmlike sandbox.
README.md citation updates (e.g. BLIS7 bibtex).
Tweaks to gemmlike to facilitate 3rd party mods.
Whitespace tweaks.
Add row- and column-strides for A/B in obj_ukr_fn_t.
Clean up some warnings that show up on clang/OSX.
Remove schema field on obj_t (redundant) and add new API functions.
Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
Disabled sanity check in bli_pool_finalize().
Implement proposed new function pointer fields for obj_t.
AMD-Internal: [CPUPL-2698]
Change-Id: I6fc33351fa824580cf4f25b63f0370383cd9422d
* commit 'e366665c':
Fixed stale API calls to membrk API in gemmlike.
Fixed bli_init.c compile-time error on OSX clang.
Fixed configure breakage on OSX clang.
Fixed one-time use property of bli_init() (#525).
CREDITS file update.
Added Graviton2 Neoverse N1 performance results.
Remove unnecesary windows/zen2 directory.
Add vzeroupper to Haswell microkernels. (#524)
Fix Win64 AVX512 bug.
Add comment about make checkblas on Windows
CREDITS file update.
Test installation in Travis CI
Add symlink to blis.pc.in for out-of-tree builds
Revert "Always run `make check`."
Always run `make check`.
Fixed configure script bug. Details: - Fixed kernel list string substitution error by adding function substitute_words in configure script. if the string contains zen and zen2, and zen need to be replaced with another string, then zen2 also be incorrectly replaced.
Update POWER10.md
Rework POWER10 sandbox
Skip clearing temp microtile in gemmlike sandbox.
Fix asm warning
Sandbox header edits trigger full library rebuild.
Add vhsubpd/vhsubpd.
Fixed bugs in cpackm kernels, gemmlike code.
Armv8A Rename Regs for Safe Darwin Compile
Armv8A Rename Regs for Clang Compile: FP32 Part
Armv8A Rename Regs for Clang Compile: FP64 Part
Asm Flag Mingling for Darwin_Aarch64
Added a new 'gemmlike' sandbox.
Updated Fugaku (a64fx) performance results.
Add explicit compiler check for Windows.
Remove `rm-dupls` function in common.mk.
Travis CI Revert Unnecessary Extras from 91d3636
Adjust TravisCI
Travis Support Arm SVE
Added 512b SVE-based a64fx subconfig + SVE kernels.
Replace bli_dlamch with something less archaic (#498)
Allow clang for ThunderX2 config
AMD-Internal: [CPUPL-2698]
Change-Id: I561ca3959b7049a00cc128dee3617be51ae11bc4
* commit 'b683d01b':
Use extra #undef when including ba/ex API headers.
Minor preprocessor/header cleanup.
Fixed typo in cpp guard in bli_util_ft.h.
Defined eqsc, eqv, eqm to test object equality.
Defined setijv, getijv to set/get vector elements.
Minor API breakage in bli_pack API.
Add err_t* "return" parameter to malloc functions.
Always stay initialized after BLAS compat calls.
Renamed membrk files/vars/functions to pba.
Switch allocator mutexes to static initialization.
AMD-Internal: [CPUPL-2698]
Change-Id: Ied2ca8619f144d4b8a7123ac45a1be0dda3875df
Some text files were missing a newline at the end of the file.
One has been added.
Also correct file format of windows/tests/inputs.yaml, which
was missed in commit 0f0277e104
AMD-Internal: [CPUPL-2870]
Change-Id: Icb83a4a27033dc0ff325cb84a1cf399e953ec549
Source and other files in some directories were a mixture of
Unix and DOS file formats. Convert all relevant files to Unix
format for consistency. Some Windows-specific files remain in
DOS format.
AMD-Internal: [CPUPL-2870]
Change-Id: Ic9a0fddb2dba6dc8bcf0ad9b3cc93774a46caeeb
Incorporate a means of detecting submodels of a microarchitecture,
so that different optimizations e.g. block sizes or kernel choices
can be used. The details are as follows:
- Different models are currently only enabled for zen3 and zen4
architectures (for server parts).
- There is a single enumeration (model_t) for all models for all
architectures, but function bli_check_valid_model_id() should
check the provided model_id against the suitable range within
the enumeration for the provided arch_id.
- To enable the model_id to be used within the cntx setup functions,
checking of a user specified value of BLIS_ARCH_TYPE against
the enabled configurations is delayed to a separate function,
bli_arch_check_id().
- Default selection based on hardware can be overridden using the
BLIS_MODEL_TYPE environment variable. Valid values are:
Genoa, Bergamo, Genoa-X, Milan, Milan-X
Values are case-insensitive and -X can also be specified as _X or X
- Specifying an incorrect value for BLIS_MODEL_TYPE is not an error,
but will result in the default option for that architecture being
selected. This is different to specifying an incorrect value of
BLIS_ARCH_TYPE, which is an error.
- The environment variable BLIS_MODEL_TYPE can be renamed using
the --rename-blis-model-type argument to configure (or cmake
equivalent), in a similar way to renaming BLIS_ARCH_TYPE with
--rename-blis-arch-type.
- Configure option --disable-blis-arch-type will disable both
BLIS_ARCH_TYPE and BLIS_MODEL_TYPE environment variables.
- Added code in bli_cpuid.c to detect L1, L2 and L3 cache sizes,
currently only for AMD cpus. Functions are provided to query
these from other parts of the code, namely:
uint32_t bli_cpuid_query_{l1d,l1i,l2,l3}_cache_size()
AMD-Internal: [CPUPL-3033]
Change-Id: I37a3741abfd59a95e0e905d926c6ede9a0143702
- Moved *_blis_impl function declaration outside the BLIS_ENABLE_BLAS
guard.
- Changed Makefile to continue to compile bla_ files to get
*_blis_impl interfaces.
- Modify CBLAS headers, bli_macro_defs.h and bli_util_api_wrap.{c,h}
to add BLIS_ENABLE_CBLAS guards.
- Comment out BLIS_ENABLE_BLAS guards in various headers and utility
functions.
- Define BLIS Fortran-style functions lsame_blis_impl and
xerbla_blis_impl. New macros PASTE_LSAME and PASTE_XERBLA are
used in bla_*_check headers and some other places to select
whether to call lsame and xerbla, or the _blis_impl versions.
- Defined various other missing _blis_impl functions.
- In bli_util_api_wrap.c, only define any functions if
BLIS_ENABLE_BLAS is defined, and only define the subroutine
versions of functions like dot, nrm2, etc if BLIS_ENABLE_CBLAS
is defined.
- BLAS layer is needed if CBLAS layer is enabled. Changed header
files build/bli_config.h.in and bli_blas.h, and configure
program to help ensure consistency in generated blis.h header
and configure output.
Undefining BLIS_ENABLE_BLAS_DEFS appears to be broken in UTA BLIS
too, thus BLIS_ENABLE_BLAS_DEFS is currently permanently defined.
AMD-Internal: [CPUPL-3015]
Change-Id: I7c0fe07db85781db46f2c690e174451860b37635
Details:
1. Reference kernel File names are to be mirrored with unique names
for each architecture configuration while generating fat binary.
2. Python script "blis_ref_kernel_mirror.py" is updated to append
the filenames with each configuration (zen, zen2, zen3, zen4
and generic) that is built for windows.
AMD-Internal: [CPUPL-3009]
Change-Id: Ib02206382199cf2aebe14ff9c869b6089228e1c2
Make BLIS_ARCH_TYPE=0 be an error, so that incorrect meaningful names
will get an error rather than "skx" code path. BLIS_ARCH_TYPE=1 is
now "generic", so that it should be constant as new code paths are
added. Thus all other code path enum values have increased by 2.
Also added new options to BLIS configure program to allow:
1. BLIS_ARCH_TYPE functionality to be disabled, e.g.:
./configure --disable-blis-arch-type amdzen
2. Renaming the environment variable tested from "BLIS_ARCH_TYPE" to a
specified value, e.g.:
./configure --rename-blis-arch-type=MY_NAME_FOR_ARCH_TYPE amdzen
On Windows, these can be enabled with e.g.:
cmake ... -DDISABLE_BLIS_ARCH_TYPE=ON
or
cmake ... -DRENAME_BLIS_ARCH_TYPE=MY_NAME_FOR_ARCH_TYPE
This implements changes 2 and 3 in the Jira ticket below.
AMD-Internal: [CPUPL-2235]
Change-Id: Ie42906bd909f9d83f00a90c5bef9c5bf3ef5adb4
Details:
- Implemented a new feature called addons, which are similar to
sandboxes except that there is no requirement to define gemm or any
other particular operation.
- Updated configure to accept --enable-addon=<name> or -a <name> syntax
for requesting an addon be included within a BLIS build. configure now
outputs the list of enabled addons into config.mk. It also outputs the
corresponding #include directives for the addons' headers to a new
companion to the bli_config.h header file named bli_addon.h. Because
addons may wish to make use of existing BLIS types within their own
definitions, the addons' headers must be included sometime after that
of bli_config.h (which currently is #included before bli_type_defs.h).
This is why the #include directives needed to go into a new top-level
header file rather than the existing bli_config.h file.
- Added a markdown document, docs/Addons.md, to explain addons, how to
build with them, and what assumptions their authors should keep in
mind as they create them.
- Added a gemmlike-like implementation of sandwich gemm called 'gemmd'
as an addon in addon/gemmd. The code uses a 'bao_' prefix for local
functions, including the user-level object and typed APIs.
- Updated .gitignore so that git ignores bli_addon.h files.
Change-Id: Ie7efdea366481ce25075cb2459bdbcfd52309717
- Removed BLIS_CONFIG_EPYC macro
- The code dependent on this macro is handled in
one of the three ways
-- It is updated to work across platforms.
-- Added in architecture/feature specific runtime checks.
-- Duplicated in AMD specific files. Build system is updated to
pick AMD specific files when library is built for any of the
zen architecture
AMD-Internal: [CPUPL-1960]
Change-Id: I6f9f8018e41fa48eb43ae4245c9c2c361857f43b
1. Removed the libomp.lib hardcoded from cmake scripts and made it user configurable. By default libomp.lib is used as an omp library.
2. Added the STATIC_LIBRARY_OPTIONS property in set_target_properties cmake command to link omp library to build static-mt blis library.
3. Updated the blis_ref_kernel_mirror.py to give support for zen4 architecture.
AMD-Internal: CPUPL-1630
Change-Id: I54b04cde2fa6a1ddc4b4303f1da808c1efe0484a
- Added configuration option for zen4 architecture
- Added auto-detection of zen4 architecture
- Added zen4 configuration for all checks related
to AMD specific optimizations
AMD-Internal: [CPUPL-1937]
Change-Id: I1a1a45de04653f725aa53c30dffb6c0f7cc6e39a
Binary name will be chosen based on multi-threading and BLAS
integer size configuration as given below.
libblis-[mt]-lp64 - when configured to use 32 bit integers
libblis-[mt]-ilp64 - when configured to use 64 bit integers
AMD-Internal: [CPUPL-1879]
Change-Id: I865023c63235a0a72bdfce7057b2cfb8158b1d87
Adds `--enable-rpath/--disable--rpath` (default disabled) to use an install_name starting with @rpath/. Otherwise, set the install_name to the absolute path of the install library, which was the previous behavior.
Details:
- Re-enabled the changes made in fb93d24.
- Defined BLIS_ENABLE_SYSTEM in bli_arch.c, bli_cpuid.c, and bli_env.c,
all of which needed the definition (in addition to config_detect.c) in
order for the configure-time hardware detection binary to be compiled
properly. Thanks to Minh Quan Ho for helping identify these additional
files as needing to be updated.
- Added additional comments to all four source files, most notably to
prompt the reader to remember to update all of the files when updating
any of the files. Also made the cpp code in each of the files as
consistent/similar as possible.
- Refer to issues #532 and PR #546 for more history.
Details:
- Re-enable the changes originally made in 8e0c425 but quickly reverted
in 2be78fc.
- Moved the #include of bli_config.h so that it occurs before the
#include of bli_system.h. This allows the #define BLIS_ENABLE_SYSTEM
or #define BLIS_DISABLE_SYSTEM in bli_config.h to be processed by the
time it is needed in bli_system.h. This change should have been
in the original 8e0c425, but was accidentally omitted. Thanks to Minh
Quan Ho for catching this.
- Add #define BLIS_ENABLE_SYSTEM to config_detect.c so that the proper
cpp conditional branch executes in bli_system.h when compiling the
hardware detection binary. The changes made in 8e0c425 were an attempt
to support the definition of BLIS_OS_NONE when configuring with
--disable-system (in issue #532). That commit failed because, aside
from the required but omitted header reordering (second bullet above),
AppVeyor was unable to compile the hardware detection binary as a
result of missing Windows headers. This commit, which builds on PR
#546, should help fix that issue. Thanks to Minh Quan Ho for his
assistance and patience on this matter.
- Add a testsuite for gathering performance (in GFLOPs) and measuring correctness for the POWER10 GEMM reduced precision/integer kernels.
- Reworked GENERIC_GEMM template to hardcode the cache parameters.
- Remove kernel wrapper that checked that only allowed matrices that weren't transposed or conjugated. However, the kernels still assume the matrices are not transposed. This wrapper was removed for performance reasons.
- Renamed and restructured files and functions for clarity.
- Editted the POWER10 document to reflect new changes.
-- Ignore aocl dynamic configuration if multithreading is disabled.
AOCL Dynamic will also be disabled in this case.
-- Added following configuration settings in showconfig output
1. Complex return scheme
2. TRSM preinversion status
3. AOCL dynamic active status
AOCL-Internal: [CPUPL-1565]
Change-Id: Id5a31b233fc08dcd871de4a693aab0b2a5d9f1c4
- AOCL Dynamic feature is added in BLIS which determines optimal
number of threads for the current problem size.
- This feature can be enabled/disabled by modifying the source
code
- This change adds support to enable/disable this feature during
configuration time by adding a new option in configure script
AOCL-Internal : [CPUPL-1565]
Change-Id: I590693f793cabc44d27a7f815adc41631dd01bbe
Merge conflicts araised has been fixed while downstreaming BLIS code from master to milan-3.1 branch
Implemented an automatic reduction in the number of threads when the user requests parallelism via a single number (ie: the automatic way) and (a) that number of threads is prime, and (b) that number exceeds a minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which defaults to 11. If prime numbers are really desired, this feature may be suppressed by defining the macro BLIS_ENABLE_AUTO_PRIME_NUM_THREADS in the appropriate configuration family's bli_family_*.h. (Jeff Diamond)
Changed default value of BLIS_THREAD_RATIO_M from 2 to 1, which leads to slightly different automatic thread factorizations.
Enable the 1m method only if the real domain microkernel is not a reference kernel. BLIS now forgoes use of 1m if both the real and complex domain kernels are reference implementations.
Relocated the general stride handling for gemmsup. This fixed an issue whereby gemm would fail to trigger to conventional code path for cases that use general stride even after gemmsup rejected the problem. (RuQing Xu)
Fixed an incorrect function signature (and prototype) of bli_?gemmt(). (RuQing Xu)
Redefined BLIS_NUM_ARCHS to be part of the arch_t enum, which means it will be updated automatically when defining future subconfigs.
Minor code consolidation in all level-3 _front() functions.
Reorganized Windows cpp branch of bli_pthreads.c.
Implemented bli_pthread_self() and _equals(), but left them commented out (via cpp guards) due to issues with getting the Windows versions working. Thankfully, these functions aren't yet needed by BLIS.
Allow disabling of trsm diagonal pre-inversion at compile time via --disable-trsm-preinversion.
Fixed obscure testsuite bug for the gemmt test module that relates to its dependency on gemv.
AMD-internal-[CPUPL-1523]
Change-Id: I0d1df018e2df96a23dc4383d01d98b324d5ac5cd
Details:
- Renamed the files, variables, and functions relating to the packing
block allocator from its legacy name (membrk) to its current name
(pba). This more clearly contrasts the packing block allocator with
the small block allocator (sba).
- Fixed a typo in bli_pack_set_pack_b(), defined in bli_pack.c, that
caused the function to erroneously change the value of the pack_a
field of the global rntm_t instead of the pack_b field. (Apparently
nobody has used this API yet.)
- Comment updates.
1. CMake script changes for adding new files to the build.
2. Added Upper case support for couple of API's.
3. bool is not support in clang so defined it.
AMD Internal : [CPUPL-1422]
Change-Id: I4cac8fb8ef86cd6bacfd29e3b1a84c5da1310f61
1. CMake script changes for build with Clang compiler.
2. CMake script changes for build test and testsuite based on the lib type ST/MT
3. CMake script changes for testcpp and blastest
4. Added python scripts to support library build and testsuite build.
AMD Internal : [CPUPL-1422]
Change-Id: Ie34c3e60e9f8fbf7ea69b47fd1b50ee90099c898
Merged the changes done in UT Austin BLIS repo for DOTC Additional
argument.
Other modifications related to test application included.
Verifed the above code changes through scalapack test applications 'xztrd' , 'xctrd'
Change-Id: I7e16f3953db71890f9e8fbb0f7b363eaad899f62
Signed-off-by: Nagendra <Nagendra.PrasadM@amd.com>
AMD-Internal: [CPUPL-1323]
Details:
- Implemented a configure-time option, --disable-trsm-preinversion, that
optionally disables the pre-inversion of diagonal elements of the
triangular matrix in the trsm operation and instead uses division
instructions within the gemmtrsm microkernels. Pre-inversion is
enabled by default. When it is disabled, performance may suffer
slightly, but numerical robustness should improve for certain
pathological cases involving denormal (subnormal) numbers that would
otherwise result in overflow in the pre-inverted value. Thanks to
Bhaskar Nallani for reporting this issue via #461.
- Added preprocessor macro guards to bli_trsm_cntl.c as well as the
gemmtrsm microkernels for 'haswell' and 'penryn' kernel sets pursuant
to the aforementioned feature.
- Added macros to frame/include/bli_x86_asm_macros.h related to division
instructions.
Details:
- Added a configure option, --[enable|disable]-system, which determines
whether the modest operating system dependencies in BLIS are included.
The most notable example of this on Linux and BSD/OSX is the use of
POSIX threads to ensure thread safety for when application-level
threads call BLIS. When --disable-system is given, the bli_pthreads
implementation is dummied out entirely, allowing the calling code
within BLIS to remain unchanged. Why would anyone want to build BLIS
like this? The motivating example was submitted via #454 in which a
user wanted to build BLIS for a simulator such as gem5 where thread
safety may not be a concern (and where the operating system is largely
absent anyway). Thanks to Stepan Nassyr for suggesting this feature.
- Another, more minor side effect of the --disable-system option is that
the implementation of bli_clock() unconditionally returns 0.0 instead
of the time elapsed since some fixed point in the past. The reasoning
for this is that if the operating system is truly minimal, the system
function call upon which bli_clock() would normally be implemented
(e.g. clock_gettime()) may not be available.
- Refactored preprocess-guarded code in bli_pthread.c and bli_pthread.h
to remove redundancies.
- Removed old comments and commented #include of "bli_pthread_wrap.h"
from bli_system.h.
- Documented bli_clock() and bli_clock_min_diff() in BLISObjectAPI.md
and BLISTypedAPI.md, with a note that both are non-functional when
BLIS is configured with --disable-system.
Merged contributions from AMD's AOCL BLIS (#448).
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
only the lower or upper triangle of a square matrix C. For now, only
the conventional/large code path will be supported (in vanilla BLIS).
This was accomplished by leveraging the existing variant logic for
herk. However, some of the infrastructure to support a gemmtsup is
included in this commit, including
- A bli_gemmtsup() front-end, similar to bli_gemmsup().
- A bli_gemmtsup_ref() reference handler function.
- A bli_gemmtsup_int() variant chooser function (with variant calls
commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatiblity layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
wrapper to a set of polymorphic CBLAS-like function wrappers defined
in another header (cblas.hh). These two headers are installed if
running the 'install' target with INSTALL_HH is set to 'yes'. (Also
added a set of unit tests that exercise blis.hh, although they are
disabled for now because they aren't compatible with out-of-tree
builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
various minor updates to dotv and scalv kernels. Also added various
sup kernels contributed by AMD to kernels/zen/3. However, these
kernels are (for now) not yet used, in part because they caused
AppVeyor clang failures, and also because I have not found time to
review and vet them.
- Output the python found during configure into the definition of PYTHON
in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
to bli_gemm_front.c.
- Implemented explicit beta = 0 handling in for the sgemm ukernel in
bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
bug surfaced because the gemmt module verifies its computation using
gemm with its beta parameter set to zero, which, on a cortexa15 system
caused the gemm kernel code to unconditionally multiply the
uninitialized C data by beta. The C matrix likely contained
non-numeric values such as NaN, which then would have resulted in a
false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
in bli_l3_blocksize.c, was inadvertantly being defined in terms of
helper functions meant for trmm. This bug was probably harmless since
the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
kernels/zen/3/bli_gemm_small.c since those macros are not used in
vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers that facilitate compilation on
Windows systems.
- Various whitespace changes.
Corrections in bli_gemm_front.c, taken the corrections from both public repo and 2.2.1 branch
[CPUPL-1067]
Change-Id: I4887ece6aa20bdfb87d97e7acebbe04cb9feea02
ifort apparently does not return complex numbers in registers as in C/C++ (or gfortran), but instead creates a "hidden" first parameter for the return value. The option --complex-return=gnu|intel has been added, as well as a guess based on a provided FC if not specified (otherwise default to gnu). This option affects the signatures of cdotc, cdotu, zdotc, and zdotu, and a single library cannot be used with both GNU and Intel Fortran compilers. Fixes#433.