amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-04-19 23:28:52 +00:00

Author	SHA1	Message	Date
Kiran Varaganti	a7da8ee174	AOCL 5.2.2 GA Release	2026-03-12 16:08:56 +00:00
Balasubramanian, Vignesh	73911d5990	Updates to the build systems (CMake and Make) for LPGEMM compilation (#303) - The current build systems have the following behaviour with regard to building the "aocl_gemm" addon codebase (LPGEMM) when giving "amdzen" as the target architecture (fat-binary) - Make: Attempts to compile LPGEMM kernels using the same compiler flags that the makefile fragments set for BLIS kernels, based on the compiler version. - CMake: With presets, it always enables the addon compilation unless explicitly specified with the ENABLE_ADDON variable. - This causes build failures with older compilers, because they do not support compiling BF16 or INT8 intrinsics. - This patch adds the functionality to check for GCC and Clang compiler versions, and disables LPGEMM compilation if GCC < 11.2 or Clang < 12.0. - Make: Updated the configure script to check for the compiler version if the addon is specified. CMake: Updated the main CMakeLists.txt to check for the compiler version if the addon is specified, and to also force-update the associated cache variable. Also updated kernels/CMakeLists.txt to check if "aocl_gemm" remains in the ENABLE_ADDONS list after all the checks in the previous layers. AMD-Internal: [CPUPL-7850] Signed-off-by: Vignesh Balasubramanian <Vignesh.Balasubramanian@amd.com>	2026-01-16 19:39:55 +05:30
Kiran Varaganti	9734fc18cc	AOCL-5.2 GA Release	2025-12-07 05:22:11 +00:00
Smyth, Edward	a2905f7240	Merge commit 'db3134ed6d239a7962a2b7470d8c46611b9d17ef' into AOCL-5.2-RC * commit 'db3134ed6d239a7962a2b7470d8c46611b9d17ef': Disabled no post-ops path in lpgemm f32 kernels for few gcc versions DTL Log update Add external PR integration process and flowchart to CONTRIBUTING.md Enabled disable-sba-pools feature in AOCL-BLAS (#101) Fix for F32 to BF16 Conversion and AVX512 ISA Support Checks Fixed Integer Overflow Issue in TPSV Add AI Code Review workflow (#211) Add AI Code Review Self-enablement file (#209) Re-tuned GEMV thresholds (#210) Adding bli_print_msg before bli_abort() for bli_thrinfo_sup_create_for_cntl Add missing license text Modified AVPY kernel to ensure consistency of numerical results (#188) Fix memory leak in DGEMV kernel (#187) Tuned DGEMV no-transpose thresholds #193 Set Security flags default enable (#194) Standardize Zen kernel names (2) Compiler warnings fixes (2) coverity issue fix for ztrsm (#176) Fixes Coverity static analysis issue in the DTRSM (#181) Add files via upload (#197) Initialize mem_t structures safely and handle NULL communicator in threading Tidying code Compiler warnings fixes Fixing the coverity issues with CID: 23269 and CID: 137049 (#180) Fixed high priority coverity issues in LPGEMM. (#178) GCC 15 SUP kernel workaround (2) Disable small_gemm for zen4/5 and added single thread check for tiny path (#167) Optimal rerouting of GEMV inputs to avoid packing Updated Guards in s8s8s32of32_sym_quant Framework Fixed out-of-bound access in F32 matrix add/mul ops (#168)	2025-09-22 13:20:13 +01:00
Smyth, Edward	823ec2cd40	Add missing license text Some files have copyright statements but not details of the license. Add this to DTL source code and some build and benchmark related scripts. AMD-Internal: [CPUPL-6579]	2025-09-18 12:04:59 +01:00
Sharma, Shubham	aa95a8ce4a	Added Compiler flags to improve Security (#136 ) Following Flags have been added. 1. D_FORTIFY_SOURCE=2 What it does • At compile time the header files replace certain libc calls (strcpy, sprintf, …) with inline wrappers that perform a compile-time length check whenever the size of the destination buffer is known. • At run time an extra check is executed only if the compiler could not prove the copy is safe. Cost • Only functions that call those specific libc routines pay anything. 2. fstack-protector-strong What it does • Functions that contain local arrays, address‐taken locals, or alloca get a canary word inserted into the stack frame. • The function prologue writes the canary; the epilogue verifies it before the ret. Cost • 8 bytes of additional stack per protected function frame. • Two or three extra instructions per entry/exit. 4. Wl,-z,relro What it does • Marks the relocation tables read-only after relocation is finished. • No effect once the library is fully loaded. Cost • None at run time. 5. Wl,-z,now What it does • Forces the dynamic loader to resolve all external symbols in the library up-front instead of lazily on first call. Cost • Startup: one extra relocation pass. • Steady-state execution: zero or slightly faster, because PLT stubs are bypassed. Usage: cmake -DENABLE_SECURITY_FLAGS=off cmake -DENABLE_SECURITY_FLAGS=on configure --enable-security-flags configure --disable-security-flags AMD-Internal: [CPUPL-6886]	2025-08-18 16:11:02 +05:30
Vlachopoulou, Eleni	1f8a7d2218	Renaming CMAKE_SOURCE_DIR to PROJECT_SOURCE_DIR so that BLIS can be built properly via FetchContent() (#65)	2025-08-07 15:51:59 +01:00
Smyth, Edward	563b161933	Standardize Python files to use Python 3 Python 2 is no longer maintained, and using python3 avoids accidental invocation of outdated interpreters. AMD-Internal: [CPUPL-6579]	2025-08-06 12:04:26 +01:00
Smyth, Edward	e3dcc15a80	Add configure and CMake options to enable DTL logging and tracing (#86 ) Instead of editing a header file, add options to build systems to allow DTL tracing and/or logging output to be generated. For most users logging is recommended, producing a line of output per application thread of every BLAS call made. Tracing provides more detailed info of internal BLIS calls, and is aimed more at expert users and BLIS developers. Different tracing levels from 1 to 10 provide control of the granularity of information produced. The default level is 5. Note that tracing, especially at higher tracing levels, will impose a significant runtime cost overhead. Example usage: Using configure: ./configure ... --enable-aocl-dtl=log amdzen ./configure ... --enable-aocl-dtl=trace --aocl-dtl-trace-level=6 amdzen ./configure ... --enable-aocl-dtl=all amdzen Using CMake: cmake ... -DENABLE_AOCL_DTL=LOG cmake ... -DENABLE_AOCL_DTL=TRACE -DAOCL_DTL_TRACE_LEVEL=6 cmake ... -DENABLE_AOCL_DTL=ALL Also, modify function AOCL_get_requested_threads_count to correct reported thread count in cases where internal value is recorded as -1 AMD-Internal: [CPUPL-7010]	2025-07-28 15:24:10 +01:00
Smyth, Edward	969ceb7413	Finer control of code path options (#67 ) Add macros to allow specific code options to be enabled or disabled, controlled by options to configure and cmake. This expands on the existing GEMM and/or TRSM functionality to enable/disable SUP handling and replaces the hard coded #define in include files to enable small matrix paths. All options are enabled by default for all BLIS sub-configs but many of them are currently only implemented in AMD specific framework code variants. AMD-Internal: [CPUPL-6906] --------- Co-authored-by: Varaganti, Kiran <Kiran.Varaganti@amd.com>	2025-07-08 10:59:23 +01:00
Smyth, Edward	14e46ad83b	Improvements to x86 make_defs files (#29 ) Various changes to simplify and improve x86 related make_defs files: - Make better use of common definitions in config/zen/amd_config.mk from config/zen/make_defs.mk files - Similarly for config/zen/amd_config.make from the config/zen/make_defs.cmake files - Pass cc_major, cc_minor and cc_revision definitions from configure to generated config.mk file, and use these instead of defining GCC_VERSION in config/zen*/make_defs.mk files - Add znver3 support for LLVM 13 in config/zen3/make_defs.{mk,cmake} - Add znver5 support for LLVM 19 in config/zen5/make_defs.{mk,cmake} - Improve readability of haswell, intel64, skx and x86_64 files - Correct and tidy some comments AMD-Internal: [CPUPL-6579]	2025-06-03 16:20:43 +01:00
Vlachopoulou, Eleni	4555917040	CMake: Removing INT_SIZE variable from presets (#11) With this change, the default INT_SIZE of the system will be used. That is compatible with the Make system and default CMake options.	2025-05-21 09:35:32 +01:00
Kiran Varaganti	c850221ed1	AOCL-BLAS 5.1-GA Release	2025-05-08 12:29:21 +05:30
Eleni Vlachopoulou	9f263d2445	CMake: Fix pkgconfig file names Rename generated aocl-blas.pc and aocl-blas-mt.pc to blis.pc and blis-mt.pc. AMD-Internal: [SWLCSG-3446] Change-Id: Ica784c7a0fd1e52b4d419795659947316e932ef6	2025-03-11 11:59:02 +00:00
AngryLoki	ea93d2e2c9	Fix SyntaxWarning messages from python 3.12 (#809) Details: - When using regexes in Python, certain characters need backslash escaping, e.g.: ```python regex = re.compile( '^[\s]#include (["<])([\w\.\-/])([">])' ) ``` However, technically escape sequences like `\s` are not valid and should actually be double-escaped: `\\s`. Python 3.12 now warns about such escape sequences, and in a later version these warnings will be promoted to errors. See also: https://docs.python.org/dev/whatsnew/3.12.html#other-language-changes. The fix here is to use Python's "raw strings" to avoid double-escaping. This issue can be checked for all files in the current directory with the command `python -m compileall -d . -f -q .` - Thanks to @AngryLoki for the fix. AMD-Internal: [CPUPL-5895] Change-Id: I7ab564beef1d1b81e62d985c5cb30ab6b9a937f2 (cherry picked from commit `729c57c15a`)	2025-02-05 07:13:42 -05:00
jagar	8d0bf148ee	Added separate PC for mt blis library Added separate package configuration file for st and mt library in blis Makefile and CMakeLists.txt Change-Id: I8d851fac10d63983358e1f4c67fd9451246056bf	2025-02-05 05:10:11 -05:00
Edward Smyth	82bdf7c8c7	Code cleanup: Copyright notices - Standardize formatting (spacing etc). - Add full copyright to cmake files (excluding .json) - Correct copyright and disclaimer text for frame and zen, skx and a couple of other kernels to cover all contributors, as is commonly used in other files. - Fixed some typos and missing lines in copyright statements. AMD-Internal: [CPUPL-4415] Change-Id: Ib248bb6033c4d0b408773cf0e2a2cda6c2a74371	2024-08-05 15:35:08 -04:00
Edward Smyth	591a3a7395	Code cleanup: file formats and permissions - Remove execute file permission from source and make files. - dos2unix conversion. - Add missing eol at end of files. Also update .gitignore to not exclude build directory but to exclude any build_* created by cmake builds. AMD-Internal: [CPUPL-4415] Change-Id: I5403290d49fe212659a8015d5e94281fe41eb124	2024-08-05 11:52:33 -04:00
Eleni Vlachopoulou	20d6a9a9f3	CMake: Add installation of .pc files. AMD-Internal: [CPUPL-4938] Change-Id: Iaf1ad702e61d8a81ee9ae6496ff3ba0dda21eceb	2024-07-31 11:54:57 -04:00
Eleni Vlachopoulou	db2e353362	CMake: Adding presets for zen5 configuration with clang & gcc compiler. Change-Id: Ieacc5eeaf8e9f4e1c77e2ff5c6fb455f7ff93393	2024-07-04 05:22:28 -04:00
Edward Smyth	8de8dc2961	Merge commit '81e10346' into amd-main * commit '81e10346': Alloc at least 1 elem in pool_t block_ptrs. (#560) Fix insufficient pool-growing logic in bli_pool.c. (#559) Arm SVE C/ZGEMM Fix FMOV 0 Mistake SH Kernel Unused Eigher Arm SVE C/ZGEMM Support beta==0 Arm SVE Config armsve Use ZGEMM/CGEMM Arm SVE: Update Perf. Graph Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0 Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0 A64FX Config Use ZGEMM/CGEMM Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg Arm SVE Add SGEMM 2Vx10 Unindexed Arm SVE ZGEMM Support Gather Load / Scatt. St. Arm SVE Add ZGEMM 2Vx10 Unindexed Arm SVE Add ZGEMM 2Vx7 Unindexed Arm SVE Add ZGEMM 2Vx8 Unindexed Update Travis CI badge Armv8 Trash New Bulk Kernels Enable testing 1m in `make check`. Config ArmSVE Unregister 12xk. Move 12xk to Old Revert __has_include(). Distinguish w/ BLIS_FAMILY_* Register firestorm into arm64 Metaconfig Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo Add test for Apple M1 (firestorm) Firestorm CPUID Dispatcher Armv8 GEMMSUP Edge Cases Require Signed Ints Make error checking level a thread-local variable. Fix data race in testsuite. Update .appveyor.yml Firestorm Block Size Fixes Armv8 Handle beta == 0 for GEMMSUP ??r Case. Move unused ARM SVE kernels to "old" directory. Add an option to control whether or not to use @rpath. Fix $ORIGIN usage on linux. Arm micro-architecture dispatch (#344) Use @path-based install name on MacOS and use relocatable RPATH entries for testsuite inaries. Armv8 Handle beta == 0 for GEMMSUP ?rc Case. Armv8 Fix 6x8 Row-Maj Ukr Apply patch from @xrq-phys. Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs. 
bli_error: more cleanup on the error strings array Arm SVE Exclude SVE-Intrinsic Kernels for GCC 8-9 Arm SVE: Correct PACKM Ker Name: Intrinsic Kers Fix config_name in bli_arch.c Arm Whole GEMMSUP Call Route is Asm/Int Optimized Arm: DGEMMSUP `Macro' Edge Cases Stop Calling Ref Header Typo Arm: DGEMMSUP ??r(rv) Invoke Edge Size Arm: DGEMMSUP ?rc(rd) Invoke Edge Size Arm: Implement GEMMSUP Fallback Method Arm64 Fix: Support Alpha/Beta in GEMMSUP Intrin Added Apple Firestorm (A14/M1) Subconfig Arm64 8x4 Kernel Use Less Regs Armv8-A Supplimentary GEMMSUP Sizes for RD Armv8-A Fix GEMMSUP-RD Kernels on GNU Asm Armv8-A Adjust Types for PACKM Kernels Armv8-A GEMMSUP-RD 6x8m Armv8-A GEMMSUP-RD 6x8n Armv8-A s/d Packing Kernels Fix Typo Armv8-A Introduced s/d Packing Kernels Armv8-A DGEMMSUP 6x8m Kernel Armv8-A DGEMMSUP Adjustments Armv8-A Add More DGEMMSUP Armv8-A Add GEMMSUP 4x8n Kernel Armv8-A Add Part of GEMMSUP 8x4m Kernel Armv8A DGEMM 4x4 Kernel WIP. Slow Armv8-A Add 8x4 Kernel WIP AMD-Internal: [CPUPL-2698] Change-Id: I194ff69356740bb36ca189fd1bf9fef02eec3803	2024-06-25 05:48:46 -04:00
Edward Smyth	c51b4628b4	BLIS: Implement zen5 sub-configuration in cmake Correction to commit `2450a1813b` to add -DBLIS_CONFIG_FAMILY=zen5 support in cmake. AMD-Internal: [CPUPL-3518] Change-Id: Iecff2b64d5d95960cecbbf98d5269133747b122e	2024-04-15 07:40:50 -04:00
Eleni Vlachopoulou	e14da6f73d	GTestSuite: Generic updates to CMake system and cmake presets. - Updating gemm/cgemm_ukernel.cpp to cast integers so that gtestsuite works for ILP64. - Updating BLIS cmake presets to be conditional on Windows and Linux. - Updating GTestSuite cmake system to use environment variable to set BLIS_PATH and reference library. - Add more cmake presets options in gtestsuite.	2024-03-19 18:14:49 +00:00
Eleni Vlachopoulou	020b9ff7f0	CMake: Enable builds for both static and shared builds for Linux. - Added BUILD_STATIC_LIBS option which is on by default, only on Linux. - Added TEST_WITH_SHARED option which is off by default, only on Linux. - If only shared or static lib is being built, that's the one that will be used for testing. - If both are being built, TEST_WITH_SHARED determines which library will be used for testing. - Set linux workflows so that they build both static and shared libs, and use linux-static and linux-shared to denote which one should be used for testing. - Set -fPIC for both static and shared builds to fix issues faced when building blis using AOCC 4.0.0 and gtestsuite using gcc 9.4.0. AMD-Internal: [CPUPL-2748] Change-Id: I4227bab97ff31ecddfe218e18499f33b4e4ee63e	2024-03-14 10:32:51 -04:00
Meghana Vankadari	da8fd8c301	Implemented JIT-based microkernel for bf16 datatype Details: - Added new folder named JIT/ under addon/aocl_gemm/. This folder will contain all the JIT related code. - Modified lpgemm_cntx_init code to generate main and fringe kernels for 6x64 bf16 microkernel and store function pointers to all the generated kernels in a global function pointer array. This happens only when gcc version is < 11.2 - When gcc version < 11.2, the microkernel uses JIT-generated kernels. Otherwise, the microkernel uses the intrinsics-based implementation. AMD-Internal: [SWLCSG-2622] Change-Id: I16256c797b2546a8cd2049680001947346260461	2024-03-13 05:55:18 +05:30
Eleni Vlachopoulou	14ae6c78dd	CMake: Introducing CMake presets to simplify CI jobs and development. AMD-Internal: [CPUPL-2748] Change-Id: Ic8aa9ccfa317b9ba3c63b1a952f3ef8593b9d990	2024-03-08 05:52:04 -05:00
jagar	394eee90f6	CMake: CMake is updated to support Address-Sanitizer CMakeLists.txt is updated to support ASAN to find memory-related errors in the blis library. ASAN is enabled by configuring cmake with the following option: $ cmake .. -DENABLE_ASAN=ON ASAN is supported only on Linux with the clang compiler. The default redzone size is 16 bytes and the maximum redzone size is 2048 bytes. $ ASAN_OPTIONS=redzone=2048 <exe> AMD-Internal: [CPUPL-2748] Change-Id: I0b70af5c41cf5c68602150daeb67d7432bbe5cb8	2024-03-05 23:19:22 -05:00
Chandrashekara K R	9f7e5b7dbf	CMake: Modified flatten-headers.py file to fix issue observed with ninja on windows. While building the blis library using the ninja generator on Windows, ninja was observed to randomly add "\|\| '(set', 'FAIL_LINE=3&', 'goto', ':ABORT)'" as extra arguments for add_custom_command. Due to this, the flatten-headers python script was failing to create the blis.h and cblas.h headers. Modified the python script to fix the above issue. AMD-Internal: [CPUPL-2748] Change-Id: I83b753d08e46f94b282176fcc661ce34e5eee3cf	2024-02-29 15:42:02 +05:30
jagar	099b9863cb	CMake: CMake is updated for Code Coverage CMakeLists.txt is updated to generate a code coverage report in html format just by configuring cmake with -DENABLE_COVERAGE=ON. Code coverage is supported only on Linux with the gcc compiler. cmake .. -DENABLE_COVERAGE=ON AMD-Internal: [CPUPL-2748] Change-Id: I9b36b6cc3f1f97b53e1c4ee62948a017418e3d41	2024-02-07 06:12:51 -05:00
jagar	1821c2142b	CMake: Fix in testsuite cmake to work for static-st on linux CMakeLists.txt is updated in blis/testsuite to make it work for the static single-thread version of BLIS. AMD-Internal: [CPUPL-2748] Change-Id: I004e19d4ddbf9cb94d6d23699893a2f684a3fb35	2024-01-09 09:13:03 -05:00
Edward Smyth	ed5010d65b	Code cleanup: AMD copyright notice Standardize format of AMD copyright notice. AMD-Internal: [CPUPL-3519] Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0	2023-11-23 08:54:31 -05:00
Edward Smyth	f471615c66	Code cleanup: No newline at end of file Some text files were missing a newline at the end of the file. One has been added. AMD-Internal: [CPUPL-3519] Change-Id: I4b00876b1230b036723d6b56755c6ca844a7ffce	2023-11-22 17:11:10 -05:00
Edward Smyth	dc41fa3829	User selection of code path in single architecture builds User control over code path using AOCL_ENABLE_INSTRUCTIONS or BLIS_ARCH_TYPE only makes sense for fat binary builds. Thus this functionality is now disabled by default for single architecture builds. User can still override the default selections by using configure options --enable-blis-arch-type or --disable-blis-arch-type. Other changes: - include x86_64 family as using zen codepaths in cmake build system. - Update help and error messages to include AOCL_ENABLE_INSTRUCTIONS. AMD-Internal: [CPUPL-4202] Change-Id: I7aa5fcf89df8675bcc12d81f81781de647e0fcf8	2023-11-22 10:48:44 -05:00
Edward Smyth	c6f3340125	Merge commit '5013a6cb' into amd-main * commit '5013a6cb': More edits and fixes to docs/FAQ.md. Fixed newly broken link to CREDITS in FAQ.md. More minor fixes to FAQ.md and Sandboxes.md. Updates to FAQ.md, Sandboxes.md, and README.md. Safelist 'master', 'dev', 'amd' branches. Re-enable and fix `fb93d24`. Reverted `fb93d24`. Re-enable and fix `8e0c425` (BLIS_ENABLE_SYSTEM). Removed last vestige of #define BLIS_NUM_ARCHS. Added new packm var3 to 'gemmlike'. Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell. Fix more copy-paste errors in the haswell gemmsup code. Do a fast test on OSX. [ci skip] Fix AArch64 tests and consolidate some other tests. Use C++ cross-compiler for ARM tests. Attempt to fix cxx-test for OOT builds. Updated travis-ci.org link in README.md to .com. Disabled (at least temporarily) commit `8e0c425`. Define BLIS_OS_NONE when using --disable-system. Updated stale calls to malloc_intl() in gemmlike. Blacklist clang10/gcc9 and older for 'armsve'. Add test to Travis using C++ compiler to make sure blis.h is C++-compatible. Moved lang defs from _macro_def.h to _lang_defs.h. Minor tweaks to gemmlike sandbox. Added local _check() code to gemmlike sandbox. README.md citation updates (e.g. BLIS7 bibtex). Tweaks to gemmlike to facilitate 3rd party mods. Whitespace tweaks. Add row- and column-strides for A/B in obj_ukr_fn_t. Clean up some warnings that show up on clang/OSX. Remove schema field on obj_t (redundant) and add new API functions. Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects. Disabled sanity check in bli_pool_finalize(). Implement proposed new function pointer fields for obj_t. AMD-Internal: [CPUPL-2698] Change-Id: I6fc33351fa824580cf4f25b63f0370383cd9422d	2023-11-10 13:05:12 -05:00
Eleni Vlachopoulou	75a4d2f72f	CMake: Adding new portable CMake system. - A completely new system, made to be closer to Make system. AMD-Internal: [CPUPL-2748] Change-Id: I83232786406cdc4f0a0950fb6ac8f551e5968529	2023-11-09 15:49:45 +05:30
Edward Smyth	f5505be9f3	Merge commit 'e366665c' into amd-main * commit 'e366665c': Fixed stale API calls to membrk API in gemmlike. Fixed bli_init.c compile-time error on OSX clang. Fixed configure breakage on OSX clang. Fixed one-time use property of bli_init() (#525). CREDITS file update. Added Graviton2 Neoverse N1 performance results. Remove unnecesary windows/zen2 directory. Add vzeroupper to Haswell microkernels. (#524) Fix Win64 AVX512 bug. Add comment about make checkblas on Windows CREDITS file update. Test installation in Travis CI Add symlink to blis.pc.in for out-of-tree builds Revert "Always run `make check`." Always run `make check`. Fixed configure script bug. Details: - Fixed kernel list string substitution error by adding function substitute_words in configure script. if the string contains zen and zen2, and zen need to be replaced with another string, then zen2 also be incorrectly replaced. Update POWER10.md Rework POWER10 sandbox Skip clearing temp microtile in gemmlike sandbox. Fix asm warning Sandbox header edits trigger full library rebuild. Add vhsubpd/vhsubpd. Fixed bugs in cpackm kernels, gemmlike code. Armv8A Rename Regs for Safe Darwin Compile Armv8A Rename Regs for Clang Compile: FP32 Part Armv8A Rename Regs for Clang Compile: FP64 Part Asm Flag Mingling for Darwin_Aarch64 Added a new 'gemmlike' sandbox. Updated Fugaku (a64fx) performance results. Add explicit compiler check for Windows. Remove `rm-dupls` function in common.mk. Travis CI Revert Unnecessary Extras from `91d3636` Adjust TravisCI Travis Support Arm SVE Added 512b SVE-based a64fx subconfig + SVE kernels. Replace bli_dlamch with something less archaic (#498) Allow clang for ThunderX2 config AMD-Internal: [CPUPL-2698] Change-Id: I561ca3959b7049a00cc128dee3617be51ae11bc4	2023-10-18 09:09:54 -04:00
Edward Smyth	bb4c158e63	Merge commit 'b683d01b' into amd-main * commit 'b683d01b': Use extra #undef when including ba/ex API headers. Minor preprocessor/header cleanup. Fixed typo in cpp guard in bli_util_ft.h. Defined eqsc, eqv, eqm to test object equality. Defined setijv, getijv to set/get vector elements. Minor API breakage in bli_pack API. Add err_t* "return" parameter to malloc functions. Always stay initialized after BLAS compat calls. Renamed membrk files/vars/functions to pba. Switch allocator mutexes to static initialization. AMD-Internal: [CPUPL-2698] Change-Id: Ied2ca8619f144d4b8a7123ac45a1be0dda3875df	2023-08-21 07:01:38 -04:00
Edward Smyth	7e50ba669b	Code cleanup: No newline at end of file Some text files were missing a newline at the end of the file. One has been added. Also correct file format of windows/tests/inputs.yaml, which was missed in commit `0f0277e104` AMD-Internal: [CPUPL-2870] Change-Id: Icb83a4a27033dc0ff325cb84a1cf399e953ec549	2023-04-21 10:02:48 -04:00
Edward Smyth	0f0277e104	Code cleanup: dos2unix file conversion Source and other files in some directories were a mixture of Unix and DOS file formats. Convert all relevant files to Unix format for consistency. Some Windows-specific files remain in DOS format. AMD-Internal: [CPUPL-2870] Change-Id: Ic9a0fddb2dba6dc8bcf0ad9b3cc93774a46caeeb	2023-04-21 08:41:16 -04:00
Edward Smyth	b531022bac	BLIS cpuid: distinguish submodels within a microarchitecture Incorporate a means of detecting submodels of a microarchitecture, so that different optimizations e.g. block sizes or kernel choices can be used. The details are as follows: - Different models are currently only enabled for zen3 and zen4 architectures (for server parts). - There is a single enumeration (model_t) for all models for all architectures, but function bli_check_valid_model_id() should check the provided model_id against the suitable range within the enumeration for the provided arch_id. - To enable the model_id to be used within the cntx setup functions, checking of a user specified value of BLIS_ARCH_TYPE against the enabled configurations is delayed to a separate function, bli_arch_check_id(). - Default selection based on hardware can be overridden using the BLIS_MODEL_TYPE environment variable. Valid values are: Genoa, Bergamo, Genoa-X, Milan, Milan-X Values are case-insensitive and -X can also be specified as _X or X - Specifying an incorrect value for BLIS_MODEL_TYPE is not an error, but will result in the default option for that architecture being selected. This is different to specifying an incorrect value of BLIS_ARCH_TYPE, which is an error. - The environment variable BLIS_MODEL_TYPE can be renamed using the --rename-blis-model-type argument to configure (or cmake equivalent), in a similar way to renaming BLIS_ARCH_TYPE with --rename-blis-arch-type. - Configure option --disable-blis-arch-type will disable both BLIS_ARCH_TYPE and BLIS_MODEL_TYPE environment variables. - Added code in bli_cpuid.c to detect L1, L2 and L3 cache sizes, currently only for AMD cpus. Functions are provided to query these from other parts of the code, namely: uint32_t bli_cpuid_query_{l1d,l1i,l2,l3}_cache_size() AMD-Internal: [CPUPL-3033] Change-Id: I37a3741abfd59a95e0e905d926c6ede9a0143702	2023-04-20 10:47:44 -04:00
Aayush Kumar	5bd2a777ba	Fixed Compilation Fails when configured with --disable-blas - Moved _blis_impl function declaration outside the BLIS_ENABLE_BLAS guard. - Changed Makefile to continue to compile bla_ files to get _blis_impl interfaces. - Modify CBLAS headers, bli_macro_defs.h and bli_util_api_wrap.{c,h} to add BLIS_ENABLE_CBLAS guards. - Comment out BLIS_ENABLE_BLAS guards in various headers and utility functions. - Define BLIS Fortran-style functions lsame_blis_impl and xerbla_blis_impl. New macros PASTE_LSAME and PASTE_XERBLA are used in bla_*_check headers and some other places to select whether to call lsame and xerbla, or the _blis_impl versions. - Defined various other missing _blis_impl functions. - In bli_util_api_wrap.c, only define any functions if BLIS_ENABLE_BLAS is defined, and only define the subroutine versions of functions like dot, nrm2, etc if BLIS_ENABLE_CBLAS is defined. - BLAS layer is needed if CBLAS layer is enabled. Changed header files build/bli_config.h.in and bli_blas.h, and configure program to help ensure consistency in generated blis.h header and configure output. Undefining BLIS_ENABLE_BLAS_DEFS appears to be broken in UTA BLIS too, thus BLIS_ENABLE_BLAS_DEFS is currently permanently defined. AMD-Internal: [CPUPL-3015] Change-Id: I7c0fe07db85781db46f2c690e174451860b37635	2023-03-23 06:11:52 -04:00
chandrkr	399831d7cb	AOCL-Windows: Script Update to mirror reference kernel files with Unique names Details: 1. Reference kernel File names are to be mirrored with unique names for each architecture configuration while generating fat binary. 2. Python script "blis_ref_kernel_mirror.py" is updated to append the filenames with each configuration (zen, zen2, zen3, zen4 and generic) that is built for windows. AMD-Internal: [CPUPL-3009] Change-Id: Ib02206382199cf2aebe14ff9c869b6089228e1c2	2023-02-14 23:37:13 -05:00
Harihara Sudhan S	42d631bced	Copyright modification - Added copyright information to modified/newly created files missing them Change-Id: If4e73b680246d0363de09587d6dc54bee00ecd71	2022-10-14 12:43:35 +05:30
Edward Smyth	6861fcae91	BLIS: Improve architecture selection at runtime Make BLIS_ARCH_TYPE=0 be an error, so that incorrect meaningful names will get an error rather than "skx" code path. BLIS_ARCH_TYPE=1 is now "generic", so that it should be constant as new code paths are added. Thus all other code path enum values have increased by 2. Also added new options to BLIS configure program to allow: 1. BLIS_ARCH_TYPE functionality to be disabled, e.g.: ./configure --disable-blis-arch-type amdzen 2. Renaming the environment variable tested from "BLIS_ARCH_TYPE" to a specified value, e.g.: ./configure --rename-blis-arch-type=MY_NAME_FOR_ARCH_TYPE amdzen On Windows, these can be enabled with e.g.: cmake ... -DDISABLE_BLIS_ARCH_TYPE=ON or cmake ... -DRENAME_BLIS_ARCH_TYPE=MY_NAME_FOR_ARCH_TYPE This implements changes 2 and 3 in the Jira ticket below. AMD-Internal: [CPUPL-2235] Change-Id: Ie42906bd909f9d83f00a90c5bef9c5bf3ef5adb4	2022-08-19 10:59:35 -04:00
Field G. Van Zee	7a0ba4194f	Added support for addons. Details: - Implemented a new feature called addons, which are similar to sandboxes except that there is no requirement to define gemm or any other particular operation. - Updated configure to accept --enable-addon=<name> or -a <name> syntax for requesting an addon be included within a BLIS build. configure now outputs the list of enabled addons into config.mk. It also outputs the corresponding #include directives for the addons' headers to a new companion to the bli_config.h header file named bli_addon.h. Because addons may wish to make use of existing BLIS types within their own definitions, the addons' headers must be included sometime after that of bli_config.h (which currently is #included before bli_type_defs.h). This is why the #include directives needed to go into a new top-level header file rather than the existing bli_config.h file. - Added a markdown document, docs/Addons.md, to explain addons, how to build with them, and what assumptions their authors should keep in mind as they create them. - Added a gemmlike-like implementation of sandwich gemm called 'gemmd' as an addon in addon/gemmd. The code uses a 'bao_' prefix for local functions, including the user-level object and typed APIs. - Updated .gitignore so that git ignores bli_addon.h files. Change-Id: Ie7efdea366481ce25075cb2459bdbcfd52309717	2022-03-31 12:03:27 +05:30
Dipal M Zambare	f63f78d783	Removed Arch specific code from BLIS framework. - Removed BLIS_CONFIG_EPYC macro - The code dependent on this macro is handled in one of the three ways -- It is updated to work across platforms. -- Added in architecture/feature specific runtime checks. -- Duplicated in AMD specific files. Build system is updated to pick AMD specific files when library is built for any of the zen architecture AMD-Internal: [CPUPL-1960] Change-Id: I6f9f8018e41fa48eb43ae4245c9c2c361857f43b	2022-01-18 11:51:08 +05:30
Chandrashekara K R	b3553c08fa	AOCL-Windows: Updating the blis windows build system. 1. Removed the libomp.lib hardcoded from cmake scripts and made it user configurable. By default libomp.lib is used as an omp library. 2. Added the STATIC_LIBRARY_OPTIONS property in set_target_properties cmake command to link omp library to build static-mt blis library. 3. Updated the blis_ref_kernel_mirror.py to give support for zen4 architecture. AMD-Internal: CPUPL-1630 Change-Id: I54b04cde2fa6a1ddc4b4303f1da808c1efe0484a	2021-12-22 14:47:15 +05:30
Dipal M Zambare	fd8a3aace9	Added support for zen4 architecture - Added configuration option for zen4 architecture - Added auto-detection of zen4 architecture - Added zen4 configuration for all checks related to AMD specific optimizations AMD-Internal: [CPUPL-1937] Change-Id: I1a1a45de04653f725aa53c30dffb6c0f7cc6e39a	2021-11-23 10:29:15 +05:30
nphaniku	4af525a313	AOCL Windows BLIS : Windows build for dynamic dispatch library Change-Id: Ie05eafbeacbd5589b514d9353517330515104939	2021-11-12 08:58:57 +05:30
Dipal M Zambare	5d287fdba0	Include LP64/ILP64 in BLIS binary name Binary name will be chosen based on multi-threading and BLAS integer size configuration as given below. libblis-[mt]-lp64 - when configured to use 32 bit integers libblis-[mt]-ilp64 - when configured to use 64 bit integers AMD-Internal: [CPUPL-1879] Change-Id: I865023c63235a0a72bdfce7057b2cfb8158b1d87	2021-11-12 08:58:51 +05:30
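
The naming rule in commit 5d287fdba0 can be illustrated with a small sketch. This is not code from the BLIS build system: the function name `blis_library_name` and its parameters are hypothetical, and it assumes the `[mt]` in the commit message means the `-mt` component is simply omitted for single-threaded builds.

```python
def blis_library_name(multithreaded: bool, int_bits: int) -> str:
    """Build a library base name following the pattern in commit 5d287fdba0.

    Hypothetical helper for illustration only; the real logic lives in the
    BLIS build system, not in Python.
    """
    if int_bits not in (32, 64):
        raise ValueError("BLAS integer size must be 32 or 64 bits")

    # 32-bit BLAS integers map to "lp64", 64-bit integers to "ilp64".
    int_model = "lp64" if int_bits == 32 else "ilp64"

    parts = ["libblis"]
    if multithreaded:
        # Assumption: "[mt]" is an optional component of the name.
        parts.append("mt")
    parts.append(int_model)
    return "-".join(parts)


if __name__ == "__main__":
    for mt in (False, True):
        for bits in (32, 64):
            print(blis_library_name(mt, bits))
```

Under these assumptions a multi-threaded build with 64-bit integers would be named `libblis-mt-ilp64`.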

215 Commits
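
Commit ea93d2e2c9 in the log above replaces ordinary string literals with raw strings in regex patterns. The sketch below shows why that works, assuming a simplified `#include` matcher: `INCLUDE_RE` and `parse_include` are hypothetical names, and the pattern is an illustration rather than the one used in the BLIS scripts.

```python
import re

# In a raw string the backslash is kept as-is, so the regex engine receives
# \s and \w directly and Python never parses them as string escapes.
INCLUDE_RE = re.compile(r'^\s*#include (["<])([\w.\-/]+)([">])')


def parse_include(line):
    """Return the header named by an #include line, or None if not a match."""
    match = INCLUDE_RE.match(line)
    return match.group(2) if match else None


# A raw string and its double-escaped equivalent are the same string object
# content, which is why the raw form is a drop-in replacement.
assert r'\s' == '\\s'

if __name__ == "__main__":
    print(parse_include('#include "bli_config.h"'))
    print(parse_include('  #include <stdio.h>'))
    print(parse_include('int x = 0;'))
```

The raw-string form avoids the `SyntaxWarning` that Python 3.12 emits for invalid escape sequences such as `'\s'` in ordinary string literals, without changing the pattern the regex engine sees.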