215 Commits

Author SHA1 Message Date
Kiran Varaganti
a7da8ee174 AOCL 5.2.2 GA Release 2026-03-12 16:08:56 +00:00
Balasubramanian, Vignesh
73911d5990 Updates to the build systems (CMake and Make) for LPGEMM compilation (#303)
- The current build systems behave as follows when building the
  "aocl_gemm" addon codebase (LPGEMM) with "amdzen" as the
  target architecture (fat binary):
  - Make:  Attempts to compile LPGEMM kernels using the same
           compiler flags that the makefile fragments set for BLIS
           kernels, based on the compiler version.
  - CMake: With presets, it always enables the addon compilation
           unless the ENABLE_ADDON variable is set explicitly.

- This breaks builds with older compilers, which do not support
  compiling BF16 or INT8 intrinsics.

- This patch adds the functionality to check for GCC and Clang compiler versions,
  and disables LPGEMM compilation if GCC < 11.2 or Clang < 12.0.

- Make:  Updated the configure script to check the compiler version
         if the addon is specified.
- CMake: Updated the main CMakeLists.txt to check the compiler version
         if the addon is specified, and to force-update the associated
         cache variable. Also updated kernels/CMakeLists.txt to
         check whether "aocl_gemm" remains in the ENABLE_ADDONS list after
         all the checks in the previous layers.
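
The version gate described above can be sketched roughly as follows. This is an illustration in Python, not the actual configure or CMake logic: the names `lpgemm_addon_allowed` and `filter_addons` are hypothetical, and only the thresholds (GCC 11.2, Clang 12.0) and the addon name "aocl_gemm" come from this commit message.

```python
# Minimum compiler versions for LPGEMM, as stated in the commit message.
MIN_VERSIONS = {"gcc": (11, 2), "clang": (12, 0)}


def parse_major_minor(text):
    """Turn a version string such as '11.2.0' or '12' into (major, minor)."""
    parts = [int(p) for p in text.split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)  # treat a missing minor version as 0
    return tuple(parts)


def lpgemm_addon_allowed(compiler, version):
    """Return True if this sketch would keep the 'aocl_gemm' addon enabled.

    Compilers other than GCC and Clang are not covered by the commit
    message, so this sketch simply leaves the addon enabled for them.
    """
    minimum = MIN_VERSIONS.get(compiler.lower())
    if minimum is None:
        return True
    return parse_major_minor(version) >= minimum


def filter_addons(addons, compiler, version):
    """Drop 'aocl_gemm' from the addon list when the compiler is too old."""
    if lpgemm_addon_allowed(compiler, version):
        return list(addons)
    return [name for name in addons if name != "aocl_gemm"]
```

For example, `filter_addons(["aocl_gemm", "gemmd"], "gcc", "10.3")` keeps only `"gemmd"` in this sketch, mirroring the idea of re-checking the addon list after the version test.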

AMD-Internal: [CPUPL-7850]

Signed-off-by: Vignesh Balasubramanian <Vignesh.Balasubramanian@amd.com>
2026-01-16 19:39:55 +05:30
Kiran Varaganti
9734fc18cc AOCL-5.2 GA Release 2025-12-07 05:22:11 +00:00
Smyth, Edward
a2905f7240 Merge commit 'db3134ed6d239a7962a2b7470d8c46611b9d17ef' into AOCL-5.2-RC
* commit 'db3134ed6d239a7962a2b7470d8c46611b9d17ef':
  Disabled no post-ops path in lpgemm f32 kernels for few gcc versions
  DTL Log update
  Add external PR integration process and flowchart to CONTRIBUTING.md
  Enabled disable-sba-pools feature in AOCL-BLAS (#101)
  Fix for F32 to BF16 Conversion and AVX512 ISA Support Checks
  Fixed Integer Overflow Issue in TPSV
  Add AI Code Review workflow (#211)
  Add AI Code Review Self-enablement file (#209)
  Re-tuned GEMV thresholds (#210)
  Adding bli_print_msg before bli_abort() for bli_thrinfo_sup_create_for_cntl
  Add missing license text
  Modified AVPY kernel to ensure consistency of numerical results (#188)
  Fix memory leak in DGEMV kernel (#187)
  Tuned DGEMV no-transpose thresholds #193
  Set Security flags default enable (#194)
  Standardize Zen kernel names (2)
  Compiler warnings fixes (2)
  coverity issue fix for ztrsm (#176)
  Fixes Coverity static analysis issue in the DTRSM (#181)
  Add files via upload (#197)
  Initialize mem_t structures safely and handle NULL communicator in threading
  Tidying code
  Compiler warnings fixes
  Fixing the coverity issues with CID: 23269 and CID: 137049 (#180)
  Fixed high priority coverity issues in LPGEMM. (#178)
  GCC 15 SUP kernel workaround (2)
  Disable small_gemm for zen4/5 and added single thread check for tiny path (#167)
  Optimal rerouting of GEMV inputs to avoid packing
  Updated Guards in s8s8s32of32_sym_quant Framework
  Fixed out-of-bound access in F32 matrix add/mul ops (#168)
2025-09-22 13:20:13 +01:00
Smyth, Edward
823ec2cd40 Add missing license text
Some files have copyright statements but not details of the license.
Add this to DTL source code and some build and benchmark related
scripts.

AMD-Internal: [CPUPL-6579]
2025-09-18 12:04:59 +01:00
Sharma, Shubham
aa95a8ce4a Added Compiler flags to improve Security (#136)
The following flags have been added.

1. -D_FORTIFY_SOURCE=2
What it does
• At compile time the header files replace certain libc calls (strcpy, sprintf, …) with inline wrappers that perform a compile-time length check whenever the size of the destination buffer is known.
• At run time an extra check is executed only if the compiler could not prove the copy is safe.

Cost
• Only functions that call those specific libc routines pay anything.

2. -fstack-protector-strong
What it does
• Functions that contain local arrays, address‐taken locals, or alloca get a canary word inserted into the stack frame.
• The function prologue writes the canary; the epilogue verifies it before the ret.

Cost
• 8 bytes of additional stack per protected function frame.
• Two or three extra instructions per entry/exit.

4. -Wl,-z,relro
What it does
• Marks the relocation tables read-only after relocation is finished.
• No effect once the library is fully loaded.

Cost
• None at run time.

5. -Wl,-z,now
What it does
• Forces the dynamic loader to resolve all external symbols in the library up-front instead of lazily on first call.

Cost
• Startup: one extra relocation pass.
• Steady-state execution: zero or slightly faster, because PLT stubs are bypassed.

Usage:
cmake -DENABLE_SECURITY_FLAGS=off
cmake -DENABLE_SECURITY_FLAGS=on
configure --enable-security-flags
configure --disable-security-flags
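
As a rough illustration of what the on/off switch implies, the sketch below appends the hardening flags to existing flag lists only when the option is enabled. The function name `build_flags` is hypothetical, the flag spellings follow common GCC/Clang usage, and the split into compile-time and link-time groups is an assumption of this sketch rather than a description of the actual build scripts.

```python
# Flag spellings follow common GCC/Clang usage; the grouping into compile
# and link flags is an assumption made for this sketch.
SECURITY_COMPILE_FLAGS = ["-D_FORTIFY_SOURCE=2", "-fstack-protector-strong"]
SECURITY_LINK_FLAGS = ["-Wl,-z,relro", "-Wl,-z,now"]


def build_flags(enable_security_flags, base_cflags=(), base_ldflags=()):
    """Return (cflags, ldflags), adding hardening flags when enabled."""
    cflags = list(base_cflags)
    ldflags = list(base_ldflags)
    if enable_security_flags:
        cflags += SECURITY_COMPILE_FLAGS
        ldflags += SECURITY_LINK_FLAGS
    return cflags, ldflags
```

With the option disabled, the base flags pass through unchanged, which corresponds to `-DENABLE_SECURITY_FLAGS=off` or `--disable-security-flags` above.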

AMD-Internal: [CPUPL-6886]
2025-08-18 16:11:02 +05:30
Vlachopoulou, Eleni
1f8a7d2218 Renaming CMAKE_SOURCE_DIR to PROJECT_SOURCE_DIR so that BLIS can be built properly via FetchContent() (#65) 2025-08-07 15:51:59 +01:00
Smyth, Edward
563b161933 Standardize Python files to use Python 3
Python 2 is no longer maintained, and using python3 avoids accidental invocation of outdated interpreters.

AMD-Internal: [CPUPL-6579]
2025-08-06 12:04:26 +01:00
Smyth, Edward
e3dcc15a80 Add configure and CMake options to enable DTL logging and tracing (#86)
Instead of editing a header file, add options to build systems to allow
DTL tracing and/or logging output to be generated. For most users
logging is recommended, producing a line of output per application
thread of every BLAS call made. Tracing provides more detailed info
of internal BLIS calls, and is aimed more at expert users and BLIS
developers. Different tracing levels from 1 to 10 provide control of
the granularity of information produced. The default level is 5. Note
that tracing, especially at higher tracing levels, will impose a
significant runtime overhead.

Example usage:

Using configure:

  ./configure ... --enable-aocl-dtl=log amdzen

  ./configure ... --enable-aocl-dtl=trace --aocl-dtl-trace-level=6 amdzen

  ./configure ... --enable-aocl-dtl=all amdzen

Using CMake:

  cmake ... -DENABLE_AOCL_DTL=LOG

  cmake ... -DENABLE_AOCL_DTL=TRACE -DAOCL_DTL_TRACE_LEVEL=6

  cmake ... -DENABLE_AOCL_DTL=ALL

Also, modify function AOCL_get_requested_threads_count to correct the
reported thread count in cases where the internal value is recorded as -1.

AMD-Internal: [CPUPL-7010]
2025-07-28 15:24:10 +01:00
Smyth, Edward
969ceb7413 Finer control of code path options (#67)
Add macros to allow specific code options to be enabled or disabled,
controlled by options to configure and cmake. This expands on the
existing GEMM and/or TRSM functionality to enable/disable SUP handling
and replaces the hard coded #define in include files to enable small matrix
paths.

All options are enabled by default for all BLIS sub-configs but many of them
are currently only implemented in AMD specific framework code variants.

AMD-Internal: [CPUPL-6906]
---------

Co-authored-by: Varaganti, Kiran <Kiran.Varaganti@amd.com>
2025-07-08 10:59:23 +01:00
Smyth, Edward
14e46ad83b Improvements to x86 make_defs files (#29)
Various changes to simplify and improve x86 related make_defs files:
- Make better use of common definitions in config/zen/amd_config.mk
  from config/zen*/make_defs.mk files
- Similarly for config/zen/amd_config.make from the
  config/zen*/make_defs.cmake files
- Pass cc_major, cc_minor and cc_revision definitions from configure
  to generated config.mk file, and use these instead of defining
  GCC_VERSION in config/zen*/make_defs.mk files
- Add znver3 support for LLVM 13 in config/zen3/make_defs.{mk,cmake}
- Add znver5 support for LLVM 19 in config/zen5/make_defs.{mk,cmake}
- Improve readability of haswell, intel64, skx and x86_64 files
- Correct and tidy some comments

AMD-Internal: [CPUPL-6579]
2025-06-03 16:20:43 +01:00
Vlachopoulou, Eleni
4555917040 CMake: Removing INT_SIZE variable from presets (#11)
With this change, the default INT_SIZE of the system will be used.
That is compatible with the Make system and default CMake options.
2025-05-21 09:35:32 +01:00
Kiran Varaganti
c850221ed1 AOCL-BLAS 5.1-GA Release 2025-05-08 12:29:21 +05:30
Eleni Vlachopoulou
9f263d2445 CMake: Fix pkgconfig file names
Rename generated aocl-blas.pc and aocl-blas-mt.pc to blis.pc and blis-mt.pc.

AMD-Internal: [SWLCSG-3446]
Change-Id: Ica784c7a0fd1e52b4d419795659947316e932ef6
2025-03-11 11:59:02 +00:00
AngryLoki
ea93d2e2c9 Fix SyntaxWarning messages from python 3.12 (#809)
Details:
- When using regexes in Python, certain characters need backslash escaping, e.g.:
  ```python
  regex = re.compile( '^[\s]*#include (["<])([\w\.\-/]*)([">])' )
  ```
  However, technically escape sequences like `\s` are not valid and should actually be double-escaped: `\\s`.
  Python 3.12 now warns about such escape sequences, and in a later version these warnings will be promoted
  to errors. See also: https://docs.python.org/dev/whatsnew/3.12.html#other-language-changes. The fix here
  is to use Python's "raw strings" to avoid double-escaping. This issue can be checked for all files in the current
  directory with the command `python -m compileall -d . -f -q .`
- Thanks to @AngryLoki for the fix.
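
A minimal sketch of the raw-string form of the regex quoted above. The helper name `parse_include` is hypothetical and exists only to show the pattern in use; the pattern itself is the one from the commit message with an `r` prefix added.

```python
import re

# Raw string: the backslashes reach the regex engine unchanged, so Python
# never tries to interpret \s, \w, \. or \- as string escape sequences.
INCLUDE_RE = re.compile(r'^[\s]*#include (["<])([\w\.\-/]*)([">])')


def parse_include(line):
    """Return the header name from an #include line, or None if no match."""
    match = INCLUDE_RE.match(line)
    return match.group(2) if match else None
```

Because the pattern is a raw string, compiling this module should not trigger the invalid escape sequence warning that the non-raw form produces under Python 3.12.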

AMD-Internal: [CPUPL-5895]
Change-Id: I7ab564beef1d1b81e62d985c5cb30ab6b9a937f2
(cherry picked from commit 729c57c15a)
2025-02-05 07:13:42 -05:00
jagar
8d0bf148ee Added separate PC for mt blis library
Added separate package configuration file for
st and mt library in blis Makefile and CMakeLists.txt

Change-Id: I8d851fac10d63983358e1f4c67fd9451246056bf
2025-02-05 05:10:11 -05:00
Edward Smyth
82bdf7c8c7 Code cleanup: Copyright notices
- Standardize formatting (spacing etc).
- Add full copyright to cmake files (excluding .json)
- Correct copyright and disclaimer text for frame and
  zen, skx and a couple of other kernels to cover all
  contributors, as is commonly used in other files.
- Fixed some typos and missing lines in copyright
  statements.

AMD-Internal: [CPUPL-4415]
Change-Id: Ib248bb6033c4d0b408773cf0e2a2cda6c2a74371
2024-08-05 15:35:08 -04:00
Edward Smyth
591a3a7395 Code cleanup: file formats and permissions
- Remove execute file permission from source and make files.
- dos2unix conversion.
- Add missing eol at end of files.

Also update .gitignore to not exclude build directory but to
exclude any build_* created by cmake builds.

AMD-Internal: [CPUPL-4415]
Change-Id: I5403290d49fe212659a8015d5e94281fe41eb124
2024-08-05 11:52:33 -04:00
Eleni Vlachopoulou
20d6a9a9f3 CMake: Add installation of .pc files.
AMD-Internal: [CPUPL-4938]
Change-Id: Iaf1ad702e61d8a81ee9ae6496ff3ba0dda21eceb
2024-07-31 11:54:57 -04:00
Eleni Vlachopoulou
db2e353362 CMake: Adding presets for zen5 configuration with clang & gcc compiler.
Change-Id: Ieacc5eeaf8e9f4e1c77e2ff5c6fb455f7ff93393
2024-07-04 05:22:28 -04:00
Edward Smyth
8de8dc2961 Merge commit '81e10346' into amd-main
* commit '81e10346':
  Alloc at least 1 elem in pool_t block_ptrs. (#560)
  Fix insufficient pool-growing logic in bli_pool.c. (#559)
  Arm SVE C/ZGEMM Fix FMOV 0 Mistake
  SH Kernel Unused Eigher
  Arm SVE C/ZGEMM Support *beta==0
  Arm SVE Config armsve Use ZGEMM/CGEMM
  Arm SVE: Update Perf. Graph
  Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0
  Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0
  A64FX Config Use ZGEMM/CGEMM
  Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg
  Arm SVE Add SGEMM 2Vx10 Unindexed
  Arm SVE ZGEMM Support Gather Load / Scatt. St.
  Arm SVE Add ZGEMM 2Vx10 Unindexed
  Arm SVE Add ZGEMM 2Vx7 Unindexed
  Arm SVE Add ZGEMM 2Vx8 Unindexed
  Update Travis CI badge
  Armv8 Trash New Bulk Kernels
  Enable testing 1m in `make check`.
  Config ArmSVE Unregister 12xk. Move 12xk to Old
  Revert __has_include(). Distinguish w/ BLIS_FAMILY_**
  Register firestorm into arm64 Metaconfig
  Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo
  Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo
  Add test for Apple M1 (firestorm)
  Firestorm CPUID Dispatcher
  Armv8 GEMMSUP Edge Cases Require Signed Ints
  Make error checking level a thread-local variable.
  Fix data race in testsuite.
  Update .appveyor.yml
  Firestorm Block Size Fixes
  Armv8 Handle *beta == 0 for GEMMSUP ??r Case.
  Move unused ARM SVE kernels to "old" directory.
  Add an option to control whether or not to use @rpath.
  Fix $ORIGIN usage on linux.
  Arm micro-architecture dispatch (#344)
  Use @path-based install name on MacOS and use relocatable RPATH entries for testsuite binaries.
  Armv8 Handle *beta == 0 for GEMMSUP ?rc Case.
  Armv8 Fix 6x8 Row-Maj Ukr
  Apply patch from @xrq-phys.
  Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.
  bli_error: more cleanup on the error strings array
  Arm SVE Exclude SVE-Intrinsic Kernels for GCC 8-9
  Arm SVE: Correct PACKM Ker Name: Intrinsic Kers
  Fix config_name in bli_arch.c
  Arm Whole GEMMSUP Call Route is Asm/Int Optimized
  Arm: DGEMMSUP `Macro' Edge Cases Stop Calling Ref
  Header Typo
  Arm: DGEMMSUP ??r(rv) Invoke Edge Size
  Arm: DGEMMSUP ?rc(rd) Invoke Edge Size
  Arm: Implement GEMMSUP Fallback Method
  Arm64 Fix: Support Alpha/Beta in GEMMSUP Intrin
  Added Apple Firestorm (A14/M1) Subconfig
  Arm64 8x4 Kernel Use Less Regs
  Armv8-A Supplementary GEMMSUP Sizes for RD
  Armv8-A Fix GEMMSUP-RD Kernels on GNU Asm
  Armv8-A Adjust Types for PACKM Kernels
  Armv8-A GEMMSUP-RD 6x8m
  Armv8-A GEMMSUP-RD 6x8n
  Armv8-A s/d Packing Kernels Fix Typo
  Armv8-A Introduced s/d Packing Kernels
  Armv8-A DGEMMSUP 6x8m Kernel
  Armv8-A DGEMMSUP Adjustments
  Armv8-A Add More DGEMMSUP
  Armv8-A Add GEMMSUP 4x8n Kernel
  Armv8-A Add Part of GEMMSUP 8x4m Kernel
  Armv8A DGEMM 4x4 Kernel WIP. Slow
  Armv8-A Add 8x4 Kernel WIP

AMD-Internal: [CPUPL-2698]
Change-Id: I194ff69356740bb36ca189fd1bf9fef02eec3803
2024-06-25 05:48:46 -04:00
Edward Smyth
c51b4628b4 BLIS: Implement zen5 sub-configuration in cmake
Correction to commit 2450a1813b
to add -DBLIS_CONFIG_FAMILY=zen5 support in cmake.

AMD-Internal: [CPUPL-3518]
Change-Id: Iecff2b64d5d95960cecbbf98d5269133747b122e
2024-04-15 07:40:50 -04:00
Eleni Vlachopoulou
e14da6f73d GTestSuite: Generic updates to CMake system and cmake presets.
- Updating gemm/cgemm_ukernel.cpp to cast integers so that gtestsuite works for ILP64.
- Updating BLIS cmake presets to be conditional on Windows and Linux.
- Updating GTestSuite cmake system to use environment variable to set BLIS_PATH and reference library.
- Add more cmake presets options in gtestsuite.
2024-03-19 18:14:49 +00:00
Eleni Vlachopoulou
020b9ff7f0 CMake: Enable builds for both static and shared builds for Linux.
- Added BUILD_STATIC_LIBS option which is on by default, only on Linux.
- Added TEST_WITH_SHARED option which is off by default, only on Linux.
- If only shared or static lib is being built, that's the one that will be used for testing.
- If both are being built, TEST_WITH_SHARED determines which library will be used for testing.
- Set linux workflows so that they build both static and shared libs, and use linux-static and linux-shared to denote which one should be used for testing.
- Set -fPIC for both static and shared builds to fix issues faced when building blis using AOCC 4.0.0 and gtestsuite using gcc 9.4.0.
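
The selection rules in the list above can be summarized in a few lines. This is a sketch only: `library_for_testing` and its parameter names are hypothetical, loosely modeled on the BUILD_STATIC_LIBS and TEST_WITH_SHARED options, and the error raised when nothing is built is an assumption of this example.

```python
def library_for_testing(build_static, build_shared, test_with_shared=False):
    """Pick which library variant the tests link against.

    - Only one variant built: that variant is used.
    - Both built: test_with_shared decides (off by default).
    """
    if not (build_static or build_shared):
        raise ValueError("at least one of static or shared must be built")
    if build_static and build_shared:
        return "shared" if test_with_shared else "static"
    return "shared" if build_shared else "static"
```

Note that in this sketch `test_with_shared` has no effect unless both variants are built, which matches the rule that a lone library is always the one tested.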

AMD-Internal: [CPUPL-2748]
Change-Id: I4227bab97ff31ecddfe218e18499f33b4e4ee63e
2024-03-14 10:32:51 -04:00
Meghana Vankadari
da8fd8c301 Implemented JIT-based microkernel for bf16 datatype
Details:
- Added new folder named JIT/ under addon/aocl_gemm/. This folder
  will contain all the JIT related code.
- Modified lpgemm_cntx_init code to generate main and fringe kernels
  for 6x64 bf16 microkernel and store function pointers to all the
  generated kernels in a global function pointer array. This happens
  only when the gcc version is < 11.2.
- When the gcc version is < 11.2, the microkernel uses JIT-generated kernels;
  otherwise, it uses the intrinsics-based implementation.

AMD-Internal: [SWLCSG-2622]
Change-Id: I16256c797b2546a8cd2049680001947346260461
2024-03-13 05:55:18 +05:30
Eleni Vlachopoulou
14ae6c78dd CMake: Introducing CMake presets to simplify CI jobs and development.
AMD-Internal: [CPUPL-2748]
Change-Id: Ic8aa9ccfa317b9ba3c63b1a952f3ef8593b9d990
2024-03-08 05:52:04 -05:00
jagar
394eee90f6 CMake: CMake is updated to support AddressSanitizer
CMakeLists.txt is updated to support ASAN to find
memory-related errors in the BLIS library. ASAN is enabled
by configuring cmake with the following option:

$ cmake .. -DENABLE_ASAN=ON

ASAN is supported only on Linux with the clang compiler.
The default redzone size is 16 bytes and the maximum
redzone size is 2048 bytes.

$ ASAN_OPTIONS=redzone=2048 <exe>

AMD-Internal: [CPUPL-2748]
Change-Id: I0b70af5c41cf5c68602150daeb67d7432bbe5cb8
2024-03-05 23:19:22 -05:00
Chandrashekara K R
9f7e5b7dbf CMake: Modified flatten-headers.py file to fix issue observed with ninja on windows.
While building the BLIS library using the ninja generator on Windows,
ninja was observed to randomly add "|| '(set', 'FAIL_LINE=3&', 'goto', ':ABORT)'"
as extra arguments to add_custom_command. Because of this, the flatten-headers
Python script failed to create the blis.h and cblas.h headers.
Modified the Python script to fix this issue.

AMD-Internal: [CPUPL-2748]
Change-Id: I83b753d08e46f94b282176fcc661ce34e5eee3cf
2024-02-29 15:42:02 +05:30
jagar
099b9863cb CMake: CMake is updated for Code Coverage
CMakeLists.txt is updated to generate a code coverage
report in HTML format just by configuring cmake with
-DENABLE_COVERAGE=ON. This is supported only on Linux
with the gcc compiler.

cmake .. -DENABLE_COVERAGE=ON

AMD-Internal: [CPUPL-2748]
Change-Id: I9b36b6cc3f1f97b53e1c4ee62948a017418e3d41
2024-02-07 06:12:51 -05:00
jagar
1821c2142b CMake: Fix in testsuite cmake to work for static-st on linux
CMakeLists.txt is updated in blis/testsuite to make it work for
static single thread version of BLIS.

AMD-Internal: [CPUPL-2748]
Change-Id: I004e19d4ddbf9cb94d6d23699893a2f684a3fb35
2024-01-09 09:13:03 -05:00
Edward Smyth
ed5010d65b Code cleanup: AMD copyright notice
Standardize format of AMD copyright notice.

AMD-Internal: [CPUPL-3519]
Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0
2023-11-23 08:54:31 -05:00
Edward Smyth
f471615c66 Code cleanup: No newline at end of file
Some text files were missing a newline at the end of the file.
One has been added.

AMD-Internal: [CPUPL-3519]
Change-Id: I4b00876b1230b036723d6b56755c6ca844a7ffce
2023-11-22 17:11:10 -05:00
Edward Smyth
dc41fa3829 User selection of code path in single architecture builds
User control over code path using AOCL_ENABLE_INSTRUCTIONS
or BLIS_ARCH_TYPE only makes sense for fat binary builds.
Thus this functionality is now disabled by default for
single architecture builds. User can still override the default
selections by using configure options --enable-blis-arch-type
or --disable-blis-arch-type.

Other changes:
- include x86_64 family as using zen codepaths in cmake build system.
- Update help and error messages to include AOCL_ENABLE_INSTRUCTIONS.

AMD-Internal: [CPUPL-4202]
Change-Id: I7aa5fcf89df8675bcc12d81f81781de647e0fcf8
2023-11-22 10:48:44 -05:00
Edward Smyth
c6f3340125 Merge commit '5013a6cb' into amd-main
* commit '5013a6cb':
  More edits and fixes to docs/FAQ.md.
  Fixed newly broken link to CREDITS in FAQ.md.
  More minor fixes to FAQ.md and Sandboxes.md.
  Updates to FAQ.md, Sandboxes.md, and README.md.
  Safelist 'master', 'dev', 'amd' branches.
  Re-enable and fix fb93d24.
  Reverted fb93d24.
  Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
  Removed last vestige of #define BLIS_NUM_ARCHS.
  Added new packm var3 to 'gemmlike'.
  Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
  Fix more copy-paste errors in the haswell gemmsup code.
  Do a fast test on OSX. [ci skip]
  Fix AArch64 tests and consolidate some other tests.
  Use C++ cross-compiler for ARM tests.
  Attempt to fix cxx-test for OOT builds.
  Updated travis-ci.org link in README.md to .com.
  Disabled (at least temporarily) commit 8e0c425.
  Define BLIS_OS_NONE when using --disable-system.
  Updated stale calls to malloc_intl() in gemmlike.
  Blacklist clang10/gcc9 and older for 'armsve'.
  Add test to Travis using C++ compiler to make sure blis.h is C++-compatible.
  Moved lang defs from _macro_def.h to _lang_defs.h.
  Minor tweaks to gemmlike sandbox.
  Added local _check() code to gemmlike sandbox.
  README.md citation updates (e.g. BLIS7 bibtex).
  Tweaks to gemmlike to facilitate 3rd party mods.
  Whitespace tweaks.
  Add row- and column-strides for A/B in obj_ukr_fn_t.
  Clean up some warnings that show up on clang/OSX.
  Remove schema field on obj_t (redundant) and add new API functions.
  Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
  Disabled sanity check in bli_pool_finalize().
  Implement proposed new function pointer fields for obj_t.

AMD-Internal: [CPUPL-2698]
Change-Id: I6fc33351fa824580cf4f25b63f0370383cd9422d
2023-11-10 13:05:12 -05:00
Eleni Vlachopoulou
75a4d2f72f CMake: Adding new portable CMake system.
- A completely new system, made to be closer to Make system.

AMD-Internal: [CPUPL-2748]
Change-Id: I83232786406cdc4f0a0950fb6ac8f551e5968529
2023-11-09 15:49:45 +05:30
Edward Smyth
f5505be9f3 Merge commit 'e366665c' into amd-main
* commit 'e366665c':
  Fixed stale API calls to membrk API in gemmlike.
  Fixed bli_init.c compile-time error on OSX clang.
  Fixed configure breakage on OSX clang.
  Fixed one-time use property of bli_init() (#525).
  CREDITS file update.
  Added Graviton2 Neoverse N1 performance results.
  Remove unnecessary windows/zen2 directory.
  Add vzeroupper to Haswell microkernels. (#524)
  Fix Win64 AVX512 bug.
  Add comment about make checkblas on Windows
  CREDITS file update.
  Test installation in Travis CI
  Add symlink to blis.pc.in for out-of-tree builds
  Revert "Always run `make check`."
  Always run `make check`.
  Fixed configure script bug. Details: fixed a kernel list string substitution error by adding the function substitute_words to the configure script. Previously, if the string contained both zen and zen2 and zen needed to be replaced with another string, zen2 would also be incorrectly replaced.
  Update POWER10.md
  Rework POWER10 sandbox
  Skip clearing temp microtile in gemmlike sandbox.
  Fix asm warning
  Sandbox header edits trigger full library rebuild.
  Add vhsubpd/vhsubpd.
  Fixed bugs in cpackm kernels, gemmlike code.
  Armv8A Rename Regs for Safe Darwin Compile
  Armv8A Rename Regs for Clang Compile: FP32 Part
  Armv8A Rename Regs for Clang Compile: FP64 Part
  Asm Flag Mingling for Darwin_Aarch64
  Added a new 'gemmlike' sandbox.
  Updated Fugaku (a64fx) performance results.
  Add explicit compiler check for Windows.
  Remove `rm-dupls` function in common.mk.
  Travis CI Revert Unnecessary Extras from 91d3636
  Adjust TravisCI
  Travis Support Arm SVE
  Added 512b SVE-based a64fx subconfig + SVE kernels.
  Replace bli_dlamch with something less archaic (#498)
  Allow clang for ThunderX2 config

AMD-Internal: [CPUPL-2698]
Change-Id: I561ca3959b7049a00cc128dee3617be51ae11bc4
2023-10-18 09:09:54 -04:00
Edward Smyth
bb4c158e63 Merge commit 'b683d01b' into amd-main
* commit 'b683d01b':
  Use extra #undef when including ba/ex API headers.
  Minor preprocessor/header cleanup.
  Fixed typo in cpp guard in bli_util_ft.h.
  Defined eqsc, eqv, eqm to test object equality.
  Defined setijv, getijv to set/get vector elements.
  Minor API breakage in bli_pack API.
  Add err_t* "return" parameter to malloc functions.
  Always stay initialized after BLAS compat calls.
  Renamed membrk files/vars/functions to pba.
  Switch allocator mutexes to static initialization.

AMD-Internal: [CPUPL-2698]
Change-Id: Ied2ca8619f144d4b8a7123ac45a1be0dda3875df
2023-08-21 07:01:38 -04:00
Edward Smyth
7e50ba669b Code cleanup: No newline at end of file
Some text files were missing a newline at the end of the file.
One has been added.

Also correct file format of windows/tests/inputs.yaml, which
was missed in commit 0f0277e104

AMD-Internal: [CPUPL-2870]
Change-Id: Icb83a4a27033dc0ff325cb84a1cf399e953ec549
2023-04-21 10:02:48 -04:00
Edward Smyth
0f0277e104 Code cleanup: dos2unix file conversion
Source and other files in some directories were a mixture of
Unix and DOS file formats. Convert all relevant files to Unix
format for consistency. Some Windows-specific files remain in
DOS format.

AMD-Internal: [CPUPL-2870]
Change-Id: Ic9a0fddb2dba6dc8bcf0ad9b3cc93774a46caeeb
2023-04-21 08:41:16 -04:00
Edward Smyth
b531022bac BLIS cpuid: distinguish submodels within a microarchitecture
Incorporate a means of detecting submodels of a microarchitecture,
so that different optimizations e.g. block sizes or kernel choices
can be used. The details are as follows:
- Different models are currently only enabled for zen3 and zen4
  architectures (for server parts).
- There is a single enumeration (model_t) for all models for all
  architectures, but function bli_check_valid_model_id() should
  check the provided model_id against the suitable range within
  the enumeration for the provided arch_id.
- To enable the model_id to be used within the cntx setup functions,
  checking of a user specified value of BLIS_ARCH_TYPE against
  the enabled configurations is delayed to a separate function,
  bli_arch_check_id().
- Default selection based on hardware can be overridden using the
  BLIS_MODEL_TYPE environment variable. Valid values are:
    Genoa, Bergamo, Genoa-X, Milan, Milan-X
  Values are case-insensitive and -X can also be specified as _X or X
- Specifying an incorrect value for BLIS_MODEL_TYPE is not an error,
  but will result in the default option for that architecture being
  selected. This is different to specifying an incorrect value of
  BLIS_ARCH_TYPE, which is an error.
- The environment variable BLIS_MODEL_TYPE can be renamed using
  the --rename-blis-model-type argument to configure (or cmake
  equivalent), in a similar way to renaming BLIS_ARCH_TYPE with
  --rename-blis-arch-type.
- Configure option --disable-blis-arch-type will disable both
  BLIS_ARCH_TYPE and BLIS_MODEL_TYPE environment variables.
- Added code in bli_cpuid.c to detect L1, L2 and L3 cache sizes,
  currently only for AMD cpus. Functions are provided to query
  these from other parts of the code, namely:
    uint32_t bli_cpuid_query_{l1d,l1i,l2,l3}_cache_size()
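
The BLIS_MODEL_TYPE matching rules described in the list above (case-insensitive, with -X also accepted as _X or X, and invalid values falling back to the default) could look roughly like this. It is a Python sketch of the stated behaviour, not the C code in the library; `resolve_model` and its `default` argument are hypothetical names.

```python
# Valid model names as listed in the commit message.
VALID_MODELS = ("Genoa", "Bergamo", "Genoa-X", "Milan", "Milan-X")


def _key(name):
    """Canonical lookup key: lower case, with a trailing -x or _x folded to x."""
    key = name.strip().lower()
    for suffix in ("-x", "_x"):
        if key.endswith(suffix):
            return key[:-2] + "x"
    return key


_LOOKUP = {_key(model): model for model in VALID_MODELS}


def resolve_model(requested, default):
    """Map a user-supplied model name to a valid one.

    An unset or unrecognized value is not an error: the default for the
    architecture is returned instead.
    """
    if requested is None:
        return default
    return _LOOKUP.get(_key(requested), default)
```

A caller would typically pass `os.environ.get("BLIS_MODEL_TYPE")` as `requested`, with `default` standing in for whatever the hardware detection selected.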

AMD-Internal: [CPUPL-3033]
Change-Id: I37a3741abfd59a95e0e905d926c6ede9a0143702
2023-04-20 10:47:44 -04:00
Aayush Kumar
5bd2a777ba Fixed compilation failure when configured with --disable-blas
- Moved *_blis_impl function declaration outside the BLIS_ENABLE_BLAS
  guard.
- Changed Makefile to continue to compile bla_ files to get
  *_blis_impl interfaces.
- Modify CBLAS headers, bli_macro_defs.h and bli_util_api_wrap.{c,h}
  to add BLIS_ENABLE_CBLAS guards.
- Comment out BLIS_ENABLE_BLAS guards in various headers and utility
  functions.
- Define BLIS Fortran-style functions lsame_blis_impl and
  xerbla_blis_impl. New macros PASTE_LSAME and PASTE_XERBLA are
  used in bla_*_check headers and some other places to select
  whether to call lsame and xerbla, or the _blis_impl versions.
- Defined various other missing _blis_impl functions.
- In bli_util_api_wrap.c, only define any functions if
  BLIS_ENABLE_BLAS is defined, and only define the subroutine
  versions of functions like dot, nrm2, etc if BLIS_ENABLE_CBLAS
  is defined.
- BLAS layer is needed if CBLAS layer is enabled. Changed header
  files build/bli_config.h.in and bli_blas.h, and configure
  program to help ensure consistency in generated blis.h header
  and configure output.

Undefining BLIS_ENABLE_BLAS_DEFS appears to be broken in UTA BLIS
too, thus BLIS_ENABLE_BLAS_DEFS is currently permanently defined.

AMD-Internal: [CPUPL-3015]

Change-Id: I7c0fe07db85781db46f2c690e174451860b37635
2023-03-23 06:11:52 -04:00
chandrkr
399831d7cb AOCL-Windows: Script Update to mirror reference kernel files with Unique names
Details:
1. Reference kernel File names are to be mirrored with unique names
for each architecture configuration while generating fat binary.
2. Python script "blis_ref_kernel_mirror.py" is updated to append
the filenames with each configuration (zen, zen2, zen3, zen4
and generic) that is built for windows.

AMD-Internal: [CPUPL-3009]
Change-Id: Ib02206382199cf2aebe14ff9c869b6089228e1c2
2023-02-14 23:37:13 -05:00
Harihara Sudhan S
42d631bced Copyright modification
- Added copyright information to modified/newly created
  files missing it

Change-Id: If4e73b680246d0363de09587d6dc54bee00ecd71
2022-10-14 12:43:35 +05:30
Edward Smyth
6861fcae91 BLIS: Improve architecture selection at runtime
Make BLIS_ARCH_TYPE=0 an error, so that incorrect meaningful names
produce an error rather than selecting the "skx" code path. BLIS_ARCH_TYPE=1
is now "generic", so that it remains constant as new code paths are
added. Thus all other code path enum values have increased by 2.

Also added new options to BLIS configure program to allow:
1. BLIS_ARCH_TYPE functionality to be disabled, e.g.:

./configure --disable-blis-arch-type amdzen

2. Renaming the environment variable tested from "BLIS_ARCH_TYPE" to a
   specified value, e.g.:

./configure --rename-blis-arch-type=MY_NAME_FOR_ARCH_TYPE amdzen

On Windows, these can be enabled with e.g.:

cmake ... -DDISABLE_BLIS_ARCH_TYPE=ON

or

cmake ... -DRENAME_BLIS_ARCH_TYPE=MY_NAME_FOR_ARCH_TYPE

This implements changes 2 and 3 in the Jira ticket below.

AMD-Internal: [CPUPL-2235]
Change-Id: Ie42906bd909f9d83f00a90c5bef9c5bf3ef5adb4
2022-08-19 10:59:35 -04:00
Field G. Van Zee
7a0ba4194f Added support for addons.
Details:
- Implemented a new feature called addons, which are similar to
  sandboxes except that there is no requirement to define gemm or any
  other particular operation.
- Updated configure to accept --enable-addon=<name> or -a <name> syntax
  for requesting an addon be included within a BLIS build. configure now
  outputs the list of enabled addons into config.mk. It also outputs the
  corresponding #include directives for the addons' headers to a new
  companion to the bli_config.h header file named bli_addon.h. Because
  addons may wish to make use of existing BLIS types within their own
  definitions, the addons' headers must be included sometime after that
  of bli_config.h (which currently is #included before bli_type_defs.h).
  This is why the #include directives needed to go into a new top-level
  header file rather than the existing bli_config.h file.
- Added a markdown document, docs/Addons.md, to explain addons, how to
  build with them, and what assumptions their authors should keep in
  mind as they create them.
- Added a gemmlike-like implementation of sandwich gemm called 'gemmd'
  as an addon in addon/gemmd. The code uses a 'bao_' prefix for local
  functions, including the user-level object and typed APIs.
- Updated .gitignore so that git ignores bli_addon.h files.

Change-Id: Ie7efdea366481ce25075cb2459bdbcfd52309717
2022-03-31 12:03:27 +05:30
Dipal M Zambare
f63f78d783 Removed Arch specific code from BLIS framework.
- Removed BLIS_CONFIG_EPYC macro
- The code dependent on this macro is handled in
  one of three ways:
  - It is updated to work across platforms.
  - Added in architecture/feature specific runtime checks.
  - Duplicated in AMD specific files. The build system is updated to
    pick AMD specific files when the library is built for any of the
    zen architectures.

AMD-Internal: [CPUPL-1960]
Change-Id: I6f9f8018e41fa48eb43ae4245c9c2c361857f43b
2022-01-18 11:51:08 +05:30
Chandrashekara K R
b3553c08fa AOCL-Windows: Updating the blis windows build system.
1. Removed the libomp.lib hardcoded from cmake scripts and made it user configurable. By default libomp.lib is used as an omp library.
2. Added the STATIC_LIBRARY_OPTIONS property in set_target_properties cmake command to link omp library to build static-mt blis library.
3. Updated the blis_ref_kernel_mirror.py to give support for zen4 architecture.

AMD-Internal: [CPUPL-1630]
Change-Id: I54b04cde2fa6a1ddc4b4303f1da808c1efe0484a
2021-12-22 14:47:15 +05:30
Dipal M Zambare
fd8a3aace9 Added support for zen4 architecture
- Added configuration option for zen4 architecture
- Added auto-detection of zen4 architecture
- Added zen4 configuration for all checks related
  to AMD specific optimizations

AMD-Internal: [CPUPL-1937]
Change-Id: I1a1a45de04653f725aa53c30dffb6c0f7cc6e39a
2021-11-23 10:29:15 +05:30
nphaniku
4af525a313 AOCL Windows BLIS : Windows build for dynamic dispatch library
Change-Id: Ie05eafbeacbd5589b514d9353517330515104939
2021-11-12 08:58:57 +05:30
Dipal M Zambare
5d287fdba0 Include LP64/ILP64 in BLIS binary name
Binary name will be chosen based on the multi-threading and BLAS
integer size configuration as given below.

  libblis-[mt]-lp64  - when configured to use 32 bit integers
  libblis-[mt]-ilp64 - when configured to use 64 bit integers
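
The naming scheme above can be expressed as a small helper. This sketch reads the `[mt]` component as optional (present only for multi-threaded builds), which is an interpretation of the pattern rather than something the commit message states outright; `blis_library_name` is a hypothetical name.

```python
def blis_library_name(multithreaded, int_size_bits):
    """Build the library base name from threading and BLAS integer size."""
    if int_size_bits == 32:
        int_model = "lp64"
    elif int_size_bits == 64:
        int_model = "ilp64"
    else:
        raise ValueError("BLAS integer size must be 32 or 64 bits")
    parts = ["libblis"]
    if multithreaded:
        parts.append("mt")  # assumed to be omitted for single-threaded builds
    parts.append(int_model)
    return "-".join(parts)
```

Under these assumptions, a multi-threaded build with 64 bit integers would be named `libblis-mt-ilp64`.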

AMD-Internal: [CPUPL-1879]
Change-Id: I865023c63235a0a72bdfce7057b2cfb8158b1d87
2021-11-12 08:58:51 +05:30