amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-06-29 10:47:16 +00:00

Author	SHA1	Message	Date
Edward Smyth	2ee46a3a3a	Merge commit 'cfa3db3f' into amd-main * commit 'cfa3db3f': Fixed bug in mixed-dt gemm introduced in `e9da642`. Removed support for 3m, 4m induced methods. Updated do_sde.sh to get SDE from GitHub. Disable SDE testing of old AMD microarchitectures. Fixed substitution bug in configure. Allow use of 1m with mixing of row/col-pref ukrs. AMD-Internal: [CPUPL-2698] Change-Id: I961f0066243cf26aeb2e174e388b470133cc4a5f	2024-07-08 06:09:11 -04:00
Edward Smyth	8de8dc2961	Merge commit '81e10346' into amd-main * commit '81e10346': Alloc at least 1 elem in pool_t block_ptrs. (#560) Fix insufficient pool-growing logic in bli_pool.c. (#559) Arm SVE C/ZGEMM Fix FMOV 0 Mistake SH Kernel Unused Eigher Arm SVE C/ZGEMM Support beta==0 Arm SVE Config armsve Use ZGEMM/CGEMM Arm SVE: Update Perf. Graph Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0 Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0 A64FX Config Use ZGEMM/CGEMM Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg Arm SVE Add SGEMM 2Vx10 Unindexed Arm SVE ZGEMM Support Gather Load / Scatt. St. Arm SVE Add ZGEMM 2Vx10 Unindexed Arm SVE Add ZGEMM 2Vx7 Unindexed Arm SVE Add ZGEMM 2Vx8 Unindexed Update Travis CI badge Armv8 Trash New Bulk Kernels Enable testing 1m in `make check`. Config ArmSVE Unregister 12xk. Move 12xk to Old Revert __has_include(). Distinguish w/ BLIS_FAMILY_* Register firestorm into arm64 Metaconfig Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo Add test for Apple M1 (firestorm) Firestorm CPUID Dispatcher Armv8 GEMMSUP Edge Cases Require Signed Ints Make error checking level a thread-local variable. Fix data race in testsuite. Update .appveyor.yml Firestorm Block Size Fixes Armv8 Handle beta == 0 for GEMMSUP ??r Case. Move unused ARM SVE kernels to "old" directory. Add an option to control whether or not to use @rpath. Fix $ORIGIN usage on linux. Arm micro-architecture dispatch (#344) Use @path-based install name on MacOS and use relocatable RPATH entries for testsuite binaries. Armv8 Handle beta == 0 for GEMMSUP ?rc Case. Armv8 Fix 6x8 Row-Maj Ukr Apply patch from @xrq-phys. Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs. bli_error: more cleanup on the error strings array Arm SVE Exclude SVE-Intrinsic Kernels for GCC 8-9 Arm SVE: Correct PACKM Ker Name: Intrinsic Kers Fix config_name in bli_arch.c Arm Whole GEMMSUP Call Route is Asm/Int Optimized Arm: DGEMMSUP `Macro' Edge Cases Stop Calling Ref Header Typo Arm: DGEMMSUP ??r(rv) Invoke Edge Size Arm: DGEMMSUP ?rc(rd) Invoke Edge Size Arm: Implement GEMMSUP Fallback Method Arm64 Fix: Support Alpha/Beta in GEMMSUP Intrin Added Apple Firestorm (A14/M1) Subconfig Arm64 8x4 Kernel Use Less Regs Armv8-A Supplementary GEMMSUP Sizes for RD Armv8-A Fix GEMMSUP-RD Kernels on GNU Asm Armv8-A Adjust Types for PACKM Kernels Armv8-A GEMMSUP-RD 6x8m Armv8-A GEMMSUP-RD 6x8n Armv8-A s/d Packing Kernels Fix Typo Armv8-A Introduced s/d Packing Kernels Armv8-A DGEMMSUP 6x8m Kernel Armv8-A DGEMMSUP Adjustments Armv8-A Add More DGEMMSUP Armv8-A Add GEMMSUP 4x8n Kernel Armv8-A Add Part of GEMMSUP 8x4m Kernel Armv8A DGEMM 4x4 Kernel WIP. Slow Armv8-A Add 8x4 Kernel WIP AMD-Internal: [CPUPL-2698] Change-Id: I194ff69356740bb36ca189fd1bf9fef02eec3803	2024-06-25 05:48:46 -04:00
Edward Smyth	ed5010d65b	Code cleanup: AMD copyright notice Standardize format of AMD copyright notice. AMD-Internal: [CPUPL-3519] Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0	2023-11-23 08:54:31 -05:00
Field G. Van Zee	e9da6425e2	Allow use of 1m with mixing of row/col-pref ukrs. Details: - Fixed a bug that broke the use of 1m for dcomplex when the single-precision real and double-precision real ukernels had opposing I/O preferences (row-preferential sgemm ukernel + column-preferential dgemm ukernel, or vice versa). The fix involved adjusting the API to bli_cntx_set_ind_blkszs() so that the induced method context init function (e.g., bli_cntx_init_<subconfig>_ind()) could call that function for only one datatype at a time. This allowed the blocksize scaling (which varies depending on whether we're doing 1m_r or 1m_c) to happen on a per-datatype basis. This fixes issue #557. Thanks to Devin Matthews and RuQing Xu for helping discover and report this bug. - The aforementioned 1m fix required moving the 1m_r/1m_c logic from bli_cntx_ref.c into a new function, bli_l3_set_schemas(), which is called from each level-3 _front() function. The pack_t schemas in the cntx_t were also removed entirely, along with the associated accessor functions. This in turn required updating the trsm1m-related virtual ukernels to read the pack schema for B from the auxinfo_t struct rather than the context. This also required slight tweaks to bli_gemm_md.c. - Repositioned the logic for transposing the operation to accommodate the microkernel IO preference. This mostly only affects gemm. Thanks to Devin Matthews for his help with this. - Updated dpackm pack ukernels in the 'armsve' kernel set to avoid querying pack_t schemas from the context. - Removed the num_t dt argument from the ind_cntx_init_ft type defined in bli_gks.c. The context initialization functions for induced methods were previously passed a dt argument, but I can no longer figure out why they were passed this value. To reduce confusion, I've removed the dt argument (including also from the function definition + prototype). - Commented out setting of cntx_t schemas in bli_cntx_ind_stage.c. This breaks high-level implementations of 3m and 4m, but this is okay since those implementations will be removed very soon. - Removed some older blocks of preprocessor-disabled code. - Comment update to test_libblis.c.	2021-10-13 14:15:38 -05:00
Devin Matthews	32a6d93ef6	Merge pull request #543 from xrq-phys/armsve-packm-fix ARMSVE Block SVE-Intrinsic Kernels for GCC 8-9	2021-10-09 15:53:54 -05:00
RuQing Xu	ccf16289d2	Arm SVE C/ZGEMM Fix FMOV 0 Mistake FMOV [hsd]M, #imm does not allow zero immediate. Use wzr, xzr instead.	2021-10-08 12:34:14 +09:00
RuQing Xu	82b61283b2	SH Kernel Unused Eigher	2021-10-08 12:17:29 +09:00
RuQing Xu	1749dfa493	Arm SVE C/ZGEMM Support *beta==0	2021-10-08 12:13:08 +09:00
RuQing Xu	66a018e6ad	Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0	2021-10-08 12:13:08 +09:00
RuQing Xu	9e1e781cb5	Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0	2021-10-08 12:13:08 +09:00
RuQing Xu	e4cabb977d	Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg	2021-10-08 12:13:08 +09:00
RuQing Xu	b677e0d61b	Arm SVE Add SGEMM 2Vx10 Unindexed	2021-10-08 12:13:07 +09:00
RuQing Xu	3f68e8309f	Arm SVE ZGEMM Support Gather Load / Scatt. St.	2021-10-08 12:13:07 +09:00
RuQing Xu	c19db2ff82	Arm SVE Add ZGEMM 2Vx10 Unindexed	2021-10-08 12:13:07 +09:00
RuQing Xu	e13abde30b	Arm SVE Add ZGEMM 2Vx7 Unindexed	2021-10-08 12:13:06 +09:00
RuQing Xu	49b9d7998e	Arm SVE Add ZGEMM 2Vx8 Unindexed	2021-10-08 12:12:48 +09:00
RuQing Xu	2604f40713	Config ArmSVE Unregister 12xk. Move 12xk to Old	2021-10-07 02:39:00 +09:00
RuQing Xu	1e3200326b	Revert __has_include(). Distinguish w/ BLIS_FAMILY_**	2021-10-07 02:37:14 +09:00
Devin Matthews	80c5366e4a	Move unused ARM SVE kernels to "old" directory.	2021-10-04 15:40:28 -05:00
Devin Matthews	13dbd5b5d3	Apply patch from @xrq-phys.	2021-10-02 16:08:05 -05:00
Devin Matthews	ae0eeeaf77	Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.	2021-09-29 16:43:38 -05:00
RuQing Xu	30c29b256e	Arm SVE Exclude SVE-Intrinsic Kernels for GCC 8-9 Affected configs: a64fx.	2021-09-16 05:01:03 +09:00
RuQing Xu	bffa85be59	Arm SVE: Correct PACKM Ker Name: Intrinsic Kers SVE-Intrinsic-based kernels ought not to use asm in their names.	2021-09-16 04:31:45 +09:00
RuQing Xu	61584deddf	Added 512b SVE-based a64fx subconfig + SVE kernels. Details: - Added 512-bit specific 'a64fx' subconfiguration that uses empirically tuned block size by Stepan Nassyr. This subconfig also sets the sector cache size and enables memory-tagging code in SVE gemm kernels. This subconfig utilizes (16, k) and (10, k) DPACKM kernels. - Added a vector-length agnostic 'armsve' subconfiguration that computes blocksizes according to the analytical model. This part is ported from Stepan Nassyr's repository. - Implemented vector-length-agnostic [d/s/sh] gemm kernels for Arm SVE at size (2*VL, 10). These kernels use unindexed FMLA instructions because indexed FMLA takes 2 FMA units in many implementations. PS: There are indexed-FMLA kernels in Stepan Nassyr's repository. - Implemented 512-bit SVE dpackm kernels with in-register transpose support for sizes (16, k) and (10, k). - Extended 256-bit SVE dpackm kernels by Linaro Ltd. to 512-bit for size (12, k). This dpackm kernel is not currently used by any subconfiguration. - Implemented several experimental dgemmsup kernels which would improve performance in a few cases. However, those dgemmsup kernels generally underperform, hence they are not currently used in any subconfig. - Note: This commit squashes several commits submitted by RuQing Xu via PR #424.	2021-05-19 09:52:29 -05:00
Guodong Xu	f032d5d4a6	New kernel set for Arm SVE using assembly (#396) This adds two kernels for Arm SVE vector extensions. 1. a gemm kernel for double at sizes 8x8. 2. a packm kernel for double at dimension 8xk. To achieve best performance, vector length agnostic programming is not used. Vector length (VL) of 256 bits is mandated in both kernels. Kernels to support other VLs can be added later. "SVE is a vector extension for AArch64 execution mode for the A64 instruction set of the Armv8 architecture. Unlike other SIMD architectures, SVE does not define the size of the vector registers, but constrains into a range of possible values, from a minimum of 128 bits up to a maximum of 2048 in 128-bit wide units. Therefore, any CPU vendor can implement the extension by choosing the vector register size that better suits the workloads the CPU is targeting. Instructions are provided specifically to query an implementation for its register size, to guarantee that the applications can run on different implementations of the ISA without the need to recompile the code." [1] [1] https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning Signed-off-by: Guodong Xu <guodong.xu@linaro.org>	2020-04-29 12:08:46 -05:00

25 Commits
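
The SVE commits in this log (f032d5d4a6 and 61584deddf) rest on some simple arithmetic: SVE vector lengths run from 128 to 2048 bits in 128-bit steps, and the vector-length-agnostic gemm kernels are sized (2*VL, 10). The sketch below works through that arithmetic. It is only an illustration: the function names are hypothetical and not part of BLIS, and it assumes that VL in "(2*VL, 10)" counts elements per vector rather than bits.

```python
# Illustrative arithmetic only; none of these names exist in BLIS.

SVE_MIN_BITS = 128    # smallest vector length allowed by SVE
SVE_MAX_BITS = 2048   # largest vector length allowed by SVE
SVE_STEP_BITS = 128   # vector lengths come in 128-bit units


def valid_sve_vector_lengths():
    """Return every SVE vector length, in bits, permitted by the range above."""
    return list(range(SVE_MIN_BITS, SVE_MAX_BITS + 1, SVE_STEP_BITS))


def lanes_per_vector(vl_bits, elem_bytes):
    """Number of elements of size elem_bytes that fit in one vector register."""
    if vl_bits not in valid_sve_vector_lengths():
        raise ValueError(f"{vl_bits} bits is not a valid SVE vector length")
    return vl_bits // (8 * elem_bytes)


def microtile_2v_by_10(vl_bits, elem_bytes):
    """Micro-tile shape for a (2*VL, 10) kernel.

    Assumption: VL is the element count of one vector, so the tile has
    two vectors' worth of rows and ten columns.
    """
    return (2 * lanes_per_vector(vl_bits, elem_bytes), 10)


if __name__ == "__main__":
    # 512-bit vectors with 8-byte doubles, as on the a64fx subconfig.
    print(microtile_2v_by_10(512, 8))  # prints (16, 10)
```

For 512-bit vectors and double precision this gives a 16 x 10 micro-tile, which lines up with the (16, k) and (10, k) dpackm kernels that commit 61584deddf mentions for the a64fx subconfiguration.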