amd/blis - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-25 19:04:32 +00:00

Author	SHA1	Message	Date
Shubham Sharma.	f378fc57b5	DGEMM Native AVX512 updates - In the initial patch, bli_dgemm_ker_var2 was called when m and n were not multiples of MR and NR, respectively; a macro-kernel now handles these fringe cases as well. - Replaced the RBP register with R11 in the macro-kernel. - Retuned MC, KC and NC for these changes. This results in better performance for matrix sizes such as m=4000 or greater when running on a single thread. AMD-Internal: [CPUPL-5262] Change-Id: I66c111ceb7feee776703339680d57e8d6d5c809a	2024-07-31 12:23:34 -04:00
Shubham Sharma.	a7744361e4	DGEMM optimizations for Turin Classic - Introduced new 8x24 macro-kernels. - Added 4 new kernels for beta 0, beta 1, beta -1 and beta N. - Moved the IR and JR loops into the ASM region. - Kernels support the row-major storage scheme. - Prefetch of the current micro-panel of C is enabled. - Kernels support negative offsets for the A and B matrices. - Moved alpha scaling from the DGEMM kernel to the B pack kernel. - Tuned block sizes for the new kernel. - Added support for alpha scaling in the 24xk pack kernel. - Reverted to the old b_next computation in gemm_ker_var2. - Bug fix in the 8x24 DGEMM kernel for beta 1: the comparison for the jmp conditions was done using integer instructions, so the beta 1 path was never taken. Fixed by changing the comparison to double. AMD-Internal: [CPUPL-5262] Change-Id: Ieec207eea2a164603c8a8ea88e0b1d3095c29a3f	2024-07-09 07:53:27 -04:00
Shubham Sharma	1d6dd726cd	Fixed Prefetch in Turin DGEMM kernel - Fixed the prefetch of the next micro-panel of the B matrix in the 8x24 DGEMM kernel. Change-Id: Id84bb2841abb86bda780062d67266377fda12038	2024-06-20 10:31:08 +05:30
Shubham Sharma.	580282e655	DGEMM optimizations for Turin Classic - Introduced a new 8x24 row-preferred kernel for zen5. - Kernel supports row/col/gen storage schemes. - Prefetch of the current panels of A and C is enabled. - Prefetch of the next panel of B is enabled. - Kernel supports negative offsets for the A and B matrices. - Cache block tuning is done for the zen5 core. AMD-Internal: [CPUPL-5262] Change-Id: I058ea7e1b751c20c516d7b27a1f27cef96ef730f	2024-06-17 05:18:49 -04:00
Edward Smyth	2450a1813b	BLIS: Implement zen5 sub-configuration Implement full support for zen5 as a separate BLIS sub-configuration and code path within amdzen configuration family. AMD-Internal: [CPUPL-3518] Change-Id: Iaa5096e0b83bf0f0c3fd1c41e601ccd29bda3c09	2024-04-12 07:26:31 -04:00

5 Commits