Add new MFMA 16x16x16x2 example for GEMM with PADDING_K_FIRST optimization

- Introduced new subdirectory for MFMA 16x16x16x2 implementation.
- Added CMake configuration and source files for the new example.
- Implemented block GEMM and pipeline strategies to optimize performance.
- Included necessary policies and tensor distribution for efficient memory access.
- Updated the main GEMM kernel to support the new configuration.
2026-05-02 04:31:25 +00:00 · 2026-02-03 23:06:07 +00:00
parent f96d74b55d
commit 2e3a716d72
17 changed files with 2075 additions and 0 deletions
--- a/tutorial/ck_tile/gemm/CMakeLists.txt
+++ b/tutorial/ck_tile/gemm/CMakeLists.txt
@@ -8,3 +8,5 @@ include_directories(AFTER
 add_subdirectory(01_naive_gemm)
 add_subdirectory(02_padding_k_first)
 add_subdirectory(03_mfma_16x16x16)
+add_subdirectory(04_mfma_16x16x16x2)
+add_subdirectory(05_xor_bank_conflict_free)