Add new MFMA 16x16x16x2 example for GEMM with PADDING_K_FIRST optimization

- Introduced new subdirectory for MFMA 16x16x16x2 implementation.
- Added CMake configuration and source files for the new example.
- Implemented block GEMM and pipeline strategies to optimize performance.
- Included necessary policies and tensor distribution for efficient memory access.
- Updated the main GEMM kernel to support the new configuration.
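A host-side reference is useful for checking the new example's output. The sketch below is a plain C++ CPU GEMM that walks M, N and K in 16x16x16 tiles, which is the shape one MFMA 16x16x16 step covers. It is not the commit's kernel, and the names `tile_mma_16x16x16` and `tiled_gemm` are hypothetical. It assumes row-major FP32 inputs whose dimensions have already been padded to multiples of 16. It does not model the "x2" variant, the pipeline, or the tensor distribution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Edge length of one 16x16x16 tile step (M = N = K = 16).
constexpr std::size_t kTile = 16;

// Hypothetical helper: accumulate C += A * B for one 16x16x16 tile.
// A is row-major MxK, B is row-major KxN, and C is row-major MxN, with leading
// dimensions lda, ldb, and ldc. (m0, n0, k0) is the tile origin.
void tile_mma_16x16x16(const std::vector<float>& A, const std::vector<float>& B,
                       std::vector<float>& C, std::size_t lda, std::size_t ldb,
                       std::size_t ldc, std::size_t m0, std::size_t n0,
                       std::size_t k0) {
    for (std::size_t i = 0; i < kTile; ++i) {
        for (std::size_t j = 0; j < kTile; ++j) {
            float acc = C[(m0 + i) * ldc + (n0 + j)];  // running partial sum
            for (std::size_t k = 0; k < kTile; ++k) {
                acc += A[(m0 + i) * lda + (k0 + k)] * B[(k0 + k) * ldb + (n0 + j)];
            }
            C[(m0 + i) * ldc + (n0 + j)] = acc;
        }
    }
}

// Hypothetical reference GEMM. It requires M, N, and K to be multiples of 16,
// which means any padding has already been applied.
std::vector<float> tiled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                              std::size_t M, std::size_t N, std::size_t K) {
    assert(M % kTile == 0 && N % kTile == 0 && K % kTile == 0);
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t m0 = 0; m0 < M; m0 += kTile) {
        for (std::size_t n0 = 0; n0 < N; n0 += kTile) {
            // Walk K in 16-wide steps, accumulating into the same C tile.
            for (std::size_t k0 = 0; k0 < K; k0 += kTile) {
                tile_mma_16x16x16(A, B, C, K, N, N, m0, n0, k0);
            }
        }
    }
    return C;
}
```

For example, with a 32x48 matrix of ones multiplied by a 48x16 matrix of twos, every element of the 32x16 result is 48 * 2 = 96. The K loop sits innermost in `tiled_gemm` so each output tile is finished before moving on, which mirrors how an MFMA accumulator tile stays resident while K advances.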
AviralGoelAMD
2026-02-03 23:06:07 +00:00
parent f96d74b55d
commit 2e3a716d72
17 changed files with 2075 additions and 0 deletions


@@ -8,3 +8,5 @@ include_directories(AFTER
add_subdirectory(01_naive_gemm)
add_subdirectory(02_padding_k_first)
add_subdirectory(03_mfma_16x16x16)
add_subdirectory(04_mfma_16x16x16x2)
add_subdirectory(05_xor_bank_conflict_free)