AviralGoelAMD
|
adb8f67b4f
|
feat: add new optimized tutorial kernels
- Add 01_naive_gemm baseline implementation
- Add 02_padding_k_first with PADDING_K_FIRST + MFMA_32x32x16
- Add 03_mfma_16x16x16 with PADDING_K_FIRST + MFMA_16x16x16
- Share common reference_gemm.hpp in parent gemm/ directory
|
2026-01-29 12:45:18 +00:00
|