composable_kernel

ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-02 04:31:25 +00:00

Commit Graph

Author: AviralGoelAMD
SHA1: 2e3a716d72
Date: 2026-02-03 23:06:07 +00:00
Message: Add new MFMA 16x16x16x2 example for GEMM with PADDING_K_FIRST optimization

- Introduced new subdirectory for MFMA 16x16x16x2 implementation.
- Added CMake configuration and source files for the new example.
- Implemented block GEMM and pipeline strategies to optimize performance.
- Included necessary policies and tensor distribution for efficient memory access.
- Updated the main GEMM kernel to support the new configuration.

Author: AviralGoelAMD
SHA1: adb8f67b4f
Date: 2026-01-29 12:45:18 +00:00
Message: feat: add new optimized tutorial kernels

- Add 01_naive_gemm baseline implementation
- Add 02_padding_k_first with PADDING_K_FIRST + MFMA_32x32x16
- Add 03_mfma_16x16x16 with PADDING_K_FIRST + MFMA_16x16x16
- Share common reference_gemm.hpp in parent gemm/ directory

2 Commits