Mirror of https://github.com/ROCm/composable_kernel.git
Synced 2026-05-02 04:31:25 +00:00
- Introduced new subdirectory for MFMA 16x16x16x2 implementation.
- Added CMake configuration and source files for the new example.
- Implemented block GEMM and pipeline strategies to optimize performance.
- Included necessary policies and tensor distribution for efficient memory access.
- Updated the main GEMM kernel to support the new configuration.