mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-03 13:48:30 +00:00

Files

Enrico Degregori d559ec00a8 [rocm-libraries] ROCm/rocm-libraries#8554 (commit be9af54)

refactor(ck): mx gemm kernel unification

## Motivation

CK tile currently has two separate MX GEMM kernels for gfx950 and
gfx1250. This pull request refactors and modernizes the MX GEMM kernel
and example to use new scale tensor handling, improved kernel argument
structures, and updated pipeline and kernel APIs. The changes simplify
the interface and improve type safety.

JIRA ID ROCM-26313

## Technical Details

- Add support for gfx950 in MX GEMM kernel for gfx1250 and remove unused kernel
- Unify comp async pipeline for GEMM and MX GEMM
- Unify eight waves pipeline for GEMM and MX GEMM
- Move preshuffle MX GEMM pipeline to gemm ops and remove gemm_mx ops
- Unify testing framework for MX GEMM
- Add gfx950 tests for grouped MX GEMM

## Test Plan

 - `test_mx_gemm_async.cpp` for MX GEMM on gfx950
 - `test_mx_grouped_gemm_comp_async.cpp` for grouped MX GEMM on gfx950

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

2026-07-01 08:21:02 +00:00

| Name | Last commit | Date |
| --- | --- | --- |
| algorithm | [rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76) | 2026-05-27 06:56:58 -07:00 |
| arch | [rocm-libraries] ROCm/rocm-libraries#7850 (commit e8f2756) | 2026-06-29 18:51:17 +00:00 |
| container | [rocm-libraries] ROCm/rocm-libraries#6768 (commit 43ca43f) | 2026-06-05 12:27:41 +00:00 |
| numeric | [rocm-libraries] ROCm/rocm-libraries#6768 (commit 43ca43f) | 2026-06-05 12:27:41 +00:00 |
| tensor | [rocm-libraries] ROCm/rocm-libraries#8554 (commit be9af54) | 2026-07-01 08:21:02 +00:00 |
| utility | [rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76) | 2026-05-27 06:56:58 -07:00 |
| config.hpp | [rocm-libraries] ROCm/rocm-libraries#6768 (commit 43ca43f) | 2026-06-05 12:27:41 +00:00 |
| README.md | introducing ck_tile! (#1216) | 2024-04-15 19:27:12 -05:00 |

README.md

ck_tile/core

ck_tile/core contains all the basic functions and structures needed to create a GPU kernel using ck_tile. Users should include only the single header ck_tile/core.hpp to access all of this functionality. Everything is under the ck_tile namespace. The coding style in this folder should be similar to std (snake_case for structures and functions, CamelCase for template types, and so on).

algorithm/
    coordinate transforms and other reusable algorithms
arch/
    basic device building blocks such as mma, buffer addressing, etc.
container/
    basic container data structures: array, sequence, tuple, ...
numeric/
    data types and data-type-related math
tensor/
    tensor descriptors and the tile-level API
utility/
    other utility functions for both host and device
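
The naming convention described above can be illustrated with a minimal sketch. Everything here is hypothetical: the `style_demo` namespace, `static_array`, and `reduce_sum` are made up for illustration and are not part of ck_tile. Real code would instead include `ck_tile/core.hpp` and use names from the `ck_tile` namespace. The sketch only shows the std-like style: snake_case for structures and functions, CamelCase for template type parameters.

```cpp
#include <cstddef>

// Hypothetical namespace for illustration only; not part of ck_tile.
namespace style_demo {

// Structure name in snake_case, template parameters in CamelCase.
template <typename DataType, std::size_t NumElements>
struct static_array
{
    using value_type = DataType;

    DataType data[NumElements];

    constexpr std::size_t size() const { return NumElements; }
    constexpr const DataType& at(std::size_t i) const { return data[i]; }
};

// Free function in snake_case, template parameters in CamelCase.
template <typename DataType, std::size_t NumElements>
constexpr DataType reduce_sum(const static_array<DataType, NumElements>& a)
{
    DataType acc = DataType{0};
    for(std::size_t i = 0; i < NumElements; ++i)
        acc += a.at(i);
    return acc;
}

} // namespace style_demo

// Compile-time check that the sketch behaves as described (requires C++14 or later).
static_assert(style_demo::reduce_sum(style_demo::static_array<int, 3>{{1, 2, 3}}) == 6,
              "reduce_sum should add all elements");
```

The snippet is deliberately independent of ck_tile so that it compiles on the host with any C++14 compiler; it makes no claim about how the library's own containers are implemented.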