composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-29 19:28:33 +00:00

Author	SHA1	Message	Date
Andriy Roshchenko	b8440b3aeb	[rocm-libraries] ROCm/rocm-libraries#8325 (commit 559eaf6) [GFX1250][MX GEMM] Unified FLATMM GroupedGemm Implementation for MX Data Types (#8325) ## Motivation Design and test a unified FLATMM GroupedGemm interface so that it supports all MX FP8, FP6, and FP4 data types on both the gfx950 and gfx1250 architectures and works seamlessly across these platforms. ## Technical Details Implementation exposes Grouped Gemm interface for MX FLATMM and MX TDM FLATMM pipelines. ## Test Plan Add the following tests: - ck_tile/grouped_gemm_mx/test_grouped_gemm_mx_flatmm_non_tdm.cpp - ck_tile/grouped_gemm_mx/test_grouped_gemm_mx_flatmm_tdm.cpp - ck_tile/flatmm/test_mx_flatmm_persistent.cpp Verify on the gfx950 and gfx1250 architectures. ## Test Result All tests pass. Verified on A0 hardware with rocm-7.14.0a20260517 ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-06-15 16:12:33 +00:00
Andriy Roshchenko	d5c9215064	[rocm-libraries] ROCm/rocm-libraries#7359 (commit dd62f9f) [CK_TILE][GFX1250] Enable MX GEMM FLATMM with ASYNC ## Motivation Enables MX GEMM FLATMM pipeline on gfx1250. The pipeline uses an async load instruction for tensor A, which complements the existing MX GEMM FLATMM pipeline with TDM load. At this time, only FLATMM MX pipelines are enabled on gfx1250. ## Technical Details The existing gfx950 implementation was extended to support gfx1250 architecture. All three MX FP data types are supported across the two ASICs. It should be noted that while the TDM pipeline uses an emulated 32x32x128 warp-tile instruction, the present submission relies on the built-in 16x16x128 instruction, called 4 times per warp. ## Test Plan Existing `test/ck_tile/flatmm` tests were extended to cover new gfx1250 functionality. To help facilitate the testing in development, `example/ck_tile/18_flatmm/script/smoke_test_mx.sh` script was introduced to verify various combinations of supported data types and pipeline versions. ## Test Result The present submission is expected to work on both gfx950 and gfx1250 hardware for all reasonable sizes and all MX FP8/FP6/FP4 data types. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. - [x] Relies on #6978 and should only be merged after the changes are merged to the `develop`.	2026-05-29 17:02:45 +00:00
Illia Silin	c24e528481	[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76) [CK] suppress compiler warnings while building pytorch. (#7760) ## Motivation Recently added compiler flags that are required to suppress false warnings by latest staging compiler are not recognized by older compiler versions and are triggering an avalanche of warnings. Previous attempt to suppress them by using -Wno-unknown-warning-option flag didn't help, because that flag wasn't recognized either and just added more warnings. I've verified that current approach by checking the clang version actually works as intended and makes the warnings go away. ## Technical Details <!-- Explain the changes along with any relevant GitHub links. --> ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-05-27 06:56:58 -07:00
Illia Silin	e02c566795	[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24) [CK] upgrade CI to rocm7.13 as default compiler (#7612) ## Motivation Upgrade the default docker and compiler version in CI to rocm7.13. In order to pass all the checks I had to also clean up a lot of non-ascii characters in the source code comments and modify a couple of tests that were affected by a new compiler logic. ## Technical Details <!-- Explain the changes along with any relevant GitHub links. --> ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Aviral Goel <aviral.goel@amd.com>	2026-05-22 02:43:50 +00:00
Illia Silin	717f2efef7	[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d) [CK] add composable kernel support on gfx1250 (#6978) ## Motivation Add composable kernel support on gfx1250. ## Technical Details <!-- Explain the changes along with any relevant GitHub links. --> ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Qun Lin <qlin@amd.com> Co-authored-by: jialuo12_amdeng <jia.luo@amd.com> Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com> Co-authored-by: hsivasun_amdeng <haresh.sivasuntharampillai@amd.com>	2026-05-15 06:46:51 -07:00
Illia Silin	ac18460782	[rocm-libraries] ROCm/rocm-libraries#7384 (commit 10e9d70) [CK] Suppress new staging compiler errors (#7384) ## Motivation This should make new builds with staging compiler pass. ## Technical Details <!-- Explain the changes along with any relevant GitHub links. --> ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-05-14 12:51:08 -07:00
Aviral Goel	39d983fce2	[rocm-libraries] ROCm/rocm-libraries#5082 (commit 9313659) ck_tile: add gtest unit tests for MX flatmm (gfx950) (#5082) ## Summary - Add correctness unit tests for the MX-format flatmm kernel (`example/ck_tile/18_flatmm/mxgemm`) under `test/ck_tile/flatmm/` - Tests cover all five dtype combinations: FP4×FP4, FP8×FP8, FP6×FP6, FP8×FP4, FP4×FP8 - Tests cover all four kernel dispatch paths (the `has_hot_loop` × `tail_num` product): - `has_hot_loop=false, tail=ODD` (K=256, num_loop=1) - `has_hot_loop=false, tail=EVEN` (K=512, num_loop=2) - `has_hot_loop=true, tail=ODD` (K=768, num_loop=3) - `has_hot_loop=true, tail=EVEN` (K=1024, num_loop=4) - Remove unsupported `-split_k` CLI option from `tile_example_mx_flatmm`; the pre-shuffled B layout is incompatible with K-splitting and the option silently produced wrong results ## Changes New files (`test/ck_tile/flatmm/`): - `CMakeLists.txt` — builds 40 kernel instances as a shared OBJECT library, links into 5 per-dtype test executables; forwards `-DCK_TILE_USE_OCP_FP8` when `CK_USE_OCP_FP8` is ON - `test_mx_flatmm_base.hpp` — base test fixture with `run_test_with_validation(M, N, K, kbatch=1)` - `test_mx_flatmm_fixtures.hpp` — concrete `TestMXFlatmm` typed test class and type aliases - `test_mx_flatmm_fp{4fp4,8fp8,6fp6,8fp4,4fp8}.cpp` — per-dtype `TYPED_TEST_SUITE` files Modified files: - `example/ck_tile/18_flatmm/mxgemm/mx_flatmm_arch_traits.hpp` — moved `preShuffleWeight` here (was in `mx_flatmm.cpp`) so it is includeable by both the example and the tests - `example/ck_tile/18_flatmm/mxgemm/mx_flatmm.cpp` / `run_mx_flatmm.inc` — removed `-split_k` CLI arg, hardcoded `k_batch=1`, fixed `k_split` formula, updated call sites after `preShuffleWeight` move - `test/ck_tile/CMakeLists.txt` — added `add_subdirectory(flatmm)` --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2026-03-11 15:46:58 -07:00

7 Commits