Aviral Goel
1a4aa7fd89
[rocm-libraries] ROCm/rocm-libraries#5082 (commit 9313659)
ck_tile: add gtest unit tests for MX flatmm (gfx950)
## Summary
- Add correctness unit tests for the MX-format flatmm kernel
(`example/ck_tile/18_flatmm/mxgemm`) under `test/ck_tile/flatmm/`
- Tests cover all five dtype combinations: FP4×FP4, FP8×FP8, FP6×FP6,
FP8×FP4, FP4×FP8
- Tests cover all four kernel dispatch paths (the `has_hot_loop` ×
`tail_num` product):
  - `has_hot_loop=false, tail=ODD` (K=256, num_loop=1)
  - `has_hot_loop=false, tail=EVEN` (K=512, num_loop=2)
  - `has_hot_loop=true, tail=ODD` (K=768, num_loop=3)
  - `has_hot_loop=true, tail=EVEN` (K=1024, num_loop=4)
- Remove unsupported `-split_k` CLI option from
`tile_example_mx_flatmm`; the pre-shuffled B layout is incompatible with
K-splitting and the option silently produced wrong results
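
The K values chosen for the four dispatch paths can be read as a simple mapping from K to `(has_hot_loop, tail)`. The sketch below is only an illustration inferred from the four cases listed above: it assumes one main-loop iteration consumes 256 elements of K and that the hot loop is taken once `num_loop` exceeds 2. The names `kAssumedKPerLoop`, `num_loop_for`, `dispatch_for`, `TailNumber`, and `DispatchPath` are hypothetical and are not part of the ck_tile API; the real kernel may decide differently for other shapes.

```cpp
#include <cstdint>

// Hypothetical types for illustration only; not the ck_tile API.
enum class TailNumber { Odd, Even };

struct DispatchPath
{
    bool has_hot_loop;
    TailNumber tail;
};

// Assumption: each main-loop iteration consumes 256 elements of K, which is
// what the listed pairs imply (K=256 -> 1 loop, ..., K=1024 -> 4 loops).
constexpr std::int64_t kAssumedKPerLoop = 256;

constexpr std::int64_t num_loop_for(std::int64_t K) { return K / kAssumedKPerLoop; }

// Assumption: the hot loop is entered when num_loop > 2, and the tail is ODD
// or EVEN according to the parity of num_loop. Both rules are inferred from
// the four listed test cases only.
constexpr DispatchPath dispatch_for(std::int64_t K)
{
    return DispatchPath{num_loop_for(K) > 2,
                        (num_loop_for(K) % 2 == 1) ? TailNumber::Odd : TailNumber::Even};
}

// Compile-time check that the sketch reproduces the four listed cases.
static_assert(!dispatch_for(256).has_hot_loop && dispatch_for(256).tail == TailNumber::Odd,
              "K=256: no hot loop, odd tail");
static_assert(!dispatch_for(512).has_hot_loop && dispatch_for(512).tail == TailNumber::Even,
              "K=512: no hot loop, even tail");
static_assert(dispatch_for(768).has_hot_loop && dispatch_for(768).tail == TailNumber::Odd,
              "K=768: hot loop, odd tail");
static_assert(dispatch_for(1024).has_hot_loop && dispatch_for(1024).tail == TailNumber::Even,
              "K=1024: hot loop, even tail");
```

Under these assumptions, choosing K as 1 to 4 multiples of the per-loop K is enough to reach each combination once, which matches the test shapes listed above.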
## Changes
**New files (`test/ck_tile/flatmm/`):**
- `CMakeLists.txt` — builds 40 kernel instances as a shared OBJECT
library, links into 5 per-dtype test executables; forwards
`-DCK_TILE_USE_OCP_FP8` when `CK_USE_OCP_FP8` is ON
- `test_mx_flatmm_base.hpp` — base test fixture with
`run_test_with_validation(M, N, K, kbatch=1)`
- `test_mx_flatmm_fixtures.hpp` — concrete `TestMXFlatmm` typed test
class and type aliases
- `test_mx_flatmm_fp{4fp4,8fp8,6fp6,8fp4,4fp8}.cpp` — per-dtype
`TYPED_TEST_SUITE` files
**Modified files:**
- `example/ck_tile/18_flatmm/mxgemm/mx_flatmm_arch_traits.hpp` — moved
`preShuffleWeight` here (was in `mx_flatmm.cpp`) so it is includeable by
both the example and the tests
- `example/ck_tile/18_flatmm/mxgemm/mx_flatmm.cpp` / `run_mx_flatmm.inc`
— removed `-split_k` CLI arg, hardcoded `k_batch=1`, fixed `k_split`
formula, updated call sites after `preShuffleWeight` move
- `test/ck_tile/CMakeLists.txt` — added `add_subdirectory(flatmm)`
2026-03-11 22:47:59 +00:00
Yi DING
e135dd518d
[CK_TILE] Add mxfp4 flatmm (#3080)
* Squashed commit of the following:
commit 3e1a851dad834776efbe4fe365ac82c4ed312010
Author: Ding, Yi <yi.ding@amd.com >
Date: Thu Oct 23 06:10:54 2025 +0000
Fix & clean after rebase
commit 1edf485092f44411da9a1796a4a6b72d5cdb67c6
Author: Ding, Yi <yi.ding@amd.com >
Date: Wed Oct 22 10:46:13 2025 +0000
Squashed commit of the following:
commit 0b6b9dbd1b
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 22 02:04:27 2025 -0500
fix bandwidth calculation
commit 9aebf53bb7
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 22 00:58:59 2025 -0500
updates
commit 62607de56c
Author: mtgu0705 <mtgu@amd.com >
Date: Fri Sep 19 00:39:46 2025 -0500
fix a bug, set the A DS_read preload size to 4 for MXFP4
commit 92ad6fcc0a
Author: mtgu0705 <mtgu@amd.com >
Date: Thu Sep 18 01:19:03 2025 -0500
fix a_wrap preload issue for large MPerBlock.
commit f2db44710f
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 17 21:34:03 2025 -0500
optimized the VGPR repack issue for MXFP4
commit 346a400027
Author: Gino Lu <gino.lu@amd.com >
Date: Wed Sep 17 04:19:44 2025 -0500
fix time error
commit 80c1743034
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 17 03:58:00 2025 -0500
updated, function passed.
commit ce26d9071e
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 16 22:21:39 2025 -0500
fix, function partially passed
commit 0a89ed13a5
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 16 03:01:12 2025 -0500
fix, reference function passed, next check kernel function
commit ec9bcef591
Author: Gino Lu <gino.lu@amd.com >
Date: Tue Sep 16 02:29:01 2025 -0500
let pack/unpack return pk_fp4_t
commit a333206929
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 20:50:26 2025 -0500
fix
commit 3893c06540
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 15 05:51:06 2025 -0500
fix bug
commit 8052bea019
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 04:02:05 2025 -0500
fix core dump issue, function is not correct.
commit 9ceb3fd508
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 03:03:02 2025 -0500
updates, build pass
commit cc94eb6045
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 15 00:05:18 2025 -0500
updates
commit 22586c3135
Author: Gino Lu <gino.lu@amd.com >
Date: Sun Sep 14 23:40:28 2025 -0500
fix bug
commit e92e67b8dd
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 12 03:28:50 2025 -0500
fix interface
commit 8b1dd60c08
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 12 02:53:50 2025 -0500
add interface in warp_gemm_impl
commit c6135f6abe
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 10 05:03:08 2025 -0500
updates some fixes.
commit b0d71b8d19
Author: mtgu0705 <mtgu@amd.com >
Date: Tue Sep 9 04:37:42 2025 -0500
fix after merge ginolu/add_wgmfma_dispatcher
commit f119c30317
Merge: c5030e602 72c8ef856
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 22:09:15 2025 -0500
Merge remote-tracking branch 'origin/ginolu/add_wgmfma_dispatcher' into mtgu/cktile_mxfp4_flatmm_dev
commit c5030e602e
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 21:42:47 2025 -0500
update mx flatmm tail pipeline
commit 72c8ef8567
Merge: 9661bb400 e4a772890
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 8 19:10:23 2025 -0500
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 9661bb400b
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 8 19:09:55 2025 -0500
fix type error
commit 0509597f55
Author: mtgu0705 <mtgu@amd.com >
Date: Mon Sep 8 04:01:40 2025 -0500
update hotloop pipeline
commit 754ae0461b
Merge: 15d44406e 83f607e2a
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 5 04:22:26 2025 -0500
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 15d44406e5
Author: Gino Lu <gino.lu@amd.com >
Date: Fri Sep 5 04:21:26 2025 -0500
fix clang format
commit 146963d62a
Author: mtgu0705 <mtgu@amd.com >
Date: Wed Sep 3 10:00:54 2025 -0500
some updates
commit 12526b626a
Merge: 47cee0471 00fd72b2d
Author: asleepzzz <hanwen.chang@amd.com >
Date: Wed Sep 3 13:22:03 2025 +0800
Merge branch 'develop' into ginolu/add_wgmfma_dispatcher
commit 47cee04712
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 1 02:11:02 2025 -0500
fix vec size error
commit d2892925e5
Author: Gino Lu <gino.lu@amd.com >
Date: Mon Sep 1 01:23:39 2025 -0500
fix format error
commit 16993acd1d
Author: mtgu0705 <mtgu@amd.com >
Date: Sat Aug 30 03:19:07 2025 -0500
update codes
commit 9c37e55d13
Author: mtgu0705 <mtgu@amd.com >
Date: Fri Aug 29 11:27:33 2025 -0500
init ck_tile mxfp4 flatmm
commit 5c484a5672
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 28 08:02:50 2025 +0000
Add bias for f16xf4 moe_flatmm
commit dd6539f366
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 27 13:39:47 2025 +0000
update case construction
commit 65b702454c
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Aug 26 12:32:29 2025 +0000
support swiglu activation and use rcpf to accelerate silu
commit b422e41e08
Author: Gino Lu <gino.lu@amd.com >
Date: Tue Aug 26 02:33:55 2025 -0500
first commit
commit d05eed931d
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Fri Aug 22 04:01:59 2025 -0500
add line to last
commit d69cab7f0c
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Fri Aug 22 03:20:46 2025 -0500
adjust A_LDS descriptor to avoid bankconflict
commit 65989e940c
Author: root <root@smci355-ccs-aus-m02-25.cs-aus.dcgpu >
Date: Thu Aug 21 09:46:52 2025 -0500
enable hotloop
commit c378e9bdf8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 21 09:12:21 2025 +0000
support atomic_pk_add_bf16 on gfx950
commit 85976b0b87
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 21 06:58:55 2025 +0000
use int64_t as expert stride to avoid overflow
commit 9fbcc8f8a4
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 13:53:32 2025 +0000
use v4i32 as the storage type for B to avoid repack operation
commit 81899bd920
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 06:40:03 2025 +0000
add pk_fp4_t and e8m0_t support for amd_buffer_load_impl
commit c27eb0771a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 20 04:39:14 2025 +0000
optimize cvt_pkf4_to_f16 implementation
commit 3ca0bd500a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Aug 19 14:56:46 2025 +0000
optimize A_LDS descriptor to avoid bankconflict
commit f7f0306eea
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 18 18:43:37 2025 +0000
fix gate-up when GU_NRepeat > 1
commit be55c0f9cb
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 18 17:28:11 2025 +0000
add fp16xf4 moe
commit 599e1f5b32
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Sun Aug 17 17:51:18 2025 +0000
rename example
commit 7899fb4a8d
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 15 06:20:46 2025 +0000
remove additional check when e8m0->float
commit 714b341797
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 14 09:34:12 2025 +0000
eliminate repeat dequant
commit 53e8c0c533
Merge: 5de620895 cc9c7b9e5
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 16:51:49 2025 +0000
Merge remote-tracking branch 'origin/moe_flatmm' into feat-mixed_input_flatmm
commit 5de6208952
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 16:16:48 2025 +0000
update f16xMXF4
commit 732ebdee8b
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 13 10:48:53 2025 +0000
update scale-preshuffle for MXF4
commit edb58d0680
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 11:24:34 2025 +0000
update
commit cc9c7b9e58
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 08:38:23 2025 +0000
optimize gemm2 atomic_add pattern
commit 200a11afc8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 07:59:47 2025 +0000
update scale for mxfp4
commit 87aed564dc
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 07:56:14 2025 +0000
update case construction
commit 8b85fa6cf2
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 06:03:06 2025 +0000
update granularity control
commit 1b8c7097b8
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 11 03:42:46 2025 +0000
fix TileConfig
commit 8ba1c708dc
Author: Gino Lu <gino.lu@amd.com >
Date: Thu Aug 7 21:37:28 2025 +0800
Add e8m0 scaled convert into CK_TILE (#2617)
* first commit
* remove redundant code
* modify according to comments.
* fix type_convert error with scaled_type_convert
commit f788d3d629
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 8 20:19:16 2025 +0000
add mixed_prec fp16xfp4
commit 3dea10a277
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Thu Aug 7 09:22:04 2025 +0000
debug mixed_prec flatmm
commit 0ba513b148
Merge: 90e910f3a c0cb4d036
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Aug 6 16:49:47 2025 +0800
Merge pull request #2626 from ROCm/felix/flatmm_fix_splitk
fix split k
commit 6d3cbc7c0e
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Aug 6 08:33:33 2025 +0000
add moe_flatmm
commit c0cb4d036d
Author: coderfeli <coderfeli@163.com >
Date: Wed Aug 6 02:45:31 2025 +0000
fix split k
commit 90e910f3a7
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Aug 4 07:16:36 2025 +0000
fix flatmm with scaling when WarpTileM == 32
commit aa5e008fa5
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 1 11:01:23 2025 +0000
optimize scaling epilogue
commit ac5908c0bb
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Aug 1 07:28:38 2025 +0000
fix wrong config for fp8 scaling
commit 3f43b841d4
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 30 06:20:30 2025 +0000
prune debug message
commit 2e5d4c74cd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 30 04:52:08 2025 +0000
fix compile error
commit c117a1986a
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Tue Jul 29 15:42:58 2025 +0000
Add persistent option on flatmm for tuning
commit a587701117
Author: AMD-dteng <dteng@amd.com >
Date: Tue Jul 29 22:48:00 2025 +0800
update pipeline v1: add atomic IGLP schedule
commit f9e48148d2
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 09:09:27 2025 +0000
fix error log throwing
commit 1b6d7cf407
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Mon Jul 28 08:24:51 2025 +0000
crz idea
commit 5473f06461
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Sun Jul 27 11:57:38 2025 +0000
Add permuteN optimization when NRepeat % 2 == 0 on flatmm
commit bfb9f4002f
Author: sjfeng <j514681085@icloud.com >
Date: Sun Jul 27 17:24:08 2025 +0800
try to remove c_shuffle_lds
commit 1264f4d2ab
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Fri Jul 25 07:41:48 2025 +0000
fix loop-dim mismatch and improve c_shuffle alu parallelism
commit 1239d8a546
Merge: 406645448 b908f5e80
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 08:46:51 2025 +0000
merge flatmm -scale
commit 4066454483
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 24 16:19:58 2025 +0800
revert delete of inc file
commit 68390988c9
Author: solin <bingzhou@amd.com >
Date: Thu Jul 24 04:38:16 2025 +0000
reorg flatmm code
commit b908f5e803
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 23 19:12:31 2025 +0000
fix flatmm syntax error on gfx950
commit 5a1183ebbd
Author: Feng Shijie <Shijie.Feng@amd.com >
Date: Wed Jul 23 19:04:22 2025 +0000
support flatmm scaling
commit 89fa639207
Author: valarLip <340077269@qq.com >
Date: Wed Jul 23 08:44:12 2025 +0000
merge flatmm pipe v0 from dteng_flatmm_opt
commit 3f7d848dd3
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 15:38:12 2025 +0800
build pass
commit 6dacf833da
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 07:20:26 2025 +0000
fix bug
commit 7e1bd4b839
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 23 15:01:53 2025 +0800
sync
commit 46a538e39e
Author: valarLip <340077269@qq.com >
Date: Tue Jul 22 08:09:35 2025 +0000
adaptive scheduler instead of Macro definition
commit 9aa3396a79
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Thu Jul 17 08:40:35 2025 +0000
fix tail handler bug
commit fb76450e63
Author: lalala-sh <Jiaxing.Wen@amd.com >
Date: Wed Jul 16 10:12:19 2025 +0000
merge from dteng_flatmm_opt
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com >
Co-authored-by: AMD-dteng <dteng@amd.com >
Co-authored-by: solin <bingzhou@amd.com >
Co-authored-by: sjfeng <j514681085@icloud.com >
Co-authored-by: valarLip <340077269@qq.com >
Co-authored-by: asleepzzz <hanwen.chang@amd.com >
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com >
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: Gino Lu <gino.lu@amd.com >
Co-authored-by: mtgu0705 <mtgu@amd.com >
* Fix crash on small M
* Apply suggestion from @Copilot
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com >
Co-authored-by: AMD-dteng <dteng@amd.com >
Co-authored-by: solin <bingzhou@amd.com >
Co-authored-by: sjfeng <j514681085@icloud.com >
Co-authored-by: valarLip <340077269@qq.com >
Co-authored-by: asleepzzz <hanwen.chang@amd.com >
Co-authored-by: Feng Shijie <Shijie.Feng@amd.com >
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: Gino Lu <gino.lu@amd.com >
Co-authored-by: mtgu0705 <mtgu@amd.com >
2025-10-31 11:29:05 +08:00