Commit Graph

2195 Commits

Author | SHA1 | Message | Date
aska-0096 | dd24786f78 | Enable splitk for mxfp4; clang format; | 2025-06-02 12:23:01 +00:00
aska-0096 | 5696e3c9f5 | fix the cmake issue | 2025-05-30 15:51:58 +00:00
aska-0096 | bb5bdff61c | remove unnecessary files | 2025-05-30 08:39:25 +00:00
Ding, Yi | 0cd2e6e782 | Fix OOB; add MB96 instances | 2025-05-30 07:46:28 +00:00
Ding, Yi | 6cba96e510 | Use v1 pipeline for example_moe_gemm2_xdl_mx_fp4_bns | 2025-05-30 05:46:31 +00:00
mtgu0705 | aeb717a132 | add pipeline v1 for MOE Gemm2 | 2025-05-30 05:25:43 +00:00
OscarXu | 798345a1cf | Fix moe blockscale gemm1 barrier 0x800 for new compiler | 2025-05-30 04:13:42 +00:00
Ding, Yi | 50956c6c7b | Merge remote-tracking branch 'origin/wjx/moe_v3_aiter' into gfx950-mxfp4 | 2025-05-30 03:56:35 +00:00
Ding, Yi | 69418725a6 | Merge remote-tracking branch 'origin/moe_bs_fp8_no_asm' into gfx950-mxfp4 | 2025-05-30 03:15:47 +00:00
aska-0096 | 3c24d690a1 | remove single rate mfma restriction for f8 blockscale gemm | 2025-05-29 10:13:50 +00:00
aska-0096 | 33085c8458 | clang format, remove single rate mfma restriction for f8 | 2025-05-29 10:10:12 +00:00
aska-0096 | d563dac424 | fix performance bug of bpreshuffle f8 gemm | 2025-05-29 10:02:46 +00:00
valarLip | ccddc5215e | recover example | 2025-05-29 09:09:40 +00:00
aska-0096 | c3d52993c4 | update the flag name for f8blockscale | 2025-05-29 08:47:34 +00:00
OscarXu | 6be76c53b6 | No asm ver. for merging moe blocksale fp8 into mainline | 2025-05-29 03:38:56 -05:00
aska-0096 | ced28892b6 | Merge branch 'gfx950-mxfp4' of https://github.com/ROCm/composable_kernel into gfx950-mxfp4 | 2025-05-29 08:21:58 +00:00
aska-0096 | 0db8d71dc1 | Remove debug infos; Enable flags for blockscale f8 | 2025-05-29 08:21:54 +00:00
Ding, Yi | e4a40c7214 | Add fp8 profiler instances | 2025-05-29 08:19:31 +00:00
OscarXu | 52d68c9529 | flag and barrier fix for copmiler branch MainOpSelV3 | 2025-05-29 03:13:11 -05:00
Ding, Yi | f9ccd1a378 | Fix bf8 config | 2025-05-29 02:20:47 +00:00
Ding, Yi | 2b4b189a5f | Fix fp8 config | 2025-05-29 02:18:02 +00:00
OscarXu | 653bc83f8a | Remove rocm6.3 workaround flags and macro | 2025-05-28 21:05:21 -05:00
Ding, Yi | 35b436c0d9 | Clang-format after 2 merges | 2025-05-28 11:16:00 +00:00
Ding, Yi | aecac410d0 | Merge remote-tracking branch 'origin/f8blk_scale_opt' into wip-f4-mergemoe-2 | 2025-05-28 11:15:22 +00:00
OscarXu | 772debdf8f | Fix do_weight in gemm1. Fix cshuffle_datatype. Clang-format | 2025-05-28 18:29:06 +08:00
Ding, Yi | ad7fd89c1d | Merge remote-tracking branch 'origin/feiw/mxfp4_moe_2Stages' into wip-f4 | 2025-05-28 09:28:26 +00:00
Ding, Yi | 4f9bfb1566 | Add more fp4 wp instances | 2025-05-28 07:33:40 +00:00
Ding, Yi | 857ef9f8c4 | Merge preshuffle device | 2025-05-28 07:02:28 +00:00
Ding, Yi | e2e0e0025e | Profiler add f4 wp | 2025-05-28 05:12:39 +00:00
aska-0096 | 78d0fd4e65 | add vmcnt guard for async copy | 2025-05-28 03:47:46 +00:00
Ding, Yi | b99c50a5d5 | pad ascale | 2025-05-28 03:35:33 +00:00
Ding, Yi | cf5b4c11a2 | Pad shuffled a scale only | 2025-05-28 02:37:14 +00:00
aska-0096 | 65255e12fb | Unconditional Ascale padding | 2025-05-28 01:55:23 +00:00
mtgu0705 | 2f0ee8ccb1 | change the gemm1 tile from 64x128x128 to 128x64x128 | 2025-05-27 20:43:38 -05:00
mtgu0705 | 52b764d59f | update MX moe GEMM1 hotloopscheduling | 2025-05-27 20:43:22 -05:00
aska-0096 | 63c9388881 | Pad the M for scale buffer unconditionaly | 2025-05-27 11:52:12 +00:00
aska-0096 | 9da2995163 | Merge branch 'wip-f4' of https://github.com/ROCm/composable_kernel into wip-f4 | 2025-05-27 10:23:21 +00:00
aska-0096 | 04f7265c19 | refactor the pipeline | 2025-05-27 10:14:45 +00:00
Ding, Yi | d3015785cb | Fix 'Merge gemm_mx_common.hpp' | 2025-05-27 09:08:02 +00:00
aska-0096 | 71e7346bf4 | Merge branch 'wip-f4' of https://github.com/ROCm/composable_kernel into wip-f4 | 2025-05-27 07:32:16 +00:00
aska-0096 | 137e28d151 | temp save, 4.4~4.5 | 2025-05-27 07:31:16 +00:00
Ding, Yi | 85ac576109 | Merge gemm_mx_common.hpp | 2025-05-27 06:13:03 +00:00
Ding, Yi | 123053b685 | Merge remote-tracking branch 'origin/wip-f4-wp' into wip-f4 | 2025-05-27 03:36:38 +00:00
aska-0096 | 61748eddba | Add NT flag to B/BScale buffer | 2025-05-27 02:26:43 +00:00
Ding, Yi | 91eb136937 | Fix v1; use M padding | 2025-05-26 10:32:26 +00:00
aska-0096 | d1d56e89ef | fix the correctness issue | 2025-05-26 09:29:36 +00:00
Ding, Yi | 40af523e2c | Add rotating to mx examples | 2025-05-26 05:05:54 +00:00
aska-0096 | 4a3205f94a | Merge branch 'wip-f4-wp' of https://github.com/ROCm/composable_kernel into wip-f4-wp | 2025-05-26 02:22:09 +00:00
Lin, Qun | d5e7580473 | correct a typo in tail | 2025-05-25 19:22:47 -05:00
mtgu0705 | a36a747e29 | rename the block pipeline | 2025-05-24 00:03:43 -05:00