Commit Graph

607 Commits

Author     SHA1        Message  Date
mtgu0705   680de28f77  commit with debug info  2025-05-19 21:47:35 -05:00
mtgu0705   2e6fafaf75  updated code, build passed.  2025-05-18 22:29:32 -05:00
mtgu0705   a4b5a374b9  Merge remote-tracking branch 'origin/wip-f4-pk' into mx_moe_f4_scale_shuffle  2025-05-17 09:49:24 -05:00
mtgu0705   d4aaf9d0d0  Merge remote-tracking branch 'origin/moe_mx_fp4_for_aiter' into mx_moe_f4_scale_shuffle  2025-05-17 09:30:48 -05:00
mtgu0705   eeeba8901f  update code  2025-05-17 09:28:26 -05:00
OscarXu    6fb2b54ff4  mx_fp4 default parameter change  2025-05-17 01:03:28 -05:00
mtgu0705   94fb9190be  init moe mx f4 scale shuffle  2025-05-16 14:46:09 -05:00
aska-0096  248e287866  generalize the pipeline scheduling.  2025-05-16 10:41:59 +00:00
aska-0096  a0379d81e7  modify the way we represent fp4  2025-05-16 09:44:04 +00:00
OscarXu    ec8d00d58d  mx_moe_fp4 ready for aiter with clang-format.  2025-05-16 04:09:26 -05:00
OscarXu    39ff3fbf05  v3 function pass  2025-05-16 03:42:48 -05:00
OscarXu    c5be9a501b  v1 function pass.  2025-05-16 03:16:38 -05:00
aska-0096  a1bec7670a  tempsave  2025-05-16 08:14:56 +00:00
OscarXu    efd3c24587  minor fix  2025-05-16 01:02:51 -05:00
OscarXu    f70f778e27  v1 compile pass. Function not ready  2025-05-15 08:01:56 -05:00
Ding, Yi   9009d75c7a  Pack e8m0 as int32_t  2025-05-15 09:12:17 +00:00
aska-0096  062e16d54a  Improve the pipeline  2025-05-15 09:08:36 +00:00
OscarXu    68dbe558df  compile error fix  2025-05-15 16:55:20 +08:00
OscarXu    c0babbca62  Merge remote-tracking branch 'origin/fp4_gu_moe' into fp4_gu_moe_gemm1  2025-05-15 16:12:19 +08:00
OscarXu    17922821ec  Add gemm1 v1  2025-05-15 16:11:43 +08:00
mtgu0705   4e2ec31e4d  rename moe block selector and pipeline  2025-05-15 02:12:50 -05:00
mtgu0705   dfba3c11e7  fix the bug, 128x128x256 tile function passed  2025-05-15 00:11:10 -05:00
mtgu0705   7cfd1db335  update debug  2025-05-14 21:41:28 -05:00
mtgu0705   efdd420742  debug save  2025-05-14 09:33:24 -05:00
mtgu0705   102151ebcf  temp save  2025-05-14 08:13:47 -05:00
mtgu0705   2700b217be  16x16x128 input size blockscale function passed  2025-05-14 03:20:59 -05:00
Ding, Yi   4ba9fe186c  Use random scale for init1  2025-05-14 05:42:39 +00:00
mtgu0705   1bbb50b212  mfma using asm, device result correct, host result need to check  2025-05-13 20:57:34 -05:00
Ding, Yi   521471c956  Fix fp8/bf8 B-row  2025-05-13 10:13:18 +00:00
mtgu0705   6dfe24c53e  updated  2025-05-13 04:15:53 -05:00
Ding, Yi   178e361101  Fix fp8/bf8; remove duplicated code  2025-05-13 07:52:13 +00:00
mtgu0705   5b26ad3bbf  update CE elementOP  2025-05-13 02:19:13 -05:00
mtgu0705   5ba86c210b  updated and build passed  2025-05-13 14:49:37 +08:00
aska-0096  79246e6cb8  function pass with inline asm hacky  2025-05-12 16:54:44 +00:00
mtgu0705   cc43f88f08  add code for mxfp4 gemm, blockscale not supported yet  2025-05-12 20:56:50 +08:00
Your Name  58f848cc07  Merge branch 'wjx/align_v3_pipeline' into fp4_gu_moe  2025-05-12 16:21:39 +08:00
Ding, Yi   4b19b934e8  fix fp8; fix even/odd  2025-05-12 07:31:28 +00:00
mtgu0705   726551dec4  (M, N, K)=(128, 128, 128) function failed.  2025-05-11 10:16:26 +00:00
aska-0096  41ea1066ac  implement shuffled scale mxfp4gemm, blocker: opsel not effect  2025-05-11 05:54:13 +00:00
mtgu0705   70648240f9  added fp4_bpreshuffle example, build failures  2025-05-10 21:34:32 +08:00
aska-0096  6c761bf9b8  tempsave; buggy at passed 4 e8m0 to scaled mfma  2025-05-10 09:57:49 +00:00
Your Name  5421e71155  Merge branch 'wip-f4' into mt1  2025-05-10 14:56:21 +08:00
aska-0096  0987b0af44  remove unnecessary hacky  2025-05-09 16:07:22 +00:00
aska-0096  7bde4b8d34  Add pipeline v3. Have some runtime issue and register spill  2025-05-09 09:47:22 +00:00
mtgu0705   c0e010711c  update for function debug  2025-05-09 08:37:04 +00:00
aska-0096  bb043a3202  remove some unnecessary hacky; enable 256x256x256 tilesize  2025-05-09 07:54:28 +00:00
mtgu0705   f2a474e2e9  fix update  2025-05-09 11:04:39 +08:00
mtgu0705   11f386108e  some fixes  2025-05-08 23:38:27 +08:00
aska-0096  b2efb06315  Spilt the fp4 target. Fix the known bugs. 128x128x128 sanity checked; remove prints  2025-05-08 15:07:33 +00:00
mtgu0705   7c49f9dd31  add mx fp8 b_preshuffle support, function not yet tested.  2025-05-08 22:41:54 +08:00