| Author | Commit | Message | Date |
|---|---|---|---|
| mtgu0705 | 680de28f77 | commit with debug info | 2025-05-19 21:47:35 -05:00 |
| mtgu0705 | 2e6fafaf75 | updated code, build passed. | 2025-05-18 22:29:32 -05:00 |
| mtgu0705 | a4b5a374b9 | Merge remote-tracking branch 'origin/wip-f4-pk' into mx_moe_f4_scale_shuffle | 2025-05-17 09:49:24 -05:00 |
| mtgu0705 | d4aaf9d0d0 | Merge remote-tracking branch 'origin/moe_mx_fp4_for_aiter' into mx_moe_f4_scale_shuffle | 2025-05-17 09:30:48 -05:00 |
| mtgu0705 | eeeba8901f | update code | 2025-05-17 09:28:26 -05:00 |
| OscarXu | 6fb2b54ff4 | mx_fp4 default parameter change | 2025-05-17 01:03:28 -05:00 |
| mtgu0705 | 94fb9190be | init moe mx f4 scale shuffle | 2025-05-16 14:46:09 -05:00 |
| aska-0096 | 248e287866 | generalize the pipeline scheduling. | 2025-05-16 10:41:59 +00:00 |
| aska-0096 | a0379d81e7 | modify the way we represent fp4 | 2025-05-16 09:44:04 +00:00 |
| OscarXu | ec8d00d58d | mx_moe_fp4 ready for aiter with clang-format. | 2025-05-16 04:09:26 -05:00 |
| OscarXu | 39ff3fbf05 | v3 function pass | 2025-05-16 03:42:48 -05:00 |
| OscarXu | c5be9a501b | v1 function pass. | 2025-05-16 03:16:38 -05:00 |
| aska-0096 | a1bec7670a | tempsave | 2025-05-16 08:14:56 +00:00 |
| OscarXu | efd3c24587 | minor fix | 2025-05-16 01:02:51 -05:00 |
| OscarXu | f70f778e27 | v1 compile pass. Function not ready | 2025-05-15 08:01:56 -05:00 |
| Ding, Yi | 9009d75c7a | Pack e8m0 as int32_t | 2025-05-15 09:12:17 +00:00 |
| aska-0096 | 062e16d54a | Improve the pipeline | 2025-05-15 09:08:36 +00:00 |
| OscarXu | 68dbe558df | compile error fix | 2025-05-15 16:55:20 +08:00 |
| OscarXu | c0babbca62 | Merge remote-tracking branch 'origin/fp4_gu_moe' into fp4_gu_moe_gemm1 | 2025-05-15 16:12:19 +08:00 |
| OscarXu | 17922821ec | Add gemm1 v1 | 2025-05-15 16:11:43 +08:00 |
| mtgu0705 | 4e2ec31e4d | rename moe block selector and pipeline | 2025-05-15 02:12:50 -05:00 |
| mtgu0705 | dfba3c11e7 | fix the bug, 128x128x256 tile function passed | 2025-05-15 00:11:10 -05:00 |
| mtgu0705 | 7cfd1db335 | update debug | 2025-05-14 21:41:28 -05:00 |
| mtgu0705 | efdd420742 | debug save | 2025-05-14 09:33:24 -05:00 |
| mtgu0705 | 102151ebcf | temp save | 2025-05-14 08:13:47 -05:00 |
| mtgu0705 | 2700b217be | 16x16x128 input size blockscale function passed | 2025-05-14 03:20:59 -05:00 |
| Ding, Yi | 4ba9fe186c | Use random scale for init1 | 2025-05-14 05:42:39 +00:00 |
| mtgu0705 | 1bbb50b212 | mfma using asm, device result correct, host result need to check | 2025-05-13 20:57:34 -05:00 |
| Ding, Yi | 521471c956 | Fix fp8/bf8 B-row | 2025-05-13 10:13:18 +00:00 |
| mtgu0705 | 6dfe24c53e | updated | 2025-05-13 04:15:53 -05:00 |
| Ding, Yi | 178e361101 | Fix fp8/bf8; remove duplicated code | 2025-05-13 07:52:13 +00:00 |
| mtgu0705 | 5b26ad3bbf | update CE elementOP | 2025-05-13 02:19:13 -05:00 |
| mtgu0705 | 5ba86c210b | updated and build passed | 2025-05-13 14:49:37 +08:00 |
| aska-0096 | 79246e6cb8 | function pass with inline asm hacky | 2025-05-12 16:54:44 +00:00 |
| mtgu0705 | cc43f88f08 | add code for mxfp4 gemm, blockscale not supported yet | 2025-05-12 20:56:50 +08:00 |
| Your Name | 58f848cc07 | Merge branch 'wjx/align_v3_pipeline' into fp4_gu_moe | 2025-05-12 16:21:39 +08:00 |
| Ding, Yi | 4b19b934e8 | fix fp8; fix even/odd | 2025-05-12 07:31:28 +00:00 |
| mtgu0705 | 726551dec4 | (M, N, K)=(128, 128, 128) function failed. | 2025-05-11 10:16:26 +00:00 |
| aska-0096 | 41ea1066ac | implement shuffled scale mxfp4gemm, blocker: opsel not effect | 2025-05-11 05:54:13 +00:00 |
| mtgu0705 | 70648240f9 | added fp4_bpreshuffle example, build failures | 2025-05-10 21:34:32 +08:00 |
| aska-0096 | 6c761bf9b8 | tempsave; buggy at passed 4 e8m0 to scaled mfma | 2025-05-10 09:57:49 +00:00 |
| Your Name | 5421e71155 | Merge branch 'wip-f4' into mt1 | 2025-05-10 14:56:21 +08:00 |
| aska-0096 | 0987b0af44 | remove unnecessary hacky | 2025-05-09 16:07:22 +00:00 |
| aska-0096 | 7bde4b8d34 | Add pipeline v3. Have some runtime issue and register spill | 2025-05-09 09:47:22 +00:00 |
| mtgu0705 | c0e010711c | update for function debug | 2025-05-09 08:37:04 +00:00 |
| aska-0096 | bb043a3202 | remove some unnecessary hacky; enable 256x256x256 tilesize | 2025-05-09 07:54:28 +00:00 |
| mtgu0705 | f2a474e2e9 | fix update | 2025-05-09 11:04:39 +08:00 |
| mtgu0705 | 11f386108e | some fixes | 2025-05-08 23:38:27 +08:00 |
| aska-0096 | b2efb06315 | Spilt the fp4 target. Fix the known bugs. 128x128x128 sanity checked; remove prints | 2025-05-08 15:07:33 +00:00 |
| mtgu0705 | 7c49f9dd31 | add mx fp8 b_preshuffle support, function not yet tested. | 2025-05-08 22:41:54 +08:00 |