OscarXu
f03a96ffcf
align file and function name
2025-05-16 03:54:28 -05:00
OscarXu
39ff3fbf05
v3 function pass
2025-05-16 03:42:48 -05:00
OscarXu
c5be9a501b
v1 function pass.
2025-05-16 03:16:38 -05:00
OscarXu
f70f778e27
v1 compile pass. Function not ready
2025-05-15 08:01:56 -05:00
OscarXu
68dbe558df
compile error fix
2025-05-15 16:55:20 +08:00
OscarXu
7c76e19e55
Merge remote-tracking branch 'origin/fp4_gu_moe' into fp4_gu_moe_gemm1
2025-05-15 16:37:08 +08:00
mtgu0705
b65bc1ba4a
added mx moe block v3 support, function passed
2025-05-15 03:26:37 -05:00
OscarXu
bcb4f7e98c
Add gemm1 v1 to selector
2025-05-15 16:22:45 +08:00
OscarXu
c0babbca62
Merge remote-tracking branch 'origin/fp4_gu_moe' into fp4_gu_moe_gemm1
2025-05-15 16:12:19 +08:00
OscarXu
17922821ec
Add gemm1 v1
2025-05-15 16:11:43 +08:00
mtgu0705
4e2ec31e4d
rename moe block selector and pipeline
2025-05-15 02:12:50 -05:00
Your Name
e060999d29
v3
2025-05-15 13:19:07 +08:00
mtgu0705
dfba3c11e7
fix the bug, 128x128x256 tile function passed
2025-05-15 00:11:10 -05:00
mtgu0705
7cfd1db335
update debug
2025-05-14 21:41:28 -05:00
mtgu0705
102151ebcf
temp save
2025-05-14 08:13:47 -05:00
mtgu0705
2700b217be
16x16x128 input size blockscale function passed
2025-05-14 03:20:59 -05:00
mtgu0705
1bbb50b212
mfma using asm, device result correct, host result need to check
2025-05-13 20:57:34 -05:00
mtgu0705
6dfe24c53e
updated
2025-05-13 04:15:53 -05:00
mtgu0705
c11fef9197
added code for debug
2025-05-13 15:25:03 +08:00
mtgu0705
5ba86c210b
updated and build passed
2025-05-13 14:49:37 +08:00
mtgu0705
cc43f88f08
add code for mxfp4 gemm, blockscale not supported yet
2025-05-12 20:56:50 +08:00
Your Name
ce51d2d099
Revert "use mem_op::set when topk=1"
This reverts commit def952a178.
2025-05-12 16:24:30 +08:00
Your Name
dff709c79f
Revert "update"
This reverts commit 960b2bce1c.
2025-05-12 16:24:12 +08:00
Your Name
58f848cc07
Merge branch 'wjx/align_v3_pipeline' into fp4_gu_moe
2025-05-12 16:21:39 +08:00
aska-0096
22f8f5e5d8
fix bug for a lds read
2025-05-12 04:00:21 +00:00
mtgu0705
0bddd63d9c
fix bugs, build passed
2025-05-11 17:49:43 +08:00
mtgu0705
d3f007c775
hotfix
2025-05-11 16:58:36 +08:00
mtgu0705
7d272f7f63
fixed some bugs
2025-05-10 22:01:20 +08:00
mtgu0705
70648240f9
added fp4_bpreshuffle example, build failures
2025-05-10 21:34:32 +08:00
Your Name
5421e71155
Merge branch 'wip-f4' into mt1
2025-05-10 14:56:21 +08:00
mtgu0705
bb49d2e623
fix the bug, functional test passed
2025-05-10 14:33:44 +08:00
mtgu0705
493a8639b3
fix a bug
2025-05-10 13:07:43 +08:00
aska-0096
087b20dc1d
clang format
2025-05-09 16:15:10 +00:00
aska-0096
0987b0af44
remove unnecessary hacky
2025-05-09 16:07:22 +00:00
aska-0096
86bff9c46e
Fix pipe v3 correctness issue
2025-05-09 15:54:43 +00:00
aska-0096
7bde4b8d34
Add pipeline v3. Have some runtime issue and register spill
2025-05-09 09:47:22 +00:00
mtgu0705
c0e010711c
update for function debug
2025-05-09 08:37:04 +00:00
aska-0096
bb043a3202
remove some unnecessary hacky; enable 256x256x256 tilesize
2025-05-09 07:54:28 +00:00
mtgu0705
f2a474e2e9
fix update
2025-05-09 11:04:39 +08:00
mtgu0705
11f386108e
some fixes
2025-05-08 23:38:27 +08:00
aska-0096
b2efb06315
Split the fp4 target. Fix the known bugs. 128x128x128 sanity checked; remove prints
2025-05-08 15:07:33 +00:00
mtgu0705
7c49f9dd31
add mx fp8 b_preshuffle support, function not yet tested.
2025-05-08 22:41:54 +08:00
lalala-sh
6c459e8c38
Merge branch 'develop' into wjx/align_v3_pipeline
2025-05-08 18:08:31 +08:00
lalala-sh
def952a178
use mem_op::set when topk=1
2025-05-08 09:49:16 +00:00
lalala-sh
960b2bce1c
update
2025-05-08 09:48:23 +00:00
aska-0096
32e6c8aad3
16x16x128 correct; 64x64x128 failed
2025-05-08 09:42:53 +00:00
Ding, Yi
6519ba3aa9
wip3
2025-05-08 07:44:22 +00:00
Thomas Ning
c757046d49
Revert "Disable the SMFMA instruction for gfx90a. (#2174)" (#2175)
This reverts commit a32d907771.
2025-05-08 00:07:03 -07:00
Khushbu Agarwal
a32d907771
Disable the SMFMA instruction for gfx90a. (#2174)
* remove smfma for gfx90a
* clang formatted
2025-05-07 23:09:22 -07:00
BingYuan.Zhou
6a3960c1e1
Flatmm merge (#2168)
* sync with function interface of cshuffleepiloge,fix flatmm build fail
* move code from solin/flatmm which add mfma16*16*32fp8 and optimize flatmm
---------
Co-authored-by: solin <bingzhou@amd.com>
2025-05-08 12:59:57 +08:00