Commit Graph

370 Commits

Author SHA1 Message Date
mtgu0705
40ed20a30d updated the codes 2025-06-04 02:04:28 -05:00
mtgu0705
5117e99822 Merge remote-tracking branch 'origin/mx_moe_f4_scaleshuffle_Bnoshuffle' into mxfp4_moe_blockscale_buf2lds 2025-06-04 00:14:18 -05:00
aska-0096
86c8bef5d7 Refactor thread_copy_lds_direct_load; fix gfx942 direct lds load example; fix f16_pki4 example 2025-06-03 14:54:30 +00:00
Ding, Yi
0cbc5e2bdb Use packed_size_v for A/BPackedSize 2025-06-03 06:10:07 +00:00
Ding, Yi
331ccb8ca2 Merge remote-tracking branch 'origin/develop' into gfx950-mxfp4 2025-06-03 05:38:56 +00:00
mtgu0705
edffec8fc4 change the code to suit threadwise loading are continued 2025-06-03 00:36:07 -05:00
aska-0096
dd24786f78 Enable splitk for mxfp4; clang format; 2025-06-02 12:23:01 +00:00
aska-0096
bb5bdff61c remove unnecessary files 2025-05-30 08:39:25 +00:00
mtgu0705
aeb717a132 add pipeline v1 for MOE Gemm2 2025-05-30 05:25:43 +00:00
Ding, Yi
69418725a6 Merge remote-tracking branch 'origin/moe_bs_fp8_no_asm' into gfx950-mxfp4 2025-05-30 03:15:47 +00:00
OscarXu
6be76c53b6 No asm ver. for merging moe blockscale fp8 into mainline 2025-05-29 03:38:56 -05:00
Ding, Yi
aecac410d0 Merge remote-tracking branch 'origin/f8blk_scale_opt' into wip-f4-mergemoe-2 2025-05-28 11:15:22 +00:00
Ding, Yi
ad7fd89c1d Merge remote-tracking branch 'origin/feiw/mxfp4_moe_2Stages' into wip-f4 2025-05-28 09:28:26 +00:00
Ding, Yi
857ef9f8c4 Merge preshuffle device 2025-05-28 07:02:28 +00:00
Ding, Yi
e2e0e0025e Profiler add f4 wp 2025-05-28 05:12:39 +00:00
Ding, Yi
85ac576109 Merge gemm_mx_common.hpp 2025-05-27 06:13:03 +00:00
Ding, Yi
123053b685 Merge remote-tracking branch 'origin/wip-f4-wp' into wip-f4 2025-05-27 03:36:38 +00:00
Bartłomiej Kocot
037764bbc6 Fix grid size calc for bwd wei (#2226) 2025-05-26 16:51:09 +02:00
Ding, Yi
91eb136937 Fix v1; use M padding 2025-05-26 10:32:26 +00:00
Andriy Roshchenko
f03da29b65 Merge branch origin/wip-f4 into andriy/wip-f4 2025-05-23 22:14:30 +00:00
aska-0096
574d65efed temp save 2025-05-23 14:51:24 +00:00
aska-0096
7f7c4d35c7 lds conflict free + buffer load lds 2025-05-22 08:04:52 +00:00
Andriy Roshchenko
e302ab8f0c Merge branch origin/develop into wip-fp4 2025-05-22 06:31:47 +00:00
Ding, Yi
352542c49e Better kernel selection in device classes 2025-05-22 06:05:10 +00:00
OscarXu
fc9ef98e7b Add gemm2 64x128x128 asm. Fix BF16 ref. 2025-05-21 16:57:57 +08:00
mtgu0705
513f92f5b9 update buffer load to lds feature, build passed 2025-05-21 02:40:20 -05:00
aska-0096
3a05fa135a lds conflict free + buffer load lds 2025-05-20 23:58:37 -05:00
aska-0096
e1084fe7d6 tempsave. compile pass, function wrong 2025-05-20 10:57:26 +00:00
mtgu0705
589e1dfea9 init mx fp4 B no preshuffle version 2025-05-20 04:40:22 -05:00
OscarXu
9fdfff82ea Fix v2 topk_weight cal. Add silu asm. 2025-05-20 13:42:06 +08:00
OscarXu
a4ed178b4c Add gemm2 v3 64x128x128 2025-05-19 19:33:04 +08:00
aska-0096
f3a296bad4 lds conflict free + buffer load lds 2025-05-19 09:40:39 +00:00
aska-0096
e2c8f98fef generalize the pipeline scheduling. 2025-05-19 02:29:02 +00:00
aska-0096
3e8b07ef58 tempsave; modify the way we represent fp4 2025-05-19 02:28:23 +00:00
mtgu0705
a4b5a374b9 Merge remote-tracking branch 'origin/wip-f4-pk' into mx_moe_f4_scale_shuffle 2025-05-17 09:49:24 -05:00
OscarXu
9c9bf1aece Moe blockscale gemm1&gemm2 asm support for aiter. Suppression cmake flag til new compiler. 2025-05-17 04:33:52 -05:00
arai713
5b3430b868 Narrowing error fix for codegen compilation (#2194)
* removed comment with special characters

* fix for arg/template change after merge from develop

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-05-16 11:11:54 -07:00
aska-0096
248e287866 generalize the pipeline scheduling. 2025-05-16 10:41:59 +00:00
aska-0096
a0379d81e7 modify the way we represent fp4 2025-05-16 09:44:04 +00:00
Mateusz Ozga
fa3c6811d8 Disable conv for Filter1x1Stride1Pad0 when K or C is even (#2186) 2025-05-16 10:18:47 +02:00
aska-0096
a1bec7670a tempsave 2025-05-16 08:14:56 +00:00
Ding, Yi
c04d44b5f6 Merge remote-tracking branch 'origin/develop' into wip-f4 2025-05-16 07:11:26 +00:00
OscarXu
7c76e19e55 Merge remote-tracking branch 'origin/fp4_gu_moe' into fp4_gu_moe_gemm1 2025-05-15 16:37:08 +08:00
mtgu0705
b65bc1ba4a added mx moe block v3 support, function passed 2025-05-15 03:26:37 -05:00
OscarXu
17922821ec Add gemm1 v1 2025-05-15 16:11:43 +08:00
Your Name
e060999d29 v3 2025-05-15 13:19:07 +08:00
Bartłomiej Kocot
c53b7bd22e Switch to v2 pipeline for grouped conv bwd data (#2181)
* Change to old pipeline for grouped conv bwd data

* fix

* fix

* fix

* fix

* fix

* fix

* Fix
2025-05-13 10:14:30 +02:00
mtgu0705
5ba86c210b updated and build passed 2025-05-13 14:49:37 +08:00
OscarXu
19aa39b978 Merge remote-tracking branch 'origin/wjx/moe_v3_aiter' into moe_bs_stage1_dev 2025-05-13 10:46:40 +08:00
aska-0096
79246e6cb8 function pass with inline asm hacky 2025-05-12 16:54:44 +00:00