mtgu0705
40ed20a30d
updated the code
2025-06-04 02:04:28 -05:00
mtgu0705
5117e99822
Merge remote-tracking branch 'origin/mx_moe_f4_scaleshuffle_Bnoshuffle' into mxfp4_moe_blockscale_buf2lds
2025-06-04 00:14:18 -05:00
aska-0096
86c8bef5d7
Refactor thread_copy_lds_direct_load; fix gfx942 direct lds load example; fix f16_pki4 example
2025-06-03 14:54:30 +00:00
Ding, Yi
0cbc5e2bdb
Use packed_size_v for A/BPackedSize
2025-06-03 06:10:07 +00:00
Ding, Yi
331ccb8ca2
Merge remote-tracking branch 'origin/develop' into gfx950-mxfp4
2025-06-03 05:38:56 +00:00
mtgu0705
edffec8fc4
change the code to suit contiguous threadwise loading
2025-06-03 00:36:07 -05:00
aska-0096
dd24786f78
Enable splitk for mxfp4; clang format;
2025-06-02 12:23:01 +00:00
aska-0096
bb5bdff61c
remove unnecessary files
2025-05-30 08:39:25 +00:00
mtgu0705
aeb717a132
add pipeline v1 for MOE Gemm2
2025-05-30 05:25:43 +00:00
Ding, Yi
69418725a6
Merge remote-tracking branch 'origin/moe_bs_fp8_no_asm' into gfx950-mxfp4
2025-05-30 03:15:47 +00:00
OscarXu
6be76c53b6
No-asm version for merging moe blockscale fp8 into mainline
2025-05-29 03:38:56 -05:00
Ding, Yi
aecac410d0
Merge remote-tracking branch 'origin/f8blk_scale_opt' into wip-f4-mergemoe-2
2025-05-28 11:15:22 +00:00
Ding, Yi
ad7fd89c1d
Merge remote-tracking branch 'origin/feiw/mxfp4_moe_2Stages' into wip-f4
2025-05-28 09:28:26 +00:00
Ding, Yi
857ef9f8c4
Merge preshuffle device
2025-05-28 07:02:28 +00:00
Ding, Yi
e2e0e0025e
Profiler add f4 wp
2025-05-28 05:12:39 +00:00
Ding, Yi
85ac576109
Merge gemm_mx_common.hpp
2025-05-27 06:13:03 +00:00
Ding, Yi
123053b685
Merge remote-tracking branch 'origin/wip-f4-wp' into wip-f4
2025-05-27 03:36:38 +00:00
Bartłomiej Kocot
037764bbc6
Fix grid size calc for bwd wei (#2226)
2025-05-26 16:51:09 +02:00
Ding, Yi
91eb136937
Fix v1; use M padding
2025-05-26 10:32:26 +00:00
Andriy Roshchenko
f03da29b65
Merge branch origin/wip-f4 into andriy/wip-f4
2025-05-23 22:14:30 +00:00
aska-0096
574d65efed
temp save
2025-05-23 14:51:24 +00:00
aska-0096
7f7c4d35c7
lds conflict free + buffer load lds
2025-05-22 08:04:52 +00:00
Andriy Roshchenko
e302ab8f0c
Merge branch origin/develop into wip-fp4
2025-05-22 06:31:47 +00:00
Ding, Yi
352542c49e
Better kernel selection in device classes
2025-05-22 06:05:10 +00:00
OscarXu
fc9ef98e7b
Add gemm2 64x128x128 asm. Fix BF16 ref.
2025-05-21 16:57:57 +08:00
mtgu0705
513f92f5b9
update buffer load to lds feature, build passed
2025-05-21 02:40:20 -05:00
aska-0096
3a05fa135a
lds conflict free + buffer load lds
2025-05-20 23:58:37 -05:00
aska-0096
e1084fe7d6
tempsave. compile pass, function wrong
2025-05-20 10:57:26 +00:00
mtgu0705
589e1dfea9
init mx fp4 B no preshuffle version
2025-05-20 04:40:22 -05:00
OscarXu
9fdfff82ea
Fix v2 topk_weight cal. Add silu asm.
2025-05-20 13:42:06 +08:00
OscarXu
a4ed178b4c
Add gemm2 v3 64x128x128
2025-05-19 19:33:04 +08:00
aska-0096
f3a296bad4
lds conflict free + buffer load lds
2025-05-19 09:40:39 +00:00
aska-0096
e2c8f98fef
generalize the pipeline scheduling.
2025-05-19 02:29:02 +00:00
aska-0096
3e8b07ef58
tempsave; modify the way we represent fp4
2025-05-19 02:28:23 +00:00
mtgu0705
a4b5a374b9
Merge remote-tracking branch 'origin/wip-f4-pk' into mx_moe_f4_scale_shuffle
2025-05-17 09:49:24 -05:00
OscarXu
9c9bf1aece
Moe blockscale gemm1 & gemm2 asm support for aiter. Suppress cmake flag until new compiler.
2025-05-17 04:33:52 -05:00
arai713
5b3430b868
Narrowing error fix for codegen compilation (#2194)
* removed comment with special characters
* fix for arg/template change after merge from develop
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-05-16 11:11:54 -07:00
aska-0096
248e287866
generalize the pipeline scheduling.
2025-05-16 10:41:59 +00:00
aska-0096
a0379d81e7
modify the way we represent fp4
2025-05-16 09:44:04 +00:00
Mateusz Ozga
fa3c6811d8
Disable conv for Filter1x1Stride1Pad0 when K or C is even (#2186)
2025-05-16 10:18:47 +02:00
aska-0096
a1bec7670a
tempsave
2025-05-16 08:14:56 +00:00
Ding, Yi
c04d44b5f6
Merge remote-tracking branch 'origin/develop' into wip-f4
2025-05-16 07:11:26 +00:00
OscarXu
7c76e19e55
Merge remote-tracking branch 'origin/fp4_gu_moe' into fp4_gu_moe_gemm1
2025-05-15 16:37:08 +08:00
mtgu0705
b65bc1ba4a
added mx moe block v3 support, function passed
2025-05-15 03:26:37 -05:00
OscarXu
17922821ec
Add gemm1 v1
2025-05-15 16:11:43 +08:00
Your Name
e060999d29
v3
2025-05-15 13:19:07 +08:00
Bartłomiej Kocot
c53b7bd22e
Switch to v2 pipeline for grouped conv bwd data (#2181)
* Change to old pipeline for grouped conv bwd data
* fix
* fix
* fix
* fix
* fix
* fix
* Fix
2025-05-13 10:14:30 +02:00
mtgu0705
5ba86c210b
updated and build passed
2025-05-13 14:49:37 +08:00
OscarXu
19aa39b978
Merge remote-tracking branch 'origin/wjx/moe_v3_aiter' into moe_bs_stage1_dev
2025-05-13 10:46:40 +08:00
aska-0096
79246e6cb8
function pass with inline asm hacky
2025-05-12 16:54:44 +00:00