Commit Graph

1044 Commits

Author SHA1 Message Date
mtgu0705
ea65ce0c64 fix schedule 2025-06-27 03:34:26 -05:00
mtgu0705
2bd6422af3 rename nbs 2025-06-18 10:19:50 -05:00
mtgu0705
7516bcd792 mxfp4 moe gemm1 function passed 2025-06-18 09:44:43 -05:00
mtgu0705
0743f88625 added mxfp4 mpe gemm1 support, build pass, function not correct 2025-06-18 04:47:32 -05:00
mtgu0705
c1465916f7 enable moe gemm2 pipeline schedul perf: 1892.69 Tflops 2025-06-18 00:40:56 -05:00
mtgu0705
daa013bf62 moe gemm2 bpreshuffle function passed 2025-06-18 00:23:44 -05:00
mtgu0705
54c930d3b9 init the fp4 moe bpreshuffe, build pass, test failed 2025-06-17 05:34:10 -05:00
mtgu0705
ed5a1ab186 fixed gemm1 LDS allocation issue 2025-06-12 04:08:18 -05:00
mtgu0705
b16b13dc22 added mxfp4 moe gemm1 support for blockscale buf2lds 2025-06-05 21:19:31 -05:00
mtgu0705
40ed20a30d updated the codes 2025-06-04 02:04:28 -05:00
mtgu0705
5117e99822 Merge remote-tracking branch 'origin/mx_moe_f4_scaleshuffle_Bnoshuffle' into mxfp4_moe_blockscale_buf2lds 2025-06-04 00:14:18 -05:00
mtgu0705
620a4d2a0e added gather offset xor, function passed. 2025-06-03 10:13:03 -05:00
aska-0096
86c8bef5d7 Refactor thread_copy_lds_direct_load; fix gfx942 direct lds load example; fix f16_pki4 example 2025-06-03 14:54:30 +00:00
Ding, Yi
407489d2c0 Fix ThreadwiseTensorSliceTransfer_v4::Run (Fuse scale) 2025-06-03 09:29:50 +00:00
aska-0096
8ecc3812de doc the kGroup definition 2025-06-03 08:53:04 +00:00
aska-0096
3491918bfb fix moe pki4 on gfx950 2025-06-03 07:40:47 +00:00
Ding, Yi
1c2da4b2bf Fix warning 2025-06-03 06:48:49 +00:00
Ding, Yi
0cbc5e2bdb Use packed_size_v for A/BPackedSize 2025-06-03 06:10:07 +00:00
Ding, Yi
331ccb8ca2 Merge remote-tracking branch 'origin/develop' into gfx950-mxfp4 2025-06-03 05:38:56 +00:00
mtgu0705
edffec8fc4 change the code to suit threadwise loading are continued 2025-06-03 00:36:07 -05:00
Ding, Yi
f9bf27548e Generate random tensor values with multiple threads 2025-06-03 02:40:24 +00:00
Khushbu Agarwal
2e38eb4f1c Rotating buffer PR CI fix (#2257)
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)"

This reverts commit bbdaf79a52.

* fix regression
2025-06-02 10:25:01 -07:00
aska-0096
dd24786f78 Enable splitk for mxfp4; clang format; 2025-06-02 12:23:01 +00:00
valarLip
0fdbf6bcd1 extend buffer load for fp16/bf16x16 (#2270)
* extend buffer load for fp16/bf16x16

* format
2025-06-02 10:29:54 +08:00
Mirza Halilčević
fbce6c7bb6 Define CHAR_BIT during hipRTC (#2264)
* Fix failing codegen tests.

* fix clang format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-05-30 08:23:44 -07:00
aska-0096
bb5bdff61c remove unnecessary files 2025-05-30 08:39:25 +00:00
Ding, Yi
0cd2e6e782 Fix OOB; add MB96 instances 2025-05-30 07:46:28 +00:00
mtgu0705
aeb717a132 add pipeline v1 for MOE Gemm2 2025-05-30 05:25:43 +00:00
OscarXu
798345a1cf Fix moe blockscale gemm1 barrier 0x800 for new compiler 2025-05-30 04:13:42 +00:00
Ding, Yi
69418725a6 Merge remote-tracking branch 'origin/moe_bs_fp8_no_asm' into gfx950-mxfp4 2025-05-30 03:15:47 +00:00
Illia Silin
4e561af18c Revert "add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue (#2185)" (#2260)
This reverts commit fd6a859b44.
2025-05-29 16:22:16 -07:00
joyeamd
fd6a859b44 add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue (#2185)
* add cshuffle's mxdlperwavepershuffle support, not finished

* add epilogue functions

* add cshuffle's mxdlperwavepershuffle support, not finished

* add epilogue functions

* update cshuffle logic

* update cshuffle_logics

* add some change within review

* update some codes following the code review

* update epilogue logic

* remove from problem

* update codes following review.

* fix some issues
2025-05-29 14:31:14 +02:00
aska-0096
3c24d690a1 remove single rate mfma restriction for f8 blockscale gemm 2025-05-29 10:13:50 +00:00
aska-0096
33085c8458 clang format, remove single rate mfma restriction for f8 2025-05-29 10:10:12 +00:00
aska-0096
d563dac424 fix performance bug of bpreshuffle f8 gemm 2025-05-29 10:02:46 +00:00
OscarXu
6be76c53b6 No asm ver. for merging moe blocksale fp8 into mainline 2025-05-29 03:38:56 -05:00
aska-0096
0db8d71dc1 Remove debug infos; Enable flags for blockscale f8 2025-05-29 08:21:54 +00:00
OscarXu
52d68c9529 flag and barrier fix for copmiler branch MainOpSelV3 2025-05-29 03:13:11 -05:00
OscarXu
653bc83f8a Remove rocm6.3 workaround flags and macro 2025-05-28 21:05:21 -05:00
Bartłomiej Kocot
e7906dd644 Change relu to clamp for grouped conv fwd instances (#2249) 2025-05-29 00:51:25 +02:00
Illia Silin
bbdaf79a52 Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)
This reverts commit 99857e10e6.
2025-05-28 09:46:52 -06:00
Ding, Yi
35b436c0d9 Clang-format after 2 merges 2025-05-28 11:16:00 +00:00
Ding, Yi
aecac410d0 Merge remote-tracking branch 'origin/f8blk_scale_opt' into wip-f4-mergemoe-2 2025-05-28 11:15:22 +00:00
OscarXu
772debdf8f Fix do_weight in gemm1. Fix cshuffle_datatype. Clang-format 2025-05-28 18:29:06 +08:00
Ding, Yi
ad7fd89c1d Merge remote-tracking branch 'origin/feiw/mxfp4_moe_2Stages' into wip-f4 2025-05-28 09:28:26 +00:00
Ding, Yi
857ef9f8c4 Merge preshuffle device 2025-05-28 07:02:28 +00:00
Khushbu Agarwal
99857e10e6 [CK_tile] Add rotating buffer feature for universal gemm (#2200)
* Add rotating buffer feature for universal gemm

* adding changes in tile_engine

* Updated code to merge kernel_launch

* removing comments

* Enable rotating buffer changes to flatmm

* Created diff launch_kernel function for rotating buffer

* Simplfied calculation using macros

* merge code with new changes in tile_engine

* clang formatted

* Redefine macros
2025-05-27 23:00:58 -07:00
Ding, Yi
e2e0e0025e Profiler add f4 wp 2025-05-28 05:12:39 +00:00
aska-0096
78d0fd4e65 add vmcnt guard for async copy 2025-05-28 03:47:46 +00:00
aska-0096
65255e12fb Unconditional Ascale padding 2025-05-28 01:55:23 +00:00