mtgu0705
|
ea65ce0c64
|
fix schedule
|
2025-06-27 03:34:26 -05:00 |
|
mtgu0705
|
2bd6422af3
|
rename nbs
|
2025-06-18 10:19:50 -05:00 |
|
mtgu0705
|
7516bcd792
|
mxfp4 moe gemm1 function passed
|
2025-06-18 09:44:43 -05:00 |
|
mtgu0705
|
0743f88625
|
added mxfp4 moe gemm1 support, build pass, function not correct
|
2025-06-18 04:47:32 -05:00 |
|
mtgu0705
|
c1465916f7
|
enable moe gemm2 pipeline schedule; perf: 1892.69 Tflops
|
2025-06-18 00:40:56 -05:00 |
|
mtgu0705
|
daa013bf62
|
moe gemm2 bpreshuffle function passed
|
2025-06-18 00:23:44 -05:00 |
|
mtgu0705
|
54c930d3b9
|
init the fp4 moe bpreshuffle, build pass, test failed
|
2025-06-17 05:34:10 -05:00 |
|
mtgu0705
|
ed5a1ab186
|
fixed gemm1 LDS allocation issue
|
2025-06-12 04:08:18 -05:00 |
|
mtgu0705
|
b16b13dc22
|
added mxfp4 moe gemm1 support for blockscale buf2lds
|
2025-06-05 21:19:31 -05:00 |
|
mtgu0705
|
40ed20a30d
|
updated the code
|
2025-06-04 02:04:28 -05:00 |
|
mtgu0705
|
5117e99822
|
Merge remote-tracking branch 'origin/mx_moe_f4_scaleshuffle_Bnoshuffle' into mxfp4_moe_blockscale_buf2lds
|
2025-06-04 00:14:18 -05:00 |
|
mtgu0705
|
620a4d2a0e
|
added gather offset xor, function passed.
|
2025-06-03 10:13:03 -05:00 |
|
aska-0096
|
86c8bef5d7
|
Refactor thread_copy_lds_direct_load; fix gfx942 direct lds load example; fix f16_pki4 example
|
2025-06-03 14:54:30 +00:00 |
|
Ding, Yi
|
407489d2c0
|
Fix ThreadwiseTensorSliceTransfer_v4::Run (Fuse scale)
|
2025-06-03 09:29:50 +00:00 |
|
aska-0096
|
8ecc3812de
|
doc the kGroup definition
|
2025-06-03 08:53:04 +00:00 |
|
aska-0096
|
3491918bfb
|
fix moe pki4 on gfx950
|
2025-06-03 07:40:47 +00:00 |
|
Ding, Yi
|
1c2da4b2bf
|
Fix warning
|
2025-06-03 06:48:49 +00:00 |
|
Ding, Yi
|
0cbc5e2bdb
|
Use packed_size_v for A/BPackedSize
|
2025-06-03 06:10:07 +00:00 |
|
Ding, Yi
|
331ccb8ca2
|
Merge remote-tracking branch 'origin/develop' into gfx950-mxfp4
|
2025-06-03 05:38:56 +00:00 |
|
mtgu0705
|
edffec8fc4
|
change the code so that threadwise loads are contiguous
|
2025-06-03 00:36:07 -05:00 |
|
Ding, Yi
|
f9bf27548e
|
Generate random tensor values with multiple threads
|
2025-06-03 02:40:24 +00:00 |
|
Khushbu Agarwal
|
2e38eb4f1c
|
Rotating buffer PR CI fix (#2257)
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)"
This reverts commit bbdaf79a52.
* fix regression
|
2025-06-02 10:25:01 -07:00 |
|
aska-0096
|
dd24786f78
|
Enable splitk for mxfp4; clang format;
|
2025-06-02 12:23:01 +00:00 |
|
valarLip
|
0fdbf6bcd1
|
extend buffer load for fp16/bf16x16 (#2270)
* extend buffer load for fp16/bf16x16
* format
|
2025-06-02 10:29:54 +08:00 |
|
Mirza Halilčević
|
fbce6c7bb6
|
Define CHAR_BIT during hipRTC (#2264)
* Fix failing codegen tests.
* fix clang format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
|
2025-05-30 08:23:44 -07:00 |
|
aska-0096
|
bb5bdff61c
|
remove unnecessary files
|
2025-05-30 08:39:25 +00:00 |
|
Ding, Yi
|
0cd2e6e782
|
Fix OOB; add MB96 instances
|
2025-05-30 07:46:28 +00:00 |
|
mtgu0705
|
aeb717a132
|
add pipeline v1 for MOE Gemm2
|
2025-05-30 05:25:43 +00:00 |
|
OscarXu
|
798345a1cf
|
Fix moe blockscale gemm1 barrier 0x800 for new compiler
|
2025-05-30 04:13:42 +00:00 |
|
Ding, Yi
|
69418725a6
|
Merge remote-tracking branch 'origin/moe_bs_fp8_no_asm' into gfx950-mxfp4
|
2025-05-30 03:15:47 +00:00 |
|
Illia Silin
|
4e561af18c
|
Revert "add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue (#2185)" (#2260)
This reverts commit fd6a859b44.
|
2025-05-29 16:22:16 -07:00 |
|
joyeamd
|
fd6a859b44
|
add CShuffleM/NXdlPerWavePerShuffle in cshuffle_epilogue (#2185)
* add cshuffle's mxdlperwavepershuffle support, not finished
* add epilogue functions
* add cshuffle's mxdlperwavepershuffle support, not finished
* add epilogue functions
* update cshuffle logic
* update cshuffle_logics
* add some change within review
* update some codes following the code review
* update epilogue logic
* remove from problem
* update codes following review.
* fix some issues
|
2025-05-29 14:31:14 +02:00 |
|
aska-0096
|
3c24d690a1
|
remove single rate mfma restriction for f8 blockscale gemm
|
2025-05-29 10:13:50 +00:00 |
|
aska-0096
|
33085c8458
|
clang format, remove single rate mfma restriction for f8
|
2025-05-29 10:10:12 +00:00 |
|
aska-0096
|
d563dac424
|
fix performance bug of bpreshuffle f8 gemm
|
2025-05-29 10:02:46 +00:00 |
|
OscarXu
|
6be76c53b6
|
No asm ver. for merging moe blockscale fp8 into mainline
|
2025-05-29 03:38:56 -05:00 |
|
aska-0096
|
0db8d71dc1
|
Remove debug infos; Enable flags for blockscale f8
|
2025-05-29 08:21:54 +00:00 |
|
OscarXu
|
52d68c9529
|
flag and barrier fix for compiler branch MainOpSelV3
|
2025-05-29 03:13:11 -05:00 |
|
OscarXu
|
653bc83f8a
|
Remove rocm6.3 workaround flags and macro
|
2025-05-28 21:05:21 -05:00 |
|
Bartłomiej Kocot
|
e7906dd644
|
Change relu to clamp for grouped conv fwd instances (#2249)
|
2025-05-29 00:51:25 +02:00 |
|
Illia Silin
|
bbdaf79a52
|
Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)
This reverts commit 99857e10e6.
|
2025-05-28 09:46:52 -06:00 |
|
Ding, Yi
|
35b436c0d9
|
Clang-format after 2 merges
|
2025-05-28 11:16:00 +00:00 |
|
Ding, Yi
|
aecac410d0
|
Merge remote-tracking branch 'origin/f8blk_scale_opt' into wip-f4-mergemoe-2
|
2025-05-28 11:15:22 +00:00 |
|
OscarXu
|
772debdf8f
|
Fix do_weight in gemm1. Fix cshuffle_datatype. Clang-format
|
2025-05-28 18:29:06 +08:00 |
|
Ding, Yi
|
ad7fd89c1d
|
Merge remote-tracking branch 'origin/feiw/mxfp4_moe_2Stages' into wip-f4
|
2025-05-28 09:28:26 +00:00 |
|
Ding, Yi
|
857ef9f8c4
|
Merge preshuffle device
|
2025-05-28 07:02:28 +00:00 |
|
Khushbu Agarwal
|
99857e10e6
|
[CK_tile] Add rotating buffer feature for universal gemm (#2200)
* Add rotating buffer feature for universal gemm
* adding changes in tile_engine
* Updated code to merge kernel_launch
* removing comments
* Enable rotating buffer changes to flatmm
* Created diff launch_kernel function for rotating buffer
* Simplified calculation using macros
* merge code with new changes in tile_engine
* clang formatted
* Redefine macros
|
2025-05-27 23:00:58 -07:00 |
|
Ding, Yi
|
e2e0e0025e
|
Profiler add f4 wp
|
2025-05-28 05:12:39 +00:00 |
|
aska-0096
|
78d0fd4e65
|
add vmcnt guard for async copy
|
2025-05-28 03:47:46 +00:00 |
|
aska-0096
|
65255e12fb
|
Unconditional Ascale padding
|
2025-05-28 01:55:23 +00:00 |
|