aska-0096
|
d563dac424
|
fix performance bug of bpreshuffle f8 gemm
|
2025-05-29 10:02:46 +00:00 |
|
aska-0096
|
c3d52993c4
|
update the flag name for f8blockscale
|
2025-05-29 08:47:34 +00:00 |
|
aska-0096
|
ced28892b6
|
Merge branch 'gfx950-mxfp4' of https://github.com/ROCm/composable_kernel into gfx950-mxfp4
|
2025-05-29 08:21:58 +00:00 |
|
aska-0096
|
0db8d71dc1
|
Remove debug infos; Enable flags for blockscale f8
|
2025-05-29 08:21:54 +00:00 |
|
Ding, Yi
|
e4a40c7214
|
Add fp8 profiler instances
|
2025-05-29 08:19:31 +00:00 |
|
Ding, Yi
|
f9ccd1a378
|
Fix bf8 config
|
2025-05-29 02:20:47 +00:00 |
|
Ding, Yi
|
2b4b189a5f
|
Fix fp8 config
|
2025-05-29 02:18:02 +00:00 |
|
Ding, Yi
|
35b436c0d9
|
Clang-format after 2 merges
|
2025-05-28 11:16:00 +00:00 |
|
Ding, Yi
|
aecac410d0
|
Merge remote-tracking branch 'origin/f8blk_scale_opt' into wip-f4-mergemoe-2
|
2025-05-28 11:15:22 +00:00 |
|
Ding, Yi
|
ad7fd89c1d
|
Merge remote-tracking branch 'origin/feiw/mxfp4_moe_2Stages' into wip-f4
|
2025-05-28 09:28:26 +00:00 |
|
Ding, Yi
|
4f9bfb1566
|
Add more fp4 wp instances
|
2025-05-28 07:33:40 +00:00 |
|
Ding, Yi
|
857ef9f8c4
|
Merge preshuffle device
|
2025-05-28 07:02:28 +00:00 |
|
Ding, Yi
|
e2e0e0025e
|
Profiler add f4 wp
|
2025-05-28 05:12:39 +00:00 |
|
aska-0096
|
78d0fd4e65
|
add vmcnt guard for async copy
|
2025-05-28 03:47:46 +00:00 |
|
Ding, Yi
|
b99c50a5d5
|
pad ascale
|
2025-05-28 03:35:33 +00:00 |
|
Ding, Yi
|
cf5b4c11a2
|
Pad shuffled a scale only
|
2025-05-28 02:37:14 +00:00 |
|
aska-0096
|
65255e12fb
|
Unconditional Ascale padding
|
2025-05-28 01:55:23 +00:00 |
|
mtgu0705
|
2f0ee8ccb1
|
change the gemm1 tile from 64x128x128 to 128x64x128
|
2025-05-27 20:43:38 -05:00 |
|
mtgu0705
|
52b764d59f
|
update MX moe GEMM1 hotloopscheduling
|
2025-05-27 20:43:22 -05:00 |
|
aska-0096
|
63c9388881
|
Pad the M for scale buffer unconditionally
|
2025-05-27 11:52:12 +00:00 |
|
aska-0096
|
9da2995163
|
Merge branch 'wip-f4' of https://github.com/ROCm/composable_kernel into wip-f4
|
2025-05-27 10:23:21 +00:00 |
|
aska-0096
|
04f7265c19
|
refactor the pipeline
|
2025-05-27 10:14:45 +00:00 |
|
Ding, Yi
|
d3015785cb
|
Fix 'Merge gemm_mx_common.hpp'
|
2025-05-27 09:08:02 +00:00 |
|
aska-0096
|
71e7346bf4
|
Merge branch 'wip-f4' of https://github.com/ROCm/composable_kernel into wip-f4
|
2025-05-27 07:32:16 +00:00 |
|
aska-0096
|
137e28d151
|
temp save, 4.4~4.5
|
2025-05-27 07:31:16 +00:00 |
|
Ding, Yi
|
85ac576109
|
Merge gemm_mx_common.hpp
|
2025-05-27 06:13:03 +00:00 |
|
Ding, Yi
|
123053b685
|
Merge remote-tracking branch 'origin/wip-f4-wp' into wip-f4
|
2025-05-27 03:36:38 +00:00 |
|
aska-0096
|
61748eddba
|
Add NT flag to B/BScale buffer
|
2025-05-27 02:26:43 +00:00 |
|
Ding, Yi
|
91eb136937
|
Fix v1; use M padding
|
2025-05-26 10:32:26 +00:00 |
|
aska-0096
|
d1d56e89ef
|
fix the correctness issue
|
2025-05-26 09:29:36 +00:00 |
|
Ding, Yi
|
40af523e2c
|
Add rotating to mx examples
|
2025-05-26 05:05:54 +00:00 |
|
aska-0096
|
4a3205f94a
|
Merge branch 'wip-f4-wp' of https://github.com/ROCm/composable_kernel into wip-f4-wp
|
2025-05-26 02:22:09 +00:00 |
|
Lin, Qun
|
d5e7580473
|
correct a typo in tail
|
2025-05-25 19:22:47 -05:00 |
|
mtgu0705
|
a36a747e29
|
rename the block pipeline
|
2025-05-24 00:03:43 -05:00 |
|
Andriy Roshchenko
|
fdfc9c6fd8
|
Merge remote-tracking branch 'origin/develop' into andriy/wip-f4
|
2025-05-23 23:02:43 +00:00 |
|
Andriy Roshchenko
|
f03da29b65
|
Merge branch origin/wip-f4 into andriy/wip-f4
|
2025-05-23 22:14:30 +00:00 |
|
Andriy Roshchenko
|
1c91f6bf1e
|
Fix example_gemm_mx build
|
2025-05-23 22:00:07 +00:00 |
|
Illia Silin
|
8146e471f1
|
fix the buffer intrinsic names for clang >=20 (#2228)
|
2025-05-23 14:58:25 -07:00 |
|
aska-0096
|
574d65efed
|
temp save
|
2025-05-23 14:51:24 +00:00 |
|
feifei14119
|
2e39bf06f7
|
fix typo
|
2025-05-23 11:23:01 +00:00 |
|
mtgu0705
|
2216ff0521
|
update mx moe gemm1 gemm2 TF and BW calculation
|
2025-05-23 05:29:39 -05:00 |
|
mtgu0705
|
d6bfdc9d7d
|
update mx moe gemm1_bns tile size to 64x128x256
|
2025-05-23 05:10:45 -05:00 |
|
feifei14119
|
ce4e7b39da
|
gemm1 func pass
|
2025-05-23 09:26:38 +00:00 |
|
joye
|
8afac88f89
|
fix f4 pipeline issues
|
2025-05-23 17:13:10 +08:00 |
|
Illia Silin
|
1b846143c6
|
Revert "Update the buffer load/store intrinsic names for clang>=20. (#2192)" (#2227)
This reverts commit 58f9e9ffbc.
|
2025-05-22 15:41:17 -07:00 |
|
Andriy Roshchenko
|
715ad01bf2
|
Fix MX MFMA tests
|
2025-05-22 21:51:36 +00:00 |
|
Illia Silin
|
bc2551ac3b
|
disable building device_mha_operations by default (#2225)
|
2025-05-22 14:03:04 -07:00 |
|
Adam Dickin
|
417a6b65b6
|
Add MIOPEN_REQ_LIBS_ONLY option for cmake to build only the libs MIOpen requires (#2224)
* cut out anything we don't need for MIOpen to test
* refactor exclusion code to be more streamlined.
|
2025-05-22 11:14:33 -07:00 |
|
Ding, Yi
|
ce50d4bd62
|
Fix fp4 ckProfiler
|
2025-05-22 09:39:22 +00:00 |
|
aska-0096
|
a4dae9eb86
|
optimize offset math in dma
|
2025-05-22 08:15:31 +00:00 |
|