Commit Graph

680 Commits

Author SHA1 Message Date
coderfeli
418baed327 moe gemm1 scaleready 2025-02-12 05:19:01 +00:00
coderfeli
b02c0b8257 gemm1 scale debug 2025-02-11 14:52:01 +00:00
coderfeli
e4ca61f9e7 moe gemm2 scales ok 2025-02-11 12:01:39 +00:00
coderfeli
66d08ea327 impl topk weight scatter 2025-02-11 07:43:59 +00:00
coderfeli
a8a82e0cfc fix warnings and impl scale for gemm2, build ok 2025-02-11 01:54:08 +00:00
coderfeli
69f54ee822 impl 3ds epilog ok 2025-02-10 14:50:56 +00:00
coderfeli
72752420e9 merge gemm1 gemm2 together and run ok 2025-02-10 09:06:22 +00:00
coderfeli
66cff9103f merge gemm1 and gemm2 2025-02-10 07:52:32 +00:00
coderfeli
aa15c49a67 add moegemm in device and grid 2025-02-10 07:51:55 +00:00
coderfeli
2e53f9725b skip empty expert 2025-02-10 01:26:08 +00:00
coderfeli
fcf6106b4b add skip expert 2025-02-10 01:12:50 +00:00
coderfeli
e21f36fc24 moegemm2 ok 2025-02-09 13:44:42 +00:00
coderfeli
1230145590 gemm2 result ok 2025-02-09 09:02:32 +00:00
coderfeli
7ba5bff4c2 one tile ok 2025-02-08 12:31:25 +00:00
coderfeli
8a5bb9f34b add files , build and run ok 2025-02-08 09:52:10 +00:00
coderfeli
bd64a30b0b add empty expert jump 2025-02-08 06:58:13 +00:00
coderfeli
e15351ca46 tile m = 64 ok 2025-02-07 12:01:15 +00:00
coderfeli
48d87d9c66 a 16x16 ok 2025-02-07 11:41:09 +00:00
coderfeli
24734db8b5 add ret logit for empty expert 2025-02-07 10:06:21 +00:00
coderfeli
965c9f0c17 debug 16x16 load 2025-02-07 08:17:33 +00:00
coderfeli
83970cbe6c fix hack in oob 2025-02-07 02:19:36 +00:00
coderfeli
f9abcf80e8 use offsets in transfer ok 2025-02-07 01:54:12 +00:00
coderfeli
f8d15f2af4 add others 2025-02-04 03:05:58 +00:00
coderfeli
00627feda4 results ok 2025-02-04 03:05:17 +00:00
coderfeli
6b51413b6e compile ok 2025-01-24 08:36:49 +00:00
aska-0096
35ba08646f fp8 add_rmsnorm_dynamic_dequant 2025-01-10 11:12:16 +00:00
aska-0096
487a05d612 refine blockgemm pipeline version as base struct. 2025-01-08 14:27:42 +00:00
aska-0096
22fe522d0c optimize software pipeline 2025-01-08 09:28:32 +00:00
aska-0096
0dbe537032 refine weight preshuffle format. 2025-01-02 13:59:58 +00:00
aska-0096
72c1ddacb9 Merge branch 'add_a8w8_preshuffle_ckprofiler' of https://github.com/ROCm/composable_kernel into update_cka8w8_uc 2024-12-31 07:23:50 +00:00
aska-0096
6f24c2d814 disable N, K Padding, splitk enabled 2024-12-31 06:31:06 +00:00
aska-0096
f60f9d5917 sanity pass, most tile size enabled. TODO: NWave!=4 2024-12-30 18:22:08 +00:00
aska-0096
482ca684ba Merge branch 'dev/a8w8_b_preshuffle' of https://github.com/ROCm/composable_kernel into add_a8w8_preshuffle_ckprofiler 2024-12-30 09:21:35 +00:00
aska-0096
74ef5021b6 tempsave 2024-12-30 09:20:25 +00:00
coderfeli
db84352941 fix warnings and revert cmake and fix clang format 2024-12-30 08:24:50 +00:00
coderfeli
5765ba51ce auto calculate hard code params 2024-12-30 07:59:47 +00:00
coderfeli
3f9dbcac63 use new pipeline for b preshuffle, run ok; revert olds to fix ckprofiler 2024-12-30 06:52:10 +00:00
coderfeli
54f44e6232 fix brepeat, kloop and lds two buffer; works ok now 2024-12-30 00:25:46 +00:00
coderfeli
2c056624af fix tail 2024-12-27 08:30:03 +00:00
coderfeli
174b46b04a add cpu shuffle 2024-12-27 07:31:14 +00:00
coderfeli
c8d9660f3b using develop branch timer 2024-12-27 06:47:36 +00:00
coderfeli
031ddf356d fix performance regression on blockgemm v3 pipe 2024-12-27 06:40:43 +00:00
coderfeli
400cac2839 Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8 2024-12-27 05:42:38 +00:00
aska-0096
7cec63a631 remove agpr usage when vgpr usage <256 2024-12-27 03:09:26 +00:00
coderfeli
e6f5a78b14 add double buffer scratch 2024-12-26 15:02:04 +00:00
coderfeli
3784329b68 can run 2024-12-26 13:01:07 +00:00
coderfeli
4a1ec81595 add bypass logic and build 2024-12-26 10:05:25 +00:00
aska-0096
1a089f6f88 sanity bug fix 2024-12-26 10:05:17 +00:00
carlushuang
3d15f364b3 [CK_TILE] optimize moe-sorting kernel (#1771) 2024-12-23 10:59:02 +08:00
* opt moe sorting
* remove commented code
Illia Silin
07339c7383 fix typo for CK_USE_OCP_FP8 (#1769) 2024-12-20 07:52:24 -08:00