Commit Graph

1726 Commits

Author | SHA1 | Message | Date
coderfeli | 84b27d7504 | merge max_token_id and fix err | 2025-02-14 08:19:54 +00:00
coderfeli | b3ae04f8ed | fix ref gemm no padding | 2025-02-14 07:48:41 +00:00
coderfeli | 83be79ba58 | add max_token_id | 2025-02-14 06:22:17 +00:00
coderfeli | 1078d22916 | add logics and debug | 2025-02-14 05:23:15 +00:00
coderfeli | d4b8f1e3b0 | add codes for a scatter | 2025-02-14 11:05:26 +08:00
coderfeli | 82e1f1b903 | change cshuffle cluster, mi300x reach roofline | 2025-02-13 10:11:36 +00:00
coderfeli | 568ad1e16d | fix mtile 64,128 for gemm1 | 2025-02-12 09:35:13 +00:00
coderfeli | 59f3e0093a | remove d2 for gemm1 | 2025-02-12 09:24:48 +00:00
coderfeli | 418baed327 | moe gemm1 scaleready | 2025-02-12 05:19:01 +00:00
coderfeli | b02c0b8257 | gemm1 scale debug | 2025-02-11 14:52:01 +00:00
coderfeli | e4ca61f9e7 | moe gemm2 scales ok | 2025-02-11 12:01:39 +00:00
coderfeli | 66d08ea327 | impl topk weight scatter | 2025-02-11 07:43:59 +00:00
coderfeli | a8a82e0cfc | fix warnings and impl scale for gemm2, build ok | 2025-02-11 01:54:08 +00:00
coderfeli | 69f54ee822 | impl 3ds epilog ok | 2025-02-10 14:50:56 +00:00
coderfeli | 72752420e9 | merge gemm1 gemm2 together and run ok | 2025-02-10 09:06:22 +00:00
coderfeli | 66cff9103f | merge gemm1 and gemm2 | 2025-02-10 07:52:32 +00:00
coderfeli | aa15c49a67 | add moegemm in device and grid | 2025-02-10 07:51:55 +00:00
coderfeli | 2e53f9725b | skip empty expert | 2025-02-10 01:26:08 +00:00
coderfeli | fcf6106b4b | add skip expert | 2025-02-10 01:12:50 +00:00
coderfeli | e21f36fc24 | moegemm2 ok | 2025-02-09 13:44:42 +00:00
coderfeli | 1230145590 | gemm2 result ok | 2025-02-09 09:02:32 +00:00
coderfeli | 7ba5bff4c2 | one tile ok | 2025-02-08 12:31:25 +00:00
coderfeli | 8a5bb9f34b | add files , build and run ok | 2025-02-08 09:52:10 +00:00
coderfeli | bd64a30b0b | add empty expert jump | 2025-02-08 06:58:13 +00:00
coderfeli | 6b71f3d8c5 | fp8 ok | 2025-02-07 14:44:50 +00:00
coderfeli | 7822382b3c | tileM 32,64,128 ok | 2025-02-07 12:31:44 +00:00
coderfeli | e15351ca46 | tile m = 64 ok | 2025-02-07 12:01:15 +00:00
coderfeli | 48d87d9c66 | a 16x16 ok | 2025-02-07 11:41:09 +00:00
coderfeli | 24734db8b5 | add ret logit for empty expert | 2025-02-07 10:06:21 +00:00
coderfeli | 965c9f0c17 | debug 16x16 load | 2025-02-07 08:17:33 +00:00
coderfeli | 83970cbe6c | fix hack in oob | 2025-02-07 02:19:36 +00:00
coderfeli | f9abcf80e8 | use offsets in transfer ok | 2025-02-07 01:54:12 +00:00
coderfeli | e947d11ea0 | save outputs | 2025-02-05 08:19:41 +00:00
coderfeli | 9afc4a0b4d | perf ok | 2025-02-04 04:13:59 +00:00
coderfeli | f8d15f2af4 | add others | 2025-02-04 03:05:58 +00:00
coderfeli | 00627feda4 | results ok | 2025-02-04 03:05:17 +00:00
coderfeli | 6b51413b6e | compile ok | 2025-01-24 08:36:49 +00:00
aska-0096 | b755f37502 | add save_x=true instance | 2025-01-13 07:12:46 +00:00
aska-0096 | 35ba08646f | fp8 add_rmsnorm_dynamic_dequant | 2025-01-10 11:12:16 +00:00
aska-0096 | 487a05d612 | refine blockgemm pipeline version as base struct. | 2025-01-08 14:27:42 +00:00
aska-0096 | 22fe522d0c | optimize software pipeline | 2025-01-08 09:28:32 +00:00
aska-0096 | 9dd74e0d58 | tempsave | 2025-01-03 02:53:55 +00:00
aska-0096 | 0dbe537032 | refine weight preshuffle format. | 2025-01-02 13:59:58 +00:00
aska-0096 | 72c1ddacb9 | Merge branch 'add_a8w8_preshuffle_ckprofiler' of https://github.com/ROCm/composable_kernel into update_cka8w8_uc | 2024-12-31 07:23:50 +00:00
aska-0096 | 5bbff07d40 | use bpreshuffle as independent example | 2024-12-31 07:20:01 +00:00
aska-0096 | bbbedc1fd7 | add fp16 instances | 2024-12-31 07:14:56 +00:00
aska-0096 | 6f24c2d814 | disable N, K Padding, splitk enabled | 2024-12-31 06:31:06 +00:00
aska-0096 | f60f9d5917 | sanity pass, most tile size enabled. TODO: NWave!=4 | 2024-12-30 18:22:08 +00:00
aska-0096 | 482ca684ba | Merge branch 'dev/a8w8_b_preshuffle' of https://github.com/ROCm/composable_kernel into add_a8w8_preshuffle_ckprofiler | 2024-12-30 09:21:35 +00:00
aska-0096 | 74ef5021b6 | tempsave | 2024-12-30 09:20:25 +00:00