Commit Graph

1705 Commits

Author     SHA1        Message  Date
coderfeli  aa15c49a67  add moegemm in device and grid  2025-02-10 07:51:55 +00:00
coderfeli  fcf6106b4b  add skip expert  2025-02-10 01:12:50 +00:00
coderfeli  bd64a30b0b  add empty expert jump  2025-02-08 06:58:13 +00:00
coderfeli  6b71f3d8c5  fp8 ok  2025-02-07 14:44:50 +00:00
coderfeli  7822382b3c  tileM 32,64,128 ok  2025-02-07 12:31:44 +00:00
coderfeli  e15351ca46  tile m = 64 ok  2025-02-07 12:01:15 +00:00
coderfeli  48d87d9c66  a 16x16 ok  2025-02-07 11:41:09 +00:00
coderfeli  24734db8b5  add ret logit for empty expert  2025-02-07 10:06:21 +00:00
coderfeli  965c9f0c17  debug 16x16 load  2025-02-07 08:17:33 +00:00
coderfeli  83970cbe6c  fix hack in oob  2025-02-07 02:19:36 +00:00
coderfeli  f9abcf80e8  use offsets in transfer ok  2025-02-07 01:54:12 +00:00
coderfeli  e947d11ea0  save outputs  2025-02-05 08:19:41 +00:00
coderfeli  9afc4a0b4d  perf ok  2025-02-04 04:13:59 +00:00
coderfeli  f8d15f2af4  add others  2025-02-04 03:05:58 +00:00
coderfeli  00627feda4  results ok  2025-02-04 03:05:17 +00:00
coderfeli  6b51413b6e  compile ok  2025-01-24 08:36:49 +00:00
aska-0096  b755f37502  add save_x=true instance  2025-01-13 07:12:46 +00:00
aska-0096  35ba08646f  fp8 add_rmsnorm_dynamic_dequant  2025-01-10 11:12:16 +00:00
aska-0096  487a05d612  refine blockgemm pipeline version as base struct.  2025-01-08 14:27:42 +00:00
aska-0096  22fe522d0c  optimize software pipeline  2025-01-08 09:28:32 +00:00
aska-0096  9dd74e0d58  tempsave  2025-01-03 02:53:55 +00:00
aska-0096  0dbe537032  refine weight preshuffle format.  2025-01-02 13:59:58 +00:00
aska-0096  72c1ddacb9  Merge branch 'add_a8w8_preshuffle_ckprofiler' of https://github.com/ROCm/composable_kernel into update_cka8w8_uc  2024-12-31 07:23:50 +00:00
aska-0096  5bbff07d40  use bpreshuffle as independent example  2024-12-31 07:20:01 +00:00
aska-0096  bbbedc1fd7  add fp16 instances  2024-12-31 07:14:56 +00:00
aska-0096  6f24c2d814  disable N, K Padding, splitk enabled  2024-12-31 06:31:06 +00:00
aska-0096  f60f9d5917  sanity pass, most tile size enabled. TODO: NWave!=4  2024-12-30 18:22:08 +00:00
aska-0096  482ca684ba  Merge branch 'dev/a8w8_b_preshuffle' of https://github.com/ROCm/composable_kernel into add_a8w8_preshuffle_ckprofiler  2024-12-30 09:21:35 +00:00
aska-0096  74ef5021b6  tempsave  2024-12-30 09:20:25 +00:00
coderfeli  db84352941  fix warnings and revert cmake and fix clang format  2024-12-30 08:24:50 +00:00
coderfeli  5765ba51ce  auto calculate hard code params  2024-12-30 07:59:47 +00:00
coderfeli  3f9dbcac63  use new pipeline for b preshuffle, run ok; revert olds to fix ckprofiler  2024-12-30 06:52:10 +00:00
coderfeli  54f44e6232  fix brepeat, kloop and lds two buffer; works ok now  2024-12-30 00:25:46 +00:00
coderfeli  c263bbe7e0  fix cmake rm compile options  2024-12-27 14:42:47 +00:00
coderfeli  1137424459  fix fp16 build  2024-12-27 12:39:28 +00:00
coderfeli  fda5f8cfb0  fix missed files and fix clang format  2024-12-27 11:56:46 +00:00
coderfeli  e92395d9b1  Merge remote-tracking branch 'origin/cka8w8_devtimer' into update_cka8w8_uc  2024-12-27 11:09:05 +00:00
aska-0096  7efafa1169  use empty hipstream in ckprofiler  2024-12-27 09:01:26 +00:00
coderfeli  2c056624af  fix tail  2024-12-27 08:30:03 +00:00
coderfeli  174b46b04a  add cpu shuffle  2024-12-27 07:31:14 +00:00
coderfeli  842d910e3c  Merge branch 'update_cka8w8' into update_cka8w8_uc  2024-12-27 06:53:52 +00:00
coderfeli  e2127d7a96  impl fp16 in ckprofiler  2024-12-27 06:53:40 +00:00
coderfeli  c8d9660f3b  using develop branch timer  2024-12-27 06:47:36 +00:00
coderfeli  031ddf356d  fix performance regression on blockgemm v3 pipe  2024-12-27 06:40:43 +00:00
coderfeli  400cac2839  Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8  2024-12-27 05:42:38 +00:00
coderfeli  04f09f087e  fix build  2024-12-27 11:44:06 +08:00
coderfeli  1d074e34dd  add configs to fix tunning cases  2024-12-27 11:43:53 +08:00
aska-0096  7cec63a631  remove agpr usage when vgpr usage <256  2024-12-27 03:09:26 +00:00
coderfeli  e6f5a78b14  add double buffer scratch  2024-12-26 15:02:04 +00:00
coderfeli  3784329b68  can run  2024-12-26 13:01:07 +00:00