coderfeli | aa15c49a67 | add moegemm in device and grid | 2025-02-10 07:51:55 +00:00
coderfeli | fcf6106b4b | add skip expert | 2025-02-10 01:12:50 +00:00
coderfeli | bd64a30b0b | add empty expert jump | 2025-02-08 06:58:13 +00:00
coderfeli | 6b71f3d8c5 | fp8 ok | 2025-02-07 14:44:50 +00:00
coderfeli | 7822382b3c | tileM 32,64,128 ok | 2025-02-07 12:31:44 +00:00
coderfeli | e15351ca46 | tile m = 64 ok | 2025-02-07 12:01:15 +00:00
coderfeli | 48d87d9c66 | a 16x16 ok | 2025-02-07 11:41:09 +00:00
coderfeli | 24734db8b5 | add ret logit for empty expert | 2025-02-07 10:06:21 +00:00
coderfeli | 965c9f0c17 | debug 16x16 load | 2025-02-07 08:17:33 +00:00
coderfeli | 83970cbe6c | fix hack in oob | 2025-02-07 02:19:36 +00:00
coderfeli | f9abcf80e8 | use offsets in transfer ok | 2025-02-07 01:54:12 +00:00
coderfeli | e947d11ea0 | save outputs | 2025-02-05 08:19:41 +00:00
coderfeli | 9afc4a0b4d | perf ok | 2025-02-04 04:13:59 +00:00
coderfeli | f8d15f2af4 | add others | 2025-02-04 03:05:58 +00:00
coderfeli | 00627feda4 | results ok | 2025-02-04 03:05:17 +00:00
coderfeli | 6b51413b6e | compile ok | 2025-01-24 08:36:49 +00:00
aska-0096 | b755f37502 | add save_x=true instance | 2025-01-13 07:12:46 +00:00
aska-0096 | 35ba08646f | fp8 add_rmsnorm_dynamic_dequant | 2025-01-10 11:12:16 +00:00
aska-0096 | 487a05d612 | refine blockgemm pipeline version as base struct. | 2025-01-08 14:27:42 +00:00
aska-0096 | 22fe522d0c | optimize software pipeline | 2025-01-08 09:28:32 +00:00
aska-0096 | 9dd74e0d58 | tempsave | 2025-01-03 02:53:55 +00:00
aska-0096 | 0dbe537032 | refine weight preshuffle format. | 2025-01-02 13:59:58 +00:00
aska-0096 | 72c1ddacb9 | Merge branch 'add_a8w8_preshuffle_ckprofiler' of https://github.com/ROCm/composable_kernel into update_cka8w8_uc | 2024-12-31 07:23:50 +00:00
aska-0096 | 5bbff07d40 | use bpreshuffle as independent example | 2024-12-31 07:20:01 +00:00
aska-0096 | bbbedc1fd7 | add fp16 instances | 2024-12-31 07:14:56 +00:00
aska-0096 | 6f24c2d814 | disable N, K Padding, splitk enabled | 2024-12-31 06:31:06 +00:00
aska-0096 | f60f9d5917 | sanity pass, most tile size enabled. TODO: NWave!=4 | 2024-12-30 18:22:08 +00:00
aska-0096 | 482ca684ba | Merge branch 'dev/a8w8_b_preshuffle' of https://github.com/ROCm/composable_kernel into add_a8w8_preshuffle_ckprofiler | 2024-12-30 09:21:35 +00:00
aska-0096 | 74ef5021b6 | tempsave | 2024-12-30 09:20:25 +00:00
coderfeli | db84352941 | fix warnings and revert cmake and fix clang format | 2024-12-30 08:24:50 +00:00
coderfeli | 5765ba51ce | auto calculate hard code params | 2024-12-30 07:59:47 +00:00
coderfeli | 3f9dbcac63 | use new pipeline for b preshuffle, run ok; revert olds to fix ckprofiler | 2024-12-30 06:52:10 +00:00
coderfeli | 54f44e6232 | fix brepeat, kloop and lds two buffer; works ok now | 2024-12-30 00:25:46 +00:00
coderfeli | c263bbe7e0 | fix cmake rm compile options | 2024-12-27 14:42:47 +00:00
coderfeli | 1137424459 | fix fp16 build | 2024-12-27 12:39:28 +00:00
coderfeli | fda5f8cfb0 | fix missed files and fix clang format | 2024-12-27 11:56:46 +00:00
coderfeli | e92395d9b1 | Merge remote-tracking branch 'origin/cka8w8_devtimer' into update_cka8w8_uc | 2024-12-27 11:09:05 +00:00
aska-0096 | 7efafa1169 | use empty hipstream in ckprofiler | 2024-12-27 09:01:26 +00:00
coderfeli | 2c056624af | fix tail | 2024-12-27 08:30:03 +00:00
coderfeli | 174b46b04a | add cpu shuffle | 2024-12-27 07:31:14 +00:00
coderfeli | 842d910e3c | Merge branch 'update_cka8w8' into update_cka8w8_uc | 2024-12-27 06:53:52 +00:00
coderfeli | e2127d7a96 | impl fp16 in ckprofiler | 2024-12-27 06:53:40 +00:00
coderfeli | c8d9660f3b | using develop branch timer | 2024-12-27 06:47:36 +00:00
coderfeli | 031ddf356d | fix performance regression on blockgemm v3 pipe | 2024-12-27 06:40:43 +00:00
coderfeli | 400cac2839 | Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8 | 2024-12-27 05:42:38 +00:00
coderfeli | 04f09f087e | fix build | 2024-12-27 11:44:06 +08:00
coderfeli | 1d074e34dd | add configs to fix tunning cases | 2024-12-27 11:43:53 +08:00
aska-0096 | 7cec63a631 | remove agpr usage when vgpr usage <256 | 2024-12-27 03:09:26 +00:00
coderfeli | e6f5a78b14 | add double buffer scratch | 2024-12-26 15:02:04 +00:00
coderfeli | 3784329b68 | can run | 2024-12-26 13:01:07 +00:00