coderfeli | 84b27d7504 | merge max_token_id and fix err | 2025-02-14 08:19:54 +00:00
coderfeli | b3ae04f8ed | fix ref gemm no padding | 2025-02-14 07:48:41 +00:00
coderfeli | 83be79ba58 | add max_token_id | 2025-02-14 06:22:17 +00:00
coderfeli | 1078d22916 | add logics and debug | 2025-02-14 05:23:15 +00:00
coderfeli | d4b8f1e3b0 | add codes for a scatter | 2025-02-14 11:05:26 +08:00
coderfeli | 82e1f1b903 | change cshuffle cluster, mi300x reach roofline | 2025-02-13 10:11:36 +00:00
coderfeli | 568ad1e16d | fix mtile 64,128 for gemm1 | 2025-02-12 09:35:13 +00:00
coderfeli | 59f3e0093a | remove d2 for gemm1 | 2025-02-12 09:24:48 +00:00
coderfeli | 418baed327 | moe gemm1 scaleready | 2025-02-12 05:19:01 +00:00
coderfeli | b02c0b8257 | gemm1 scale debug | 2025-02-11 14:52:01 +00:00
coderfeli | e4ca61f9e7 | moe gemm2 scales ok | 2025-02-11 12:01:39 +00:00
coderfeli | 66d08ea327 | impl topk weight scatter | 2025-02-11 07:43:59 +00:00
coderfeli | a8a82e0cfc | fix warnings and impl scale for gemm2, build ok | 2025-02-11 01:54:08 +00:00
coderfeli | 69f54ee822 | impl 3ds epilog ok | 2025-02-10 14:50:56 +00:00
coderfeli | 72752420e9 | merge gemm1 gemm2 together and run ok | 2025-02-10 09:06:22 +00:00
coderfeli | 66cff9103f | merge gemm1 and gemm2 | 2025-02-10 07:52:32 +00:00
coderfeli | aa15c49a67 | add moegemm in device and grid | 2025-02-10 07:51:55 +00:00
coderfeli | 2e53f9725b | skip empty expert | 2025-02-10 01:26:08 +00:00
coderfeli | fcf6106b4b | add skip expert | 2025-02-10 01:12:50 +00:00
coderfeli | e21f36fc24 | moegemm2 ok | 2025-02-09 13:44:42 +00:00
coderfeli | 1230145590 | gemm2 result ok | 2025-02-09 09:02:32 +00:00
coderfeli | 7ba5bff4c2 | one tile ok | 2025-02-08 12:31:25 +00:00
coderfeli | 8a5bb9f34b | add files , build and run ok | 2025-02-08 09:52:10 +00:00
coderfeli | bd64a30b0b | add empty expert jump | 2025-02-08 06:58:13 +00:00
coderfeli | 6b71f3d8c5 | fp8 ok | 2025-02-07 14:44:50 +00:00
coderfeli | 7822382b3c | tileM 32,64,128 ok | 2025-02-07 12:31:44 +00:00
coderfeli | e15351ca46 | tile m = 64 ok | 2025-02-07 12:01:15 +00:00
coderfeli | 48d87d9c66 | a 16x16 ok | 2025-02-07 11:41:09 +00:00
coderfeli | 24734db8b5 | add ret logit for empty expert | 2025-02-07 10:06:21 +00:00
coderfeli | 965c9f0c17 | debug 16x16 load | 2025-02-07 08:17:33 +00:00
coderfeli | 83970cbe6c | fix hack in oob | 2025-02-07 02:19:36 +00:00
coderfeli | f9abcf80e8 | use offsets in transfer ok | 2025-02-07 01:54:12 +00:00
coderfeli | e947d11ea0 | save outputs | 2025-02-05 08:19:41 +00:00
coderfeli | 9afc4a0b4d | perf ok | 2025-02-04 04:13:59 +00:00
coderfeli | f8d15f2af4 | add others | 2025-02-04 03:05:58 +00:00
coderfeli | 00627feda4 | results ok | 2025-02-04 03:05:17 +00:00
coderfeli | 6b51413b6e | compile ok | 2025-01-24 08:36:49 +00:00
aska-0096 | b755f37502 | add save_x=true instance | 2025-01-13 07:12:46 +00:00
aska-0096 | 35ba08646f | fp8 add_rmsnorm_dynamic_dequant | 2025-01-10 11:12:16 +00:00
aska-0096 | 487a05d612 | refine blockgemm pipeline version as base struct. | 2025-01-08 14:27:42 +00:00
aska-0096 | 22fe522d0c | optimize software pipeline | 2025-01-08 09:28:32 +00:00
aska-0096 | 9dd74e0d58 | tempsave | 2025-01-03 02:53:55 +00:00
aska-0096 | 0dbe537032 | refine weight preshuffle format. | 2025-01-02 13:59:58 +00:00
aska-0096 | 72c1ddacb9 | Merge branch 'add_a8w8_preshuffle_ckprofiler' of https://github.com/ROCm/composable_kernel into update_cka8w8_uc | 2024-12-31 07:23:50 +00:00
aska-0096 | 5bbff07d40 | use bpreshuffle as independent example | 2024-12-31 07:20:01 +00:00
aska-0096 | bbbedc1fd7 | add fp16 instances | 2024-12-31 07:14:56 +00:00
aska-0096 | 6f24c2d814 | disable N, K Padding, splitk enabled | 2024-12-31 06:31:06 +00:00
aska-0096 | f60f9d5917 | sanity pass, most tile size enabled. TODO: NWave!=4 | 2024-12-30 18:22:08 +00:00
aska-0096 | 482ca684ba | Merge branch 'dev/a8w8_b_preshuffle' of https://github.com/ROCm/composable_kernel into add_a8w8_preshuffle_ckprofiler | 2024-12-30 09:21:35 +00:00
aska-0096 | 74ef5021b6 | tempsave | 2024-12-30 09:20:25 +00:00