| coderfeli | 418baed327 | moe gemm1 scaleready | 2025-02-12 05:19:01 +00:00 |
| coderfeli | b02c0b8257 | gemm1 scale debug | 2025-02-11 14:52:01 +00:00 |
| coderfeli | e4ca61f9e7 | moe gemm2 scales ok | 2025-02-11 12:01:39 +00:00 |
| coderfeli | 66d08ea327 | impl topk weight scatter | 2025-02-11 07:43:59 +00:00 |
| coderfeli | a8a82e0cfc | fix warnings and impl scale for gemm2, build ok | 2025-02-11 01:54:08 +00:00 |
| coderfeli | 69f54ee822 | impl 3ds epilog ok | 2025-02-10 14:50:56 +00:00 |
| coderfeli | 72752420e9 | merge gemm1 gemm2 together and run ok | 2025-02-10 09:06:22 +00:00 |
| coderfeli | 66cff9103f | merge gemm1 and gemm2 | 2025-02-10 07:52:32 +00:00 |
| coderfeli | aa15c49a67 | add moegemm in device and grid | 2025-02-10 07:51:55 +00:00 |
| coderfeli | 2e53f9725b | skip empty expert | 2025-02-10 01:26:08 +00:00 |
| coderfeli | fcf6106b4b | add skip expert | 2025-02-10 01:12:50 +00:00 |
| coderfeli | e21f36fc24 | moegemm2 ok | 2025-02-09 13:44:42 +00:00 |
| coderfeli | 1230145590 | gemm2 result ok | 2025-02-09 09:02:32 +00:00 |
| coderfeli | 7ba5bff4c2 | one tile ok | 2025-02-08 12:31:25 +00:00 |
| coderfeli | 8a5bb9f34b | add files , build and run ok | 2025-02-08 09:52:10 +00:00 |
| coderfeli | bd64a30b0b | add empty expert jump | 2025-02-08 06:58:13 +00:00 |
| coderfeli | e15351ca46 | tile m = 64 ok | 2025-02-07 12:01:15 +00:00 |
| coderfeli | 48d87d9c66 | a 16x16 ok | 2025-02-07 11:41:09 +00:00 |
| coderfeli | 24734db8b5 | add ret logit for empty expert | 2025-02-07 10:06:21 +00:00 |
| coderfeli | 965c9f0c17 | debug 16x16 load | 2025-02-07 08:17:33 +00:00 |
| coderfeli | 83970cbe6c | fix hack in oob | 2025-02-07 02:19:36 +00:00 |
| coderfeli | f9abcf80e8 | use offsets in transfer ok | 2025-02-07 01:54:12 +00:00 |
| coderfeli | f8d15f2af4 | add others | 2025-02-04 03:05:58 +00:00 |
| coderfeli | 00627feda4 | results ok | 2025-02-04 03:05:17 +00:00 |
| coderfeli | 6b51413b6e | compile ok | 2025-01-24 08:36:49 +00:00 |
| aska-0096 | 35ba08646f | fp8 add_rmsnorm_dynamic_dequant | 2025-01-10 11:12:16 +00:00 |
| aska-0096 | 487a05d612 | refine blockgemm pipeline version as base struct. | 2025-01-08 14:27:42 +00:00 |
| aska-0096 | 22fe522d0c | optimize software pipeline | 2025-01-08 09:28:32 +00:00 |
| aska-0096 | 0dbe537032 | refine weight preshuffle format. | 2025-01-02 13:59:58 +00:00 |
| aska-0096 | 72c1ddacb9 | Merge branch 'add_a8w8_preshuffle_ckprofiler' of https://github.com/ROCm/composable_kernel into update_cka8w8_uc | 2024-12-31 07:23:50 +00:00 |
| aska-0096 | 6f24c2d814 | disable N, K Padding, splitk enabled | 2024-12-31 06:31:06 +00:00 |
| aska-0096 | f60f9d5917 | sanity pass, most tile size enabled. TODO: NWave!=4 | 2024-12-30 18:22:08 +00:00 |
| aska-0096 | 482ca684ba | Merge branch 'dev/a8w8_b_preshuffle' of https://github.com/ROCm/composable_kernel into add_a8w8_preshuffle_ckprofiler | 2024-12-30 09:21:35 +00:00 |
| aska-0096 | 74ef5021b6 | tempsave | 2024-12-30 09:20:25 +00:00 |
| coderfeli | db84352941 | fix warnings and revert cmake and fix clang format | 2024-12-30 08:24:50 +00:00 |
| coderfeli | 5765ba51ce | auto calculate hard code params | 2024-12-30 07:59:47 +00:00 |
| coderfeli | 3f9dbcac63 | use new pipeline for b preshuffle, run ok; revert olds to fix ckprofiler | 2024-12-30 06:52:10 +00:00 |
| coderfeli | 54f44e6232 | fix brepeat, kloop and lds two buffer; works ok now | 2024-12-30 00:25:46 +00:00 |
| coderfeli | 2c056624af | fix tail | 2024-12-27 08:30:03 +00:00 |
| coderfeli | 174b46b04a | add cpu shuffle | 2024-12-27 07:31:14 +00:00 |
| coderfeli | c8d9660f3b | using develop branch timer | 2024-12-27 06:47:36 +00:00 |
| coderfeli | 031ddf356d | fix performance regression on blockgemm v3 pipe | 2024-12-27 06:40:43 +00:00 |
| coderfeli | 400cac2839 | Merge branch 'develop' of https://github.com/ROCm/composable_kernel into update_cka8w8 | 2024-12-27 05:42:38 +00:00 |
| aska-0096 | 7cec63a631 | remove agpr usage when vgpr usage <256 | 2024-12-27 03:09:26 +00:00 |
| coderfeli | e6f5a78b14 | add double buffer scratch | 2024-12-26 15:02:04 +00:00 |
| coderfeli | 3784329b68 | can run | 2024-12-26 13:01:07 +00:00 |
| coderfeli | 4a1ec81595 | add bypass logic and build | 2024-12-26 10:05:25 +00:00 |
| aska-0096 | 1a089f6f88 | sanity bug fix | 2024-12-26 10:05:17 +00:00 |
| carlushuang | 3d15f364b3 | [CK_TILE] optimize moe-sorting kernel (#1771): opt moe sorting; remove commented code | 2024-12-23 10:59:02 +08:00 |
| Illia Silin | 07339c7383 | fix typo for CK_USE_OCP_FP8 (#1769) | 2024-12-20 07:52:24 -08:00 |