valarLip
|
baf4710ef6
|
porting fmoe_sorting from moe_sorting (#1884)
* porting fmoe_sorting from moe_sorting
* pass default example test
* remod
[ROCm/composable_kernel commit: 0e5e29c4e2]
|
2025-02-13 15:34:34 +08:00 |
|
carlushuang
|
8ed234da8c
|
[CK_TILE] moe sorting ex kernel to support expert > 128 (#1840)
* moe sorting ex
* fix bug for race condition
* fix bug and optimize large expert
* fix
* optimize with sub_token_oneshot
* support skip empty tokens for expert sorting
* update moe_sorting
* tidy code
[ROCm/composable_kernel commit: c0adab4850]
|
2025-02-11 17:49:17 +08:00 |
|
carlushuang
|
2fec988802
|
[CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808)
* fix mock token id
* prepare host for g1u1
* reformat inline-asm
* restructure uk_0
* restructure gate_up
* done
* change default to init=1
* update readme
* fix a bug in interleave pipeline
* rcp for silu
[ROCm/composable_kernel commit: 1ff50e78c6]
|
2025-01-16 17:51:10 +08:00 |
|
carlushuang
|
4c4be7b14f
|
[CK_TILE] optimize moe-sorting kernel (#1771)
* opt moe sorting
* remove commented code
[ROCm/composable_kernel commit: 3d15f364b3]
|
2024-12-23 10:59:02 +08:00 |
|
carlushuang
|
8acce2dee1
|
[CK_TILE] fused-moe first version (#1634)
* moe pipeline
* update code
* compile OK
* update
* update cpu reference
* update pipeline_gemm0
* compiler ok
* update pipeline
* rename to ex pipeline
* block-asm
* update
* update
* update first gemm ok
* compute correct
* update file structure
* update README
* update
* update
* update code
* update API
* return unsupported case
* add comment
* update readme
* update
* uncomment
* update
* fix build err
---------
Co-authored-by: valarLip <340077269@qq.com>
[ROCm/composable_kernel commit: 440e28b08f]
|
2024-11-26 11:14:56 +08:00 |
|