carlushuang
807501ac3d
[CK_TILE] optimize moe sorting kernel, boost large context case up to 20x (#2153)
...
* combine 2-3 as single stage
* support zeroing
* improve long tokens
* update specialization
* b16 ws
* 8bit topk optimize
* update 15 example
[ROCm/composable_kernel commit: 4e9b76f88c]
2025-05-06 17:32:07 +08:00
carlushuang
581c75f3b7
[CK_TILE] add moe-sorting MP kernel (#1910)
...
* moe sorting ex
* fix bug for race condition
* fix bug and optimize large expert
* fix
* optimize with sub_token_oneshot
* support skip empty tokens for expert sorting
* update moe_sorting
* tidy code
* support mp kernel
* hint mp
* remove useless code
* porting to example 15
---------
Co-authored-by: valarLip <340077269@qq.com>
[ROCm/composable_kernel commit: 353a612b44]
2025-02-25 17:56:55 +08:00
valarLip
baf4710ef6
porting fmoe_sorting from moe_sorting (#1884)
...
* porting fmoe_sorting from moe_sorting
* pass default example test
* remod
[ROCm/composable_kernel commit: 0e5e29c4e2]
2025-02-13 15:34:34 +08:00
carlushuang
2fec988802
[CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808)
...
* fix mock token id
* prepare host for g1u1
* reformat inline-asm
* restructure uk_0
* restructure gate_up
* done
* change default to init=1
* update readme
* fix a bug in interleave pipeline
* rcp for silu
[ROCm/composable_kernel commit: 1ff50e78c6]
2025-01-16 17:51:10 +08:00
carlushuang
8acce2dee1
[CK_TILE] fused-moe first version (#1634)
...
* moe pipeline
* update code
* compile OK
* update
* update cpu reference
* update pipeline_gemm0
* compiler ok
* update pipeline
* rename to ex pipeline
* block-asm
* update
* update
* update first gemm ok
* compute correct
* update file structure
* update README
* update
* update
* update code
* update API
* return unsupported case
* add comment
* update readme
* update
* uncomment
* update
* fix build err
---------
Co-authored-by: valarLip <340077269@qq.com>
[ROCm/composable_kernel commit: 440e28b08f]
2024-11-26 11:14:56 +08:00