[CK_TILE] optimize moe sorting kernel, boost large context case up to 20x (#2153)

* combine 2-3 as single stage

* support zeroing

* improve long tokens

* update specialization

* b16 ws

* 8bit topk optimize

* update 15 example

[ROCm/composable_kernel commit: 4e9b76f88c]
carlushuang
2025-05-06 17:32:07 +08:00
committed by GitHub
parent d22cc72f32
commit 807501ac3d
15 changed files with 1216 additions and 115 deletions

@@ -257,5 +257,5 @@
 #endif
 #ifndef CK_TILE_WA_ISSUE_2028
-#define CK_TILE_WA_ISSUE_2028 1
+#define CK_TILE_WA_ISSUE_2028 0
 #endif