[CK_TILE] optimize moe sorting kernel, boost large context case up to 20x (#2153)

* combine 2-3 as single stage

* support zeroing

* improve long tokens

* update specialization

* b16 ws

* 8bit topk optimize

* update 15 example
carlushuang
2025-05-06 17:32:07 +08:00
committed by GitHub
parent b8fa27bfef
commit 4e9b76f88c
15 changed files with 1216 additions and 115 deletions

@@ -257,5 +257,5 @@
 #endif
 #ifndef CK_TILE_WA_ISSUE_2028
-#define CK_TILE_WA_ISSUE_2028 1
+#define CK_TILE_WA_ISSUE_2028 0
 #endif
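
Note on the hunk above: because the macro sits inside an `#ifndef` guard, the value in the header is only a default, and a build can still supply its own value. A minimal sketch of that pattern follows, assuming 0 is the post-change default. The helper `wa_issue_2028_enabled` is hypothetical and is not part of CK_TILE; what the workaround itself does is not shown in this diff.

```cpp
// Guarded default: defined here only if the build has not already set it
// (for example through a -D compiler flag).
#ifndef CK_TILE_WA_ISSUE_2028
#define CK_TILE_WA_ISSUE_2028 0 // assumed post-change default
#endif

// Hypothetical helper (not in CK_TILE): exposes the macro as a
// compile-time boolean so callers can branch without more #if blocks.
constexpr bool wa_issue_2028_enabled()
{
    return CK_TILE_WA_ISSUE_2028 != 0;
}

// Hypothetical call site: picks one of two values at compile time
// depending on whether the workaround is switched on.
constexpr int select_path(int with_workaround, int without_workaround)
{
    return wa_issue_2028_enabled() ? with_workaround : without_workaround;
}
```

With this pattern, passing `-DCK_TILE_WA_ISSUE_2028=1` on the compiler command line would restore the previous value without editing the header.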