mirror of https://github.com/ROCm/composable_kernel.git
synced 2026-05-20 04:49:54 +00:00
[CK_TILE] optimize moe sorting kernel, boost large context case up to 20x (#2153)
* combine 2-3 as single stage
* support zeroing
* improve long tokens
* update specialization
* b16 ws
* 8bit topk optimize
* update 15 example
[ROCm/composable_kernel commit: 4e9b76f88c]
@@ -257,5 +257,5 @@
 #endif
 
 #ifndef CK_TILE_WA_ISSUE_2028
-#define CK_TILE_WA_ISSUE_2028 1
+#define CK_TILE_WA_ISSUE_2028 0
 #endif
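The hunk above appears to change the default of `CK_TILE_WA_ISSUE_2028` from 1 to 0. Because the definition sits inside an `#ifndef` guard, a build that still wants the old value could presumably supply it on the compiler command line (for example `-DCK_TILE_WA_ISSUE_2028=1`). The sketch below illustrates that pattern in isolation. It assumes the macro is used as a boolean compile-time switch; `workaround_enabled` and `select_path` are hypothetical names for illustration and are not part of composable_kernel.

```cpp
// Guarded default, mirroring the shape of the hunk: the value is only
// defined here if the build has not already provided one via -D.
#ifndef CK_TILE_WA_ISSUE_2028
#define CK_TILE_WA_ISSUE_2028 0
#endif

// Hypothetical helper: exposes the macro as a constexpr boolean so the
// rest of the code does not need to test the macro directly.
constexpr bool workaround_enabled()
{
    return CK_TILE_WA_ISSUE_2028 != 0;
}

// Hypothetical consumer: picks a code path at compile time.
// Returns 1 for the workaround path, 0 for the regular path.
constexpr int select_path()
{
    if constexpr(workaround_enabled())
    {
        return 1; // workaround path (assumed meaning)
    }
    else
    {
        return 0; // regular path
    }
}

// Both helpers must agree with whatever value the build selected.
static_assert(select_path() == (workaround_enabled() ? 1 : 0),
              "path selection follows the macro value");
```

Compiled as-is, the guard supplies the default of 0. Compiling the same file with `-DCK_TILE_WA_ISSUE_2028=1` would skip the guarded definition and select the other path, without editing the header.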