9 Commits

Author SHA1 Message Date
felix
817752cdb4 hotfix fix sorting int64 (#2025)
* fix sorting int64

* clang format

* fix example issue

* update WA issue #

---------

Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: a82f338fb9]
2025-03-28 11:31:52 +08:00
carlushuang
f588e4b08e [CK_TILE] return value with macro in ck_tile::kernel_launch API (#1982)
* return value with macro and revert the return value

* [CK-TILE] no-macro launch api solution (#1992)

* no-macro solution

* address -Wcomma

---------

Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>

[ROCm/composable_kernel commit: e3c9886cdf]
2025-03-20 11:00:29 -07:00
valarLip
3d06952a2b hotfix fmoe build issue (#1976)
[ROCm/composable_kernel commit: 52b1cd7780]
2025-03-13 15:11:59 +08:00
carlushuang
581c75f3b7 [CK_TILE] add moe-sorting MP kernel (#1910)
* moe sorting ex

* fix bug for race condition

* fix bug and optimize large expert

* fix

* optimize with sub_token_oneshot

* support skip empty tokens for expert sorting

* update moe_sorting

* tidy code

* support mp kernel

* hint mp

* remove useless code

* porting to example 15

---------

Co-authored-by: valarLip <340077269@qq.com>

[ROCm/composable_kernel commit: 353a612b44]
2025-02-25 17:56:55 +08:00
valarLip
baf4710ef6 porting fmoe_sorting from moe_sorting (#1884)
* porting fmoe_sorting from moe_sorting

* pass default example test

* remod

[ROCm/composable_kernel commit: 0e5e29c4e2]
2025-02-13 15:34:34 +08:00
carlushuang
8ed234da8c [CK_TILE] moe sorting ex kernel to support expert > 128 (#1840)
* moe sorting ex

* fix bug for race condition

* fix bug and optimize large expert

* fix

* optimize with sub_token_oneshot

* support skip empty tokens for expert sorting

* update moe_sorting

* tidy code

[ROCm/composable_kernel commit: c0adab4850]
2025-02-11 17:49:17 +08:00
carlushuang
2fec988802 [CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808)
* fix mock token id

* prepare host for g1u1

* reformat inline-asm

* restructure uk_0

* restructure gate_up

* done

* change default to init=1

* update readme

* fix a bug in interleave pipeline

* rcp for silu

[ROCm/composable_kernel commit: 1ff50e78c6]
2025-01-16 17:51:10 +08:00
carlushuang
4c4be7b14f [CK_TILE] optimize moe-sorting kernel (#1771)
* opt moe sorting

* remove commented code

[ROCm/composable_kernel commit: 3d15f364b3]
2024-12-23 10:59:02 +08:00
carlushuang
8acce2dee1 [CK_TILE] fused-moe first version (#1634)
* moe pipeline

* update code

* compile OK

* update

* update cpu reference

* update pipeline_gemm0

* compiler ok

* update pipeline

* rename to ex pipeline

* block-asm

* update

* update

* update first gemm ok

* compute correct

* update file structure

* update README

* update

* update

* update code

* update API

* return unsupported case

* add comment

* update readme

* update

* uncomment

* update

* fix build err

---------

Co-authored-by: valarLip <340077269@qq.com>

[ROCm/composable_kernel commit: 440e28b08f]
2024-11-26 11:14:56 +08:00