6 Commits

Author SHA1 Message Date
carlushuang
ad346270c2 [CK_TILE] moe sorting ex kernel to support expert > 128 (#1840)
* moe sorting ex

* fix bug for race condition

* fix bug and optimize large expert

* fix

* optimize with sub_token_oneshot

* support skip empty tokens for expert sorting

* update moe_sorting

* tidy code

[ROCm/composable_kernel commit: c0adab4850]
2025-02-11 17:49:17 +08:00
carlushuang
21264b4e60 [CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808)
* fix mock token id

* prepare host for g1u1

* reformat inline-asm

* restructure uk_0

* restructure gate_up

* done

* change default to init=1

* update readme

* fix a bug in interleave pipeline

* rcp for silu

[ROCm/composable_kernel commit: 1ff50e78c6]
2025-01-16 17:51:10 +08:00
carlushuang
edccbb3694 [CK_TILE] optimize moe-sorting kernel (#1771)
* opt moe sorting

* remove commented code

[ROCm/composable_kernel commit: 3d15f364b3]
2024-12-23 10:59:02 +08:00
carlushuang
74b0db75f7 [CK_TILE] fused-moe first version (#1634)
* moe pipeline

* update code

* compile OK

* update

* update cpu reference

* update pipeline_gemm0

* compiler ok

* update pipeline

* rename to ex pipeline

* block-asm

* update

* update

* update first gemm ok

* compute correct

* update file structure

* update README

* update

* update

* update code

* update API

* return unsupported case

* add comment

* update readme

* update

* uncomment

* update

* fix build err

---------

Co-authored-by: valarLip <340077269@qq.com>

[ROCm/composable_kernel commit: 440e28b08f]
2024-11-26 11:14:56 +08:00
carlushuang
d6ab951548 [CK_TILE] Moe update index (#1672)
* update MOCK_ID for moe-sorting

* add moe-smoothquant

* update a comment

* fix format

* hot fix

* update topk in overflow case

* update comments

* update bf16 cvt

---------

Co-authored-by: valarLip <340077269@qq.com>

[ROCm/composable_kernel commit: 36c7ce4e0e]
2024-11-25 13:12:35 +08:00
dummycoderfe
77f0f4ee48 Ck tile/moe sorting (#1624)
* add moe_sorting & check ok

* fix comments & typo

* Run remod.py under include/ck_tile & example/ck_tile directories

* format codes

* fix output ci check bug

* fix moe sorting readme and error commit file

* use magic div to accelerate compute

* add a loop unroll for moe lds ops

* add extblocksnel to set zeros for moebufs

* [Ck_tile] moe set zero run ok, add size check and fix ref check

* [Ck_tile]fix moe_sorting fuse set_zero remod

* [Ck_tile] change name style, fix zero buffer size err, change folder

* [Ck_tile] moe_sorting: fix name style

* [Ck_tile] moe_sorting, remove useless params in traits

* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: bec6fbc65f]
2024-11-09 17:57:27 +08:00