Commit Graph

21 Commits

Author SHA1 Message Date
carlushuang
47b8632296 hot fix check eid range (#2924)
* hot fix check eid range

* fix clang format

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 2e9428eb63]
2025-09-29 09:38:38 -07:00
Khushbu Agarwal
bb5eeef2af Fix for Add the API to load SGPR (#2913)
* Revert "Revert "[CK-Tile] Add the API to load SGPR  (#2878)" (#2904)"

This reverts commit 5cc40c160f.

* Fix: sgpr minor issue

* cyclic dependency resolved

* clang formatted

* removing unused variable

* clang formatted

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: b56e5d1d79]
2025-09-25 10:32:42 -07:00
asleepzzz
5cc40c160f Revert "[CK-Tile] Add the API to load SGPR (#2878)" (#2904)
This reverts commit fb5e953a05.

[ROCm/composable_kernel commit: f161b5b738]
2025-09-23 14:33:51 -07:00
Thomas Ning
fb5e953a05 [CK-Tile] Add the API to load SGPR (#2878)
* Have a workable version for SGPR

* have a workable version for atomic add

* Revert "have a workable version for atomic add"

This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.

* substitute with the new sgpr read api

* update the CHANGELOG

* have a workable version for atomic add

* Revert "have a workable version for atomic add"

This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.

* change to static for logic

* have a workable version for atomic add

* Revert "have a workable version for atomic add"

This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.

[ROCm/composable_kernel commit: 2cbbf5dcb3]
2025-09-23 01:23:56 -07:00
linqunAMD
4fab7b0a30 [Regression] Fix CK_TILE build error in grouped_convolution, copy_basic and fused_moegemm_kernel (#2728)
* fix copy basic build error

* fix other ck tile test build error

[ROCm/composable_kernel commit: 4a49dac7c6]
2025-08-28 20:30:30 +08:00
linqunAMD
615ca9842d Support Wave32 in CK_TILE - Part 1 (#2594)
* Support wave32/wave64 in CK_TILE - Part 1

* remove blocksize in kernel launch

* fix build error

* fix clang format

* fix clang format 2

* fix clang format 3

* fix fmha build error

* fix fmha build 2

* fix fmha build 3

* fix build error 4

* address review comment

* update change log

* replace KernelBlockSize with kBlockSize

* fix CI fail

* fix clang format

* address review comment and rebase code.

* fix universal test fail

---------

Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 9fcc1ee9fd]
2025-08-18 10:08:31 -07:00
Illia Silin
24f228df3b upgrade from clang-format-12 to clang-format-18 (#2568)
* upgrade to clang-format-18

* update to clang-format-18 in pre-commit-config

[ROCm/composable_kernel commit: 504b101da3]
2025-07-28 11:34:07 -07:00
carlushuang
f31119e021 [CK_TILE] moe sorting optimize local_token (#2469)
* fix bug in loops that need use local tokens to compute

* support extra chain local_token

* update

* update

* refine some main

* update

* support dispatch_policy

* fix 15 example

[ROCm/composable_kernel commit: cfe211cc60]
2025-07-15 09:42:18 +08:00
Po Yen Chen
7001322416 [CK_TILE] Fix compilation errors introduced in #2320, #2219 and #2214 (#2388)
* Fix compilation errors

* Fix more ck_tile example compilation errors

[ROCm/composable_kernel commit: 7d669440a6]
2025-06-23 12:29:15 +08:00
carlushuang
f540c6ccb4 [CK_TILE] moe_sorting support "local_tokens" feature for EP case (#2335)
* support local_token for hipgraph

* update README

* fix comment

* fix fmoe example

[ROCm/composable_kernel commit: a4e1248dba]
2025-06-18 10:49:43 +08:00
Satyanvesh Dittakavi
bde406245a Do not use warpSize as compile time constant as it is removed (#2320)
* Do not use warpSize as compile time constant as it is removed

* Update tile_image_to_column_shape.hpp

update warpSize usage.

* clean-up all use of warpSize, make sure code builds

* fix

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: 4c57157d50]
2025-06-17 11:54:30 -07:00
carlushuang
a7eb83a51b [CK_TILE] moe sorting optimization : refactor subtoken logic to let more kernel pickup mp kernel (#2327)
* refactor subtoken logic to let more kernel pickup mp kernel

* typo

[ROCm/composable_kernel commit: 8aff45a8af]
2025-06-12 11:44:22 +08:00
carlushuang
652263d073 [CK_TILE] optimize moe sorting kernel, boost large context case up to 20x (#2153)
* combine 2-3 as single stage

* support zeroing

* improve long tokens

* update specialization

* b16 ws

* 8bit topk optimize

* update 15 example

[ROCm/composable_kernel commit: 4e9b76f88c]
2025-05-06 17:32:07 +08:00
felix
20ffa0f474 hotfix fix sorting int64 (#2025)
* fix sorting int64

* clang format

* fix example issue

* update WA issue #

---------

Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: a82f338fb9]
2025-03-28 11:31:52 +08:00
carlushuang
1d32e34075 [CK_TILE] add moe-sorting MP kernel (#1910)
* moe sorting ex

* fix bug for race condition

* fix bug and optimze large expert

* fix

* optimize with sub_token_oneshot

* support skip empty tokens for expert sorting

* update moe_sorting

* tidy code

* support mp kernel

* hint mp

* remove use less code

* porting to example 15

---------

Co-authored-by: valarLip <340077269@qq.com>

[ROCm/composable_kernel commit: 353a612b44]
2025-02-25 17:56:55 +08:00
carlushuang
ad346270c2 [CK_TILE] moe sorting ex kernel to support expert > 128 (#1840)
* moe sorting ex

* fix bug for race condition

* fix bug and optimze large expert

* fix

* optimize with sub_token_oneshot

* support skip empty tokens for expert sorting

* update moe_sorting

* tidy code

[ROCm/composable_kernel commit: c0adab4850]
2025-02-11 17:49:17 +08:00
carlushuang
21264b4e60 [CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808)
* fix mock token id

* prepare host for g1u1

* reformat inline-asm

* restructure uk_0

* restructure gate_up

* done

* change default to init=1

* update readme

* fix a bug in interleave pipeline

* rcp for silu

[ROCm/composable_kernel commit: 1ff50e78c6]
2025-01-16 17:51:10 +08:00
carlushuang
edccbb3694 [CK_TILE] optimize moe-sorting kernel (#1771)
* opt moe sorting

* remove commented code

[ROCm/composable_kernel commit: 3d15f364b3]
2024-12-23 10:59:02 +08:00
carlushuang
74b0db75f7 [CK_TILE] fused-moe first version (#1634)
* moe pipeline

* update code

* compile OK

* update

* update cpu reference

* update pipeline_gemm0

* compiler ok

* update pipeline

* rename to ex pipeline

* block-asm

* update

* update

* update first gemm ok

* compute correct

* update file structure

* update README

* update

* update

* update code

* update API

* return unsupport case

* add comment

* update readme

* update

* uncomment

* update

* fix build err

---------

Co-authored-by: valarLip <340077269@qq.com>

[ROCm/composable_kernel commit: 440e28b08f]
2024-11-26 11:14:56 +08:00
carlushuang
d6ab951548 [CK_TILE]Moe update index (#1672)
* update MOCK_ID for moe-sorting

* add moe-smoothquant

* update a comment

* fix format

* hot fix

* update topk in overflow case

* update comments

* update bf16 cvt

---------

Co-authored-by: valarLip <340077269@qq.com>

[ROCm/composable_kernel commit: 36c7ce4e0e]
2024-11-25 13:12:35 +08:00
dummycoderfe
77f0f4ee48 Ck tile/moe sorting (#1624)
* add moe_sorting & check ok

* fix comments & typo

* Run remod.py under include/ck_tile & example/ck_tile directories

* format codes

* fix output ci check bug

* fix moe sorting readme and error commit file

* use magiv div to accelerate compute

* add an loop unroll for moe lds ops

* add extblocksnel to set zeros for moebufs

* [Ck_tile] moe set zero run ok, add size check and fix ref check

* [Ck_tile]fix moe_sorting fuse set_zero remod

* [Ck_tile] change name style, fix zero buffer size err, change folder

* [Ck_tile] moe_sorting: fix name style

* [Ck_tile] moe_sorting, remove useless params in traits

* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc

---------

Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

[ROCm/composable_kernel commit: bec6fbc65f]
2024-11-09 17:57:27 +08:00