composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-15 18:42:06 +00:00

Author	SHA1	Message	Date
Thomas Ning	0c8c232a0a	Shuffle fix for gfx950 (#3491) * solve compiler issue * solve the gfx950 mfma shuffle regression * refactor jenkinsfile to handle arch name better * [CK TILE] set divisor to count of thread along k dimension * fix the compiler error * solve degradation * Finish the multiplies fix * fix the scales * solve compilation error * solve the composes * solve the error of tile sweeper * fix the test and example * fix for gfx950 --------- Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com> Co-authored-by: Cong Ma <congma13@amd.com> [ROCm/composable_kernel commit: `00c46785a8`]	2026-01-13 09:21:29 -08:00
linqunAMD	245c274287	[CK_TILE] Port hw independent changes from internal repo to develop branch (#3301) * [CK_TILE] Port hw independent changes from internal repo to develop branch It includes PR#96, #114, #120, #121. * correct rebase error [ROCm/composable_kernel commit: `fc7bf0ab1c`]	2025-12-12 09:28:37 -08:00
Aviral Goel	216c23b945	chore(copyright): update copyright header for include directory (#3293) [ROCm/composable_kernel commit: `de6466481f`]	2025-11-26 11:00:05 -07:00
Michael Mcminn	699f7daae3	Ud fix moe sorting gfx908 (#2720) * Adding a ds permute fallback for the gfx908 and older for row_newbcast:7 instruction * Better macro for selecting ROW_NEWBCAST * clang-format the update --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `afe1ff618d`]	2025-11-03 07:31:31 -08:00
felix	b6f6b7cd2a	Felix/opt sorting (#2902) * merge felix/sorting * opt moe sorting (#2822) * opt moe storing for 2k --------- Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com> Co-authored-by: coderfeli <coderfeli@163.com> [ROCm/composable_kernel commit: `4c826abfff`]	2025-10-15 09:24:03 +08:00
joyeamd	ed83bcb9a2	update s_barrier's logic in gfx12 architecture (#3003) change s_waitcnt's logic in gfx1250 change s_waitcnt's logic in gfx1250 update comment [ROCm/composable_kernel commit: `b9d74e7746`]	2025-10-14 08:49:34 -07:00
carlushuang	3c553f66b2	hot fix check eid range (#2924) * hot fix check eid range * fix clang format --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com> [ROCm/composable_kernel commit: `2e9428eb63`]	2025-09-29 09:38:38 -07:00
Khushbu Agarwal	9ed178a93e	Fix for Add the API to load SGPR (#2913) * Revert "Revert "[CK-Tile] Add the API to load SGPR (#2878)" (#2904)" This reverts commit 4c78cc31c5b8e0c9db09c24fa35393f603a8a47f. * Fix: sgpr minor issue * cyclic dependency resolved * clang formatted * removing unused variable * clang formatted --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> [ROCm/composable_kernel commit: `b56e5d1d79`]	2025-09-25 10:32:42 -07:00
asleepzzz	651a5dd0b9	Revert "[CK-Tile] Add the API to load SGPR (#2878)" (#2904) This reverts commit 3e008a2d22ad1ba8a9b2c7eca369a8593b7d6e95. [ROCm/composable_kernel commit: `f161b5b738`]	2025-09-23 14:33:51 -07:00
Thomas Ning	e3702467d5	[CK-Tile] Add the API to load SGPR (#2878) * Have a workable version for SGPR * have a workable version for atomic add * Revert "have a workable version for atomic add" This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb. * substitute with the new sgpr read api * update the CHANGELOG * have a workable version for atomic add * Revert "have a workable version for atomic add" This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb. * change to static for logic * have a workable version for atomic add * Revert "have a workable version for atomic add" This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb. [ROCm/composable_kernel commit: `2cbbf5dcb3`]	2025-09-23 01:23:56 -07:00
linqunAMD	cd6d731322	[Regression] Fix CK_TILE build error in grouped_convolution, copy_basic and fused_moegemm_kernel (#2728) * fix copy basic build error * fix other ck tile test build error [ROCm/composable_kernel commit: `4a49dac7c6`]	2025-08-28 20:30:30 +08:00
linqunAMD	807f7510b5	Support Wave32 in CK_TILE - Part 1 (#2594) * Support wave32/wave64 in CK_TILE - Part 1 * remove blocksize in kernel launch * fix build error * fix clang format * fix clang format 2 * fix clang format 3 * fix fmha build error * fix fmha build 2 * fix fmha build 3 * fix build error 4 * address review comment * update change log * replace KernelBlockSize with kBlockSize * fix CI fail * fix clang format * address review comment and rebase code. * fix universal test fail --------- Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com> [ROCm/composable_kernel commit: `9fcc1ee9fd`]	2025-08-18 10:08:31 -07:00
Illia Silin	3345f5f417	upgrade from clang-format-12 to clang-format-18 (#2568) * upgrade to clang-format-18 * update to clang-format-18 in pre-commit-config [ROCm/composable_kernel commit: `504b101da3`]	2025-07-28 11:34:07 -07:00
carlushuang	34ca5f6a68	[CK_TILE] moe sorting optimize local_token (#2469) * fix bug in loops that need use local tokens to compute * support extra chain local_token * update * update * refine some main * update * support dispatch_policy * fix 15 example [ROCm/composable_kernel commit: `cfe211cc60`]	2025-07-15 09:42:18 +08:00
Po Yen Chen	4456552543	[CK_TILE] Fix compilation errors introduced in #2320, #2219 and #2214 (#2388) * Fix compilation errors * Fix more ck_tile example compilation errors [ROCm/composable_kernel commit: `7d669440a6`]	2025-06-23 12:29:15 +08:00
carlushuang	8660f6ef22	[CK_TILE] moe_sorting support "local_tokens" feature for EP case (#2335) * support local_token for hipgraph * update README * fix comment * fix fmoe example [ROCm/composable_kernel commit: `a4e1248dba`]	2025-06-18 10:49:43 +08:00
Satyanvesh Dittakavi	a4517b0a9d	Do not use warpSize as compile time constant as it is removed (#2320) * Do not use warpSize as compile time constant as it is removed * Update tile_image_to_column_shape.hpp update warpSize usage. * clean-up all use of warpSize, make sure code builds * fix --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com> Co-authored-by: Bartlomiej Kocot <barkocot@amd.com> [ROCm/composable_kernel commit: `4c57157d50`]	2025-06-17 11:54:30 -07:00
carlushuang	5d7302c240	[CK_TILE] moe sorting optimization : refactor subtoken logic to let more kernel pickup mp kernel (#2327) * refactor subtoken logic to let more kernel pickup mp kernel * typo [ROCm/composable_kernel commit: `8aff45a8af`]	2025-06-12 11:44:22 +08:00
carlushuang	807501ac3d	[CK_TILE] optimize moe sorting kernel, boost large context case up to 20x (#2153) * combine 2-3 as single stage * support zeroing * improve long tokens * update specialization * b16 ws * 8bit topk optimize * update 15 example [ROCm/composable_kernel commit: `4e9b76f88c`]	2025-05-06 17:32:07 +08:00
felix	817752cdb4	hotfix fix sorting int64 (#2025) * fix sorting int64 * clang format * fix example issue * update WA issue # --------- Co-authored-by: coderfeli <coderfeli@163.com> Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `a82f338fb9`]	2025-03-28 11:31:52 +08:00
carlushuang	581c75f3b7	[CK_TILE] add moe-sorting MP kernel (#1910) * moe sorting ex * fix bug for race condition * fix bug and optimze large expert * fix * optimize with sub_token_oneshot * support skip empty tokens for expert sorting * update moe_sorting * tidy code * support mp kernel * hint mp * remove use less code * porting to example 15 --------- Co-authored-by: valarLip <340077269@qq.com> [ROCm/composable_kernel commit: `353a612b44`]	2025-02-25 17:56:55 +08:00
carlushuang	8ed234da8c	[CK_TILE] moe sorting ex kernel to support expert > 128 (#1840) * moe sorting ex * fix bug for race condition * fix bug and optimze large expert * fix * optimize with sub_token_oneshot * support skip empty tokens for expert sorting * update moe_sorting * tidy code [ROCm/composable_kernel commit: `c0adab4850`]	2025-02-11 17:49:17 +08:00
carlushuang	2fec988802	[CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808) * fix mock token id * prepare host for g1u1 * reformat inline-asm * restructure uk_0 * restructure gate_up * done * change default to init=1 * update readme * fix a bug in interleave pipeline * rcp for silu [ROCm/composable_kernel commit: `1ff50e78c6`]	2025-01-16 17:51:10 +08:00
carlushuang	4c4be7b14f	[CK_TILE] optimize moe-sorting kernel (#1771) * opt moe sorting * remove commented code [ROCm/composable_kernel commit: `3d15f364b3`]	2024-12-23 10:59:02 +08:00
carlushuang	8acce2dee1	[CK_TILE] fused-moe first version (#1634) * moe pipeline * update code * compile OK * update * update cpu reference * update pipeline_gemm0 * compiler ok * update pipeline * rename to ex pipeline * block-asm * update * update * update first gemm ok * compute correct * update file structure * update README * update * update * update code * update API * return unsupport case * add comment * update readme * update * uncomment * update * fix build err --------- Co-authored-by: valarLip <340077269@qq.com> [ROCm/composable_kernel commit: `440e28b08f`]	2024-11-26 11:14:56 +08:00
carlushuang	4fad52fea6	[CK_TILE]Moe update index (#1672) * update MOCK_ID for moe-sorting * add moe-smoothquant * update a comment * fix format * hot fix * update topk in overflow case * update comments * update bf16 cvt --------- Co-authored-by: valarLip <340077269@qq.com> [ROCm/composable_kernel commit: `36c7ce4e0e`]	2024-11-25 13:12:35 +08:00
dummycoderfe	eec0fed606	Ck tile/moe sorting (#1624) * add moe_sorting & check ok * fix comments & typo * Run remod.py under include/ck_tile & example/ck_tile directories * format codes * fix output ci check bug * fix moe sorting readme and error commit file * use magiv div to accelerate compute * add an loop unroll for moe lds ops * add extblocksnel to set zeros for moebufs * [Ck_tile] moe set zero run ok, add size check and fix ref check * [Ck_tile]fix moe_sorting fuse set_zero remod * [Ck_tile] change name style, fix zero buffer size err, change folder * [Ck_tile] moe_sorting: fix name style * [Ck_tile] moe_sorting, remove useless params in traits * [Ck_tile] change outputtile cnt * unit_size; change output buf alloc --------- Co-authored-by: dummycoderfe <noplydummmycoder@163.com> Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com> Co-authored-by: carlushuang <carlus.huang@amd.com> [ROCm/composable_kernel commit: `bec6fbc65f`]	2024-11-09 17:57:27 +08:00

27 Commits