Aviral Goel
ee7a68b10f
chore(copyright): update copyright header for include directory ( #3293 )
...
[ROCm/composable_kernel commit: de6466481f ]
2025-11-26 11:00:05 -07:00
Michael Mcminn
d33b51181b
Ud fix moe sorting gfx908 ( #2720 )
...
* Adding a ds permute fallback for the gfx908 and older for row_newbcast:7 instruction
* Better macro for selecting ROW_NEWBCAST
* clang-format the update
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: afe1ff618d ]
2025-11-03 07:31:31 -08:00
felix
7b584fd2d2
Felix/opt sorting ( #2902 )
...
* merge felix/sorting
* opt moe sorting (#2822 )
* opt moe storing for 2k
---------
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com >
Co-authored-by: coderfeli <coderfeli@163.com >
[ROCm/composable_kernel commit: 4c826abfff ]
2025-10-15 09:24:03 +08:00
joyeamd
2592957760
update s_barrier's logic in gfx12 architecture ( #3003 )
...
change s_waitcnt's logic in gfx1250
change s_waitcnt's logic in gfx1250
update comment
[ROCm/composable_kernel commit: b9d74e7746 ]
2025-10-14 08:49:34 -07:00
Sami Remes
93ba707be4
Use __builtin_amdgcn_readfirstlane for buffer resource in fused_moe ( #2893 )
...
* Use __builtin_amdgcn_readfirstlane for buffer resource in fused_moe
* also do the same for amd_buffer_addressing_builtins.hpp
* merge with develop
* fix clang format
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com >
Co-authored-by: ThomasNing <thomas.ning@amd.com >
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: ef43078788 ]
2025-09-30 15:12:30 -07:00
carlushuang
47b8632296
hot fix check eid range ( #2924 )
...
* hot fix check eid range
* fix clang format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 2e9428eb63 ]
2025-09-29 09:38:38 -07:00
Khushbu Agarwal
bb5eeef2af
Fix for Add the API to load SGPR ( #2913 )
...
* Revert "Revert "[CK-Tile] Add the API to load SGPR (#2878 )" (#2904 )"
This reverts commit 5cc40c160f .
* Fix: sgpr minor issue
* cyclic dependency resolved
* clang formatted
* removing unused variable
* clang formatted
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
[ROCm/composable_kernel commit: b56e5d1d79 ]
2025-09-25 10:32:42 -07:00
asleepzzz
5cc40c160f
Revert "[CK-Tile] Add the API to load SGPR ( #2878 )" ( #2904 )
...
This reverts commit fb5e953a05 .
[ROCm/composable_kernel commit: f161b5b738 ]
2025-09-23 14:33:51 -07:00
Thomas Ning
fb5e953a05
[CK-Tile] Add the API to load SGPR ( #2878 )
...
* Have a workable version for SGPR
* have a workable version for atomic add
* Revert "have a workable version for atomic add"
This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
* substitute with the new sgpr read api
* update the CHANGELOG
* have a workable version for atomic add
* Revert "have a workable version for atomic add"
This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
* change to static for logic
* have a workable version for atomic add
* Revert "have a workable version for atomic add"
This reverts commit 792377a590c26cfff9c8f545d9a9e8484a7422eb.
[ROCm/composable_kernel commit: 2cbbf5dcb3 ]
2025-09-23 01:23:56 -07:00
linqunAMD
4fab7b0a30
[Regression] Fix CK_TILE build error in grouped_convolution, copy_basic and fused_moegemm_kernel ( #2728 )
...
* fix copy basic build error
* fix other ck tile test build error
[ROCm/composable_kernel commit: 4a49dac7c6 ]
2025-08-28 20:30:30 +08:00
linqunAMD
615ca9842d
Support Wave32 in CK_TILE - Part 1 ( #2594 )
...
* Support wave32/wave64 in CK_TILE - Part 1
* remove blocksize in kernel launch
* fix build error
* fix clang format
* fix clang format 2
* fix clang format 3
* fix fmha build error
* fix fmha build 2
* fix fmha build 3
* fix build error 4
* address review comment
* update change log
* replace KernelBlockSize with kBlockSize
* fix CI fail
* fix clang format
* address review comment and rebase code.
* fix universal test fail
---------
Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com >
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com >
[ROCm/composable_kernel commit: 9fcc1ee9fd ]
2025-08-18 10:08:31 -07:00
Tianyuan Wu
abb90422b4
[CK_TILE] CK_TILE GEMM WMMA Support for GFX11/GFX12 ( #2466 )
...
* WMMA GEMM F16 Implementation
Signed-off-by: root <tianyuwu@amd.com >
* Self-review
Signed-off-by: root <tianyuwu@amd.com >
* ASIC check minor tweak
Signed-off-by: root <tianyuwu@amd.com >
* add missing include file
* Set GPU_TARGETS to gfx11/12 generic
Signed-off-by: root <tianyuwu@amd.com >
* INT8 GFX12
Signed-off-by: root <tianyuwu@amd.com >
* add int8x16 branch
* Fix CI script
Signed-off-by: root <tianyuwu@amd.com >
* Fix typo
Signed-off-by: root <tianyuwu@amd.com >
* Add CK_Tile WMMA example
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
* Fix CI
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
* fix clang format
* Set M/N_Warp Back to Constant
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
* Use GemmConfigComputeV3 by default
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Remove CK_Tile wmma gemm examples from the CI list
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Add atomic add fallback method for gfx11
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Fix typo
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Omit copyright year
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Support non-square cases
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Fix CI
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Add get_device_ip()
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Revert "Add atomic add fallback method for gfx11"
This reverts commit 4f664969c01b37976c8518c19833d9f1574cd746.
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
* Revert "Enable CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT for gfx12"
This reverts commit 949129a3858a825b2a2c4d3ec01663df18a165a5.
* Revise method name and typos
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
* clang-format
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Try fix CI
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Revert "Try fix CI"
This reverts commit 084c683227e64ab6a8137db00c8165fb05bdc902.
* clang-format
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
* Fix typo caused by merge
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
* Fix typo caused by merging
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
---------
Signed-off-by: root <tianyuwu@amd.com >
Signed-off-by: Tianyuan Wu <tianyuwu@amd.com >
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
Co-authored-by: joye <joye@amd.com >
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 68134b60e4 ]
2025-08-15 16:22:27 -07:00
Illia Silin
24f228df3b
upgrade from clang-format-12 to clang-format-18 ( #2568 )
...
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
[ROCm/composable_kernel commit: 504b101da3 ]
2025-07-28 11:34:07 -07:00
carlushuang
f31119e021
[CK_TILE] moe sorting optimize local_token ( #2469 )
...
* fix bug in loops that need use local tokens to compute
* support extra chain local_token
* update
* update
* refine some main
* update
* support dispatch_policy
* fix 15 example
[ROCm/composable_kernel commit: cfe211cc60 ]
2025-07-15 09:42:18 +08:00
Po Yen Chen
7001322416
[CK_TILE] Fix compilation errors introduced in #2320 , #2219 and #2214 ( #2388 )
...
* Fix compilation errors
* Fix more ck_tile example compilation errors
[ROCm/composable_kernel commit: 7d669440a6 ]
2025-06-23 12:29:15 +08:00
carlushuang
f540c6ccb4
[CK_TILE] moe_sorting support "local_tokens" feature for EP case ( #2335 )
...
* support local_token for hipgraph
* update README
* fix comment
* fix fmoe example
[ROCm/composable_kernel commit: a4e1248dba ]
2025-06-18 10:49:43 +08:00
Satyanvesh Dittakavi
bde406245a
Do not use warpSize as compile time constant as it is removed ( #2320 )
...
* Do not use warpSize as compile time constant as it is removed
* Update tile_image_to_column_shape.hpp
update warpSize usage.
* clean-up all use of warpSize, make sure code builds
* fix
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com >
Co-authored-by: illsilin <Illia.Silin@amd.com >
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com >
[ROCm/composable_kernel commit: 4c57157d50 ]
2025-06-17 11:54:30 -07:00
carlushuang
a7eb83a51b
[CK_TILE] moe sorting optimization : refactor subtoken logic to let more kernel pickup mp kernel ( #2327 )
...
* refactor subtoken logic to let more kernel pickup mp kernel
* typo
[ROCm/composable_kernel commit: 8aff45a8af ]
2025-06-12 11:44:22 +08:00
carlushuang
652263d073
[CK_TILE] optimize moe sorting kernel, boost large context case up to 20x ( #2153 )
...
* combine 2-3 as single stage
* support zeroing
* improve long tokens
* update specialization
* b16 ws
* 8bit topk optimize
* update 15 example
[ROCm/composable_kernel commit: 4e9b76f88c ]
2025-05-06 17:32:07 +08:00
felix
20ffa0f474
hotfix fix sorting int64 ( #2025 )
...
* fix sorting int64
* clang format
* fix example issue
* update WA issue #
---------
Co-authored-by: coderfeli <coderfeli@163.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
[ROCm/composable_kernel commit: a82f338fb9 ]
2025-03-28 11:31:52 +08:00
Illia Silin
9d24409070
Replace buffer load/store intrinsics with builtins ( #1876 )
...
* replace buffer load/store intrinsics with builtins
* fix clang format
* replace buffer load/store intrinsics with built-ins in ck_tile
* fix clang format
* add switch between buffer intrinsics and built-ins
* change the builtins threshold to clang20
* fix clang format
* fix some compilation errors
* revert changes in ck_tile
* revert changes in ck_tile
* delete all root files and folders when CI completes
* try changing the username in CI
* fix groovy syntax
* add user and group id info to ci dockers
* change ownership of all files in CI to jenkins at the end
* update changelog
[ROCm/composable_kernel commit: a88bf76ecc ]
2025-03-05 14:33:28 -08:00
carlushuang
1d32e34075
[CK_TILE] add moe-sorting MP kernel ( #1910 )
...
* moe sorting ex
* fix bug for race condition
* fix bug and optimize large expert
* fix
* optimize with sub_token_oneshot
* support skip empty tokens for expert sorting
* update moe_sorting
* tidy code
* support mp kernel
* hint mp
* remove useless code
* porting to example 15
---------
Co-authored-by: valarLip <340077269@qq.com >
[ROCm/composable_kernel commit: 353a612b44 ]
2025-02-25 17:56:55 +08:00
carlushuang
ad346270c2
[CK_TILE] moe sorting ex kernel to support expert > 128 ( #1840 )
...
* moe sorting ex
* fix bug for race condition
* fix bug and optimize large expert
* fix
* optimize with sub_token_oneshot
* support skip empty tokens for expert sorting
* update moe_sorting
* tidy code
[ROCm/composable_kernel commit: c0adab4850 ]
2025-02-11 17:49:17 +08:00
carlushuang
21264b4e60
[CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block ( #1808 )
...
* fix mock token id
* prepare host for g1u1
* reformat inline-asm
* restructure uk_0
* restructure gate_up
* done
* change default to init=1
* update readme
* fix a bug in interleave pipeline
* rcp for silu
[ROCm/composable_kernel commit: 1ff50e78c6 ]
2025-01-16 17:51:10 +08:00
carlushuang
edccbb3694
[CK_TILE] optimize moe-sorting kernel ( #1771 )
...
* opt moe sorting
* remove commented code
[ROCm/composable_kernel commit: 3d15f364b3 ]
2024-12-23 10:59:02 +08:00
Xu, Shengnan
8a566c94b4
added moe interleaving pipeline ( #1712 )
...
* added moe interleaving pipeline
* remove redundant code
* formatter
---------
Co-authored-by: root <root@hjbog-srdc-14.amd.com >
[ROCm/composable_kernel commit: f57d720c67 ]
2024-12-15 20:13:10 +08:00
carlushuang
74b0db75f7
[CK_TILE] fused-moe first version ( #1634 )
...
* moe pipeline
* update code
* compile OK
* update
* update cpu reference
* update pipeline_gemm0
* compiler ok
* update pipeline
* rename to ex pipeline
* block-asm
* update
* update
* update first gemm ok
* compute correct
* update file structure
* update README
* update
* update
* update code
* update API
* return unsupported case
* add comment
* update readme
* update
* uncomment
* update
* fix build err
---------
Co-authored-by: valarLip <340077269@qq.com >
[ROCm/composable_kernel commit: 440e28b08f ]
2024-11-26 11:14:56 +08:00
carlushuang
d6ab951548
[CK_TILE]Moe update index ( #1672 )
...
* update MOCK_ID for moe-sorting
* add moe-smoothquant
* update a comment
* fix format
* hot fix
* update topk in overflow case
* update comments
* update bf16 cvt
---------
Co-authored-by: valarLip <340077269@qq.com >
[ROCm/composable_kernel commit: 36c7ce4e0e ]
2024-11-25 13:12:35 +08:00
dummycoderfe
77f0f4ee48
Ck tile/moe sorting ( #1624 )
...
* add moe_sorting & check ok
* fix comments & typo
* Run remod.py under include/ck_tile & example/ck_tile directories
* format codes
* fix output ci check bug
* fix moe sorting readme and error commit file
* use magic div to accelerate compute
* add a loop unroll for moe lds ops
* add extblocksnel to set zeros for moebufs
* [Ck_tile] moe set zero run ok, add size check and fix ref check
* [Ck_tile]fix moe_sorting fuse set_zero remod
* [Ck_tile] change name style, fix zero buffer size err, change folder
* [Ck_tile] moe_sorting: fix name style
* [Ck_tile] moe_sorting, remove useless params in traits
* [Ck_tile] change outputtile cnt * unit_size; change output buf alloc
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com >
Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com >
Co-authored-by: carlushuang <carlus.huang@amd.com >
[ROCm/composable_kernel commit: bec6fbc65f ]
2024-11-09 17:57:27 +08:00