Commit Graph

3879 Commits

Author SHA1 Message Date
Khushbu Agarwal
2498b499a1 [CK_TILE] Adding support for TiledPermuteN on preshuffle Block Scale Gemm (#3019)
* Adding support for TiledPermuteN

* Adding test

* resolving remod.py

---------

Co-authored-by: root <root@banff-cyxtera-s73-2.ctr.dcgpu>

[ROCm/composable_kernel commit: 0584399571]
2025-10-24 11:06:51 -07:00
assistant-librarian[bot]
0d4c6c2c13 Merge commit 'f39626fcf72d0188946040fe6441437415707343' into develop 2025-10-24 16:13:23 +00:00
Max Podkorytov
99ad6f60e4 [CK][host] limit the rotating count to prevent oom (#3089)
* [CK][host] limit the rotating count to prevent oom

* add numeric header for accumulate

[ROCm/composable_kernel commit: f39626fcf7]
2025-10-24 08:55:54 -07:00
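Assume the rotating count is the number of buffer copies cycled through between timed runs, used to avoid measuring warm-cache behavior. Under that assumption, the OOM guard in #3089 can be pictured as a memory-budget cap. The sketch below is hypothetical: `capped_rotating_count`, its parameters, and the byte budget are illustrative, not CK's host API. It shows why summing the per-buffer footprint with `std::accumulate` (hence the `<numeric>` include) is natural here.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not CK's API): cap how many rotating copies of a kernel's
// buffers are allocated so the total stays under a device-memory budget.
std::size_t capped_rotating_count(const std::vector<std::size_t>& buffer_bytes,
                                  std::size_t requested_count,
                                  std::size_t budget_bytes)
{
    // Bytes needed for one full copy of every buffer.
    const std::size_t one_copy =
        std::accumulate(buffer_bytes.begin(), buffer_bytes.end(), std::size_t{0});
    if(one_copy == 0)
        return requested_count;
    // Always keep at least one copy so the kernel can still run.
    const std::size_t fits = std::max<std::size_t>(1, budget_bytes / one_copy);
    return std::min(requested_count, fits);
}
```

For example, three buffers totalling 1 GiB with a 4 GiB budget allow at most 4 copies, even if 16 were requested.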
Max Podkorytov
c67f3501b0 limit the rotating count to prevent oom (#3087)
[ROCm/composable_kernel commit: fdcc1f75c3]
2025-10-24 08:55:34 -07:00
assistant-librarian[bot]
2550111808 Merge commit '775b96ea6a8bb0d82d635dc1a396c8d98091c832' into develop 2025-10-24 15:12:08 +00:00
andrew clark
07d67497ff Fixing Run CI Check for Changed Files (#3072)
* Fixing check for changed files

* Testing CI skip behavior

* Testing CI Trigger

This should skip CI

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 775b96ea6a]
2025-10-24 07:52:43 -07:00
kyle-256
c4448c9d7c [CK_TILE] add tensorwise quant in grouped gemm (#3007)
* add tensorwise quant in grouped gemm

* fix example issue

* update test cases

* format codes

* clang format

* use GTEST_FAIL

* fix a bug in test_grouped_gemm_util

* skip test when using WMMA on grouped_quant kernel

* change cmake

* change code based on comments

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 3c12a02827]
2025-10-24 07:41:54 -07:00
assistant-librarian[bot]
52434da15a Merge commit '6bbc05e1bd1f1dd1bcc61a1e815f470cd4c9ac7f' into develop 2025-10-24 09:13:29 +00:00
yinglu
6a7861bbec conv:tf32:add missed instances (#3081)
* conv:tf32:add missed instances

[ROCm/composable_kernel commit: 6bbc05e1bd]
2025-10-24 16:28:36 +08:00
assistant-librarian[bot]
9fde1d98ad Merge commit 'd0364641ed7f7520ed0163e4768d900b8c07af7a' into develop 2025-10-23 20:13:04 +00:00
Robin Voetter
e316ba18ed [CK_BUILDER] old ck build fixes (#3075)
* Disable c++20-compat warnings when building old CK in C++20 mode

It turns out that this mode produces spurious warnings.

* ck-builder: add missing layouts and element-wise op names

For layouts, we can directly use the ::name attribute, which should
cover all layouts. For element-wise ops, I just added the ones which
are currently missing when compiling CK with -DMIOPEN_REQ_LIBS_ONLY.

[ROCm/composable_kernel commit: d0364641ed]
2025-10-23 13:01:19 -07:00
Thrupti Raj Lakshmana Gowda
96942c824f Excluding Tile engine from build (#3085)
[ROCm/composable_kernel commit: 0fd7d1a607]
2025-10-23 12:57:18 -07:00
Geo Min
0e6a5289fa adding commit hash (#3084)
[ROCm/composable_kernel commit: 2546fc241e]
2025-10-23 12:32:26 -07:00
assistant-librarian[bot]
8505bc05c9 Merge commit 'fe4eaeb2eb28088e07d7c7e5f8bd7499831a427c' into develop 2025-10-23 19:11:30 +00:00
Yi DING
5338925d70 Use filename but not path to filter compilation (#3083)
* prologue

* Use filename but not path to filter test compilation

[ROCm/composable_kernel commit: fe4eaeb2eb]
2025-10-23 12:01:26 -07:00
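Filtering on the filename rather than the full path matters because a substring filter applied to a path can also match directory components. This C++17 sketch uses `std::filesystem` only to illustrate that pitfall. It does not reproduce the project's actual filtering code, and `matches_filter` / `matches_filter_by_path` are hypothetical names.

```cpp
#include <filesystem>
#include <string>

// Hypothetical filter: select a source file only if its *filename* contains the pattern.
bool matches_filter(const std::filesystem::path& source, const std::string& pattern)
{
    const std::string name = source.filename().string();
    return name.find(pattern) != std::string::npos;
}

// For contrast: matching the whole path also hits directory names, so pattern
// "gemm" would select every file under a ".../gemm/..." directory.
bool matches_filter_by_path(const std::filesystem::path& source, const std::string& pattern)
{
    return source.string().find(pattern) != std::string::npos;
}
```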
assistant-librarian[bot]
0bd24dbfbf Merge commit 'bedade257241fef37a28c6e540e73f1c056d27b9' into develop 2025-10-23 18:15:09 +00:00
Gino Lu
d6933e661d [CK_TILE] Add fp4 warp gemm 16x16x128 (#2738)
* first commit

* fix format error

* fix vec size error

* fix clang format

* fix type error

* add interface in warp_gemm_impl

* fix interface

* fix bug

* fix bug

---------

Co-authored-by: asleepzzz <hanwen.chang@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: bedade2572]
2025-10-23 10:55:51 -07:00
Rostyslav Geyyer
1df6f6af8e Rearrange pointers to fix the reinterpret_cast issue (#3077)
[ROCm/composable_kernel commit: 6df69abeef]
2025-10-23 10:54:13 -07:00
Qianfeng
6ad906b040 [CK_TILE] Fix in set_slice_tile (#2232)
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: fbd101b1ac]
2025-10-23 10:34:02 -07:00
assistant-librarian[bot]
6c1c433260 Merge commit 'b9789a0742e4623a109472fad567ccea14c7ed89' into develop 2025-10-23 07:13:33 +00:00
Michal Kulikowski
c37371e3ef [CK][Examples] Fix stride issues in CK examples with a workaround that bypasses hostTensor validation.
Signed-off-by: Michal Kulikowski <Michal.Kulikowski@amd.com>


[ROCm/composable_kernel commit: b9789a0742]
2025-10-23 08:46:02 +02:00
assistant-librarian[bot]
32b15133e0 Merge commit '0d3860dfdb3299dea139953c3ce62da5325019c6' into develop 2025-10-23 01:39:45 +00:00
Haocong WANG
895983c816 [CKTILE] FMHA fwd trload lse fix (#3046)
* enable storelse for fmha_fwd_trload kernel

* fix lse in trload

* fix the mask related bug

[ROCm/composable_kernel commit: 0d3860dfdb]
2025-10-23 09:33:33 +08:00
assistant-librarian[bot]
c4c504b867 Merge commit '1b95803431d50361d22c3b76c4caf6608e83069d' into develop 2025-10-22 20:13:26 +00:00
spolifroni-amd
7c14d97d0e updated the changelog with 7.1 and beyond info
[ROCm/composable_kernel commit: 1b95803431]
2025-10-22 13:35:45 -06:00
assistant-librarian[bot]
898ae9c620 Merge commit '211d64e18a1bf2ecb1d13c5eb87983bdcabb3b5e' into develop 2025-10-22 15:12:27 +00:00
lalala-sh
0329d71fb9 [CK_TILE] Update flatmm related kernels (#3022)
---------

Co-authored-by: Ding, Yi <yi.ding@amd.com>
Co-authored-by: felix <felix.li@amd.com>

[ROCm/composable_kernel commit: 211d64e18a]
2025-10-22 22:36:11 +08:00
assistant-librarian[bot]
2934bb0489 Merge commit 'cbd1279ae68d8b463b9b20106e968f8ccf2a11e6' into develop 2025-10-22 12:17:24 +00:00
Johannes Graner
a6c3252766 [CK_TILE] Conv bwd splitN support (#3047)
* Conv bwd splitN support

* Adjust splitting calculations to lengths format

* Prepare indexing for future splitK support

[ROCm/composable_kernel commit: cbd1279ae6]
2025-10-22 13:34:06 +02:00
assistant-librarian[bot]
2708c12866 Merge commit '5a27a97391d08652c3da0a5347209c19d3ebb03d' into develop 2025-10-22 07:14:09 +00:00
MHYangAMD
f23b8cde7b Introduce tree reduction for BlockReduce2dCrossWarpSync (#2588)
* Introduce tree reduction for BlockReduce2dCrossWarpSync

* Rename original impl to BlockReduce2dLinearCrossWarpSync

* Replace warp_size with get_warp_size()

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 5a27a97391]
2025-10-22 14:41:35 +08:00
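The difference between the original linear combine (renamed BlockReduce2dLinearCrossWarpSync) and the new tree reduction can be modeled on the host. This is an illustrative model only: each warp's partial result is one vector element, and `linear_reduce` / `tree_reduce` are hypothetical names, not CK functions. A real kernel must also exchange partials between warps and synchronize between rounds.

```cpp
#include <cstddef>
#include <vector>

// Linear cross-warp combine: fold every warp's partial in order.
// Needs num_warps - 1 dependent steps.
template <typename T, typename Op>
T linear_reduce(const std::vector<T>& partials, Op op)
{
    T acc = partials.at(0);
    for(std::size_t i = 1; i < partials.size(); ++i)
        acc = op(acc, partials[i]);
    return acc;
}

// Tree cross-warp combine: in each round, slot i absorbs slot i + stride.
// Combines within one round are independent (parallelizable), so only
// ceil(log2(num_warps)) dependent rounds are needed. Requires an associative op.
template <typename T, typename Op>
T tree_reduce(std::vector<T> partials, Op op)
{
    const std::size_t n = partials.size();
    for(std::size_t stride = 1; stride < n; stride *= 2)
        for(std::size_t i = 0; i + stride < n; i += 2 * stride)
            partials[i] = op(partials[i], partials[i + stride]);
    return partials.at(0);
}
```

Both produce the same result for an associative operator. The tree version trades a few extra index computations for a shorter dependency chain.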
assistant-librarian[bot]
5fbbd3eed7 Merge commit '37dff024c1d2c6420a91d9a4b0801b350db3eede' into develop 2025-10-22 04:13:42 +00:00
John Shumway
a488126d3e [CK_BUILDER] Add compile-time reflection for a convolution instance (#3065)
* [CK_BUILDER] Add compile-time reflection for a convolution instance

Introduce InstanceTraits template metaprogramming framework to enable runtime introspection of device kernel template parameters without requiring implementation knowledge. This reflection system extracts configuration details (block sizes, data types, layouts, tuning parameters) directly from kernel specializations through template pattern matching. In particular, the GetInstanceString method returns a string that uniquely identifies the kernel by explicitly serializing all template parameter values.

This provides critical functionality for MIOpen integration, since the existing GetTypeString method is ambiguous and captures only some of the template parameters.

The implementation uses a two-level design: a primary InstanceTraits template declaration in instance_traits.hpp serves as the interface, while kernel-specific specializations (e.g., for DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3) provide the actual extraction logic. This separation allows the reflection system to scale to additional kernel types without modifying the core interface.

Key architectural decisions:

- Forward-declare device kernels in instance_traits.hpp to avoid circular dependencies, since device implementation headers will include the reflection headers

- Use compile-time constants and type aliases to expose kernel parameters, enabling zero-overhead introspection

- Provide a templated instance_string() function that generates human-readable kernel configuration strings by serializing all template parameters in order, useful for debugging and kernel identification

- Guard reflection integration with preprocessor definition CK_EXPERIMENTAL_BUILDER to keep it opt-in until the API stabilizes

- Add GetInstanceString() virtual method to BaseOperator, allowing runtime polymorphic access to compile-time kernel information

This infrastructure also enables upcoming higher-level semantic reflection abstractions (like ConvTraits) to query kernel configurations programmatically.

Includes unit tests validating both the trait extraction accuracy and the string generation format.

[ROCm/composable_kernel commit: 37dff024c1]
2025-10-21 21:10:19 -07:00
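The two-level design has two parts: an interface-only primary template, plus per-kernel specializations that pattern-match the kernel's template arguments. A minimal, self-contained sketch of that pattern follows. Everything in it is hypothetical. `DeviceConvFwdExample`, its four parameters, `type_name`, and the output format stand in for the real DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3 and CK's actual serialization, which cover many more parameters.

```cpp
#include <sstream>
#include <string>

// Forward declaration of a hypothetical device kernel: the traits header never
// needs the kernel's definition, which avoids circular includes.
template <int BlockSize, int MPerBlock, int NPerBlock, typename DataType>
struct DeviceConvFwdExample;

// Primary template: declared only; each kernel family supplies a specialization.
template <typename Instance>
struct InstanceTraits;

// Hypothetical helper mapping element types to printable names.
template <typename T> constexpr const char* type_name();
template <> constexpr const char* type_name<float>() { return "fp32"; }
template <> constexpr const char* type_name<double>() { return "fp64"; }

// Specialization: pattern-matches the kernel's template arguments and exposes
// them as compile-time constants and type aliases.
template <int BlockSize, int MPerBlock, int NPerBlock, typename DataType>
struct InstanceTraits<DeviceConvFwdExample<BlockSize, MPerBlock, NPerBlock, DataType>>
{
    static constexpr int block_size  = BlockSize;
    static constexpr int m_per_block = MPerBlock;
    static constexpr int n_per_block = NPerBlock;
    using data_type                  = DataType;

    // Serialize every template parameter, in order, into one identifying string.
    static std::string instance_string()
    {
        std::ostringstream os;
        os << "DeviceConvFwdExample<" << block_size << ',' << m_per_block << ','
           << n_per_block << ',' << type_name<data_type>() << '>';
        return os.str();
    }
};

// Free-function entry point, similar in spirit to the templated instance_string().
template <typename Instance>
std::string instance_string()
{
    return InstanceTraits<Instance>::instance_string();
}
```

Because the kernel type is only named, never instantiated, this compiles even though `DeviceConvFwdExample` is never defined.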
assistant-librarian[bot]
6ecded14e2 Merge commit '3a28632b203f9219ed4906d46457872ef1084054' into develop 2025-10-21 14:13:05 +00:00
Bartłomiej Kocot
ebd8495721 Gridwise gemm conv v3 force padded layout on gfx950 (#2961)
* Gridwise gemm conv v3 force padded layout on gfx950

* fix bug in other gridwise

* fix

* Update gridwise_gemm_wmma_cshuffle_v3_common.hpp

[ROCm/composable_kernel commit: 3a28632b20]
2025-10-21 15:41:02 +02:00
assistant-librarian[bot]
c8e373c4ab Merge commit '35754d2ec817087a2a7de53729f2a97c7c9f05fa' into develop 2025-10-21 13:22:20 +00:00
Yashvardhan Agarwal
12e9bcd7e2 fix identity value of AbsMax (#3058)
* fix identity value of AbsMax

- The identity value of AbsMax should be 0, not numeric<T>::lowest()

* Update include/ck_tile/core/utility/reduce_operator.hpp

resolved comment

Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>

---------

Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>

[ROCm/composable_kernel commit: 35754d2ec8]
2025-10-21 14:42:08 +02:00
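The reasoning behind this fix is that AbsMax combines values of the form |x|, which are never negative. As a result, max(0, |x|) = |x| for every x, and 0 is a valid identity. Starting from lowest() gives the same answer whenever at least one element is reduced. A reduction over zero elements, however, would return a large negative number that no |x| can equal. The sketch below is a hypothetical stand-in (`AbsMaxOp`, `reduce_with_identity`), not ck_tile's reduce_operator.hpp.

```cpp
#include <cmath>
#include <vector>

// Hypothetical absolute-max reduce operator (not ck_tile's code).
struct AbsMaxOp
{
    // Identity: max(0, |x|) == |x| for all x. Using numeric_limits<T>::lowest()
    // here would make an empty reduction return a negative value.
    template <typename T>
    static constexpr T identity() { return T{0}; }

    template <typename T>
    T operator()(T acc, T x) const
    {
        const T ax = std::abs(x);
        return acc > ax ? acc : ax;
    }
};

// Generic reduction seeded with the operator's identity element.
template <typename T, typename Op>
T reduce_with_identity(const std::vector<T>& xs, Op op)
{
    T acc = Op::template identity<T>();
    for(const T& x : xs)
        acc = op(acc, x);
    return acc;
}
```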
assistant-librarian[bot]
e237b82762 Merge commit '4043401db186ee006f14fb00842af29c194ba209' into develop 2025-10-21 08:15:24 +00:00
Johannes Graner
0be14218d4 Fix race conditions in ck_tile remod (#3061)
[ROCm/composable_kernel commit: 4043401db1]
2025-10-21 09:35:04 +02:00
assistant-librarian[bot]
d3658e9aa2 Merge commit 'ff6efa2fb17db0266b0ff2fa531ffc9fad31b0cc' into develop 2025-10-21 03:28:40 +00:00
Max Podkorytov
eecc99e83d refine
[ROCm/composable_kernel commit: ff6efa2fb1]
2025-10-20 23:13:58 -04:00
Max Podkorytov
983a221831 update build instructions
[ROCm/composable_kernel commit: b9e966e574]
2025-10-20 23:13:58 -04:00
assistant-librarian[bot]
9d8f10c9f3 Merge commit 'e20923f384492dab3dafdbace6f2bd2b45186cc2' into develop 2025-10-21 02:41:02 +00:00
Yi DING
0c61d0da8d [CK_TILE] Add fmt: skip to FMHA codegen scripts for readability (#3057)
* fmt: skip for fmha_bwd.py

* more fmt: skip

* thank you, copilot

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: e20923f384]
2025-10-21 10:15:04 +08:00
assistant-librarian[bot]
93cc3fd985 Merge commit '2570462ecf46b51267548d41eb749c67a52d6085' into develop 2025-10-20 21:11:26 +00:00
Max Podkorytov
df3f347a27 [CK_TILE] Fix transpose_vectors for 2x2 8-bit tiles (#3042)
fix transpose_vectors logic for 2x2 8-bit tiles

add a test which goes through this code path.

factor out constexpr'd cases into smaller functions.

add inline docs about the data movement

impact: gemms with 8-bit non-rcr inputs on gfx942


[ROCm/composable_kernel commit: 2570462ecf]
2025-10-20 13:40:44 -07:00
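The data movement for a 2x2 tile of 8-bit values can be modeled on the host with byte masks and shifts. This is a sketch under stated assumptions. The packing layout (each row in one 16-bit word, element 0 in the low byte) and `transpose_2x2_u8` are hypothetical, and the code does not reproduce CK_TILE's actual transpose_vectors.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 2x2 transpose of 8-bit elements, each row packed into a 16-bit word
// (element 0 in the low byte, element 1 in the high byte).
// Input rows:  r0 = [a0, a1], r1 = [b0, b1]
// Output cols: c0 = [a0, b0], c1 = [a1, b1]
std::array<std::uint16_t, 2> transpose_2x2_u8(std::uint16_t r0, std::uint16_t r1)
{
    // c0: low byte of r0 stays low, low byte of r1 moves up to the high byte.
    const auto c0 = static_cast<std::uint16_t>((r0 & 0x00FFu) | ((r1 & 0x00FFu) << 8));
    // c1: high byte of r0 moves down to the low byte, high byte of r1 stays high.
    const auto c1 = static_cast<std::uint16_t>((r0 >> 8) | (r1 & 0xFF00u));
    return {c0, c1};
}
```

Applying the transpose twice returns the original rows, which makes a convenient round-trip check in a test.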
assistant-librarian[bot]
156cfffbc6 Merge commit '9f770610948b2666cc021e8ae6955821caad7791' into develop 2025-10-20 16:13:25 +00:00
Thrupti Raj Lakshmana Gowda
09acf06d06 [CK TILE ENGINE] Code changes to finding GPU id from TARGET (#3055)
* Reading gpuname from target for gemm in ck tile engine

* Reading gpuname from target for gemm preshuffle in ck tile engine

* Reading gpuname from target for gemm preshuffle in ck tile engine

* Get GPU changes for GEMM Multi-D in TILE ENGINE

* Addressing errors for gpu name in cktileengine

[ROCm/composable_kernel commit: 9f77061094]
2025-10-20 09:02:18 -07:00
assistant-librarian[bot]
d0b980ba30 Merge commit 'f18b79f328df35e2305416b890dbb9eb561fa9e2' into develop 2025-10-20 15:12:34 +00:00
John Shumway
f57d4937c6 [CK_BUILDER] Add experimental builder directory and configuration for composable_kernel (#3043)
Add experimental builder infrastructure for composable_kernel

- Add experimental/builder directory with README documentation.
- Create initial test infrastructure with CMakeLists.txt and placeholder test.
- Update root CMakeLists.txt to support CK_EXPERIMENTAL_BUILDER option.
- Update .gitignore to not treat `experimental/builder` as a CMake build directory.

This establishes the directory structure for a high-level builder pattern that will provide a semantically-clear interface for constructing CK operations, with initial focus on convolution kernels for MIOpen integration.


[ROCm/composable_kernel commit: f18b79f328]
2025-10-20 07:54:09 -07:00