Commit Graph

4046 Commits

Author SHA1 Message Date
Aviral Goel
d810876d63 feat(precommit-hooks): add check for correct copyright header (#3302)
* chore(copyright): update copyright header for left files

* feat(copyright): add copyright check to precommit hooks

* chore(copyright): update copyright header for include/ck_tile directory

* chore(copyright): update copyright header for example directory

* chore(copyright): update copyright header for .github directory

* refactor: copyright_check script with better if else handling

* chore(copyright): update copyright header for remaining files

* feat: add script to automate copyright addition

[ROCm/composable_kernel commit: 6d25525adc]
2025-12-10 22:50:43 -08:00
Aviral Goel
f38b64ae67 docs: add notes on tile distribution and inline comments (#3297)
* docs: add notes on tile distribution and inline comments

* Apply suggestions from code review

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: fbbdd36ea8]
2025-12-10 22:47:19 -08:00
assistant-librarian[bot]
72cc7dfc77 Merge commit '8270900d606398868e747b7f9097484ee73a4cb4' into develop 2025-12-11 01:41:32 +00:00
Geo Min
f2a77cf0bd [ci] Bumping TheRock commit hash (#3385)
* Bumping TheRock commit hash

* new docker hash

* Using new runner name

[ROCm/composable_kernel commit: 8270900d60]
2025-12-10 17:34:41 -08:00
assistant-librarian[bot]
988b7e109d Merge commit '15ed65db35e6702593cd8ed1d603222fb11684e4' into develop 2025-12-10 21:13:52 +00:00
John Shumway
c868964f6a Improve sequence sorting and add unit tests (#3376)
Old sequence sort code was showing up on build profiles. Convert it to constexpr functions for much more efficient build-time execution. The sorting is still O(N^2), but our sequences are small enough that it executes quickly. This reduced compilation time of a small convolution by more than 10% and overall time spent in the compiler on a narrow build by 6%.

[ROCm/composable_kernel commit: 15ed65db35]
2025-12-10 12:25:23 -08:00
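As a rough illustration of the idea in c868964f6a (sorting small sequences in constexpr functions), the sketch below sorts a `std::array` at compile time with an O(N^2) insertion sort. The name `sorted_copy` and the use of `std::array` are assumptions made for this example; the commit's actual code and CK's sequence types are not shown on this page.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not the CK API: returns a sorted copy of a small
// array. Insertion sort is O(N^2), which is acceptable for short sequences,
// and the whole function can be evaluated by the compiler.
template <typename T, std::size_t N>
constexpr std::array<T, N> sorted_copy(std::array<T, N> values)
{
    for(std::size_t i = 1; i < N; ++i)
    {
        T key         = values[i];
        std::size_t j = i;
        // Shift larger elements one slot to the right.
        while(j > 0 && key < values[j - 1])
        {
            values[j] = values[j - 1];
            --j;
        }
        values[j] = key;
    }
    return values;
}

// The sort runs entirely at compile time; nothing is left for run time.
constexpr std::array<int, 5> kInput{4, 1, 3, 0, 2};
constexpr std::array<int, 5> kSorted = sorted_copy(kInput);

static_assert(kSorted[0] == 0 && kSorted[1] == 1 && kSorted[2] == 2 &&
                  kSorted[3] == 3 && kSorted[4] == 4,
              "sorted_copy must produce ascending order");
```

This needs C++17 or later, where the non-const `std::array::operator[]` is usable in constant expressions.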
assistant-librarian[bot]
9daab9664d Merge commit 'b15df372553e0f80a660124f1b558d9cb276bd08' into develop 2025-12-10 16:15:45 +00:00
Po Yen Chen
737c80d47d fix: python 3.8 compatibility in fmha codegen (#3388)
[ROCm/composable_kernel commit: b15df37255]
2025-12-10 07:08:41 -08:00
assistant-librarian[bot]
636a90b531 Merge commit 'fc22320d783a6b73798a23d8d20fb24e3a5e4040' into develop 2025-12-10 08:15:28 +00:00
Ville Pietilä
d719c09343 [CK_TILE] Split-K autodeduction (#3351)
* First version of split-K autodeduction.

* Fix circular dependency and kernel construction.

* Fix tolerance calculation for bwd weight example.

* Simplify kernel construction.

* Fix kernel launching bug for split-K autodeduce.

* Add split-K autodeduction support for the two stage example.

* Fix a corner case.

* Fix clang-format.

* Fix clang-format for inc files.

* Add missing header.

* Prevent too large split-K values.

* Fix formatting.

* Add unit tests for IsSupportedArgument in grouped bwd conv.

* clang-format.

* Fix merge conflicts.

* Address feedback from code review.

* clang-format

* Fix new tests after merge.

---------

Co-authored-by: Ville Pietilä <>

[ROCm/composable_kernel commit: fc22320d78]
2025-12-10 09:30:30 +02:00
assistant-librarian[bot]
490d6daf13 Merge commit '1aa93ef551a31405aef5c8c14e869241ba96639d' into develop 2025-12-10 02:46:30 +00:00
Zzz9990
822da5d3a7 [CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377)
* update coherence
---------

Co-authored-by: Zzz9990 <Zzz9990>

[ROCm/composable_kernel commit: 1aa93ef551]
2025-12-10 10:03:28 +08:00
assistant-librarian[bot]
dfeb7a11b9 Merge commit '934ba1208ab7cfc82c20f73b14994b64c3843d2d' into develop 2025-12-09 23:12:58 +00:00
Illia Silin
ee0d92f8fc use hipTensor from monorepo for daily builds (#3386)
[ROCm/composable_kernel commit: 934ba1208a]
2025-12-09 14:39:08 -08:00
assistant-librarian[bot]
d6fe69e6fd Merge commit '0d8259affd4f59eb8b1143b658d83d3800270f43' into develop 2025-12-09 20:14:23 +00:00
Illia Silin
5f4c14b336 temporarily disable daily builds on gfx1010 and gfx908 (#3384)
[ROCm/composable_kernel commit: 0d8259affd]
2025-12-09 10:37:13 -08:00
assistant-librarian[bot]
636bc57ab2 Merge commit '7582c9e73fc3e580a2255988310cb25391f80162' into develop 2025-12-09 16:14:29 +00:00
Illia Silin
cdacf1d5f5 Upgrade to ROCm7.1.1 as default compiler. (#3370)
* upgrade to rocm7.1.1 as new default compiler

* fix jenkinsfile

[ROCm/composable_kernel commit: 7582c9e73f]
2025-12-09 07:35:32 -08:00
dependabot[bot]
821b976ead Bump rocm-docs-core[api_reference] from 1.20.1 to 1.31.0 in /docs/sphinx (#3374)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.20.1 to 1.31.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.31.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.20.1...v1.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 50ca3f83eb]
2025-12-09 07:10:34 -08:00
assistant-librarian[bot]
1e173e2ab9 Merge commit '6f0966e1e9fca5c513d16a729237d676b583e266' into develop 2025-12-09 10:14:29 +00:00
lalala-sh
77f9a0a615 fix a16w4 moe bugs (#3373)
* fix valid mask bug

* update format

[ROCm/composable_kernel commit: 6f0966e1e9]
2025-12-09 17:54:55 +08:00
assistant-librarian[bot]
260bbb49fb Merge commit 'c1c2e41a0387e8e76970ad86959e28963f569d54' into develop 2025-12-09 03:37:16 +00:00
Yi DING
b726f9606c [CK_TILE] Generate random tensor values with multiple threads (#3324)
[ROCm/composable_kernel commit: c1c2e41a03]
2025-12-09 11:02:33 +08:00
assistant-librarian[bot]
375e499d10 Merge commit 'c363a98d4154c647c1a2d5331ad0d76879b84dfa' into develop 2025-12-08 21:13:22 +00:00
Sami Remes
b85cf9d37c [CK_TILE] Support more layouts for BQuant GEMM (#3349)
* WIP: preparing to add transpose bq support

* WIP: handle both row/col layout for BQ windows/tile dstr

* Fix build

* WIP: adding some test, debugging numerical errors

* Fix all but pkint4 tests

* Remove test_gemm_quant_typed.cpp again

* update disabled tests

* add conversion from pkint4 for b matrix

* fix formatting

* fix formatting

* Fix tr_load and use override b datatype for clarity

* fix formatting

* make bquant preshuffle tests bqlayout column-major

[ROCm/composable_kernel commit: c363a98d41]
2025-12-08 13:05:56 -08:00
Erwin Terpstra
7e54399be4 [CK Tile] Grouped GEMM aquant mode and non-persistent kernel (#3337)
* wip: add aquant to grouped gemm quant example

* fix: properly handle hot loop count in aquant pipeline

* fix: add separate GemmConfig structs for AQuant, automatically select the correct one

* feat: finish support for a non-persistent kernel invocation for grouped gemm quant, and add support code to example

* refactor: cleaned up grouped gemm quant example a bit by reusing pipeline selection logic

* chore: add warp gemm dispatchers for a couple of TransposeC K=32 variants

* feat: add quant grouped gemm tests cases for aquant (regular and transpose C) and non-persistent kernel

* fix: update base pipeline classes according to changes in develop branch

* Revert "chore: add warp gemm dispatchers for a couple of TransposeC K=32 variants"

This reverts commit b3fd4d326d.

* feat: remove aquant config from grouped gemm quant example, update to add persistency as runtime parameter

* chore: removed work-around for aquant bug that has been fixed

* chore: fix typo in command-line parameters

* fix: correct K warp tile size for gfx950

* chore: incorrect warp tile configuration on gfx942

[ROCm/composable_kernel commit: fe07b5a1bf]
2025-12-08 12:19:22 -08:00
assistant-librarian[bot]
564276eff9 Merge commit 'ca6143f0b2237a1af80ef5550f1b774fd463676d' into develop 2025-12-08 17:14:48 +00:00
Anton Gorenko
9cb42b092a Add a workaround for a compiler issue for bwd on gfx90a and ROCm 7.1.1 (#3369)
Sometimes there are not enough wait-states between v_mfma_f32... and v_accvgpr_read_b32 instructions if they are separated by s_cbranch.
The workaround is to read accvgprs to vgpr before branching.

[ROCm/composable_kernel commit: ca6143f0b2]
2025-12-08 07:44:17 -08:00
assistant-librarian[bot]
f1f46d5f75 Merge commit '878b4e7f46d7e47618f4d860d71b438cb6d992fd' into develop 2025-12-08 12:18:59 +00:00
Yi DING
e63ba15ae2 [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2 (#3287)
* [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2

* typo

[ROCm/composable_kernel commit: 878b4e7f46]
2025-12-08 19:20:44 +08:00
assistant-librarian[bot]
e5a3277261 Merge commit '04612c30ceab818cd6c03a3e833a6c6d1a21dafa' into develop 2025-12-08 11:12:53 +00:00
Bartłomiej Kocot
13c9c8580f [CK_BUILDER] Ck Tile Grouped convolution factory (#3352)
* [BUILDER] Ck Tile Grouped convolution factory

* Part 2

* Fixes after rebase

* Remove leftovers

[ROCm/composable_kernel commit: 04612c30ce]
2025-12-08 10:32:56 +01:00
yinglu
fc7547a552 ck: add tf32 in DTYPES to control instances build (#3317)
[ROCm/composable_kernel commit: 8fec8054b2]
2025-12-08 16:24:20 +08:00
assistant-librarian[bot]
66f05c1fbf Merge commit '86a84ae61122b8ed2d2e40e45f108a8fa23d3210' into develop 2025-12-05 23:13:30 +00:00
Thomas Ning
771f37e4aa Add the gfx1011 support on CK Tile with the SGPR builtin reading protection (#3350)
* Finish the fixes

* add the gfx1010 support macro

* Fix the compilation error

[ROCm/composable_kernel commit: 86a84ae611]
2025-12-05 14:18:30 -08:00
assistant-librarian[bot]
b2019db495 Merge commit '6b1bceca7baea62941793e562d6ff58c571d9191' into develop 2025-12-05 18:14:37 +00:00
Khushbu Agarwal
5ab9a6cfe4 [CK_Tile] Enable PreshuffleB for 2d block scale Gemm (#3298)
* formatted

* formatted

* formatting

* formatting

* formatting

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Split cpp file to reduce building time
- Support multiple GemmConfig

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update Readme

* enable prefill shapes

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Add support for rowcol and tensor GEMM operations

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update README

* adding preshuffle quant as new parameter and its associated new files

* remove debugging statements

* adding test

* enable preshuffle quant with permuteN

* updating readme and corresponding gemmconfigs

* updating cmake file

* fixing CI failures for grouped quant gemm

* debugging permuteN

* debugging

* debugging PermuteN

* initial commit

* resolving merge conflicts

* adding test cases

* fixing bq tensor calculation

---------

Co-authored-by: Cong Ma <congma13@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 6b1bceca7b]
2025-12-05 09:57:52 -08:00
assistant-librarian[bot]
e4b2f98d0d Merge commit '608232ce82636e7c9ab8dec55dc7507c6792fb65' into develop 2025-12-05 17:31:42 +00:00
Illia Silin
12738d2e45 do not build hipblaslt for gfx90a to save time and disk space (#3362)
[ROCm/composable_kernel commit: 608232ce82]
2025-12-05 08:39:18 -08:00
Cong Ma
70a8425dfb Congma/ck tile/aquant mem pipeline (#3346)
* [CK TILE GEMM QUANT] Fix the bug in HotLoopTail of memory pipeline

[ROCm/composable_kernel commit: ed080f5a56]
2025-12-05 09:35:27 -07:00
John Shumway
99a748498a Ignore .cmake-format.yaml (#3356)
We don't want to add cmake formatting until we are in the super repo, but it's handy if developers want to experiment with formatting. For now we should ignore .cmake-format.yaml.

[ROCm/composable_kernel commit: 7541d9b5b0]
2025-12-05 08:26:00 -08:00
Bartłomiej Kocot
17e2c816c3 Profile resnet layout fixes (#3360)
[ROCm/composable_kernel commit: 82f796a1f0]
2025-12-05 08:20:46 -08:00
assistant-librarian[bot]
e4f7f176c8 Merge commit 'f5b0af22722b130f03cac590ca9b8729b1b84991' into develop 2025-12-05 16:14:41 +00:00
John Shumway
a157e33311 Simplify includes for CK builder reflection (#3357)
We only want to import enums and types into the builder reflection code. But, some of the enums are included in much larger files or even big trees of include files. This leads to unintended mixing of code and very confusing interactions and symbol conflicts. We organize the includes and extract two new enum-only headers to help with decoupling in CK. This refactoring is critical if we want to include reflection in a device-operator "describe" method.

* Remove a few unnecessary includes from headers in builder/reflect/.
* Extract enums scheduler and pipeline to their own headers so they can be used without importing other code.
* Order includes alphabetically for better organization.

The immediate goal is to unblock reflection integration, and this type of cleanup helps the flexibility and robustness of the CK header library.

[ROCm/composable_kernel commit: f5b0af2272]
2025-12-05 07:44:10 -08:00
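The decoupling described in a157e33311 can be pictured with a minimal sketch: an enum lives in a tiny header of its own, and reflection-style code depends only on that enum. Every name here (`Scheduler`, its enumerators, `to_string`, and the file name in the comment) is hypothetical and chosen for illustration; the headers actually extracted by the commit are not listed on this page.

```cpp
#include <string_view>

// --- Imagine this part in its own enum-only header, e.g. a hypothetical
// --- "scheduler_enum.hpp". It pulls in no kernel or pipeline code.
enum class Scheduler
{
    Default,
    Intrawave,
    Interwave
};

// --- Reflection-side code: it needs the enum and nothing else, so it can
// --- describe a configuration without including any implementation.
constexpr std::string_view to_string(Scheduler s)
{
    switch(s)
    {
    case Scheduler::Default: return "Default";
    case Scheduler::Intrawave: return "Intrawave";
    case Scheduler::Interwave: return "Interwave";
    }
    return "Unknown";
}

static_assert(to_string(Scheduler::Interwave) == "Interwave",
              "enum can be described without heavy includes");
```

Keeping the enum free of other includes means a "describe"-style function only recompiles when the enum itself changes, and it cannot pick up unrelated symbols by accident.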
Bartłomiej Kocot
157d2c87db Add new section to changelog (#3295)
* Add new section to changelog

* Update CHANGELOG.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: 35fc7c9e4f]
2025-12-05 07:14:52 -08:00
assistant-librarian[bot]
86c35117b5 Merge commit 'f7650ee82b306a05d9c3c44d3feefdd570a4bd58' into develop 2025-12-05 09:13:29 +00:00
jakpiase
54f903def5 fix enforcing fixedvectorsizes for ck tile conv (#3344)
[ROCm/composable_kernel commit: f7650ee82b]
2025-12-05 09:30:22 +01:00
assistant-librarian[bot]
eeadb34e8f Merge commit '13f6d635653bd5ffbfcac8577f1ef09590c23d78' into develop 2025-12-05 03:38:26 +00:00
John Shumway
62e5b29702 Clean up conv_traits.hpp (#3354)
When I asked for a description of operators that didn't have ConvTraits, I was getting very long confusing errors about ConvTraits not being defined. Now we get specific errors explaining which concepts are violated, making it easier to know which code to generalize or update.

* Add concepts to conv_traits.hpp to get better error messages.
* Put the correct requires clauses in the right places to get descriptive error messages.
* General cleanup of functions in conv_traits.hpp to make functions easier to read.

[ROCm/composable_kernel commit: 13f6d63565]
2025-12-04 19:12:36 -08:00
assistant-librarian[bot]
5da2114921 Merge commit '05292b3604e143e98ec2cb67edb2e3d2ad1d6ecb' into develop 2025-12-05 02:45:20 +00:00