Commit Graph

3949 Commits

Author SHA1 Message Date
Po Yen Chen
fb5c7e0314 fix: python 3.8 compatibility in fmha codegen (#3388)
[ROCm/composable_kernel commit: b15df37255]
2025-12-10 07:08:41 -08:00
assistant-librarian[bot]
636a90b531 Merge commit 'fc22320d783a6b73798a23d8d20fb24e3a5e4040' into develop 2025-12-10 08:15:28 +00:00
Ville Pietilä
fbf53fb970 [CK_TILE] Split-K autodeduction (#3351)
* First version of split-K autodeduction.

* Fix circular dependency and kernel construction.

* Fix tolerance calculation for bwd weight example.

* Simplify kernel construction.

* Fix kernel launching bug for split-K autodeduce.

* Add split-K autodeduction support for the two stage example.

* Fix a corner case.

* Fix clang-format.

* Fix clang-format for inc files.

* Add missing header.

* Prevent too large split-K values.

* Fix formatting.

* Add unit tests for IsSupportedArgument in grouped bwd conv.

* clang-format.

* Fix merge conflicts.

* Address feedback from code review.

* clang-format

* Fix new tests after merge.

---------

Co-authored-by: Ville Pietilä <>

[ROCm/composable_kernel commit: fc22320d78]
2025-12-10 09:30:30 +02:00
assistant-librarian[bot]
490d6daf13 Merge commit '1aa93ef551a31405aef5c8c14e869241ba96639d' into develop 2025-12-10 02:46:30 +00:00
Zzz9990
09e81b46ba [CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377)
* update coherence
---------

Co-authored-by: Zzz9990 <Zzz9990>

[ROCm/composable_kernel commit: 1aa93ef551]
2025-12-10 10:03:28 +08:00
assistant-librarian[bot]
dfeb7a11b9 Merge commit '934ba1208ab7cfc82c20f73b14994b64c3843d2d' into develop 2025-12-09 23:12:58 +00:00
Illia Silin
2185fc59cb use hipTensor from monorepo for daily builds (#3386)
[ROCm/composable_kernel commit: 934ba1208a]
2025-12-09 14:39:08 -08:00
assistant-librarian[bot]
d6fe69e6fd Merge commit '0d8259affd4f59eb8b1143b658d83d3800270f43' into develop 2025-12-09 20:14:23 +00:00
Illia Silin
25918f26a2 temporarily disable daily builds on gfx1010 and gfx908 (#3384)
[ROCm/composable_kernel commit: 0d8259affd]
2025-12-09 10:37:13 -08:00
assistant-librarian[bot]
636bc57ab2 Merge commit '7582c9e73fc3e580a2255988310cb25391f80162' into develop 2025-12-09 16:14:29 +00:00
Illia Silin
43b4ec3209 Upgrade to ROCm7.1.1 as default compiler. (#3370)
* upgrade to rocm7.1.1 as new default compiler

* fix jenkinsfile

[ROCm/composable_kernel commit: 7582c9e73f]
2025-12-09 07:35:32 -08:00
dependabot[bot]
e416856bf0 Bump rocm-docs-core[api_reference] from 1.20.1 to 1.31.0 in /docs/sphinx (#3374)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.20.1 to 1.31.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.31.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.20.1...v1.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 50ca3f83eb]
2025-12-09 07:10:34 -08:00
assistant-librarian[bot]
1e173e2ab9 Merge commit '6f0966e1e9fca5c513d16a729237d676b583e266' into develop 2025-12-09 10:14:29 +00:00
lalala-sh
9691ccf03c fix a16w4 moe bugs (#3373)
* fix valid mask bug

* update format

[ROCm/composable_kernel commit: 6f0966e1e9]
2025-12-09 17:54:55 +08:00
assistant-librarian[bot]
260bbb49fb Merge commit 'c1c2e41a0387e8e76970ad86959e28963f569d54' into develop 2025-12-09 03:37:16 +00:00
Yi DING
9c7b0388a9 [CK_TILE] Generate random tensor values with multiple threads (#3324)
[ROCm/composable_kernel commit: c1c2e41a03]
2025-12-09 11:02:33 +08:00
assistant-librarian[bot]
375e499d10 Merge commit 'c363a98d4154c647c1a2d5331ad0d76879b84dfa' into develop 2025-12-08 21:13:22 +00:00
Sami Remes
64f4467064 [CK_TILE] Support more layouts for BQuant GEMM (#3349)
* WIP: preparing to add transpose bq support

* WIP: handle both row/col layout for BQ windows/tile dstr

* Fix build

* WIP: adding some test, debugging numerical errors

* Fix all but pkint4 tests

* Remove test_gemm_quant_typed.cpp again

* update disabled tests

* add conversion from pkint4 for b matrix

* fix formatting

* fix formatting

* Fix tr_load and use override b datatype for clarity

* fix formatting

* make bquant preshuffle tests bqlayout column-major

[ROCm/composable_kernel commit: c363a98d41]
2025-12-08 13:05:56 -08:00
Erwin Terpstra
142ec27ea0 [CK Tile] Grouped GEMM aquant mode and non-persistent kernel (#3337)
* wip: add aquant to grouped gemm quant example

* fix: properly handle hot loop count in aquant pipeline

* fix: add separate GemmConfig structs for AQuant, automatically select the correct one

* feat: finish support for a non-persistent kernel invocation for grouped gemm quant, and add support code to example

* refactor: cleaned up grouped gemm quant example a bit by reusing pipeline selection logic

* chore: add warp gemm dispatchers for a couple of TransposeC K=32 variants

* feat: add quant grouped gemm test cases for aquant (regular and transpose C) and non-persistent kernel

* fix: update base pipeline classes according to changes in develop branch

* Revert "chore: add warp gemm dispatchers for a couple of TransposeC K=32 variants"

This reverts commit b3fd4d326d.

* feat: remove aquant config from grouped gemm quant example, update to add persistency as runtime parameter

* chore: removed work-around for aquant bug that has been fixed

* chore: fix typo in command-line parameters

* fix: correct K warp tile size for gfx950

* chore: fix incorrect warp tile configuration on gfx942

[ROCm/composable_kernel commit: fe07b5a1bf]
2025-12-08 12:19:22 -08:00
assistant-librarian[bot]
564276eff9 Merge commit 'ca6143f0b2237a1af80ef5550f1b774fd463676d' into develop 2025-12-08 17:14:48 +00:00
Anton Gorenko
84e56d1120 Add a workaround for a compiler issue for bwd on gfx90a and ROCm 7.1.1 (#3369)
Sometimes there are not enough wait-states between v_mfma_f32... and v_accvgpr_read_b32 instructions if they are separated by s_cbranch.
The workaround is to read accvgprs to vgpr before branching.

[ROCm/composable_kernel commit: ca6143f0b2]
2025-12-08 07:44:17 -08:00
assistant-librarian[bot]
f1f46d5f75 Merge commit '878b4e7f46d7e47618f4d860d71b438cb6d992fd' into develop 2025-12-08 12:18:59 +00:00
Yi DING
8b98fe0353 [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2 (#3287)
* [CK_TILE] Optimize Flatmm MXFP4 by Eliminating Runtime Division by 2

* typo

[ROCm/composable_kernel commit: 878b4e7f46]
2025-12-08 19:20:44 +08:00
assistant-librarian[bot]
e5a3277261 Merge commit '04612c30ceab818cd6c03a3e833a6c6d1a21dafa' into develop 2025-12-08 11:12:53 +00:00
Bartłomiej Kocot
75156c492e [CK_BUILDER] Ck Tile Grouped convolution factory (#3352)
* [BUILDER] Ck Tile Grouped convolution factory

* Part 2

* Fixes after rebase

* Remove leftovers

[ROCm/composable_kernel commit: 04612c30ce]
2025-12-08 10:32:56 +01:00
yinglu
cec66a4b18 ck: add tf32 in DTYPES to control instances build (#3317)
[ROCm/composable_kernel commit: 8fec8054b2]
2025-12-08 16:24:20 +08:00
assistant-librarian[bot]
66f05c1fbf Merge commit '86a84ae61122b8ed2d2e40e45f108a8fa23d3210' into develop 2025-12-05 23:13:30 +00:00
Thomas Ning
10e48d2f3c Add the gfx1011 support on CK Tile with the SGPR builtin reading protection (#3350)
* Finish the fixes

* add the gfx1010 support macro

* Fix the compilation error

[ROCm/composable_kernel commit: 86a84ae611]
2025-12-05 14:18:30 -08:00
assistant-librarian[bot]
b2019db495 Merge commit '6b1bceca7baea62941793e562d6ff58c571d9191' into develop 2025-12-05 18:14:37 +00:00
Khushbu Agarwal
bc49b0e57b [CK_Tile] Enable PreshuffleB for 2d block scale Gemm (#3298)
* formatted

* formatted

* formatting

* formatting

* formatting

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Split cpp file to reduce building time
- Support multiple GemmConfig

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update Readme

* enable prefill shapes

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Add support for rowcol and tensor GEMM operations

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update README

* adding preshuffle quant as new parameter and its associated new files

* remove debugging statements

* adding test

* enable preshuffle quant with permuteN

* updating readme and corresponding gemmconfigs

* updating cmake file

* fixing CI failures for grouped quant gemm

* debugging permuteN

* debugging

* debugging PermuteN

* initial commit

* resolving merge conflicts

* adding test cases

* fixing bq tensor calculation

---------

Co-authored-by: Cong Ma <congma13@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 6b1bceca7b]
2025-12-05 09:57:52 -08:00
assistant-librarian[bot]
e4b2f98d0d Merge commit '608232ce82636e7c9ab8dec55dc7507c6792fb65' into develop 2025-12-05 17:31:42 +00:00
Illia Silin
67d6c4514a do not build hipblaslt for gfx90a to save time and disk space (#3362)
[ROCm/composable_kernel commit: 608232ce82]
2025-12-05 08:39:18 -08:00
Cong Ma
8bdc28e607 Congma/ck tile/aquant mem pipeline (#3346)
* [CK TILE GEMM QUANT] Fix the bug in HotLoopTail of memory pipeline

[ROCm/composable_kernel commit: ed080f5a56]
2025-12-05 09:35:27 -07:00
John Shumway
1cffd4042e Ignore .cmake-format.yaml (#3356)
We don't want to add cmake formatting until we are in the super repo, but it's handy if developers want to experiment with formatting. For now we should ignore .cmake-format.yaml.

[ROCm/composable_kernel commit: 7541d9b5b0]
2025-12-05 08:26:00 -08:00
Bartłomiej Kocot
b411358e21 Profile resnet layout fixes (#3360)
[ROCm/composable_kernel commit: 82f796a1f0]
2025-12-05 08:20:46 -08:00
assistant-librarian[bot]
e4f7f176c8 Merge commit 'f5b0af22722b130f03cac590ca9b8729b1b84991' into develop 2025-12-05 16:14:41 +00:00
John Shumway
a94db7fc98 Simplify includes for CK builder reflection (#3357)
We only want to import enums and types into the builder reflection code. But, some of the enums are included in much larger files or even big trees of include files. This leads to unintended mixing of code and very confusing interactions and symbol conflicts. We organize the includes and extract two new enum-only headers to help with decoupling in CK. This refactoring is critical if we want to include reflection in a device-operator "describe" method.

* Remove a few unnecessary includes from headers in builder/reflect/.
* Extract enums scheduler and pipeline to their own headers so they can be used without importing other code.
* Order includes alphabetically for better organization.

The immediate goal is to unblock reflection integration, and this type of cleanup helps the flexibility and robustness of the CK header library.

[ROCm/composable_kernel commit: f5b0af2272]
2025-12-05 07:44:10 -08:00
Bartłomiej Kocot
4beaf7709d Add new section to changelog (#3295)
* Add new section to changelog

* Update CHANGELOG.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: 35fc7c9e4f]
2025-12-05 07:14:52 -08:00
assistant-librarian[bot]
86c35117b5 Merge commit 'f7650ee82b306a05d9c3c44d3feefdd570a4bd58' into develop 2025-12-05 09:13:29 +00:00
jakpiase
5147792114 fix enforcing fixedvectorsizes for ck tile conv (#3344)
[ROCm/composable_kernel commit: f7650ee82b]
2025-12-05 09:30:22 +01:00
assistant-librarian[bot]
eeadb34e8f Merge commit '13f6d635653bd5ffbfcac8577f1ef09590c23d78' into develop 2025-12-05 03:38:26 +00:00
John Shumway
b1dc9e64f6 Clean up conv_traits.hpp (#3354)
When I asked for a description of operators that didn't have ConvTraits, I was getting very long confusing errors about ConvTraits not being defined. Now we get specific errors explaining which concepts are violated, making it easier to know which code to generalize or update.

* Add concepts to conv_traits.hpp to get better error messages.
* Put the correct requires clauses in the right places to get descriptive error messages.
* General cleanup of functions in conv_traits.hpp to make functions easier to read.

[ROCm/composable_kernel commit: 13f6d63565]
2025-12-04 19:12:36 -08:00
assistant-librarian[bot]
5da2114921 Merge commit '05292b3604e143e98ec2cb67edb2e3d2ad1d6ecb' into develop 2025-12-05 02:45:20 +00:00
Po Yen Chen
5737132878 [CK_TILE][FMHA] Integrate FAv2 & FAv3 (WIP) in the single fmha_fwd() API (#3153)
* Make fmha_fwd_v3() compatible with fmha_fwd()

* Decouple get_fwd_blobs() and FmhaFwdKernel

* Decouple compatibility checks from get_fwd_blobs()

* Extract product feature checks out from get_fwd_blobs()

* Remove duplicated code in factories and redundant checks

* Remove FmhaFwdKernel<>::GetName()

* Let FmhaFwdApiPool support pipelines with different mask_impl

* Add tile setting for fmha fwd v3 pipeline

* Add fwd v3 instances to tile_example_fmha_fwd manually

* Remove unused function import

* Undo irrelevant changes

* Remove fwd v3 instances from tile_example_fmha_fwd

* Finish fmha fwd v3 kernel instance codegen

* Fix formatting

* Remove unused F_idx attribute

* Add is_generic_attention_mask<> traits

* Add constraints to the fmha fwd v3 pipeline

* Unify traits & problem used for fmha fwd v3

* Unify kernel launch code for fmha fwd v2 & v3

* Unify kernel template selection logic

* Use same kernel codegen template for both v2 & v3

* Rename api() property as render() method

* Allow specifying filter for fmha fwd api pool

* Allow specifying function name when rendering api pool items

* Separate fmha fwd v3 kernel dispatching logic from v2

* Remove lambda assignment

* Add simple v2/v3 dispatch logic

* Stop generating empty if-clauses

Skip iterating over dictionaries that have no traits, and avoid assigning i_* to them.

* Use "".join() to concatenate fmha fwd api string content

* Add more feature checks for fmha fwd v3 pipeline

* Check features before dispatch to fmha_fwd_v3()

* Add more feature checks for fmha_fwd_v3()

* Add missing filter call

* Use Tuple to preserve the dtype order

* Fix wrong pipeline matching logic

* Add fmha fwd v3 group mode instances

* Add functor_transform<>

* Add type constraints to make_tile_window()

* Remove fmha fwd v3 example

* Fix wrong product(aiter mha_fwd()) config

* Fix wrong fmha fwd v2/v3 selection logic

* Fix formatting

* Add comment to warn v3 kernel users

* Fix wrong codegen logic

* Remove unnecessary param

* Fix format

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 05292b3604]
2025-12-05 10:31:12 +08:00
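
Note on 5737132878: two of the codegen items above ("Stop generating empty if-clauses" and 'Use "".join() to concatenate fmha fwd api string content') describe a general pattern that can be sketched as follows. This is a minimal illustration only, not the actual CK codegen. The function name `render_dispatch`, the `traits_by_dtype` mapping, and the emitted `run<...>()` calls are all hypothetical.

```python
# Hypothetical sketch: names are illustrative and not taken from the CK codegen.
from typing import Dict, List


def render_dispatch(traits_by_dtype: Dict[str, List[str]]) -> str:
    """Render an if / else-if chain with one branch per dtype that has traits."""
    parts: List[str] = []
    branch = 0  # counts emitted branches only, so skipped dtypes use no index
    for dtype, traits in traits_by_dtype.items():
        if not traits:
            continue  # no traits: emit nothing instead of an empty if-clause
        keyword = "if" if branch == 0 else "else if"
        body = "".join(f"    run<{t}>();\n" for t in traits)
        parts.append(f'{keyword} (dtype == "{dtype}") {{\n{body}}}\n')
        branch += 1
    # Collect fragments in a list and join once, rather than repeated `+=`.
    return "".join(parts)


print(render_dispatch({"fp16": ["a"], "bf16": [], "fp8": ["b", "c"]}))
```

The branch counter advances only when a branch is emitted, so a dtype with no traits neither produces an empty clause nor turns the first real branch into an `else if`.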
Illia Silin
88a222f851 fix hipblaslt build for different archs (#3358)
[ROCm/composable_kernel commit: d1193e8637]
2025-12-04 18:29:14 -08:00
assistant-librarian[bot]
a9c43a3678 Merge commit 'd184eed823ca50dcafc57c66228f12300c0c9ccc' into develop 2025-12-04 20:13:50 +00:00
Max Podkorytov
e8e9f89bbe [CK-Tile] Refactor base pipeline usage (#3251)
* initial poc

* factor out common parts in operator()

* cv4

* rest of the universal gemm pipelines

* fix test

* remove boilerplate from tile engine

* fix example

* fix example

* format

* fix tests build for gemm

* remove base pipeline codegen from gemm instance builder

* unify v3 logic with the rest of universal gemm pipelines

* fix build for multi abd test

* fix test gemm multi d

* fix build for weight preshuffle

* fix grouped gemm test

* fix grouped gemm multi d test

* fix grouped gemm preshuffle

* fix grouped gemm example except for quant

* fix gemm preshuffle

* fix splitk 2 stage example

* fix batched gemm example

* fix multid example

* fix multiabd example

* fix batched gemm test

* fixup

* fix examples build

* fix grouped gemm test build

* fix smoke builder

[ROCm/composable_kernel commit: d184eed823]
2025-12-04 11:45:49 -08:00
assistant-librarian[bot]
4b6531908e Merge commit 'd9d4c9c3dfe38fe54bae5b3b1b9b523b011992dd' into develop 2025-12-04 19:25:29 +00:00
spolifroni-amd
7afa7d9e43 [composable_kernel] initial draft of the ck tile conceptual doc (#3242)
* Adding CK Tile documentation

* Updates based on feedback

* Fix tile window API description

* Fix remaining images

* add documentation about flush_cache and rotating_buffer functionality in ck_tile

* Supplement the documentation

* light edit of the ck tile conceptual doc

* Fixes for ruff check.

* Fixes for ruff check 2.

* Fixes for ruff check 3.

---------

Co-authored-by: Vidyasagar <vanantha@amd.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>

[ROCm/composable_kernel commit: d9d4c9c3df]
2025-12-04 11:09:21 -08:00
assistant-librarian[bot]
3becd86717 Merge commit 'cd21e20ae7d4d3a6309ce238bb94814e145585d6' into develop 2025-12-04 15:14:37 +00:00