Commit Graph

4046 Commits

Author SHA1 Message Date
assistant-librarian[bot]
cd533b2f79 Merge commit '3dfa794fab62dca7c0499791d37298a49630d5ee' into develop 2025-12-16 17:15:31 +00:00
Illia Silin
f35e7b59cc Add build trace diagnostics to CI. (#3432)
* generate and visualize build traces for all archs

* generate build traces in all cases

* fix jenkins logic

* fix typo

* use more threads for parsing dependency map

* add script to parse ninja traces and issue warnings

* fix python script syntax and header

* fix python syntax one more time

* fix python syntax

[ROCm/composable_kernel commit: 3dfa794fab]
2025-12-16 08:22:52 -08:00
assistant-librarian[bot]
ad434f0976 Merge commit '1e6bbed1fb77d790f2b5ec4ef8a6617e99c8f145' into develop 2025-12-16 00:38:54 +00:00
DarylHawkinsAMD
29ed00bbd1 [CK_BUILDER] CK Tile header installation for builder, algorithm concept improvements (#3419)
* Added install of CK_Tile headers when using CK_EXPERIMENTAL_BUILDER. MIOpen needs this because the builder uses features from CK Tile, and the CK Tile install is excluded when doing a narrow build for MIOpen.
* Changed algorithm concept type checks to be concepts instead of constexpr bool functions. This improves compiler error messages when using these concepts in static_asserts.

---------

Co-authored-by: Daryl Hawkins <DarylHawkins@amd.com>

[ROCm/composable_kernel commit: 1e6bbed1fb]
2025-12-15 16:24:36 -07:00
assistant-librarian[bot]
3c8ce1482b Merge commit '2544e394cff83d5992e265f9a29b640a7c74e90d' into develop 2025-12-15 20:13:52 +00:00
John Shumway
ec9afcfe8d Add missing enums to data_type_sizeof (#3430)
Fixes broken build on gfx942. This was some test code that got merged at the same time.

[ROCm/composable_kernel commit: 2544e394cf]
2025-12-15 11:49:36 -08:00
assistant-librarian[bot]
9e037c82a8 Merge commit '5e2d25e20f40eb7a6ba2e788f82f677649fb37d6' into develop 2025-12-15 16:15:30 +00:00
Aviral Goel
389e797a9b build: reduce build time for bquant tests by splitting into multiple cpp & support on other gfx10 case (#3395)
* build: reduce build time for bquant unit tests by splitting into multiple cpp

* reduce the test case & add the gfx10 support

* fix: copyright header for new file

* chore: add copyright to pass the CI

* build: Hot fix to reduce massive build time by just disabling the instances

* Update include/ck_tile/core/config.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: khushbu <khuagarw@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: 5e2d25e20f]
2025-12-15 07:19:29 -08:00
Sami Remes
4a29a8f84d [CK_TILE] Fix some inconsistencies with OverrideBDatatype in BQuant GEMM (#3394)
* Fix some inconsistencies with OverrideBDatatype

* fix formatting

* Fix BGlobalPrefetch, no static

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: a0cdb0b493]
2025-12-15 07:18:38 -08:00
assistant-librarian[bot]
6b08653fb7 Merge commit '7e93eed8787afd175d3a045303096a4a98638f4b' into develop 2025-12-15 15:18:13 +00:00
linqunAMD
7cdba74e97 [ck][gfx12] support contraction on gfx12 (#3421)
* support contraction on gfx12

* increase tolerance for gfx11 in example contraction

The precision of gfx11 wmma is lower than that of the others.

[ROCm/composable_kernel commit: 7e93eed878]
2025-12-15 07:16:01 -08:00
linqunAMD
8811c57d44 [ck_tile] remove duplicate functions in ck_tile (#3311)
* [ck_tile] remove duplicated shuffle_b and shuffle_b_permuteN

* [ck_tile] move get_k_warp to gemm_shape

* resolve code rebase error

[ROCm/composable_kernel commit: 6d7299ff78]
2025-12-15 07:13:00 -08:00
assistant-librarian[bot]
742acf2707 Merge commit 'fe35ba5dac168619462669192423ff40548d532d' into develop 2025-12-15 13:25:53 +00:00
Johannes Graner
2fe4c8acec Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380)
* Parallelization in dataset generation

* Parallelizable tests for fwd, bwd data, bwd weight with datasets

* .gitignore generated datasets

* Test parallelization script with round-robin GPU scheduling

* Parallelization updates to test generation and running

* Dataset paths relative to executable

* Update output from test generation

* Default to one GPU in test generation

* Add small dataset tests to Jenkins

* Update copyright lines

* Update test_data/generate_test_dataset.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Move trap disable

* Common get path function
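
The round-robin GPU scheduling mentioned in the list above can be sketched as follows. This is only an illustration, not the script from the PR: the function name `assign_round_robin` and the test names are hypothetical, and the default of one GPU mirrors the "Default to one GPU" item.

```python
def assign_round_robin(tests, num_gpus=1):
    """Distribute tests over GPUs in round-robin order.

    Hypothetical helper: test i is assigned to GPU (i mod num_gpus).
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    schedule = {gpu: [] for gpu in range(num_gpus)}
    for index, test in enumerate(tests):
        schedule[index % num_gpus].append(test)
    return schedule


if __name__ == "__main__":
    # Placeholder test names, not the real test binaries.
    tests = ["fwd_small", "bwd_data_small", "bwd_weight_small", "fwd_large"]
    for gpu, assigned in assign_round_robin(tests, num_gpus=2).items():
        print(gpu, assigned)
```

Each GPU then runs its own list independently, so the per-GPU queues differ in length by at most one test.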

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: fe35ba5dac]
2025-12-15 13:38:25 +01:00
assistant-librarian[bot]
4f79e0d308 Merge commit '3b773109e5b98a7b11d2976e465ecb7c57f2bea6' into develop 2025-12-15 12:19:45 +00:00
Bartłomiej Kocot
a45c051ac9 [CK TILE][AICK-439] Fix cshuffle epilogue wave per shuffle (#3364)
* [CK TILE] Fix cshuffle epilogue wave per shuffle

* Align shuffle per tile with smem

* fixes

* Fixes for double smem

* fix

[ROCm/composable_kernel commit: 3b773109e5]
2025-12-15 12:59:48 +01:00
assistant-librarian[bot]
6164d076de Merge commit '3143a5a480e4fcf216670012fe491b44324f03b6' into develop 2025-12-15 07:16:25 +00:00
Johannes Graner
6238fe6d0d [CK Grouped Gemm] Disable split-k kernel for split-k > 1 with non-contiguous strides (#3405)
* Disable kernel for split-k > 1 with non-contiguous strides

* Update device_grouped_gemm_xdl_splitk_cshuffle.hpp

---------

AICK-441 (partial)

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 3143a5a480]
2025-12-15 08:03:00 +01:00
assistant-librarian[bot]
669906c786 Merge commit 'f5573f56d9d4981def16f575ddb14535b93bb9bb' into develop 2025-12-15 04:28:43 +00:00
Linjun-AMD
51886bf22b Add attention sink support for FMHA FWD (#3368)
* Revert "Revert "Add attn sink (#2892)" (#3250)"

This reverts commit e3be392d13e6ee107d823af32aca2d3ff03ca69d.

* fix conflict

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Add F_sink parameter to FmhaFwdPipeline

* Update tile_fmha_traits.hpp

* Refactor pipeline creation in fmha_fwd.py

Updated the pipeline creation logic to include 'sink' parameter in product combinations and adjusted the FmhaFwdPipeline calls accordingly.

* Update fmha_fwd.py

* Update fmha_fwd.py

* Update example/ck_tile/01_fmha/script/correct_test_fwd_sink.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update CHANGELOG.md

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Update CHANGELOG with new features and support

* Update fmha_fwd.hpp

* Update CHANGELOG.md

* Update smoke_test_fwd_sink.sh

* Update correct_test_fwd_sink.sh

* Update smoke_test_fwd_sink.sh

---------

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: f5573f56d9]
2025-12-15 12:21:59 +08:00
assistant-librarian[bot]
ea731b5f29 Merge commit '22b945e06ea4b4de188d7ff4ec7ae4bf127be9f9' into develop 2025-12-14 22:12:40 +00:00
Emily Martins
eeb78c46a4 [CK_TILE] Stream-K Tree Reduction and Cache Skipping Integration (#3371)
* CK Tile Stream-K Tree Reduction

This change adds the first implementation of the Stream-K tree reduction
strategy into CK Tile. The tree reduction reduces the number of
steps for accumulating results for a tile from O(N) to O(logN) where N
is the number of workgroups contributing to a C tile.
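
The step-count difference can be illustrated with a small host-side sketch. It is not the CK Tile kernel code: the partial results are plain numbers rather than tiles, the function names are hypothetical, and the input is assumed to be non-empty.

```python
def linear_reduce(partials):
    """Accumulate partials one after another: N - 1 dependent steps."""
    acc = partials[0]
    steps = 0
    for value in partials[1:]:
        acc = acc + value
        steps += 1
    return acc, steps


def tree_reduce(partials):
    """Pairwise reduction: about log2(N) rounds of independent additions."""
    vals = list(partials)
    rounds = 0
    stride = 1
    while stride < len(vals):
        # Within one round, slot i absorbs slot i + stride. The pairs do not
        # overlap, so on a GPU they could be handled by different workgroups.
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] = vals[i] + vals[i + stride]
        stride *= 2
        rounds += 1
    return vals[0], rounds


if __name__ == "__main__":
    partials = [1, 2, 3, 4, 5, 6, 7, 8]
    print(linear_reduce(partials))
    print(tree_reduce(partials))
```

For eight contributors the linear version needs seven dependent additions, while the tree version finishes in three rounds.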

Additionally, in the original non-atomic reduction strategy, atomics
were used to set the flags buffer and to read from the flags buffer.
However, through investigation with the tree reduction, atomics with
default (relaxed) semantics were not enough to guarantee workgroups
would not read stale data, leading to incorrect results. Stronger
acquire/release memory orderings are too expensive. So, this change
also eliminates the use of atomics for setting the flags. Instead, we
leverage cache modifiers (e.g., GLC) to avoid writing to cache, thereby
avoiding the use of atomics.

Preliminary tests were also added for the normal reduction and tree
reduction. More will be added in a future PR via tile engine.

* Move Stream-K kernel files to a subdirectory

* Cleanup Code Style & Handle Unsupported Reductions

This change makes the following small changes:
- Add an explicit else block for unimplemented reduction strategies
- Clarify type of sk_flags_ptr via auto*
- Add description for extra_iters_before_me variable

* Run new copyright script on new files

[ROCm/composable_kernel commit: 22b945e06e]
2025-12-14 14:49:49 -07:00
assistant-librarian[bot]
ca5fb0a3b7 Merge commit '9ac51aa0f44bae776609036f291c3cd2666e84ee' into develop 2025-12-14 21:11:46 +00:00
John Shumway
a3270d2eb0 Add describe() method to device ops for runtime introspection (#3375)
Introduces a polymorphic describe() method to BaseOperator that enables runtime introspection of kernel configurations through a unified interface.

Key changes:

* Add virtual describe() method to BaseOperator returning Description objects
* Implement describe() in 6 device operation classes (conv fwd/bwd variants)
* Create conv_describe.hpp with factory function for ConvDescription
* Extract type definitions to conv_types.hpp to resolve circular dependencies
* Add InstanceStringDescription for kernels without full ConvDescription support

Other Improvements:

* Update tests to use describe() instead of GetInstanceString()
* Remove circular dependency include from conv_traits.hpp
* Add ODD_C to ConvFwdSpecialization enum and fix OddC mapping
* Replace silent fallback in conv_layout() with compile-time error

This provides a foundation for runtime kernel introspection and better tooling support for analyzing and debugging kernel configurations.
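
The shape of such an interface can be sketched in a language-neutral way. The sketch below is written in Python rather than the library's C++, and everything except the names `BaseOperator`, `describe()` and `Description` is a hypothetical stand-in (the operator classes, their fields and the `summary` attribute are assumptions for illustration).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """Hypothetical stand-in for the Description objects named above."""
    summary: str


class BaseOperator(ABC):
    @abstractmethod
    def describe(self) -> Description:
        """Return a description of this operator's configuration."""


class ConvFwdOp(BaseOperator):
    """Hypothetical operator with a structured description."""

    def __init__(self, spatial_dims, layout):
        self.spatial_dims = spatial_dims
        self.layout = layout

    def describe(self):
        return Description(f"conv fwd {self.spatial_dims}d layout={self.layout}")


class StringOnlyOp(BaseOperator):
    """Hypothetical operator that only exposes its instance string."""

    def __init__(self, instance_string):
        self.instance_string = instance_string

    def describe(self):
        return Description(self.instance_string)


def report(ops):
    # Callers need only the base interface, not the concrete operator type.
    return [op.describe().summary for op in ops]


if __name__ == "__main__":
    print(report([ConvFwdOp(2, "NHWC"), StringOnlyOp("SomeKernel<256, 128>")]))
```

The point of the design is that tooling can iterate over heterogeneous operators and ask each one to describe itself through one call.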

[ROCm/composable_kernel commit: 9ac51aa0f4]
2025-12-14 12:49:12 -08:00
assistant-librarian[bot]
d0b4a2a403 Merge commit '21f06aa47ded64b9a07d81bf4b743c21462178db' into develop 2025-12-14 19:11:55 +00:00
Enrico Degregori
5c81464568 CK Tile: Enable padding blockscale example (#3417)
* Fix host code padding

* restructure the ref code

* clean up

* Fix compilation error

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 21f06aa47d]
2025-12-14 10:25:47 -08:00
assistant-librarian[bot]
5346923492 Merge commit '6219b12730e29c357a02177dbee6e565987fcc56' into develop 2025-12-13 15:11:36 +00:00
Robin Voetter
417ed79412 [CK_BUILDER] convolution testing (#3267)
* Add README.md for testing

* Add tensor_memory_manager.

* ck-builder: tensor memory manager rebase fixes

This fixes some issues caused by the API being changed recently.
Also, this streamlines the ckt namespace to always be ck_tile::builder::test,
as this is already being used by other tests

Really, this commit should be squashed into the previous,
but I'm keeping it separate for brevity.

* ck-builder: test arguments initial prototype

* ck-builder: test system initial prototype

* ck-builder: fix non-standardized copyright comments

* ck-builder: new prototype

* ck-builder: group testing inputs/outputs into a separate structure

This is basically the return of the tensor memory manager after all,
except that the design is more closely tied to the actual operation.
Using a struct allows us to add additional input/output tensors
without breaking code (by defaulting those new parameters). Note
that the tensors are split into a separate inputs/outputs because we
usually want to allocate the output _twice_: once for the real
computation and once for the reference computation.
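
A minimal sketch of this idea, assuming a toy elementwise operation in place of a convolution: all names here (`Inputs`, `Outputs`, `run_under_test`, `run_reference`, `check`) are hypothetical and written in Python rather than the builder's C++.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Inputs:
    """Hypothetical input bundle for a toy elementwise operation."""
    lhs: List[float]
    rhs: List[float]
    # A tensor added later gets a default, so existing call sites keep working.
    bias: Optional[List[float]] = None


@dataclass
class Outputs:
    result: List[float]


def allocate_outputs(size: int) -> Outputs:
    return Outputs(result=[0.0] * size)


def _bias_or_zeros(inputs: Inputs) -> List[float]:
    return inputs.bias if inputs.bias is not None else [0.0] * len(inputs.lhs)


def run_under_test(inputs: Inputs, outputs: Outputs) -> None:
    """Stand-in for the operation being tested."""
    bias = _bias_or_zeros(inputs)
    for i, (x, y, b) in enumerate(zip(inputs.lhs, inputs.rhs, bias)):
        outputs.result[i] = x * y + b


def run_reference(inputs: Inputs, outputs: Outputs) -> None:
    """Stand-in for the reference computation."""
    bias = _bias_or_zeros(inputs)
    outputs.result[:] = [
        inputs.lhs[i] * inputs.rhs[i] + bias[i] for i in range(len(inputs.lhs))
    ]


def check(inputs: Inputs) -> bool:
    # One Inputs object is shared; Outputs is allocated twice.
    actual = allocate_outputs(len(inputs.lhs))
    expected = allocate_outputs(len(inputs.lhs))
    run_under_test(inputs, actual)
    run_reference(inputs, expected)
    return actual.result == expected.result
```

Keeping inputs and outputs in separate structures is what makes the double allocation cheap to express: the same `Inputs` feeds both runs.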

* ck-builder: simplify prototype naming; start docs

* ck-builder: update testing readme

* ck-builder: testing documentation

* ck-builder: HipStatusMatcher

This matcher can be used to check HIP status codes and provide
nice and readable error messages.

* ck-builder: tensor_buffer.hpp tests

* ck-builder: conv_fwd.hpp tests

* ck-builder: add example end-to-end test in conv fwd 2d fp16

* ck-builder: simplify extent usage

* ck-builder: update testing doc

* ck-builder: skip end to end test on non-gfx9

* fix check_copyright_year interpreter

/bin/bash is not guaranteed to exist on Linux. Signed,
a NixOS user

* ck-builder: fix copyrights

* ck-builder: reduce conv fwd testing size

This test allocated 24GB of memory, too much for 16GB cards.

---------

Co-authored-by: John Shumway <jshumway@amd.com>

[ROCm/composable_kernel commit: 6219b12730]
2025-12-13 15:33:41 +01:00
assistant-librarian[bot]
76cfa34242 Merge commit '9707ddb444f42b490c73b7884babccde2988ed7e' into develop 2025-12-13 00:36:51 +00:00
Cong Ma
d287385933 [CK TILE GEMM STREAMK] update identifier names according to the new code style (#3348)
* [CK TILE GEMM STREAMK] update identifier names according to the new code style

[ROCm/composable_kernel commit: 9707ddb444]
2025-12-12 17:08:26 -07:00
assistant-librarian[bot]
fd68c6a534 Merge commit 'b4a34371a6a075fd00e22cf589f683de5f9271e3' into develop 2025-12-12 19:12:28 +00:00
Enrico Degregori
7cbd8b75a0 Fix compilation ab scale multi target (#3413)
[ROCm/composable_kernel commit: b4a34371a6]
2025-12-12 10:26:47 -08:00
assistant-librarian[bot]
d418c72980 Merge commit 'fc7bf0ab1c5ed28e5962681007f84a2e8d3ee051' into develop 2025-12-12 18:17:09 +00:00
linqunAMD
245c274287 [CK_TILE] Port hw independent changes from internal repo to develop branch (#3301)
* [CK_TILE] Port hw independent changes from internal repo to develop branch

It includes PR#96, #114, #120, #121.

* correct rebase error

[ROCm/composable_kernel commit: fc7bf0ab1c]
2025-12-12 09:28:37 -08:00
Illia Silin
f9bf419b01 disable test_tile_gemm_quant_bquant_preshuffle (#3420)
[ROCm/composable_kernel commit: 9869641324]
2025-12-12 09:27:12 -08:00
assistant-librarian[bot]
8ecb5dd922 Merge commit '8d7a4e0c73e1d2741fecea200f14bda1dcacc8f7' into develop 2025-12-12 05:14:30 +00:00
dependabot[bot]
b4d5a50216 Bump rocm-docs-core[api_reference] from 1.31.0 to 1.31.1 in /docs/sphinx (#3410)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.31.0 to 1.31.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.31.0...v1.31.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 8d7a4e0c73]
2025-12-11 21:09:40 -08:00
assistant-librarian[bot]
b964d752f0 Merge commit '4011dbfec31a711aaa4c1071c31bdc55f9b7974a' into develop 2025-12-11 23:13:32 +00:00
Max Podkorytov
2ac57c22c1 [CK-Tile] fixup codegen for tile engine ops gemm multid and gemm preshuffle (#3383)
* fixup gemm multi-d and preshuffle in tile engine codegen

---------

Co-authored-by: Thrupti Raj Lakshmana Gowda <thruptiraj.lakshmanagowda@amd.com>

[ROCm/composable_kernel commit: 4011dbfec3]
2025-12-11 14:23:43 -08:00
assistant-librarian[bot]
f7eba31069 Merge commit 'ff194a427129beabd419904ee173c221bcc2a5e5' into develop 2025-12-11 19:37:59 +00:00
Aviral Goel
5d5dbdfb0d build: Hot fix to reduce massive build time by just disabling the instances (#3408)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: ff194a4271]
2025-12-11 10:39:20 -08:00
Aviral Goel
32faf7b8e3 chore: add copyright to pass the CI (#3407)
[ROCm/composable_kernel commit: 45c4ea510c]
2025-12-11 10:34:15 -08:00
assistant-librarian[bot]
f9ad462542 Merge commit '4dcc3e59c1c0195dae7ee9da9ab76d18a4cafe9f' into develop 2025-12-11 17:17:01 +00:00
Aviral Goel
f2a25da322 chore: update copyright header for misc files (#3402)
* chore: update copyright header for misc files

* fix: typo in kernel resulting in ci failure

[ROCm/composable_kernel commit: 4dcc3e59c1]
2025-12-11 08:25:29 -08:00
Illia Silin
f55ff25622 Fix compilation errors with latest clang22 version. (#3396)
* remove target attributes from deduction guides

* switch CK_TILE_HOST_DEVICE_EXTERN based on clang version

[ROCm/composable_kernel commit: b2925ee207]
2025-12-11 08:09:29 -08:00
eliotwang
d5645ff481 Bf16*fp4 gemm (#2801)
* support bf16*mxfp4 gemm

* rebase bf16*fp4 example to develop branch

* Clean up commented debug code in GEMM kernel

* rename example folder

* support bf16*mxfp4 gemm

* rebase bf16*fp4 example to develop branch

* Clean up commented debug code in GEMM kernel

* rename example folder

* rebase to new develop

* fix clang format

* update code according to reviewer's comment

* Update README.md

* update code according to reviewer's comment

* update code according to reviewer's comment

* Update CMakeLists.txt

* Update README.md

* Update CMakeLists.txt

* Delete files

* Delete files

* Add unit tests

* Update test_gemm_quant_base.hpp

* merge bf16*fp4 example to develop branch

* fix clang format

* fix clang format

* Update CMakeLists.txt

* fix ci test

* fix clang format

* resolve conflicts

---------

Co-authored-by: eliotwang <charyang@smci355-ccs-aus-m10-29.cs-aus.dcgpu>
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 715671e419]
2025-12-11 07:20:29 -08:00
assistant-librarian[bot]
b69f9eb589 Merge commit 'ce99cab6056d1ffef5acb6f4ad7ede87a46a3cfc' into develop 2025-12-11 08:17:07 +00:00
Enrico Degregori
53dc636c6e Wmma support for gemm_ab_scale (#3314)
* Support gemm_ab_scale:

 - Add tests
 - Integrate scaling implementation in multiple D
 - Generalize existing b_scale for ab_scale
 - Add instances
 - Generalize implementation for ScaleBlockM, ScaleBlockN, ScaleBlockK
 - Add support for all layouts supported by xdl
 - Fix splitk xdl
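
What block scaling with ScaleBlockM, ScaleBlockN and ScaleBlockK means numerically can be sketched with a plain host reference. This is an assumption-laden illustration, not the library's implementation: it assumes A is M x K with one scale per (ScaleBlockM x ScaleBlockK) block, B is K x N with one scale per (ScaleBlockK x ScaleBlockN) block, and the function name and scale-tensor layout are hypothetical.

```python
def gemm_ab_scale_ref(a, b, a_scale, b_scale,
                      scale_block_m, scale_block_n, scale_block_k):
    """Reference GEMM where A and B elements are multiplied by per-block scales.

    Assumed layouts (illustrative only):
      a:       M x K,  a_scale: (M / scale_block_m) x (K / scale_block_k)
      b:       K x N,  b_scale: (K / scale_block_k) x (N / scale_block_n)
    """
    m, k, n = len(a), len(a[0]), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # Integer division maps an element index to its scale block.
                sa = a_scale[i // scale_block_m][p // scale_block_k]
                sb = b_scale[p // scale_block_k][j // scale_block_n]
                acc += (a[i][p] * sa) * (b[p][j] * sb)
            c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[1.0, 0.0], [0.0, 1.0]]
    a_scale = [[2.0, 0.5]]    # one block along M, two blocks along K
    b_scale = [[1.0], [4.0]]  # two blocks along K, one block along N
    print(gemm_ab_scale_ref(a, b, a_scale, b_scale, 2, 2, 1))
```

With all three block sizes set to the full matrix dimensions this degenerates to a single scale per operand.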

* Fix copyright

* Wmma support for gemm_blockscale_wp (#3315)

* Support for preshuffle with ab scale

 - add support for b preshuffle in GridwiseGemm_wmma_cshuffle_v3_ab_scale
 - add support for AScaleLayout and BScaleLayout (can be different
   from ALayout and BLayout, respectively)
 - add Run method in v1 pipeline to support preshuffle + scaling
 - add support for preshuffle gemms in common invoker
 - Add splitk support

* Fix copyright header

[ROCm/composable_kernel commit: ce99cab605]
2025-12-11 09:06:20 +01:00
Ville Pietilä
fe0fe6f4ad [CK_BUILDER] Improve CK Builder and CK Builder tests (#3382)
* Remove stale documentation.

* Add placeholder for conv algorithm design description. Add link to conv factory description.

* Improve testing transfer parameters.

* Python script to check the block tilings.

* Improve tests and conv types serialization.

* Change representation of boolean values from 1/0 to true/false in instance strings.

* Change representation of boolean values from 1/0 to true/false in conv algorithm types.

* Test code improvements.

* Improve conv descriptions tests.

* Improve conv signature definition in conv fwd builder tests.

* clang-format.

* Remove obsolete script.

* Revert StaticAssertTypeEq changes in conv layout tests.

* Remove obsolete using declaration.

---------

Co-authored-by: Ville Pietilä <>

[ROCm/composable_kernel commit: d66e5f667c]
2025-12-11 09:50:00 +02:00
assistant-librarian[bot]
a1037bfc3c Merge commit '6d25525adc2344d5b62b12b9ffddee50f89cd0ff' into develop 2025-12-11 07:16:06 +00:00