Commit Graph

2789 Commits

Author SHA1 Message Date
Geo Min
7ad4a687fc details from org var (#3431)
[ROCm/composable_kernel commit: f4729de395]
2025-12-17 11:54:13 -08:00
Yashvardhan Agarwal
83dc6ad263 [ck_tile] refactor reduce kernel (#3257)
* refactor reduce kernel

- Rename Reduce kernel as per convention

- Move kept_dim and reduce_dims from runtime to compile-time parameters

- Update Reduce2dProblem template to include KeptDim, ReduceDims, and
Rank

- Remove IsSupportedArgument validation function as it's unnecessary.
Not using the GuaranteedLastDimensionVectorStride while making tensor
view or descriptor which removes the bounds enforced earlier. We still
calculate and use vector size.

- Update reduce example to demonstrate NCHW->NHW reduction with
non-contiguous support

- Update tests

Kernel now handles both contiguous and non-contiguous memory layout.
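
The move from runtime to compile-time reduce dimensions can be pictured with a small host-side sketch. This is not the ck_tile kernel or its Reduce2dProblem template: `reduce_sum_4d` and its parameters are hypothetical names chosen for illustration, and the code assumes a rank-4 tensor addressed through explicit strides so that a non-contiguous view can be passed in.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not ck_tile API): sums a rank-4 tensor over the
// dimension ReduceDim, which is fixed at compile time instead of being
// passed as a runtime argument.
template <std::size_t ReduceDim>
std::vector<float> reduce_sum_4d(const std::vector<float>& in,
                                 const std::array<std::size_t, 4>& lengths,
                                 const std::array<std::size_t, 4>& strides)
{
    static_assert(ReduceDim < 4, "ReduceDim must index a dimension of a rank-4 tensor");
    assert(!in.empty());

    // The output keeps every dimension except the reduced one, which collapses to 1.
    std::array<std::size_t, 4> out_len = lengths;
    out_len[ReduceDim]                 = 1;
    std::vector<float> out(out_len[0] * out_len[1] * out_len[2] * out_len[3], 0.0f);

    std::array<std::size_t, 4> i{};
    for(i[0] = 0; i[0] < lengths[0]; ++i[0])
    {
        for(i[1] = 0; i[1] < lengths[1]; ++i[1])
        {
            for(i[2] = 0; i[2] < lengths[2]; ++i[2])
            {
                for(i[3] = 0; i[3] < lengths[3]; ++i[3])
                {
                    // The input offset uses the caller's strides, so the view
                    // does not have to be contiguous.
                    const std::size_t in_off = i[0] * strides[0] + i[1] * strides[1] +
                                               i[2] * strides[2] + i[3] * strides[3];

                    // The output is stored densely; the reduced index is pinned to 0.
                    std::array<std::size_t, 4> o = i;
                    o[ReduceDim]                 = 0;
                    const std::size_t out_off =
                        ((o[0] * out_len[1] + o[1]) * out_len[2] + o[2]) * out_len[3] + o[3];

                    out[out_off] += in[in_off];
                }
            }
        }
    }
    return out;
}
```

Calling `reduce_sum_4d<1>(...)` on an NCHW tensor reduces over C and leaves N, H and W, which mirrors the NCHW->NHW case named in the example update. An invalid dimension is rejected by the `static_assert` at compile time rather than by a runtime argument check.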

* fix compile errors

[ROCm/composable_kernel commit: ea10a78203]
2025-12-17 21:46:08 +02:00
ltqin
c8397e8ef2 flashattention fwd add (80, 96) instance (#3415)
* add hdim (96,96) instance

* change to (80,96)

* format py

* remove 96 in optdim

* when N=6 change to llvm_amdgcn_raw_buffer_load_i32x3

[ROCm/composable_kernel commit: 92653168c2]
2025-12-17 09:16:11 -08:00
Matti Eskelinen
e404594325 Fix minor issues in cmake-ck-dev script (#3438)
* Remove extra slash from cmake-ck-dev.sh

* Add quoting around string variables

[ROCm/composable_kernel commit: fe3d52d9b0]
2025-12-17 08:57:21 -08:00
music-dino
76d5fb93fe Add rocm to prefix path for codegen (#3404)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Mirza Halilčević <109971222+mirza-halilcevic@users.noreply.github.com>

[ROCm/composable_kernel commit: 55c2886b17]
2025-12-17 08:51:13 -08:00
spolifroni-amd
c92c3ac29d [CK] Evened out the wording in the changelog (#3418)
[ROCm/composable_kernel commit: 871c2ece2d]
2025-12-17 08:48:56 -08:00
rocking
97b2015929 Fix FMHA fp8 hdim=64 incorrect result in MI200 (#3423)
* Fix incorrect result in hdim=64

* Add change log

[ROCm/composable_kernel commit: 292f87aa03]
2025-12-17 08:16:54 -08:00
andrew clark
2de39368c2 Adding sccache stats monitoring (#3428)
* Adding additional sccache and redis logging to each build

* Removing custom workspace

* Removing script reference

* Logging complete sccache stats

* Ensuring monitor is stopped if build fails

* Including additional sccache logging

* Removing build duration log

* Fixing groovy syntax error

* Fixing syntax

* Modifying logging statements

* Fixing syntax

* Modifying logging

* Modifying logging

* Including additional logging

* Fixing logging message

* Logging build path

* Testing

* Testing workspace path logs

* Adding additional logging to monitor

* Modifying comments

* Adding copyright info

* Cleaning unnecessary logs

* Removing build time logs

* Merge branch 'develop' into aick-457

[ROCm/composable_kernel commit: e67cd7edeb]
2025-12-17 09:15:27 -07:00
kensclin
9b63a65886 Support A/B Quantization in Blockscale GEMM (#3343)
* Support A/B Quantization in Blockscale GEMM

* Support A/B Quantization in Blockscale GEMM

* Support A/B Quantization in Blockscale GEMM

* Support A/B Quantization in Blockscale GEMM

* Support A/B Quantization in Blockscale GEMM

* Implement review suggested changes

* Implement review suggested changes

* Sync with develop

* fix pre-commit error

* Add unit tests for blockscale AB-Quantization

* fix pre-commit error

* fix pre-commit error

* fix compile error

* fix compile error

* fix clang-format

* fix clang-format

* fix enumeration values not handled in switch

* rebase file

* Add missing enums to data_type_sizeof (#3430)

Fixes broken build on gfx942. This was some test code that got merged at the same time.

* [CK_BUILDER] CK Tile header installation for builder, algorithm concept improvements (#3419)

* Added install of CK_Tile headers when using CK_EXPERIMENTAL_BUILDER. MIOpen needs this since the builder uses features from CK Tile and the CK Tile install is excluded when doing a narrow build for MIOpen
* Changed algorithm concept type checks to be concepts instead of constexpr bool functions. This improves compiler error messages when using these concepts in static_asserts

---------

Co-authored-by: Daryl Hawkins <DarylHawkins@amd.com>

* Add build trace diagnostics to CI. (#3432)

* generate and visualize build traces for all archs

* generate build traces in all cases

* fix jenkins logic

* fix typo

* use more threads for parsing dependency map

* add script to parse ninja traces and issue warnings

* fix python script syntax and header

* fix python syntax one more time

* fix python syntax

* Support A/B Quantization in Blockscale GEMM

* Implement review suggested changes

* Sync with develop

* Add unit tests for blockscale AB-Quantization

* fix enumeration values not handled in switch

* rebase file

* rebase file

---------

Co-authored-by: John Shumway <jshumway@amd.com>
Co-authored-by: DarylHawkinsAMD <Daryl.Hawkins@amd.com>
Co-authored-by: Daryl Hawkins <DarylHawkins@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 0500fcc017]
2025-12-17 07:13:47 -08:00
KateJu
c3d078376b fix some minor error (#3409)
* Fix a typo: rename ReduceWithNoIndexTesBtHalfFloat_AMAX to
ReduceWithNoIndexTesBHalfFloat_AMAX

* Change reduce_blockwise_test<int8_t, float to reduce_blockwise_test<int8_t,
int32_t to resolve the error message "The reduction setting is invalid,
exiting!"

[ROCm/composable_kernel commit: 292df2719f]
2025-12-16 19:50:49 -08:00
Yi DING
af1927262c [CK_TILE] Add FP8xF4 Flatmm (#3401)
* Refactor policy

* fix a bank conflict

* Enable mixed mx flatmm

* Update

[ROCm/composable_kernel commit: 57e1e4a848]
2025-12-17 10:01:48 +08:00
Illia Silin
f35e7b59cc Add build trace diagnostics to CI. (#3432)
* generate and visualize build traces for all archs

* generate build traces in all cases

* fix jenkins logic

* fix typo

* use more threads for parsing dependency map

* add script to parse ninja traces and issue warnings

* fix python script syntax and header

* fix python syntax one more time

* fix python syntax

[ROCm/composable_kernel commit: 3dfa794fab]
2025-12-16 08:22:52 -08:00
DarylHawkinsAMD
29ed00bbd1 [CK_BUILDER] CK Tile header installation for builder, algorithm concept improvements (#3419)
* Added install of CK_Tile headers when using CK_EXPERIMENTAL_BUILDER. MIOpen needs this since the builder uses features from CK Tile and the CK Tile install is excluded when doing a narrow build for MIOpen
* Changed algorithm concept type checks to be concepts instead of constexpr bool functions. This improves compiler error messages when using these concepts in static_asserts

---------

Co-authored-by: Daryl Hawkins <DarylHawkins@amd.com>

[ROCm/composable_kernel commit: 1e6bbed1fb]
2025-12-15 16:24:36 -07:00
John Shumway
ec9afcfe8d Add missing enums to data_type_sizeof (#3430)
Fixes broken build on gfx942. This was some test code that got merged at the same time.

[ROCm/composable_kernel commit: 2544e394cf]
2025-12-15 11:49:36 -08:00
Aviral Goel
389e797a9b build: reduce build time for bquant tests by splitting into multiple cpp & support on other gfx10 case (#3395)
* build: reduce build time for bquant unit tests by splitting into multiple cpp

* reduce the test case & add the gfx10 support

* fix: copyright header for new file

* chore: add copyright to pass the CI

* build: Hot fix to reduce massive build time by just disabling the instances

* Update include/ck_tile/core/config.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: khushbu <khuagarw@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: 5e2d25e20f]
2025-12-15 07:19:29 -08:00
Sami Remes
4a29a8f84d [CK_TILE] Fix some inconsistencies with OverrideBDatatype in BQuant GEMM (#3394)
* Fix some inconsistencies with OverrideBDatatype

* fix formatting

* Fix BGlobalPrefetch, no static

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: a0cdb0b493]
2025-12-15 07:18:38 -08:00
linqunAMD
7cdba74e97 [ck][gfx12] support contraction on gfx12 (#3421)
* support contraction on gfx12

* increase tolerance for gfx11 in example contraction

the precision of gfx11 wmma is lower than that of the other targets.

[ROCm/composable_kernel commit: 7e93eed878]
2025-12-15 07:16:01 -08:00
linqunAMD
8811c57d44 [ck_tile] remove duplicate functions in ck_tile (#3311)
* [ck_tile] remove duplicated shuffle_b and shuffle_b_permuteN

* [ck_tile] move get_k_warp to gemm_shape

* resolve code rebase error

[ROCm/composable_kernel commit: 6d7299ff78]
2025-12-15 07:13:00 -08:00
Johannes Graner
2fe4c8acec Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380)
* Parallelization in dataset generation

* Parallelizable tests for fwd, bwd data, bwd weight with datasets

* .gitignore generated datasets

* Test parallelization script with round-robin GPU scheduling

* Parallelization updates to test generation and running

* Dataset paths relative to executable

* Update output from test generation

* Default to one GPU in test generation

* Add small dataset tests to Jenkins

* Update copyright lines

* Update test_data/generate_test_dataset.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Move trap disable

* Common get path function
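
Round-robin scheduling of tests across GPUs, as mentioned in the list above, can be sketched like this. The actual parallelization script in the PR is not reproduced here; `assign_round_robin` and its arguments are hypothetical, and the sketch only shows the assignment rule (test `i` goes to GPU `i % num_gpus`).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: distributes test names over GPUs in round-robin order.
// per_gpu[g] holds the tests that would run on GPU index g.
std::vector<std::vector<std::string>> assign_round_robin(const std::vector<std::string>& tests,
                                                         std::size_t num_gpus)
{
    assert(num_gpus > 0);
    std::vector<std::vector<std::string>> per_gpu(num_gpus);
    for(std::size_t i = 0; i < tests.size(); ++i)
    {
        per_gpu[i % num_gpus].push_back(tests[i]);
    }
    return per_gpu;
}
```

With five tests and two GPUs, GPU 0 receives tests 0, 2 and 4 and GPU 1 receives tests 1 and 3, so the queues differ in length by at most one.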

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: fe35ba5dac]
2025-12-15 13:38:25 +01:00
Bartłomiej Kocot
a45c051ac9 [CK TILE][AICK-439] Fix cshuffle epilogue wave per shuffle (#3364)
* [CK TILE] Fix cshuffle epilogue wave per shuffle

* Align shuffle per tile with smem

* fixes

* Fixes for double smem

* fix

[ROCm/composable_kernel commit: 3b773109e5]
2025-12-15 12:59:48 +01:00
Johannes Graner
6238fe6d0d [CK Grouped Gemm] Disable split-k kernel for split-k > 1 with non-contiguous strides (#3405)
* Disable kernel for split-k > 1 with non-contiguous strides

* Update device_grouped_gemm_xdl_splitk_cshuffle.hpp

---------

AICK-441 (partial)

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 3143a5a480]
2025-12-15 08:03:00 +01:00
Linjun-AMD
51886bf22b Add attention sink support for FMHA FWD (#3368)
* Revert "Revert "Add attn sink (#2892)" (#3250)"

This reverts commit e3be392d13e6ee107d823af32aca2d3ff03ca69d.

* fix conflict

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Add F_sink parameter to FmhaFwdPipeline

* Update tile_fmha_traits.hpp

* Refactor pipeline creation in fmha_fwd.py

Updated the pipeline creation logic to include 'sink' parameter in product combinations and adjusted the FmhaFwdPipeline calls accordingly.

* Update fmha_fwd.py

* Update fmha_fwd.py

* Update example/ck_tile/01_fmha/script/correct_test_fwd_sink.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update CHANGELOG.md

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Update CHANGELOG with new features and support

* Update fmha_fwd.hpp

* Update CHANGELOG.md

* Update smoke_test_fwd_sink.sh

* Update correct_test_fwd_sink.sh

* Update smoke_test_fwd_sink.sh

---------

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: f5573f56d9]
2025-12-15 12:21:59 +08:00
Emily Martins
eeb78c46a4 [CK_TILE] Stream-K Tree Reduction and Cache Skipping Integration (#3371)
* CK Tile Stream-K Tree Reduction

This change adds the first implementation of the Stream-K tree reduction
strategy into CK Tile. The tree reduction reduces the number of
steps for accumulating results for a tile from O(N) to O(logN) where N
is the number of workgroups contributing to a C tile.

Additionally, in the original non-atomic reduction strategy, atomics
were used to set the flags buffer and to read from the flags buffer.
However, through investigation with the tree reduction, atomics with
default (relaxed) semantics were not enough to guarantee workgroups
would not read stale data, leading to incorrect results. Stronger
acquire/release memory orderings are too expensive. So, this change
also eliminates the use of atomics for setting the flags. Instead, we
leverage cache modifiers (e.g., GLC) to avoid writing to cache, thereby
avoiding the use of atomics.

Preliminary tests were also added for the normal reduction and tree
reduction. More will be added in a future PR via tile engine.
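
The O(N) versus O(logN) step counts can be illustrated with a sequential host-side model. This is not the CK Tile Stream-K kernel: it ignores the flags buffer and cache modifiers, the names `linear_reduce`, `tree_reduce` and `ReductionResult` are hypothetical, and one "step" stands for one round of accumulations that could run in parallel across workgroups.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ReductionResult
{
    float sum;         // accumulated value for the tile
    std::size_t steps; // number of dependent accumulation rounds
};

// Linear strategy: one owner accumulates every other partial in turn,
// which takes N - 1 dependent steps for N partials.
ReductionResult linear_reduce(std::vector<float> partials)
{
    assert(!partials.empty());
    std::size_t steps = 0;
    for(std::size_t i = 1; i < partials.size(); ++i)
    {
        partials[0] += partials[i];
        ++steps;
    }
    return {partials[0], steps};
}

// Tree strategy: in each round, slot i accumulates slot i + stride. The
// additions inside one round are independent of each other, so a round is
// counted as a single step; the stride doubles every round.
ReductionResult tree_reduce(std::vector<float> partials)
{
    assert(!partials.empty());
    std::size_t steps = 0;
    for(std::size_t stride = 1; stride < partials.size(); stride *= 2)
    {
        for(std::size_t i = 0; i + stride < partials.size(); i += 2 * stride)
        {
            partials[i] += partials[i + stride];
        }
        ++steps;
    }
    return {partials[0], steps};
}
```

For eight partials, the linear model needs seven steps while the tree model needs three, and both produce the same sum. Floating-point addition is not associative, so on real data the two orders can differ in the last bits.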

* Move Stream-K kernel files to a subdirectory

* Cleanup Code Style & Handle Unsupported Reductions

This change makes the following small changes:
- Add an explicit else block for unimplemented reduction strategies
- Clarify type of sk_flags_ptr via auto*
- Add description for extra_iters_before_me variable

* Run new copyright script on new files

[ROCm/composable_kernel commit: 22b945e06e]
2025-12-14 14:49:49 -07:00
John Shumway
a3270d2eb0 Add describe() method to device ops for runtime introspection (#3375)
Introduces a polymorphic describe() method to BaseOperator that enables runtime introspection of kernel configurations through a unified interface.

Key changes:

* Add virtual describe() method to BaseOperator returning Description objects
* Implement describe() in 6 device operation classes (conv fwd/bwd variants)
* Create conv_describe.hpp with factory function for ConvDescription
* Extract type definitions to conv_types.hpp to resolve circular dependencies
* Add InstanceStringDescription for kernels without full ConvDescription support

Other Improvements:

* Update tests to use describe() instead of GetInstanceString()
* Remove circular dependency include from conv_traits.hpp
* Add ODD_C to ConvFwdSpecialization enum and fix OddC mapping
* Replace silent fallback in conv_layout() with compile-time error

This provides a foundation for runtime kernel introspection and better tooling support for analyzing and debugging kernel configurations.
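
A polymorphic describe() interface of this kind might look roughly like the sketch below. The class names BaseOperator, Description and InstanceStringDescription come from the commit message, but the definitions here are simplified stand-ins rather than the CK declarations; the `brief()` method, the `std::unique_ptr` return type and `ToyConvFwd` are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Stand-in for the description interface (assumed shape, not the CK definition).
struct Description
{
    virtual ~Description()            = default;
    virtual std::string brief() const = 0;
};

// Fallback description that only wraps an instance string.
struct InstanceStringDescription : Description
{
    explicit InstanceStringDescription(std::string s) : instance_string(std::move(s)) {}
    std::string brief() const override { return instance_string; }
    std::string instance_string;
};

// Stand-in for the operator base class with a virtual describe().
struct BaseOperator
{
    virtual ~BaseOperator()                               = default;
    virtual std::unique_ptr<Description> describe() const = 0;
};

// Hypothetical device operation that reports its configuration.
struct ToyConvFwd : BaseOperator
{
    std::unique_ptr<Description> describe() const override
    {
        return std::make_unique<InstanceStringDescription>("ToyConvFwd<256, 128, 128>");
    }
};
```

A caller holding only a `const BaseOperator&` can call `describe()->brief()` without knowing the concrete kernel type, which is the kind of unified runtime introspection the commit describes.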

[ROCm/composable_kernel commit: 9ac51aa0f4]
2025-12-14 12:49:12 -08:00
Enrico Degregori
5c81464568 CK Tile: Enable padding blockscale example (#3417)
* Fix host code padding

* restructure the ref code

* clean up

* Fix compilation error

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 21f06aa47d]
2025-12-14 10:25:47 -08:00
Robin Voetter
417ed79412 [CK_BUILDER] convolution testing (#3267)
* Add README.md for testing

* Add tensor_memory_manager.

* ck-builder: tensor memory manager rebase fixes

This fixes some issues caused by the API being changed recently.
Also, this streamlines the ckt namespace to always be ck_tile::builder::test,
as this is already being used by other tests

Really, this commit should be squashed into the previous,
but I'm keeping it separate for brevity.

* ck-builder: test arguments initial prototype

* ck-builder: test system initial prototype

* ck-builder: fix non-standardized copyright comments

* ck-builder: new prototype

* ck-builder: group testing inputs/outputs into a separate structure

This is basically the return of the tensor memory manager after all,
except that the design is more closely tied to the actual operation.
Using a struct allows us to add additional input/output tensors
without breaking code (by defaulting those new parameters). Note
that the tensors are split into a separate inputs/outputs because we
usually want to allocate the output _twice_: once for the real
computation and once for the reference computation.

* ck-builder: simplify prototype naming; start docs

* ck-builder: update testing readme

* ck-builder: testing documentation

* ck-builder: HipStatusMatcher

This matcher can be used to check HIP status codes and provide
nice and readable error messages.

* ck-builder: tensor_buffer.hpp tests

* ck-builder: conv_fwd.hpp tests

* ck-builder: add example end-to-end test in conv fwd 2d fp16

* ck-builder: simplify extent usage

* ck-builder: update testing doc

* ck-builder: skip end to end test on non-gfx9

* fix check_copyright_year interpreter

/bin/bash is not guaranteed to exist on Linux. Signed,
a NixOS user

* ck-builder: fix copyrights

* ck-builder: reduce conv fwd testing size

This test allocated 24GB of memory, too much for 16GB cards.

---------

Co-authored-by: John Shumway <jshumway@amd.com>

[ROCm/composable_kernel commit: 6219b12730]
2025-12-13 15:33:41 +01:00
Cong Ma
d287385933 [CK TILE GEMM STREAMK] update identifier names according to the new code style (#3348)
* [CK TILE GEMM STREAMK] update identifier names according to the new code style

[ROCm/composable_kernel commit: 9707ddb444]
2025-12-12 17:08:26 -07:00
Enrico Degregori
7cbd8b75a0 Fix compilation ab scale multi target (#3413)
[ROCm/composable_kernel commit: b4a34371a6]
2025-12-12 10:26:47 -08:00
linqunAMD
245c274287 [CK_TILE] Port hw independent changes from internal repo to develop branch (#3301)
* [CK_TILE] Port hw independent changes from internal repo to develop branch

It includes PR#96, #114, #120, #121.

* correct rebase error

[ROCm/composable_kernel commit: fc7bf0ab1c]
2025-12-12 09:28:37 -08:00
Illia Silin
f9bf419b01 disable test_tile_gemm_quant_bquant_preshuffle (#3420)
[ROCm/composable_kernel commit: 9869641324]
2025-12-12 09:27:12 -08:00
dependabot[bot]
b4d5a50216 Bump rocm-docs-core[api_reference] from 1.31.0 to 1.31.1 in /docs/sphinx (#3410)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.31.0 to 1.31.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.31.0...v1.31.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 8d7a4e0c73]
2025-12-11 21:09:40 -08:00
Max Podkorytov
2ac57c22c1 [CK-Tile] fixup codegen for tile engine ops gemm multid and gemm preshuffle (#3383)
* fixup gemm multi-d and preshuffle in tile engine codegen

---------

Co-authored-by: Thrupti Raj Lakshmana Gowda <thruptiraj.lakshmanagowda@amd.com>

[ROCm/composable_kernel commit: 4011dbfec3]
2025-12-11 14:23:43 -08:00
Aviral Goel
5d5dbdfb0d build: Hot fix to reduce massive build time by just disabling the instances (#3408)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: ff194a4271]
2025-12-11 10:39:20 -08:00
Aviral Goel
32faf7b8e3 chore: add copyright to pass the CI (#3407)
[ROCm/composable_kernel commit: 45c4ea510c]
2025-12-11 10:34:15 -08:00
Aviral Goel
f2a25da322 chore: update copyright header for misc files (#3402)
* chore: update copyright header for misc files

* fix: typo in kernel resulting in ci failure

[ROCm/composable_kernel commit: 4dcc3e59c1]
2025-12-11 08:25:29 -08:00
Illia Silin
f55ff25622 Fix compilation errors with latest clang22 version. (#3396)
* remove target attributes from deduction guides

* switch CK_TILE_HOST_DEVICE_EXTERN based on clang version

[ROCm/composable_kernel commit: b2925ee207]
2025-12-11 08:09:29 -08:00
eliotwang
d5645ff481 Bf16*fp4 gemm (#2801)
* support bf16*mxfp4 gemm

* rebase bf16*fp4 example to develop branch

* Clean up commented debug code in GEMM kernel

* rename example folder

* support bf16*mxfp4 gemm

* rebase bf16*fp4 example to develop branch

* Clean up commented debug code in GEMM kernel

* rename example folder

* rebase to new develop

* fix clang format

* update code according to reviewer's comment

* Update README.md

* update code according to reviewer's comment

* update code according to reviewer's comment

* Update CMakeLists.txt

* Update README.md

* Update CMakeLists.txt

* Delete files

* Delete files

* Add unit tests

* Update test_gemm_quant_base.hpp

* merge bf16*fp4 example to develop branch

* fix clang format

* fix clang format

* Update CMakeLists.txt

* fix ci test

* fix clang format

* resolve conflicts

---------

Co-authored-by: eliotwang <charyang@smci355-ccs-aus-m10-29.cs-aus.dcgpu>
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 715671e419]
2025-12-11 07:20:29 -08:00
Enrico Degregori
53dc636c6e Wmma support for gemm_ab_scale (#3314)
* Support gemm_ab_scale:

 - Add tests
 - Integrate scaling implementation in multiple D
 - Generalize existing b_scale for ab_scale
 - Add instances
 - Generalize implementation for ScaleBlockM, ScaleBlockN, ScaleBlockK
 - Add support for all layouts supported by xdl
 - Fix splitk xdl

* Fix copyright

* Wmma support for gemm_blockscale_wp (#3315)

* Support for  preshuffle with ab scale

 - add support for b preshuffle in GridwiseGemm_wmma_cshuffle_v3_ab_scale
 - add support for AScaleLayout and BScaleLayout (can be different
   from ALayout and BLayout, respectively)
 - add Run method in v1 pipeline to support preshuffle + scaling
 - add support for preshuffle gemms in common invoker
 - Add splitk support

* Fix copyright header

[ROCm/composable_kernel commit: ce99cab605]
2025-12-11 09:06:20 +01:00
Ville Pietilä
fe0fe6f4ad [CK_BUILDER] Improve CK Builder and CK Builder tests (#3382)
* Remove stale documentation.

* Add placeholder for conv algorithm design description. Add link to conv factory description.

* Improve testing transfer parameters.

* Python script to check the block tilings.

* Improve tests and conv types serialization.

* Change representation of boolean values from 1/0 to true/false in instance strings.

* Change representation of boolean values from 1/0 to true/false in conv algorithm types.

* Test code improvements.

* Improve conv descriptions tests.

* Improve conv signature definition in conv fwd builder tests.

* clang-format.

* Remove obsolete script.

* Revert StaticAssertTypeEq changes in conv layout tests.

* Remove obsolete using declaration.

---------

Co-authored-by: Ville Pietilä <>

[ROCm/composable_kernel commit: d66e5f667c]
2025-12-11 09:50:00 +02:00
Aviral Goel
d810876d63 feat(precommit-hooks): add check for correct copyright header (#3302)
* chore(copyright): update copyright header for left files

* feat(copyright): add copyright check to precommit hooks

* chore(copyright): update copyright header for include/ck_tile directory

* chore(copyright): update copyright header for example directory

* chore(copyright): update copyright header for .github directory

* refactor: copyright_check script with better if else handling

* chore(copyright): update copyright header for remaining files

* feat: add script to automate copyright addition

[ROCm/composable_kernel commit: 6d25525adc]
2025-12-10 22:50:43 -08:00
Aviral Goel
f38b64ae67 docs: add notes on tile distribution and inline comments (#3297)
* docs: add notes on tile distribution and inline comments

* Apply suggestions from code review

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: fbbdd36ea8]
2025-12-10 22:47:19 -08:00
Geo Min
f2a77cf0bd [ci] Bumping TheRock commit hash (#3385)
* Bumping TheRock commit hash

* new docker hash

* Using new runner name

[ROCm/composable_kernel commit: 8270900d60]
2025-12-10 17:34:41 -08:00
John Shumway
c868964f6a Improve sequence sorting and add unit tests (#3376)
Old sequence sort code was showing up on build profiles. Convert it to constexpr functions for much more efficient build-time execution. The sorting is still O(N^2), but our sequences are small enough that it executes quickly. This reduced compilation time of a small convolution by more than 10% and the overall time spent in the compiler on a narrow build by 6%.
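
The general idea of sorting at build time with a constexpr function can be sketched as below. This is not the CK sequence code: `sorted` is a hypothetical helper operating on `std::array`, using a plain O(N^2) insertion sort, and it assumes C++17 or later.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: insertion sort evaluated by the compiler. The loop is
// ordinary imperative code, so the compiler runs it directly instead of
// instantiating one template per comparison.
template <typename T, std::size_t N>
constexpr std::array<T, N> sorted(std::array<T, N> values)
{
    for(std::size_t i = 1; i < N; ++i)
    {
        T key         = values[i];
        std::size_t j = i;
        // Shift larger elements one slot to the right to make room for key.
        while(j > 0 && values[j - 1] > key)
        {
            values[j] = values[j - 1];
            --j;
        }
        values[j] = key;
    }
    return values;
}

constexpr std::array<int, 5> input{3, 1, 4, 1, 5};
constexpr auto output = sorted(input);
static_assert(output[0] == 1 && output[1] == 1 && output[2] == 3 && output[3] == 4 &&
              output[4] == 5);
```

The `static_assert` confirms that the result is available as a compile-time constant, so it can feed other compile-time parameters.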

[ROCm/composable_kernel commit: 15ed65db35]
2025-12-10 12:25:23 -08:00
Po Yen Chen
737c80d47d fix: python 3.8 compatibility in fmha codegen (#3388)
[ROCm/composable_kernel commit: b15df37255]
2025-12-10 07:08:41 -08:00
Ville Pietilä
d719c09343 [CK_TILE] Split-K autodeduction (#3351)
* First version of split-K autodeduction.

* Fix circular dependency and kernel construction.

* Fix tolerance calculation for bwd weight example.

* Simplify kernel construction.

* Fix kernel launching bug for split-K autodeduce.

* Add split-K autodeduction support for the two stage example.

* Fix a corner case.

* Fix clang-format.

* Fix clang-format for inc files.

* Add missing header.

* Prevent too large split-K values.

* Fix formatting.

* Add unit tests for IsSupportedArgument in grouped bwd conv.

* clang-format.

* Fix merge conflicts.

* Address feedback from code review.

* clang-format

* Fix new tests after merge.

---------

Co-authored-by: Ville Pietilä <>

[ROCm/composable_kernel commit: fc22320d78]
2025-12-10 09:30:30 +02:00
Zzz9990
822da5d3a7 [CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377)
* update coherence
---------

Co-authored-by: Zzz9990 <Zzz9990>

[ROCm/composable_kernel commit: 1aa93ef551]
2025-12-10 10:03:28 +08:00
Illia Silin
ee0d92f8fc use hipTensor from monorepo for daily builds (#3386)
[ROCm/composable_kernel commit: 934ba1208a]
2025-12-09 14:39:08 -08:00
Illia Silin
5f4c14b336 temporarily disable daily builds on gfx1010 and gfx908 (#3384)
[ROCm/composable_kernel commit: 0d8259affd]
2025-12-09 10:37:13 -08:00
Illia Silin
cdacf1d5f5 Upgrade to ROCm7.1.1 as default compiler. (#3370)
* upgrade to rocm7.1.1 as new default compiler

* fix jenkinsfile

[ROCm/composable_kernel commit: 7582c9e73f]
2025-12-09 07:35:32 -08:00
dependabot[bot]
821b976ead Bump rocm-docs-core[api_reference] from 1.20.1 to 1.31.0 in /docs/sphinx (#3374)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.20.1 to 1.31.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.31.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.20.1...v1.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/composable_kernel commit: 50ca3f83eb]
2025-12-09 07:10:34 -08:00