Commit Graph

2798 Commits

Author SHA1 Message Date
Thomas Ning
99b4da18a2 Update tile_engine/ops/pooling/pool_profiler.hpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-19 11:18:18 -08:00
Thomas Ning
9e97c2dc40 Update tile_engine/ops/pooling/pool_instance_builder.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-19 11:17:53 -08:00
Aleksander Dudek
39ad7898fa [CK_TILE] Add pooling tests for ckTileEngine 2025-12-16 11:20:47 +00:00
Aleksander Dudek
ef33d16a23 [CK_TILE] fix formatting of pooling in ckTileEngine with clang-format part3 2025-12-16 10:51:57 +00:00
Aleksander Dudek
4c8dbd3847 Merge branch 'develop' into ckTileEnginePooling 2025-12-16 10:42:16 +00:00
Aleksander Dudek
1bccd37e06 [CK_TILE] fix formatting of pooling in ckTileEngine with clang-format 2025-12-16 10:40:41 +00:00
DarylHawkinsAMD
1e6bbed1fb [CK_BUILDER] CK Tile header installation for builder, algorithm concept improvements (#3419)
* Added install of CK_Tile headers when using CK_EXPERIMENTAL_BUILDER. MIOpen needs this since the builder uses features from CK Tile and the CK Tile install is excluded when doing a narrow build for MIOpen
* Changed algorithm concept type checks to be concepts instead of constexpr bool functions. This improves compiler error messages when using these concepts in static_asserts

---------

Co-authored-by: Daryl Hawkins <DarylHawkins@amd.com>
2025-12-15 16:24:36 -07:00
John Shumway
2544e394cf Add missing enums to data_type_sizeof (#3430)
Fixes broken build on gfx942. This was some test code that got merged at the same time.
2025-12-15 11:49:36 -08:00
Aviral Goel
5e2d25e20f build: reduce build time for bquant tests by splitting into multiple cpp & support on other gfx10 case (#3395)
* build: reduce build time for bqaunt unit tests by splitting into multiple cpp

* reduce the test case & add the gfx10 support

* fix: copyright header for new file

* chore: add copyright to pass the CI

* build: Hot fix to reduce massive build time by just disabling the instances

* Update include/ck_tile/core/config.hpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: khushbu <khuagarw@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-15 07:19:29 -08:00
Sami Remes
a0cdb0b493 [CK_TILE] Fix some inconsistencies with OverrideBDatatype in BQuant GEMM (#3394)
* Fix some inconsistencies with OverrideBDatatype

* fix formatting

* Fix BGlobalPrefetch, no static

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-12-15 07:18:38 -08:00
linqunAMD
7e93eed878 [ck][gfx12] support contraction on gfx12 (#3421)
* support contraction on gfx12

* increase tolerance for gfx11 in example contraction

the precsion of gfx11 wmma is less than others.
2025-12-15 07:16:01 -08:00
linqunAMD
6d7299ff78 [ck_tile] remove duplicate functions in ck_tile (#3311)
* [ck_tile] remove duplicated shuffle_b and shuffle_b_permuteN

* [ck_tile] move get_k_warp to gemm_shape

* resolve code rebase error
2025-12-15 07:13:00 -08:00
Johannes Graner
fe35ba5dac Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380)
* Parallelization in dataset generation

* Parallelizable tests for fwd, bwd data, bwd weight with datasets

* .gitignore generated datasets

* Test parallelization script with round-robin GPU scheduling

* Parallelization updates to test generation and running

* Dataset paths relative to executable

* Update output from test generation

* Default to one GPU in test generation

* Add small dataset tests to Jenkins

* Update copyright lines

* Update test_data/generate_test_dataset.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Move trap disable

* Common get path function

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-15 13:38:25 +01:00
Bartłomiej Kocot
3b773109e5 [CK TILE][AICK-439] Fix cshuffle epilogue wave per shuffle (#3364)
* [CK TILE] Fix cshufle epligoue wave per shuffle

* Align shuffle per tile with smem

* fixes

* Fixes for double smem

* fix
2025-12-15 12:59:48 +01:00
Johannes Graner
3143a5a480 [CK Grouped Gemm] Disable split-k kernel for split-k > 1 with non-contiguous strides (#3405)
* Disable kernel for split-k > 1 with non-contiguous strides

* Update device_grouped_gemm_xdl_splitk_cshuffle.hpp

---------

AICK-441 (partial)

Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-12-15 08:03:00 +01:00
Linjun-AMD
f5573f56d9 Add attention sink support for FMHA FWD (#3368)
* Revert "Revert "Add attn sink (#2892)" (#3250)"

This reverts commit 5adaa201ed.

* fix conflict

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Add F_sink parameter to FmhaFwdPipeline

* Update tile_fmha_traits.hpp

* Refactor pipeline creation in fmha_fwd.py

Updated the pipeline creation logic to include 'sink' parameter in product combinations and adjusted the FmhaFwdPipeline calls accordingly.

* Update fmha_fwd.py

* Update fmha_fwd.py

* Update example/ck_tile/01_fmha/script/correct_test_fwd_sink.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update CHANGELOG.md

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Update CHANGELOG with new features and support

* Update fmha_fwd.hpp

* Update CHANGELOG.md

* Update smoke_test_fwd_sink.sh

* Update correct_test_fwd_sink.sh

* Update smoke_test_fwd_sink.sh

---------

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-15 12:21:59 +08:00
Emily Martins
22b945e06e [CK_TILE] Stream-K Tree Reduction and Cache Skipping Integration (#3371)
* CK Tile Stream-K Tree Reduction

This change adds the first implementation of the Stream-K tree reduction
strategy into CK Tile. The tree reduction reduces the the number of
steps for accumulating results for a tile from O(N) to O(logN) where N
is the number of workgroups contributing to a C tile.

Additionally, in the original non-atomic reduction strategy, atomics
were used to set the flags buffer and to read from the flags buffer.
Howeover, through investigation with the tree reduciton, atomics with
default (relaxed) semantics were not enough to guarantee workgroups
would not read stale data, leading to incorrect results. Stronger
acquire/release memory orderings are too expensive. So, this change
also eliminates the use of atomics for setting the flags. Instead, we
leverage cache modifiers (e.g., GLC) to avoid writing to cache, thereby
avoiding the use of atomics.

Prelimiary tests were also added for the normal reduction and tree
reduction. More will be added in a future PR via tile engine.

* Move Stream-K kernel files to a subdirectory

* Cleanup Code Style & Handle Unsupported Reductions

This change makes the following small changes:
- Add an explicit else block for unimplemented reduction strategies
- Clarify type of sk_flags_ptr via auto*
- Add description for extra_iters_before_me variable

* Run new copyright script on new files
2025-12-14 14:49:49 -07:00
John Shumway
9ac51aa0f4 Add describe() method to device ops for runtime introspection (#3375)
Introduces a polymorphic describe() method to BaseOperator that enables runtime introspection of kernel configurations through a unified interface.

Key changes:

* Add virtual describe() method to BaseOperator returning Description objects
* Implement describe() in 6 device operation classes (conv fwd/bwd variants)
* Create conv_describe.hpp with factory function for ConvDescription
* Extract type definitions to conv_types.hpp to resolve circular dependencies
* Add InstanceStringDescription for kernels without full ConvDescription support

Other Improvements:

* Update tests to use describe() instead of GetInstanceString()
* Remove circular dependency include from conv_traits.hpp
* Add ODD_C to ConvFwdSpecialization enum and fix OddC mapping
* Replace silent fallback in conv_layout() with compile-time error

This provides a foundation for runtime kernel introspection and better tooling support for analyzing and debugging kernel configurations.
2025-12-14 12:49:12 -08:00
Enrico Degregori
21f06aa47d CK Tile: Enable padding blockscale example (#3417)
* Fix host code padding

* restructure the ref code

* clean up

* Fix compilation error

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-12-14 10:25:47 -08:00
Robin Voetter
6219b12730 [CK_BUILDER] convolution testing (#3267)
* Add README.md for testing

* Add tensor_memory_manager.

* ck-builder: tensor memory manager rebase fixes

This fixes some issues caused by the API being changed recently.
Also, this streamlines the ckt namespace to always be ck_tile::builder::test,
as this is already being used by other tests

Really, this commit should be squashed into the previous,
but I'm keeping it separate for brevity.

* ck-builder: test arguments initial prototype

* ck-builder: test system initial prototype

* ck-builder: fix non-standardized copyright comments

* ck-builder: new prototype

* ck-builder: group testing inputs/outputs into a separate structure

This is basically the return of the tensor memory manager after all,
except that the design is more closely tied to the actual operation.
Using a struct allows us to add additional input/output tensors
without breaking code (by defaulting those new parameters). Note
that the tensors are split into a separate inputs/outputs because we
usually want to allocate the output _twice_: once for the real
computation and once for the reference computation.

* ck-builder: simplify prototype naming; start docs

* ck-builder: update testing readme

* ck-builder: testing documentation

* ck-builder: HipStatusMatcher

This matcher can be used to check HIP status codes and provide
nice and readable error messages.

* ck-builder: tensor_buffer.hpp tests

* ck-builder: conv_fwd.hpp tests

* ck-builder: add example end-to-end test in conv fwd 2d fp16

* ck-builder: simplify extent usage

* ck-builder: update testing doc

* ck-builder: skip end to end test on non-gfx9

* fix check_copyright_year interpreter

/bin/bash is not guaranteed to exist on Linux. Signed,
a NixOS user

* ck-builder: fix copyrights

* ck-builder: reduce conv fwd testing size

This test allocated 24GB of memory, too much for 16GB cards.

---------

Co-authored-by: John Shumway <jshumway@amd.com>
2025-12-13 15:33:41 +01:00
Cong Ma
9707ddb444 [CK TILE GEMM STREAMK] update identifier names according to the new code style (#3348)
* [CK TILE GEMM STREAMK] update identifier names according to the new code style
2025-12-12 17:08:26 -07:00
Enrico Degregori
b4a34371a6 Fix compilation ab scale multi target (#3413) 2025-12-12 10:26:47 -08:00
linqunAMD
fc7bf0ab1c [CK_TILE] Port hw independent changes from internal repo to develop branch (#3301)
* [CK_TILE] Port hw independent changes from internal repo to develop branch

It includes PR#96, #114, #120, #121.

* correct rebase error
2025-12-12 09:28:37 -08:00
Illia Silin
9869641324 disable test_tile_gemm_quant_bquant_preshuffle (#3420) 2025-12-12 09:27:12 -08:00
Aleksander Dudek
d5cef00770 Merge branch 'develop' into ckTileEnginePooling 2025-12-12 10:33:06 +00:00
dependabot[bot]
8d7a4e0c73 Bump rocm-docs-core[api_reference] from 1.31.0 to 1.31.1 in /docs/sphinx (#3410)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.31.0 to 1.31.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.31.0...v1.31.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-11 21:09:40 -08:00
Max Podkorytov
4011dbfec3 [CK-Tile] fixup codegen for tile engine ops gemm multid and gemm preshuffle (#3383)
* fixup gemm multi-d and preshuffle in tile engine codegen

---------

Co-authored-by: Thrupti Raj Lakshmana Gowda <thruptiraj.lakshmanagowda@amd.com>
2025-12-11 14:23:43 -08:00
Aviral Goel
ff194a4271 build: Hot fix to reduce massive build time by just disabling the instances (#3408)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-12-11 10:39:20 -08:00
Aviral Goel
45c4ea510c chore: add copyright to pass the CI (#3407) 2025-12-11 10:34:15 -08:00
Aviral Goel
4dcc3e59c1 chore: update copyright header for misc files (#3402)
* chore: update copyright header for misc files

* fix: typo in kernel resulting in ci failure
2025-12-11 08:25:29 -08:00
Illia Silin
b2925ee207 Fix compilation errors with latest clang22 version. (#3396)
* remove target attributes from deduction guides

* switch CK_TILE_HOST_DEVICE_EXTERN based on clang version
2025-12-11 08:09:29 -08:00
eliotwang
715671e419 Bf16*fp4 gemm (#2801)
* support bf16*mxfp4 gemm

* rebase bf16*fp4 example to develop branch

* Clean up commented debug code in GEMM kernel

* rename example folder

* support bf16*mxfp4 gemm

* rebase bf16*fp4 example to develop branch

* Clean up commented debug code in GEMM kernel

* rename example folder

* rebase to new develop

* fix clang format

* update code according to reviewer's comment

* Update README.md

* update code according to reviewer's comment

* update code according to reviewer's comment

* Update CMakeLists.txt

* Update README.md

* Update CMakeLists.txt

* Delete files

* Delete files

* Add unit tests

* Update test_gemm_quant_base.hpp

* merge bf16*fp4 example to develop branch

* fix clang format

* fix clang format

* Update CMakeLists.txt

* fix ci test

* fix clang format

* resolve conflicts

---------

Co-authored-by: eliotwang <charyang@smci355-ccs-aus-m10-29.cs-aus.dcgpu>
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-12-11 07:20:29 -08:00
Aleksander Dudek
f6d2243288 [CK_TILE] Add pooling to ckTileEngine part4 fix suppported configurations 2025-12-11 12:11:42 +00:00
Enrico Degregori
ce99cab605 Wmma support for gemm_ab_scale (#3314)
* Support gemm_ab_scale:

 - Add tests
 - Integrate scaling implementation in multiple D
 - Generalize existing b_scale for ab_scale
 - Add instances
 - Generalize implementation for ScaleBlockM, ScaleBlockN, ScaleBlockK
 - Add support for all layouts supported by xdl
 - Fix splitk xdl

* Fix copyright

* Wmma support for gemm_blockscale_wp (#3315)

* Support for  preshuffle with ab scale

 - add support for b preshuffle in GridwiseGemm_wmma_cshuffle_v3_ab_scale
 - add support for AScaleLayout amnd BScaleLayout (can be different
   from ALayout and BLayout, respectively)
 - add Run method in v1 pipeline to support preshuffle + scaling
 - add support for preshuffle gemms in common invoker
 - Add splitk support

* Fix copyright header
2025-12-11 09:06:20 +01:00
Ville Pietilä
d66e5f667c [CK_BUILDER] Improve CK Builder and CK Builder tests (#3382)
* Remove stale documentation.

* Add placeholder for conv algorithm design description. Add link to conv factory description.

* Improve testing transfer parameters.

* Python script to check the block tilings.

* Improve tests and conv types serialization.

* Change representation of boolean values from 1/0 to true/false in instance strings.

* Change representation of boolean values from 1/0 to true/false in conv algorithm types.

* Test code improvements.

* Improve covn descriptions tests.

* Improve conv signature definition in conv fwd builder tests.

* clang-format.

* Remove obsolete script.

* Revert StaticAssertTypeEq changes in conv layout tests.

* Remove obsolete using declaration.

---------

Co-authored-by: Ville Pietilä <>
2025-12-11 09:50:00 +02:00
Aviral Goel
6d25525adc feat(precommit-hooks): add check for correct copyright header (#3302)
* chore(copyright): update copyright header for left files

* feat(copyright): add copyright check to precommit hooks

* chore(copyright): update copyright header for include/ck_tile directory

* chore(copyright): update copyright header for example directory

* chore(copyright): update copyright header for .github directory

* refactor: copyright_check script with better if else handling

* chore(copyright): update compyright header for remaining files

* feat: add script to automate copyright addition
2025-12-10 22:50:43 -08:00
Aviral Goel
fbbdd36ea8 docs: add notes on tile distribution and inline comments (#3297)
* docs: add notes on tile distribution and inline comments

* Apply suggestions from code review

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
2025-12-10 22:47:19 -08:00
Geo Min
8270900d60 [ci] Bumping TheRock commit hash (#3385)
* Bumping TheRock commit hash

* new docker hash

* Using new runner name
2025-12-10 17:34:41 -08:00
John Shumway
15ed65db35 Improve sequence sorting and add unit tests (#3376)
Old sequence sort code was showing up on build profiles. Convert it to constexpr functions for much more efficient build-time execution. The sorting is still O(N^2), but our sequences are small enough it executes quickly. This reduced compilation time of a small convolution by more than 10% and time overall time spent in the compiler on a narrow build by %6.
2025-12-10 12:25:23 -08:00
Po Yen Chen
b15df37255 fix: python 3.8 compatibility in fmha codegen (#3388) 2025-12-10 07:08:41 -08:00
Ville Pietilä
fc22320d78 [CK_TILE] Split-K autodeduction (#3351)
* First version of split-K autodeduction.

* Fix circular dependency and kernel construction.

* Fix tolerance calculation for bwd weight example.

* Simplify kernel construction.

* Fix kernel launching bug for split-K autodeduce.

* Add split-K autodeduction support for the two stage example.

* Fix a corner case.

* Fix clang-format.

* Fix clang-format for inc files.

* Add missing header.

* Prevent too large split-K values.

* Fix formatting.

* Add unit tests for IsSupportedArgument in grouped bwd conv.

* clang-format.

* Fix merge conflicts.

* Address feedback from code review.

* clang-format

* Fix new tests after merge.

---------

Co-authored-by: Ville Pietilä <>
2025-12-10 09:30:30 +02:00
Zzz9990
1aa93ef551 [CK_TILE MOE] add NT & preshuffle permute to cktile MOE (#3377)
* update coherence
---------

Co-authored-by: Zzz9990 <Zzz9990>
2025-12-10 10:03:28 +08:00
Illia Silin
934ba1208a use hipTensor from monorepo for daily builds (#3386) 2025-12-09 14:39:08 -08:00
Illia Silin
0d8259affd temporarily disable daily builds on gfx1010 and gfx908 (#3384) 2025-12-09 10:37:13 -08:00
Illia Silin
7582c9e73f Upgrade to ROCm7.1.1 as default compiler. (#3370)
* upgrade to rocm7.1.1 as new default compiler

* fix jenkinsfile
2025-12-09 07:35:32 -08:00
dependabot[bot]
50ca3f83eb Bump rocm-docs-core[api_reference] from 1.20.1 to 1.31.0 in /docs/sphinx (#3374)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.20.1 to 1.31.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.31.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.20.1...v1.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-09 07:10:34 -08:00
Aleksander Dudek
07c078d5ef [CK_TILE] Add pooling to ckTileEngine part3 2025-12-09 11:59:37 +00:00
lalala-sh
6f0966e1e9 fix a16w4 moe bugs (#3373)
* fix valid mask bug

* update format
2025-12-09 17:54:55 +08:00
Yi DING
c1c2e41a03 [CK_TILE] Generate random tensor values with multiple threads (#3324) 2025-12-09 11:02:33 +08:00
Sami Remes
c363a98d41 [CK_TILE] Support more layouts for BQuant GEMM (#3349)
* WIP: preparing to add transpose bq support

* WIP: handle both row/col layout for BQ windows/tile dstr

* Fix build

* WIP: adding some test, debugging numerical errors

* Fix all but pkint4 tests

* Remove test_gemm_quant_typed.cpp again

* update disabled tests

* add conversion from pkint4 for b matrix

* fix formatting

* fix formatting

* Fix tr_load and use override b datatype for clarity

* fix formatting

* make bquant preshuffle tests bqlayout column-major
2025-12-08 13:05:56 -08:00