Commit Graph

3879 Commits

Author SHA1 Message Date
assistant-librarian[bot]
c557f19704 Merge commit '3ede8e2a6e9a1c921f27e2d66442829a092cc646' into develop 2025-11-18 09:14:09 +00:00
Sami Remes
acb3b43bc0 [CK_TILE] Non-K Major from old CK to CK-Tile - fix reverted PR (#3199)
* Reapply "[CK_TILE] Non-K Major from old CK to CK-Tile (#2442)" (#3017)

This reverts commit 1cda0c4c95e5f15f3fcbb9a5edf118ea85bcccd2.

* WIP

* take Y2 as the AK1/BK1 value, that is the 'vector size' after shuffle

* use get_n_lds_banks()

* clang-format

---------

Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

[ROCm/composable_kernel commit: 3ede8e2a6e]
2025-11-18 10:17:02 +02:00
assistant-librarian[bot]
b9e9aa04c2 Merge commit 'b6720531de9cbbe5f6022f173ead11c61860f57f' into develop 2025-11-18 06:16:01 +00:00
Yi DING
7336398fb6 [CK_TILE] MX Flatmm Split kernel instances (#3207)
* [CK_TILE] MX Flatmm Split kernel instances

* Fix flatmm example compile

[ROCm/composable_kernel commit: b6720531de]
2025-11-18 13:46:30 +08:00
assistant-librarian[bot]
ca68d8728c Merge commit '92498464f6ede6c4b1f990a57193c47b52530030' into develop 2025-11-18 00:35:57 +00:00
kabrahamAMD
cad9d98976 [CK_Builder] removed direction and elementwise_operation from required parameters … (#3192)
Removed direction and elementwise operation from the parameters required by the convolution signature concept. Added constexpr helpers to set default values. Added compile-time tests.

[ROCm/composable_kernel commit: 92498464f6]
2025-11-17 15:23:48 -08:00
assistant-librarian[bot]
c6712a96ff Merge commit '22a934a2294b778521a85e179c14155b6f72a2e4' into develop 2025-11-17 17:13:23 +00:00
Aviral Goel
41ef9a10f5 chore(copyright): update copyright header for include directory (#3219)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

* chore(copyright): update copyright header for library directory

[ROCm/composable_kernel commit: 22a934a229]
2025-11-17 08:57:45 -08:00
assistant-librarian[bot]
54282fc7b2 Merge commit 'b38bb492a1a55b5abb0c345962143c0f9c482cfb' into develop 2025-11-15 01:40:21 +00:00
Illia Silin
f8ec330b69 Disable DL kernels on all architectures except gfx103x. (#3218)
* disable dl kernels on all archs except gfx103

* add gfx10-3-generic target to cmake

[ROCm/composable_kernel commit: b38bb492a1]
2025-11-14 17:39:50 -08:00
assistant-librarian[bot]
b4e313286b Merge commit '0aadb4b2c4114a26147c30abc894f2693795b888' into develop 2025-11-14 20:13:54 +00:00
Aviral Goel
0577b5dd78 chore(copyright): update copyright header for profiler directory (#3205)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

* chore(copyright): update copyright header for profiler directory

[ROCm/composable_kernel commit: 0aadb4b2c4]
2025-11-14 11:19:25 -08:00
assistant-librarian[bot]
ac02ddd324 Merge commit '3aa883b9ffd3dc4c18414b818774d3da94b8b9e1' into develop 2025-11-14 17:12:11 +00:00
Aviral Goel
90503f7e3d chore(copyright): update copyright header for python directory (#3200)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

* chore(copyright): update copyright header for python directory

[ROCm/composable_kernel commit: 3aa883b9ff]
2025-11-14 08:21:36 -08:00
jefyang1
72dbbc7d77 Add new gemm multiply multiply instances on gfx950 (#3213)
[ROCm/composable_kernel commit: d30babbd00]
2025-11-14 08:20:41 -08:00
assistant-librarian[bot]
32b8a73252 Merge commit 'caadb896f1e01032a9d9a7db8484f9d1f3861f1e' into develop 2025-11-14 05:13:13 +00:00
John Afaganis
b49e30206f 7.2 version bump (#3210)
* 7.2 version bump

* Update CHANGELOG.md

* Update Jenkinsfile

* Update CHANGELOG.md

* Update CMakeLists.txt

* Update Jenkinsfile

[ROCm/composable_kernel commit: caadb896f1]
2025-11-13 21:04:03 -08:00
assistant-librarian[bot]
897c2bd422 Merge commit '4d629cd2b0bb0b4b210881be0db398bcd382f444' into develop 2025-11-14 02:43:22 +00:00
BingYuan.Zhou
807c297a17 fix build error (#3195)
Co-authored-by: root <root@hjbog-srdc-39.amd.com>

[ROCm/composable_kernel commit: 4d629cd2b0]
2025-11-14 09:46:13 +08:00
Yi DING
fda95832b0 [CK_TILE] Improve device printing (#3198)
* [CK_TILE] Improve device printing

* fix host gtest build

* clean

[ROCm/composable_kernel commit: 4a8b17d1a4]
2025-11-14 09:46:06 +08:00
assistant-librarian[bot]
a96aded2b1 Merge commit '2a73eb3bc0828db654c73058f20a2b794c16cb01' into develop 2025-11-14 00:36:42 +00:00
yinglu
bdbe3e4eb9 Simulate TF32 with BF16x3 (#3142)
* tf32: bf16x3: use bf16x3 to emulate tf32 gemm

* change blockwiseGemm to demo bf16x3

* temp push

* self review

* self review

* fix multi-device compile error

* bug fix

* code refactor

* limit to gfx950

* enhance gemm gfx942 threshold

* lower change from blockwise to warpwise

* refactor code

* refactor code

* error fix

* change threshold

* bug fix

* fix threshold error

* change host reference implementation to match the device implementation

* bug fix

* bug fix

* code refactor

* fix clang-format fail

* code refine

[ROCm/composable_kernel commit: 2a73eb3bc0]
2025-11-13 16:21:09 -08:00
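
The commit message above does not spell out the BF16x3 scheme. One common reading (an assumption here, not taken from the PR) is that each FP32 operand is split into a BF16 high part plus a BF16 residual, and each product is approximated by three BF16 multiplies accumulated in FP32. The host-side sketch below illustrates only that idea; `round_to_bf16`, `split_bf16`, and `mul_bf16x3` are hypothetical names, not CK functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Round an FP32 value to the nearest BF16-representable value (ties to even),
// returned as float. NaN/Inf and overflow handling are omitted for brevity.
inline float round_to_bf16(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u); // rounding bias for the 16 dropped bits
    bits &= 0xFFFF0000u;                   // keep sign, exponent, top 7 mantissa bits
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Hypothetical helper type: a float expressed as two BF16-representable parts.
struct Bf16Pair
{
    float hi; // leading bits of the value
    float lo; // BF16 rounding of the residual (value - hi)
};

inline Bf16Pair split_bf16(float x)
{
    const float hi = round_to_bf16(x);
    const float lo = round_to_bf16(x - hi);
    return {hi, lo};
}

// Approximate a * b with three BF16 x BF16 products accumulated in FP32.
// The lo * lo term is dropped (assumption: it is too small to matter here).
inline float mul_bf16x3(float a, float b)
{
    const Bf16Pair pa = split_bf16(a);
    const Bf16Pair pb = split_bf16(b);
    const float cross = pa.hi * pb.lo + pa.lo * pb.hi; // small terms first
    return pa.hi * pb.hi + cross;
}
```

Under these assumptions, the dropped `lo * lo` term and the split residuals are on the order of 2^-16 relative to the product, which is below the rounding error of TF32's 10-bit mantissa.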
assistant-librarian[bot]
acd5abe4f1 Merge commit 'f2cfc6b94ee3154697030c4dfa214040bb4af4c9' into develop 2025-11-13 19:11:21 +00:00
SamiAario-AMD
d49eb1d431 Remove "basic" and universal GEMM tests, and incorporate their test cases into the GEMM pipeline tests (#3094)
* Add missing copyright statements

* Use ck_tile::host_tensor_descriptor instead of a custom lambda

* Refactor use of check_data_type in test classes

* Use TEST_SUITE_NAME with TYPED_TEST_SUITE

* Remove an unused namespace

* Make dim3 const

* Add BF8 x BF8 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add F8 x BF8 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add BF16 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add BF16 x BF16 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add BF8 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add F8 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Add F16 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp

* Skip failing tests of F16 x I4 for CompV3 with K == 2 * K_Tile

* Add missing precision type combinations to CompV4 from CompV3

* Move the INT8 tests around for consistency with KernelTypesCompV3Wmma

* Add missing precision type combinations to CompV3Wmma from CompV3

* Remove the basic and universal tests and their dependencies

* On __gfx950__, avoid using transposed loading of A with datatype pk_int4_t of B

* Use ADataType and BDataType instead of ComputeDataType for WarpGemm

* Explicitly set some return types to void

* Use more general typenames in InterleavedPKTypeLoader

* Add load_interleaved_pk_type.hpp to common.hpp

* Use std::is_same_v in load_int4_tile

* Add handling of LoadTranspose to load_int4_tile

* Factor out common code in several places using load_int4_tile

* Add support for pk_int4_t using load_int4_tile

* Fix formatting

[ROCm/composable_kernel commit: f2cfc6b94e]
2025-11-13 11:01:27 -08:00
assistant-librarian[bot]
0997e2eb6d Merge commit '7d57bc169f8206f06bc516a7f930f388def32347' into develop 2025-11-13 17:13:19 +00:00
Ville Pietilä
547165ce4c [CK_BUILDER] Forward convolution builder improvements (#3179)
Proposed changes
Improve the forward convolution builder implementation and address leftover feedback from PR #3138. Main changes:

- Refactored tests so that they better reflect the builder pattern. The templates and types for the convolution algorithm concepts are created via a factory that facilitates programmatic creation of the device op instances.
- Moved tests into an anonymous namespace.
- The convolution factory had a lot of if-else constructs where CK Builder types were converted into CK library types. I initially had trouble using static_assert in the default branch of a switch, because the static_assert was evaluated at compile time even for valid types. However, if we change the static_assert to throw "<error message>", it results in a compile-time error only if the default branch is actually hit. This assumes that the function is consteval. Hence, changed all conversions in the convolution factory to use switch, which is more intuitive.
- Removed the explicit device op definition from the convolution signature and the corresponding predicate file. The device ops are defined by the corresponding concepts. This allowed removing a lot of boilerplate code from the convolution factory.
- Added inheritance and convolution algorithm specialization to handle device ops that are specializations of more generic ones. The large tensor support is more naturally expressed by this pattern.
- Added support for the FP8 data type.

* WIP: Builder for expected test results.

* Improve ckb fwd conv instance tests.

* clang-format

* Change if-else statements into switch in conv factory.

* Fix clang-formatting.

* Removed unnecessary includes.

* Added missing copyright.

* Remove explicit device op flag from convolution signature.

* Add missing concept.

* Fix build.

* clang-format

* Add test for building conv fwd FP8 instances.

* Add missing header to instance traits.

* Clean-up recently added instances.

* Introduce inheritance and specialization.

* Use builder to build conv algorithm templates and types.

* clang-format

* Fix conv description tests.

---------

Co-authored-by: John Shumway <john.shumwayjr@gmail.com>

[ROCm/composable_kernel commit: 7d57bc169f]
2025-11-13 08:47:25 -08:00
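
The switch-with-throw conversion described in the PR #3179 text above can be sketched as follows. This is a minimal illustration, not the actual convolution factory: the enums and `to_library_type` are hypothetical. The PR assumes a `consteval` function (C++20); the sketch uses `constexpr` so it also builds as C++17, and forces compile-time evaluation with `static_assert`.

```cpp
#include <cassert>

// Hypothetical builder-side enum (not the real CK Builder type).
enum class BuilderDataType { FP32, FP16, FP8 };

// Hypothetical library-side enum (not the real CK library type).
enum class LibraryDataType { F32, F16, F8 };

// Convert a builder type to a library type. The throw in the default branch
// is only an error if constant evaluation actually reaches it, so valid
// inputs compile cleanly and unsupported ones fail at compile time.
// In C++20, marking this consteval would forbid run-time calls entirely.
constexpr LibraryDataType to_library_type(BuilderDataType t)
{
    switch(t)
    {
    case BuilderDataType::FP32: return LibraryDataType::F32;
    case BuilderDataType::FP16: return LibraryDataType::F16;
    case BuilderDataType::FP8: return LibraryDataType::F8;
    default: throw "Unsupported data type";
    }
}

// Valid inputs are accepted in a constant expression.
static_assert(to_library_type(BuilderDataType::FP8) == LibraryDataType::F8,
              "FP8 should map to F8");

// An unsupported value would make the same static_assert ill-formed, e.g.:
// static_assert(to_library_type(static_cast<BuilderDataType>(99)) ==
//               LibraryDataType::F8, "");
```

A `static_assert(false, ...)` in the default branch of a non-template function would instead fire on every compilation, which matches the problem the PR text describes.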
assistant-librarian[bot]
46929142bf Merge commit 'ca2ee0eb8ae4069175df9e4731c7b0aed56d6c8d' into develop 2025-11-13 16:14:02 +00:00
jefyang1
2a6a163e7d Fix test_gemm_multiply_multiply_wp_xdl_fp8 on gfx950 (#3191)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: ca2ee0eb8a]
2025-11-13 09:32:54 -06:00
assistant-librarian[bot]
53556bf6cb Merge commit '8d50001b939691134a0b078ed15a41e22ee08bd0' into develop 2025-11-13 13:22:01 +00:00
Yi DING
f5eb722fbe [CK_TILE] Improve F8F6F4 Scaled WarpGemm (#3197)
* [CK_TILE] Improve F8F6F4 Scaled WarpGemm

* Thanks, Copilot

[ROCm/composable_kernel commit: 8d50001b93]
2025-11-13 20:22:05 +08:00
assistant-librarian[bot]
76e50bb65b Merge commit 'fb41a7b73be5b686611e3bc75668cb8025252d8d' into develop 2025-11-13 08:15:17 +00:00
Khushbu Agarwal
1ec766b17d fixing ambiguous shuffle definitions (#3175)
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: fb41a7b73b]
2025-11-12 23:44:12 -08:00
Cong Ma
fec8b3228b [CK TILE GEMM] Refactor block_scale_gemm examples (#3181)
* [CK TILE GEMM] Refactor block_scale_gemm examples

- Split cpp file to reduce building time
- Support multiple GemmConfig

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update Readme

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Add support for rowcol and tensor GEMM operations

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Update README

* [CK TILE GEMM] Refactor block_scale_gemm examples

- Set quant group size to (1, 1, 64) for targets excluding gfx950, where warp tile size (16, 16, 128) is incompatible.

[ROCm/composable_kernel commit: 6fd8ddabe7]
2025-11-12 23:43:40 -08:00
assistant-librarian[bot]
c36a71b050 Merge commit '9af30f04b65b8e50877d01ce8377a8cd581d462c' into develop 2025-11-13 07:13:36 +00:00
Thrupti Raj Lakshmana Gowda
5c19f34cb4 Ck tile engine commons (#3166)
* Moving Preshuffle to commons

* Fixing Common Validations

* Addressing Review Comments

* Partial Rebasing

* Partial Rebasing

* Partial Rebasing

* Rebasing Complete

[ROCm/composable_kernel commit: 9af30f04b6]
2025-11-13 00:56:18 -06:00
assistant-librarian[bot]
f03d7dcf6e Merge commit '797ddfa41e5e2c45f9eea9e6c969ba528e5a9c39' into develop 2025-11-13 00:36:06 +00:00
Aviral Goel
4c43e89a84 chore(copyright): update copyright header for test_data directory (#3194)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

* chore(copyright): update copyright header for test_data directory

[ROCm/composable_kernel commit: 797ddfa41e]
2025-11-12 16:07:28 -08:00
John Afaganis
97dba6f3c5 Add C++17 deprecation warning to CHANGELOG.md (#3203)
* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

[ROCm/composable_kernel commit: 9342365713]
2025-11-12 16:05:53 -08:00
assistant-librarian[bot]
77527d2fa6 Merge commit '3784c0e7c395af214fdddd5f702691b354bfe8d4' into develop 2025-11-12 20:14:45 +00:00
Illia Silin
fbab772ad4 add permissions for /tmp folder (#3201)
[ROCm/composable_kernel commit: 3784c0e7c3]
2025-11-12 11:47:07 -08:00
Enrico Degregori
e00db44d0c Wmma support for gemm_reduce (#3145)
* Initial implementation GEMM+Reduce:

 - device struct
 - epilogue struct

* Fix tests, improve profiler and add initial instances

* Add instances

* Fix compilation error

* Address review comments

* Fix logging

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 7414a0f4d4]
2025-11-12 11:23:54 -08:00
assistant-librarian[bot]
90e4b6bfe9 Merge commit '299c9bca1bee2ef77bb78878bcdd9d11a13564e5' into develop 2025-11-12 16:14:54 +00:00
Yashvardhan Agarwal
c8c5a7e1c6 [CK_Tile] Pooling example readme update (#3174)
* pooling example readme update

- The updated readme explains the transformations of the pooling kernel
using a mermaid diagram

* Update example/ck_tile/36_pooling/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* resolve comments

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: 299c9bca1b]
2025-11-12 07:30:20 -08:00
assistant-librarian[bot]
98033a68ce Merge commit '40d2ed0f2a442026c57dc17e6e7bd281b6c2535c' into develop 2025-11-12 02:42:51 +00:00
Po Yen Chen
97cb3abf33 [CK_TILE] Share partition index across threads and specify offset in load_tile()/async_load_tile()/load_tile_transpose() (#2905)
* Allow sharing partition index across threads

* Fix typo PartitoinIndex -> PartitionIndex

* Remove C++20 'requires' usages

* Add missing template arguments

* Fix load_tile() overload ambiguity issue

* Use SFINAE to exclude invalid arguments

* Add additional offset parameter to the async_load_tile()

* Remove async_load_tile() default argument to avoid ambiguity

* Extract tile_window coordinate compute logic as method

* Use warp-shared LDS base address in tile_window::async_load()

* Add constraint to tile_window::load() templates

* Fix wrong type traits is_class_v<> usages

* Add missing constraint to async_load_tile()

* Add missing tile_window::load() overload

* Add more constraint to avoid load_tile() call ambiguity

* Rename ParitionIndex as ReplacementPartitionIndex

* Update pre_computed_warp_coords_ in move_extended()

* Fix inconsistency between template parameters and documentation

* Allow specifying pre-computed partition index

* Add type traits is_sequence<> & is_tile_distribution<>

* Add type traits is_tensor_view<>

* Add type constraints to make_tile_window() templates

* Allow passing partition_index to set_tile_if()

* Allow specifying partition_index to store_tile()

* Add missing template parameter of replace_bottom_tensor_view()

* Allow passing partition_index to Default2DEpilogue

* Make get_partition_index() public

* Add _with_offset() postfix to avoid resolution error

* Remove ReplacementPartitionIndex template param

* Add missing comments

* Add load_tile_transpose_with_offset() overload

[ROCm/composable_kernel commit: 40d2ed0f2a]
2025-11-12 10:26:14 +08:00
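
Several bullets in the commit above replace C++20 `requires` clauses with SFINAE and add type traits such as `is_sequence<>` to keep overloads unambiguous. A minimal C++17 sketch of that general pattern follows. The `sequence` template, the trait, and `kind_of` are hypothetical stand-ins written for illustration, not the ck_tile definitions.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a compile-time integer sequence type.
template <int... Is>
struct sequence
{
};

// Trait: true only for specializations of sequence<...>.
template <typename T>
struct is_sequence : std::false_type
{
};

template <int... Is>
struct is_sequence<sequence<Is...>> : std::true_type
{
};

template <typename T>
inline constexpr bool is_sequence_v = is_sequence<T>::value;

// Overload selected only when T is a sequence; enable_if removes it from
// overload resolution otherwise (SFINAE), so the two overloads never clash.
template <typename T, std::enable_if_t<is_sequence_v<T>, int> = 0>
constexpr int kind_of(const T&)
{
    return 1;
}

// Overload selected for every other type.
template <typename T, std::enable_if_t<!is_sequence_v<T>, int> = 0>
constexpr int kind_of(const T&)
{
    return 0;
}

static_assert(kind_of(sequence<1, 2>{}) == 1, "sequence overload expected");
static_assert(kind_of(42) == 0, "fallback overload expected");
```

Because the two conditions are mutually exclusive, exactly one overload is viable for any argument, which is one way to avoid the kind of call ambiguity the commit bullets mention.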
assistant-librarian[bot]
c014babf51 Merge commit '92c1f4981ab1d081978c8f6132ca93949d4749e6' into develop 2025-11-11 22:12:49 +00:00
Bartłomiej Kocot
a2a69e7649 [CK_BUILDER] Add grouped conv fwd ck tile traits (#3183)
* [CK BUILDER] Add grouped conv fwd ck tile traits

* Update instance_traits_tile_grouped_convolution_forward.hpp

* Update grouped_convolution_forward_kernel.hpp


[ROCm/composable_kernel commit: 92c1f4981a]
2025-11-11 13:55:33 -08:00
Aviral Goel
f01853cf46 Add CK Tile Tutorials Folder with GEMM and COPY Kernel (#3038)
* feat: add tutorial folder with gemm tutorial

* chore: move copy kernel from examples folder to tutorial

* Update tutorial/ck_tile/01_naive_gemm/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tutorial/ck_tile/01_naive_gemm/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: remove hand-drawn images

* docs: add write-ups to explain the gemm kernel

* docs: add notes on the block-level pipeline and static distributed tensors

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: b145a5fe80]
2025-11-11 14:15:49 -06:00
assistant-librarian[bot]
ba43b54f9f Merge commit 'c54ecd905b07849076069d56c284472230564568' into develop 2025-11-11 20:14:02 +00:00
Aviral Goel
a8d2ecc971 docs: update ckProfiler readme with selective building option (#3140)
* docs: update ckProfiler readme with selective building option

* docs: add list of operations for ckProfiler

[ROCm/composable_kernel commit: c54ecd905b]
2025-11-11 14:27:33 -05:00