Commit Graph

3949 Commits

Author SHA1 Message Date
Enrico Degregori
2f3dc0a119 Wmma support for gemm_reduce (#3145)
* Initial implementation GEMM+Reduce:

 - device struct
 - epilogue struct

* Fix tests, improve profiler and add initial instances

* Add instances

* Fix compilation error

* Address review comments

* Fix logging

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: 7414a0f4d4]
2025-11-12 11:23:54 -08:00
assistant-librarian[bot]
90e4b6bfe9 Merge commit '299c9bca1bee2ef77bb78878bcdd9d11a13564e5' into develop 2025-11-12 16:14:54 +00:00
Yashvardhan Agarwal
0ca982f8d5 [CK_Tile] Pooling example readme update (#3174)
* pooling example readme update

- The updated readme explains the transformations of the pooling kernel
using a mermaid diagram

* Update example/ck_tile/36_pooling/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* resolve comments

---------

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: 299c9bca1b]
2025-11-12 07:30:20 -08:00
assistant-librarian[bot]
98033a68ce Merge commit '40d2ed0f2a442026c57dc17e6e7bd281b6c2535c' into develop 2025-11-12 02:42:51 +00:00
Po Yen Chen
7713c5071b [CK_TILE] Share partition index across threads and specify offset in load_tile()/async_load_tile()/load_tile_transpose() (#2905)
* Allow sharing partition index across threads

* Fix typo PartitoinIndex -> PartitionIndex

* Remove C++20 'requires' usages

* Add missing template arguments

* Fix load_tile() overload ambiguity issue

* Use SFINAE to exclude invalid arguments

* Add additional offset parameter to the async_load_tile()

* Remove async_load_tile() default argument to avoid ambiguity

* Extract tile_window coordinate compute logic as method

* Use warp-shared LDS base address in tile_window::async_load()

* Add constraint to tile_window::load() templates

* Fix wrong type traits is_class_v<> usages

* Add missing constraint to async_load_tile()

* Add missing tile_window::load() overload

* Add more constraints to avoid load_tile() call ambiguity

* Rename ParitionIndex as ReplacementPartitionIndex

* Update pre_computed_warp_coords_ in move_extended()

* Fix inconsistency between template parameters and documentation

* Allow specifying pre-computed partition index

* Add type traits is_sequence<> & is_tile_distribution<>

* Add type traits is_tensor_view<>

* Add type constraints to make_tile_window() templates

* Allow passing partition_index to set_tile_if()

* Allow specifying partition_index to store_tile()

* Add missing template parameter of replace_bottom_tensor_view()

* Allow passing partition_index to Default2DEpilogue

* Make get_partition_index() public

* Add _with_offset() postfix to avoid resolution error

* Remove ReplacementPartitionIndex template param

* Add missing comments

* Add load_tile_transpose_with_offset() overload

[ROCm/composable_kernel commit: 40d2ed0f2a]
2025-11-12 10:26:14 +08:00
assistant-librarian[bot]
c014babf51 Merge commit '92c1f4981ab1d081978c8f6132ca93949d4749e6' into develop 2025-11-11 22:12:49 +00:00
Bartłomiej Kocot
b122e12c91 [CK_BUILDER] Add grouped conv fwd ck tile traits (#3183)
* [CK BUILDER] Add grouped conv fwd ck tile traits

* Update instance_traits_tile_grouped_convolution_forward.hpp

* Update grouped_convolution_forward_kernel.hpp

[ROCm/composable_kernel commit: 92c1f4981a]
2025-11-11 13:55:33 -08:00
Aviral Goel
efcd6297d4 Add CK Tile Tutorials Folder with GEMM and COPY Kernel (#3038)
* feat: add tutorial folder with gemm tutorial

* chore: move copy kernel from examples folder to tutorial

* Update tutorial/ck_tile/01_naive_gemm/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tutorial/ck_tile/01_naive_gemm/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: remove hand-drawn images

* docs: add write ups to explain the gemm kernel

* docs: add about block level pipeline and static distributed tensors

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[ROCm/composable_kernel commit: b145a5fe80]
2025-11-11 14:15:49 -06:00
assistant-librarian[bot]
ba43b54f9f Merge commit 'c54ecd905b07849076069d56c284472230564568' into develop 2025-11-11 20:14:02 +00:00
Aviral Goel
4d69189324 docs: update ckProfiler readme with selective building option (#3140)
* docs: update ckProfiler readme with selective building option

* docs: add list of operations for ckProfiler

[ROCm/composable_kernel commit: c54ecd905b]
2025-11-11 14:27:33 -05:00
Aviral Goel
9d49cab98b chore(copyright): update copyright header for script directory (#3184)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

---------

Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>

[ROCm/composable_kernel commit: ab68c9d384]
2025-11-11 11:26:01 -08:00
assistant-librarian[bot]
db12c41b56 Merge commit '1b1c46e508c1fd40a03f54114b6b78629032fb4f' into develop 2025-11-11 17:12:49 +00:00
linqunAMD
31400ca622 [CK_TILE] Fix gemm_quant (#3186)
[ROCm/composable_kernel commit: 1b1c46e508]
2025-11-11 08:23:57 -08:00
Aviral Goel
d09313cf15 chore(copyright): update copyright header for tile_engine directory (#3180)
[ROCm/composable_kernel commit: 88e3212fcc]
2025-11-11 08:17:24 -08:00
Scott Todd
4c757e5b4f Bump commit ref for TheRock in workflows (#3189)
* Bump commit ref for TheRock in workflows

* Update to more recent commit (could also `rm` the patch)

* Revert "Update to more recent commit (could also `rm` the patch)"

This reverts commit 4b9f4952ea.

* Rm patch that no longer applies

* Fix post_build_upload flag name

* Fix artifact_group plumbing for setup test env

[ROCm/composable_kernel commit: aa1fb29aa1]
2025-11-11 07:44:38 -08:00
Khushbu Agarwal
a297885de5 formatting (#3182)
[ROCm/composable_kernel commit: 06c651b100]
2025-11-11 07:42:26 -08:00
Enrico Degregori
f80e8dfaa8 Extend support for ak1 / bk1 WMMA (#3073)
* Extend AK1 / BK1 support:

 - Add support for AK1 != BK1
 - Add support for AK1, BK1 > 8
 - Introduce KInner template parameter for pipelines when loading multiple tiles with one instruction

* fix clang format

[ROCm/composable_kernel commit: 1c544abf57]
2025-11-11 07:38:15 -08:00
assistant-librarian[bot]
0b000816a4 Merge commit '9f33b7cfd3df3fcfd540f7633b0abd7019935761' into develop 2025-11-10 19:12:32 +00:00
Thomas Ning
fdccd7a3b4 fix input range (#3188)
[ROCm/composable_kernel commit: 9f33b7cfd3]
2025-11-10 11:08:41 -08:00
linqunAMD
89b798620c [ck] Enable missing op for gfx11 and gfx12 (#3187)
[ROCm/composable_kernel commit: 7b6ba8d5c2]
2025-11-10 10:58:20 -08:00
linqunAMD
27df389d70 [ck] correct memory size in grouped_gemm_multi_abd_xdl_fixed_nk_bias_bf16_i8 (#3168)
b1 and b0 use the same layout, so the size of b1_tensors_device should be the same as that of b0_tensors_device.

[ROCm/composable_kernel commit: e593a14ae1]
2025-11-10 10:58:08 -08:00
Manish Kumar
045a8ca2ff [CK-Tile] Add gtests for compiler CI for faster testing (#3123)
* Add gtests for compiler CI for faster testing

* Add changes to have a custom target

* Add a gtest suite for gemm kernel for running CI tests with compiler mode

* Fix Clang error (EOL)

* Removed compiler subfolder from CMake

* Add gtest suite for gemm kernel

* Disable failed tests

* Fix build errors

* Resolved PR comments

* Update shape for persistent gemm kernel test

* Separated types by H/W archs

* Made changes to persistent types

* Fix persistent build failure issue

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: d5746dd120]
2025-11-10 10:42:23 -08:00
assistant-librarian[bot]
650109a348 Merge commit 'e31a7a4f29b371c32ea9daf9211b6ae1fed2fa40' into develop 2025-11-07 04:14:29 +00:00
Gino Lu
89a665e60e fix MX bpreshuffle gemm B grid descriptor dimension error. (#3170)
[ROCm/composable_kernel commit: e31a7a4f29]
2025-11-06 19:42:39 -08:00
assistant-librarian[bot]
4c67bf8aaf Merge commit 'd04eba4ae37c8c2d40855f02aa861e1ac1ec7b3f' into develop 2025-11-07 01:40:22 +00:00
Xudong Yuan
a8dbac6470 Ck moe mxfp4 blockm32 (#3098)
* block_m = 32

* ck block_m = 32

* aiter/3rdparty/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_b_preshuffle_mx_moe_v3.hpp format

* mxfp4_moe v1 pipe

* update format

---------

Co-authored-by: zhimding <zhimding@amd.com>
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: felix <felix.li@amd.com>

[ROCm/composable_kernel commit: d04eba4ae3]
2025-11-07 08:45:41 +08:00
assistant-librarian[bot]
d1d568c17b Merge commit '5f3cae3e28a042e411afcd2e54b16cc6909c5bbb' into develop 2025-11-07 00:36:11 +00:00
JH-Leon-KIM-AMD
4fbe5ee525 [CK_BUILDER] ckb: add remaining fwd conv device ops (#3155)
* Add device operation to conv signature. Use unions to hold conv layouts and device operations.

* Add predicates for all device op instances.

* Use the device op signature for validation.

* Fix ckb CMakeLists.txt file for tests.

* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.

* Fix clang-formatting.

* add device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk

* Add full DL configurability with Option A implementation

- Added 5 DL descriptor structs (39 configurable parameters)
- Added 10 C++20 concepts for type-safe validation
- Updated factory to read all parameters from descriptors
- Updated test helper to populate all descriptors
- All tests passing (13/13 including 3 new DL tests)

* Add factory and test support for DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor

- Add factory specialization for Large_Tensor device operation (conv_factory.hpp lines 1145-1265)
- Add macro collision workaround using pragma push/pop (conv_factory.hpp lines 43-51)
- Add test helper function run_test_DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add builder test file test_ckb_conv_fwd_2d_large_tensor_fp16.cpp with 2 test cases
- Update CMakeLists.txt to include new test file
- Reuse existing ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle descriptor
- Map all 42 template parameters identical to regular XDL CShuffle
- All 15 builder tests passing including 2 new Large_Tensor tests

Completes Task 350: All 4 forward convolution device operations now supported in CK Builder.

* Update copyright headers to new format

- Change copyright format to: Copyright (C) Advanced Micro Devices, Inc., or its affiliates.
- Reorder headers: Copyright first, then SPDX-License-Identifier
- Updated files:
  * experimental/builder/test/conv/test_ckb_conv_fwd_2d_dl_fp16.cpp
  * experimental/builder/test/conv/test_ckb_conv_fwd_2d_large_tensor_fp16.cpp
  * experimental/builder/include/ck_tile/builder/device_op_types.hpp

* fix c++ 18 format

* Fix clang-format-18 error in device_op_types.hpp

---------

Co-authored-by: Ville Pietilä <ville.pietila@amd.com>
Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>

[ROCm/composable_kernel commit: 5f3cae3e28]
2025-11-06 16:29:48 -08:00
assistant-librarian[bot]
63d8864858 Merge commit '76c4c12f5959adcd56d1627a1d1ce885deb9d096' into develop 2025-11-06 23:12:25 +00:00
Johannes Graner
cd334376dc Add .clangd and CMakeUserPresets.json to .gitignore (#3171)
[ROCm/composable_kernel commit: 76c4c12f59]
2025-11-06 15:07:39 -08:00
assistant-librarian[bot]
cb20485d00 Merge commit '18e083003fa25a661015542c39b1979200f361cf' into develop 2025-11-06 15:13:08 +00:00
Adam Osewski
3e184d3b67 [CK_BUILDER] Convolution description (#3163)
* Add DirectLoad tparam & clean up headers.

* Add convolution traits.

* Update inline documentation.

* Add more convolution specialization and gemm padding types.

* Add additional helper functions & more tests to conv traits.

* Fix tests cmake file.

* Add case insensitive string comparison

* Fix function name overlapping with variable name.

* Unify pipeline version and scheduler enums.

* Fix includes.

* Update test conv traits with unified enums.

* Update concepts etc. with updated unified enum

* Fix ckb conv fwd test - unified enum usage.

* Dump changes.

* Add ostream overloads for all enum classes.

* Update detailed() function in ConvDescription

* Fix handling union based conv direction.

* Add test & update conv description.

* Refine tree view.

* Update copyrights

* Fix merge artifacts

* Update detailed tree conv description

* Fix clang-format

[ROCm/composable_kernel commit: 18e083003f]
2025-11-06 15:46:26 +01:00
assistant-librarian[bot]
78783a456c Merge commit '2234ff830b2f4ce8026c50b2d81f95f38f7117e5' into develop 2025-11-06 11:12:13 +00:00
Bartłomiej Kocot
5c219f1697 [CK TILE] Convolution remove magic values (#3160)
* [CK TILE] Refactor Conv configs and Conv Elementwise

* fix

* [CK TILE] Convolution remove magic values

* fix partitioner

[ROCm/composable_kernel commit: 2234ff830b]
2025-11-06 11:26:30 +01:00
assistant-librarian[bot]
cd3b8ae564 Merge commit '12922120d2567c3512048d7e8ed37e387a07bab6' into develop 2025-11-06 07:13:12 +00:00
joyeamd
ee21c7b651 add gfx11's barrier following SPG's reference (#3159)
* add gfx11's barrier following SPG's reference

* re-format the code

* minor fix

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 12922120d2]
2025-11-05 22:29:03 -08:00
assistant-librarian[bot]
b3950e9d11 Merge commit '4533aa6dbab648adc1a496b6064cb79777c41cf5' into develop 2025-11-06 00:35:42 +00:00
Illia Silin
d258a23f20 Fix compilation errors with clang22. (#3164)
* resolve compilation issue with clang22

* add __extension__ for __COUNTER__ usage in ck_tile

[ROCm/composable_kernel commit: 4533aa6dba]
2025-11-05 15:42:22 -08:00
assistant-librarian[bot]
4bbbfeb186 Merge commit 'b8527a92360496666ed6606e53ddc97e35dcf76e' into develop 2025-11-05 17:12:47 +00:00
Adam Osewski
f7bfb69702 [CK_BUILDER] Convolution traits. (#3152)
Added:

1. Convolution traits & unit tests
2. Updated builder enumerators to represent convolution kernel properties.
3. Unified builder pipeline version & scheduler enumerators

[ROCm/composable_kernel commit: b8527a9236]
2025-11-05 08:53:06 -08:00
assistant-librarian[bot]
ea517e1c34 Merge commit '3b076b0b74fec1c5a27a808cea45b21c6f526ced' into develop 2025-11-05 03:31:59 +00:00
andrew clark
2cdce54765 Collecting redis stats (#3149)
[ROCm/composable_kernel commit: 3b076b0b74]
2025-11-04 18:55:11 -08:00
Illia Silin
8d454aa01d Initialize new variable to prevent c++17 compiler error (#3156)
* initialize new variable to prevent c++17 compiler error

* build for gfx90a using -std=c++17 flag

[ROCm/composable_kernel commit: 930423ab3b]
2025-11-04 18:54:14 -08:00
assistant-librarian[bot]
7148cc6371 Merge commit '31c019f5891f75a2c9a26cb3d3e61c63596e4c30' into develop 2025-11-04 19:11:52 +00:00
Vidyasagar Ananthan
42d1855685 Chunk CTests so we don't run into the large-number-of-tests error (#3050)
* Chunk CTests so we don't run into the large-number-of-tests error

* Addressing feedback from copilot

[ROCm/composable_kernel commit: 31c019f589]
2025-11-04 10:31:32 -08:00
assistant-librarian[bot]
8c8fec6769 Merge commit '5abe4109e0c30993b9e1afe00f95154939043859' into develop 2025-11-04 18:15:42 +00:00
Cong Ma
53e42f5cce Introduces the new partitioner to implement the reduction StreamK kernel. (#3107)
* Introduces the new partitioner to implement the reduction StreamK kernel

* Add more doc text to functions

* Add persistent-dp option to streamk example

* Update example/ck_tile/40_streamk_gemm/README.md

[ROCm/composable_kernel commit: 5abe4109e0]
2025-11-04 10:32:17 -07:00
assistant-librarian[bot]
4d94ea61e1 Merge commit '13ba06f1e75a28037c78c9d75f660f4ab7877d27' into develop 2025-11-04 17:11:25 +00:00
Thomas Ning
dceaa603d0 fix the blockscale 2d case (#3148)
Co-authored-by: Aviral Goel <aviral.goel@amd.com>

[ROCm/composable_kernel commit: 13ba06f1e7]
2025-11-04 11:55:23 -05:00
assistant-librarian[bot]
32a26d371b Merge commit '0be0288f58879123c228373525c4b438d354694f' into develop 2025-11-04 15:13:12 +00:00