Enrico Degregori
2f3dc0a119
Wmma support for gemm_reduce (#3145)
* Initial implementation of GEMM+Reduce:
- device struct
- epilogue struct
* Fix tests, improve profiler and add initial instances
* Add instances
* Fix compilation error
* Address review comments
* Fix logging
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
[ROCm/composable_kernel commit: 7414a0f4d4]
2025-11-12 11:23:54 -08:00
assistant-librarian[bot]
90e4b6bfe9
Merge commit '299c9bca1bee2ef77bb78878bcdd9d11a13564e5' into develop
2025-11-12 16:14:54 +00:00
Yashvardhan Agarwal
0ca982f8d5
[CK_Tile] Pooling example readme update (#3174)
* pooling example readme update
- The updated readme explains the transformations of the pooling kernel
using a mermaid diagram
* Update example/ck_tile/36_pooling/README.md
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* resolve comments
---------
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
[ROCm/composable_kernel commit: 299c9bca1b]
2025-11-12 07:30:20 -08:00
assistant-librarian[bot]
98033a68ce
Merge commit '40d2ed0f2a442026c57dc17e6e7bd281b6c2535c' into develop
2025-11-12 02:42:51 +00:00
Po Yen Chen
7713c5071b
[CK_TILE] Share partition index across threads and specify offset in load_tile()/async_load_tile()/load_tile_transpose() (#2905)
* Allow sharing partition index across threads
* Fix typo PartitoinIndex -> PartitionIndex
* Remove C++20 'requires' usages
* Add missing template arguments
* Fix load_tile() overload ambiguity issue
* Use SFINAE to exclude invalid arguments
* Add additional offset parameter to the async_load_tile()
* Remove async_load_tile() default argument to avoid ambiguity
* Extract tile_window coordinate compute logic as method
* Use warp-shared LDS base address in tile_window::async_load()
* Add constraint to tile_window::load() templates
* Fix wrong type traits is_class_v<> usages
* Add missing constraint to async_load_tile()
* Add missing tile_window::load() overload
* Add more constraints to avoid load_tile() call ambiguity
* Rename ParitionIndex as ReplacementPartitionIndex
* Update pre_computed_warp_coords_ in move_extended()
* Fix inconsistency between template parameters and documentation
* Allow specifying pre-computed partition index
* Add type traits is_sequence<> & is_tile_distribution<>
* Add type traits is_tensor_view<>
* Add type constraints to make_tile_window() templates
* Allow passing partition_index to set_tile_if()
* Allow specifying partition_index to store_tile()
* Add missing template parameter of replace_bottom_tensor_view()
* Allow passing partition_index to Default2DEpilogue
* Make get_partition_index() public
* Add _with_offset() postfix to avoid resolution error
* Remove ReplacementPartitionIndex template param
* Add missing comments
* Add load_tile_transpose_with_offset() overload
[ROCm/composable_kernel commit: 40d2ed0f2a]
2025-11-12 10:26:14 +08:00
assistant-librarian[bot]
c014babf51
Merge commit '92c1f4981ab1d081978c8f6132ca93949d4749e6' into develop
2025-11-11 22:12:49 +00:00
Bartłomiej Kocot
b122e12c91
[CK_BUILDER] Add grouped conv fwd ck tile traits (#3183)
* [CK BUILDER] Add grouped conv fwd ck tile traits
* Update instance_traits_tile_grouped_convolution_forward.hpp
* Update grouped_convolution_forward_kernel.hpp
[ROCm/composable_kernel commit: 92c1f4981a]
2025-11-11 13:55:33 -08:00
Aviral Goel
efcd6297d4
Add CK Tile Tutorials Folder with GEMM and COPY Kernel (#3038)
* feat: add tutorial folder with gemm tutorial
* chore: move copy kernel from examples folder to tutorial
* Update tutorial/ck_tile/01_naive_gemm/README.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tutorial/ck_tile/01_naive_gemm/README.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* chore: remove handdrawn images
* docs: add write ups to explain the gemm kernel
* docs: add write-ups about block-level pipeline and static distributed tensors
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
[ROCm/composable_kernel commit: b145a5fe80]
2025-11-11 14:15:49 -06:00
assistant-librarian[bot]
ba43b54f9f
Merge commit 'c54ecd905b07849076069d56c284472230564568' into develop
2025-11-11 20:14:02 +00:00
Aviral Goel
4d69189324
docs: update ckProfiler readme with selective building option (#3140)
* docs: update ckProfiler readme with selective building option
* docs: add list of operations for ckProfiler
[ROCm/composable_kernel commit: c54ecd905b]
2025-11-11 14:27:33 -05:00
Aviral Goel
9d49cab98b
chore(copyright): update copyright header for script directory (#3184)
* chore(copyright): update copyright header for tile_engine directory
* chore(copyright): update copyright header for script directory
---------
Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>
[ROCm/composable_kernel commit: ab68c9d384]
2025-11-11 11:26:01 -08:00
assistant-librarian[bot]
db12c41b56
Merge commit '1b1c46e508c1fd40a03f54114b6b78629032fb4f' into develop
2025-11-11 17:12:49 +00:00
linqunAMD
31400ca622
[CK_TILE] Fix gemm_quant (#3186)
[ROCm/composable_kernel commit: 1b1c46e508]
2025-11-11 08:23:57 -08:00
Aviral Goel
d09313cf15
chore(copyright): update copyright header for tile_engine directory (#3180)
[ROCm/composable_kernel commit: 88e3212fcc]
2025-11-11 08:17:24 -08:00
Scott Todd
4c757e5b4f
Bump commit ref for TheRock in workflows (#3189)
* Bump commit ref for TheRock in workflows
* Update to more recent commit (could also `rm` the patch)
* Revert "Update to more recent commit (could also `rm` the patch)"
This reverts commit 4b9f4952ea.
* Rm patch that no longer applies
* Fix post_build_upload flag name
* Fix artifact_group plumbing for setup test env
[ROCm/composable_kernel commit: aa1fb29aa1]
2025-11-11 07:44:38 -08:00
Khushbu Agarwal
a297885de5
formatting (#3182)
[ROCm/composable_kernel commit: 06c651b100]
2025-11-11 07:42:26 -08:00
Enrico Degregori
f80e8dfaa8
Extend support for ak1 / bk1 WMMA (#3073)
* Extend AK1 / BK1 support:
- Add support for AK1 != BK1
- Add support for AK1, BK1 > 8
- Introduce KInner template parameter for pipelines when loading multiple tiles with one instruction
* fix clang format
[ROCm/composable_kernel commit: 1c544abf57]
2025-11-11 07:38:15 -08:00
assistant-librarian[bot]
0b000816a4
Merge commit '9f33b7cfd3df3fcfd540f7633b0abd7019935761' into develop
2025-11-10 19:12:32 +00:00
Thomas Ning
fdccd7a3b4
fix input range (#3188)
[ROCm/composable_kernel commit: 9f33b7cfd3]
2025-11-10 11:08:41 -08:00
linqunAMD
89b798620c
[ck] Enable missing op for gfx11 and gfx12 (#3187)
[ROCm/composable_kernel commit: 7b6ba8d5c2]
2025-11-10 10:58:20 -08:00
linqunAMD
27df389d70
[ck] correct memory size in grouped_gemm_multi_abd_xdl_fixed_nk_bias_bf16_i8 (#3168)
b1 and b0 use the same layout, so the size of b1_tensors_device should be the same as that of b0_tensors_device.
[ROCm/composable_kernel commit: e593a14ae1]
2025-11-10 10:58:08 -08:00
Manish Kumar
045a8ca2ff
[CK-Tile] Add gtests for compiler CI for faster testing (#3123)
* Add gtests for compiler CI for faster testing
* Add changes to have a custom target
* Add a gtest suite for gemm kernel for running CI tests with compiler mode
* Fix Clang error (EOL)
* Removed compiler subfolder from CMake
* Add gtest suite for gemm kernel
* Disable failed tests
* Fix build errors
* Resolved PR comments
* Update shape for persistent gemm kernel test
* Separated types by H/W archs
* Made changes to persistent types
* Fix persistent build failure issue
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: d5746dd120]
2025-11-10 10:42:23 -08:00
assistant-librarian[bot]
650109a348
Merge commit 'e31a7a4f29b371c32ea9daf9211b6ae1fed2fa40' into develop
2025-11-07 04:14:29 +00:00
Gino Lu
89a665e60e
fix MX bpreshuffle gemm B grid descriptor dimension error (#3170)
[ROCm/composable_kernel commit: e31a7a4f29]
2025-11-06 19:42:39 -08:00
assistant-librarian[bot]
4c67bf8aaf
Merge commit 'd04eba4ae37c8c2d40855f02aa861e1ac1ec7b3f' into develop
2025-11-07 01:40:22 +00:00
Xudong Yuan
a8dbac6470
Ck moe mxfp4 blockm32 (#3098)
* block_m = 32
* ck block_m = 32
* aiter/3rdparty/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_b_preshuffle_mx_moe_v3.hpp format
* mxfp4_moe v1 pipe
* update format
---------
Co-authored-by: zhimding <zhimding@amd.com>
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: felix <felix.li@amd.com>
[ROCm/composable_kernel commit: d04eba4ae3]
2025-11-07 08:45:41 +08:00
assistant-librarian[bot]
d1d568c17b
Merge commit '5f3cae3e28a042e411afcd2e54b16cc6909c5bbb' into develop
2025-11-07 00:36:11 +00:00
JH-Leon-KIM-AMD
4fbe5ee525
[CK_BUILDER] ckb add remaining fwd conv device ops (#3155)
* Add device operation to conv signature. Use unions to hold conv layouts and device operations.
* Add predicates for all device op instances.
* Use the device op signature for validation.
* Fix ckb CMakeLists.txt file for tests.
* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.
* Fix clang-formatting.
* add device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk
* Add full DL configurability with Option A implementation
- Added 5 DL descriptor structs (39 configurable parameters)
- Added 10 C++20 concepts for type-safe validation
- Updated factory to read all parameters from descriptors
- Updated test helper to populate all descriptors
- All tests passing (13/13 including 3 new DL tests)
* Add factory and test support for DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add factory specialization for Large_Tensor device operation (conv_factory.hpp lines 1145-1265)
- Add macro collision workaround using pragma push/pop (conv_factory.hpp lines 43-51)
- Add test helper function run_test_DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add builder test file test_ckb_conv_fwd_2d_large_tensor_fp16.cpp with 2 test cases
- Update CMakeLists.txt to include new test file
- Reuse existing ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle descriptor
- Map all 42 template parameters identical to regular XDL CShuffle
- All 15 builder tests passing including 2 new Large_Tensor tests
Completes Task 350: All 4 forward convolution device operations now supported in CK Builder.
* Update copyright headers to new format
- Change copyright format to: Copyright (C) Advanced Micro Devices, Inc., or its affiliates.
- Reorder headers: Copyright first, then SPDX-License-Identifier
- Updated files:
* experimental/builder/test/conv/test_ckb_conv_fwd_2d_dl_fp16.cpp
* experimental/builder/test/conv/test_ckb_conv_fwd_2d_large_tensor_fp16.cpp
* experimental/builder/include/ck_tile/builder/device_op_types.hpp
* fix c++ 18 format
* Fix clang-format-18 error in device_op_types.hpp
---------
Co-authored-by: Ville Pietilä <ville.pietila@amd.com>
Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>
[ROCm/composable_kernel commit: 5f3cae3e28]
2025-11-06 16:29:48 -08:00
assistant-librarian[bot]
63d8864858
Merge commit '76c4c12f5959adcd56d1627a1d1ce885deb9d096' into develop
2025-11-06 23:12:25 +00:00
Johannes Graner
cd334376dc
Add .clangd and CMakeUserPresets.json to .gitignore (#3171)
[ROCm/composable_kernel commit: 76c4c12f59]
2025-11-06 15:07:39 -08:00
assistant-librarian[bot]
cb20485d00
Merge commit '18e083003fa25a661015542c39b1979200f361cf' into develop
2025-11-06 15:13:08 +00:00
Adam Osewski
3e184d3b67
[CK_BUILDER] Convolution description (#3163)
* Add DirectLoad tparam & clean up headers.
* Add convolution traits.
* Update inline documentation.
* Add more convolution specialization and gemm padding types.
* Add additional helper functions & more tests to conv traits.
* Fix tests cmake file.
* Add case insensitive string comparison
* Fix function name overlapping with variable name.
* Unify pipeline version and scheduler enums.
* Fix includes.
* Update test conv traits with unified enums.
* Update concepts, etc. with the updated unified enum
* Fix ckb conv fwd test - unified enum usage.
* Dump changes.
* Add ostream overloads for all enum classes.
* Update detailed() function in ConvDescription
* Fix handling union based conv direction.
* Add test & update conv description.
* Refine tree view.
* Update copyrights
* Fix merge artifacts
* Update detailed tree conv description
* Fix clang-format
[ROCm/composable_kernel commit: 18e083003f]
2025-11-06 15:46:26 +01:00
assistant-librarian[bot]
78783a456c
Merge commit '2234ff830b2f4ce8026c50b2d81f95f38f7117e5' into develop
2025-11-06 11:12:13 +00:00
Bartłomiej Kocot
5c219f1697
[CK TILE] Convolution remove magic values (#3160)
* [CK TILE] Refactor Conv configs and Conv Elementwise
* fix
* [CK TILE] Convolution remove magic values
* fix partitioner
[ROCm/composable_kernel commit: 2234ff830b]
2025-11-06 11:26:30 +01:00
assistant-librarian[bot]
cd3b8ae564
Merge commit '12922120d2567c3512048d7e8ed37e387a07bab6' into develop
2025-11-06 07:13:12 +00:00
joyeamd
ee21c7b651
add gfx11's barrier following SPG's reference (#3159)
* add gfx11's barrier following SPG's reference
* re-format the code
* minor fix
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
[ROCm/composable_kernel commit: 12922120d2]
2025-11-05 22:29:03 -08:00
assistant-librarian[bot]
b3950e9d11
Merge commit '4533aa6dbab648adc1a496b6064cb79777c41cf5' into develop
2025-11-06 00:35:42 +00:00
Illia Silin
d258a23f20
Fix compilation errors with clang22 (#3164)
* resolve compilation issue with clang22
* add __extension__ for __COUNTER__ usage in ck_tile
[ROCm/composable_kernel commit: 4533aa6dba]
2025-11-05 15:42:22 -08:00
assistant-librarian[bot]
4bbbfeb186
Merge commit 'b8527a92360496666ed6606e53ddc97e35dcf76e' into develop
2025-11-05 17:12:47 +00:00
Adam Osewski
f7bfb69702
[CK_BUILDER] Convolution traits (#3152)
Added:
1. Convolution traits & unit tests
2. Updated builder enumerators to represent Convolution Kernel properties.
3. Unified builder pipeline version & scheduler enumerators
[ROCm/composable_kernel commit: b8527a9236]
2025-11-05 08:53:06 -08:00
assistant-librarian[bot]
ea517e1c34
Merge commit '3b076b0b74fec1c5a27a808cea45b21c6f526ced' into develop
2025-11-05 03:31:59 +00:00
andrew clark
2cdce54765
Collecting redis stats (#3149)
[ROCm/composable_kernel commit: 3b076b0b74]
2025-11-04 18:55:11 -08:00
Illia Silin
8d454aa01d
Initialize new variable to prevent c++17 compiler error (#3156)
* initialize new variable to prevent c++17 compiler error
* build for gfx90a using -std=c++17 flag
[ROCm/composable_kernel commit: 930423ab3b]
2025-11-04 18:54:14 -08:00
assistant-librarian[bot]
7148cc6371
Merge commit '31c019f5891f75a2c9a26cb3d3e61c63596e4c30' into develop
2025-11-04 19:11:52 +00:00
Vidyasagar Ananthan
42d1855685
Chunk Ctests so we don't run into the large number of tests error (#3050)
* Chunk Ctests so we don't run into the large number of tests error
* Addressing feedback from copilot
[ROCm/composable_kernel commit: 31c019f589]
2025-11-04 10:31:32 -08:00
assistant-librarian[bot]
8c8fec6769
Merge commit '5abe4109e0c30993b9e1afe00f95154939043859' into develop
2025-11-04 18:15:42 +00:00
Cong Ma
53e42f5cce
Introduces the new partitioner to implement the reduction StreamK kernel (#3107)
* Introduces the new partitioner to implement the reduction StreamK kernel
* Add more doc text to functions
* Add persistent-dp option to streamk example
* Update example/ck_tile/40_streamk_gemm/README.md
[ROCm/composable_kernel commit: 5abe4109e0]
2025-11-04 10:32:17 -07:00
assistant-librarian[bot]
4d94ea61e1
Merge commit '13ba06f1e75a28037c78c9d75f660f4ab7877d27' into develop
2025-11-04 17:11:25 +00:00
Thomas Ning
dceaa603d0
fix the blockscale 2d case (#3148)
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
[ROCm/composable_kernel commit: 13ba06f1e7]
2025-11-04 11:55:23 -05:00
assistant-librarian[bot]
32a26d371b
Merge commit '0be0288f58879123c228373525c4b438d354694f' into develop
2025-11-04 15:13:12 +00:00