Commit Graph

3879 Commits

Author SHA1 Message Date
Aviral Goel
9ec4b67288 chore(copyright): update copyright header for script directory (#3184)
* chore(copyright): update copyright header for tile_engine directory

* chore(copyright): update copyright header for script directory

---------

Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>

[ROCm/composable_kernel commit: ab68c9d384]
2025-11-11 11:26:01 -08:00
assistant-librarian[bot]
db12c41b56 Merge commit '1b1c46e508c1fd40a03f54114b6b78629032fb4f' into develop 2025-11-11 17:12:49 +00:00
linqunAMD
13cf0bd17f [CK_TILE] Fix gemm_quant (#3186)
[ROCm/composable_kernel commit: 1b1c46e508]
2025-11-11 08:23:57 -08:00
Aviral Goel
c1b5372db3 chore(copyright): update copyright header for tile_engine directory (#3180)
[ROCm/composable_kernel commit: 88e3212fcc]
2025-11-11 08:17:24 -08:00
Scott Todd
2c7d1aba58 Bump commit ref for TheRock in workflows (#3189)
* Bump commit ref for TheRock in workflows

* Update to more recent commit (could also `rm` the patch)

* Revert "Update to more recent commit (could also `rm` the patch)"

This reverts commit 4b9f4952ea.

* Rm patch that no longer applies

* Fix post_build_upload flag name

* Fix artifact_group plumbing for setup test env

[ROCm/composable_kernel commit: aa1fb29aa1]
2025-11-11 07:44:38 -08:00
Khushbu Agarwal
ae4444dfba formatting (#3182)
[ROCm/composable_kernel commit: 06c651b100]
2025-11-11 07:42:26 -08:00
Enrico Degregori
8e23284922 Extend support for ak1 / bk1 WMMA (#3073)
* Extend AK1 / BK1 support:

 - Add support for AK1 != BK1
 - Add support for AK1, BK1 > 8
 - Introduce KInner template parameter for pipelines when loading multiple tiles with one instruction

* fix clang format

[ROCm/composable_kernel commit: 1c544abf57]
2025-11-11 07:38:15 -08:00
assistant-librarian[bot]
0b000816a4 Merge commit '9f33b7cfd3df3fcfd540f7633b0abd7019935761' into develop 2025-11-10 19:12:32 +00:00
Thomas Ning
b40859d461 fix input range (#3188)
[ROCm/composable_kernel commit: 9f33b7cfd3]
2025-11-10 11:08:41 -08:00
linqunAMD
ddb0078fec [ck] Enable missing op for gfx11 and gfx12 (#3187)
[ROCm/composable_kernel commit: 7b6ba8d5c2]
2025-11-10 10:58:20 -08:00
linqunAMD
93b4c77e06 [ck] correct memory size in grouped_gemm_multi_abd_xdl_fixed_nk_bias_bf16_i8 (#3168)
b1 and b0 use the same layout, so the size of b1_tensors_device should be the same as that of b0_tensors_device.

[ROCm/composable_kernel commit: e593a14ae1]
2025-11-10 10:58:08 -08:00
Manish Kumar
5f9d5566e5 [CK-Tile] Add gtests for compiler CI for faster testing (#3123)
* Add gtests for compiler CI for faster testing

* Add changes to have a custom target

* Add a gtest suite for gemm kernel for running CI tests with compiler mode

* Fix Clang error (EOL)

* Removed compiler subfolder from CMake

* Add gtest suite for gemm kernel

* Disable failed tests

* Fix build errors

* Resolved PR comments

* Update shape for persistent gemm kernel test

* Separated types by H/W archs

* Made changes to persistent types

* Fix persistent build failure issue

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: d5746dd120]
2025-11-10 10:42:23 -08:00
assistant-librarian[bot]
650109a348 Merge commit 'e31a7a4f29b371c32ea9daf9211b6ae1fed2fa40' into develop 2025-11-07 04:14:29 +00:00
Gino Lu
0344170dac fix MX bpreshuffle gemm B grid descriptor dimension error. (#3170)
[ROCm/composable_kernel commit: e31a7a4f29]
2025-11-06 19:42:39 -08:00
assistant-librarian[bot]
4c67bf8aaf Merge commit 'd04eba4ae37c8c2d40855f02aa861e1ac1ec7b3f' into develop 2025-11-07 01:40:22 +00:00
Xudong Yuan
6e40562dff Ck moe mxfp4 blockm32 (#3098)
* block_m = 32

* ck block_m = 32

* aiter/3rdparty/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_b_preshuffle_mx_moe_v3.hpp format

* mxfp4_moe v1 pipe

* update format

---------

Co-authored-by: zhimding <zhimding@amd.com>
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: felix <felix.li@amd.com>

[ROCm/composable_kernel commit: d04eba4ae3]
2025-11-07 08:45:41 +08:00
assistant-librarian[bot]
d1d568c17b Merge commit '5f3cae3e28a042e411afcd2e54b16cc6909c5bbb' into develop 2025-11-07 00:36:11 +00:00
JH-Leon-KIM-AMD
e8afef1e8b [CK_BUILDER] ckb add remaining fwd conv device ops (#3155)
* Add device operation to conv signature. Use unions to hold conv layouts and device operations.

* Add predicates for all device op instances.

* Use the device op signature for validation.

* Fix ckb CMakeLists.txt file for tests.

* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.

* Fix clang-formatting.

* add device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk

* Add full DL configurability with Option A implementation

- Added 5 DL descriptor structs (39 configurable parameters)
- Added 10 C++20 concepts for type-safe validation
- Updated factory to read all parameters from descriptors
- Updated test helper to populate all descriptors
- All tests passing (13/13 including 3 new DL tests)

* Add factory and test support for DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor

- Add factory specialization for Large_Tensor device operation (conv_factory.hpp lines 1145-1265)
- Add macro collision workaround using pragma push/pop (conv_factory.hpp lines 43-51)
- Add test helper function run_test_DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add builder test file test_ckb_conv_fwd_2d_large_tensor_fp16.cpp with 2 test cases
- Update CMakeLists.txt to include new test file
- Reuse existing ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle descriptor
- Map all 42 template parameters identical to regular XDL CShuffle
- All 15 builder tests passing including 2 new Large_Tensor tests

Completes Task 350: All 4 forward convolution device operations now supported in CK Builder.

* Update copyright headers to new format

- Change copyright format to: Copyright (C) Advanced Micro Devices, Inc., or its affiliates.
- Reorder headers: Copyright first, then SPDX-License-Identifier
- Updated files:
  * experimental/builder/test/conv/test_ckb_conv_fwd_2d_dl_fp16.cpp
  * experimental/builder/test/conv/test_ckb_conv_fwd_2d_large_tensor_fp16.cpp
  * experimental/builder/include/ck_tile/builder/device_op_types.hpp

* fix c++ 18 format

* Fix clang-format-18 error in device_op_types.hpp

---------

Co-authored-by: Ville Pietilä <ville.pietila@amd.com>
Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>

[ROCm/composable_kernel commit: 5f3cae3e28]
2025-11-06 16:29:48 -08:00
assistant-librarian[bot]
63d8864858 Merge commit '76c4c12f5959adcd56d1627a1d1ce885deb9d096' into develop 2025-11-06 23:12:25 +00:00
Johannes Graner
085690955f Add .clangd and CMakeUserPresets.json to .gitignore (#3171)
[ROCm/composable_kernel commit: 76c4c12f59]
2025-11-06 15:07:39 -08:00
assistant-librarian[bot]
cb20485d00 Merge commit '18e083003fa25a661015542c39b1979200f361cf' into develop 2025-11-06 15:13:08 +00:00
Adam Osewski
9fde8e559a [CK_BUILDER] Convolution description (#3163)
* Add DirectLoad tparam & clean up headers.

* Add convolution traits.

* Update inline documentation.

* Add more convolution specialization and gemm padding types.

* Add additional helper functions & more tests to conv traits.

* Fix tests cmake file.

* Add case insensitive string comparison

* Fix function name overlapping with variable name.

* Unify pipeline version and scheduler enums.

* Fix includes.

* Update test conv traits with unified enums.

* Update concepts etc. with updated unified enum

* Fix ckb conv fwd test - unified enum usage.

* Dump changes.

* Add ostream overloads for all enum classes.

* Update detailed() function in ConvDescription

* Fix handling union based conv direction.

* Add test & update conv description.

* Refine tree view.

* Update copyrights

* Fix merge artifacts

* Update detailed tree conv description

* Fix clang-format

[ROCm/composable_kernel commit: 18e083003f]
2025-11-06 15:46:26 +01:00
assistant-librarian[bot]
78783a456c Merge commit '2234ff830b2f4ce8026c50b2d81f95f38f7117e5' into develop 2025-11-06 11:12:13 +00:00
Bartłomiej Kocot
e89cb52306 [CK TILE] Convolution remove magic values (#3160)
* [CK TILE] Refactor Conv configs and Conv Elementwise

* fix

* [CK TILE] Convolution remove magic values

* fix partitioner

[ROCm/composable_kernel commit: 2234ff830b]
2025-11-06 11:26:30 +01:00
assistant-librarian[bot]
cd3b8ae564 Merge commit '12922120d2567c3512048d7e8ed37e387a07bab6' into develop 2025-11-06 07:13:12 +00:00
joyeamd
846b43f43b add gfx11's barrier following SPG's reference (#3159)
* add gfx11's barrier following SPG's reference

* re-format the code

* minor fix

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: 12922120d2]
2025-11-05 22:29:03 -08:00
assistant-librarian[bot]
b3950e9d11 Merge commit '4533aa6dbab648adc1a496b6064cb79777c41cf5' into develop 2025-11-06 00:35:42 +00:00
Illia Silin
b7d6555a88 Fix compilation errors with clang22. (#3164)
* resolve compilation issue with clang22

* add __extension__ for __COUNTER__ usage in ck_tile

[ROCm/composable_kernel commit: 4533aa6dba]
2025-11-05 15:42:22 -08:00
assistant-librarian[bot]
4bbbfeb186 Merge commit 'b8527a92360496666ed6606e53ddc97e35dcf76e' into develop 2025-11-05 17:12:47 +00:00
Adam Osewski
54409e7fb5 [CK_BUILDER] Convolution traits. (#3152)
Added:

1. Convolution traits & unit tests
2. Update builder enumerators to have representation of Convolution Kernels properties.
3. Unified builder pipeline version & scheduler enumerators

[ROCm/composable_kernel commit: b8527a9236]
2025-11-05 08:53:06 -08:00
assistant-librarian[bot]
ea517e1c34 Merge commit '3b076b0b74fec1c5a27a808cea45b21c6f526ced' into develop 2025-11-05 03:31:59 +00:00
andrew clark
a70d21d523 Collecting redis stats (#3149)
[ROCm/composable_kernel commit: 3b076b0b74]
2025-11-04 18:55:11 -08:00
Illia Silin
bb4b6e5961 Initialize new variable to prevent c++17 compiler error (#3156)
* initialize new variable to prevent c++17 compiler error

* build for gfx90a using -std=c++17 flag

[ROCm/composable_kernel commit: 930423ab3b]
2025-11-04 18:54:14 -08:00
assistant-librarian[bot]
7148cc6371 Merge commit '31c019f5891f75a2c9a26cb3d3e61c63596e4c30' into develop 2025-11-04 19:11:52 +00:00
Vidyasagar Ananthan
4d72320b51 Chunk Ctests so we don't run into large number of tests error (#3050)
* Chunk Ctests so we don't run into large number of tests error

* Addressing feedback from copilot

[ROCm/composable_kernel commit: 31c019f589]
2025-11-04 10:31:32 -08:00
assistant-librarian[bot]
8c8fec6769 Merge commit '5abe4109e0c30993b9e1afe00f95154939043859' into develop 2025-11-04 18:15:42 +00:00
Cong Ma
0343c4e1fe Introduces the new partitioner to implement the reduction StreamK kernel. (#3107)
* Introduces the new partitioner to implement the reduction StreamK kernel

* Add more doc text to functions

* Add persistent-dp option to streamk example

* Update example/ck_tile/40_streamk_gemm/README.md

[ROCm/composable_kernel commit: 5abe4109e0]
2025-11-04 10:32:17 -07:00
assistant-librarian[bot]
4d94ea61e1 Merge commit '13ba06f1e75a28037c78c9d75f660f4ab7877d27' into develop 2025-11-04 17:11:25 +00:00
Thomas Ning
1a8f824938 fix the blockscale 2d case (#3148)
Co-authored-by: Aviral Goel <aviral.goel@amd.com>

[ROCm/composable_kernel commit: 13ba06f1e7]
2025-11-04 11:55:23 -05:00
assistant-librarian[bot]
32a26d371b Merge commit '0be0288f58879123c228373525c4b438d354694f' into develop 2025-11-04 15:13:12 +00:00
John Shumway
a9d0980ad9 [CK_BUILDER] Update copyright messages. (#3150)
* Update copyright messages.

Copyright messages should no longer include a year. This PR updates all 38 source files to the new format.

* Switch to (C) from unicode copyright symbol.

The Unicode copyright symbol in comments was causing compilation errors.

[ROCm/composable_kernel commit: 0be0288f58]
2025-11-04 15:35:16 +01:00
John Shumway
52204ff4e5 [CK_BUILDER] Add backward weight instance traits for xdl cshuffle. (#3143)
* Add backward weight instance traits for xdl cshuffle.

To keep instance test file sizes reasonable, we start a new test_bwd_weight_instances_traits.cpp test file.

* Fix copyright notices.

* Remove (c) symbol, replace with (C).

Having UTF-8 in source caused an error with code generation.

[ROCm/composable_kernel commit: 6dbee64886]
2025-11-04 15:34:00 +01:00
assistant-librarian[bot]
5b7defb9da Merge commit '8681ced9629f6e952afa5b77c5f3549d60920efa' into develop 2025-11-04 14:12:38 +00:00
Bartłomiej Kocot
052c043d99 [CK TILE] Refactor Conv configs and Conv Elementwise (#3151)
* [CK TILE] Refactor Conv configs and Conv Elementwise

* fix

[ROCm/composable_kernel commit: 8681ced962]
2025-11-04 15:04:53 +01:00
assistant-librarian[bot]
58d420c0a4 Merge commit '99f38e4d9bedcf1b09d58653c354f042f8c509ae' into develop 2025-11-04 00:35:23 +00:00
Bartłomiej Kocot
a3a55b00d7 [CK TILE] Refactor grouped conv fwd large tensor (#3144)
[ROCm/composable_kernel commit: 99f38e4d9b]
2025-11-04 00:34:48 +01:00
assistant-librarian[bot]
a0410f0a05 Merge commit 'c7ded76cc784f0b4d2c24d3985cb587ad22cbd7f' into develop 2025-11-03 21:11:57 +00:00
Vidyasagar Ananthan
c9e7b735c0 Adding note on CMake convenience script (#3139)
* Adding note on convenience script

* Addressing feedback

* Update README.md

reword

---------

Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>

[ROCm/composable_kernel commit: c7ded76cc7]
2025-11-03 12:21:57 -08:00
assistant-librarian[bot]
a8059a2e58 Merge commit '507d81c3af51b81f15b946a2a4bef7f594620292' into develop 2025-11-03 20:14:18 +00:00
Enrico Degregori
9575bcd099 Fix splitk preshuffle (#3137)
* Fix splitK multiply_multiply_wp

* Add tests for gemm_multiply_multiply_wp

* Add tests for gemm_universal_preshuffle (KBatch = 1)

* Add tests gemm_blockscale_wp

* Fix splitk gemm universal preshuffle

* Run new tests on arch supporting fp8

* Restore example

* Fix strides profiler

* Fix tests

* Fix clang format

* Finalize profiler preshuffle with tolerances

* Minor improvements to splitk related changes

* Address review comments: clang format and ckProfiler typo

* Remove b_k_split_offset from SplitKBatchOffset struct

[ROCm/composable_kernel commit: 507d81c3af]
2025-11-03 11:59:01 -08:00