ThruptiRajLakshmanaGowda
cbc8959073
Partial Progress : Working GEMM Preshuffle
2025-11-26 02:25:40 +00:00
ThruptiRajLakshmanaGowda
2f894b29bb
Partial Progress : Working GEMM Multi D
2025-11-26 01:31:52 +00:00
ThruptiRajLakshmanaGowda
1a07c85301
Partial Progress : Jenkins change to run op_new
2025-11-25 23:46:00 +00:00
ThruptiRajLakshmanaGowda
d78098498c
Partial Progress : Working code for Universal GEMM
2025-11-25 23:40:44 +00:00
ThruptiRajLakshmanaGowda
2b03f054f8
Partial Progress : Working GEMM Universal
2025-11-25 17:42:15 +00:00
ThruptiRajLakshmanaGowda
1885f68606
Partial Progress
2025-11-25 08:45:55 +00:00
ThruptiRajLakshmanaGowda
2c4d0dd289
Partial Progress : Generate Single Kernel until trait config
2025-11-20 22:36:59 +00:00
ThruptiRajLakshmanaGowda
d6db805e82
Partial Progress : Working till Listing kernels
2025-11-20 18:23:37 +00:00
ThruptiRajLakshmanaGowda
eeeff3fbfe
Partial Progress : Working till Listing kernels
2025-11-20 18:18:46 +00:00
ThruptiRajLakshmanaGowda
e2bfdba309
Partial Progress : Final Structuring
2025-11-19 00:26:21 +00:00
ThruptiRajLakshmanaGowda
c5fcc2a9ec
Partial Progress : Restructure structure
2025-11-18 00:46:49 +00:00
ThruptiRajLakshmanaGowda
1002b7ebee
Partial Progress : Boiler plate code
2025-11-17 22:56:43 +00:00
ThruptiRajLakshmanaGowda
cd7f41fddf
Restructuring boiler plate code
2025-11-17 22:02:01 +00:00
jefyang1
ca2ee0eb8a
Fix test_gemm_multiply_multiply_wp_xdl_fp8 on gfx950 ( #3191 )
...
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-11-13 09:32:54 -06:00
Yi DING
8d50001b93
[CK_TILE] Improve F8F6F4 Scaled WarpGemm ( #3197 )
...
* [CK_TILE] Improve F8F6F4 Scaled WarpGemm
* Thanks, Copilot
2025-11-13 20:22:05 +08:00
Khushbu Agarwal
fb41a7b73b
fixing ambiguous shuffle definitions ( #3175 )
...
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-11-12 23:44:12 -08:00
Cong Ma
6fd8ddabe7
[CK TILE GEMM] Refactor block_scale_gemm examples ( #3181 )
...
* [CK TILE GEMM] Refactor block_scale_gemm examples
- Split cpp file to reduce building time
- Support multiple GemmConfig
* [CK TILE GEMM] Refactor block_scale_gemm examples
- Update Readme
* [CK TILE GEMM] Refactor block_scale_gemm examples
- Add support for rowcol and tensor GEMM operations
* [CK TILE GEMM] Refactor block_scale_gemm examples
- Update README
* [CK TILE GEMM] Refactor block_scale_gemm examples
- Set quant group size to (1, 1, 64) for targets excluding gfx950, where warp tile size (16, 16, 128) is incompatible.
2025-11-12 23:43:40 -08:00
Thrupti Raj Lakshmana Gowda
9af30f04b6
Ck tile engine commons ( #3166 )
...
* Moving Preshuffle to commons
* Fixing Common Validations
* Addressing Review Comments
* Partial Rebasing
* Partial Rebasing
* Partial Rebasing
* Rebasing Complete
2025-11-13 00:56:18 -06:00
Aviral Goel
797ddfa41e
chore(copyright): update copyright header for test_data directory ( #3194 )
...
* chore(copyright): update copyright header for tile_engine directory
* chore(copyright): update copyright header for script directory
* chore(copyright): update copyright header for test_data directory
2025-11-12 16:07:28 -08:00
John Afaganis
9342365713
Add C++17 deprecation warning to CHANGELOG.md ( #3203 )
...
* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md
2025-11-12 16:05:53 -08:00
Illia Silin
3784c0e7c3
add permissions for /tmp folder ( #3201 )
2025-11-12 11:47:07 -08:00
Enrico Degregori
7414a0f4d4
Wmma support for gemm_reduce ( #3145 )
...
* Initial implementation GEMM+Reduce:
- device struct
- epilogue struct
* Fix tests, improve profiler and add initial instances
* Add instances
* Fix compilation error
* Address review comments
* Fix logging
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-11-12 11:23:54 -08:00
Yashvardhan Agarwal
299c9bca1b
[CK_Tile] Pooling example readme update ( #3174 )
...
* pooling example readme update
- The updated readme explains the transformations of the pooling kernel
using a mermaid diagram
* Update example/ck_tile/36_pooling/README.md
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* resolve comments
---------
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
2025-11-12 07:30:20 -08:00
Po Yen Chen
40d2ed0f2a
[CK_TILE] Share partition index across threads and specify offset in load_tile()/async_load_tile()/load_tile_transpose() ( #2905 )
...
* Allow sharing partition index across threads
* Fix typo PartitoinIndex -> PartitionIndex
* Remove C++20 'requires' usages
* Add missing template arguments
* Fix load_tile() overload ambiguity issue
* Use SFINAE to exclude invalid arguments
* Add additional offset parameter to the async_load_tile()
* Remove async_load_tile() default argument to avoid ambiguity
* Extract tile_window coordinate compute logic as method
* Use warp-shared LDS base address in tile_window::async_load()
* Add constraint to tile_window::load() templates
* Fix wrong type traits is_class_v<> usages
* Add missing constraint to async_load_tile()
* Add missing tile_window::load() overload
* Add more constraint to avoid load_tile() call ambiguity
* Rename ParitionIndex as ReplacementPartitionIndex
* Update pre_computed_warp_coords_ in move_extended()
* Fix inconsistency between template parameters and documentation
* Allow specifying pre-computed partition index
* Add type traits is_sequence<> & is_tile_distribution<>
* Add type traits is_tensor_view<>
* Add type constraints to make_tile_window() templates
* Allow passing partition_index to set_tile_if()
* Allow specifying partition_index to store_tile()
* Add missing template parameter of replace_bottom_tensor_view()
* Allow passing partition_index to Default2DEpilogue
* Make get_partition_index() public
* Add _with_offset() postfix to avoid resolution error
* Remove ReplacementPartitionIndex template param
* Add missing comments
* Add load_tile_transpose_with_offset() overload
2025-11-12 10:26:14 +08:00
Bartłomiej Kocot
92c1f4981a
[CK_BUILDER] Add grouped conv fwd ck tile traits ( #3183 )
...
* [CK BUILDER] Add grouped conv fwd ck tile traits
* Update instance_traits_tile_grouped_convolution_forward.hpp
* Update grouped_convolution_forward_kernel.hpp
2025-11-11 13:55:33 -08:00
Aviral Goel
b145a5fe80
Add CK Tile Tutorials Folder with GEMM and COPY Kernel ( #3038 )
...
* feat: add tutorial folder with gemm tutorial
* chore: move copy kernel from examples folder to tutorial
* Update tutorial/ck_tile/01_naive_gemm/README.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tutorial/ck_tile/01_naive_gemm/README.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* chore: remove handdrawn images
* docs: add write ups to explain the gemm kernel
* docs: add about block level pipeline and static distributed tensors
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-11 14:15:49 -06:00
Aviral Goel
c54ecd905b
docs: update ckProfiler readme with selective building option ( #3140 )
...
* docs: update ckProfiler readme with selective building option
* docs: add list of operations for ckProfiler
2025-11-11 14:27:33 -05:00
Aviral Goel
ab68c9d384
chore(copyright): update copyright header for script directory ( #3184 )
...
* chore(copyright): update copyright header for tile_engine directory
* chore(copyright): update copyright header for script directory
---------
Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>
2025-11-11 11:26:01 -08:00
linqunAMD
1b1c46e508
[CK_TILE] Fix gemm_quant ( #3186 )
2025-11-11 08:23:57 -08:00
Aviral Goel
88e3212fcc
chore(copyright): update copyright header for tile_engine directory ( #3180 )
2025-11-11 08:17:24 -08:00
Scott Todd
aa1fb29aa1
Bump commit ref for TheRock in workflows ( #3189 )
...
* Bump commit ref for TheRock in workflows
* Update to more recent commit (could also `rm` the patch)
* Revert "Update to more recent commit (could also `rm` the patch)"
This reverts commit 4b9f4952ea.
* Rm patch that no longer applies
* Fix post_build_upload flag name
* Fix artifact_group plumbing for setup test env
2025-11-11 07:44:38 -08:00
Khushbu Agarwal
06c651b100
formatting ( #3182 )
2025-11-11 07:42:26 -08:00
Enrico Degregori
1c544abf57
Extend support for ak1 / bk1 WMMA ( #3073 )
...
* Extend AK1 / BK1 support:
- Add support for AK1 != BK1
- Add support for AK1, BK1 > 8
- Introduce KInner template parameter for pipelines when loading multiple tiles with one instruction
* fix clang format
2025-11-11 07:38:15 -08:00
Thomas Ning
9f33b7cfd3
fix input range ( #3188 )
2025-11-10 11:08:41 -08:00
linqunAMD
7b6ba8d5c2
[ck] Enable missing op for gfx11 and gfx12 ( #3187 )
2025-11-10 10:58:20 -08:00
linqunAMD
e593a14ae1
[ck] correct memory size in grouped_gemm_multi_abd_xdl_fixed_nk_bias_bf16_i8 ( #3168 )
...
b1 and b0 use the same layout, so the size of b1_tensors_device should be the same as that of b0_tensors_device
2025-11-10 10:58:08 -08:00
Manish Kumar
d5746dd120
[CK-Tile] Add gtests for compiler CI for faster testing ( #3123 )
...
* Add gtests for compiler CI for faster testing
* Add changes to have a custom target
* Add a gtest suite for gemm kernel for running CI tests with compiler mode
* Fix Clang error (EOL)
* Removed compiler subfolder from CMake
* Add gtest suite for gemm kernel
* Disable failed tests
* Fix build errors
* Resolved PR comments
* Update shape for persistent gemm kernel test
* Separated types by H/W archs
* Made changes to persistent types
* Fix persistent build failure issue
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2025-11-10 10:42:23 -08:00
Gino Lu
e31a7a4f29
fix MX bpreshuffle gemm B grid descriptor dimension error. ( #3170 )
2025-11-06 19:42:39 -08:00
Xudong Yuan
d04eba4ae3
Ck moe mxfp4 blockm32 ( #3098 )
...
* block_m = 32
* ck block_m = 32
* aiter/3rdparty/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_b_preshuffle_mx_moe_v3.hpp format
* mxfp4_moe v1 pipe
* update format
---------
Co-authored-by: zhimding <zhimding@amd.com>
Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: felix <felix.li@amd.com>
2025-11-07 08:45:41 +08:00
JH-Leon-KIM-AMD
5f3cae3e28
[CK_BUILDER] ckb add remaining fwd conv device ops ( #3155 )
...
* Add device operation to conv signature. Use unions to hold conv layouts and device operations.
* Add predicates for all device op instances.
* Use the device op signature for validation.
* Fix ckb CMakeLists.txt file for tests.
* Fix building CK Builder instance traits after the introduction of direct load template parameter in CK.
* Fix clang-formatting.
* add device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk
* Add full DL configurability with Option A implementation
- Added 5 DL descriptor structs (39 configurable parameters)
- Added 10 C++20 concepts for type-safe validation
- Updated factory to read all parameters from descriptors
- Updated test helper to populate all descriptors
- All tests passing (13/13 including 3 new DL tests)
* Add factory and test support for DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add factory specialization for Large_Tensor device operation (conv_factory.hpp lines 1145-1265)
- Add macro collision workaround using pragma push/pop (conv_factory.hpp lines 43-51)
- Add test helper function run_test_DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor
- Add builder test file test_ckb_conv_fwd_2d_large_tensor_fp16.cpp with 2 test cases
- Update CMakeLists.txt to include new test file
- Reuse existing ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle descriptor
- Map all 42 template parameters identical to regular XDL CShuffle
- All 15 builder tests passing including 2 new Large_Tensor tests
Completes Task 350: All 4 forward convolution device operations now supported in CK Builder.
* Update copyright headers to new format
- Change copyright format to: Copyright (C) Advanced Micro Devices, Inc., or its affiliates.
- Reorder headers: Copyright first, then SPDX-License-Identifier
- Updated files:
* experimental/builder/test/conv/test_ckb_conv_fwd_2d_dl_fp16.cpp
* experimental/builder/test/conv/test_ckb_conv_fwd_2d_large_tensor_fp16.cpp
* experimental/builder/include/ck_tile/builder/device_op_types.hpp
* fix c++ 18 format
* Fix clang-format-18 error in device_op_types.hpp
---------
Co-authored-by: Ville Pietilä <ville.pietila@amd.com>
Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>
2025-11-06 16:29:48 -08:00
Johannes Graner
76c4c12f59
Add .clangd and CMakeUserPresets.json to .gitignore ( #3171 )
2025-11-06 15:07:39 -08:00
Adam Osewski
18e083003f
[CK_BUILDER] Convolution description ( #3163 )
...
* Add DirectLoad tparam & clean up headers.
* Add convolution traits.
* Update inline documentation.
* Add more convolution specialization and gemm padding types.
* Add additional helper functions & more tests to conv traits.
* Fix tests cmake file.
* Add case insensitive string comparison
* Fix function name overlapping with variable name.
* Unify pipeline version and scheduler enums.
* Fix includes.
* Update test conv traits with unified enums.
* Update concepts etc. to use the unified enums
* Fix ckb conv fwd test - unified enum usage.
* Dump changes.
* Add ostream overloads for all enum classes.
* Update detailed() function in ConvDescription
* Fix handling union based conv direction.
* Add test & update conv description.
* Refine tree view.
* Update copyrights
* Fix merge artifacts
* Update detailed tree conv description
* Fix clang-format
2025-11-06 15:46:26 +01:00
Bartłomiej Kocot
2234ff830b
[CK TILE] Convolution remove magic values ( #3160 )
...
* [CK TILE] Refactor Conv configs and Conv Elementwise
* fix
* [CK TILE] Convolution remove magic values
* fix partitioner
2025-11-06 11:26:30 +01:00
joyeamd
12922120d2
add gfx11's barrier following SPG's reference ( #3159 )
...
* add gfx11's barrier following SPG's reference
* re-format the code
* minor fix
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
2025-11-05 22:29:03 -08:00
Illia Silin
4533aa6dba
Fix compilation errors with clang22. ( #3164 )
...
* resolve compilation issue with clang22
* add __extension__ for __COUNTER__ usage in ck_tile
2025-11-05 15:42:22 -08:00
Adam Osewski
b8527a9236
[CK_BUILDER] Convolution traits. ( #3152 )
...
Added:
1. Convolution traits & unit tests
2. Update builder enumerators to have representation of Convolution Kernels properties.
3. Unified builder pipeline version & scheduler enumerators
2025-11-05 08:53:06 -08:00
andrew clark
3b076b0b74
Collecting redis stats ( #3149 )
2025-11-04 18:55:11 -08:00
Illia Silin
930423ab3b
Initialize new variable to prevent c++17 compiler error ( #3156 )
...
* initialize new variable to prevent c++17 compiler error
* build for gfx90a using -std=c++17 flag
2025-11-04 18:54:14 -08:00
Vidyasagar Ananthan
31c019f589
Chunk Ctests so we don't run into large number of tests error ( #3050 )
...
* Chunk Ctests so we don't run into large number of tests error
* Addressing feedback from copilot
2025-11-04 10:31:32 -08:00
Cong Ma
5abe4109e0
Introduces the new partitioner to implement the reduction StreamK kernel. ( #3107 )
...
* Introduces the new partitioner to implement the reduction StreamK kernel
* Add more doc text to functions
* Add persistent-dp option to streamk example
* Update example/ck_tile/40_streamk_gemm/README.md
2025-11-04 10:32:17 -07:00