composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-13 09:45:56 +00:00

Author	SHA1	Message	Date
yinglu	2a73eb3bc0	Simulate TF32 with BF16x3 (#3142 ) * tf32:bf16x3:use bf16x3 emulate tf32 gemm * change blockwiseGemm to demo bf16x3 * temp push * self review * self review * fix multi-device compile error * bug fix * code refactor * limit to gfx950 * enhance gemm gfx942 threshold * lower change from blockwise to warpwise * refact codes * refact codes * error fix * change threshold * bug fix * fix threshold error * change host reference implement to same as device * bug fix * bug fix * code refact * fix clang-format fail * code refine	2025-11-13 16:21:09 -08:00
SamiAario-AMD	f2cfc6b94e	Remove "basic" and universal GEMM tests, and incorporate their test cases into the GEMM pipeline tests (#3094 ) * Add missing copyright statements * Use ck_tile::host_tensor_descriptor instead of a custom lambda * Refactor use of check_data_type in test classes * Use TEST_SUITE_NAME with TYPED_TEST_SUITE * Remove an unused namespace * Make dim3 const * Add BF8 x BF8 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add F8 x BF8 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add BF16 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add BF16 x BF16 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add BF8 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add F8 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Add F16 x I4 tests for CompV3 in test_gemm_pipeline_kernel_types.hpp * Skip failing tests of F16 x I4 for CompV3 with K == 2 * K_Tile * Add missing precision type combinations to CompV4 from CompV3 * Move the INT8 tests around for consistency with KernelTypesCompV3Wmma * Add missing precision type combinations to CompV3Wmma from CompV3 * Remove the basic and universal tests and their dependencies * On __gfx950__, avoid using transposed loading of A with datatype pk_int4_t of B * Use ADataType and BDataType instead of ComputeDataType for WarpGemm * Explicitly set some return types to void * Use more general typenames in InterleavedPKTypeLoader * Add load_interleaved_pk_type.hpp to common.hpp * Use std::is_same_v in load_int4_tile * Add handling of LoadTranspose to load_int4_tile * Factor out common code in several places using load_int4_tile * Add support for pk_int4_t using load_int4_tile * Fix formatting	2025-11-13 11:01:27 -08:00
Ville Pietilä	7d57bc169f	[CK_BUILDER] Forward convolution builder improvements (#3179 ) Proposed changes: Improve the forward convolution builder implementation and address leftover feedback from PR #3138. Main changes: Refactored tests so that they better reflect the builder pattern. The templates and types for the convolution algorithm concepts are created via a factory that facilitates programmatic creation of the device op instances. Moved tests into an anonymous namespace. The convolution factory had a lot of if-else constructs where CK Builder types were converted into CK library types. I initially had trouble using static_assert in the default branch of a switch, as the static_assert was evaluated at compile time even for valid types. However, if we change the static_assert to throw "<error message>", it results in a compile-time error only if the default branch is actually hit. This assumes that the function is consteval. Hence, changed all conversions in the convolution factory to use switch, which is more intuitive. Removed the explicit device op definition from the convolution signature and the corresponding predicate file. The device ops are defined by the corresponding concepts. This allowed removing a lot of boilerplate code from the convolution factory. Added inheritance and convolution algorithm specialization to handle device ops that are specializations of more generic ones. The large tensor support is more naturally expressed by this pattern. Added support for the FP8 data type. * WIP: Builder for expected test results. * Improve ckb fwd conv instance tests. * clang-format * Change if-else statements into switch in conv factory. * Fix clang-formatting. * Removed unnecessary includes. * Added missing copyright. * Remove explicit device op flag from convolution signature. * Add missing concept. * Fix build. * clang-format * Add test for building conv fwd FP8 instances. * Add missing header to instance traits. * Clean-up recently added instances. * Introduce inheritance and specialization. * Use builder to build conv algorithm templates and types. * clang-format * Fix conv description tests. --------- Co-authored-by: John Shumway <john.shumwayjr@gmail.com>	2025-11-13 08:47:25 -08:00
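The static_assert-versus-throw point in the commit message above can be illustrated with a minimal sketch. The names `BuilderDataType` and `bit_width` are hypothetical and are not taken from CK Builder. The sketch also swaps in `constexpr` plus a `static_assert` at the call site to force compile-time evaluation, so that it builds as C++17, whereas the commit assumes the conversion functions are `consteval`.

```cpp
#include <cassert>

// Hypothetical enum standing in for a CK Builder type; not the real definition.
enum class BuilderDataType { FP8, FP16, BF16, FP32 };

// Maps an enumerator to its width in bits.
// A static_assert(false, ...) placed in the default branch is checked when the
// function is compiled, so it typically fires even if only valid values are
// ever passed. A throw expression is an error only when constant evaluation
// actually reaches it.
constexpr int bit_width(BuilderDataType t)
{
    switch(t)
    {
    case BuilderDataType::FP8: return 8;
    case BuilderDataType::FP16: return 16;
    case BuilderDataType::BF16: return 16;
    case BuilderDataType::FP32: return 32;
    default: throw "unsupported data type";
    }
}

// Forced constant evaluation: valid enumerators compile cleanly.
static_assert(bit_width(BuilderDataType::FP16) == 16, "FP16 is 16 bits wide");

// Uncommenting the next line should fail to compile, because constant
// evaluation reaches the throw in the default branch:
// static_assert(bit_width(static_cast<BuilderDataType>(42)) == 0, "");
```

With `consteval` (C++20) every call is a constant evaluation, so no call-site `static_assert` is needed to get the compile-time diagnostic; with `constexpr`, a call made at run time with an unhandled value would throw instead.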
jefyang1	ca2ee0eb8a	Fix test_gemm_multiply_multiply_wp_xdl_fp8 on gfx950 (#3191 ) Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2025-11-13 09:32:54 -06:00
Yi DING	8d50001b93	[CK_TILE] Improve F8F6F4 Scaled WarpGemm (#3197 ) * [CK_TILE] Improve F8F6F4 Scaled WarpGemm * Thanks, Copilot	2025-11-13 20:22:05 +08:00
Khushbu Agarwal	fb41a7b73b	fixing ambiguous shuffle definitions (#3175 ) Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-11-12 23:44:12 -08:00
Cong Ma	6fd8ddabe7	[CK TILE GEMM] Refactor block_scale_gemm examples (#3181 ) * [CK TILE GEMM] Refactor block_scale_gemm examples - Split cpp file to reduce building time - Support multiple GemmConfig * [CK TILE GEMM] Refactor block_scale_gemm examples - Update Readme * [CK TILE GEMM] Refactor block_scale_gemm examples - Add support for rowcol and tensor GEMM operations * [CK TILE GEMM] Refactor block_scale_gemm examples - Update README * [CK TILE GEMM] Refactor block_scale_gemm examples - Set quant group size to (1, 1, 64) for targets excluding gfx950, where warp tile size (16, 16, 128) is incompatible.	2025-11-12 23:43:40 -08:00
Thrupti Raj Lakshmana Gowda	9af30f04b6	Ck tile engine commons (#3166 ) * Moving Preshuffle to commons * Fixing Common Validations * Addressing Review Comments * Partial Rebasing * Partial Rebasing * Partial Rebasing * Rebasing Complete	2025-11-13 00:56:18 -06:00
Aviral Goel	797ddfa41e	chore(copyright): update copyright header for test_data directory (#3194 ) * chore(copyright): update copyright header for tile_engine directory * chore(copyright): update copyright header for script directory * chore(copyright): update copyright header for test_data directory	2025-11-12 16:07:28 -08:00
John Afaganis	9342365713	Add C++17 deprecation warning to CHANGELOG.md (#3203 ) * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md	2025-11-12 16:05:53 -08:00
Illia Silin	3784c0e7c3	add permissions for /tmp folder (#3201 )	2025-11-12 11:47:07 -08:00
Enrico Degregori	7414a0f4d4	Wmma support for gemm_reduce (#3145 ) * Initial implementation GEMM+Reduce: - device struct - epilogue struct * Fix tests, improve profiler and add initial instances * Add instances * Fix compilation error * Address review comments * Fix logging --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2025-11-12 11:23:54 -08:00
Yashvardhan Agarwal	299c9bca1b	[CK_Tile] Pooling example readme update (#3174 ) * pooling example readme update - The updated readme explains the transformations of the pooling kernel using a mermaid diagram * Update example/ck_tile/36_pooling/README.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * resolve comments --------- Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>	2025-11-12 07:30:20 -08:00
Po Yen Chen	40d2ed0f2a	[CK_TILE] Share partition index across threads and specify offset in load_tile()/async_load_tile()/load_tile_transpose() (#2905 ) * Allow sharing partition index across threads * Fix typo PartitoinIndex -> PartitionIndex * Remove C++20 'requires' usages * Add missing template arguments * Fix load_tile() overload ambiguity issue * Use SFINAE to exclude invalid arguments * Add additional offset parameter to the async_load_tile() * Remove async_load_tile() default argument to avoid ambiguity * Extract tile_window coordinate compute logic as method * Use warp-shared LDS base address in tile_window::async_load() * Add constraint to tile_window::load() templates * Fix wrong type traits is_class_v<> usages * Add missing constraint to async_load_tile() * Add missing tile_window::load() overload * Add more constraint to avoid load_tile() call ambiguity * Rename ParitionIndex as ReplacementPartitionIndex * Update pre_computed_warp_coords_ in move_extended() * Fix inconsistency between template parameters and documentation * Allow specifying pre-computed parition index * Add type straits is_sequence<> & is_tile_distribution<> * Add type straits is_tensor_view<> * Add type constraints to make_tile_window() templates * Allow passing partition_index to set_tile_if() * Allow specifying partition_index to store_tile() * Add missing template parameter of replace_bottom_tensor_view() * Allow passing partition_index to Default2DEpilogue * Make get_partition_index() public * Add _with_offset() postfix to avoid resolution error * Remove ReplacementPartitionIndex template param * Add missing comments * Add load_tile_transpose_with_offset() overload	2025-11-12 10:26:14 +08:00
Bartłomiej Kocot	92c1f4981a	[CK_BUILDER] Add grouped conv fwd ck tile traits (#3183 ) * [CK BUILDER] Add grouped conv fwd ck tile traits * Update instance_traits_tile_grouped_convolution_forward.hpp * Update grouped_convolution_forward_kernel.hpp	2025-11-11 13:55:33 -08:00
Aviral Goel	b145a5fe80	Add CK Tile Tutorials Folder with GEMM and COPY Kernel (#3038 ) * feat: add tutorial folder with gemm tutorial * chore: move copy kernel from examples folder to tutorial * Update tutorial/ck_tile/01_naive_gemm/README.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tutorial/ck_tile/01_naive_gemm/README.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * chore: remove handdrawn images * docs: add write ups to explain the gemm kernel * docs: add about block level pipeline and static distributed tensors --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-11 14:15:49 -06:00
Aviral Goel	c54ecd905b	docs: update ckProfiler readme with selective building option (#3140 ) * docs: update ckProfiler readme with selective building option * docs: add list of operations for ckProfiler	2025-11-11 14:27:33 -05:00
Aviral Goel	ab68c9d384	chore(copyright): update copyright header for script directory (#3184 ) * chore(copyright): update copyright header for tile_engine directory * chore(copyright): update copyright header for script directory --------- Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>	2025-11-11 11:26:01 -08:00
linqunAMD	1b1c46e508	[CK_TILE] Fix gemm_quant (#3186 )	2025-11-11 08:23:57 -08:00
Aviral Goel	88e3212fcc	chore(copyright): update copyright header for tile_engine directory (#3180 )	2025-11-11 08:17:24 -08:00
Scott Todd	aa1fb29aa1	Bump commit ref for TheRock in workflows (#3189 ) * Bump commit ref for TheRock in workflows * Update to more recent commit (could also `rm` the patch) * Revert "Update to more recent commit (could also `rm` the patch)" This reverts commit `4b9f4952ea`. * Rm patch that no longer applies * Fix post_build_upload flag name * Fix artifact_group plumbing for setup test env	2025-11-11 07:44:38 -08:00
Khushbu Agarwal	06c651b100	formatting (#3182 )	2025-11-11 07:42:26 -08:00
Enrico Degregori	1c544abf57	Extend support for ak1 / bk1 WMMA (#3073 ) * Extend AK1 / BK1 support: - Add support for AK1 != BK1 - Add support for AK1, BK1 > 8 - Introduce KInner template parameter for pipelines when loading multiple tiles with one instruction * fix clang format	2025-11-11 07:38:15 -08:00
Thomas Ning	9f33b7cfd3	fix input range (#3188 )	2025-11-10 11:08:41 -08:00
linqunAMD	7b6ba8d5c2	[ck] Enable missing op for gfx11 and gfx12 (#3187 )	2025-11-10 10:58:20 -08:00
linqunAMD	e593a14ae1	[ck] correct memory size in grouped_gemm_multi_abd_xdl_fixed_nk_bias_bf16_i8 (#3168 ) b1 and b0 use the same layout, so the size of b1_tensors_device should be the same as that of b0_tensors_device	2025-11-10 10:58:08 -08:00
Manish Kumar	d5746dd120	[CK-Tile] Add gtests for compiler CI for faster testing (#3123 ) * Add gtests for compiler CI for faster testing * Add changes to have a custom target * Add a gtest suite for gemm kernel for running CI tests with compiler mode * Fix Clang error (EOL) * Removed compiler subfolder from CMake * Add gtest suite for gemm kernel * Disable failed tests * Fix build errors * Resolved PR comments * Update shape for persistent gemm kernel test * Separated types by H/W archs * Made changes to persistent types * Fix persistent build failure issue --------- Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-11-10 10:42:23 -08:00
Gino Lu	e31a7a4f29	fix MX bpreshuffle gemm B grid descriptor dimension error. (#3170 )	2025-11-06 19:42:39 -08:00
Xudong Yuan	d04eba4ae3	Ck moe mxfp4 blockm32 (#3098 ) * block_m = 32 * ck block_m = 32 * aiter/3rdparty/composable_kernel/include/ck/tensor_operation/gpu/block/blockwise_gemm_pipeline_xdlops_b_preshuffle_mx_moe_v3.hpp format * mxfp4_moe v1 pipe * update format --------- Co-authored-by: zhimding <zhimding@amd.com> Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com> Co-authored-by: felix <felix.li@amd.com>	2025-11-07 08:45:41 +08:00
JH-Leon-KIM-AMD	5f3cae3e28	[CK_BUILDER] ckb: add remaining fwd conv device ops (#3155 ) * Add device operation to conv signature. Use unions to hold conv layouts and device operations. * Add predicates for all device op instances. * Use the device op signature for validation. * Fix ckb CMakeLists.txt file for tests. * Fix building CK Builder instance traits after the introduction of direct load template parameter in CK. * Fix clang-formatting. * add device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk * Add full DL configurability with Option A implementation - Added 5 DL descriptor structs (39 configurable parameters) - Added 10 C++20 concepts for type-safe validation - Updated factory to read all parameters from descriptors - Updated test helper to populate all descriptors - All tests passing (13/13 including 3 new DL tests) * Add factory and test support for DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor - Add factory specialization for Large_Tensor device operation (conv_factory.hpp lines 1145-1265) - Add macro collision workaround using pragma push/pop (conv_factory.hpp lines 43-51) - Add test helper function run_test_DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor - Add builder test file test_ckb_conv_fwd_2d_large_tensor_fp16.cpp with 2 test cases - Update CMakeLists.txt to include new test file - Reuse existing ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle descriptor - Map all 42 template parameters identical to regular XDL CShuffle - All 15 builder tests passing including 2 new Large_Tensor tests Completes Task 350: All 4 forward convolution device operations now supported in CK Builder. * Update copyright headers to new format - Change copyright format to: Copyright (C) Advanced Micro Devices, Inc., or its affiliates. - Reorder headers: Copyright first, then SPDX-License-Identifier - Updated files: * experimental/builder/test/conv/test_ckb_conv_fwd_2d_dl_fp16.cpp * experimental/builder/test/conv/test_ckb_conv_fwd_2d_large_tensor_fp16.cpp * experimental/builder/include/ck_tile/builder/device_op_types.hpp * fix c++ 18 format * Fix clang-format-18 error in device_op_types.hpp --------- Co-authored-by: Ville Pietilä <ville.pietila@amd.com> Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>	2025-11-06 16:29:48 -08:00
Johannes Graner	76c4c12f59	Add .clangd and CMakeUserPresets.json to .gitignore (#3171 )	2025-11-06 15:07:39 -08:00
Adam Osewski	18e083003f	[CK_BUILDER] Convolution description (#3163 ) * Add DirectLoad tparam & clean up headers. * Add convolution traits. * Update inline documentation. * Add more convolution specialization and gemm padding types. * Add additional helper functions & more tests to conv traits. * Fix tests cmake file. * Add case insensitive string comparison * Fix function name overlapping with variable name. * Unify pipeline version and scheduler enums. * Fix includes. * Update test conv traits with unified enums. * Update concepts etc with update unified enum * Fix ckb conv fwd test - unified enum usage. * Dump changes. * Add ostream overloads for all enum classes. * Update detailed() function in ConvDescription * Fix handling union based conv direction. * Add test & update conv description. * Refine tree view. * Update copyrights * Fix merge artifacts * Update detailed tree conv description * Fix clang-format	2025-11-06 15:46:26 +01:00
Bartłomiej Kocot	2234ff830b	[CK TILE] Convolution remove magic values (#3160 ) * [CK TILE] Refactor Conv configs and Conv Elementwise * fix * [CK TILE] Convolution remove magic values * fix partitioner	2025-11-06 11:26:30 +01:00
joyeamd	12922120d2	add gfx11's barrier following SPG's reference (#3159 ) * add gfx11's barrier following SPG's reference * re-format the code * minor fix --------- Co-authored-by: ThomasNing <thomas.ning@amd.com>	2025-11-05 22:29:03 -08:00
Illia Silin	4533aa6dba	Fix compilation errors with clang22. (#3164 ) * resolve compilation issue with clang22 * add __extension__ for __COUNTER__ usage in ck_tile	2025-11-05 15:42:22 -08:00
Adam Osewski	b8527a9236	[CK_BUILDER] Convolution traits. (#3152 ) Added: 1. Convolution traits & unit tests 2. Update builder enumerators to have representation of Convolution Kernels properties. 3. Unified builder pipeline version & scheduler enumerators	2025-11-05 08:53:06 -08:00
andrew clark	3b076b0b74	Collecting redis stats (#3149 )	2025-11-04 18:55:11 -08:00
Illia Silin	930423ab3b	Initialize new variable to prevent c++17 compiler error (#3156 ) * initialize new variable to prevent c++17 compiler error * build for gfx90a using -std=c++17 flag	2025-11-04 18:54:14 -08:00
Vidyasagar Ananthan	31c019f589	Chunk Ctests so we don't run into large number of tests error (#3050 ) * Chunk Ctests so we don't run into large number of tests error * Addressing feedback from Copilot	2025-11-04 10:31:32 -08:00
Cong Ma	5abe4109e0	Introduces the new partitioner to implement the reduction StreamK kernel. (#3107 ) * Introduces the new partitioner to implement the reduction StreamK kernel * Add more doc text to functions * Add persistent-dp option to streamk example * Update example/ck_tile/40_streamk_gemm/README.md	2025-11-04 10:32:17 -07:00
Thomas Ning	13ba06f1e7	fix the blockscale 2d case (#3148 ) Co-authored-by: Aviral Goel <aviral.goel@amd.com>	2025-11-04 11:55:23 -05:00
John Shumway	0be0288f58	[CK_BUILDER] Update copyright messages. (#3150 ) * Update copyright messages. Copyright messages should no longer include a year. This PR updates all 38 source files to the new format. * Switch to (C) from unicode copyright symbol. The Unicode symbol in comments was causing compilation errors. therock-7.10	2025-11-04 15:35:16 +01:00
John Shumway	6dbee64886	[CK_BUILDER] Add backward weight instance traits for xdl cshuffle. (#3143 ) * Add backward weight instance traits for xdl cshuffle. To keep instance test file sizes reasonable, we start a new test_bwd_weight_instances_traits.cpp test file. * Fix copyright notices. * Remove (c) symbol, replace with (C). Having UTF-8 in source caused an error with code generation.	2025-11-04 15:34:00 +01:00
Bartłomiej Kocot	8681ced962	[CK TILE] Refactor Conv configs and Conv Elementwise (#3151 ) * [CK TILE] Refactor Conv configs and Conv Elementwise * fix	2025-11-04 15:04:53 +01:00
Bartłomiej Kocot	99f38e4d9b	[CK TILE] Refactor grouped conv fwd large tensor (#3144 )	2025-11-04 00:34:48 +01:00
Vidyasagar Ananthan	c7ded76cc7	Adding note on CMake convenience script (#3139 ) * Adding note on convenience script * Addressing feedback * Update README.md reword --------- Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>	2025-11-03 12:21:57 -08:00
Enrico Degregori	507d81c3af	Fix splitk preshuffle (#3137 ) * Fix splitK multiply_multiply_wp * Add tests for gemm_multiply_multiply_wp * Add tests for gemm_universal_preshuffle (KBatch = 1) * Add tests gemm_blockscale_wp * Fix splitk gemm universal preshuffle * Run new tests on arch supporting fp8 * Restore example * Fix strides profiler * Fix tests * Fix clang format * Finalize profiler preshuffle with tolerances * Minor improvements to splitk related changes * Address review comments: clang format and ckProfiler typo * Remove b_k_split_offset from SplitKBatchOffset struct	2025-11-03 11:59:01 -08:00
Thomas Ning	057b7d43b4	fix the compv4 and async pipeline when tile handler is 1 (#3141 )	2025-11-03 09:37:35 -08:00
Emily Martins	2ec57a8e70	Replace CK_TILE_PIPELINE macros with a common enum This change replaces pipeline macros like CK_TILE_PIPELINE_COMPUTE_V3, CK_TILE_PIPELINE_MEMORY, etc in the CK Tile examples with a common enum called GemmPipeline to reduce code duplication.	2025-11-03 09:35:05 -07:00
Michael Mcminn	afe1ff618d	Ud fix moe sorting gfx908 (#2720 ) * Adding a ds permute fallback for the gfx908 and older for row_newbcast:7 instruction * Better macro for selecting ROW_NEWBCAST * clang-format the update --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2025-11-03 07:31:31 -08:00

2622 Commits