[CK_Tile] Merge multiple convolution groups into a single GEMM batch (#2986)

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-21 05:19:20 +00:00

* Fix compilation of the grouped conv examples.

* Fix grouped conv bwd weight example output in CK Tile.

* Add number of groups to merge to ck tile grouped gemm example.

* Initial set of tests for TransformConvBwdWeightToGemm.

* Added unit tests for TransformConvBwdWeightToGemm when conv groups are merged.

* WIP: Tensor transformations.

* Add unit tests for coordinate transforms.

* Fully working conv group merging for TransformConvBwdWeightToGemm.

* WIP: Merged conv groups offset calculation.

* Add unit tests for tensor view.

* WIP: Merged conv groups epilogue.

* Enable running multiple conv groups per batch.

* Add tests for tile_distribution_encoding.

* Change example to match optimally depthwise convolution with merged groups.

* Add more tests for tensor view.

* Integration test for reading diagonal blocks from grouped distributed tensor.

* Improved integration test.

* Improve test for accessing diagonal blocks.

* Added integration test for cshuffle epilogue LDS tile distribution.

* Add more logging.

* Increase the max number of reported errors.

* WIP: merged conv groups GEMM epilogue changes.

* LDS to global memory copy.

* Fix tile window size for c block.

* Integration test for CShuffle epilogue.

* Improved CShuffle test.

* WIP: Separate epilogue for merged conv groups.

* Tile example parameters changes to match depthwise conv.

* Offset fixes.

* Epilogue fixes.

* Working baseline for depthwise convolution with merged conv groups.

* Fix build.

* Initial unit tests for tensor descriptor.

* Add one more unit test for tensor view.

* WIP: LDS to global mem transfer using CK tile tensor descriptor and tile distribution encoding.

* Fully functional LDS to global mem transfer using tensor descriptor and tile distribution encoding.

* Add more comments, disable debug code.

* Remove debug and other dead code.

* Code clean-up for bwd tensor transformations.

* Enable running multiple GEMM batches of merged conv groups.

* Add compile check for assumed row-major layout.

* Fix strides in 1D conv to gemm transformation.

* WIP: Simplify conv to gemm transformations and handle K > 1 and C > 1 cases.

* Fix case k > 1 and c=1.

* Remove debug code.

* Make MPerGroup and NPerGroup template parameters.

* Add additional check for non-supported c > 1 case.

* WIP: Put back the generic tensor descriptors for convolutions.

* Fix tensor descriptors.

* Remove the obsolete template parameters.

* Add more instances.

* Fix bugs in merged conv groups tensor descriptors.

* Fix tensor descriptors for merged conv groups when K > 1.

* Remove debug output.

* Remove dead code.

* Fix merge conflicts.

* Code clean-up.

* Remove unused code.

* Run clang-formatting.

* Remove debug prints and obsolete tests.

* Check that number of convolution groups is multiple of merged groups.

* Fix build after removing obsolete functionality.

* Remove obsolete enumeration.

* Fix new unit projects.

* Remove unnecessary includes.

* Fix passing the number of merged groups.

* Remove unrelated tests.

* Fix IsSupportedArgument for bwd weight conv kernel.

* Fix clang formatting.

* Fix the bwd weight conv to gemm mapping for num merged groups > 1.

* GEMM config for conv group merging.

* Fix clang-formatting.

* Remove obsolete comment.

* Fix typos in comment strings.

* Increase the max number of reported errors when testing against reference implementation.

* Rename gemm_config to conv_config.

* Rename GemmConfig to ConvConfig and move NumGroupsToMerge into ConvConfig.

* Change num_groups_to_merge to a boolean flag in the ck tile grouped conv example.

* Run clang-format.

* Add number of merged groups into kernel name string.

* Remove group merging flag from CK Tile grouped conv example.

[ROCm/composable_kernel commit: 121bf0e1f3]
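
The bullets above mention checking that the number of convolution groups is a multiple of the number of merged groups, and running several conv groups per GEMM batch. The following is a minimal sketch of that bookkeeping only. The names `num_gemm_batches`, `locate_group`, and `GroupSlot` are hypothetical and are not taken from composable_kernel, and the assumption that consecutive groups are packed into the same batch is ours; the actual kernel may differ.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helpers for illustration; not part of composable_kernel.

// Number of GEMM batches needed when `groups_to_merge` convolution groups
// are packed into each batch. Throws if the group count is not a multiple
// of the merge factor, mirroring the divisibility check named in the log.
inline std::int64_t num_gemm_batches(std::int64_t num_conv_groups,
                                     std::int64_t groups_to_merge)
{
    if(groups_to_merge <= 0 || num_conv_groups % groups_to_merge != 0)
    {
        throw std::invalid_argument(
            "number of conv groups must be a multiple of groups_to_merge");
    }
    return num_conv_groups / groups_to_merge;
}

// Position of one convolution group after merging (assumed layout:
// consecutive groups share a batch).
struct GroupSlot
{
    std::int64_t batch; // index of the GEMM batch that holds the group
    std::int64_t slot;  // index of the group inside that batch
};

inline GroupSlot locate_group(std::int64_t group, std::int64_t groups_to_merge)
{
    return GroupSlot{group / groups_to_merge, group % groups_to_merge};
}
```

With 32 groups and a merge factor of 8, this sketch yields 4 batches, and group 10 lands in batch 1 at slot 2. A group count of 30 with the same merge factor is rejected.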

Ville Pietilä

2025-10-29 16:49:28 +02:00

committed by

GitHub

parent 332a0e1696

commit abccb649d1

17 changed files with 755 additions and 269 deletions

									
include/ck_tile/core/algorithm/static_encoding_pattern.hpp
									
				@@ -25,7 +25,7 @@

				 * (3) number of iterations to cover the entire Y axis.

				 * The raked here represents how data is partitioned across different processing granularity.

				- * It represents howe we are going to access the data in thread, warp, or blocked in contiguous

				+ * It represents how we are going to access the data in thread, warp, or blocked in contiguous

				 region.

				 * From below, the qualifier for 'raked' is the part of warp/thread hierarchy

				 * in the split of Y tile dimension where the iteration happens,

				@@ -101,7 +101,7 @@ enum struct tile_distribution_pattern

				     * @brief Block raked pattern - aka linear.

				     *

				     */

				-    block_raked,

				+    block_raked

				};

				struct tile_distribution_encoding_pattern

				@@ -144,7 +144,6 @@ struct tile_distribution_encoding_pattern_2d<BlockSize,

				                                             NumWaveGroups>

				    : public tile_distribution_encoding_pattern

				{

				    // TODO: make pattern where below condition does not need to hold - GGemmMultiDSplitk!

				    static_assert(XPerTile % VecSize == 0, "XPerTile must be a multiple of VecSize!");

				    static constexpr index_t warp_size  = get_warp_size();
