composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-01 20:21:23 +00:00

Author	SHA1	Message	Date
Bartłomiej Kocot	c2d7931446	[CK TILE] Clear output buffers for grouped conv bwd (#3127)	2025-10-31 14:11:54 +01:00
Ville Pietilä	22d9f99942	Fixed building CK Tile grouped conv fwd bias clamp example. (#3124)	2025-10-30 18:17:48 +02:00
Ville Pietilä	121bf0e1f3	[CK_Tile] Merge multiple convolution groups into a single GEMM batch (#2986) * Fix compilation of the grouped conv examples. * Fix grouped conv bwd weight example output in CK Tile. * Add number of groups to merge to ck tile grouped gemm example. * Initial set of tests for TransformConvBwdWeightToGemm. * Added unit tests for TransformConvBwdWeightToGemm conv groups are merged. * WIP: Tensor transformations. * Add unit tests for coordinate transforms. * Fully working conv group merging for TransformConvBwdWeightToGemm. * WIP: Merged conv groups offset calculation. * Added unit tests for tensor view. * WIP: Merged conv groups epilogue. * Enable running multiple conv groups per batch. * Add tests for tile_distribution_encoding. * Change example to match optimally depthwise convolution with merged groups. * Add more tests for tensor view. * Integration test for reading diagonal blocks from grouped distributed tensor. * Improved integration test. * Improve test for accessing diagonal blocks. * Added integration test for cshuffle epilogue LDS tile distribution. * Add more logging. * Increase the max number of reported errors. * WIP: merged conv groups GEMM epilogue changes. * LDS to global memory copy. * Fix tile window size for c block. * Integration test for CShuffle epilogue. * Improved CShuffle test. * WIP: Separate epilogue for merged conv groups. * Tile example parameters changes to match depthwise conv. * Offset fixes. * Epilogue fixes. * Working baseline for depthwise convolution with merged conv groups. * Fix build. * Initial unit tests for tensor descriptor. * Add one more unit test for tensor view. * WIP: LDS to global mem transfer using CK tile tensor descriptor and tile distribution encoding. * Fully functional LDS to global mem transfer using tensor descriptor and tile distribution encoding. * Add more comments, disable debug code. * Remove debug and other dead code. * Code clean-up for bwd tensor transformations. * Enable running multiple GEMM batches of merged conv groups. * Add compile check for assumed row-major layout. * Fix strides in 1D conv to gemm transformation. * WIP: Simplify conv to gemm transformations and handle K > 1 and C > 1 cases. * Fix case k > 1 and c=1. * Remove debug code. * Make MPerGroup and NPerGroup template parameters. * Add additional check for non-supported c > 1 case. * WIP: Put back the generic tensor descriptors for convolutions. * Fix tensor descriptors. * Remove the obsolete template parameters. * Add more instances. * Fix bugs in merged conv groups tensor descriptors. * Fix tensor descriptors for merged conv groups when K > 1. * Remove debug output. * Remove dead code. * Fix merge conflicts. * Code clean-up. * Remove unused code. * Run clang-formatting. * Remove debug prints and obsolete tests. * Check that number of convolution groups is multiple of merged groups. * Fix build after removing obsolete functionality. * Remove obsolete enumeration. * Fix new unit projects. * Remove unnecessary includes. * Fix passing the number of merged groups. * Remove unrelated tests. * Fix IsSupportedArgument for bwd weight conv kernel. * Fix clang formatting. * Fix the bwd weight conv to gemm mapping for num merged groups > 1. * GEMM config for conv group merging. * Fix clang-formatting. * Remove obsolete comment. * Fix typos in comment strings. * Increase the max number of reported errors when testing against reference implementation. * Rename gemm_config to conv_config. * Rename GemmConfig to ConvConfig and move NumGroupsToMerge into ConvConfig. * Change num_groups_to_merge to a boolean flag in the ck tile grouped conv example. * Run clang-format. * Add number of merged groups into kernel name string. * Remove group merging flag from CK Tile grouped conv example.	2025-10-29 16:49:28 +02:00
Johannes Graner	5c1974065e	[CK_TILE] Add conv fwd + bias + clamp example (#3012) * Implement argument passing to element-wise functions for fwd convolution * Add files for fwd + bias + clamp example * Implement Bias * Implement Clamp * Elementwise function composition * Composition unit test * Implement fwd + bias + clamp example * Simplify argument passing and composition * elfunc -> bias_and_clamp * Rename function to specify example * Move element-wise function instantiation to kernel * Make bias a runtime tensor * No ugly namespace aliasing * Initialize element-wise function on host * Remove function initialization helper, simplify Compose initialization * Remove unintended LSP compatibility patch * Clean up includes and unused code * Switch names in cshuffle epilogue * Move CDElementwise to conv traits * Re-add required include * Initialize bias in same way as other tensors * Better type specification for ds pointer * Disable 1D convolution * Add warning for non-group-constant bias	2025-10-27 18:43:09 +01:00
Illia Silin	3348f01e6f	re-enable clang-format by default (#3030) * re-enable clang-format by default * fix clang format	2025-10-15 07:43:11 -07:00
jakpiase	6deaaa92cc	[CK_TILE] Switch into universal gemms for conv bwds (#2981) * switch into universal gemms for conv bwds * some fixes and support universal gemm in conv fwd * add reviewer comments	2025-10-14 16:09:16 +02:00
Johannes Graner	15fff74503	[CK Tile] Implement Invoker pattern for remaining grouped convolution examples (#2894) * Invoker for grouped_conv_fwd * Invoker for grouped_conv_bwd_data * Fix incorrect out layout identifier	2025-09-24 10:22:38 +02:00
jakpiase	624c46866e	[CK_TILE] Add conv bwd weight two stage support (#2855) * resolved conflicts * add conv bwd weight twostage * fix one file * fixes after review * fixes * fixes * Fix --------- Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>	2025-09-22 15:31:25 +02:00
linqunAMD	60d3e8f504	[CK_TILE] Fix example batched_gemm, grouped_gemm, gemm_multi_d, convolution on gfx11 & gfx12 (#2808) * [CK_TILE] Fix example batched_gemm, grouped_gemm, gemm_multi_d, convolution on gfx11 & gfx12 * fix gemm_splitk_two_stage * revert .pre-commit-config.yaml	2025-09-11 07:27:33 -07:00
Ville Pietilä	83f607e2a6	[CK Tile] Fix building grouped conv examples in CK Tile (#2777) * Fix compilation of the grouped conv examples. * Fix grouped conv bwd weight example output in CK Tile.	2025-09-05 09:14:21 +03:00
rahjain-amd	4d041837ad	Add json dump support to output details from CK/CKTile Examples. (#2551) * Adding RapidJson Library * Adding Json Dumps in all CK_Tile Examples Not verified yet * Adding json to cktile Batched Transpose * adding json dumps to layernorm2d_fwd * Adding json dump to flatmm_basic * Adding RapidJson Library * Adding Json Dumps in all CK_Tile Examples Not verified yet * Adding json to cktile Batched Transpose * adding json dumps to layernorm2d_fwd * Adding json dump to flatmm_basic * Adding json in 03_gemm * Add json dump to 16_batched_gemm * Add json dump to gemm_multi_d_fp16 * Add json dump to grouped_gemm * fix fmha_bwd/fwd * Fix clang-format errors exclude include/rapidjson in jenkins as it's a third-party library * Separating function and definition. * Update Documentation of 03_gemm * Refactoring as per code review * Disable fp8 instances on unsupported targets (#2592) * Restrict building of gemm_universal_preshuffle_f8 instances to specific targets in CMakeLists.txt * Add condition to skip gemm_xdl_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt * Add conditions to skip unsupported targets for gemm_universal_preshuffle_f8 and gemm_xdl_universal_preshuffle_f8 instances in CMakeLists.txt * Refine conditions to exclude gemm_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt --------- Co-authored-by: AviralGoelAMD <aviralgoel@amd.com> * fix clang format * remove duplicate lines of code from library/src/tensor_operation_instance/gpu/CMakeLists.txt * Fixing Readme and unifying jsondumps * adding moe_smoothquant * adding fused_moe * Fixing Readme for batched_gemm * Fixing Readme for grouped_gemm * adding flatmm * adding gemm_multi_d_fp16 * adding elementwise * adding File name when json is dumped * Fixing Reduce after merge * adding batched_transpose * Adding Warptile in Gemm * Fixing Clang Format --------- Co-authored-by: Aviral Goel <aviral.goel@amd.com> Co-authored-by: AviralGoelAMD <aviralgoel@amd.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2025-09-02 23:31:29 -07:00
linqunAMD	4a49dac7c6	[Regression] Fix CK_TILE build error in grouped_convolution, copy_basic and fused_moegemm_kernel (#2728) * fix copy basic build error * fix other ck tile test build error	2025-08-28 20:30:30 +08:00
Cong Ma	cd53e2e57e	[CK TILE GEMM] Fix a merge conflict (#2753) * Fixed a merge conflict in `245467f3` * Format the code	2025-08-27 11:08:09 -07:00
Bartłomiej Kocot	4212bbc170	[CK Tile] Grouped convolution backward data (#2652) * base working version for single grouped conv bwd data * Fix 2d descriptor * fix groups * Add 3d support * fixes * fixes * fixes --------- Co-authored-by: Jakub Piasecki <jakpia21@gmail.com>	2025-08-20 05:29:57 -07:00
linqunAMD	9fcc1ee9fd	Support Wave32 in CK_TILE - Part 1 (#2594) * Support wave32/wave64 in CK_TILE - Part 1 * remove blocksize in kernel launch * fix build error * fix clang format * fix clang format 2 * fix clang format 3 * fix fmha build error * fix fmha build 2 * fix fmha build 3 * fix build error 4 * address review comment * update change log * replace KernelBlockSize with kBlockSize * fix CI fail * fix clang format * address review comment and rebase code. * fix universal test fail --------- Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2025-08-18 10:08:31 -07:00
Khushbu Agarwal	88d72178d6	[CK_Tile] Updating gpu timer when doing flush cache (#2593) * Missed updating function names in example * updating timer * code cleanup * addressing review comments * updating tile_engine code * addressing review comments	2025-07-31 16:43:33 -07:00
Illia Silin	504b101da3	upgrade from clang-format-12 to clang-format-18 (#2568) * upgrade to clang-format-18 * update to clang-format-18 in pre-commit-config	2025-07-28 11:34:07 -07:00
jakpiase	6681593864	[CK_TILE] Grouped Convolution Backward Weight Kernel (#2357) * [CK TILE] Grouped Convolution Forward Kernel * custom vector size * fixes * refactor * resolved conflicts * rebase fixes * fixes * tmp * add working support for splitk * minor fix * fixes * fixes * minor fix * small fix * Split K and preprocessing fixes --------- Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>	2025-07-24 10:41:35 +02:00
Bartłomiej Kocot	cebdee4d9e	[CK TILE] Grouped Convolution Forward Kernel (#2188) * [CK TILE] Grouped Convolution Forward Kernel * custom vector size * fixes * refactor * rebase fixes * fixes * fixes	2025-06-20 15:44:36 -07:00

19 Commits