jakpiase
72a1a1ca59
[CK_TILE] Switch into universal gemms for conv bwds ( #2981 )
...
* switch into universal gemms for conv bwds
* some fixes and support universal gemm in conv fwd
* add reviewer comments
[ROCm/composable_kernel commit: 6deaaa92cc ]
2025-10-14 16:09:16 +02:00
Johannes Graner
beb87960ad
[CK Tile] Implement Invoker pattern for remaining grouped convolution examples ( #2894 )
...
* Invoker for grouped_conv_fwd
* Invoker for grouped_conv_bwd_data
* Fix incorrect out layout identifier
[ROCm/composable_kernel commit: 15fff74503 ]
2025-09-24 10:22:38 +02:00
jakpiase
ccd54f7c92
[CK_TILE] Add conv bwd weight two stage support ( #2855 )
...
* resolved conflicts
* add conv bwd weight twostage
* fix one file
* fixes after review
* fixes
* fixes
* Fix
---------
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com >
[ROCm/composable_kernel commit: 624c46866e ]
2025-09-22 15:31:25 +02:00
linqunAMD
a2bbb7bff0
[CK_TILE] Fix example batched_gemm, grouped_gemm, gemm_multi_d, convolution on gfx11 & gfx12 ( #2808 )
...
* [CK_TILE] Fix example batched_gemm, grouped_gemm, gemm_multi_d, convolution on gfx11 & gfx12
* fix gemm_splitk_two_stage
* revert .pre-commit-config.yaml
[ROCm/composable_kernel commit: 60d3e8f504 ]
2025-09-11 07:27:33 -07:00
Ville Pietilä
40361182ca
[CK Tile] Fix building grouped conv examples in CK Tile ( #2777 )
...
* Fix compilation of the grouped conv examples.
* Fix grouped conv bwd weight example output in CK Tile.
[ROCm/composable_kernel commit: 83f607e2a6 ]
2025-09-05 09:14:21 +03:00
rahjain-amd
7674eb6416
Add json dump support to output details from CK/CKTile Examples. ( #2551 )
...
* Adding RapidJson Library
* Adding Json Dumps in all CK_Tile Examples
Not verified yet
* Adding json to cktile Batched Transpose
* adding json dumps to layernorm2d_fwd
* Adding json dump to flatmm_basic
* Adding json in 03_gemm
* Add json dump to 16_batched_gemm
* Add json dump to gemm_multi_d_fp16
* Add json dump to grouped_gemm
* fix fmha_bwd/fwd
* Fix clang-format errors
exclude include/rapidjson in jenkins as it's a third-party library
* Separating function and definition.
* Update Documentation of 03_gemm
* Refactoring as per code review
* Disable fp8 instances on unsupported targets (#2592)
* Restrict building of gemm_universal_preshuffle_f8 instances to specific targets in CMakeLists.txt
* Add condition to skip gemm_xdl_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt
* Add conditions to skip unsupported targets for gemm_universal_preshuffle_f8 and gemm_xdl_universal_preshuffle_f8 instances in CMakeLists.txt
* Refine conditions to exclude gemm_universal_preshuffle_f8 instances for unsupported targets in CMakeLists.txt
---------
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com >
* fix clang format
* remove duplicate lines of code from library/src/tensor_operation_instance/gpu/CMakeLists.txt
* Fixing Readme and unifying jsondumps
* adding moe_smoothquant
* adding fused_moe
* Fixing Readme for batched_gemm
* Fixing Readme for grouped_gemm
* adding flatmm
* adding gemm_multi_d_fp16
* adding elementwise
* adding File name when json is dumped
* Fixing Reduce after merge
* adding batched_transpose
* Adding Warptile in Gemm
* Fixing Clang Format
---------
Co-authored-by: Aviral Goel <aviral.goel@amd.com >
Co-authored-by: AviralGoelAMD <aviralgoel@amd.com >
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com >
[ROCm/composable_kernel commit: 4d041837ad ]
2025-09-02 23:31:29 -07:00
linqunAMD
cd6d731322
[Regression] Fix CK_TILE build error in grouped_convolution, copy_basic and fused_moegemm_kernel ( #2728 )
...
* fix copy basic build error
* fix other ck tile test build error
[ROCm/composable_kernel commit: 4a49dac7c6 ]
2025-08-28 20:30:30 +08:00
Cong Ma
73657622b1
[CK TILE GEMM] Fix a merge conflict ( #2753 )
...
* Fixed a merge conflict in 43e7d549
* Format the code
[ROCm/composable_kernel commit: cd53e2e57e ]
2025-08-27 11:08:09 -07:00
Bartłomiej Kocot
3e8a6dfb9c
[CK Tile] Grouped convolution backward data ( #2652 )
...
* base working version for single grouped conv bwd data
* Fix 2d descriptor
* fix groups
* Add 3d support
* fixes
* fixes
* fixes
---------
Co-authored-by: Jakub Piasecki <jakpia21@gmail.com >
[ROCm/composable_kernel commit: 4212bbc170 ]
2025-08-20 05:29:57 -07:00
linqunAMD
807f7510b5
Support Wave32 in CK_TILE - Part 1 ( #2594 )
...
* Support wave32/wave64 in CK_TILE - Part 1
* remove blocksize in kernel launch
* fix build error
* fix clang format
* fix clang format 2
* fix clang format 3
* fix fmha build error
* fix fmha build 2
* fix fmha build 3
* fix build error 4
* address review comment
* update change log
* replace KernelBlockSize with kBlockSize
* fix CI fail
* fix clang format
* address review comment and rebase code.
* fix universal test fail
---------
Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com >
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com >
[ROCm/composable_kernel commit: 9fcc1ee9fd ]
2025-08-18 10:08:31 -07:00
Khushbu Agarwal
464c6f459e
[CK_Tile] Updating gpu timer when doing flush cache ( #2593 )
...
* Missed updating function names in example
* updating timer
* code cleanup
* addressing review comments
* updating tile_engine code
* addressing review comments
[ROCm/composable_kernel commit: 88d72178d6 ]
2025-07-31 16:43:33 -07:00
Illia Silin
3345f5f417
upgrade from clang-format-12 to clang-format-18 ( #2568 )
...
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
[ROCm/composable_kernel commit: 504b101da3 ]
2025-07-28 11:34:07 -07:00
jakpiase
bdb86fee78
[CK_TILE] Grouped Convolution Backward Weight Kernel ( #2357 )
...
* [CK TILE] Grouped Convolution Forward Kernel
* custom vector size
* fixes
* refactor
* resolved conflicts
* rebase fixes
* fixes
* tmp
* add working support for splitk
* minor fix
* fixes
* fixes
* minor fix
* small fix
* Split K and preprocessing fixes
---------
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com >
[ROCm/composable_kernel commit: 6681593864 ]
2025-07-24 10:41:35 +02:00
Bartłomiej Kocot
2567f5e538
[CK TILE] Grouped Convolution Forward Kernel ( #2188 )
...
* [CK TILE] Grouped Convolution Forward Kernel
* custom vector size
* fixes
* refactor
* rebase fixes
* fixes
* fixes
[ROCm/composable_kernel commit: cebdee4d9e ]
2025-06-20 15:44:36 -07:00