JH-Leon-KIM-AMD
fb3b4011d7
[CK Tile] Grouped conv fwd splitn support (#2776)
## What's New
Add Split-N support to grouped convolution forward so that tensors larger than 2 GB are handled by splitting the batch (N) dimension.
## Bug Fix
Fixed a 32-bit integer overflow that caused crashes with 6 or more splits:
- Use `long_index_t` for batch offset calculations
- Remove redundant GemmM initialization in constructors
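The overflow fixed above can be illustrated with a minimal sketch. `index_t` and `long_index_t` here are local aliases assumed to match CK's 32-bit and 64-bit index types, and `batch_offset` and `fits_in_index_t` are hypothetical helpers, not CK Tile functions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: local stand-ins for CK's 32-bit index_t and 64-bit long_index_t.
using index_t      = std::int32_t;
using long_index_t = std::int64_t;

// Buggy pattern (not executed here, since signed overflow is undefined behavior):
//     index_t offset = batch_idx * batch_stride;
// Fixed pattern: widen one operand first so the multiply happens in 64 bits.
// Hypothetical helper, not a CK Tile function.
long_index_t batch_offset(index_t batch_idx, index_t batch_stride)
{
    assert(batch_idx >= 0 && batch_stride >= 0);
    return static_cast<long_index_t>(batch_idx) * batch_stride;
}

// True if the offset could have been stored in a 32-bit index_t without overflow.
bool fits_in_index_t(long_index_t offset)
{
    return offset <= std::numeric_limits<index_t>::max();
}
```

With a hypothetical per-image stride of 2^20 elements, batch index 2047 still fits in 32 bits, but batch index 2048 produces an offset of 2^31, one past the `int32_t` maximum. Widening before the multiply keeps the product exact.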
## How It Works
- Automatically splits the batch dimension when a tensor exceeds 2 GB
- Uses the grid z-dimension (grid.z) to process the splits in parallel
- Each split independently processes its own subset of the batch
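The Split-N scheme described above can be sketched as follows. This is a minimal, hypothetical planner, not the CK Tile kernel's code. It assumes every split gets the same number of batches except possibly the last, that the per-split limit is 2^31 bytes, and that the split index comes from grid.z (`blockIdx.z` on the device). `plan_split_n`, `split_range`, and `split_offset` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using index_t      = std::int32_t; // assumption: stand-in for CK's index_t
using long_index_t = std::int64_t; // assumption: stand-in for CK's long_index_t

// Hypothetical plan; the real kernel's split policy may differ.
struct SplitNPlan
{
    index_t n_per_split; // batches per split (the last split may get fewer)
    index_t num_splits;  // number of splits, launched along grid.z
};

// Cap each split's tensor at max_bytes (2 GB by default).
SplitNPlan plan_split_n(index_t n, long_index_t bytes_per_image,
                        long_index_t max_bytes = (long_index_t{1} << 31))
{
    assert(n > 0 && bytes_per_image > 0);
    // How many whole images fit under the limit (at least one).
    const long_index_t fit = std::max<long_index_t>(1, max_bytes / bytes_per_image);
    const index_t n_per_split = static_cast<index_t>(std::min<long_index_t>(n, fit));
    const index_t num_splits  = (n + n_per_split - 1) / n_per_split; // ceil(n / n_per_split)
    return {n_per_split, num_splits};
}

// Half-open batch range [begin, end) owned by split split_z (e.g. blockIdx.z).
struct BatchRange
{
    index_t begin;
    index_t end;
};

BatchRange split_range(const SplitNPlan& plan, index_t n, index_t split_z)
{
    const index_t begin = split_z * plan.n_per_split;
    const index_t end   = std::min(n, begin + plan.n_per_split);
    return {begin, end};
}

// Element offset of a split's first image, computed in 64-bit arithmetic.
long_index_t split_offset(index_t begin, long_index_t elems_per_image)
{
    return static_cast<long_index_t>(begin) * elems_per_image;
}
```

For example, with a hypothetical fp32 image of 2^20 elements (4 MiB), at most 512 images fit in 2 GB. So n = 3000 yields 6 splits, and the last split covers batches 2560 to 2999. Its starting offset of 2,684,354,560 elements only fits in a 64-bit index. The real kernel's policy may differ, so this sketch is not expected to reproduce the split counts listed under Testing.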
## Testing
Verified with tile_example_grouped_conv_fwd:
- n=3000 (6 splits) ✓
- n=3500 (7 splits) ✓
- n=10480 (40 splits) ✓
[ROCm/composable_kernel commit: 804065a36b]
2025-09-16 16:56:11 +03:00
linqunAMD
a2bbb7bff0
[CK_TILE] Fix example batched_gemm, grouped_gemm, gemm_multi_d, convolution on gfx11 & gfx12 (#2808)
* [CK_TILE] Fix example batched_gemm, grouped_gemm, gemm_multi_d, convolution on gfx11 & gfx12
* fix gemm_splitk_two_stage
* revert .pre-commit-config.yaml
[ROCm/composable_kernel commit: 60d3e8f504]
2025-09-11 07:27:33 -07:00
Ville Pietilä
40361182ca
[CK Tile] Fix building grouped conv examples in CK Tile (#2777)
* Fix compilation of the grouped conv examples.
* Fix grouped conv bwd weight example output in CK Tile.
[ROCm/composable_kernel commit: 83f607e2a6]
2025-09-05 09:14:21 +03:00
linqunAMD
cd6d731322
[Regression] Fix CK_TILE build error in grouped_convolution, copy_basic and fused_moegemm_kernel (#2728)
* fix copy basic build error
* fix other ck tile test build error
[ROCm/composable_kernel commit: 4a49dac7c6]
2025-08-28 20:30:30 +08:00
Bartłomiej Kocot
3e8a6dfb9c
[CK Tile] Grouped convolution backward data (#2652)
* base working version for single grouped conv bwd data
* Fix 2d descriptor
* fix groups
* Add 3d support
* fixes
* fixes
* fixes
---------
Co-authored-by: Jakub Piasecki <jakpia21@gmail.com>
[ROCm/composable_kernel commit: 4212bbc170]
2025-08-20 05:29:57 -07:00
linqunAMD
807f7510b5
Support Wave32 in CK_TILE - Part 1 (#2594)
* Support wave32/wave64 in CK_TILE - Part 1
* remove blocksize in kernel launch
* fix build error
* fix clang format
* fix clang format 2
* fix clang format 3
* fix fmha build error
* fix fmha build 2
* fix fmha build 3
* fix build error 4
* address review comment
* update change log
* replace KernelBlockSize with kBlockSize
* fix CI fail
* fix clang format
* address review comment and rebase code.
* fix universal test fail
---------
Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: 9fcc1ee9fd]
2025-08-18 10:08:31 -07:00
Illia Silin
3345f5f417
upgrade from clang-format-12 to clang-format-18 (#2568)
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
[ROCm/composable_kernel commit: 504b101da3]
2025-07-28 11:34:07 -07:00
jakpiase
bdb86fee78
[CK_TILE] Grouped Convolution Backward Weight Kernel (#2357)
* [CK TILE] Grouped Convolution Forward Kernel
* custom vector size
* fixes
* refactor
* resolved conflicts
* rebase fixes
* fixes
* tmp
* add working support for splitk
* minor fix
* fixes
* fixes
* minor fix
* small fix
* Split K and preprocessing fixes
---------
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
[ROCm/composable_kernel commit: 6681593864]
2025-07-24 10:41:35 +02:00
Bartłomiej Kocot
2567f5e538
[CK TILE] Grouped Convolution Forward Kernel (#2188)
* [CK TILE] Grouped Convolution Forward Kernel
* custom vector size
* fixes
* refactor
* rebase fixes
* fixes
* fixes
[ROCm/composable_kernel commit: cebdee4d9e]
2025-06-20 15:44:36 -07:00