linqunAMD
|
807f7510b5
|
Support Wave32 in CK_TILE - Part 1 (#2594)
* Support wave32/wave64 in CK_TILE - Part 1
* remove blocksize in kernel launch
* fix build error
* fix clang format
* fix clang format 2
* fix clang format 3
* fix fmha build error
* fix fmha build 2
* fix fmha build 3
* fix build error 4
* address review comment
* update change log
* replace KernelBlockSize with kBlockSize
* fix CI fail
* fix clang format
* address review comment and rebase code.
* fix universal test fail
---------
Co-authored-by: Lin, Qun <Quentin.Lin+amdeng@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: 9fcc1ee9fd]
|
2025-08-18 10:08:31 -07:00 |
|
Khushbu Agarwal
|
464c6f459e
|
[CK_Tile] Updating gpu timer when doing flush cache (#2593)
* Missed updating function names in example
* updating timer
* code cleanup
* addressing review comments
* updating tile_engine code
* addressing review comments
[ROCm/composable_kernel commit: 88d72178d6]
|
2025-07-31 16:43:33 -07:00 |
|
Illia Silin
|
3345f5f417
|
upgrade from clang-format-12 to clang-format-18 (#2568)
* upgrade to clang-format-18
* update to clang-format-18 in pre-commit-config
[ROCm/composable_kernel commit: 504b101da3]
|
2025-07-28 11:34:07 -07:00 |
|
jakpiase
|
bdb86fee78
|
[CK_TILE] Grouped Convolution Backward Weight Kernel (#2357)
* [CK TILE] Grouped Convolution Forward Kernel
* custom vector size
* fixes
* refactor
* resolved conflicts
* rebase fixes
* fixes
* tmp
* add working support for splitk
* minor fix
* fixes
* fixes
* minor fix
* small fix
* Split K and preprocessing fixes
---------
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
[ROCm/composable_kernel commit: 6681593864]
|
2025-07-24 10:41:35 +02:00 |
|
Bartłomiej Kocot
|
2567f5e538
|
[CK TILE] Grouped Convolution Forward Kernel (#2188)
* [CK TILE] Grouped Convolution Forward Kernel
* custom vector size
* fixes
* refactor
* rebase fixes
* fixes
* fixes
[ROCm/composable_kernel commit: cebdee4d9e]
|
2025-06-20 15:44:36 -07:00 |
|