composable_kernel/include
jakpiase ad412c26f3 [rocm-libraries] ROCm/rocm-libraries#6624 (commit 47d0162)
[CK_TILE] Grouped Convolution Backward Data Direct Load (#6624)

## Proposed changes

Add Grouped Convolution Backward Data with Direct Load to the
DeviceGroupedConvBwdDataMultipleD_Xdl_CShuffleV3 device implementation.
This enables direct global memory loading (bypassing LDS) for the
backward data convolution path on gfx950, following the same pattern
already used for backward weight and forward convolution.

Direct load improves backward data convolution performance by avoiding
LDS round-trips for certain configurations on gfx950, which supports a
wider range of instructions. Correctness is currently checked only at
the point of use; a standalone unit test should be added in the future.
2026-04-23 09:17:50 +00:00