Files
composable_kernel/experimental
jakpiase 9cf49cd322 [rocm-libraries] ROCm/rocm-libraries#7465 (commit 81f1cf0)
[CK TILE] Increase default kPerXdl for grouped convolution instances  (#7465)

## Summary
Increases the default `kPerXdl` used in CK Tile grouped convolution
instance generation for forward, backward-data, and backward-weight
operations.
### Changes in `generate_instances.py`
- **Larger default `kPerXdl` for all fp16/bf16 tile sizes**:
`get_k_mfma()` now
returns `32` for `m/nPerXdl = 16` and `16` for `m/nPerXdl = 32`.
- **Cap `kPerXdl` to `kPerBlock`**: All three parsers
(`parse_fwd_instances`, `parse_bwd_weight_instances`,
`parse_bwd_data_instances`) now clamp the computed value with `min(...,
k_per_block)` to prevent generating invalid instances where `kPerXdl >
kPerBlock`.
### Expected impact
Higher `kPerXdl` increases the number of MFMA instructions issued per
warp per inner-loop iteration, improving arithmetic intensity and
reducing pipeline stall overhead for memory-bound shapes.
2026-05-21 12:05:09 +02:00
..