JH-Leon-KIM-AMD
9a5d1ea791
[rocm-libraries] ROCm/rocm-libraries#6208 (commit 33424f6)
[CK] Enable grouped conv bwd data to match non-grouped perf via NoShuffle + packed descriptors (#6208)
## Motivation
Improve performance of grouped convolution backward-data kernels to
match non-grouped kernel performance for G=1 cases.
## Technical Details
- Add NoShuffle epilogue path (direct VGPR→Global writes) by setting
`CDEBlockTransferScalarPerVector_NPerBlock = 1`
- Add nongrouped-match instances with optimized BBlockTransfer
parameters for better thread utilization
- Add packed (flat) descriptor path for G=1 2D convolutions, using
simpler tensor descriptors with fewer transform layers to reduce address
computation overhead in the GEMM main loop
- Cherry-pick PR #6090 for fair benchmarking (cache flush, include dX
zeroing cost)
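To illustrate why a packed (flat) descriptor is cheaper to index, the sketch below compares a generic strided offset with the flat form available when G=1 and the tensor is contiguous. It is a minimal sketch, not CK's tensor descriptor API: `Dims`, `strided_offset`, and `packed_offset` are hypothetical names, and the N, H, W, C ordering is an assumption chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical illustration only: these types and functions are not part of CK.
struct Dims
{
    std::int64_t N, H, W, C;
};

// General path: every coordinate, including the group index g, is multiplied
// by an independently stored stride.
constexpr std::int64_t strided_offset(std::int64_t g, std::int64_t n, std::int64_t h,
                                      std::int64_t w, std::int64_t c,
                                      std::int64_t sG, std::int64_t sN, std::int64_t sH,
                                      std::int64_t sW, std::int64_t sC)
{
    return g * sG + n * sN + h * sH + w * sW + c * sC;
}

// Packed path (assumes G == 1 and a contiguous N-H-W-C tensor): strides are
// implied by the lengths, so the offset has no group term and no stored strides.
constexpr std::int64_t packed_offset(const Dims& d, std::int64_t n, std::int64_t h,
                                     std::int64_t w, std::int64_t c)
{
    return ((n * d.H + h) * d.W + w) * d.C + c;
}

// For a contiguous tensor with G == 1, both paths yield the same element offset.
constexpr Dims kDims{2, 4, 5, 8};
static_assert(strided_offset(0, 1, 2, 3, 4,
                             kDims.N * kDims.H * kDims.W * kDims.C, // sG
                             kDims.H * kDims.W * kDims.C,           // sN
                             kDims.W * kDims.C,                     // sH
                             kDims.C,                               // sW
                             1)                                     // sC
                  == packed_offset(kDims, 1, 2, 3, 4),
              "packed and strided offsets must agree for contiguous G == 1 tensors");
```

The packed form needs only the tensor lengths, which is the kind of reduction in per-iteration address arithmetic the flat descriptor path is aiming for.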
## Test Plan
- Benchmark grouped vs non-grouped kernels on MI300X (589 shapes, BF16)
- Verify correctness with existing conv bwd data tests
## Test Result
| Metric | Before | After |
|--------|--------|-------|
| Mean ratio (grouped/nongrouped) | 1.159 | **1.028** |
| Median ratio | 1.142 | **1.026** |
| Cases within 2% | 26 (4.4%) | **186 (31.8%)** |
| Cases >20% slower | 188 (32%) | **2 (0.3%)** |
With NoShuffle + nongrouped-match instances, grouped kernels are on average
**~2.8% slower** than non-grouped kernels (down from ~16%).
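The summary metrics above can be derived from per-shape time ratios roughly as sketched below. This is an assumed reconstruction of the arithmetic, not the benchmark script used for this PR: the function names, the reading of "within 2%" as |ratio - 1| <= 0.02, and the sample ratios are all made up for illustration.

```cpp
#include <cstddef>

// ratio = grouped_time / nongrouped_time for one shape; 1.0 means parity.
// All names and thresholds here are illustrative assumptions.
constexpr double mean_ratio(const double* r, std::size_t n)
{
    double sum = 0.0;
    for(std::size_t i = 0; i < n; ++i)
        sum += r[i];
    return sum / static_cast<double>(n);
}

// Number of shapes whose ratio lies within +/- tol of parity.
constexpr std::size_t count_within(const double* r, std::size_t n, double tol)
{
    std::size_t count = 0;
    for(std::size_t i = 0; i < n; ++i)
    {
        const double diff = r[i] > 1.0 ? r[i] - 1.0 : 1.0 - r[i];
        if(diff <= tol)
            ++count;
    }
    return count;
}

// Number of shapes where the grouped kernel is more than `frac` slower.
constexpr std::size_t count_slower_than(const double* r, std::size_t n, double frac)
{
    std::size_t count = 0;
    for(std::size_t i = 0; i < n; ++i)
        if(r[i] > 1.0 + frac)
            ++count;
    return count;
}

// Made-up sample ratios, not measured data.
constexpr double kSampleRatios[] = {1.00, 1.01, 1.05, 1.25};
static_assert(count_within(kSampleRatios, 4, 0.02) == 2, "two samples are within 2%");
static_assert(count_slower_than(kSampleRatios, 4, 0.20) == 1, "one sample is >20% slower");
```

Under this reading, a mean ratio of 1.028 corresponds to the ~2.8% average gap quoted above.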
## Submission Checklist
- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
---------
Co-authored-by: root <root@ctr-cx64-mi300x-4.amd.com>
Co-authored-by: root <root@ctr-cx71-mi300x-01.amd.com>
Co-authored-by: root <root@ctr-cx63-mi300x-21.amd.com>
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>
Co-authored-by: root <root@gt-ccs-aus-h17-18.cs-aus.dcgpu>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-18 06:49:50 -07:00