[CK][CK Tile] Conv Bwd Data flush cache and profiling improvements (#6090)

## Motivation

Improve accuracy of conv bwd data perf measurements

## Technical Details
- enable flush cache
- for grouped conv we zero conv input(gemm output) inside device op, so
we also include this in time measurement
- for non-grouped conv we zero conv input(gemm output) outside device op
(in profile_conv_bwd_data_impl.hpp) so it is not included.
- In this pr I changed it to include zeroing if time_kernel/flush cache
is enabled so at now you should have more fair comparison. I changed it
only for time_kernel/flush_cache because MIOpen run own zeroing for
non-grouped solvers.

## Test Plan

test_grouped_conv_bwd_data_*

## Test Result

CI pending

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
This commit is contained in:
Bartłomiej Kocot
2026-04-04 02:22:22 +02:00
committed by GitHub
parent 9478c6c69f
commit 4112e08d0c
11 changed files with 362 additions and 144 deletions

View File

@@ -89,7 +89,8 @@ int call_profiler(const ckt::Args<SIGNATURE>& args,
0 /*log_level*/,
5 /*cold_iters*/,
50 /*nrepeat_*/,
true /*is_gpu_timer_*/});
true /*is_gpu_timer_*/,
time_kernel /*flush_cache*/});
if(time_kernel)
{
std::cout << "\nBest configuration parameters:" << "\n\tname: " << op_name << " (instance "