[rocm-libraries] ROCm/rocm-libraries#4271 (commit 6fce58e)

[Conv] Add NumGroupsToMerge to BwdWeight type string

## Proposed changes

Add a parameter to the bwd weight V3 type string that shows the number of
groups to merge. MIOpen needs this to be tuned properly, because it uses
type strings as performance database entries.

To avoid breaking existing tuning databases, the `_MergedGroups` name suffix and the trailing `NumGroupsToMerge` value are emitted only when group merging is enabled (`NumGroupsToMerge > 1`).
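
The effect on the type string can be illustrated with a minimal sketch. It assumes C++17 (for `if constexpr`) and uses a hypothetical helper, `MakeTypeStringSketch`, which is not part of the library and prints only two of the real template parameters. It is meant to show the idea, not the actual `DeviceGroupedConvBwdWeight_Xdl_CShuffleV3` implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical, trimmed-down stand-in for the type-string builder.
// The real function prints many more template parameters and can also
// emit a "_DirectLoad" suffix, which is omitted here for brevity.
template <int BlockSize, int MPerBlock, int NumGroupsToMerge>
std::string MakeTypeStringSketch()
{
    std::ostringstream str;
    str << "DeviceGroupedConvBwdWeight_Xdl_CShuffleV3";
    // Name suffix: emitted only when groups are actually merged.
    if constexpr(NumGroupsToMerge > 1)
    {
        str << "_MergedGroups";
    }
    str << "<" << BlockSize << ", " << MPerBlock;
    // Trailing parameter: also emitted only when groups are merged.
    if constexpr(NumGroupsToMerge > 1)
    {
        str << ", " << NumGroupsToMerge;
    }
    str << ">";
    return str.str();
}

// Checks both cases; extra parentheses keep the template commas from
// being parsed as separate arguments of the assert macro.
inline void CheckTypeStrings()
{
    // Without merging, the string has no suffix and no extra parameter.
    assert((MakeTypeStringSketch<256, 128, 1>() ==
            "DeviceGroupedConvBwdWeight_Xdl_CShuffleV3<256, 128>"));
    // With merging, both the suffix and the group count appear.
    assert((MakeTypeStringSketch<256, 128, 2>() ==
            "DeviceGroupedConvBwdWeight_Xdl_CShuffleV3_MergedGroups<256, 128, 2>"));
}
```

Because both additions are guarded by the same compile-time condition, an instance with `NumGroupsToMerge == 1` produces the same string as before, so database entries keyed by the old string should still match.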

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [ ] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [ ] I have added inline documentation which helps the maintainers
understand the motivation
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [ ] I have run `clang-format` on all changed files
- [ ] Any dependent changes have been merged

## Discussion

If this is a relatively large or complex change, feel free to start a
discussion by explaining why you chose the solution you did and what
alternatives you considered
This commit is contained in:
Johannes Graner
2026-02-11 09:08:38 +00:00
committed by assistant-librarian[bot]
parent d06f35027a
commit e88f139c6c


@@ -1679,8 +1679,12 @@ struct DeviceGroupedConvBwdWeight_Xdl_CShuffleV3
 if constexpr(DirectLoad) {
     str << "_DirectLoad";
 }
+if constexpr(NumGroupsToMerge > 1) {
+    str << "_MergedGroups";
+}
-str << "<"
+str << "<"
     << BlockSize << ", "
     << MPerBlock << ", "
     << NPerBlock << ", "
@@ -1695,8 +1699,10 @@ struct DeviceGroupedConvBwdWeight_Xdl_CShuffleV3
     << BBlockTransferDstScalarPerVector_K1 << ", "
     << CShuffleMXdlPerWavePerShuffle << ", "
     << CShuffleNXdlPerWavePerShuffle << ", "
-    << CBlockTransferScalarPerVector_NWaveNPerXdl
-    << ">";
+    << CBlockTransferScalarPerVector_NWaveNPerXdl;
+if constexpr(NumGroupsToMerge > 1)
+    str << ", " << NumGroupsToMerge;
+str << ">";
 // clang-format on
 return str.str();