add memsetasync for ck moe splitk (#4282)

## Proposed changes

Re-enable the `hipMemsetAsync` call that zeroes the C buffer before the CK MoE split-K kernel is launched (`KBatch > 1`).

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [ ] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [ ] I have added inline documentation which helps the maintainers
understand the motivation
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [ ] I have run `clang-format` on all changed files
- [ ] Any dependent changes have been merged

## Discussion

If this is a relatively large or complex change, feel free to start a
discussion by explaining why you chose the solution you did and what
alternatives you considered



---
🔁 Imported from
[ROCm/composable_kernel#3630](https://github.com/ROCm/composable_kernel/pull/3630)
🧑‍💻 Originally authored by @lalala-sh

---------

Co-authored-by: lalala-sh <Jiaxing.Wen@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
This commit is contained in:
assistant-librarian[bot]
2026-02-12 09:44:51 -08:00
committed by GitHub
parent b5d58b8bc5
commit c41544e621

@@ -253,12 +253,12 @@ struct DeviceMoeGemmBlockScale
// rotating mem
rotating_mem.Next();
// clear c mem
- // if(arg_.KBatch > 1)
- // hipGetErrorString(hipMemsetAsync(arg_.p_c_grid,
- // 0,
- // arg_.M * arg_.N * sizeof(CDataType)
- // * (IsInputGemm && IsSplitK ? 2 : 1),
- // stream_config.stream_id_));
+ if(arg_.KBatch > 1)
+ hipGetErrorString(hipMemsetAsync(arg_.p_c_grid,
+ 0,
+ arg_.M * arg_.N * sizeof(CDataType) *
+ (IsInputGemm && IsSplitK ? 2 : 1),
+ stream_config.stream_id_));
};
ave_time = ck::utility::launch_and_time_kernel_with_preprocess<false>(
@@ -272,12 +272,12 @@ struct DeviceMoeGemmBlockScale
}
else
{
- // if(arg.KBatch > 1)
- // hipGetErrorString(hipMemsetAsync(arg.p_c_grid,
- // 0,
- // arg.M * arg.N * sizeof(CDataType) *
- // (IsInputGemm && IsSplitK ? 2 : 1),
- // stream_config.stream_id_));
+ if(arg.KBatch > 1)
+ hipGetErrorString(hipMemsetAsync(arg.p_c_grid,
+ 0,
+ arg.M * arg.N * sizeof(CDataType) *
+ (IsInputGemm && IsSplitK ? 2 : 1),
+ stream_config.stream_id_));
ave_time = launch_and_time_kernel(
stream_config, kernel, dim3(gdx, gdy, gdz), dim3(BlockSize), 0, arg);
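
Reviewer note: the following is a minimal CPU-only sketch of why the clear matters. It is not the CK kernel. It assumes the split-K path accumulates partial sums into the C buffer (for example with atomic adds), which this diff does not show. The names `splitk_gemm_accumulate` and `launch_repeatedly` are hypothetical, and `std::memset` stands in for `hipMemsetAsync`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a split-K GEMM: K is cut into k_batch slices and
// every slice ADDS its partial products into c instead of overwriting it.
void splitk_gemm_accumulate(const std::vector<float>& a, // M x K, row-major
                            const std::vector<float>& b, // K x N, row-major
                            std::vector<float>& c,       // M x N, row-major
                            std::size_t M,
                            std::size_t N,
                            std::size_t K,
                            std::size_t k_batch)
{
    const std::size_t k_per_batch = (K + k_batch - 1) / k_batch;
    for(std::size_t kb = 0; kb < k_batch; ++kb)
    {
        const std::size_t k_begin = kb * k_per_batch;
        const std::size_t k_end   = std::min(K, k_begin + k_per_batch);
        for(std::size_t m = 0; m < M; ++m)
        {
            for(std::size_t n = 0; n < N; ++n)
            {
                float partial = 0.0f;
                for(std::size_t k = k_begin; k < k_end; ++k)
                    partial += a[m * K + k] * b[k * N + n];
                c[m * N + n] += partial; // accumulate, as split-K is assumed to do
            }
        }
    }
}

// Launches the same problem `repeats` times into one output buffer,
// optionally clearing it first with the same guard as the diff (KBatch > 1).
std::vector<float> launch_repeatedly(int repeats, bool clear_before_launch)
{
    const std::size_t M = 2, N = 2, K = 4, k_batch = 2;
    std::vector<float> a(M * K, 1.0f), b(K * N, 1.0f), c(M * N, 0.0f);
    for(int i = 0; i < repeats; ++i)
    {
        if(clear_before_launch && k_batch > 1)
            std::memset(c.data(), 0, c.size() * sizeof(float));
        splitk_gemm_accumulate(a, b, c, M, N, K, k_batch);
    }
    return c;
}
```

With all-ones inputs and `K = 4`, every element of C should be 4. `launch_repeatedly(3, true)` keeps that value, while `launch_repeatedly(3, false)` ends at 12 because each launch adds onto the previous result. Under the stated assumption, repeated launches into an uncleared buffer would drift in the same way.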