[rocm-libraries] ROCm/rocm-libraries#4372 (commit 738ffd7)

[CK] Workaround blockscale wp test failure

## Motivation

Work around the blockscale wp test failure in pipeline v3.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
Enrico Degregori
2026-02-07 00:09:58 +00:00
committed by assistant-librarian[bot]
parent 1ddb38f098
commit 984a3d1828
3 changed files with 9 additions and 16 deletions


@@ -704,10 +704,12 @@ struct BlockwiseGemmXdlops_pipeline_blockscale_bpreshuffle_v3<BlockGemmPipelineS
 });
 });
-// We have to 1 stage early sync the lds for workaround the compiler
-// limitation
-if constexpr(m0.value == (MRepeat - LocalPrefetchStages - 1))
+// Compiler issue. Previously the sync was done one stage earlier to fix it.
+// Problem shows up again with latest compiler so we sync at the correct
+// iteration and then we force the instructions before the sync
+if constexpr(m0.value == (MRepeat - LocalPrefetchStages))
 {
+__builtin_amdgcn_sched_barrier(0); // force all instructions before this
 block_sync_lds();
 }
@@ -833,6 +835,7 @@ struct BlockwiseGemmXdlops_pipeline_blockscale_bpreshuffle_v3<BlockGemmPipelineS
 if constexpr(m0.value == (MRepeat - LocalPrefetchStages))
 {
+__builtin_amdgcn_sched_barrier(0); // force all instructions before this
 block_sync_lds();
 }
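
The change to the sync condition can be illustrated with a small host-only sketch. It is not code from the PR: `sync_iteration` is a hypothetical helper, and the values `MRepeat = 4` and `LocalPrefetchStages = 2` are assumed for illustration. It only shows which `m0` iteration satisfies the old condition (`MRepeat - LocalPrefetchStages - 1`) versus the new one (`MRepeat - LocalPrefetchStages`). The GPU-side calls appear as comments because they require an AMDGPU device compile.

```cpp
#include <cassert>

// Host-only sketch: parameter names mirror the diff, but this helper is
// hypothetical and is not part of Composable Kernel.
// Returns the m0 iteration at which the LDS sync would be issued, or -1 if
// the condition never matches while m0 runs over [0, MRepeat).
constexpr int sync_iteration(int MRepeat, int LocalPrefetchStages, bool one_stage_early)
{
    const int target = MRepeat - LocalPrefetchStages - (one_stage_early ? 1 : 0);
    for(int m0 = 0; m0 < MRepeat; ++m0)
    {
        if(m0 == target)
        {
            // In the kernel after this change, this is the point where
            //   __builtin_amdgcn_sched_barrier(0); // mask 0: nothing is scheduled across it
            //   block_sync_lds();
            // are emitted.
            return m0;
        }
    }
    return -1;
}

// Assumed example values: MRepeat = 4, LocalPrefetchStages = 2.
static_assert(sync_iteration(4, 2, true) == 1, "old condition: one stage early");
static_assert(sync_iteration(4, 2, false) == 2, "new condition: matching iteration");
```

With these assumed values the sync moves from iteration 1 to iteration 2. Because the sync no longer fires a stage early, the sketch suggests the scheduling barrier is what keeps the compiler from moving instructions across the sync point.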