Add a workaround for a compiler issue for bwd on gfx90a and ROCm 7.1.1 (#3369)

The compiler sometimes emits too few wait-states between v_mfma_f32... and v_accvgpr_read_b32 instructions when they are separated by an s_cbranch.
The workaround is to move the accumulator values from accvgprs to vgprs before branching.
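
For illustration only, a minimal host-side sketch of the same pattern, assuming GCC or Clang extended asm. The names force_move, elementwise_inout, and demo are hypothetical, and the "+r" constraint stands in for the AMDGPU "+v" (VGPR) constraint used in the change. It does not reproduce the gfx90a hazard. It only shows how passing each accumulator element through an empty asm statement as a read-write register operand makes the compiler materialize the value in a register at that point while leaving it unchanged.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: an empty asm statement with a read-write register
// operand. The compiler must place x in a register here and must assume the
// asm may have modified it, so the value is materialized at this point.
// Assumption: GCC/Clang extended asm; "+r" is a CPU general-purpose register,
// used here in place of the AMDGPU "+v" constraint.
inline void force_move(int& x)
{
    asm volatile("" : "+r"(x));
}

// Hypothetical stand-in for tile_elementwise_inout: applies f in place to
// every element of a small "tile".
template <typename F, std::size_t N>
inline void elementwise_inout(F&& f, std::array<int, N>& tile)
{
    for(auto& x : tile)
    {
        f(x);
    }
}

// Pushes every accumulator element through force_move before any later
// (possibly branching) code reads it. The values themselves are unchanged.
inline std::array<int, 4> demo()
{
    std::array<int, 4> acc{1, 2, 3, 4};
    elementwise_inout([](int& x) { force_move(x); }, acc);
    return acc;
}
```

The asm body is empty, so the statement costs nothing beyond the register move it forces. In the actual change the asm body is only an assembler comment.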
Anton Gorenko
2025-12-08 20:44:17 +05:00
committed by GitHub
parent 878b4e7f46
commit ca6143f0b2

@@ -552,6 +552,15 @@ struct BlockFmhaBwdDQDKDVPipelineKRKTRVR
});
});
}
#if defined(__gfx9__)
else
{
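// Workaround for a compiler issue: sometimes there are not enough wait-states
// between v_mfma_f32... and v_accvgpr_read_b32 instructions if they are separated
// by s_cbranch. The asm below takes each s_acc element as a read-write VGPR
// operand ("+v"), so the accvgprs are read to vgprs before branching.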
tile_elementwise_inout([](auto& x) { asm("; force move to %0" : "+v"(x)); }, s_acc);
}
#endif
{
bool need_perpixel_check = mask.IsEdgeTile(