[CK_TILE] Fix NaN for FMHA BWD When seq_q=0 (#5790)

## Motivation
This PR fixes NaNs in the FMHA backward (dQ/dK/dV) path when the
effective query sequence length for a tile is zero. The per-tile
pipelines now exit early with zeroed accumulators, and the kernel no
longer returns early in a way that prevented the cleared gradients from
being written out.

## Technical Details
- Add an unconditional early exit in the dK/dV pipelines when
`num_total_loop <= 0` (no work), returning zeroed accumulators.
- Adjust group-mode kernel early-return logic to only return when
**both** `seqlen_q` and `seqlen_k` are zero, allowing blocks to run and
store cleared dK/dV when `seqlen_q == 0`.
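
The interaction between the two changes can be illustrated with a small host-side sketch. This is not the CK_TILE code: `run_dk_pipeline`, `skip_old`, `skip_new`, and `simulate_dk` are hypothetical names, the output buffer is pre-filled with NaN as an assumed stand-in for memory that nothing has written yet, and `seqlen_q` is used directly as the loop count for simplicity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-tile dK pipeline (not the CK_TILE implementation).
// `num_total_loop` models the number of query iterations; each iteration
// adds a made-up contribution of 1.0f to every accumulator element.
std::vector<float> run_dk_pipeline(int num_total_loop, std::size_t tile_size)
{
    std::vector<float> acc(tile_size, 0.0f); // cleared accumulator
    if(num_total_loop <= 0)
    {
        return acc; // no work: return zeros without running the loop
    }
    for(int i = 0; i < num_total_loop; ++i)
    {
        for(float& v : acc)
        {
            v += 1.0f;
        }
    }
    return acc;
}

// Early-return predicate before the change: skip if either length is zero.
bool skip_old(int seqlen_q, int seqlen_k) { return seqlen_q == 0 || seqlen_k == 0; }

// Early-return predicate after the change: skip only if both are zero.
bool skip_new(int seqlen_q, int seqlen_k) { return seqlen_q == 0 && seqlen_k == 0; }

// Models one block writing dK. The buffer starts as NaN (assumption:
// a stand-in for unwritten memory), so a skipped block leaves NaN behind.
std::vector<float> simulate_dk(int seqlen_q, int seqlen_k, bool use_new_check)
{
    assert(seqlen_q >= 0 && seqlen_k >= 0);
    std::vector<float> dk(static_cast<std::size_t>(seqlen_k),
                          std::numeric_limits<float>::quiet_NaN());

    const bool skip = use_new_check ? skip_new(seqlen_q, seqlen_k)
                                    : skip_old(seqlen_q, seqlen_k);
    if(skip)
    {
        return dk; // nothing stored: the buffer keeps its previous contents
    }

    dk = run_dk_pipeline(seqlen_q, dk.size()); // store the accumulator
    return dk;
}
```

Under these assumptions, `simulate_dk(0, 4, false)` leaves the buffer as NaN because the block returns before storing anything, while `simulate_dk(0, 4, true)` runs the pipeline, takes the `num_total_loop <= 0` exit, and stores zeros.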

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Yi DING
2026-03-27 15:54:01 +08:00
committed by GitHub
parent f15c70365c
commit 8554618d6a
4 changed files with 13 additions and 25 deletions

@@ -872,7 +872,7 @@ struct FmhaBwdDQDKDVKernel
         }
         // skip if logical lengths are zero
-        if(kargs.seqlen_q == 0 || kargs.seqlen_k == 0)
+        if(kargs.seqlen_q == 0 && kargs.seqlen_k == 0)
         {
             return;
         }