[rocm-libraries] ROCm/rocm-libraries#4577 (commit a36922c)

[CK_TILE] FMHA BWD Launcher Interface

## Motivation
Reduce memory usage and prepare for optimizations that reduce nsplits in
the deterministic case.

## Technical Details
This PR introduces a new launcher interface for the FMHA backward
operation, replacing direct function calls with a more structured
approach. The launcher encapsulates kernel dispatch logic and provides
access to computed metadata like the number of dQ acc splits.
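To illustrate the idea of a launcher that both runs the kernel and exposes computed metadata, here is a minimal sketch. It is not the `fmha_bwd_launcher` from this PR: `toy_traits`, `toy_args`, and `toy_bwd_launcher` are hypothetical names, the "kernel" is a placeholder lambda, and the split formula is borrowed from `GetDqAccSplits` in the diff below.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for illustration only; these are NOT the ck_tile types.
struct toy_traits
{
    bool is_deterministic; // mirrors the role of kIsDeterministic
    int seqlen_k;          // key sequence length
    int block_n0;          // assumed tile size along seqlen_k (kN0 in the kernel)
};

struct toy_args
{
    int batch;
};

class toy_bwd_launcher
{
    public:
    explicit toy_bwd_launcher(const toy_traits& t)
        : dq_acc_splits_(compute_splits(t)),
          // Kernel selection happens once, at construction time.
          run_([splits = dq_acc_splits_](const toy_args& a) {
              // Placeholder: a real launcher would dispatch a GPU kernel here.
              return static_cast<float>(a.batch * splits);
          })
    {
    }

    // Metadata is available before the kernel is ever launched.
    int dq_acc_splits() const { return dq_acc_splits_; }

    float operator()(const toy_args& a) const { return run_(a); }

    private:
    static int compute_splits(const toy_traits& t)
    {
        assert(t.block_n0 > 0);
        // Ceiling division in the deterministic case, a single split otherwise.
        return t.is_deterministic ? (t.seqlen_k + t.block_n0 - 1) / t.block_n0 : 1;
    }

    int dq_acc_splits_; // declared before run_, so it is initialized first
    std::function<float(const toy_args&)> run_;
};
```

With this shape, a caller can construct the launcher, read `dq_acc_splits()`, and only then allocate buffers and invoke it. Presumably that is what allows the dQ accumulation buffer to be sized exactly, though the PR text does not spell this out.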

**Changes:**
- Added `fmha_bwd_launcher` class that wraps kernel execution and
exposes `dq_acc_splits`
- Moved `fmha_bwd_traits` construction earlier in the execution flow to
support launcher initialization
- Refactored code generation to produce both legacy API and new launcher
constructor
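The `dq_acc_splits` value mentioned above comes from the `GetDqAccSplits` helper added in the diff. A standalone sketch of that computation follows. The `index_t` alias and the local `integer_divide_ceil` are assumptions that stand in for the `ck_tile` versions, and the tile size of 128 is an arbitrary example value rather than the kernel's actual `kN0`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a 32-bit signed index type standing in for ck_tile::index_t.
using index_t = std::int32_t;

// Local stand-in for ck_tile::integer_divide_ceil (ceiling division for positive b).
constexpr index_t integer_divide_ceil(index_t a, index_t b) { return (a + b - 1) / b; }

// Mirrors the logic of FmhaBwdDQDKDVKernel::GetDqAccSplits from the diff:
// one split per kN0-sized tile of seqlen_k when deterministic, else a single split.
template <bool kIsDeterministic, index_t kN0>
constexpr index_t get_dq_acc_splits(index_t seqlen_k)
{
    if constexpr(kIsDeterministic)
        return integer_divide_ceil(seqlen_k, kN0);
    else
        return 1;
}

// Compile-time checks with an example tile size of 128.
static_assert(get_dq_acc_splits<true, 128>(1000) == 8);
static_assert(get_dq_acc_splits<false, 128>(1000) == 1);
```

In deterministic mode the split count grows linearly with `seqlen_k`, which is consistent with the stated motivation of reducing nsplits in that case.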

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
Yi DING
2026-03-04 01:21:07 +00:00
committed by assistant-librarian[bot]
parent 08b6de62f8
commit b09112bbad
4 changed files with 150 additions and 69 deletions


@@ -124,6 +124,13 @@ struct FmhaBwdDQDKDVKernel
#undef _TS_
// clang-format on
}
CK_TILE_HOST static index_t GetDqAccSplits(index_t seqlen_k)
{
    if constexpr(kIsDeterministic)
        return integer_divide_ceil(seqlen_k, FmhaPipeline::BlockFmhaShape::kN0);
    else
        return 1;
}
template <ck_tile::index_t I> // to avoid a duplicated base class problem, introduce a template
// arg