[CKTILE] FMHA fwd trload lse fix (#3046)

* enable storelse for fmha_fwd_trload kernel

* fix lse in trload

* fix the mask-related bug

[ROCm/composable_kernel commit: 0d3860dfdb]
Haocong WANG
2025-10-23 09:33:33 +08:00
committed by GitHub
parent 7c14d97d0e
commit 895983c816
2 changed files with 26 additions and 29 deletions

@@ -724,7 +724,6 @@ class KernelComponentFactory:
and logits == "f"
and bias == "no"
and dropout == "f"
-and lse == "f"
and skip == "f"
):
pipelines.append(FmhaFwdPipeline("qr_async_trload", "row", "f", "f", "f", "f", logits, bias, lse, dropout, squant, mask, skip, "t")) # fmt: skip
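
Note on the hunk above: its header (7 lines before, 6 after) together with the commit message suggests that the `lse == "f"` condition is the line being dropped, so the `qr_async_trload` pipeline would now be registered whether or not LSE is stored. The sketch below only illustrates that filtering effect. `trload_eligible` and `trload_lse_values` are hypothetical helpers, not code from the repository, and the assumption that `lse` takes the values `"t"` and `"f"` is inferred from the flag style visible in the hunk.

```python
def trload_eligible(logits, bias, dropout, lse, skip, require_lse_off):
    """Hypothetical stand-in for the condition guarding the trload pipeline.

    require_lse_off=True models the condition before this commit,
    require_lse_off=False models it after the lse check is removed.
    """
    ok = logits == "f" and bias == "no" and dropout == "f" and skip == "f"
    if require_lse_off:
        ok = ok and lse == "f"
    return ok


def trload_lse_values(require_lse_off):
    """Return the lse settings for which the pipeline would be appended."""
    selected = []
    for lse in ("t", "f"):  # assumed flag values: "t" = store LSE, "f" = do not
        if trload_eligible("f", "no", "f", lse, "f", require_lse_off):
            selected.append(lse)
    return selected


before = trload_lse_values(require_lse_off=True)
after = trload_lse_values(require_lse_off=False)
print("before:", before)
print("after: ", after)
```

Under these assumptions, only the `lse == "f"` variant passes the old condition, while both variants pass the new one. The remaining conditions (`logits`, `bias`, `dropout`, `skip`) are unchanged.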