[CK_TILE][FMHA] Add new tile size for async (#3586)

* add new tile size for async

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

* Update example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix lse error

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

---------

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Linjun-AMD
2026-01-20 07:22:33 +08:00
committed by GitHub
parent 98abfa4ade
commit f3aafb9555
2 changed files with 9 additions and 1 deletions


@@ -321,6 +321,8 @@ struct BlockFmhaPipelineQRKSVSAsync
{
    if(num_total_loop <= 0)
    {
        buffer_load_fence(0); // rocm-7.1.1, if whole tile is masked out, need to fence(0)
                              // otherwise will have compute error(maybe compiler bug?)
        if constexpr(kStoreLSE)
        {
            auto lse =