[CK_TILE] FMHA BWD Enable Tile 16x192 (#2741)

* 16x192

* Use buffer_load_lds for lse/d

* Dispatch & cleanup

* Avoid zeroing dq & fix

* fix

Author: Yi DING
Date: 2025-08-28 18:54:18 +08:00
Committed by GitHub
Parent: b951416cdb
Commit: ead4447b20
6 changed files with 173 additions and 114 deletions

@@ -803,7 +803,6 @@ bool run(const ck_tile::ArgParser& arg_parser)
o_buf.ToDevice(o_host.data());
lse_buf.ToDevice(lse_host.data());
dq_buf.SetZero();
dbias_buf.SetZero();
dq_acc_buf.SetZero();