[CK_TILE] FMHA BWD Optimization For GFX950 (#2628)

* simplify fmha_bwd_kernel MakeKargs & dq_dram_window

* simply duplicate

* trload pipeline

* Try two-stage

* add prefetch

* optimize & iglp
Yi DING
2025-08-12 11:11:55 +08:00
committed by GitHub
parent a7badc6ec5
commit 4fde1646e5
16 changed files with 2216 additions and 586 deletions

@@ -51,6 +51,12 @@ inline std::string get_device_name()
     default: return name;
     }
 }
+inline bool is_load_tr_supported()
+{
+    // Check if load transpose is supported.
+    return get_device_name() == "gfx950";
+}
 } // namespace ck_tile
 #endif
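
Note on usage: this hunk only adds the capability check and does not show its call sites. A minimal sketch of how such a check could gate pipeline selection follows. It is an assumption, not code from this commit: the `sketch` namespace, the `BwdPipeline` enum, and `select_bwd_pipeline` are hypothetical names, and the device name is passed in explicitly (instead of calling `ck_tile::get_device_name()`) so the example builds without HIP or a GPU.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Same comparison as the added ck_tile::is_load_tr_supported(), but
// parameterized on the device name so it can run on any host.
inline bool is_load_tr_supported(const std::string& device_name)
{
    return device_name == "gfx950";
}

// Hypothetical pipeline tags; these are NOT CK_TILE identifiers.
enum class BwdPipeline
{
    Default, // fallback path for devices without load transpose
    TrLoad   // path that relies on load-transpose support
};

// Pick the load-transpose path only when the device reports support.
inline BwdPipeline select_bwd_pipeline(const std::string& device_name)
{
    return is_load_tr_supported(device_name) ? BwdPipeline::TrLoad
                                             : BwdPipeline::Default;
}

// Small self-check covering one supported and one unsupported name.
inline void self_check()
{
    assert(select_bwd_pipeline("gfx950") == BwdPipeline::TrLoad);
    assert(select_bwd_pipeline("gfx942") == BwdPipeline::Default);
}

} // namespace sketch
```

Because the check is an exact string comparison, any other architecture name falls through to the default path; a future device with the same capability would need the check extended.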