mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 10:09:41 +00:00
* Update for fmha_fwd qs_ks_vs pipeline
* Remove _builtin_amdgcn_sched_barrier(0)
* Move p_compute to p converting earlier for trying to increase vgprs re-using
* Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation
* Re-add __builtin_amdgcn_sched_barrier(0)
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
[ROCm/composable_kernel commit: 3d50f57f43]