composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-12 18:17:58 +00:00

Files

root e8587b86c2 Fix CK-UA pipeline: s_waitcnt_vmcnt<0> in fmha_post_process

The final V tile's async load was not fully waited on before reading
from LDS: s_waitcnt_vmcnt<K_inst> still permits up to K_inst outstanding
loads, so it is a no-op for the V_inst loads in flight when
K_inst == V_inst. The last loop iteration never prefetches K, so only V
is outstanding. Use s_waitcnt_vmcnt<0> unconditionally.
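
The counting argument above can be checked with a toy model. This is a
minimal sketch, not ck_tile code: pending_after_wait and the load counts
are hypothetical, and it assumes that loads complete in issue order and
that in steady-state iterations the next K prefetch is issued after the
V loads.

```cpp
// Toy model of the vmcnt counter (hypothetical helper, not part of ck_tile).
// A wait with threshold N blocks until at most N vector-memory loads are
// still outstanding; the oldest loads are the ones that have completed.
constexpr int pending_after_wait(int outstanding, int wait_threshold)
{
    return outstanding < wait_threshold ? outstanding : wait_threshold;
}

// Hypothetical per-tile load instruction counts, chosen equal on purpose.
constexpr int K_inst = 4;
constexpr int V_inst = 4;

// Steady state (assumed order: V loads, then the next K prefetch).
// Waiting down to K_inst leaves only the K prefetch pending, so V is ready.
static_assert(pending_after_wait(V_inst + K_inst, K_inst) == K_inst,
              "steady state: V loads have completed");

// Last iteration: no K prefetch, so only the V loads are outstanding.
// Waiting down to K_inst returns immediately and every V load may be pending.
static_assert(pending_after_wait(V_inst, K_inst) == V_inst,
              "last iteration: the wait is a no-op");

// The fix: waiting down to zero drains every outstanding load.
static_assert(pending_after_wait(V_inst, 0) == 0,
              "threshold 0: no loads pending before the LDS read");
```

The checks run at compile time, so building the snippet is enough to
confirm the three cases.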

This partially fixes the BS32 race condition for production workloads
(maxk >= 256). A deeper pipeline race remains for very short KV
sequences (maxk < ~165, 2-5 pages) with block_size=32 at high batch.

Made-with: Cursor

2026-04-01 23:04:07 +00:00

[rocm-libraries] ROCm/rocm-libraries#6022 (commit 54b284a)

2026-03-31 15:19:43 +00:00

ck_tile

Fix CK-UA pipeline: s_waitcnt_vmcnt<0> in fmha_post_process

2026-04-01 23:04:07 +00:00

rapidjson

Update pre-commit to fixed versions, run remod for ck_tile (#2895)

2025-10-16 15:29:17 -07:00