Files
composable_kernel/include
Anton Gorenko a9ddd16f4f [CK_TILE] FMHA Reduce register spilling in fwd with dropout (workaround for CI failures with clang-22) (#3221)
* Use vectorized stores for dropout randvals

With no kPadSeqLenK the kernel uses 2 buffer_store_dwordx2 instead of
16 buffer_store_byte. This requires less registers and reduces spilling.

* Calculate dropout randvals for storing and applying only once

Even though it may add a small overhead when storing is not required,
it uses significantly less registers and hence no spilling.

[ROCm/composable_kernel commit: d7b3197869]
2025-11-19 10:40:12 +05:00
..