composable_kernel/include/ck_tile/ops
Illia Silin a7ed94f71c [CK_TILE] FMHA Reduce register spilling in fwd with dropout (workaround for CI failures with clang-22) (#3221) (#3372)
* Use vectorized stores for dropout randvals

Without kPadSeqLenK, the kernel issues 2 buffer_store_dwordx2 instructions
instead of 16 buffer_store_byte instructions. This requires fewer registers
and reduces spilling.

* Calculate dropout randvals for storing and applying only once

Even though it may add a small overhead when storing is not required,
it uses significantly fewer registers and hence avoids spilling.

Co-authored-by: Anton Gorenko <anton@streamhpc.com>
2025-12-16 10:47:00 -08:00