composable_kernel/include/ck_tile/ops
root 87d16738bf WIP: CK-UA KV-segment parallelism - kernel args and split range
Added split-KV fields to UnifiedAttentionVarlenKargs (num_splits,
i_split, lse_acc_ptr, o_acc_ptr + strides). Modified operator() to
compute per-split KV range using blocks_per_split.
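The commit doesn't show how the range is derived from blocks_per_split. Below is a minimal host-side sketch of one plausible scheme. It assumes the KV sequence is tiled into blocks of an assumed size kN0, and that the blocks are dealt out to splits in equal contiguous chunks. KvSplitRange, compute_split_range and kN0 are hypothetical names, not CK identifiers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical: half-open KV range [kv_begin, kv_end) owned by one split.
struct KvSplitRange
{
    int32_t kv_begin;
    int32_t kv_end;
    bool empty() const { return kv_begin >= kv_end; }
};

// Sketch (assumed scheme): split ceil(seqlen_k / kN0) KV blocks into
// num_splits contiguous chunks of blocks_per_split whole blocks each.
// kN0 is an assumed tile size along the KV dimension.
inline KvSplitRange
compute_split_range(int32_t seqlen_k, int32_t kN0, int32_t num_splits, int32_t i_split)
{
    const int32_t num_blocks       = (seqlen_k + kN0 - 1) / kN0;
    const int32_t blocks_per_split = (num_blocks + num_splits - 1) / num_splits;

    // Clamp block indices so trailing splits past the end become empty.
    const int32_t block_begin = std::min(num_blocks, i_split * blocks_per_split);
    const int32_t block_end   = std::min(num_blocks, block_begin + blocks_per_split);

    // Clamp to the real sequence length; the last block may be partial.
    return {std::min(seqlen_k, block_begin * kN0), std::min(seqlen_k, block_end * kN0)};
}
```

With this scheme, trailing splits can come out empty when num_splits does not divide the block count evenly. For example, 8 blocks over 5 splits gives 2 blocks per split, so split 4 gets nothing. A kernel using it would presumably need to skip such splits or write a neutral (-inf) LSE for them.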

INCOMPLETE: The pipeline returns a normalized o_acc, but the split-KV
combine kernel needs the unnormalized o_acc plus the LSE. The pipeline
must be modified to optionally return the running row max (m) and row
sum (l) alongside o_acc.
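Suppose each split reports its running max m, its denominator l, and the unnormalized o_acc. The combine then reduces to the usual log-sum-exp rescaling:

- rescale every split to the global max,
- sum the rescaled o_acc and l,
- normalize once at the end.

The per-row LSE is m + log(l). The sketch below is a scalar host-side reference for one query row, not the CK combine kernel. SplitPartial, attend_split and combine_splits are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical per-split partial result for one query row.
struct SplitPartial
{
    float m;                  // max attention score seen by this split
    float l;                  // sum of exp(s - m) over this split's keys
    std::vector<float> o_acc; // unnormalized: sum of exp(s - m) * v
};

// Reference partial over scores s[j] and values v[j] for keys in [begin, end).
inline SplitPartial attend_split(const std::vector<float>& s,
                                 const std::vector<std::vector<float>>& v,
                                 int begin, int end, int head_dim)
{
    SplitPartial p{-std::numeric_limits<float>::infinity(), 0.f,
                   std::vector<float>(head_dim, 0.f)};
    for(int j = begin; j < end; ++j)
        p.m = std::max(p.m, s[j]);
    for(int j = begin; j < end; ++j)
    {
        const float w = std::exp(s[j] - p.m);
        p.l += w;
        for(int d = 0; d < head_dim; ++d)
            p.o_acc[d] += w * v[j][d];
    }
    return p;
}

// Combine: rescale each split by exp(m_i - m_global), then normalize once.
// Writes the row's final LSE (m_global + log(l_global)) to *lse_out.
inline std::vector<float>
combine_splits(const std::vector<SplitPartial>& parts, int head_dim, float* lse_out)
{
    float m_global = -std::numeric_limits<float>::infinity();
    for(const auto& p : parts)
        m_global = std::max(m_global, p.m);

    float l_global = 0.f;
    std::vector<float> o(head_dim, 0.f);
    for(const auto& p : parts)
    {
        if(p.l == 0.f)
            continue; // empty split contributes nothing
        const float scale = std::exp(p.m - m_global);
        l_global += scale * p.l;
        for(int d = 0; d < head_dim; ++d)
            o[d] += scale * p.o_acc[d];
    }
    for(int d = 0; d < head_dim; ++d)
        o[d] /= l_global;
    *lse_out = m_global + std::log(l_global);
    return o;
}
```

Splitting the keys and combining should match a single unsplit pass up to float rounding. This is why m and l (or equivalent information) have to survive past the pipeline: a normalized o_acc alone loses the per-split scale.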

The kernel changes compile, but the epilogue still needs the split path
(writing to float accumulators instead of the final output).

Made-with: Cursor
2026-04-01 19:09:59 +00:00
2026-01-13 09:21:29 -08:00