composable_kernel/include
juuso-oskari 374536f19a CK-UA: checkpoint FA4 pipeline + int64 Q/O base-offset fix
Working state before the pipeline cleanup/refactor:
  * FA4 matrix-softmax warp-group overlap pipeline (UA_FA4_PIPELINE=1).
  * Widen per-CTA query/output base offsets to long_index_t so that a
    large total_q (big-batch prefill) cannot overflow int32 and fault on
    the output store. The existing cache_ptr_int32_overflow_possible
    check only covers K/V.
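To illustrate the second bullet, here is a minimal host-side sketch of why a per-CTA base offset can exceed the int32 range when total_q is large. This is not the kernel code from this commit: the aliases `index_t` and `long_index_t` are local stand-ins, and `q_base_offset`, `fits_in_int32`, the row count, and the row stride (32 heads x 128 head dim) are hypothetical values chosen only for the arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Local stand-ins for the narrow and wide index types (assumption: the wide
// type is a 64-bit signed integer).
using index_t      = std::int32_t;
using long_index_t = std::int64_t;

// Hypothetical helper: element offset of the first query/output row owned by
// a CTA. The multiply is done in 64 bits, so it cannot wrap for these sizes.
constexpr long_index_t q_base_offset(long_index_t first_row, long_index_t row_stride)
{
    return first_row * row_stride;
}

// Hypothetical helper: would this offset survive being stored in an int32?
constexpr bool fits_in_int32(long_index_t v)
{
    return v >= std::numeric_limits<index_t>::min() &&
           v <= std::numeric_limits<index_t>::max();
}

// Illustrative sizes only: 2^20 query rows, stride of 32 * 128 elements.
inline void demo()
{
    constexpr long_index_t row_stride = 32 * 128;  // 4096 elements per row
    constexpr long_index_t last_row   = (1LL << 20) - 1;

    // A small prefill stays well inside the int32 range.
    assert(fits_in_int32(q_base_offset(1000, row_stride)));

    // The last row of a large prefill does not: 1048575 * 4096 > 2^31 - 1.
    assert(!fits_in_int32(q_base_offset(last_row, row_stride)));
}
```

With a 32-bit base offset the second case would wrap before the pointer is formed, which matches the symptom described above: a fault on the output store rather than on the K/V cache path.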

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-03 08:47:43 +00:00