mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 02:02:46 +00:00
When kCachePtrInt32OverflowPossible=true, we now:

1. Save original K/V buffer pointers at pipeline start
2. Always rebase by computing offset from original base pointer
3. Use k_row_stride/v_row_stride passed from kernel args

This fixes the bug where successive rebases would compound, since each
rebase modified buf.p_data_ without tracking the original base.

Key insight: separate long_index_t variables for block_offset and
elem_offset avoid compiler type promotion issues that caused assembly
errors when computing the total offset in a single expression.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
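
The rebasing scheme described in this message can be sketched roughly as follows. This is a minimal host-side illustration, not the code from the commit: BufferView, RebasingView, total_offset, rows_per_block and row_in_block are hypothetical names, and the way block_offset and elem_offset are derived from the indices is an assumption. Only buf.p_data_, long_index_t, block_offset, elem_offset and the row stride come from the message itself.

```cpp
#include <cassert>
#include <cstdint>

using index_t      = std::int32_t; // 32-bit index type (assumed definition)
using long_index_t = std::int64_t; // 64-bit index type (assumed definition)

// Hypothetical stand-in for a buffer view whose data pointer gets rebased.
struct BufferView
{
    const float* p_data_;
};

// Total element offset, built from two separate 64-bit intermediates
// rather than one combined expression. The formulas are assumptions.
inline long_index_t total_offset(index_t block_index,
                                 index_t rows_per_block,
                                 index_t row_in_block,
                                 index_t row_stride)
{
    assert(block_index >= 0 && row_in_block >= 0);
    // Widen the first operand so every multiplication is done in 64 bits.
    const long_index_t block_offset =
        static_cast<long_index_t>(block_index) * rows_per_block * row_stride;
    const long_index_t elem_offset =
        static_cast<long_index_t>(row_in_block) * row_stride;
    return block_offset + elem_offset;
}

struct RebasingView
{
    const float* base_; // original pointer, saved once and never modified
    BufferView buf_;    // view whose p_data_ is rewritten on each rebase
    index_t row_stride_;

    RebasingView(const float* base, index_t row_stride)
        : base_(base), buf_{base}, row_stride_(row_stride)
    {
    }

    // Always rebase relative to base_, never relative to buf_.p_data_,
    // so repeated calls cannot accumulate offsets.
    void rebase(index_t block_index, index_t rows_per_block, index_t row_in_block)
    {
        buf_.p_data_ =
            base_ + total_offset(block_index, rows_per_block, row_in_block, row_stride_);
    }
};
```

Calling rebase twice with the same arguments leaves p_data_ at the same address, which is the property the fix relies on. Had rebase added the offset to buf_.p_data_ instead of base_, the second call would have moved the pointer again.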