Files
composable_kernel/include
lalala-sh 35231e98e8 [CK] use int64 for ptr offset (#5094)
## Motivation

When the number of experts (E) is large (e.g., E=257 in DeepSeek-V3),
the `expert_id * expert_stride` calculation in MOE GEMM kernels
overflows `int32` (`index_t`), causing the weight matrix (B) pointer to
wrap to an invalid address and triggering a GPU memory access fault.

For example, with `N=1024, K=7168, IsInputGemm=true`:
- `expert_stride = N * K * 2 = 14,680,064`
- `INT32_MAX / expert_stride ≈ 146`
- Any `expert_id >= 147` causes overflow → negative offset → illegal
memory access → GPU crash

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
Co-authored-by: amd-shiraz <shiraz.ali@amd.com>
2026-03-05 18:00:01 -08:00
..
2026-03-05 18:00:01 -08:00