Files
composable_kernel/include
Johannes Graner b7d59e4b5f [rocm-libraries] ROCm/rocm-libraries#8099 (commit fc4894b)
[CK Tile] Fix Stream-K flag store: wave-uniform SGPR address
 for scalar s_store/s_load (#8099)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Motivation

Stream-K grouped-conv (and GEMM) kernels fail to assemble for some
instances: the inline scalar flag store/load gets a VGPR address
operand, which scalar-memory instructions reject (`invalid operand for
instruction`). This blocks Stream-K instances from building.

## Technical Details

- `StreamKReductionOps::{Signal,Wait}StorePartialDone` (shared by GEMM
and conv, added in #5393) take `kargs` by `const&` and feed
`kargs.workspace_ptr` / `cta_idx` into inline
`s_store_dword`/`s_load_dword` with `"s"` constraints. For some
instantiations the compiler can't keep the pointer wave-uniform and
emits a VGPR address.
- Fix: route the pointer and offset through `amd_wave_read_first_lane`
so the scalar-memory address is a wave-uniform SGPR before the asm. Same
instructions, no algorithm change.
- Not arch-specific: the affected instance fails on
gfx908/gfx90a/gfx942/gfx950 without the fix; whether the compiler spills
to a VGPR depends on the instantiation (tile/warp/pipeline), not the
target.

## Test Plan

- Compile the previously-failing dispatcher instance for
gfx908/gfx90a/gfx942/gfx950.
- `test_ck_tile_grouped_conv_bwd_weight_streamk` on gfx942, gfx90a,
gfx950 hardware.
- gfx950 perf A/B (example, bf16/tree, 10 runs each) with vs without the
change.

## Test Result

- Failing instance now assembles on all four archs; previously failed on
every one.
- 30/30 conv Stream-K tests pass on gfx942, gfx90a, gfx950.
- gfx950 perf delta -0.13% (within run-to-run noise) — no regression
from the added readfirstlane on the cold flag path.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 08:57:04 +00:00
..