composable_kernel/dispatcher/codegen/fmha
damien-lejeune 2c0b7cbb0a [rocm-libraries] ROCm/rocm-libraries#8424 (commit debb669)
Add missing constraint in the FMHA qr async pipeline to enforce bk0=bk1 (#8424)

## Motivation

This change adds a guardrail on the values that bk0 and bk1 can take.
It prevents ill-defined tile sizes that fail silently and produce NaN
(or other errors) at runtime.

An example of such failure can be obtained using the tile engine:

```
cd rocm-libraries/projects/composablekernel/tile_engine/ops/fmha
python fmha_benchmark.py configs/batch_prefill.json \
  --problems "1,4,1,8000,8000,256" \
  --filter "c.data_type=='bf16' and c.hdim_q==256 and c.pipeline=='qr_async' and c.mode=='group' and c.tile_n0==32 and c.tile_k0==64"
```

## Technical Details

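The qr_async pipeline stages data along the K dimension into LDS using a descriptor sized by bk1, while gemm0 (Q*K^T) consumes that data in bk0-sized steps. When bk0 and bk1 differ, the staged and consumed sizes do not match, so the constraint now requires bk0 == bk1 for this pipeline.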
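
As a rough illustration of this kind of guardrail, the sketch below filters candidate tile configurations before any instance is generated. It is not the actual codegen: `TileConfig`, `is_valid_tile`, and the sample values are hypothetical names and numbers chosen for the example, and only the rule itself (bk0 must equal bk1 for `qr_async`) comes from this change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical stand-in for a codegen tile description."""
    pipeline: str
    bk0: int  # K-step consumed by gemm0 (Q*K^T)
    bk1: int  # size used by the LDS staging descriptor


def is_valid_tile(cfg: TileConfig) -> bool:
    """Reject qr_async tiles whose bk0 and bk1 disagree."""
    if cfg.pipeline == "qr_async" and cfg.bk0 != cfg.bk1:
        return False
    return True


# Illustrative candidates only; these are not taken from the real config files.
candidates = [
    TileConfig("qr_async", bk0=32, bk1=32),
    TileConfig("qr_async", bk0=64, bk1=32),
    TileConfig("qr", bk0=64, bk1=32),
]

# Only configurations that pass the check would go on to instance generation.
kept = [c for c in candidates if is_valid_tile(c)]
```

Filtering at generation time, rather than checking at launch, matches the test result reported for this change: the invalid combination yields no instance instead of a kernel that fails at runtime.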

## Test Plan

Run the command shown in the Motivation section.

## Test Result

Before this change: invalid instances are generated and fail at runtime.
After this change: no instance is generated for the invalid configuration.

## Submission Checklist

- [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Damien Lejeune <damien.lejeune@amd.com>
2026-06-16 07:41:58 +00:00