composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-13 10:37:42 +00:00

Files

Aviral Goel 72a365b769 [CK_TILE] Fix FP8 MXGEMM numerical error in async load path (#4704 )

## Summary

Fixes FP8 MXGEMM producing half the expected result (e.g., 128 instead
of 256 with all 1s input).

**Bug introduced in:** `b7de1e14cea70681a23cd1a136df42910c776e4a` -
"[CK_TILE] Add blockscale GEMM support for EightWarps on gfx950 (#4280)"

  ## Root Cause

In the `static_move_ys=true` code path in `tile_window.hpp`, the IMM
optimization computes `lds_ys_offset` using a default-constructed tensor
descriptor:

  ```cpp
  make_tensor_coordinate(decltype(tensor_descriptor){}, idx_ys_offset)
```

  This default-constructed descriptor has different strides than the actual DRAM tensor descriptor used for dram_ys_offset. When these offsets are mixed in the address   calculation:

```cpp

  imm_valid = lds_ys_offset % IMM_RANGE;     // From wrong descriptor
  wave_offset = dram_ys_offset - imm_valid;  // From correct descriptor
```

  The final address wave_offset + imm_valid ≠ dram_ys_offset, causing incorrect memory accesses.

  Fix
```cpp
Set imm_valid = 0 to bypass the IMM optimization and ensure the full
offset is passed through wave_offset:

constexpr auto imm_valid = 0; // Avoids inconsistency between
lds_ys_offset and dram_ys_offset
```

  This disables the 12-bit immediate field optimization in the buffer_load_lds instruction but guarantees correctness. A proper fix would require making the DRAM tensor   descriptor constexpr, which is not feasible since tensor strides depend on runtime parameters (LDA, LDB).

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

2026-02-23 12:29:47 -08:00

Revert "[ck] Support VGPR estimate in GridwiseGemm_wmma_cshuffle_v3" (#4762 )

2026-02-20 22:40:28 +00:00

ck_tile

[CK_TILE] Fix FP8 MXGEMM numerical error in async load path (#4704 )

2026-02-23 12:29:47 -08:00

rapidjson

Update pre-commit to fixed versions, run remod for ck_tile (#2895 )

2025-10-16 15:29:17 -07:00