11 Commits

Author SHA1 Message Date
jefyang1
276863ca87 [rocm-libraries] ROCm/rocm-libraries#8259 (commit df03f10)
Add cluster launch in test ck_tile mx gemm tdm wmma

## Motivation

Add a cluster launch test to test_ck_tile_mx_gemm_pipeline_tdm_wmma on
gfx1250, so that we can check the performance on gfx1250 hardware.

## Technical Details

Add an out-of-bounds guard in RunGemm of MxGemmKernel to skip blocks
padded by cluster alignment.
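
The guard can be pictured with a small host-side sketch. It assumes that the launch grid is rounded up to a whole number of clusters and that a padded block has no output tile to compute. `GridDim`, `padded_grid`, `block_in_bounds` and `count_active_blocks` are hypothetical names for illustration, not the `MxGemmKernel` code.

```cpp
#include <cassert>

// Hypothetical names throughout; this is not the CK Tile API.
struct GridDim
{
    int x;
    int y;
};

// Round n up to the next multiple of m (m > 0).
constexpr int round_up(int n, int m) { return (n + m - 1) / m * m; }

// Grid that is actually launched: tile counts padded to whole clusters.
constexpr GridDim padded_grid(GridDim tiles, GridDim cluster)
{
    return {round_up(tiles.x, cluster.x), round_up(tiles.y, cluster.y)};
}

// Check a kernel could evaluate first: a block that exists only because of
// padding has no tile to compute and should return immediately.
constexpr bool block_in_bounds(GridDim block, GridDim tiles)
{
    return block.x < tiles.x && block.y < tiles.y;
}

// Host-side simulation: count the blocks that pass the guard.
inline int count_active_blocks(GridDim tiles, GridDim cluster)
{
    assert(cluster.x > 0 && cluster.y > 0);
    const GridDim grid = padded_grid(tiles, cluster);
    int active         = 0;
    for(int by = 0; by < grid.y; ++by)
        for(int bx = 0; bx < grid.x; ++bx)
            if(block_in_bounds({bx, by}, tiles))
                ++active;
    return active;
}
```

With 5 x 3 tiles and a 2 x 2 cluster, the padded grid in this sketch is 6 x 4, and only the 15 blocks that map to a real tile pass the guard.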

Add ClusterEnable/ClusterDisable aliases and extend the tuple in
test_mx_gemm_pipeline_kernel_types.hpp with two ClusterEnable kernel
types, one for F8 CompTDMV1 and one for F8 CompTDMV2. The existing F4
non-ClusterLaunch kernel types have an issue that still needs to be
fixed, so this PR does not include F4 cases.

Read ClusterLaunch from the tuple in test_mx_gemm_pipeline_util.hpp.

Update invoke_mx_gemm to branch on ClusterLaunch: add cluster size
constants and switch the GemmShape type, the TilePartitioner type, and
the kernel launch call.
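
A compile-time branch of this kind might look like the sketch below, assuming C++17. Every type here (`ClusterGemmShape`, `PlainGemmShape`, `MxGemmConfig`, `LaunchPath`, `invoke_sketch`) and the cluster size constants are hypothetical stand-ins. Only the `ClusterEnable`/`ClusterDisable` names come from the PR text, and their definitions here are assumed.

```cpp
#include <cassert>
#include <type_traits>

// Assumed definitions; the PR only names these aliases.
struct ClusterEnable : std::true_type {};
struct ClusterDisable : std::false_type {};

// Hypothetical shape types standing in for the two GemmShape variants.
template <int ClusterM, int ClusterN>
struct ClusterGemmShape
{
    static constexpr int blocks_per_cluster = ClusterM * ClusterN;
};
struct PlainGemmShape
{
    static constexpr int blocks_per_cluster = 1;
};

enum class LaunchPath { Plain, Cluster };

template <typename ClusterLaunch>
struct MxGemmConfig
{
    // Illustrative cluster size constants, not the values used in the test.
    static constexpr int kClusterM = 2;
    static constexpr int kClusterN = 1;

    // Select the shape type from the ClusterLaunch tag.
    using GemmShape = std::conditional_t<ClusterLaunch::value,
                                         ClusterGemmShape<kClusterM, kClusterN>,
                                         PlainGemmShape>;
};

// Only the branch matching the tag is instantiated.
template <typename ClusterLaunch>
LaunchPath invoke_sketch()
{
    using Shape = typename MxGemmConfig<ClusterLaunch>::GemmShape;
    assert(Shape::blocks_per_cluster >= 1);
    if constexpr(ClusterLaunch::value)
        return LaunchPath::Cluster; // a cluster launch call would go here
    else
        return LaunchPath::Plain; // a regular launch call would go here
}
```

Selecting types with `std::conditional_t` and the launch call with `if constexpr` keeps both paths in one function template, so each kernel type in the tuple instantiates only the path it needs.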

## Test Plan

Tested the changes on gfx1250 FFM.

## Test Result

The added kernel types (instances) passed the tests on gfx1250 FFM.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-06-11 17:33:11 +00:00
Enrico Degregori
bdd7a8333d [rocm-libraries] ROCm/rocm-libraries#6672 (commit bda3f97)
[CK Tile] PermuteN support MX GEMM

## Motivation

Add PermuteN support to preshuffle MX GEMM

## Technical Details

 - Modify `shuffle_b_permuteN` to support MX preshuffled layout
 - Add `preShuffleScalePermuteN` with the same functionality as
   `preShuffleScale` but a layout consistent with PermuteN
 - Include MX pre-processing functions in the library

## Test Plan

Add test configuration for permuteN with preshuffle (both FP4 and FP8)

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Cong Ma <congma13@amd.com>
2026-06-05 03:04:43 +00:00
Aviral Goel
15c904b460 [rocm-libraries] ROCm/rocm-libraries#7724 (commit 4cb149a)
ck_tile: add FillUniformScaleDistribution and fix MX GEMM scale init (#7724)

## Summary

### Problem
MX GEMM pipeline tests were passing vacuously: scale bytes were drawn
from a fixed range (40–60) which, for e8m0, maps to scales ≈ 10⁻²⁷ — far
below FP16 min denorm. Both GPU and CPU produced all-zero outputs, so
numerical checks passed without exercising the GEMM.
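
The arithmetic behind this can be checked with a short sketch. It assumes the e8m0 encoding from the OCP Microscaling specification (8 exponent bits, no mantissa, bias 127, raw 0xFF reserved for NaN). The function names are illustrative and are not part of CK Tile.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumption: e8m0 decodes as 2^(raw - 127), with raw 0xFF reserved for NaN.
inline double e8m0_to_double(std::uint8_t raw)
{
    assert(raw != 0xFF);
    return std::ldexp(1.0, static_cast<int>(raw) - 127);
}

// Smallest positive FP16 subnormal: 2^-24.
inline double fp16_min_denorm() { return std::ldexp(1.0, -24); }

// True when every raw scale byte in [lo, hi] decodes below FP16 min denorm.
inline bool range_below_fp16_denorm(std::uint8_t lo, std::uint8_t hi)
{
    for(int b = lo; b <= hi; ++b)
        if(e8m0_to_double(static_cast<std::uint8_t>(b)) >= fp16_min_denorm())
            return false;
    return true;
}
```

Under this assumption, raw bytes 40 to 60 decode to 2^-87 through 2^-67, and every one of them is smaller than 2^-24.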

### Changes

**`include/ck_tile/host/fill.hpp`** — new
`FillUniformScaleDistribution<ScaleType>` functor
- Accepts human-readable float bounds and maps them to the raw byte
range of any ExMy scale type (e8m0, e4m3, e5m3) by re-centering the IEEE
754 exponent into the type's bias space
- Sampling is uniform over raw bytes → uniform over representable values
- Fixes left-shift UB: uses multiplication instead of `<< mant_bits` to
avoid shifting negative signed integers (C++17 UB)
- Adds `assert(min_r <= max_r)` to catch inverted-range UB when both
bounds exceed the type's representable range
- Provides default member values (0.125f, 2.0f) and `std::optional` seed
consistent with sibling fillers
- `/** */` Doxygen style with `@note` on snapping asymmetry
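
The mapping from float bounds to a raw byte range can be sketched for the e8m0 case alone. This is not the `FillUniformScaleDistribution` implementation: the helper names are hypothetical, the bias of 127 and the reserved raw value 0xFF follow the OCP Microscaling specification, and snapping both bounds inward (up for the minimum, down for the maximum) is an assumption made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper, e8m0 only. Returns the inclusive raw byte range whose
// decoded values 2^(raw - 127) lie inside [min_v, max_v].
inline std::pair<int, int> e8m0_raw_range(float min_v, float max_v)
{
    assert(min_v > 0.0f && min_v <= max_v);
    int lo = std::ilogb(min_v); // floor(log2(min_v))
    if(std::ldexp(1.0f, lo) < min_v)
        ++lo;                         // snap up so that 2^lo >= min_v
    const int hi    = std::ilogb(max_v); // snap down so that 2^hi <= max_v
    const int min_r = std::clamp(lo + 127, 0, 254);
    const int max_r = std::clamp(hi + 127, 0, 254);
    assert(min_r <= max_r); // an inverted range is UB for the distribution
    return {min_r, max_r};
}

// Fill n raw scale bytes, sampling uniformly over the raw range.
inline std::vector<std::uint8_t>
fill_e8m0(std::size_t n, float min_v, float max_v, std::uint32_t seed)
{
    const auto [lo, hi] = e8m0_raw_range(min_v, max_v);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(lo, hi);
    std::vector<std::uint8_t> out(n);
    for(auto& b : out)
        b = static_cast<std::uint8_t>(dist(gen));
    return out;
}
```

With the default bounds 0.125f and 2.0f, this sketch yields raw bytes 124 through 128. A range that contains no power of two, such as 3.0f to 3.5f, trips the assertion in this sketch.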

**`test/ck_tile/gemm_mx/test_mx_gemm_pipeline_util.hpp`** — fix scale
initialization
- Replace manual byte-range distribution with
`FillUniformScaleDistribution<>{0.125f, 2.0f}`
- Use distinct seeds for scale_a (11941) and scale_b (11943) to avoid
correlated scale tensors that were causing 60 test failures for
fp4+e5m3/e4m3 combinations
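
The effect of the seed choice can be illustrated with a standard-library generator. `draw_scale_bytes` and `count_matches` are hypothetical helpers, and the raw range 124 to 128 is an assumed example. The sketch does not use the CK Tile fillers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a seeded scale filler; produces raw bytes only.
inline std::vector<std::uint8_t> draw_scale_bytes(std::size_t n, std::uint32_t seed)
{
    assert(n > 0);
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(124, 128); // assumed raw range
    std::vector<std::uint8_t> out(n);
    for(auto& b : out)
        b = static_cast<std::uint8_t>(dist(gen));
    return out;
}

// Number of positions where two tensors hold the same raw byte.
inline std::size_t count_matches(const std::vector<std::uint8_t>& a,
                                 const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    std::size_t same = 0;
    for(std::size_t i = 0; i < a.size(); ++i)
        same += (a[i] == b[i]) ? 1 : 0;
    return same;
}
```

Two tensors drawn with the same seed match at every position, so scale_a and scale_b would be identical. Drawing them with different seeds, such as 11941 and 11943, removes that element-wise coupling.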

**`test/ck_tile/utility/test_fill.cpp`** — new unit tests for
`FillUniformScaleDistribution`
- 16 typed tests across e8m0, e4m3, e5m3: validity, range,
reproducibility, coverage, snapping, stress, nullopt seed, and range
overload
- Test helper `expected_raw_range` mirrors implementation clamping
exactly
2026-05-29 18:45:13 +00:00
JP-Fernando
74bc86240b [rocm-libraries] ROCm/rocm-libraries#5647 (commit 490437a)
[CK Tile] Add gemm universal preshuffle to MX GEMM (#5647)

## Motivation

Add gemm universal preshuffle support to existing MX GEMM pipeline.

The straightforward way to do this is to port the `mx_flatmm` pipeline
to the existing `gemm_mx` framework.

## Technical Details

The `mx_flatmm` pipeline was kept in place for backward compatibility.

## Test Plan

Add `preshuffle` option to example: `tile_example_mx_gemm`.

Add new configurations with enabled preshuffle to the existing
`test/ck_tile/gemm_mx` tests.

## Test Result

Example and tests ran successfully on the `gfx950` architecture in the
`Alola` cluster.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Fernando Jiménez <fernando.jimenez@streamhpc.com>
2026-05-22 16:07:53 +02:00
Illia Silin
e02c566795 [rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)
[CK] upgrade CI to rocm7.13 as default compiler (#7612)

## Motivation

Upgrade the default docker and compiler version in CI to rocm7.13.
To pass all the checks, I also had to clean up many non-ASCII
characters in the source code comments and modify a couple of tests
that were affected by new compiler logic.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Aviral Goel <aviral.goel@amd.com>
2026-05-22 02:43:50 +00:00
Enrico Degregori
9565ca21ec [rocm-libraries] ROCm/rocm-libraries#5552 (commit 369c7a2)
[CK Tile] Eight Waves pipeline for MX GEMM (#5552)

## Motivation

Integrate Eight Waves pipeline in MX GEMM

## Technical Details

 - EightWaves pipeline:
   - Add pipeline, policy and block gemm (internally using the existing
     implementation used by GEMM and ABQuant)
   - Extend support of EightWaves policy for FP4 (packed types)
 - Async pipeline:
   - Fix pipeline with packed scales (requires MRepeat and NRepeat to be
     contiguous)
   - Define a block gemm specific to MX GEMM because the distribution
     encodings have changed
 - CShuffle:
   - Add new functionality to support contiguous MRepeat and NRepeat
     (defined by `TilesPacked`)
 - Examples:
   - Refactor examples to easily switch between configurations (similar
     to GEMM universal)
   - Generate scale values consistently with other microscale
     implementations in CK Tile
   - Add configuration for EightWaves pipeline
 - Tests:
   - Unify existing FP8 and FP4 tests
   - Add tests for EightWaves pipeline
   - Generate scale values consistently with other microscale
     implementations in CK Tile

Note: FP6 support for MX GEMM was added later, so FP6 support for the
Eight Waves pipeline will be done in a following PR.

## Test Plan

Add new pipeline to tests: `test_ck_tile_mx_gemm_async` for both FP4 and
FP8

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-05-19 11:53:19 -07:00
Illia Silin
717f2efef7 [rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)
[CK] add composable kernel support on gfx1250 (#6978)

## Motivation

Add composable kernel support on gfx1250.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Qun Lin <qlin@amd.com>
Co-authored-by: jialuo12_amdeng <jia.luo@amd.com>
Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com>
Co-authored-by: hsivasun_amdeng <haresh.sivasuntharampillai@amd.com>
2026-05-15 06:46:51 -07:00
Linjun-AMD
7469320248 [rocm-libraries] ROCm/rocm-libraries#5849 (commit d9b89b2)
[CK_TILE] Revert "[CK_TILE] Enable MXFP6 for MX GEMM op (#5095)" (#5849)

This reverts commit 7e55766ddf7e9e20791b0e4e2d7b4026cf16b637.

## Motivation

<!-- Explain the purpose of this PR and the goals it aims to achieve.
-->

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-27 20:36:39 +00:00
Sami Remes
8f0ede3ea2 [rocm-libraries] ROCm/rocm-libraries#5095 (commit 7e55766)
[CK_TILE] Enable MXFP6 for MX GEMM op (#5095)

## Motivation

Add support for MXFP6 in the MX GEMM op in CK-Tile.

Depends on https://github.com/ROCm/rocm-libraries/pull/4594

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-19 18:07:47 -07:00
Thomas Ning
1ab29bf22f [rocm-libraries] ROCm/rocm-libraries#5323 (commit 5454e9e)
CK Tile MX GEMM Packing Improvement (#5323)

## Motivation

Reduce the scale loading size and make better use of MFMA scale
selection.

## Technical Details

Add up the packing of mx scales.

## Test Plan

Use the existing test cases.

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Sami Remes <samremes@amd.com>
Co-authored-by: Enrico Degregori <enrico@streamhpc.com>
2026-03-17 11:57:32 -07:00
Sami Remes
c1525b3f30 [rocm-libraries] ROCm/rocm-libraries#4594 (commit 1fce4cb)
[CK_TILE] MX GEMM non-preshuffled RCR layout (#4594)

## Motivation

Implement a GEMM with MX scaling for fp4 and fp8 in non-preshuffled
layouts using the async pipeline.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>
2026-03-10 20:12:05 +00:00