mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-04 13:41:24 +00:00

Files

Anton Gorenko 2312eef6c3 [rocm-libraries] ROCm/rocm-libraries#4368 (commit 17f7dfc)

[CK_TILE][FMHA] Support microscaling (mxfp8 and mxfp4) on
 gfx950 (#4368)

## Motivation

Add support for microscaling types (mxfp8 and mxfp4) to the fwd "qr" pipeline.

## Technical Details

Microscaling is used when the quant scale mode is
`BlockAttentionQuantScaleEnum::MX` and `Q/K/P/VDataType` are
fp8/bf8/fp4.

Supported features:
* only the "qr" pipeline is implemented
* hdim 128 and 256 (smaller hdims are not possible due to restrictions of
  the "qr" pipeline, but they can be computed using instances with padding)
* both 32x32x64 and 16x16x128 scale MFMAs are supported
* Q and K scales are applied in the hdim dimension, V scales in the seqlen dimension
* column-major V only
* batch and group mode
* bias, Alibi (tested but no instances by default, just like fp8)
* masking etc.
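
To illustrate what "scales are applied in a dimension" means in the list above, here is a minimal host-side sketch; it is not the ck_tile implementation. It assumes the OCP MX convention, which the commit message does not state: one E8M0 scale byte is shared by 32 consecutive elements and decodes to 2^(e - 127). The names `mx_block_size`, `decode_e8m0`, and `apply_mx_scales` are hypothetical, and the fp8/fp4 element decoding is omitted (elements are already plain `float`).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption (OCP MX convention, not taken from the commit message):
// 32 consecutive elements share one E8M0 scale byte.
constexpr std::size_t mx_block_size = 32;

// Decode an E8M0 scale byte as 2^(e - 127).
// The NaN encoding (0xFF) is not handled in this sketch.
inline float decode_e8m0(std::uint8_t e)
{
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// Apply block scales along the contiguous (last) dimension of a
// row-major rows x cols matrix.
// `scales` has shape rows x (cols / mx_block_size).
inline std::vector<float> apply_mx_scales(const std::vector<float>& data,
                                          const std::vector<std::uint8_t>& scales,
                                          std::size_t rows,
                                          std::size_t cols)
{
    assert(cols % mx_block_size == 0);
    assert(data.size() == rows * cols);
    const std::size_t blocks_per_row = cols / mx_block_size;
    assert(scales.size() == rows * blocks_per_row);

    std::vector<float> out(data.size());
    for(std::size_t r = 0; r < rows; ++r)
    {
        for(std::size_t c = 0; c < cols; ++c)
        {
            // Every element in the same block of 32 uses the same scale.
            const std::uint8_t e = scales[r * blocks_per_row + c / mx_block_size];
            out[r * cols + c]    = data[r * cols + c] * decode_e8m0(e);
        }
    }
    return out;
}
```

Under these assumptions, Q and K stored as seqlen x hdim would call `apply_mx_scales(q, q_scales, seqlen, hdim)`, so the blocks run along hdim. If column-major V is taken to mean hdim x seqlen storage with seqlen contiguous, the same helper called with `rows = hdim, cols = seqlen` places the blocks along seqlen.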

Aiter PR with new API args: https://github.com/ROCm/aiter/pull/2008

## Test Plan

```
ninja test_ck_tile_fmha_fwd_mxfp8 && bin/test_ck_tile_fmha_fwd_mxfp8
ninja test_ck_tile_fmha_fwd_mxfp4 && bin/test_ck_tile_fmha_fwd_mxfp4
```

## Test Result

The tests must pass.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

2026-03-11 10:00:52 +00:00

algorithm

[rocm-libraries] ROCm/rocm-libraries#4797 (commit 1a30400)

2026-03-04 21:50:29 +00:00

arch

[rocm-libraries] ROCm/rocm-libraries#4368 (commit 17f7dfc)

2026-03-11 10:00:52 +00:00

container

[rocm-libraries] ROCm/rocm-libraries#4594 (commit 1fce4cb)

2026-03-10 20:12:43 +00:00

numeric

[Compiler] Addressing new compiler warnings (#3640)

2026-02-02 09:39:48 -08:00

tensor

[rocm-libraries] ROCm/rocm-libraries#4594 (commit 1fce4cb)

2026-03-10 20:12:43 +00:00

utility

[rocm-libraries] ROCm/rocm-libraries#4797 (commit 1a30400)

2026-03-04 21:50:29 +00:00

config.hpp

[rocm-libraries] ROCm/rocm-libraries#5088 (commit 36ca523)

2026-03-10 16:47:43 +00:00

README.md

introducing ck_tile! (#1216)

2024-04-15 19:27:12 -05:00

README.md

ck_tile/core

ck_tile/core contains the basic functions and structures needed to create a GPU kernel using ck_tile. Users should include only the single header ck_tile/core.hpp to use all of the functionality. Everything is under the ck_tile namespace. The coding style in this folder should be similar to std: snake_case for structures and functions, CamelCase for template types, and so on.
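
As a rough illustration of the naming convention described above, the sketch below uses snake_case for a structure and its functions and CamelCase for template parameters. The `demo` namespace, `static_array`, and `sum_elements` are hypothetical names made up for this example; they are not part of ck_tile, and the snippet does not include any ck_tile header.

```cpp
#include <cassert>
#include <cstddef>

namespace demo { // hypothetical namespace, stands in for ck_tile

// snake_case structure name, CamelCase template parameters
template <typename DataType, std::size_t NumElements>
struct static_array
{
    DataType data_[NumElements];

    // snake_case member functions, in the style of std containers
    constexpr std::size_t size() const { return NumElements; }

    const DataType& at(std::size_t i) const
    {
        assert(i < NumElements);
        return data_[i];
    }
};

// snake_case free function, CamelCase template parameters
template <typename DataType, std::size_t NumElements>
DataType sum_elements(const static_array<DataType, NumElements>& a)
{
    DataType total = DataType{};
    for(std::size_t i = 0; i < a.size(); ++i)
    {
        total += a.at(i);
    }
    return total;
}

} // namespace demo
```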

algorithm/
    coordinate transforms and other reusable algorithms
arch/
    basic device building blocks such as mma, buffer addressing, etc.
container/
    basic container data structures: array, sequence, tuple, ...
numeric/
    data types and data-type-related math
tensor/
    tensor descriptors and the tile-level API
utility/
    other utility functions for both host and device