composable_kernel/test at 2312eef6c36d2811b1f57c85c8ae4c58a595be9e - composable_kernel - Public git mirror

ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-05 20:55:59 +00:00

Anton Gorenko 2312eef6c3 [rocm-libraries] ROCm/rocm-libraries#4368 (commit 17f7dfc)

[CK_TILE][FMHA] Support microscaling (mxfp8 and mxfp4) on gfx950 (#4368)

## Motivation

Add support for microscaling types (mxfp8 and mxfp4) to the fwd "qr" pipeline.

## Technical Details

Microscaling is used when the quant scale mode is
`BlockAttentionQuantScaleEnum::MX` and `Q/K/P/VDataType` are
fp8/bf8/fp4.

Supported features:
* only the "qr" pipeline is implemented
* hdim 128 and 256 (smaller hdims are not possible due to restrictions of
  the "qr" pipeline, but they can be computed using instances with padding)
* both 32x32x64 and 16x16x128 scale MFMAs are supported
* Q and K scales are applied in the hdim dimension, V scales in the seqlen dimension
* column-major V only
* batch and group mode
* bias, Alibi (tested but no instances by default, just like fp8)
* masking etc.
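
The scale directions listed above can be illustrated with a small host-side sketch. It is not the CK Tile implementation. It assumes the OCP MX convention of one E8M0 scale (value 2^(e - 127)) per block of 32 elements, and the scale tensor layouts, function names, and `kMxBlockSize` are hypothetical, chosen only to show which index selects the scale. Presumably the scales follow the contraction dimension of each GEMM: hdim for Q·Kᵀ and seqlen for P·V.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Elements sharing one scale (block size of 32 from the OCP MX format; assumed here).
constexpr std::size_t kMxBlockSize = 32;

// Decode an E8M0 scale: an 8-bit biased exponent (bias 127), value 2^(e - 127).
// 0xFF is reserved for NaN in the MX format.
inline float decode_e8m0(std::uint8_t e)
{
    if(e == 0xFF)
        return std::nanf("");
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// Q/K style: one scale per 32 consecutive hdim elements of a row.
// Hypothetical layout: scales[row][hdim / 32], stored row-major.
inline float scale_along_hdim(const std::vector<std::uint8_t>& scales,
                              std::size_t hdim,
                              std::size_t row,
                              std::size_t col)
{
    assert(hdim % kMxBlockSize == 0);
    return decode_e8m0(scales[row * (hdim / kMxBlockSize) + col / kMxBlockSize]);
}

// V style: one scale per 32 consecutive seqlen positions of an hdim_v column.
// Hypothetical layout: scales[seqlen / 32][hdim_v], stored row-major.
inline float scale_along_seqlen(const std::vector<std::uint8_t>& scales,
                                std::size_t hdim_v,
                                std::size_t row,
                                std::size_t col)
{
    return decode_e8m0(scales[(row / kMxBlockSize) * hdim_v + col]);
}
```

With these helpers, a dequantized Q element would be its stored fp8/fp4 value multiplied by `scale_along_hdim(...)`, and a V element by `scale_along_seqlen(...)`; the actual kernel passes the scales to the scaled MFMA instructions instead of dequantizing on the host.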

Aiter PR with new API args: https://github.com/ROCm/aiter/pull/2008

## Test Plan

```
ninja test_ck_tile_fmha_fwd_mxfp8 && bin/test_ck_tile_fmha_fwd_mxfp8
ninja test_ck_tile_fmha_fwd_mxfp4 && bin/test_ck_tile_fmha_fwd_mxfp4
```

## Test Result

The tests must pass.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

2026-03-11 10:00:52 +00:00

batched_contraction

Implement batched gemm bias permute for RDNA4 (#3534)

2026-01-17 08:30:27 +01:00

[rocm-libraries] ROCm/rocm-libraries#4415 (commit b3b4af7)

2026-02-25 23:23:02 +00:00

batched_gemm_b_scale

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

batched_gemm_gemm

[rocm-libraries] ROCm/rocm-libraries#4415 (commit b3b4af7)

2026-02-25 23:23:02 +00:00

batched_gemm_multi_d

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

batched_gemm_multiple_d_gemm_multiple_d

Implement batched gemm add relu gemm add for rdna4 (#3391)

2026-01-20 13:06:59 -08:00

batched_gemm_reduce

WMMA support for batched_gemm_reduce (#3332)

2026-01-20 10:50:46 +01:00

batched_gemm_softmax_gemm

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

batched_gemm_softmax_gemm_permute

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

block_swizzle_test

chore(copyright): update copyright header for test directory

2025-11-19 17:43:28 -07:00

block_to_ctile_map

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[rocm-libraries] ROCm/rocm-libraries#4368 (commit 17f7dfc)

2026-03-11 10:00:52 +00:00

Add grouped convnd dataset tests for bwd_data, bwd_weight and make them parallel (#3380)

2025-12-15 13:38:25 +01:00

Add support to fp16 + compute fp16 and bf16 + compute bf16 contractions (#3598)

2026-01-20 09:39:57 -08:00

conv_tensor_rearrange

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

convnd_bwd_data

[CK tests] Extend conv GPU reference (#3539)

2026-01-27 09:49:42 +01:00

[CK tests] Extend conv GPU reference (#3539)

2026-01-27 09:49:42 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[CK_BUILDER] Integrate CKB validation with CK verification (#3649)

2026-01-28 17:41:02 +01:00

elementwise_normalization

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

Wmma support for gemm_ab_scale (#3314)

2025-12-11 09:06:20 +01:00

[rocm-libraries] ROCm/rocm-libraries#4415 (commit b3b4af7)

2026-02-25 23:23:02 +00:00

[rocm-libraries] ROCm/rocm-libraries#4415 (commit b3b4af7)

2026-02-25 23:23:02 +00:00

gemm_bias_add_reduce

Wmma support for gemm_bias_add_reduce (#3316)

2026-01-07 10:27:16 -08:00

gemm_blockscale_wp

[rocm-libraries] ROCm/rocm-libraries#4372 (commit 738ffd7)

2026-02-07 00:09:58 +00:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[rocm-libraries] ROCm/rocm-libraries#4415 (commit b3b4af7)

2026-02-25 23:23:02 +00:00

gemm_multiply_multiply_wp

Wmma support for gemm_multiply_multiply_wp (#3278)

2025-12-03 07:38:23 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[rocm-libraries] ROCm/rocm-libraries#4415 (commit b3b4af7)

2026-02-25 23:23:02 +00:00

gemm_universal_preshuffle

Implement device_gemm_universal_preshuffle_instance for RDNA4 (#3429)

2026-01-15 07:19:31 -08:00

gemm_universal_reduce

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

gemm_universal_streamk

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[CK tests] Extend conv GPU reference (#3539)

2026-01-27 09:49:42 +01:00

gpu_verification

[CK_BUILDER] Integrate CKB validation with CK verification (#3649)

2026-01-28 17:41:02 +01:00

grouped_convnd_bwd_data

[rocm-libraries] ROCm/rocm-libraries#4415 (commit b3b4af7)

2026-02-25 23:23:02 +00:00

grouped_convnd_bwd_weight

[rocm-libraries] ROCm/rocm-libraries#4872 (commit ca623f7)

2026-02-25 20:11:01 +00:00

grouped_convnd_fwd

[rocm-libraries] ROCm/rocm-libraries#4407 (commit adde219)

2026-02-11 13:43:01 +00:00

grouped_convnd_fwd_activation

Adding remaining conv, dynamic_op, and scaleadd_scaleadd_relu flavors for grouped conv fwd (#3529)

2026-01-30 17:02:14 +01:00

[rocm-libraries] ROCm/rocm-libraries#4340 (commit 70a312f)

2026-02-26 00:28:58 +00:00

grouped_gemm_tile_loop

Implement grouped gemm tile loop for RDNA4 (#3304)

2026-01-13 07:14:23 +01:00

magic_number_division

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

normalization_bwd_data

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

normalization_bwd_gamma_beta

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

normalization_fwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

position_embedding

[Compiler] Addressing new compiler warnings (#3640)

2026-02-02 09:39:48 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

fix some minor error (#3409)

2025-12-16 19:50:49 -08:00

reference_conv_fwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

space_filling_curve

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

threadwise_transfer_helper

[rocm-libraries] ROCm/rocm-libraries#4673 (commit ec385da)

2026-03-06 16:27:59 +00:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[rocm-libraries] ROCm/rocm-libraries#4828 (commit 7de19bb)

2026-02-28 20:11:11 +00:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

CMakeLists.txt

[rocm-libraries] ROCm/rocm-libraries#4368 (commit 17f7dfc)

2026-03-11 10:00:52 +00:00