mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-10 16:28:38 +00:00

Files

chris-tsiaousis-hpc db05d61136 [rocm-libraries] ROCm/rocm-libraries#6212 (commit ccee58d)

=?UTF-8?q?[CK=20TILE]=20Unification=20Work=20=E2=80=93=20?=
 =?UTF-8?q?More=20accurate=20tests=20for=20MmaPipelines=20(#6212)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Motivation

This PR solves several issues:

#### More accurate tests for MmaPipelines

The current tests for the MmaPipelines (test_amdgcn_sparse_mma,
test_amdgcn_wavewise_mma) use explicit input fragment vectors filled
with 1s, and only check the output of a single lane. We should have
tests that actually use the MmaPipelines with non-trivial input matrices
and verify the complete output.
Some other aspects of the current MmaPipelines tests that I noticed and
deserve some attention:

1. There is sometimes iteration over K outside of the pipeline, which is
then included in WaveTileK or FragK, which is not correct. We should
remove it, move K iteration inside of the pipeline, or be more clear
about this outer-K loop size and how it propagates downwards.
2. There is very tight coupling between the kernel, gtest code, and
test_pipeline helper, requiring a lot of information and functions to be
passed back and forth.
3. The test_pipeline helper is doing a bunch of register-related logic
on the host (related to point 1)
4. Without this register logic the only thing it does is check the
device, call the kernel, and check the output, but with a lot of
boilerplate.

#### Test helper for detecting target arch at HOST runtime

There is a really apparent issue we faced while writing tests:

Scenario:
1. Compile a test that supports both gfx950 and gfx1201 for gfx950
2. Run the test on a server that only has gfx1201 GPU

Actual:
Segmentation fault

Expected:
The test can correctly detect from HOST runtime that the DEVICE
target_id was different and skips the test.

Notes:

The only way of detecting the COMPILER_TARGET_ID in the existing "arch"
framework is launching a kernel and calling `get_compiler_target()` (so,
from a DEVICE code). This will create a segmentation fault if the
current arch differs from the target arch. To cope with this issue, we
propose to export the compiler target(s) (note they can be many) through
`projects/composablekernel/test/ck_tile/core/arch/CMakeLists.txt` and
define a test helper to deal with such cases.

#### Add composition support to Transforms

We have a small number of Transforms which act on MmaOp input and output
data, before and after the MmaOp call respectively. These are currently
implemented to work on an MmaTile level, but in theory they are also
supposed to work at a WaveTile level, i.e. after composition of multiple
MmaTiles to create larger effective MNK dimensions. Currently the
composed MmaTiles look like 2D C-style arrays of the individual MmaTile
level register vectors (see WaveWiseMmaPipeline). The transforms should
be able to take these and perform the proper transforms to the whole
WaveTile at once. This might allow for better performing
transformations.

Note: This PR handles the SparseTransform case and if we don't end up
doing scale as a transformation, there isn't really much left to do. If
we end up having only the sparse transform as a non-trivial transform,
then we could also consider removing the Transform framework.

2026-06-03 14:35:18 +00:00

add_rmsnorm2d_rdquant

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

atomic_add_op

Shuffle fix for gfx950 (#3491 )

2026-01-13 09:21:29 -08:00

batched_gemm

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

batched_transpose

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

container

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

core

[rocm-libraries] ROCm/rocm-libraries#6212 (commit ccee58d)

2026-06-03 14:35:18 +00:00

data_type

[rocm-libraries] ROCm/rocm-libraries#6088 (commit 6ac353c)

2026-05-20 12:36:13 +00:00

elementwise

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

epilogue

[rocm-libraries] ROCm/rocm-libraries#4302 (commit e62bd8a)

2026-03-19 10:17:20 +01:00

flatmm

[rocm-libraries] ROCm/rocm-libraries#7359 (commit dd62f9f)

2026-05-29 17:02:45 +00:00

fmha

[rocm-libraries] ROCm/rocm-libraries#7500 (commit f5cd4fd)

2026-06-03 06:16:10 +00:00

gemm

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

gemm_block_scale

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

gemm_multi_abd

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

gemm_multi_d

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

gemm_mx

[rocm-libraries] ROCm/rocm-libraries#7724 (commit 4cb149a)

2026-05-29 18:45:13 +00:00

gemm_persistent_async_input

[rocm-libraries] ROCm/rocm-libraries#5863 (commit 31d9247)

2026-04-14 20:22:18 +00:00

gemm_streamk

[rocm-libraries] ROCm/rocm-libraries#7584 (commit 060bad5)

2026-05-29 21:36:49 +00:00

gemm_tile_engine

[rocm-libraries] ROCm/rocm-libraries#4769 (commit 72ae66e)

2026-04-14 10:50:24 -07:00

gemm_weight_preshuffle

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

grouped_conv

[rocm-libraries] ROCm/rocm-libraries#7936 (commit 3dc91e6)

2026-06-03 08:40:03 +00:00

grouped_gemm

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

grouped_gemm_abquant

[CK_Tile] Adding support for preshuffleQuant in AB quant Block Scale Gemm (#3629 )

2026-01-28 19:45:09 -08:00

grouped_gemm_multi_d

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

grouped_gemm_mx

[rocm-libraries] ROCm/rocm-libraries#5552 (commit 369c7a2)

2026-05-19 11:53:19 -07:00

grouped_gemm_preshuffle

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

grouped_gemm_quant

[rocm-libraries] ROCm/rocm-libraries#7537 (commit 07123f4)

2026-05-21 08:36:23 -07:00

image_to_column

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

layernorm2d

[rocm-libraries] ROCm/rocm-libraries#5045 (commit 64a5502)

2026-03-03 13:54:08 -08:00

memory_copy

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

moe_smoothquant

[rocm-libraries] ROCm/rocm-libraries#6302 (commit 8d419e8)

2026-04-10 11:17:11 -04:00

moe_sorting

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

multicast_load

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

permute

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

pooling

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

pooling_tile_engine

[rocm-libraries] ROCm/rocm-libraries#4469 (commit 0844cb0)

2026-04-01 07:31:46 +00:00

reduce

[rocm-libraries] ROCm/rocm-libraries#4301 (commit 0821c9f)

2026-03-03 07:39:32 -08:00

rmsnorm2d

[rocm-libraries] ROCm/rocm-libraries#5045 (commit 64a5502)

2026-03-03 13:54:08 -08:00

slice_tile

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

smoothquant

[rocm-libraries] ROCm/rocm-libraries#6302 (commit 8d419e8)

2026-04-10 11:17:11 -04:00

tdm

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

topk_softmax

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313 )

2025-11-28 13:49:54 -08:00

utility

[rocm-libraries] ROCm/rocm-libraries#7724 (commit 4cb149a)

2026-05-29 18:45:13 +00:00

warp_gemm

[rocm-libraries] ROCm/rocm-libraries#7830 (commit 590fe58)

2026-06-02 13:54:16 +00:00

CMakeLists.txt

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

README.md

[rocm-libraries] ROCm/rocm-libraries#4301 (commit 0821c9f)

2026-03-03 07:39:32 -08:00

README.md

CK Tile Testing Guide

This document describes the test organization and available test targets for CK Tile operations.

Overview

CK Tile tests are organized with multiple levels of granularity to support different development workflows:

Global test labels - Run tests across all operations
Operation-specific umbrella targets - Run all tests for a specific operation
Individual test executables - Run specific tests

Global Test Labels

These targets run tests across all CK operations (not just CK Tile):

`ninja smoke`

Run fast smoke tests (tests that complete within ~30 seconds on gfx90a).

ninja smoke

`ninja regression`

Run slower, more comprehensive regression tests.

ninja regression

`ninja check`

Run ALL available tests in the entire codebase.

ninja check

Operation-Specific Umbrella Targets

These targets allow you to run all tests for a specific CK Tile operation. This is useful when making changes to a particular operation and wanting to validate all related tests without running the entire test suite.

GEMM Operations

`ck_tile_gemm_tests`

Run all basic GEMM pipeline tests (memory, compute variants, persistent, etc.)

ninja ck_tile_gemm_tests

Test executables included:

test_ck_tile_gemm_pipeline_mem
test_ck_tile_gemm_pipeline_compv3
test_ck_tile_gemm_pipeline_compv4
test_ck_tile_gemm_pipeline_persistent
test_ck_tile_gemm_pipeline_compv6
test_ck_tile_gemm_pipeline_comp_async (gfx95 only)
test_ck_tile_gemm_pipeline_*_wmma variants (gfx11/gfx12 only)

`ck_tile_gemm_block_scale_tests`

Run all GEMM tests with block-scale quantization (AQuant, BQuant, ABQuant, etc.)

ninja ck_tile_gemm_block_scale_tests

Test executables included: 29 test executables covering:

AQuant tests (memory pipelines, base layouts, prefill, preshuffle, transpose)
ABQuant tests (base, padding, preshuffle)
BQuant tests (1D/2D variants, transpose)
BQuant with PreshuffleB (decode/prefill, 1D/2D)
BQuant with PreshuffleQuant (decode/prefill, 1D/2D)
RowColQuant and TensorQuant tests

`ck_tile_gemm_streamk_tests`

Run all GEMM StreamK tests (tile partitioner, reduction, smoke, extended)

ninja ck_tile_gemm_streamk_tests

Test executables included:

test_ck_tile_streamk_tile_partitioner
test_ck_tile_streamk_reduction
test_ck_tile_streamk_smoke
test_ck_tile_streamk_extended

`ck_tile_grouped_gemm_quant_tests`

Run all grouped GEMM quantization tests

ninja ck_tile_grouped_gemm_quant_tests

Test executables included:

test_ck_tile_grouped_gemm_quant_rowcol
test_ck_tile_grouped_gemm_quant_tensor
test_ck_tile_grouped_gemm_quant_aquant
test_ck_tile_grouped_gemm_quant_bquant
test_ck_tile_grouped_gemm_quant_bquant_preshuffleb

Other Operations

`ck_tile_fmha_tests`

Run all FMHA (Flash Multi-Head Attention) tests

ninja ck_tile_fmha_tests

Test executables included: Forward and backward tests for fp16, bf16, fp8bf16, fp32

`ck_tile_reduce_tests`

Run all reduce operation tests

ninja ck_tile_reduce_tests

Test executables included:

test_ck_tile_reduce2d
test_ck_tile_multi_reduce2d_threadwise
test_ck_tile_multi_reduce2d_multiblock

Individual Test Executables

You can also build and run individual test executables:

Build a specific test

ninja test_ck_tile_gemm_pipeline_mem

Run a specific test directly

./build/bin/test_ck_tile_gemm_pipeline_mem

Run a specific test through ctest

ctest -R test_ck_tile_gemm_pipeline_mem --output-on-failure