composable_kernel/test/ck_tile/CMakeLists.txt
Aviral Goel 1a4aa7fd89 [rocm-libraries] ROCm/rocm-libraries#5082 (commit 9313659)
ck_tile: add gtest unit tests for MX flatmm (gfx950)

## Summary

- Add correctness unit tests for the MX-format flatmm kernel
(`example/ck_tile/18_flatmm/mxgemm`) under `test/ck_tile/flatmm/`
- Tests cover all five dtype combinations: FP4×FP4, FP8×FP8, FP6×FP6,
FP8×FP4, FP4×FP8
- Tests cover all four kernel dispatch paths (the `has_hot_loop` ×
`tail_num` product):
  - `has_hot_loop=false, tail=ODD` (K=256, num_loop=1)
  - `has_hot_loop=false, tail=EVEN` (K=512, num_loop=2)
  - `has_hot_loop=true, tail=ODD` (K=768, num_loop=3)
  - `has_hot_loop=true, tail=EVEN` (K=1024, num_loop=4)
- Remove unsupported `-split_k` CLI option from
`tile_example_mx_flatmm`; the pre-shuffled B layout is incompatible with
K-splitting and the option silently produced wrong results
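
The four dispatch cases listed above follow a simple pattern, which the sketch below reproduces as a standalone CMake script. It is illustrative only: the value of 256 K-elements per loop iteration and the `num_loop > 2` threshold for `has_hot_loop` are assumptions inferred from the four K/num_loop pairs in this summary, not taken from the kernel source, and `K_PER_LOOP` is a hypothetical variable name.

```cmake
# Illustrative sketch; run with: cmake -P dispatch_sketch.cmake
# ASSUMPTION: 256 K-elements per loop iteration (inferred from K=256 -> num_loop=1).
set(K_PER_LOOP 256)

foreach(K IN ITEMS 256 512 768 1024)
  math(EXPR num_loop "${K} / ${K_PER_LOOP}")
  math(EXPR is_odd "${num_loop} % 2")

  # ASSUMPTION: the hot loop is taken only when num_loop exceeds 2,
  # which matches the four cases listed in the summary.
  if(num_loop GREATER 2)
    set(has_hot_loop true)
  else()
    set(has_hot_loop false)
  endif()

  # The tail case follows the parity of num_loop.
  if(is_odd)
    set(tail ODD)
  else()
    set(tail EVEN)
  endif()

  message(STATUS "K=${K} num_loop=${num_loop} has_hot_loop=${has_hot_loop} tail=${tail}")
endforeach()
```

Under these assumptions, the four K values chosen by the tests land on four distinct `has_hot_loop`/`tail` combinations, one per dispatch path.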

## Changes

**New files (`test/ck_tile/flatmm/`):**
- `CMakeLists.txt` — builds 40 kernel instances as a shared OBJECT
library, links into 5 per-dtype test executables; forwards
`-DCK_TILE_USE_OCP_FP8` when `CK_USE_OCP_FP8` is ON
- `test_mx_flatmm_base.hpp` — base test fixture with
`run_test_with_validation(M, N, K, kbatch=1)`
- `test_mx_flatmm_fixtures.hpp` — concrete `TestMXFlatmm` typed test
class and type aliases
- `test_mx_flatmm_fp{4fp4,8fp8,6fp6,8fp4,4fp8}.cpp` — per-dtype
`TYPED_TEST_SUITE` files
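
The OBJECT-library arrangement described for `CMakeLists.txt` above can be sketched roughly as follows. This is a minimal fragment using plain CMake commands, not the contents of the actual file: the target name `mx_flatmm_instances`, the variable `MX_FLATMM_INSTANCE_SOURCES`, and the executable names are hypothetical, and the real file may use project-specific helper functions instead. It assumes an enclosing project in which `GTest::gtest_main` is already available.

```cmake
# Hypothetical names throughout; a sketch of the pattern, not the real file.

# Compile the kernel instances once as an OBJECT library so that the
# per-dtype test executables can reuse the same object files.
# MX_FLATMM_INSTANCE_SOURCES is assumed to list the instance .cpp files.
add_library(mx_flatmm_instances OBJECT ${MX_FLATMM_INSTANCE_SOURCES})

# Forward the OCP FP8 definition when the project-level option is ON.
if(CK_USE_OCP_FP8)
  target_compile_definitions(mx_flatmm_instances PRIVATE CK_TILE_USE_OCP_FP8)
endif()

# One test executable per dtype combination, each pulling in the shared objects.
foreach(dtype IN ITEMS fp4fp4 fp8fp8 fp6fp6 fp8fp4 fp4fp8)
  add_executable(test_mx_flatmm_${dtype}
    test_mx_flatmm_${dtype}.cpp
    $<TARGET_OBJECTS:mx_flatmm_instances>)
  target_link_libraries(test_mx_flatmm_${dtype} PRIVATE GTest::gtest_main)
endforeach()
```

Building the instances once keeps compile time down, since each kernel instance would otherwise be recompiled for every test executable that needs it.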

**Modified files:**
- `example/ck_tile/18_flatmm/mxgemm/mx_flatmm_arch_traits.hpp` — moved
`preShuffleWeight` here (was in `mx_flatmm.cpp`) so it is includeable by
both the example and the tests
- `example/ck_tile/18_flatmm/mxgemm/mx_flatmm.cpp` / `run_mx_flatmm.inc`
— removed `-split_k` CLI arg, hardcoded `k_batch=1`, fixed `k_split`
formula, updated call sites after `preShuffleWeight` move
- `test/ck_tile/CMakeLists.txt` — added `add_subdirectory(flatmm)`
2026-03-11 22:47:59 +00:00

73 lines
2.9 KiB
CMake

# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
################################################################################
# CK Tile Test Organization
################################################################################
# CK Tile tests can be run using several methods:
#
# 1. Global test labels (run tests across all operations):
#    - ninja smoke      - Fast tests (~30s on gfx90a)
#    - ninja regression - Slower comprehensive tests
#    - ninja check      - All available tests
#
# 2. Operation-specific umbrella targets (run all tests for a specific operation):
#    - ninja ck_tile_gemm_tests               - All basic GEMM tests
#    - ninja ck_tile_gemm_block_scale_tests   - All GEMM with block-scale quantization tests
#    - ninja ck_tile_gemm_streamk_tests       - All GEMM StreamK tests
#    - ninja ck_tile_grouped_gemm_quant_tests - All grouped GEMM quantization tests
#    - ninja ck_tile_reduce_tests             - All reduce operation tests
#    - ninja ck_tile_fmha_tests               - All FMHA (Flash Attention) tests
#
# 3. Individual test executables:
#    - ninja test_<test_name>        - Build specific test executable
#    - ./build/bin/test_<test_name>  - Run specific test directly
#
# These umbrella targets are useful when working on specific operations to quickly
# validate all related tests without running the entire test suite.
################################################################################
add_subdirectory(image_to_column)
add_subdirectory(gemm)
add_subdirectory(gemm_persistent_async_input)
add_subdirectory(gemm_weight_preshuffle)
add_subdirectory(batched_gemm)
add_subdirectory(grouped_gemm)
add_subdirectory(grouped_gemm_preshuffle)
add_subdirectory(grouped_gemm_multi_d)
add_subdirectory(grouped_gemm_quant)
add_subdirectory(grouped_gemm_abquant)
add_subdirectory(gemm_multi_d)
add_subdirectory(gemm_multi_abd)
add_subdirectory(gemm_streamk)
add_subdirectory(data_type)
add_subdirectory(container)
add_subdirectory(elementwise)
# Not including these tests as there is a bug on gfx90a and gfx942
# resulting in "GPU core dump"
#add_subdirectory(moe_smoothquant)
add_subdirectory(permute)
add_subdirectory(moe_sorting)
add_subdirectory(slice_tile)
add_subdirectory(memory_copy)
add_subdirectory(batched_transpose)
add_subdirectory(smoothquant)
add_subdirectory(topk_softmax)
add_subdirectory(add_rmsnorm2d_rdquant)
# add_subdirectory(layernorm2d)
# add_subdirectory(rmsnorm2d)
add_subdirectory(gemm_block_scale)
add_subdirectory(flatmm)
add_subdirectory(gemm_mx)
add_subdirectory(utility)
add_subdirectory(warp_gemm)
add_subdirectory(reduce)
add_subdirectory(core)
add_subdirectory(epilogue)
add_subdirectory(atomic_add_op)
add_subdirectory(fmha)
add_subdirectory(gemm_tile_engine)
add_subdirectory(pooling)
add_subdirectory(grouped_conv)
add_subdirectory(gemm_streamk_tile_engine)