[GFX1250][CK_TILE] Add scale16 (ScaleBlockSize=16) support to MX GEMM TDM pipeline (#8202) Enables `ScaleBlockSize=16` end-to-end for the FP8/BF8 MX GEMM TDM pipeline, building on the scale16 warp-gemm layer already in develop. - **warp gemm:** add the 32x32x128 f8f6f4 scale16 traits and alias (2x2 grid of 16x16x128 scale16 intrinsic calls with per-subtile `SCALE_OPSEL`), and route 32x32 f8f6f4 through the dispatcher's `IsScale16` path. - **default policy:** select the warp gemm via the dispatcher with `IsScale16=(ScaleBlockSize==16)` so `WarpTile=16` and `WarpTile=32` each pick the matching scale16 path; guard WarpTile M/N to 16 or 32; scale-tile distribution for the scale16 layout. - **pipeline V1/V2:** thread `Problem::ScaleBlockSize` through the scale-window setup (replacing the hardcoded 32); expose `ScaleBlockSize` for the kernel. - **block gemm:** extract int64 (scale16) / int32 (scale32) scales by width. - **kernel:** scale16 descriptor order; reject unsupported `BlockScaleSize`. Test coverage for this path is in the stacked follow-up PR.
CK Tile Testing Guide
This document describes the test organization and available test targets for CK Tile operations.
Overview
CK Tile tests are organized with multiple levels of granularity to support different development workflows:
- Global test labels - Run tests across all operations
- Operation-specific umbrella targets - Run all tests for a specific operation
- Individual test executables - Run specific tests
Global Test Labels
These targets run tests across all CK operations (not just CK Tile):
ninja smoke
Run fast smoke tests (tests that complete within ~30 seconds on gfx90a).
ninja smoke
ninja regression
Run slower, more comprehensive regression tests.
ninja regression
ninja check
Run ALL available tests in the entire codebase.
ninja check
Operation-Specific Umbrella Targets
These targets allow you to run all tests for a specific CK Tile operation. This is useful when making changes to a particular operation and wanting to validate all related tests without running the entire test suite.
GEMM Operations
ck_tile_gemm_tests
Run all basic GEMM pipeline tests (memory, compute variants, persistent, etc.)
ninja ck_tile_gemm_tests
Test executables included:
test_ck_tile_gemm_pipeline_memtest_ck_tile_gemm_pipeline_compv3test_ck_tile_gemm_pipeline_compv4test_ck_tile_gemm_pipeline_persistenttest_ck_tile_gemm_pipeline_compv6test_ck_tile_gemm_pipeline_comp_async(gfx95 only)test_ck_tile_gemm_pipeline_*_wmmavariants (gfx11/gfx12 only)
ck_tile_gemm_block_scale_tests
Run all GEMM tests with block-scale quantization (AQuant, BQuant, ABQuant, etc.)
ninja ck_tile_gemm_block_scale_tests
Test executables included: 29 test executables covering:
- AQuant tests (memory pipelines, base layouts, prefill, preshuffle, transpose)
- ABQuant tests (base, padding, preshuffle)
- BQuant tests (1D/2D variants, transpose)
- BQuant with PreshuffleB (decode/prefill, 1D/2D)
- BQuant with PreshuffleQuant (decode/prefill, 1D/2D)
- RowColQuant and TensorQuant tests
ck_tile_gemm_streamk_tests
Run all GEMM StreamK tests (tile partitioner, reduction, smoke, extended)
ninja ck_tile_gemm_streamk_tests
Test executables included:
test_ck_tile_streamk_tile_partitionertest_ck_tile_streamk_reductiontest_ck_tile_streamk_smoketest_ck_tile_streamk_extended
ck_tile_grouped_gemm_quant_tests
Run all grouped GEMM quantization tests
ninja ck_tile_grouped_gemm_quant_tests
Test executables included:
test_ck_tile_grouped_gemm_quant_rowcoltest_ck_tile_grouped_gemm_quant_tensortest_ck_tile_grouped_gemm_quant_aquanttest_ck_tile_grouped_gemm_quant_bquanttest_ck_tile_grouped_gemm_quant_bquant_preshuffleb
Other Operations
ck_tile_fmha_tests
Run all FMHA (Flash Multi-Head Attention) tests
ninja ck_tile_fmha_tests
Test executables included: Forward and backward tests for fp16, bf16, fp8bf16, fp32
ck_tile_reduce_tests
Run all reduce operation tests
ninja ck_tile_reduce_tests
Test executables included:
test_ck_tile_reduce2dtest_ck_tile_multi_reduce2d_threadwisetest_ck_tile_multi_reduce2d_multiblock
Individual Test Executables
You can also build and run individual test executables:
Build a specific test
ninja test_ck_tile_gemm_pipeline_mem
Run a specific test directly
./build/bin/test_ck_tile_gemm_pipeline_mem
Run a specific test through ctest
ctest -R test_ck_tile_gemm_pipeline_mem --output-on-failure