Files
composable_kernel/test/ck_tile
Brock Hargreaves 37f3c99ea0 [CK] Streamk tile engine test not setting a reasonable CU_COUNT default when the query fails (#5165)
## Motivation

The following error was coming up when compiling on Windows when the
generate_configs.py file tries to query the GPU for the number of CU's:

```
[composable_kernel configure] -- Generating Stream-K test config files for fp16
[composable_kernel configure] Traceback (most recent call last):
[composable_kernel configure]   File "E:\TheRock\rocm-libraries\projects\composablekernel\test\ck_tile\gemm_streamk_tile_engine\generate_configs.py", line 277, in <module>
[composable_kernel configure]     main()
[composable_kernel configure]     ~~~~^^
[composable_kernel configure]   File "E:\TheRock\rocm-libraries\projects\composablekernel\test\ck_tile\gemm_streamk_tile_engine\generate_configs.py", line 271, in main
[composable_kernel configure]     cu_count, configs_dir_path, tile_sizes, datatype = get_args()
[composable_kernel configure]                                                        ~~~~~~~~^^
[composable_kernel configure]   File "E:\TheRock\rocm-libraries\projects\composablekernel\test\ck_tile\gemm_streamk_tile_engine\generate_configs.py", line 267, in get_args
[composable_kernel configure]     return (int(args.cu_count), args.configs_dir_path, args.tiles, args.datatype)
[composable_kernel configure]             ~~~^^^^^^^^^^^^^^^
[composable_kernel configure] ValueError: invalid literal for int() with base 10: 'Exit code 0xc0000135\n'
[composable_kernel configure] CMake Error at test/ck_tile/gemm_streamk_tile_engine/generate_configs.cmake:98 (message):
[composable_kernel configure]   Eror occured during execution of
[composable_kernel configure]   E:/TheRock/rocm-libraries/projects/composablekernel/test/ck_tile/gemm_streamk_tile_engine/generate_configs.py
[composable_kernel configure] Call Stack (most recent call first):
[composable_kernel configure]   test/ck_tile/gemm_streamk_tile_engine/CMakeLists.txt:301 (generate_test_configs)
[composable_kernel configure]
[composable_kernel configure]
[composable_kernel configure] -- Configuring incomplete, errors occurred!
[composable_kernel configure FAILED WITH CODE 1 in 41 seconds]
ninja: build stopped: subcommand failed.
```

## Technical Details

There was one major problem in the following code and two changes were
made:

```
        execute_process(
            COMMAND ${CPP_EXE_PATH}
            OUTPUT_STRIP_TRAILING_WHITESPACE
            ERROR_VARIABLE standard_error
            RESULT_VARIABLE queried_cu_count
        )

        if (standard_error)
            message(STATUS "Error information from attempting to query HIP device and properties:\n"
                            "${standard_error}")
        endif()
```



1. RESULT_VARIABLE does not capture the IO output of the executable, but
rather the exit code. You can see from the error output here that it was
trying to cast "Exit code 0xc0000135\n" to an integer. I fixed this by
changing RESULT_VARIABLE to OUTPUT_VARIABLE.

```
[composable_kernel configure] ValueError: invalid literal for int() with base 10: 'Exit code 0xc0000135\n'
```

Note that this also gives us the reason that the query failed: Exit code
0xc0000135, which needs to be addressed in a separate issue: "Exit code
0xc0000135, also seen as -1073741515, is a Windows error indicating that
an application failed to start because a required Dynamic Link Library
(DLL) file or a system component like the .NET Framework is missing or
corrupted"

It's likely the executable that is created from this code can't find the
hip dll, or something similar:

```
        set(CPP_FILE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cu_count.cpp)
        set(CPP_EXE_PATH ${CMAKE_CURRENT_BINARY_DIR}/cu_count)

        execute_process(
            COMMAND ${CMAKE_HIP_COMPILER} -x hip ${CPP_FILE_PATH} -o ${CPP_EXE_PATH}
            RESULT_VARIABLE compile_result
        )
```

2. For clarity and consistency purposes, I changed the check afterwards
to explicitly look for a non-zero exit code. This matches previous
checks in the cmake file. I also added improved error checking when the
query for the cu count fails.

## Test Plan

Ensure it compiles locally and existing CI isn't impacted.

## Test Result

Waiting on CI.

## Submission Checklist

- [ x ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Emily Martins <emily.martins@amd.com>
2026-03-10 10:52:18 -07:00
..
2026-02-02 16:04:40 +08:00

CK Tile Testing Guide

This document describes the test organization and available test targets for CK Tile operations.

Overview

CK Tile tests are organized with multiple levels of granularity to support different development workflows:

  1. Global test labels - Run tests across all operations
  2. Operation-specific umbrella targets - Run all tests for a specific operation
  3. Individual test executables - Run specific tests

Global Test Labels

These targets run tests across all CK operations (not just CK Tile):

ninja smoke

Run fast smoke tests (tests that complete within ~30 seconds on gfx90a).

ninja smoke

ninja regression

Run slower, more comprehensive regression tests.

ninja regression

ninja check

Run ALL available tests in the entire codebase.

ninja check

Operation-Specific Umbrella Targets

These targets allow you to run all tests for a specific CK Tile operation. This is useful when making changes to a particular operation and wanting to validate all related tests without running the entire test suite.

GEMM Operations

ck_tile_gemm_tests

Run all basic GEMM pipeline tests (memory, compute variants, persistent, etc.)

ninja ck_tile_gemm_tests

Test executables included:

  • test_ck_tile_gemm_pipeline_mem
  • test_ck_tile_gemm_pipeline_compv3
  • test_ck_tile_gemm_pipeline_compv4
  • test_ck_tile_gemm_pipeline_persistent
  • test_ck_tile_gemm_pipeline_compv6
  • test_ck_tile_gemm_pipeline_comp_async (gfx95 only)
  • test_ck_tile_gemm_pipeline_*_wmma variants (gfx11/gfx12 only)

ck_tile_gemm_block_scale_tests

Run all GEMM tests with block-scale quantization (AQuant, BQuant, ABQuant, etc.)

ninja ck_tile_gemm_block_scale_tests

Test executables included: 29 test executables covering:

  • AQuant tests (memory pipelines, base layouts, prefill, preshuffle, transpose)
  • ABQuant tests (base, padding, preshuffle)
  • BQuant tests (1D/2D variants, transpose)
  • BQuant with PreshuffleB (decode/prefill, 1D/2D)
  • BQuant with PreshuffleQuant (decode/prefill, 1D/2D)
  • RowColQuant and TensorQuant tests

ck_tile_gemm_streamk_tests

Run all GEMM StreamK tests (tile partitioner, reduction, smoke, extended)

ninja ck_tile_gemm_streamk_tests

Test executables included:

  • test_ck_tile_streamk_tile_partitioner
  • test_ck_tile_streamk_reduction
  • test_ck_tile_streamk_smoke
  • test_ck_tile_streamk_extended

ck_tile_grouped_gemm_quant_tests

Run all grouped GEMM quantization tests

ninja ck_tile_grouped_gemm_quant_tests

Test executables included:

  • test_ck_tile_grouped_gemm_quant_rowcol
  • test_ck_tile_grouped_gemm_quant_tensor
  • test_ck_tile_grouped_gemm_quant_aquant
  • test_ck_tile_grouped_gemm_quant_bquant
  • test_ck_tile_grouped_gemm_quant_bquant_preshuffleb

Other Operations

ck_tile_fmha_tests

Run all FMHA (Flash Multi-Head Attention) tests

ninja ck_tile_fmha_tests

Test executables included: Forward and backward tests for fp16, bf16, fp8bf16, fp32

ck_tile_reduce_tests

Run all reduce operation tests

ninja ck_tile_reduce_tests

Test executables included:

  • test_ck_tile_reduce2d
  • test_ck_tile_multi_reduce2d_threadwise
  • test_ck_tile_multi_reduce2d_multiblock

Individual Test Executables

You can also build and run individual test executables:

Build a specific test

ninja test_ck_tile_gemm_pipeline_mem

Run a specific test directly

./build/bin/test_ck_tile_gemm_pipeline_mem

Run a specific test through ctest

ctest -R test_ck_tile_gemm_pipeline_mem --output-on-failure