mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-18 20:09:25 +00:00
## Motivation

This PR restructures the benchmarking and profiling components of CK Tile's Tile Engine, building on the groundwork from the previous PR (https://github.com/ROCm/composable_kernel/pull/3434) and following this [design document](https://amdcloud-my.sharepoint.com/:w:/r/personal/astharai_amd_com/Documents/Restructuring%20Tile%20Engine.docx?d=w14ea28a30718416988ed5ebb759bd3b2&csf=1&web=1&e=l3VBuX).

In PR 3434, to reduce repeated code, we implemented:

- A base class that centralizes common functionality and provides a default implementation (Universal GEMM)
- Child classes for GEMM variants that override virtual functions to handle variant-specific behavior

The refactoring in this PR follows the same process. It should greatly reduce the duplicated code in Tile Engine and make it simpler to add new operations, improving scalability.

## Technical Details

The files have been refactored around new base structs for benchmarks, profiling, and problem descriptions. The new base structs are:

- GemmProblem
- GemmBenchmark
- GemmProfiler

Universal GEMM, Preshuffle GEMM, and Multi-D GEMM each have child classes that inherit from these base structs, overriding only what differs per variant.

All functions common to the benchmarking and profiling files have been moved into newly added utility files under the commons/ directory. The new utility files are:

- utils.hpp: common functions for the benchmarking and profiling process
- benchmark_utils.py: common utility functions for benchmark generation

## Test Plan

I tested using the existing tests for Tile Engine.

## Test Result

All tests passed.

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
75 lines
3.0 KiB
CMake
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT

################################################################################
# CK Tile Test Organization
################################################################################
# CK Tile tests can be run using several methods:
#
# 1. Global test labels (run tests across all operations):
#    - ninja smoke       - Fast tests (~30s on gfx90a)
#    - ninja regression  - Slower comprehensive tests
#    - ninja check       - All available tests
#
# 2. Operation-specific umbrella targets (run all tests for a specific operation):
#    - ninja ck_tile_gemm_tests                - All basic GEMM tests
#    - ninja ck_tile_gemm_block_scale_tests    - All GEMM with block-scale quantization tests
#    - ninja ck_tile_gemm_streamk_tests        - All GEMM StreamK tests
#    - ninja ck_tile_grouped_gemm_quant_tests  - All grouped GEMM quantization tests
#    - ninja ck_tile_reduce_tests              - All reduce operation tests
#    - ninja ck_tile_fmha_tests                - All FMHA (Flash Attention) tests
#
# 3. Individual test executables:
#    - ninja test_<test_name>          - Build specific test executable
#    - ./build/bin/test_<test_name>    - Run specific test directly
#
# These umbrella targets are useful when working on specific operations to quickly
# validate all related tests without running the entire test suite.
################################################################################
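#
# Illustrative sketch only (kept commented out; NOT the actual CK definitions).
# One plausible way to wire an operation-specific umbrella target is a custom
# target that depends on each test executable of that operation. Every target
# name below is hypothetical and exists only for this example:
#
#   add_custom_target(ck_tile_example_tests)
#   add_dependencies(ck_tile_example_tests
#       test_ck_tile_example_a
#       test_ck_tile_example_b)
#
# Under these assumptions, `ninja ck_tile_example_tests` would build both
# executables. Running them as well would need an extra COMMAND on the custom
# target. The subdirectories added below may define their umbrella targets
# differently.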

add_subdirectory(image_to_column)
add_subdirectory(gemm)
add_subdirectory(gemm_persistent_async_input)
add_subdirectory(gemm_weight_preshuffle)
add_subdirectory(batched_gemm)
add_subdirectory(grouped_gemm)
add_subdirectory(grouped_gemm_preshuffle)
add_subdirectory(grouped_gemm_multi_d)
add_subdirectory(grouped_gemm_quant)
add_subdirectory(grouped_gemm_abquant)
add_subdirectory(gemm_multi_d)
add_subdirectory(gemm_multi_abd)
add_subdirectory(gemm_streamk)
add_subdirectory(data_type)
add_subdirectory(container)
add_subdirectory(elementwise)
# Not including these tests as there is a bug on gfx90a and gfx942
# resulting in "GPU core dump"
#add_subdirectory(moe_smoothquant)
add_subdirectory(permute)
add_subdirectory(moe_sorting)
add_subdirectory(slice_tile)
add_subdirectory(memory_copy)
add_subdirectory(batched_transpose)
add_subdirectory(smoothquant)
add_subdirectory(topk_softmax)
add_subdirectory(add_rmsnorm2d_rdquant)
# add_subdirectory(layernorm2d)
# add_subdirectory(rmsnorm2d)
add_subdirectory(gemm_block_scale)
add_subdirectory(flatmm)
add_subdirectory(gemm_mx)
add_subdirectory(utility)
add_subdirectory(warp_gemm)
add_subdirectory(reduce)
add_subdirectory(core)
add_subdirectory(epilogue)
add_subdirectory(atomic_add_op)
add_subdirectory(fmha)
# TODO: The Universal GEMM tile engine test will be either removed
# or moved to the appropriate location in future work.
# add_subdirectory(gemm_tile_engine)
add_subdirectory(pooling)
add_subdirectory(grouped_conv)
add_subdirectory(pooling_tile_engine)