## Motivation
Ensuring that tile engine benchmarking does not build by default and
slow other developers.
## Technical Details
- Added EXCLUDE_FROM_ALL to all add_subdirectory calls in
tile_engine/CMakeLists.txt and ops/gemm/CMakeLists.txt, so none of the
tile engine ops targets are part of the default all build.
- Added missing EXCLUDE_FROM_ALL to add_executable in
ops/pooling/CMakeLists.txt and ops/reduce/CMakeLists.txt (the GEMM
variants already had it).
- Downgraded message(STATUS ...) to message(VERBOSE ...) (or DEBUG for
per-target creation) in ops/pooling/, ops/gemm_streamk/, and ops/reduce/
CMakeLists. The other four GEMM variants (gemm_universal, gemm_multi_d,
gemm_preshuffle, grouped_gemm) already used VERBOSE.
- Targets can still be built on demand via their aggregate names (e.g.
make benchmark_pooling_all, make benchmark_gemm_streamk_all).
## Test Plan
Tile engine benchmark testing stage should be unaffected.
## Test Result
N/A
## Submission Checklist
- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
## Motivation
The grouped_gemm CK Tile kernel exists (e.g.,
`example/17_grouped_gemm/`) but has no Tile Engine wrapper. Grouped GEMM
handles multiple independent GEMM problems with varying M/N/K dimensions
in a single kernel launch. This PR adds the Tile Engine infrastructure
for automated kernel generation, benchmarking, and profiling of grouped
GEMM kernels.
Jira: AICK-809
## Technical Details
- Created Tile Engine wrapper under `tile_engine/ops/gemm/grouped_gemm/`
following the `gemm_universal` template
- Files added: `CMakeLists.txt`, `grouped_gemm_common.hpp`,
`grouped_gemm_benchmark.hpp`, `grouped_gemm_profiler.hpp`,
`grouped_gemm_benchmark.py`, `grouped_gemm_benchmark_single.cpp`,
`grouped_gemm_instance_builder.py`, `configs/`
- Supported datatypes: fp16, fp8, bf16, bf8
- Supported layouts: rcr, rrr, ccr, crr
- Target GPUs: gfx942, gfx950
- CK Tile kernel: `ck_tile::GroupedGemmKernel` from
`include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp`
- Instance builder extends `GemmKernelBuilder` base class
- Registered in `tile_engine/ops/gemm/CMakeLists.txt`
- Updated Jenkinsfile to build and benchmark grouped_gemm targets in CI
- Benchmark infrastructure includes JSON output, CSV export, and
verification support
## Test Plan
- CMake configure succeeds for grouped_gemm targets
- Kernel instance builder generates valid kernel headers for all
(datatype, layout) combinations
- At least one kernel binary compiles and runs per datatype/layout
combination
- Correctness passes with `--verify 1` on gfx942/gfx950
## Test Result
## Submission Checklist
- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* remove EXCLUDE_FROM_ALL from ck-tile examples
-> +15 min build time w/ 64 threads for a single arch
* fix cpp17 compile error in the ck-tile examples
---------
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
[ROCm/composable_kernel commit: 79aae7c7f7]
* Renaming old code
* Adding GEMM code with new Architecture
* Partial Progress : Errors
* Partial Progress : Working code
* Changes to element wise function
* Removing Debugging statements
* Working GEMM Multi D code
* Removing Stale Code
* Address Copilot review comments
* Address Copilot review comments
* Changes to validation file
* Changes to common code snippets
* Creating common folder
* Removing duplicate files
* Pointing to right common file
* Pointing to right common file
* Pointing to right common file
* Changing to VERBOSE
* Changing CMAKE messages to verbose
* Updating Cmake with right layout datatype configs
* Working code for GEMM Multi D
[ROCm/composable_kernel commit: a33d98f8e2]
* Partial Progress : CK Tile Engine GEMM
* Partial Progress : CK Tile Engine GEMM
* Partial Progress : Working GEMM Code
* Partial Progress : Working GEMM Code
* Changinf jenkins to remove preshuffle
* Partial Progress : CK TILE ENGINE GEMM Debugging
* Partial Progress : Removing changes that are not GEMM
* Partial Progress : Validation of full block size in GEMM
* Changes in Jenkins to run only fp16 and bf16
* Addressing Review Comments
* Partial Progress : Addressing CI issues
* Partial Progress - Runing GEMM for fp16,bf16 and rcr
* Clang
* Adding fp8 and bf8
* Adding fp8 and bf8
* Adding additional architrcture
* Limited datatypes and layouts
* Adding k_block_per_cu in test config
* Changes to faling CI errors
* Changes to faling CI errors
* Validation for GEMM
* Adding Layout support
* Adding Validations
* Adding layout in jenkins
* Update on Jenkins
* Distribution validation for GEMM
* Resolving merge conflicts
* Solving merge conflicts
[ROCm/composable_kernel commit: 7fc0a38e90]
* Reading gpuname from target for gemm in ck tile engine
* Reading gpuname from target for gemm preshuffle in ck tile engine
* Reading gpuname from target for gemm preshuffle in ck tile engine
* Get GPU changes for GEMM Muti D in TILE ENGINE
* Addressing errors for gpu name in cktileengine
[ROCm/composable_kernel commit: 9f77061094]
* initial work on adding support of gfx12 in tile_engine for GEMM benchmarking
* add stage("Run TILE_ENGINE_GEMM Tests on gfx1201") to Jenkins config
* make tile_[m/n/k] validation arch dependent
[ROCm/composable_kernel commit: 592d73ad73]
* Making edits to identify individual compilation issues.
* Minor fix for blob txt files not being created.
* Fixing compilation issues.
* Fixing ordering bug.
* Adding python profiling functionality.
* Setting individual build as default.
* Setting gpu target filtering for tile engine to gfx90a, gfx942 and gfx950.
* update the default running parameters and settings
* Fixing bug with benchmarking, shifting file generation to build instead of config.
* Updating fixes.
* Fixing json output and parsing.
* Disable ccache for tile engine gemm ops because we dont need it.
* Removing duplicate type definition.
* Improving json printing.
* Add the flexibility of different layout and more warp tile support
* Fix extra flag in name of individual kernels.
* Fixing bug with booleans.
* Solve the first patch of the post merge conflict
* Compilation fixes, and cosmetic improvements.
* Yet again compilation fixes after latest changes from develop.
* Fixing python benchmarking script.
---------
Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>
Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>
[ROCm/composable_kernel commit: 705804d9bf]
* Modify CMakeLists to allow for splitting.
* Modify CMakeLists for data and layout logic.
* Run tests and get build artifact.
* Test new Cmakelists for speedup.
* Further improvements for speedup.
* turn off the FMHA
* turn off the automatic tile engine gemm
* minor fix
* disable the transpose test first
* Address the comment
* Jenkinsfile
* change the make thread to 64
* change the compile thread to 32
* Try to use with less OS memory space
* Have the Unity build batch size to 2
* reduce the chunk size
---------
Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>
[ROCm/composable_kernel commit: e5b79b26fa]
* Updating runtime log message for CK TILE ENGINE
* CKTile layout from config
* CKTile custom config for CI
* Documentation for Layout Changes
* CKTile Layout changes to Jenkins
* Fixing Clang Format
* Changes to Jenkins file to fix error
* fix(cmake-ck-dev): no longer sets invalid values as gpu arch
* style(py files): ruff formatting
* fix(cmake-ck-release): no longer sets invalid values as gpu arch
* chore(cmake-tile_engine): add reminder to uncomment user config json
* Changes to jenkin file to address more cases
* Changes to Jenkins to fix Error
* Changes to Jenkins file for fixing an error
* Update Jenkinsfile (#2517)
* Update Jenkinsfile
---------
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: 0f3083ab5c]
* updates to support int8 in 03_gemm example
* added comments, using aliases, helper functions
* test(gemm_universal): add test cases for int8 gemm pipeline
* fix(test_gemm): fix for failing test unit test for int8
* test(ck_tile): add int8 unit test for gemm universal
* refactor(gemm_universal): GPU reference verification for GEMM code improved
* style(gemm_universal): removed extra comments and did clang format
* merging recent changes to universal gemm to tile_engine
* ck tile engine integration work
* feat(tile_engine): add int8 support to tile engine ops/gemm
* feat(tile_engine): added 32 32 16 mfma instances to tile engine for int8
* style: Format code with clang-format-12
* refactor(tile_engine): address review comments
* style: removed unhelpful comments & unused variables.
* build: tile engine uses default config
* feat: add int8 support for CK_TILE GEMM
* style: added trailing commas to codegen_utils.py
* refactor: tile engine
* refactor: formatting and code review
* refactor: code formatting for python files
* fix: suppress build warning
* add support for gfx950
* refactor:KWarpTile size in gemms util
* Fix the branch and wrap up the k warp tile
* Add bf8 integration
* refactor: clang format and rebase
---------
Co-authored-by: zjli2013 <leezhengjiang@gmail.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: Khushbu Agarwal <khuagarw@amd.com>
[ROCm/composable_kernel commit: e03293ebce]
* - elevate important build messages to log level STATUS
- comment out the rest (temporarily)
* - marked all low importance build messages as log_level=DEBUG
[ROCm/composable_kernel commit: aed0f5880c]
* New branch for codegen changes
* Fix verify function for int4
* pk_int4 codegen
* Update to review comments
* Remove codegen directory and rename filenames
* Remove extra files; clean up CMake file
* New branch for codegen changes
* Fix verify function for int4
* pk_int4 codegen
* Update to review comments
* Remove codegen directory and rename filenames
* Remove extra files; clean up CMake file
* code changes for single instance
* config file rename, added few more combinations in json file
* Fix cmake file
* Addressing review comments
* Reverting files changed by merge to develop
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
[ROCm/composable_kernel commit: fed0709121]