Commit Graph

23 Commits

Author SHA1 Message Date
Vidyasagar Ananthan
356bcbb8cb [CK][CK_Tile] Ensure CK Tile engine benchmarking targets are excluded from default build. (#6135)
## Motivation

Ensure that tile engine benchmarking targets do not build by default and
slow down other developers.

## Technical Details

- Added EXCLUDE_FROM_ALL to all add_subdirectory calls in
tile_engine/CMakeLists.txt and ops/gemm/CMakeLists.txt, so none of the
tile engine ops targets are part of the default all build.

- Added missing EXCLUDE_FROM_ALL to add_executable in
ops/pooling/CMakeLists.txt and ops/reduce/CMakeLists.txt (the GEMM
variants already had it).

- Downgraded message(STATUS ...) to message(VERBOSE ...) (or DEBUG for
per-target creation) in ops/pooling/, ops/gemm_streamk/, and ops/reduce/
CMakeLists. The other four GEMM variants (gemm_universal, gemm_multi_d,
gemm_preshuffle, grouped_gemm) already used VERBOSE.

- Targets can still be built on demand via their aggregate names (e.g.
make benchmark_pooling_all, make benchmark_gemm_streamk_all).
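
The pattern described in the bullets above can be sketched roughly as follows. This is an illustrative fragment, not the contents of the actual CMakeLists files: the target `benchmark_pooling_example` and its source file are hypothetical names, and only the aggregate name `benchmark_pooling_all` comes from this commit message.

```cmake
# Parent CMakeLists.txt (sketch): targets defined under the subdirectory
# are left out of the default "all" build.
add_subdirectory(pooling EXCLUDE_FROM_ALL)

# pooling/CMakeLists.txt (sketch)
# VERBOSE messages are hidden at the default log level.
message(VERBOSE "Configuring pooling benchmark targets")

# Aggregate target used to build the benchmarks on demand.
add_custom_target(benchmark_pooling_all)

# Hypothetical benchmark executable, also excluded from "all".
add_executable(benchmark_pooling_example EXCLUDE_FROM_ALL
    benchmark_pooling_example.cpp)

# Per-target creation is logged at DEBUG level.
message(DEBUG "Created target benchmark_pooling_example")

# Building the aggregate target builds the executable.
add_dependencies(benchmark_pooling_all benchmark_pooling_example)
```

Assuming a layout like this, `cmake --build <build-dir> --target benchmark_pooling_all` builds the benchmarks explicitly, and configuring with `--log-level=VERBOSE` (CMake 3.16 or later) makes the downgraded messages visible again.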


## Test Plan

Tile engine benchmark testing stage should be unaffected. 

## Test Result

N/A

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-04-03 22:07:58 +00:00
Thrupti Raj Lakshmana Gowda
a3298014a4 [CK TILE ENGINE] Add grouped_gemm operator to Tile Engine (gfx942/gfx950) (#4996)
## Motivation

The grouped_gemm CK Tile kernel exists (e.g.,
`example/17_grouped_gemm/`) but has no Tile Engine wrapper. Grouped GEMM
handles multiple independent GEMM problems with varying M/N/K dimensions
in a single kernel launch. This PR adds the Tile Engine infrastructure
for automated kernel generation, benchmarking, and profiling of grouped
GEMM kernels.

Jira: AICK-809

## Technical Details

- Created Tile Engine wrapper under `tile_engine/ops/gemm/grouped_gemm/`
following the `gemm_universal` template
- Files added: `CMakeLists.txt`, `grouped_gemm_common.hpp`,
`grouped_gemm_benchmark.hpp`, `grouped_gemm_profiler.hpp`,
`grouped_gemm_benchmark.py`, `grouped_gemm_benchmark_single.cpp`,
`grouped_gemm_instance_builder.py`, `configs/`
- Supported datatypes: fp16, fp8, bf16, bf8
- Supported layouts: rcr, rrr, ccr, crr
- Target GPUs: gfx942, gfx950
- CK Tile kernel: `ck_tile::GroupedGemmKernel` from
`include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp`
- Instance builder extends `GemmKernelBuilder` base class
- Registered in `tile_engine/ops/gemm/CMakeLists.txt`
- Updated Jenkinsfile to build and benchmark grouped_gemm targets in CI
- Benchmark infrastructure includes JSON output, CSV export, and
verification support

## Test Plan

- CMake configure succeeds for grouped_gemm targets
- Kernel instance builder generates valid kernel headers for all
(datatype, layout) combinations
- At least one kernel binary compiles and runs per datatype/layout
combination
- Correctness passes with `--verify 1` on gfx942/gfx950

## Test Result



## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-10 18:58:37 -05:00
Thrupti Raj Lakshmana Gowda
2dacac9561 [TILE ENGINE] Restructure to Base class of GEMM (#3434)
[ROCm/composable_kernel commit: e22622f0ec]
2025-12-19 23:53:56 +08:00
Aviral Goel
0861395425 chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)
* chore(copyright) update library wide CMakeLists.txt files copyright header template

* Fix build

---------

Co-authored-by: Sami Remes <samremes@amd.com>

[ROCm/composable_kernel commit: 004784ef98]
2025-11-28 13:49:54 -08:00
Max Podkorytov
0ce4a61da5 [CK Tile] enable building examples by default (#3259)
* remove EXCLUDE_FROM_ALL from ck-tile examples
-> +15 min build time w/ 64 threads for a single arch

* fix cpp17 compile error in the ck-tile examples

---------

Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>

[ROCm/composable_kernel commit: 79aae7c7f7]
2025-11-26 16:24:44 -08:00
Thrupti Raj Lakshmana Gowda
10e844d93c [CK TILE ENGINE] GEMM Multi D Restructure (#3121)
* Renaming old code

* Adding GEMM code with new Architecture

* Partial Progress : Errors

* Partial Progress : Working code

* Changes to element wise function

* Removing Debugging statements

* Working GEMM Multi D code

* Removing Stale Code

* Address Copilot review comments

* Address Copilot review comments

* Changes to validation file

* Changes to common code snippets

* Creating common folder

* Removing duplicate files

* Pointing to right common file

* Pointing to right common file

* Pointing to right common file

* Changing to VERBOSE

* Changing CMAKE messages to verbose

* Updating Cmake with right layout datatype configs

* Working code for GEMM Multi D

[ROCm/composable_kernel commit: a33d98f8e2]
2025-10-31 12:02:46 -07:00
Thrupti Raj Lakshmana Gowda
f32ef6ed17 Ck tile engine gemm (#2982)
* Partial Progress : CK Tile Engine GEMM

* Partial Progress : CK Tile Engine GEMM

* Partial Progress : Working GEMM Code

* Partial Progress : Working GEMM Code

* Changing Jenkins to remove preshuffle

* Partial Progress : CK TILE ENGINE GEMM Debugging

* Partial Progress : Removing changes that are not GEMM

* Partial Progress : Validation of full block size in GEMM

* Changes in Jenkins to run only fp16 and bf16

* Addressing Review Comments

* Partial Progress : Addressing CI issues

* Partial Progress - Running GEMM for fp16, bf16 and rcr

* Clang

* Adding fp8 and bf8

* Adding fp8 and bf8

* Adding additional architecture

* Limited datatypes and layouts

* Adding k_block_per_cu in test config

* Changes to failing CI errors

* Changes to failing CI errors

* Validation for GEMM

* Adding Layout support

* Adding Validations

* Adding layout in jenkins

* Update on Jenkins

* Distribution validation for GEMM

* Resolving merge conflicts

* Solving merge conflicts

[ROCm/composable_kernel commit: 7fc0a38e90]
2025-10-27 21:11:13 -05:00
Thrupti Raj Lakshmana Gowda
20ef4380d7 Ck tile engine preshuffle (#2919)
* Partial Progress : Preshuffle working code for datatype

* Partial Progress : Preshuffle Cleanup

* Working code for default config with min max step

* Partial Progress : PermuteN implemented in validation

* Partial Progress : PermuteN changes in Preshuffle

* CK Tile Engine Preshuffle Complete

* CK TILE ENGINE : Preshuffle Layout validation

* CK Tile Engine Preshuffle Validation

* Preshuffle Validation check

* CK Tile Engine Preshuffle : Fixing Validation Cases

* Addressing PR review Comments

* Changes in config

* Addressing Review Comments

* Adding additional architecture in Jenkins

* Partial Progress : Selective Datatype and layouts

* Limited datatypes and layouts

* Addressing CI errors

* Datatype updates

* Datatype updates

* Datatype changes to Preshuffle

* Addressing Review Comments

* Addressing Review Comments

* Datatype changes

* Changes to Cmake

* Update on Jenkins

* Formatting with precommit

* Ruff Formatting

[ROCm/composable_kernel commit: 8b185e872e]
2025-10-27 09:15:34 -05:00
Thrupti Raj Lakshmana Gowda
9a2f0f82b4 Excluding Tile engine from build (#3085)
[ROCm/composable_kernel commit: 0fd7d1a607]
2025-10-23 12:57:18 -07:00
Thrupti Raj Lakshmana Gowda
61dbfdb27b [CK TILE ENGINE] Code changes to finding GPU id from TARGET (#3055)
* Reading gpuname from target for gemm in ck tile engine

* Reading gpuname from target for gemm preshuffle in ck tile engine

* Reading gpuname from target for gemm preshuffle in ck tile engine

* Get GPU changes for GEMM Multi D in TILE ENGINE

* Addressing errors for gpu name in cktileengine

[ROCm/composable_kernel commit: 9f77061094]
2025-10-20 09:02:18 -07:00
pmaybank
377a3da125 [CK_TILE] Add support for gfx12 in tile_engine for GEMM benchmarking (#2802)
* initial work on adding support of gfx12 in tile_engine for GEMM benchmarking
* add stage("Run TILE_ENGINE_GEMM Tests on gfx1201") to Jenkins config
* make tile_[m/n/k] validation arch dependent

[ROCm/composable_kernel commit: 592d73ad73]
2025-09-17 17:59:01 +01:00
Thomas Ning
2142e9efec Restructure the Tile Engine to have faster build time and clear config report (#2747)
* Making edits to identify individual compilation issues.

* Minor fix for blob txt files not being created.

* Fixing compilation issues.

* Fixing ordering bug.

* Adding python profiling functionality.

* Setting individual build as default.

* Setting gpu target filtering for tile engine to gfx90a, gfx942 and gfx950.

* update the default running parameters and settings

* Fixing bug with benchmarking, shifting file generation to build instead of config.

* Updating fixes.

* Fixing json output and parsing.

* Disable ccache for tile engine gemm ops because we don't need it.

* Removing duplicate type definition.

* Improving json printing.

* Add the flexibility of different layout and more warp tile support

* Fix extra flag in name of individual kernels.

* Fixing bug with booleans.

* Solve the first patch of the post merge conflict

* Compilation fixes, and cosmetic improvements.

* Yet again compilation fixes after latest changes from develop.

* Fixing python benchmarking script.

---------

Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>
Co-authored-by: Vidyasagar Ananthan <vanantha@amd.com>

[ROCm/composable_kernel commit: 705804d9bf]
2025-08-30 06:54:18 -07:00
Vidyasagar Ananthan
0a5c8d6e45 Setting gpu target filtering for tile engine to gfx90a, gfx942 and gfx950. (#2709)
[ROCm/composable_kernel commit: bf3e719c16]
2025-08-19 18:12:06 -07:00
Illia Silin
980abcc237 Revert "Reduce build time tile engine (#2579)" (#2623)
This reverts commit ffbc6fc8e5.

[ROCm/composable_kernel commit: 833ae1d051]
2025-08-05 09:27:55 -07:00
Thomas Ning
ffbc6fc8e5 Reduce build time tile engine (#2579)
* Modify CMakeLists to allow for splitting.

* Modify CMakeLists for data and layout logic.

* Run tests and get build artifact.

* Test new Cmakelists for speedup.

* Further improvements for speedup.

* turn off the FMHA

* turn off the automatic tile engine gemm

* minor fix

* disable the transpose test first

* Address the comment

* Jenkinsfile

* change the make thread to 64

* change the compile thread to 32

* Try to use with less OS memory space

* Have the Unity build batch size to 2

* reduce the chunk size

---------

Co-authored-by: Vidyasagar Ananthan <vidyasagar.ananthan@amd.com>

[ROCm/composable_kernel commit: e5b79b26fa]
2025-08-01 14:42:33 -07:00
Thrupti Raj Lakshmana Gowda
e020f23fab [CKTILE] Layout Support for CK Tile engine (#2482)
* Updating runtime log message for CK TILE ENGINE

* CKTile layout from config

* CKTile custom config for CI

* Documentation for Layout Changes

* CKTile Layout changes to Jenkins

* Fixing Clang Format

* Changes to Jenkins file to fix error

* fix(cmake-ck-dev): no longer sets invalid values as gpu arch

* style(py files): ruff formatting

* fix(cmake-ck-release): no longer sets invalid values as gpu arch

* chore(cmake-tile_engine): add reminder to uncomment user config json

* Changes to Jenkins file to address more cases

* Changes to Jenkins to fix Error

* Changes to Jenkins file for fixing an error

* Update Jenkinsfile (#2517)

* Update Jenkinsfile

---------

Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 0f3083ab5c]
2025-07-17 12:19:41 -07:00
Khushbu Agarwal
207baa02bb Enabling diff datatypes for tile_engine and build with more granularity (#2392)
* merging recent changes to universal gemm to tile_engine

* Reducing Linking time by generating less intermediate files

* make small libs to build faster

* Reducing the instances

* reducing instances

* Restoring default config

* Restoring default config

* warp_n reverted in default config

* Adding diff json files for fp8 and fp16, cmake changes for fp8

* Restructure the CMake File

* Added more granularity for build and some debugging code

* removed some of debugging statements

* added fp8 instances

* take datatype from command line to enable both types of json files

* updated README file

* code cleanup

* code cleanup

* updated jenkinsfile

* enable tile_engine daily builds

* updating cmake file

* updated CMakeLists.txt

* Updating CMake code fixing gfx12 build

* Updating CMake code fixing gfx12 build

* Fix CMake file null checks

* fixed traces of rebase

* Update tile_engine/ops/gemm/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update tile_engine/ops/gemm/README.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* fixing rebase issue

---------

Co-authored-by: khushbu <khuagarw@gmail.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

[ROCm/composable_kernel commit: a14753b86f]
2025-06-25 15:18:24 -07:00
Thomas Ning
753232ea70 [CK Tile] Int8 Support on CK Tile GEMM (#2267)
* updates to support int8 in 03_gemm example

* added comments, using aliases, helper functions

* test(gemm_universal): add test cases for int8 gemm pipeline

* fix(test_gemm): fix for failing test unit test for int8

* test(ck_tile): add int8 unit test for gemm universal

* refactor(gemm_universal): GPU reference verification for GEMM code improved

* style(gemm_universal): removed extra comments and did clang format

* merging recent changes to universal gemm to tile_engine

* ck tile engine integration work

* feat(tile_engine): add int8 support to tile engine ops/gemm

* feat(tile_engine): added 32 32 16 mfma instances to tile engine for int8

* style: Format code with clang-format-12

* refactor(tile_engine): address review comments

* style: removed unhelpful comments & unused variables.

* build: tile engine uses default config

* feat: add int8 support for CK_TILE GEMM

* style: added trailing commas to codegen_utils.py

* refactor: tile engine

* refactor: formatting and code review

* refactor: code formatting for python files

* fix: suppress build warning

* add support for gfx950

* refactor:KWarpTile size in gemms util

* Fix the branch and wrap up the k warp tile

* Add bf8 integration

* refactor: clang format and rebase

---------

Co-authored-by: zjli2013 <leezhengjiang@gmail.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: Khushbu Agarwal <khuagarw@amd.com>

[ROCm/composable_kernel commit: e03293ebce]
2025-06-25 08:20:35 -07:00
Aviral Goel
9727cf5f62 Label CMakeLists message() as DEBUG or STATUS for clean build output (#2301)
* - elevate important build messages to log level STATUS
- comment out the rest (temporarily)

* - marked all low importance build messages as log_level=DEBUG

[ROCm/composable_kernel commit: aed0f5880c]
2025-06-10 10:46:47 -07:00
Kiefer van Teutem
caad9d56fc Explicitly set the LINKER_LANGUAGE for the gemm_template_instances target to avoid Ninja build config failure. (#2265)
Co-authored-by: kiefer <kiefer.van.teutem@streamhpc.com>

[ROCm/composable_kernel commit: 2215a9edf0]
2025-05-30 13:32:28 -07:00
Casey-Shi
64b17847fa [Tile Engine] Add benchmark for tile engine gemm. (#2193)
* initial commit -m benchmark

* only support profile

* fix

* fix doc

* add default config

* add ci

* fix cmake

* tmp save for gen blobs

* fix bug

* merge

* range config

* test success

* fix

* fix

* move struct

* remove config property

* fix config

* remove comment

* add cmake option & modify

* add changelog

* fix

* format

* add pydantic module to the docker image

* fix

* add benchmark for cold and warm up

* python format

* add asm cache control

* fix README

* remove pydantic module

* modify changelog

* fix config

* recover benchmark_gemm and fix

* format python

* refactor profiler

* fix csv bug

* fix codegen bug

* add kernel instance object

* add benchmark gemm executable

* fix jenkins & delete extra header

* disable warning output & enable default config

* Disable sparsity for invalid warp tile combinations

* fix gemm host template func

* refactor gemm profiler

* filter out some instances

* default config test & fix codegen bug

* add sparse flag to gen more instances

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 128f5a1eab]
2025-05-26 22:32:36 -07:00
Khushbu Agarwal
7795e976da Support for MFMA_16x16x128 for fp8/bf8 (#2125)
* Adding 16x16x128 support for gfx950

* Support for fp8 and bf8

* fix input arguments for MFMA scale instruction

* clang-formatted

* Fixes for lwpck-3145 (#2138)

* Fix lds tile & cmake dep & default epilogue

* Fallback BTypeToUse to ADataType in WOQ cases

* reverting instance json file

* reverting instance json file

---------

Co-authored-by: Yi DING <yi.ding@amd.com>

[ROCm/composable_kernel commit: d107f3c3a5]
2025-04-28 18:19:50 -07:00
Khushbu Agarwal
eee09ecdb3 [New] Build up the feature of CK Tile GEMM CodeGen (#1994)
* New branch for codegen changes

* Fix verify function for int4

* pk_int4 codegen

* Update to review comments

* Remove codegen directory and rename filenames

* Remove extra files; clean up CMake file

* New branch for codegen changes

* Fix verify function for int4

* pk_int4 codegen

* Update to review comments

* Remove codegen directory and rename filenames

* Remove extra files; clean up CMake file

* code changes for single instance

* config file rename, added few more combinations in json file

* Fix cmake file

* Addressing review comments

* Reverting files changed by merge to develop

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: fed0709121]
2025-04-03 11:54:12 -07:00