Commit Graph

23 Commits

Author SHA1 Message Date
Khushbu Agarwal
7f9758a186 [CK_Tile] Fix gemm kernel for 4,64,16 and 64,4,16 warp tile sizes (#2262)
* debugging issue

* debugging issue

* debugging

* debugging

* reverting debugging code

* clang formatted

* updating default_config.json

* fix ci failure

* clang formatted

[ROCm/composable_kernel commit: 59a85cb4bc]
2025-06-03 20:16:10 -07:00
Khushbu Agarwal
2b6621fba8 Rotating buffer PR CI fix (#2257)
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)"

This reverts commit 7baac527a1.

* fix regression

[ROCm/composable_kernel commit: 2e38eb4f1c]
2025-06-02 10:25:01 -07:00
Kiefer van Teutem
caad9d56fc Explicitly set the LINKER_LANGUAGE for the gemm_template_instances target to avoid Ninja build config failure. (#2265)
Co-authored-by: kiefer <kiefer.van.teutem@streamhpc.com>

[ROCm/composable_kernel commit: 2215a9edf0]
2025-05-30 13:32:28 -07:00
Casey-Shi
105a5aedad change from ninja to make (#2253)
[ROCm/composable_kernel commit: 29574f05f7]
2025-05-28 09:25:05 -07:00
Illia Silin
7baac527a1 Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)
This reverts commit 0f77aa335d.

[ROCm/composable_kernel commit: bbdaf79a52]
2025-05-28 09:46:52 -06:00
Casey-Shi
f2fe8424ce fix type hint (#2254)
[ROCm/composable_kernel commit: 4286eae09a]
2025-05-28 08:43:58 -07:00
Khushbu Agarwal
0f77aa335d [CK_tile] Add rotating buffer feature for universal gemm (#2200)
* Add rotating buffer feature for universal gemm

* adding changes in tile_engine

* Updated code to merge kernel_launch

* removing comments

* Enable rotating buffer changes to flatmm

* Created diff launch_kernel function for rotating buffer

* Simplified calculation using macros

* merge code with new changes in tile_engine

* clang formatted

* Redefine macros

[ROCm/composable_kernel commit: 99857e10e6]
2025-05-27 23:00:58 -07:00
Casey-Shi
64b17847fa [Tile Engine] Add benchmark for tile engine gemm. (#2193)
* initial commit -m benchmark

* only support profile

* fix

* fix doc

* add default config

* add ci

* fix cmake

* tmp save for gen blobs

* fix bug

* merge

* range config

* test success

* fix

* fix

* move struct

* remove config property

* fix config

* remove comment

* add cmake option & modify

* add changelog

* fix

* format

* add pydantic module to the docker image

* fix

* add benchmark for cold and warm up

* python format

* add asm cache control

* fix README

* remove pydantic module

* modify changelog

* fix config

* recover benchmark_gemm and fix

* format python

* refactor profiler

* fix csv bug

* fix codegen bug

* add kernel instance object

* add benchmark gemm executable

* fix jenkins & delete extra header

* disable warning output & enable default config

* Disable sparsity for invalid warp tile combinations

* fix gemm host template func

* refactor gemm profiler

* filter out some instances

* default config test & fix codegen bug

* add sparse flag to gen more instances

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 128f5a1eab]
2025-05-26 22:32:36 -07:00
Khushbu Agarwal
8eda5a636d Adding validation for tile sizes in Tile Engine (#2189)
* Adding validation for tile sizes

* Add architecture in config, and shuffle lines of code in warp_gemm.hpp

* Enable MFMA for gfx950, and invalid tile handling

[ROCm/composable_kernel commit: 3d8d6e75e4]
2025-05-15 10:28:31 -07:00
Khushbu Agarwal
2f6345845e Disable SMFMA gfx90a (#2184)
* sparsity fix for gfx90a

* reverting tile_engine changes

[ROCm/composable_kernel commit: f05e45ba59]
2025-05-12 09:56:23 -07:00
Khushbu Agarwal
7ba9766b8b Disable SMFMA for gfx90a (#2182)
[ROCm/composable_kernel commit: ef72a4b9bc]
2025-05-09 00:18:07 -07:00
Thomas Ning
dc54bb272f Revert "Disable the SMFMA instruction for gfx90a. (#2174)" (#2175)
This reverts commit ad29f1b98a.

[ROCm/composable_kernel commit: c757046d49]
2025-05-08 00:07:03 -07:00
Khushbu Agarwal
ad29f1b98a Disable the SMFMA instruction for gfx90a. (#2174)
* remove smfma for gfx90a

* clang formatted

[ROCm/composable_kernel commit: a32d907771]
2025-05-07 23:09:22 -07:00
Khushbu Agarwal
27c027c591 [CK_Tile] Simplified Mem pipeline (#2159)
* simplify code

* compiled the code

* Simplified example and codegen for mem pipeline

* Reverting config and universal gemm example

* clang formatted

* remove comments

* clang formatted

* Add memory operation changes for default pipeline

* fix config file

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: c7b8e86e34]
2025-05-07 18:37:31 -07:00
Khushbu Agarwal
d4d98d5c34 mfma_32x32x64_fp8/bf8 (#2148)
* support for mfma_32x32x64_fp8

* clang-formatted

* Fixing sparsity in codegen

[ROCm/composable_kernel commit: d58f2b8bd0]
2025-05-01 13:36:24 -07:00
Aviral Goel
9d5b89ce8a [Tile Engine] Improved README.md (#2134)
* improved tile_engine readme

* changed ck tile explanation and json

* further improved readme

* fixed typo

[ROCm/composable_kernel commit: 1aea51d34e]
2025-04-29 17:37:07 -07:00
Khushbu Agarwal
7795e976da Support for MFMA_16x16x128 for fp8/bf8 (#2125)
* Adding 16x16x128 support for gfx950

* Support for fp8 and bf8

* fix input arguments for MFMA scale instruction

* clang-formatted

* Fixes for lwpck-3145 (#2138)

* Fix lds tile & cmake dep & default epilogue

* Fallback BTypeToUse to ADataType in WOQ cases

* reverting instance json file

* reverting instance json file

---------

Co-authored-by: Yi DING <yi.ding@amd.com>

[ROCm/composable_kernel commit: d107f3c3a5]
2025-04-28 18:19:50 -07:00
Khushbu Agarwal
a75ab12f3a [TileEngine] Support for sparsity in codegen (#2128)
* Added sparsity flag in codegen

* remove comments

* clang formatted

* added sparsity as runtime argument

* updated README

* updated stream config variable

* fix typo for tail_num in hot loop

[ROCm/composable_kernel commit: 768c99eca9]
2025-04-28 18:19:23 -07:00
Khushbu Agarwal
03cdc5602a Adding include directory in tile_engine (#2116)
[ROCm/composable_kernel commit: 94662b02d0]
2025-04-22 15:55:19 -07:00
Khushbu Agarwal
790dfe9bcd multi instance generation for CkTileEngine (#2080)
* Add support for multi-instance verification, print detail for each instance, documentation fix

* clang formatted

* Added Readme file

* updated readme

* Addressing review comments

* clang formatted

* Updated ReadMe and GPU reference code

* simplified dispatch kernel code

* indentation

[ROCm/composable_kernel commit: 7cadf187e2]
2025-04-21 08:39:45 -07:00
Khushbu Agarwal
50c53c7252 file clang formatted (#2053)
[ROCm/composable_kernel commit: 3bda57c204]
2025-04-03 16:55:49 -07:00
Khushbu Agarwal
9b9f33d37e Documentation for newly added struct (#2051)
[ROCm/composable_kernel commit: b443056a26]
2025-04-03 16:24:34 -07:00
Khushbu Agarwal
eee09ecdb3 [New] Build up the feature of CK Tile GEMM CodeGen (#1994)
* New branch for codegen changes

* Fix verify function for int4

* pk_int4 codegen

* Update to review comments

* Remove codegen directory and rename filenames

* Remove extra files; clean up CMake file

* New branch for codegen changes

* Fix verify function for int4

* pk_int4 codegen

* Update to review comments

* Remove codegen directory and rename filenames

* Remove extra files; clean up CMake file

* code changes for single instance

* config file rename, added few more combinations in json file

* Fix cmake file

* Addressing review comments

* Reverting files changed by merge to develop

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: fed0709121]
2025-04-03 11:54:12 -07:00