Thrupti Raj Lakshmana Gowda
6a953648d1
Updating Runtime log for CK Tile Engine (#2431)
* Updating runtime log message for CK TILE ENGINE
* Fixing Clang Format
* Update tile_engine/ops/gemm/README.md
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
---------
Co-authored-by: ThruptiRajLakshmanaGowda <tlakshma@amd.com>
Co-authored-by: Aviral Goel <aviral.goel@amd.com>
[ROCm/composable_kernel commit: a03682cb80]
2025-07-01 10:59:49 -07:00
Khushbu Agarwal
d33891768a
Enabling diff datatypes for tile_engine and build with more granularity (#2392)
* merging recent changes to universal gemm to tile_engine
* Reducing Linking time by generating less intermediate files
* make small libs to build faster
* Reducing the instances
* reducing instances
* Restoring default config
* Restoring default config
* warp_n reverted in default config
* Adding diff json files for fp8 and fp16, cmake changes for fp8
* Restructure the CMake File
* Added more granularity for build and some debugging code
* removed some of debugging statements
* added fp8 instances
* take datatype from command line to enable both types of JSON files
* updated README file
* code cleanup
* code cleanup
* updated jenkinsfile
* enable tile_engine daily builds
* updating cmake file
* updated CMakeLists.txt
* Updating CMake code fixing gfx12 build
* Updating CMake code fixing gfx12 build
* Fix CMake file null checks
* fixed traces of rebase
* Update tile_engine/ops/gemm/README.md
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update tile_engine/ops/gemm/README.md
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update tile_engine/ops/gemm/README.md
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* fixing rebase issue
---------
Co-authored-by: khushbu <khuagarw@gmail.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
[ROCm/composable_kernel commit: a14753b86f]
2025-06-25 15:18:24 -07:00
Thomas Ning
90add28587
[CK Tile] Int8 Support on CK Tile GEMM (#2267)
* updates to support int8 in 03_gemm example
* added comments, using aliases, helper functions
* test(gemm_universal): add test cases for int8 gemm pipeline
* fix(test_gemm): fix for failing unit test for int8
* test(ck_tile): add int8 unit test for gemm universal
* refactor(gemm_universal): GPU reference verification for GEMM code improved
* style(gemm_universal): removed extra comments and did clang format
* merging recent changes to universal gemm to tile_engine
* ck tile engine integration work
* feat(tile_engine): add int8 support to tile engine ops/gemm
* feat(tile_engine): added 32 32 16 mfma instances to tile engine for int8
* style: Format code with clang-format-12
* refactor(tile_engine): address review comments
* style: removed unhelpful comments & unused variables.
* build: tile engine uses default config
* feat: add int8 support for CK_TILE GEMM
* style: added trailing commas to codegen_utils.py
* refactor: tile engine
* refactor: formatting and code review
* refactor: code formatting for python files
* fix: suppress build warning
* add support for gfx950
* refactor: KWarpTile size in gemms util
* Fix the branch and wrap up the k warp tile
* Add bf8 integration
* refactor: clang format and rebase
---------
Co-authored-by: zjli2013 <leezhengjiang@gmail.com>
Co-authored-by: AviralGoelAMD <aviral.goel@amd.com>
Co-authored-by: Khushbu Agarwal <khuagarw@amd.com>
[ROCm/composable_kernel commit: e03293ebce]
2025-06-25 08:20:35 -07:00
linqunAMD
d2ec53a74e
[CK_TILE] Refine fp8 support in flatmm (#2239)
* [CK_TILE] Refine fp8 in flatmm
1. Replace USING_MFMA_16x16x32 & USING_MFMA_16x16x32 with constexpr
2. Add an additional const check to avoid build error in HotLoopScheduler
3. Refine shuffleb to support both tile 32x32 and 16x16
4. Support command option -init
5. Move Gemm warp definition to a separate struct
* fix clang format
* fix clang format
* keep default behavior unchanged (warp tile = 16x16)
* fix tile engine build error
* fix a typo in codegen_utils.py
* address review comments
* address review comments
---------
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: 37e1a27537]
2025-06-25 01:07:45 -07:00
Aviral Goel
54ded8c52f
Label CMakeLists message() as DEBUG or STATUS for clean build output (#2301)
* - elevate important build messages to log level STATUS
- comment out the rest (temporarily)
* - marked all low importance build messages as log_level=DEBUG
[ROCm/composable_kernel commit: aed0f5880c]
2025-06-10 10:46:47 -07:00
Eisuke Kawashima
808cc61307
chore: unset executable permission (#2303)
Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
[ROCm/composable_kernel commit: 4e586ca958]
2025-06-10 09:13:59 -07:00
Khushbu Agarwal
c395db8926
[CK_Tile] Fix gemm kernel for 4,64,16 and 64,4,16 warp tile sizes (#2262)
* debugging issue
* debugging issue
* debugging
* debugging
* reverting debugging code
* clang formatted
* updating default_config.json
* fix ci failure
* clang formatted
[ROCm/composable_kernel commit: 59a85cb4bc]
2025-06-03 20:16:10 -07:00
Khushbu Agarwal
42ace38c07
Rotating buffer PR CI fix (#2257)
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)"
This reverts commit 2c31e1e716b20a268cc6ffca4af7cc5ecbe44e3f.
* fix regression
[ROCm/composable_kernel commit: 2e38eb4f1c]
2025-06-02 10:25:01 -07:00
Kiefer van Teutem
8f421515c0
Explicitly set the LINKER_LANGUAGE for the gemm_template_instances target to avoid Ninja build config failure. (#2265)
Co-authored-by: kiefer <kiefer.van.teutem@streamhpc.com>
[ROCm/composable_kernel commit: 2215a9edf0]
2025-05-30 13:32:28 -07:00
Casey-Shi
b7c31ca612
change from ninja to make (#2253)
[ROCm/composable_kernel commit: 29574f05f7]
2025-05-28 09:25:05 -07:00
Illia Silin
fa9625d940
Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)
This reverts commit b021b5f1d3ae599305e0b455035a6e01ad81fe23.
[ROCm/composable_kernel commit: bbdaf79a52]
2025-05-28 09:46:52 -06:00
Casey-Shi
83c018bb92
fix type hint (#2254)
[ROCm/composable_kernel commit: 4286eae09a]
2025-05-28 08:43:58 -07:00
Khushbu Agarwal
2ca6f22fab
[CK_tile] Add rotating buffer feature for universal gemm (#2200)
* Add rotating buffer feature for universal gemm
* adding changes in tile_engine
* Updated code to merge kernel_launch
* removing comments
* Enable rotating buffer changes to flatmm
* Created diff launch_kernel function for rotating buffer
* Simplified calculation using macros
* merge code with new changes in tile_engine
* clang formatted
* Redefine macros
[ROCm/composable_kernel commit: 99857e10e6]
2025-05-27 23:00:58 -07:00
Casey-Shi
3bcbdd608e
[Tile Engine] Add benchmark for tile engine gemm. (#2193)
* initial commit -m benchmark
* only support profile
* fix
* fix doc
* add default config
* add ci
* fix cmake
* tmp save for gen blobs
* fix bug
* merge
* range config
* test success
* fix
* fix
* move struct
* remove config property
* fix config
* remove comment
* add cmake option & modify
* add changelog
* fix
* format
* add pydantic module to the docker image
* fix
* add benchmark for cold and warm-up
* python format
* add asm cache control
* fix README
* remove pydantic module
* modify changelog
* fix config
* recover benchmark_gemm and fix
* format python
* refactor profiler
* fix csv bug
* fix codegen bug
* add kernel instance object
* add benchmark gemm executable
* fix jenkins & delete extra header
* disable warning output & enable default config
* Disable sparsity for invalid warp tile combinations
* fix gemm host template func
* refactor gemm profiler
* filter out some instances
* default config test & fix codegen bug
* add sparse flag to gen more instances
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: khuagarw <khuagarw@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
[ROCm/composable_kernel commit: 128f5a1eab]
2025-05-26 22:32:36 -07:00
Khushbu Agarwal
395262f196
Adding validation for tile sizes in Tile Engine (#2189)
* Adding validation for tile sizes
* Add architecture in config, and shuffle lines of code in warp_gemm.hpp
* Enable MFMA for gfx950, and invalid tile handling
[ROCm/composable_kernel commit: 3d8d6e75e4]
2025-05-15 10:28:31 -07:00
Khushbu Agarwal
699091846a
Disable SMFMA gfx90a (#2184)
* sparsity fix for gfx90a
* reverting tile_engine changes
[ROCm/composable_kernel commit: f05e45ba59]
2025-05-12 09:56:23 -07:00
Khushbu Agarwal
83180829b6
Disable SMFMA for gfx90a (#2182)
[ROCm/composable_kernel commit: ef72a4b9bc]
2025-05-09 00:18:07 -07:00
Thomas Ning
9fa7eed532
Revert "Disable the SMFMA instruction for gfx90a. (#2174)" (#2175)
This reverts commit 95b372278a9a57ac2be99700105674f25e5a3554.
[ROCm/composable_kernel commit: c757046d49]
2025-05-08 00:07:03 -07:00
Khushbu Agarwal
f7db78cfcf
Disable the SMFMA instruction for gfx90a. (#2174)
* remove smfma for gfx90a
* clang formatted
[ROCm/composable_kernel commit: a32d907771]
2025-05-07 23:09:22 -07:00
Khushbu Agarwal
a86620111c
[CK_Tile] Simplified Mem pipeline (#2159)
* simplify code
* compiled the code
* Simplified example and codegen for mem pipeline
* Reverting config and universal gemm example
* clang formatted
* remove comments
* clang formatted
* Add memory operation changes for default pipeline
* fix config file
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
[ROCm/composable_kernel commit: c7b8e86e34]
2025-05-07 18:37:31 -07:00
Khushbu Agarwal
5d0b1b733c
mfma_32x32x64_fp8/bf8 (#2148)
* support for mfma_32x32x64_fp8
* clang-formatted
* Fixing sparsity in codegen
[ROCm/composable_kernel commit: d58f2b8bd0]
2025-05-01 13:36:24 -07:00
Aviral Goel
b537ae86b9
[Tile Engine] Improved README.md (#2134)
* improved tile_engine readme
* changed ck tile explanation and json
* further improved readme
* fixed typo
[ROCm/composable_kernel commit: 1aea51d34e]
2025-04-29 17:37:07 -07:00
Khushbu Agarwal
aeb46e6a49
Support for MFMA_16x16x128 for fp8/bf8 (#2125)
* Adding 16x16x128 support for gfx950
* Support for fp8 and bf8
* fix input arguments for MFMA scale instruction
* clang-formatted
* Fixes for lwpck-3145 (#2138)
* Fix lds tile & cmake dep & default epilogue
* Fallback BTypeToUse to ADataType in WOQ cases
* reverting instance json file
* reverting instance json file
---------
Co-authored-by: Yi DING <yi.ding@amd.com>
[ROCm/composable_kernel commit: d107f3c3a5]
2025-04-28 18:19:50 -07:00
Khushbu Agarwal
10188b5103
[TileEngine] Support for sparsity in codegen (#2128)
* Added sparsity flag in codegen
* remove comments
* clang formatted
* added sparsity as runtime argument
* updated README
* updated stream config variable
* fix typo for tail_num in hot loop
[ROCm/composable_kernel commit: 768c99eca9]
2025-04-28 18:19:23 -07:00
Khushbu Agarwal
c6b7f48326
Adding include directory in tile_engine (#2116)
[ROCm/composable_kernel commit: 94662b02d0]
2025-04-22 15:55:19 -07:00
Khushbu Agarwal
74210a9dfc
multi instance generation for CkTileEngine (#2080)
* Add support for multi-instance verification, print detail for each instance, documentation fix
* clang formatted
* Added Readme file
* updated readme
* Addressing review comments
* clang formatted
* Updated ReadMe and GPU reference code
* simplified dispatch kernel code
* indentation
[ROCm/composable_kernel commit: 7cadf187e2]
2025-04-21 08:39:45 -07:00
Khushbu Agarwal
09792fa112
file clang formatted (#2053)
[ROCm/composable_kernel commit: 3bda57c204]
2025-04-03 16:55:49 -07:00
Khushbu Agarwal
844730776f
Documentation for newly added struct (#2051)
[ROCm/composable_kernel commit: b443056a26]
2025-04-03 16:24:34 -07:00
Khushbu Agarwal
b85b103194
[New] Build up the feature of CK Tile GEMM CodeGen (#1994)
* New branch for codegen changes
* Fix verify function for int4
* pk_int4 codegen
* Update to review comments
* Remove codegen directory and rename filenames
* Remove extra files; clean up CMake file
* code changes for single instance
* config file rename, added few more combinations in json file
* Fix cmake file
* Addressing review comments
* Reverting files changed by merge to develop
---------
Co-authored-by: ThomasNing <thomas.ning@amd.com>
[ROCm/composable_kernel commit: fed0709121]
2025-04-03 11:54:12 -07:00