Commit Graph

7 Commits

Author SHA1 Message Date
Khushbu Agarwal
7795e976da Support for MFMA_16x16x128 for fp8/bf8 (#2125)
* Adding 16x16x128 support for gfx950

* Support for fp8 and bf8

* fix input arguments for MFMA scale instruction

* clang-formatted

* Fixes for lwpck-3145 (#2138)

* Fix lds tile & cmake dep & default epilogue

* Fallback BTypeToUse to ADataType in WOQ cases

* reverting instance json file

---------

Co-authored-by: Yi DING <yi.ding@amd.com>

[ROCm/composable_kernel commit: d107f3c3a5]
2025-04-28 18:19:50 -07:00
Khushbu Agarwal
a75ab12f3a [TileEngine] Support for sparsity in codegen (#2128)
* Added sparsity flag in codegen

* remove comments

* clang formatted

* added sparsity as runtime argument

* updated README

* updated stream config variable

* fix typo for tail_num in hot loop

[ROCm/composable_kernel commit: 768c99eca9]
2025-04-28 18:19:23 -07:00
Khushbu Agarwal
03cdc5602a Adding include directory in tile_engine (#2116)
[ROCm/composable_kernel commit: 94662b02d0]
2025-04-22 15:55:19 -07:00
Khushbu Agarwal
790dfe9bcd multi instance generation for CkTileEngine (#2080)
* Add support for multi-instance verification, print detail for each instance, documentation fix

* clang formatted

* Added Readme file

* updated readme

* Addressing review comments

* clang formatted

* Updated ReadMe and GPU reference code

* simplified dispatch kernel code

* indentation

[ROCm/composable_kernel commit: 7cadf187e2]
2025-04-21 08:39:45 -07:00
Khushbu Agarwal
50c53c7252 file clang formatted (#2053)
[ROCm/composable_kernel commit: 3bda57c204]
2025-04-03 16:55:49 -07:00
Khushbu Agarwal
9b9f33d37e Documentation for newly added struct (#2051)
[ROCm/composable_kernel commit: b443056a26]
2025-04-03 16:24:34 -07:00
Khushbu Agarwal
eee09ecdb3 [New] Build up the feature of CK Tile GEMM CodeGen (#1994)
* New branch for codegen changes

* Fix verify function for int4

* pk_int4 codegen

* Update to review comments

* Remove codegen directory and rename filenames

* Remove extra files; clean up CMake file

* code changes for single instance

* config file rename, added a few more combinations to the json file

* Fix cmake file

* Addressing review comments

* Reverting files changed by merge to develop

---------

Co-authored-by: ThomasNing <thomas.ning@amd.com>

[ROCm/composable_kernel commit: fed0709121]
2025-04-03 11:54:12 -07:00