12 Commits

Author SHA1 Message Date
Khushbu Agarwal
f3120e7526 Merge flatmm Operator with universal gemm (#2434)
* Initial commit

* Adding new tile partitioner to flatmm

* intermediate changes

* debugging kernels

* Updating flatmm example to universal gemm example

* updated flatmm kernel to run via gemmKernel

* update universal gemm to incorporate flatmm

* debug

* Fix flatmm call

* Fixing other kernels and tests for API changes

* clang formatted

* fixing gemm tests

* added test for flatmm and simplify kernel arguments

* adding flatmm test

* fix test for flatmm

* simplify gemm kernel with flatmm

* remove flatmm related files

* addressing review comments and code clean up

* resolving empty file

* resolving empty file

* clang formatted

* addressing review comments

* enable persistent kernel for flatmm

* reverted the removed files for flatmm

* reverted the removed files for flatmm

* changed flatmm to weightPReshuffle; removed the _1 added in the flatmm example

* some more renames

* clang formatted

[ROCm/composable_kernel commit: d239b91fd5]
2025-07-11 08:27:55 -07:00
linqunAMD
d2ec53a74e [CK_TILE] Refine fp8 support in flatmm (#2239)
* [CK_TILE] Refine fp8 in flatmm

1. Replace USING_MFMA_16x16x32 & USING_MFMA_16x16x32 with constexpr
2. Add an additional const check to avoid build error in HotLoopScheduler
3. Refine shuffleb to support both tile 32x32 and 16x16
4. Support command option -init
5. Move Gemm warp definition to a separate struct

* fix clang format

* fix clang format

* keep default behavior unchanged (warp tile = 16x16)

* fix tile engine build error

* fix a typo in codegen_utils.py

* address review comments

* address review comments

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>

[ROCm/composable_kernel commit: 37e1a27537]
2025-06-25 01:07:45 -07:00
Thomas Ning
cb631cd5b1 Fix the CK Tile related operators (#2356)
* fix the flatmm

* Fix the pipeline

* address the comment

[ROCm/composable_kernel commit: 3c4cdfac4f]
2025-06-16 17:38:52 -07:00
Illia Silin
fba2ac98ae Revert "fix the flatmm (#2349)" (#2352)
This reverts commit 37f25427f2ec6c907bf12030338f69945691f54c.

[ROCm/composable_kernel commit: 5523df4b2d]
2025-06-16 07:54:55 -07:00
Thomas Ning
6f158242b6 fix the flatmm (#2349)
[ROCm/composable_kernel commit: d996bc78be]
2025-06-16 02:17:53 -07:00
Khushbu Agarwal
bdb185a509 fix flatmm kernel for bigger size for fp16 datatype (#2302)
[ROCm/composable_kernel commit: bd270fe4bc]
2025-06-10 11:13:40 -07:00
Khushbu Agarwal
42ace38c07 Rotating buffer PR CI fix (#2257)
* Revert "Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)"

This reverts commit 2c31e1e716b20a268cc6ffca4af7cc5ecbe44e3f.

* fix regression

[ROCm/composable_kernel commit: 2e38eb4f1c]
2025-06-02 10:25:01 -07:00
Illia Silin
fa9625d940 Revert "[CK_tile] Add rotating buffer feature for universal gemm (#2200)" (#2256)
This reverts commit b021b5f1d3ae599305e0b455035a6e01ad81fe23.

[ROCm/composable_kernel commit: bbdaf79a52]
2025-05-28 09:46:52 -06:00
Khushbu Agarwal
2ca6f22fab [CK_tile] Add rotating buffer feature for universal gemm (#2200)
* Add rotating buffer feature for universal gemm

* adding changes in tile_engine

* Updated code to merge kernel_launch

* removing comments

* Enable rotating buffer changes to flatmm

* Created a different launch_kernel function for rotating buffer

* Simplified calculation using macros

* merge code with new changes in tile_engine

* clang formatted

* Redefine macros

[ROCm/composable_kernel commit: 99857e10e6]
2025-05-27 23:00:58 -07:00
Aviral Goel
bfdffc4704 Add catch blocks in example GEMM apps to enable better error handling (Issue: 1928) (#2234)
* added catch statements to examples

* clang format

[ROCm/composable_kernel commit: c52649ad57]
2025-05-27 22:32:42 -07:00
BingYuan.Zhou
2428249bc7 Flatmm merge (#2168)
* sync with function interface of cshuffleepiloge, fix flatmm build fail

* move code from solin/flatmm, which adds mfma16*16*32fp8 and optimizes flatmm

---------

Co-authored-by: solin <bingzhou@amd.com>

[ROCm/composable_kernel commit: 6a3960c1e1]
2025-05-08 12:59:57 +08:00
BingYuan.Zhou
f6b51a1c7b [flatmm] implement basic fp16 flatmm (#2089)
* [flatmm] implement basic fp16 flatmm

* fix CI build fail

---------

Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>

[ROCm/composable_kernel commit: eaf1f0bf3b]
2025-04-16 16:51:17 +08:00