Commit Graph

8 Commits

Author SHA1 Message Date
Feng Shijie
ff4d6434d9 optimize grid calculation in contiguous mode 2025-07-09 08:24:16 +00:00
Feng Shijie
fae4ebac66 Add support for contiguous grouped gemm 2025-07-09 07:03:30 +00:00
Feng Shijie
0deeba90e6 prune debug code 2025-07-04 08:06:43 +00:00
Feng Shijie
bce9c22bcd Improve gridDim calculation in persistent mode 2025-07-04 07:09:58 +00:00
Feng Shijie
10fe5ab7b5 Add group-flatmm 2025-07-04 03:37:43 +00:00
root
0ef8ad46c0 1. control main loop pipeline by C code instead of hotloop. 2. separate 2 lds bank 2025-05-26 09:48:58 -05:00
BingYuan.Zhou
6a3960c1e1 Flatmm merge (#2168)
* sync with function interface of cshuffleepiloge,fix flatmm build fail

* move code from solin/flatmm which add mfma16*16*32fp8 and optimize flatmm

---------

Co-authored-by: solin <bingzhou@amd.com>
2025-05-08 12:59:57 +08:00
BingYuan.Zhou
eaf1f0bf3b [flatmm] implement basic fp16 flatmm (#2089)
* [flatmm] implement basic fp16 flatmm

* fix CI build fail

---------

Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>
2025-04-16 16:51:17 +08:00