Feng Shijie
ff4d6434d9
optimize grid calculation in contiguous mode
2025-07-09 08:24:16 +00:00
Feng Shijie
fae4ebac66
Add support for contiguous grouped gemm
2025-07-09 07:03:30 +00:00
Feng Shijie
0deeba90e6
prune debug code
2025-07-04 08:06:43 +00:00
Feng Shijie
bce9c22bcd
Improve gridDim calculation in persistent mode
2025-07-04 07:09:58 +00:00
Feng Shijie
10fe5ab7b5
Add group-flatmm
2025-07-04 03:37:43 +00:00
root
0ef8ad46c0
1. control main loop pipeline by C code instead of hotloop. 2. separate 2 LDS banks
2025-05-26 09:48:58 -05:00
BingYuan.Zhou
6a3960c1e1
Flatmm merge (#2168)
* sync with function interface of cshuffleepilogue, fix flatmm build failure
* move code from solin/flatmm, which adds mfma16*16*32fp8 and optimizes flatmm
---------
Co-authored-by: solin <bingzhou@amd.com>
2025-05-08 12:59:57 +08:00
BingYuan.Zhou
eaf1f0bf3b
[flatmm] implement basic fp16 flatmm (#2089)
* [flatmm] implement basic fp16 flatmm
* fix CI build failure
---------
Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>
2025-04-16 16:51:17 +08:00