root
56f84349ca
1. fix loop_num=odd bug 2. optimize MI300 performance for large MNK (tile size 128x128x128) 3. optimize decode perf on MI300
2025-06-25 20:41:25 -05:00
AMD-dteng
07b579d1dd
fix a bug where K_Warp_Tile did not match between two files.
2025-06-18 19:51:57 -05:00
root
a6f711de98
support even and odd k in tail
2025-06-17 02:31:54 -05:00
root
c097f90adf
first version
2025-05-26 04:45:18 -05:00
solin
c2d17ec24f
draft for 16*16*128 fp8
2025-05-14 09:56:13 +00:00
BingYuan.Zhou
6a3960c1e1
Flatmm merge (#2168)
* sync with function interface of cshuffleepiloge, fix flatmm build fail
* move code from solin/flatmm, which adds mfma16*16*32fp8 and optimizes flatmm
---------
Co-authored-by: solin <bingzhou@amd.com>
2025-05-08 12:59:57 +08:00
BingYuan.Zhou
eaf1f0bf3b
[flatmm] implement basic fp16 flatmm (#2089)
* [flatmm] implement basic fp16 flatmm
* fix CI build fail
---------
Co-authored-by: root <root@hjbog-srdc-50.amd.com>
Co-authored-by: solin <bingzhou@amd.com>
2025-04-16 16:51:17 +08:00