Commit Graph

18 Commits

Author SHA1 Message Date
bobofang
127e742e96 Add MFMA M16N16K16 and M16N16K32 methods
These two methods are off by default.
2025-07-28 14:54:51 -04:00
YC Lin
e866f814f9 [GEMM] remove a_col_major/b_row_major case 2025-07-28 14:54:51 -04:00
root
bf69235cfb [GEMM] modify if-else locations 2025-07-28 14:54:51 -04:00
mhYang
ba8b5112c4 Fix AccDataType and CDataType
1. Fix AccDataType and CDataType
2. Remove indent
3. Align merge_transform for tutorial
2025-07-28 14:54:51 -04:00
mhYang
d6fd468603 Fix build error 2025-07-28 14:54:51 -04:00
root
b3986c32a6 [GEMM] disable/enable instruction scheduling 2025-07-28 14:54:51 -04:00
mhYang
42f2e21865 Fix missing message 2025-07-28 14:54:51 -04:00
mhYang
38ce4dd8c3 Fix xor transform dim. 2025-07-28 14:54:51 -04:00
Clement Lin
b03668fe8a [GEMM] Add cache-aware WG schedule and adjust block tile
113 -> 121.7 TFLOPS
2025-07-28 14:54:51 -04:00
mhYang
39ca852330 Add LDS bank conflict solutions 2025-07-28 14:54:51 -04:00
bobofang
22147ace51 Fix add accuracy issue
2673 GB/s -> 3271 GB/s
Perf: 0.0512898 ms, 3271.06 GB/s
2025-07-28 14:54:51 -04:00
root
d7d9fdaf1b [GEMM] use mfma k8 warp gemm 2025-07-28 14:54:51 -04:00
root
1b8d7cd1b9 [GEMM] disable/enable prefetch 2025-07-28 14:54:50 -04:00
Clement Lin
6a2036015e [CK TILE] Toy example - basic gemm 2025-07-28 14:54:50 -04:00
Clement Lin
077056b32d Adjust block shape
2673 GB/s -> 3647 GB/s
2025-07-28 14:54:50 -04:00
Clement Lin
2ff691f3f2 Utilize vectorized memory access
1998.24 GB/s -> 2673 GB/s
2025-07-28 14:54:50 -04:00
Clement Lin
078b5c68a0 Adjust the size of thread block
1968.42 GB/s -> 1998.24 GB/s
2025-07-28 14:54:50 -04:00
Clement Lin
8d205a9298 [CK TILE] Toy example - basic add 2025-07-28 14:54:50 -04:00