bobofang
|
127e742e96
|
Add MFMA M16N16K16 and M16N16K32 methods
These two methods are off by default.
|
2025-07-28 14:54:51 -04:00 |
|
YC Lin
|
e866f814f9
|
[GEMM] remove a_col_major/b_row_major case
|
2025-07-28 14:54:51 -04:00 |
|
root
|
bf69235cfb
|
[GEMM] modify if-else locations
|
2025-07-28 14:54:51 -04:00 |
|
mhYang
|
ba8b5112c4
|
Fix AccDataType and CDataType
1. Fix AccDataType and CDataType
2. Remove indent
3. Align merge_transform for tutorial
|
2025-07-28 14:54:51 -04:00 |
|
mhYang
|
d6fd468603
|
Fix build error
|
2025-07-28 14:54:51 -04:00 |
|
root
|
b3986c32a6
|
[GEMM] disable/enable instruction scheduling
|
2025-07-28 14:54:51 -04:00 |
|
mhYang
|
42f2e21865
|
Fix missing message
|
2025-07-28 14:54:51 -04:00 |
|
mhYang
|
38ce4dd8c3
|
Fix xor transform dim.
|
2025-07-28 14:54:51 -04:00 |
|
Clement Lin
|
b03668fe8a
|
[GEMM] Add cache-aware WG schedule and adjust block tile
113 -> 121.7 TFLOPS
|
2025-07-28 14:54:51 -04:00 |
|
mhYang
|
39ca852330
|
Add LDS bank conflict solutions
|
2025-07-28 14:54:51 -04:00 |
|
bobofang
|
22147ace51
|
Fix add accuracy issue
2673 GB/s -> 3271 GB/s
Perf: 0.0512898 ms, 3271.06 GB/s
|
2025-07-28 14:54:51 -04:00 |
|
root
|
d7d9fdaf1b
|
[GEMM] use mfma k8 warp gemm
|
2025-07-28 14:54:51 -04:00 |
|
root
|
1b8d7cd1b9
|
[GEMM] disable/enable prefetch
|
2025-07-28 14:54:50 -04:00 |
|
Clement Lin
|
6a2036015e
|
[CK TILE] Toy example - basic gemm
|
2025-07-28 14:54:50 -04:00 |
|
Clement Lin
|
077056b32d
|
Adjust block shape
2673 GB/s -> 3647 GB/s
|
2025-07-28 14:54:50 -04:00 |
|
Clement Lin
|
2ff691f3f2
|
Utilize vectorized memory access
1998.24 GB/s -> 2673 GB/s
|
2025-07-28 14:54:50 -04:00 |
|
Clement Lin
|
078b5c68a0
|
Adjust the size of thread block
1968.42 GB/s -> 1998.24 GB/s
|
2025-07-28 14:54:50 -04:00 |
|
Clement Lin
|
8d205a9298
|
[CK TILE] Toy example - basic add
|
2025-07-28 14:54:50 -04:00 |
|