Commit Graph

680 Commits

Author SHA1 Message Date
MHYang
4a264eb9ed Fix register spilling and K0 tile size issues 2025-07-28 14:54:51 -04:00
YC Lin
eb737b8f82 [GEMM] Fix num_loop issues 2025-07-28 14:54:51 -04:00
Clement Lin
f71b2c7e55 Add generate.py for codegen 2025-07-28 14:54:51 -04:00
YC Lin
9e020272c4 [GEMM] Remove redundant GetBlockGemm 2025-07-28 14:54:51 -04:00
YC Lin
4e6a792b82 [GEMM] Implement local prefetch and refactor block gemm pipeline 2025-07-28 14:54:51 -04:00
Clement Lin
cbc660acc7 Refactor flash_attention_fwd_traits_ for codegen 2025-07-28 14:54:51 -04:00
YC Lin
f30071289c [GEMM] Merge universal_block_gemm into block_gemm 2025-07-28 14:54:51 -04:00
mhYang
26b73c0ed1 Fix flash attention 1 tile case 2025-07-28 14:54:51 -04:00
Clement Lin
e614acfdd8 Refactor FlashAttnArgs usage for codegen 2025-07-28 14:54:51 -04:00
Clement Lin
0cc5130818 Add codegen test example 2025-07-28 14:54:51 -04:00
YC Lin
5b1c397806 [GEMM] Refactor GetStaticLdsSize and remove GetSmemSize 2025-07-28 14:54:51 -04:00
Clement Lin
2499b8d401 Fix indentation 2025-07-28 14:54:51 -04:00
Clement Lin
d98fb3e0b5 Remove unused code 2025-07-28 14:54:51 -04:00
YC Lin
ae275aa105 [GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt 2025-07-28 14:54:51 -04:00
YC Lin
6113ca8062 [Add] Add build option for generating assembly 2025-07-28 14:54:51 -04:00
YC Lin
97a960042b [GEMM] Refactor block gemm and pipeline policy of instruction schedule 2025-07-28 14:54:51 -04:00
Clement Lin
8785e6599e Add flash_attention_fwd toy_example 2025-07-28 14:54:51 -04:00
mhYang
a949b82c9f Update tile size and use slc 2025-07-28 14:54:51 -04:00
mhYang
9158612a9f Fix add flops calculation 2025-07-28 14:54:51 -04:00
ClementLinCF
88a4c7414f Create README.md 2025-07-28 14:54:51 -04:00
mhYang
ac972bfd11 Use mfma 16x16x32 2025-07-28 14:54:51 -04:00
mhYang
5326d403e4 Fix KERNEL_D config 2025-07-28 14:54:51 -04:00
YC Lin
fe319b97ae [GEMM] Add pragma message for different MFMA options 2025-07-28 14:54:51 -04:00
YC Lin
76751567b5 [GEMM] Fix print typos 2025-07-28 14:54:51 -04:00
Clement Lin
4c526ab140 Fix indentation typo 2025-07-28 14:54:51 -04:00
Clement Lin
5b10e9f3dd [GEMM] Fix MFMA condition checks 2025-07-28 14:54:51 -04:00
Clement Lin
a95665a6af [GEMM] Add new macro options check 2025-07-28 14:54:51 -04:00
Clement Lin
1099762267 [GEMM] Add macros for multiple optimization options 2025-07-28 14:54:51 -04:00
YC Lin
890a159877 [GEMM] default MFMA config 2025-07-28 14:54:51 -04:00
YC Lin
8d75ae7c96 git push test 2025-07-28 14:54:51 -04:00
root
a36d246cc0 [GEMM] fix MFMA configurations 2025-07-28 14:54:51 -04:00
mhYang
15e6f36f66 Adjust mfma schedule order 2025-07-28 14:54:51 -04:00
Clement Lin
e9f7c9bf42 [GEMM] Replace const auto with constexpr index_t 2025-07-28 14:54:51 -04:00
Clement Lin
cef77c1dcb [GEMM] Update cache-aware wg schedule 2025-07-28 14:54:51 -04:00
bobofang
127e742e96 Add MFMA M16N16K16 and M16N16K32 methods
these two methods are default off
2025-07-28 14:54:51 -04:00
YC Lin
e866f814f9 [GEMM] remove a_col_major/b_row_major case 2025-07-28 14:54:51 -04:00
root
bf69235cfb [GEMM] modify if-else locations 2025-07-28 14:54:51 -04:00
mhYang
ba8b5112c4 Fix AccDataType and CDataType
1. Fix AccDataType and CDataType
2. Remove indent
3. Align merge_transform for tutorial
2025-07-28 14:54:51 -04:00
mhYang
d6fd468603 Fix build error 2025-07-28 14:54:51 -04:00
root
b3986c32a6 [GEMM] disable/enable instruction scheduling 2025-07-28 14:54:51 -04:00
mhYang
42f2e21865 Fix missing message 2025-07-28 14:54:51 -04:00
mhYang
38ce4dd8c3 Fix xor transform dim. 2025-07-28 14:54:51 -04:00
Clement Lin
b03668fe8a [GEMM] Add cache-aware WG schedule and adjust block tile
113 -> 121.7 TFLOPS
2025-07-28 14:54:51 -04:00
mhYang
39ca852330 Add LDS bank conflict solutions 2025-07-28 14:54:51 -04:00
bobofang
22147ace51 Fix add accuracy issue
2673 GB/s -> 3271 GB/s
Perf: 0.0512898 ms, 3271.06 GB/s
2025-07-28 14:54:51 -04:00
root
d7d9fdaf1b [GEMM] use mfma k8 warp gemm 2025-07-28 14:54:51 -04:00
root
1b8d7cd1b9 [GEMM] disable/enable prefetch 2025-07-28 14:54:50 -04:00
Clement Lin
6a2036015e [CK TILE] Toy example - basic gemm 2025-07-28 14:54:50 -04:00
Clement Lin
077056b32d Adjust block shape
2673 GB/s -> 3647 GB/s
2025-07-28 14:54:50 -04:00
Clement Lin
2ff691f3f2 Utilize vectorized memory access
1998.24 GB/s -> 2673 GB/s
2025-07-28 14:54:50 -04:00