Commit Graph

538 Commits

Author SHA1 Message Date
AviralGoelAMD
aaa757e363 fixed function and struct names 2025-04-24 14:24:01 +00:00
AviralGoelAMD
b26f00cab2 clang format 2025-04-15 14:50:51 +00:00
AviralGoelAMD
094c3696b7 commented the 01_add example 2025-04-15 14:45:34 +00:00
AviralGoelAMD
40058afae6 added explanation comments 2025-04-15 14:35:19 +00:00
AviralGoelAMD
0fd0e7b706 added a 1d vector elementwise example 2025-04-15 14:12:04 +00:00
Clement Lin
67bd9e4bb3 Add generate.py for codegen 2025-04-14 18:13:44 +08:00
YC Lin
a664958e9d [GEMM] Remove redundant GetBlockGemm 2025-04-14 03:29:27 +00:00
YC Lin
b5e48d5459 [GEMM] Implement local prefetch and refactor block gemm pipeline 2025-04-13 02:54:44 +00:00
Clement Lin
08b175fc91 Refactor flash_attention_fwd_traits_ for codegen 2025-04-12 02:17:53 +08:00
YC Lin
b7eedac71a [GEMM] Merge universal_block_gemm into block_gemm 2025-04-11 16:09:29 +00:00
mhYang
44eaa337f6 Fix flash attention 1 tile case 2025-04-11 15:44:53 +00:00
Clement Lin
bfadc59277 Refactor FlashAttnArgs usage for codegen 2025-04-11 13:46:25 +08:00
Clement Lin
f3db7e4fb7 Add codegen test example 2025-04-10 22:30:59 +08:00
YC Lin
6fdf2bd896 [GEMM] Refactor GetStaticLdsSize and remove GetSmemSize 2025-04-10 14:22:22 +00:00
Clement Lin
04199bc0aa Fix indentation 2025-04-10 09:13:12 +08:00
Clement Lin
3e61925277 Remove unused code 2025-04-09 15:08:09 +08:00
YC Lin
fd26846d61 [GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt 2025-04-09 03:19:10 +00:00
YC Lin
fe61498468 [Add] Add build option for generating assembly 2025-04-06 23:50:26 +00:00
YC Lin
aac02a92ac [GEMM] Refactor block gemm and pipeline policy of instruction schedule 2025-04-06 23:37:29 +00:00
Clement Lin
9151a1fb42 Add flash_attention_fwd toy_example 2025-04-04 00:39:02 +08:00
mhYang
f28aabb42f Update tile size and use slc 2025-04-02 12:23:23 +00:00
mhYang
d5531ab9c9 Fix add flops calculation 2025-04-01 19:13:18 +00:00
ClementLinCF
04513ca683 Create README.md 2025-04-01 09:11:29 +08:00
mhYang
d1dbc69eda Use mfma 16x16x32 2025-03-31 23:18:22 +00:00
mhYang
ee28e965f2 Fix KERNEL_D config 2025-03-31 17:53:57 +00:00
YC Lin
68cd6609eb [GEMM] Add pragma message for different MFMA options 2025-03-30 20:05:35 +00:00
YC Lin
a8027a5b2f [GEMM] Fix print typos 2025-03-30 19:55:13 +00:00
Clement Lin
5af7efdec5 Fix indentation typo 2025-03-30 15:06:07 +08:00
Clement Lin
de9385ba51 [GEMM] Fix MFMA condition checks 2025-03-30 14:02:30 +08:00
Clement Lin
5dd8e4ae0c [GEMM] Add new macro options check 2025-03-30 10:07:21 +08:00
Clement Lin
7bc473835e [GEMM] Add macros for multiple optimization options 2025-03-29 22:58:51 +08:00
YC Lin
428bcdeb40 [GEMM] default MFMA config 2025-03-29 21:11:29 +00:00
YC Lin
9a0d9dfc0a git push test 2025-03-29 21:09:00 +00:00
root
a3c6ca1761 [GEMM] fix MFMA configurations 2025-03-29 21:05:21 +00:00
mhYang
16a4e1585a Adjust mfma schedule order 2025-03-28 18:32:12 +00:00
Clement Lin
5428f17ca2 [GEMM] Replace const auto with constexpr index_t 2025-03-28 17:39:49 +08:00
Clement Lin
4eb246f20c [GEMM] Update cache-aware wg schedule 2025-03-28 17:31:19 +08:00
bobofang
8f1aa6fc6f Add MFMA M16N16K16 and M16N16K32 methods
these two methods are default off
2025-03-28 16:31:53 +00:00
YC Lin
f1562d5911 [GEMM] remove a_col_major/b_row_major case 2025-03-26 16:31:24 +00:00
root
6d2b55914e [GEMM] modify if-else locations 2025-03-26 13:35:00 +00:00
mhYang
9ecde871a3 Fix AccDataType and CDataType
1. Fix AccDataType and CDataType
2. Remove indent
3. Align merge_transform for tutorial
2025-03-25 20:08:59 +00:00
mhYang
67072b3ba9 Fix build error 2025-03-25 18:55:34 +00:00
root
7d08f99b02 [GEMM] disable/enable instruction scheduling 2025-03-25 16:53:20 +00:00
mhYang
4494c54dcd Fix missing message 2025-03-21 15:06:29 +00:00
mhYang
8f3b534d29 Fix xor transform dim. 2025-03-21 15:00:05 +00:00
Clement Lin
1f604e9b0a [GEMM] Add cache-aware WG schedule and adjust block tile
113 -> 121.7 TFLOPS
2025-03-21 09:15:17 +08:00
mhYang
93193e42ea Add LDS bank conflict solutions 2025-03-20 21:36:56 +00:00
bobofang
d635209d59 Fix add accuracy issue
2673 GB/s -> 3271 GB/s
Perf: 0.0512898 ms, 3271.06 GB/s
2025-03-19 12:26:30 +08:00
root
ff15e2da7a [GEMM] use mfma k8 warp gemm 2025-03-17 16:01:29 +00:00
root
10033c1cdc [GEMM] disable/enable prefetch 2025-03-17 14:22:49 +00:00