Commit Graph

178 Commits

Author SHA1 Message Date
MHYang
0e6a23258e Fix clang-format 2025-04-24 09:09:18 +00:00
MHYang
c4b2d5074a Implement prefetch and instruction schedule 2025-04-23 23:06:23 +00:00
Clement Lin
257a06ef54 Add the warp gemm option 2025-04-23 15:04:24 +08:00
Clement Lin
11d45259d4 Add more warp gemm policies for FA 2025-04-23 14:41:09 +08:00
Clement Lin
ce4061847b Remove unused code 2025-04-23 13:59:36 +08:00
YC Lin
ef085f402d [GEMM] Add define macro for unused a/b blk window 2025-04-23 13:27:32 +00:00
Clement Lin
35de33c57b Add codegen instances 2025-04-23 11:51:35 +08:00
    The following examples have been tested for 04_codegen:

    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 256 256
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 64 64
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 32 32
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 128 128
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 2048 2048 128 128
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 512 512 128 128
BoboFang
068d9fdbf7 Add MakeBlock2TileMap in 04_codegen_flash_attention_fwd 2025-04-23 09:47:37 +00:00
BoboFang
1a91c220a1 Change the permission of FA CMakeList.txt to 644 2025-04-22 15:37:03 +00:00
BoboFang
0a1227a5e7 Fix error after clang-format 2025-04-22 15:24:55 +00:00
BoboFang
95a8ac00c6 Run clang-format in toy_example 2025-04-22 14:55:57 +00:00
MHYang
537c519224 Initialize instruction schedule 2025-04-22 14:47:23 +00:00
BoboFang
de097a54d6 Add cache-aware in flash attention 2025-04-22 13:55:21 +00:00
MHYang
30e4b12ef4 Merge fix for bank conflict into codegen FA 2025-04-21 16:50:56 +00:00
MHYang
252b72ec30 Fix bank conflict 2025-04-21 16:47:51 +00:00
YC Lin
8a6cc0e94b [GEMM] Fix bWarpTile issue and remove redundant pipeline in BlockGemmPipeline 2025-04-21 16:44:23 +00:00
MHYang
77a96c7a82 Fix register spilling and K0 tile size issues 2025-04-18 10:15:17 +00:00
YC Lin
918d5b21bc [GEMM] Fix num_loop issues 2025-04-16 03:06:54 +00:00
Clement Lin
67bd9e4bb3 Add generate.py for codegen 2025-04-14 18:13:44 +08:00
YC Lin
a664958e9d [GEMM] Remove redundant GetBlockGemm 2025-04-14 03:29:27 +00:00
YC Lin
b5e48d5459 [GEMM] Implement local prefetch and refactor block gemm pipeline 2025-04-13 02:54:44 +00:00
Clement Lin
08b175fc91 Refactor flash_attention_fwd_traits_ for codegen 2025-04-12 02:17:53 +08:00
YC Lin
b7eedac71a [GEMM] Merge universal_block_gemm into block_gemm 2025-04-11 16:09:29 +00:00
mhYang
44eaa337f6 Fix flash attention 1 tile case 2025-04-11 15:44:53 +00:00
Clement Lin
bfadc59277 Refactor FlashAttnArgs usage for codegen 2025-04-11 13:46:25 +08:00
Clement Lin
f3db7e4fb7 Add codegen test example 2025-04-10 22:30:59 +08:00
YC Lin
6fdf2bd896 [GEMM] Refactor GetStaticLdsSize and remove GetSmemSize 2025-04-10 14:22:22 +00:00
Clement Lin
04199bc0aa Fix indentation 2025-04-10 09:13:12 +08:00
Clement Lin
3e61925277 Remove unused code 2025-04-09 15:08:09 +08:00
YC Lin
fd26846d61 [GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt 2025-04-09 03:19:10 +00:00
YC Lin
fe61498468 [Add] Add build option for generating assembly 2025-04-06 23:50:26 +00:00
YC Lin
aac02a92ac [GEMM] Refactor block gemm and pipeline policy of instruction schedule 2025-04-06 23:37:29 +00:00
Clement Lin
9151a1fb42 Add flash_attention_fwd toy_example 2025-04-04 00:39:02 +08:00
mhYang
f28aabb42f Update tile size and use slc 2025-04-02 12:23:23 +00:00
mhYang
d5531ab9c9 Fix add flops calculation 2025-04-01 19:13:18 +00:00
ClementLinCF
04513ca683 Create README.md 2025-04-01 09:11:29 +08:00
mhYang
d1dbc69eda Use mfma 16x16x32 2025-03-31 23:18:22 +00:00
mhYang
ee28e965f2 Fix KERNEL_D config 2025-03-31 17:53:57 +00:00
YC Lin
68cd6609eb [GEMM] Add pragma message for different MFMA options 2025-03-30 20:05:35 +00:00
YC Lin
a8027a5b2f [GEMM] Fix print typos 2025-03-30 19:55:13 +00:00
Clement Lin
5af7efdec5 Fix indentation typo 2025-03-30 15:06:07 +08:00
Clement Lin
de9385ba51 [GEMM] Fix MFMA condition checks 2025-03-30 14:02:30 +08:00
Clement Lin
5dd8e4ae0c [GEMM] Add new macro options check 2025-03-30 10:07:21 +08:00
Clement Lin
7bc473835e [GEMM] Add macros for multiple optimization options 2025-03-29 22:58:51 +08:00
YC Lin
428bcdeb40 [GEMM] default MFMA config 2025-03-29 21:11:29 +00:00
YC Lin
9a0d9dfc0a git push test 2025-03-29 21:09:00 +00:00
root
a3c6ca1761 [GEMM] fix MFMA configurations 2025-03-29 21:05:21 +00:00
mhYang
16a4e1585a Adjust mfma schedule order 2025-03-28 18:32:12 +00:00
Clement Lin
5428f17ca2 [GEMM] Replace const auto with constexpr index_t 2025-03-28 17:39:49 +08:00
Clement Lin
4eb246f20c [GEMM] Update cache-aware wg schedule 2025-03-28 17:31:19 +08:00