Commit Graph

695 Commits

Author SHA1 Message Date
MHYang
fcdfbcb6a7 Implement prefetch and instruction schedule 2025-07-28 14:54:51 -04:00
Clement Lin
4fccb261b8 Add the warp gemm option 2025-07-28 14:54:51 -04:00
Clement Lin
ac5b7cbf63 Add more warp gemm policies for FA 2025-07-28 14:54:51 -04:00
Clement Lin
41d6c5731e Remove unused code 2025-07-28 14:54:51 -04:00
YC Lin
b52a27d8b7 [GEMM] Add define macro for unused a/b blk window 2025-07-28 14:54:51 -04:00
Clement Lin
addf290f8e Add codegen instances
The following examples have been tested for 04_codegen:

./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 256 256
./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 64 64
./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 32 32
./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 128 128
./bin/codegen_basic_flash_attention_fwd 1 1 64 2048 2048 128 128
./bin/codegen_basic_flash_attention_fwd 1 1 64 512 512 128 128
2025-07-28 14:54:51 -04:00
BoboFang
9262b002f5 Add MakeBlock2TileMap in 04_codegen_flash_attention_fwd 2025-07-28 14:54:51 -04:00
BoboFang
6da6d4af91 Change the permission of FA CMakeList.txt to 644 2025-07-28 14:54:51 -04:00
BoboFang
104572c1f3 Fix error after clang-format 2025-07-28 14:54:51 -04:00
BoboFang
aff041233d Run clang-format in toy_example 2025-07-28 14:54:51 -04:00
MHYang
0d8693776e Initialize instruction schedule 2025-07-28 14:54:51 -04:00
BoboFang
879edeadf1 Add cache-aware in flash attention 2025-07-28 14:54:51 -04:00
MHYang
fef9588f98 Merge fix for bank conflict into codegen FA 2025-07-28 14:54:51 -04:00
MHYang
ba264fe432 Fix bank conflict 2025-07-28 14:54:51 -04:00
YC Lin
1f5398de4a [GEMM] Fix bWarpTile issue and remove redundant pipeline in BlockGemmPipeline 2025-07-28 14:54:51 -04:00
MHYang
4a264eb9ed Fix register spilling and K0 tile size issues 2025-07-28 14:54:51 -04:00
YC Lin
eb737b8f82 [GEMM] Fix num_loop issues 2025-07-28 14:54:51 -04:00
Clement Lin
f71b2c7e55 Add generate.py for codegen 2025-07-28 14:54:51 -04:00
YC Lin
9e020272c4 [GEMM] Remove redundant GetBlockGemm 2025-07-28 14:54:51 -04:00
YC Lin
4e6a792b82 [GEMM] Implement local prefetch and refactor block gemm pipeline 2025-07-28 14:54:51 -04:00
Clement Lin
cbc660acc7 Refactor flash_attention_fwd_traits_ for codegen 2025-07-28 14:54:51 -04:00
YC Lin
f30071289c [GEMM] Merge universal_block_gemm into block_gemm 2025-07-28 14:54:51 -04:00
mhYang
26b73c0ed1 Fix flash attention 1 tile case 2025-07-28 14:54:51 -04:00
Clement Lin
e614acfdd8 Refactor FlashAttnArgs usage for codegen 2025-07-28 14:54:51 -04:00
Clement Lin
0cc5130818 Add codegen test example 2025-07-28 14:54:51 -04:00
YC Lin
5b1c397806 [GEMM] Refactor GetStaticLdsSize and remove GetSmemSize 2025-07-28 14:54:51 -04:00
Clement Lin
2499b8d401 Fix indentation 2025-07-28 14:54:51 -04:00
Clement Lin
d98fb3e0b5 Remove unused code 2025-07-28 14:54:51 -04:00
YC Lin
ae275aa105 [GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt 2025-07-28 14:54:51 -04:00
YC Lin
6113ca8062 [Add] Add build option for generating assembly 2025-07-28 14:54:51 -04:00
YC Lin
97a960042b [GEMM] Refactor block gemm and pipeline policy of instruction schedule 2025-07-28 14:54:51 -04:00
Clement Lin
8785e6599e Add flash_attention_fwd toy_example 2025-07-28 14:54:51 -04:00
mhYang
a949b82c9f Update tile size and use slc 2025-07-28 14:54:51 -04:00
mhYang
9158612a9f Fix add flops calculation 2025-07-28 14:54:51 -04:00
ClementLinCF
88a4c7414f Create README.md 2025-07-28 14:54:51 -04:00
mhYang
ac972bfd11 Use mfma 16x16x32 2025-07-28 14:54:51 -04:00
mhYang
5326d403e4 Fix KERNEL_D config 2025-07-28 14:54:51 -04:00
YC Lin
fe319b97ae [GEMM] Add pragma message for different MFMA options 2025-07-28 14:54:51 -04:00
YC Lin
76751567b5 [GEMM] Fix print typos 2025-07-28 14:54:51 -04:00
Clement Lin
4c526ab140 Fix indentation typo 2025-07-28 14:54:51 -04:00
Clement Lin
5b10e9f3dd [GEMM] Fix MFMA condition checks 2025-07-28 14:54:51 -04:00
Clement Lin
a95665a6af [GEMM] Add new macro options check 2025-07-28 14:54:51 -04:00
Clement Lin
1099762267 [GEMM] Add macros for multiple optimization options 2025-07-28 14:54:51 -04:00
YC Lin
890a159877 [GEMM] default MFMA config 2025-07-28 14:54:51 -04:00
YC Lin
8d75ae7c96 git push test 2025-07-28 14:54:51 -04:00
root
a36d246cc0 [GEMM] fix MFMA configurations 2025-07-28 14:54:51 -04:00
mhYang
15e6f36f66 Adjust mfma schedule order 2025-07-28 14:54:51 -04:00
Clement Lin
e9f7c9bf42 [GEMM] Replace const auto with constexpr index_t 2025-07-28 14:54:51 -04:00
Clement Lin
cef77c1dcb [GEMM] Update cache-aware wg schedule 2025-07-28 14:54:51 -04:00
bobofang
127e742e96 Add MFMA M16N16K16 and M16N16K32 methods
These two methods are off by default.
2025-07-28 14:54:51 -04:00