MHYang | fcdfbcb6a7 | Implement prefetch and instruction schedule | 2025-07-28 14:54:51 -04:00
Clement Lin | 4fccb261b8 | Add the warp gemm option | 2025-07-28 14:54:51 -04:00
Clement Lin | ac5b7cbf63 | Add more warp gemm policies for FA | 2025-07-28 14:54:51 -04:00
Clement Lin | 41d6c5731e | Remove unused code | 2025-07-28 14:54:51 -04:00
YC Lin | b52a27d8b7 | [GEMM] Add define macro for unused a/b blk window | 2025-07-28 14:54:51 -04:00
Clement Lin | addf290f8e | Add codegen instances | 2025-07-28 14:54:51 -04:00
    The following examples have been tested for 04_codegen:
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 256 256
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 64 64
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 32 32
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 128 128
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 2048 2048 128 128
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 512 512 128 128
BoboFang | 9262b002f5 | Add MakeBlock2TileMap in 04_codegen_flash_attention_fwd | 2025-07-28 14:54:51 -04:00
BoboFang | 6da6d4af91 | Change the permission of FA CMakeLists.txt to 644 | 2025-07-28 14:54:51 -04:00
BoboFang | 104572c1f3 | Fix error after clang-format | 2025-07-28 14:54:51 -04:00
BoboFang | aff041233d | Run clang-format in toy_example | 2025-07-28 14:54:51 -04:00
MHYang | 0d8693776e | Initialize instruction schedule | 2025-07-28 14:54:51 -04:00
BoboFang | 879edeadf1 | Add cache-aware in flash attention | 2025-07-28 14:54:51 -04:00
MHYang | fef9588f98 | Merge fix for bank conflict into codegen FA | 2025-07-28 14:54:51 -04:00
MHYang | ba264fe432 | Fix bank conflict | 2025-07-28 14:54:51 -04:00
YC Lin | 1f5398de4a | [GEMM] Fix bWarpTile issue and remove redundant pipeline in BlockGemmPipeline | 2025-07-28 14:54:51 -04:00
MHYang | 4a264eb9ed | Fix register spilling and K0 tile size issues | 2025-07-28 14:54:51 -04:00
YC Lin | eb737b8f82 | [GEMM] Fix num_loop issues | 2025-07-28 14:54:51 -04:00
Clement Lin | f71b2c7e55 | Add generate.py for codegen | 2025-07-28 14:54:51 -04:00
YC Lin | 9e020272c4 | [GEMM] Remove redundant GetBlockGemm | 2025-07-28 14:54:51 -04:00
YC Lin | 4e6a792b82 | [GEMM] Implement local prefetch and refactor block gemm pipeline | 2025-07-28 14:54:51 -04:00
Clement Lin | cbc660acc7 | Refactor flash_attention_fwd_traits_ for codegen | 2025-07-28 14:54:51 -04:00
YC Lin | f30071289c | [GEMM] Merge universal_block_gemm into block_gemm | 2025-07-28 14:54:51 -04:00
mhYang | 26b73c0ed1 | Fix flash attention 1 tile case | 2025-07-28 14:54:51 -04:00
Clement Lin | e614acfdd8 | Refactor FlashAttnArgs usage for codegen | 2025-07-28 14:54:51 -04:00
Clement Lin | 0cc5130818 | Add codegen test example | 2025-07-28 14:54:51 -04:00
YC Lin | 5b1c397806 | [GEMM] Refactor GetStaticLdsSize and remove GetSmemSize | 2025-07-28 14:54:51 -04:00
Clement Lin | 2499b8d401 | Fix indentation | 2025-07-28 14:54:51 -04:00
Clement Lin | d98fb3e0b5 | Remove unused code | 2025-07-28 14:54:51 -04:00
YC Lin | ae275aa105 | [GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt | 2025-07-28 14:54:51 -04:00
YC Lin | 6113ca8062 | [Add] Add build option for generating assembly | 2025-07-28 14:54:51 -04:00
YC Lin | 97a960042b | [GEMM] Refactor block gemm and pipeline policy of instruction schedule | 2025-07-28 14:54:51 -04:00
Clement Lin | 8785e6599e | Add flash_attention_fwd toy_example | 2025-07-28 14:54:51 -04:00
mhYang | a949b82c9f | Update tile size and use slc | 2025-07-28 14:54:51 -04:00
mhYang | 9158612a9f | Fix add flops calculation | 2025-07-28 14:54:51 -04:00
ClementLinCF | 88a4c7414f | Create README.md | 2025-07-28 14:54:51 -04:00
mhYang | ac972bfd11 | Use mfma 16x16x32 | 2025-07-28 14:54:51 -04:00
mhYang | 5326d403e4 | Fix KERNEL_D config | 2025-07-28 14:54:51 -04:00
YC Lin | fe319b97ae | [GEMM] Add pragma message for different MFMA options | 2025-07-28 14:54:51 -04:00
YC Lin | 76751567b5 | [GEMM] Fix print typos | 2025-07-28 14:54:51 -04:00
Clement Lin | 4c526ab140 | Fix indentation typo | 2025-07-28 14:54:51 -04:00
Clement Lin | 5b10e9f3dd | [GEMM] Fix MFMA condition checks | 2025-07-28 14:54:51 -04:00
Clement Lin | a95665a6af | [GEMM] Add new macro options check | 2025-07-28 14:54:51 -04:00
Clement Lin | 1099762267 | [GEMM] Add macros for multiple optimization options | 2025-07-28 14:54:51 -04:00
YC Lin | 890a159877 | [GEMM] default MFMA config | 2025-07-28 14:54:51 -04:00
YC Lin | 8d75ae7c96 | git push test | 2025-07-28 14:54:51 -04:00
root | a36d246cc0 | [GEMM] fix MFMA configurations | 2025-07-28 14:54:51 -04:00
mhYang | 15e6f36f66 | Adjust mfma schedule order | 2025-07-28 14:54:51 -04:00
Clement Lin | e9f7c9bf42 | [GEMM] Replace const auto with constexpr index_t | 2025-07-28 14:54:51 -04:00
Clement Lin | cef77c1dcb | [GEMM] Update cache-aware wg schedule | 2025-07-28 14:54:51 -04:00
bobofang | 127e742e96 | Add MFMA M16N16K16 and M16N16K32 methods | 2025-07-28 14:54:51 -04:00
    These two methods are off by default.