Philip Maybank | 42b2e3bc40 | change Atrribute to Attribute globally | 2025-07-28 15:43:08 -04:00
Clement Lin | c1d4adce86 | Change the warmup number of add example | 2025-07-28 14:54:51 -04:00
AviralGoelAMD | 511f5eb24e | clang format | 2025-07-28 14:54:51 -04:00
AviralGoelAMD | 67cb075ba4 | fixed function and struct names | 2025-07-28 14:54:51 -04:00
AviralGoelAMD | 35ecfc1b5a | clang format | 2025-07-28 14:54:51 -04:00
AviralGoelAMD | 1056a980bc | commented the 01_add example | 2025-07-28 14:54:51 -04:00
AviralGoelAMD | b1bada8304 | added explanation comments | 2025-07-28 14:54:51 -04:00
AviralGoelAMD | 9edef9e351 | added a 1d vector elementwise example | 2025-07-28 14:54:51 -04:00
bobofang11235 | cf34ed5f3d | Add cache aware option | 2025-07-28 14:54:51 -04:00
ClementLinCF | 4bfa4f6839 | Update README.md | 2025-07-28 14:54:51 -04:00
MHYang | 3357f3541a | Add QK swizzle option | 2025-07-28 14:54:51 -04:00
MHYang | 9e9a63c918 | Fix o_spans | 2025-07-28 14:54:51 -04:00
ClementLinCF | be9516a756 | Update README.md | 2025-07-28 14:54:51 -04:00
Clement Lin | 43cd04aa47 | Fix typo | 2025-07-28 14:54:51 -04:00
Clement Lin | 4b242084ab | Add new instances | 2025-07-28 14:54:51 -04:00
MHYang | 62452bd550 | Fix unexpected errors | 2025-07-28 14:54:51 -04:00
MHYang | 10e08b520d | Remove unused flag | 2025-07-28 14:54:51 -04:00
MHYang | 54ddd4c47c | Fix generate.py | 2025-07-28 14:54:51 -04:00
MHYang | 8f939db88f | Fix clang-format | 2025-07-28 14:54:51 -04:00
MHYang | fcdfbcb6a7 | Implement prefetch and instruction schedule | 2025-07-28 14:54:51 -04:00
Clement Lin | 4fccb261b8 | Add the warp gemm option | 2025-07-28 14:54:51 -04:00
Clement Lin | ac5b7cbf63 | Add more warp gemm policies for FA | 2025-07-28 14:54:51 -04:00
Clement Lin | 41d6c5731e | Remove unused code | 2025-07-28 14:54:51 -04:00
YC Lin | b52a27d8b7 | [GEMM] Add define macro for unused a/b blk window | 2025-07-28 14:54:51 -04:00
Clement Lin | addf290f8e | Add codegen instances | 2025-07-28 14:54:51 -04:00
    The following examples have been tested for 04_codegen:
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 256 256
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 64 64
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 32 32
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 128 128
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 2048 2048 128 128
    ./bin/codegen_basic_flash_attention_fwd 1 1 64 512 512 128 128
BoboFang | 9262b002f5 | Add MakeBlock2TileMap in 04_codegen_flash_attention_fwd | 2025-07-28 14:54:51 -04:00
BoboFang | 6da6d4af91 | Change the permission of FA CMakeList.txt to 644 | 2025-07-28 14:54:51 -04:00
BoboFang | 104572c1f3 | Fix error after clang-format | 2025-07-28 14:54:51 -04:00
BoboFang | aff041233d | Run clang-format in toy_example | 2025-07-28 14:54:51 -04:00
MHYang | 0d8693776e | Initialize instruction schedule | 2025-07-28 14:54:51 -04:00
BoboFang | 879edeadf1 | Add cache-aware in flash attention | 2025-07-28 14:54:51 -04:00
MHYang | fef9588f98 | Merge fix for bank conflict into codegen FA | 2025-07-28 14:54:51 -04:00
MHYang | ba264fe432 | Fix bank conflict | 2025-07-28 14:54:51 -04:00
YC Lin | 1f5398de4a | [GEMM] Fix bWarpTile issue and remove redundant pipeline in BlockGemmPipeline | 2025-07-28 14:54:51 -04:00
MHYang | 4a264eb9ed | Fix register spilling and K0 tile size issues | 2025-07-28 14:54:51 -04:00
YC Lin | eb737b8f82 | [GEMM] Fix num_loop issues | 2025-07-28 14:54:51 -04:00
Clement Lin | f71b2c7e55 | Add generate.py for codegen | 2025-07-28 14:54:51 -04:00
YC Lin | 9e020272c4 | [GEMM] Remove redundant GetBlockGemm | 2025-07-28 14:54:51 -04:00
YC Lin | 4e6a792b82 | [GEMM] Implement local prefetch and refactor block gemm pipeline | 2025-07-28 14:54:51 -04:00
Clement Lin | cbc660acc7 | Refactor flash_attention_fwd_traits_ for codegen | 2025-07-28 14:54:51 -04:00
YC Lin | f30071289c | [GEMM] Merge universal_block_gemm into block_gemm | 2025-07-28 14:54:51 -04:00
mhYang | 26b73c0ed1 | Fix flash attention 1 tile case | 2025-07-28 14:54:51 -04:00
Clement Lin | e614acfdd8 | Refactor FlashAttnArgs usage for codegen | 2025-07-28 14:54:51 -04:00
Clement Lin | 0cc5130818 | Add codegen test example | 2025-07-28 14:54:51 -04:00
YC Lin | 5b1c397806 | [GEMM] Refactor GetStaticLdsSize and remove GetSmemSize | 2025-07-28 14:54:51 -04:00
Clement Lin | 2499b8d401 | Fix indentation | 2025-07-28 14:54:51 -04:00
Clement Lin | d98fb3e0b5 | Remove unused code | 2025-07-28 14:54:51 -04:00
YC Lin | ae275aa105 | [GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt | 2025-07-28 14:54:51 -04:00
YC Lin | 6113ca8062 | [Add] Add build option for generating assembly | 2025-07-28 14:54:51 -04:00
YC Lin | 97a960042b | [GEMM] Refactor block gemm and pipeline policy of instruction schedule | 2025-07-28 14:54:51 -04:00