AviralGoelAMD
|
06b33aa7ff
|
Resolve conflict by accepting toy branch version
|
2025-04-24 14:29:57 +00:00 |
|
MHYang
|
e0bfe71854
|
Fix unexpected errors
|
2025-04-24 16:50:22 +00:00 |
|
AviralGoelAMD
|
aaa757e363
|
fixed function and struct names
|
2025-04-24 14:24:01 +00:00 |
|
MHYang
|
7e0ca8c2c7
|
Remove unused flag
|
2025-04-24 10:41:34 +00:00 |
|
MHYang
|
34df43db86
|
Fix generate.py
|
2025-04-24 09:50:22 +00:00 |
|
MHYang
|
0e6a23258e
|
Fix clang-format
|
2025-04-24 09:09:18 +00:00 |
|
MHYang
|
c4b2d5074a
|
Implement prefetch and instruction schedule
|
2025-04-23 23:06:23 +00:00 |
|
Clement Lin
|
257a06ef54
|
Add the warp gemm option
|
2025-04-23 15:04:24 +08:00 |
|
Clement Lin
|
11d45259d4
|
Add more warp gemm policies for FA
|
2025-04-23 14:41:09 +08:00 |
|
Clement Lin
|
ce4061847b
|
Remove unused code
|
2025-04-23 13:59:36 +08:00 |
|
YC Lin
|
ef085f402d
|
[GEMM] Add define macro for unused a/b blk window
|
2025-04-23 13:27:32 +00:00 |
|
Clement Lin
|
35de33c57b
|
Add codegen instances
The following examples have been tested for 04_codegen:
./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 256 256
./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 64 64
./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 32 32
./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 128 128
./bin/codegen_basic_flash_attention_fwd 1 1 64 2048 2048 128 128
./bin/codegen_basic_flash_attention_fwd 1 1 64 512 512 128 128
|
2025-04-23 11:51:35 +08:00 |
|
BoboFang
|
068d9fdbf7
|
Add MakeBlock2TileMap in 04_codegen_flash_attention_fwd
|
2025-04-23 09:47:37 +00:00 |
|
BoboFang
|
1a91c220a1
|
Change the permission of FA CMakeList.txt to 644
|
2025-04-22 15:37:03 +00:00 |
|
BoboFang
|
0a1227a5e7
|
Fix error after clang-format
|
2025-04-22 15:24:55 +00:00 |
|
BoboFang
|
95a8ac00c6
|
Run clang-format in toy_example
|
2025-04-22 14:55:57 +00:00 |
|
MHYang
|
537c519224
|
Initialize instruction schedule
|
2025-04-22 14:47:23 +00:00 |
|
BoboFang
|
de097a54d6
|
Add cache-aware in flash attention
|
2025-04-22 13:55:21 +00:00 |
|
MHYang
|
30e4b12ef4
|
Merge fix for bank conflict into codegen FA
|
2025-04-21 16:50:56 +00:00 |
|
MHYang
|
252b72ec30
|
Fix bank conflict
|
2025-04-21 16:47:51 +00:00 |
|
YC Lin
|
8a6cc0e94b
|
[GEMM] Fix bWarpTile issue and remove redundant pipeline in BlockGemmPipeline
|
2025-04-21 16:44:23 +00:00 |
|
MHYang
|
77a96c7a82
|
Fix register spilling and K0 tile size issues
|
2025-04-18 10:15:17 +00:00 |
|
YC Lin
|
918d5b21bc
|
[GEMM] Fix num_loop issues
|
2025-04-16 03:06:54 +00:00 |
|
AviralGoelAMD
|
b26f00cab2
|
clang format
|
2025-04-15 14:50:51 +00:00 |
|
AviralGoelAMD
|
094c3696b7
|
commented the 01_add example
|
2025-04-15 14:45:34 +00:00 |
|
AviralGoelAMD
|
40058afae6
|
added explanation comments
|
2025-04-15 14:35:19 +00:00 |
|
AviralGoelAMD
|
0fd0e7b706
|
added a 1d vector elementwise example
|
2025-04-15 14:12:04 +00:00 |
|
Clement Lin
|
67bd9e4bb3
|
Add generate.py for codegen
|
2025-04-14 18:13:44 +08:00 |
|
YC Lin
|
a664958e9d
|
[GEMM] Remove redundant GetBlockGemm
|
2025-04-14 03:29:27 +00:00 |
|
YC Lin
|
b5e48d5459
|
[GEMM] Implement local prefetch and refactor block gemm pipeline
|
2025-04-13 02:54:44 +00:00 |
|
Clement Lin
|
08b175fc91
|
Refactor flash_attention_fwd_traits_ for codegen
|
2025-04-12 02:17:53 +08:00 |
|
YC Lin
|
b7eedac71a
|
[GEMM] Merge universal_block_gemm into block_gemm
|
2025-04-11 16:09:29 +00:00 |
|
mhYang
|
44eaa337f6
|
Fix flash attention 1 tile case
|
2025-04-11 15:44:53 +00:00 |
|
Clement Lin
|
bfadc59277
|
Refactor FlashAttnArgs usage for codegen
|
2025-04-11 13:46:25 +08:00 |
|
Clement Lin
|
f3db7e4fb7
|
Add codegen test example
|
2025-04-10 22:30:59 +08:00 |
|
YC Lin
|
6fdf2bd896
|
[GEMM] Refactor GetStaticLdsSize and remove GetSmemSize
|
2025-04-10 14:22:22 +00:00 |
|
Clement Lin
|
04199bc0aa
|
Fix indentation
|
2025-04-10 09:13:12 +08:00 |
|
Clement Lin
|
3e61925277
|
Remove unused code
|
2025-04-09 15:08:09 +08:00 |
|
YC Lin
|
fd26846d61
|
[GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt
|
2025-04-09 03:19:10 +00:00 |
|
YC Lin
|
fe61498468
|
[Add] Add build option for generating assembly
|
2025-04-06 23:50:26 +00:00 |
|
YC Lin
|
aac02a92ac
|
[GEMM] Refactor block gemm and pipeline policy of instruction schedule
|
2025-04-06 23:37:29 +00:00 |
|
Clement Lin
|
9151a1fb42
|
Add flash_attention_fwd toy_example
|
2025-04-04 00:39:02 +08:00 |
|
mhYang
|
f28aabb42f
|
Update tile size and use slc
|
2025-04-02 12:23:23 +00:00 |
|
mhYang
|
d5531ab9c9
|
Fix add flops calculation
|
2025-04-01 19:13:18 +00:00 |
|
ClementLinCF
|
04513ca683
|
Create README.md
|
2025-04-01 09:11:29 +08:00 |
|
mhYang
|
d1dbc69eda
|
Use mfma 16x16x32
|
2025-03-31 23:18:22 +00:00 |
|
mhYang
|
ee28e965f2
|
Fix KERNEL_D config
|
2025-03-31 17:53:57 +00:00 |
|
YC Lin
|
68cd6609eb
|
[GEMM] Add pragma message for different MFMA options
|
2025-03-30 20:05:35 +00:00 |
|
YC Lin
|
a8027a5b2f
|
[GEMM] Fix print typos
|
2025-03-30 19:55:13 +00:00 |
|
Clement Lin
|
5af7efdec5
|
Fix indentation typo
|
2025-03-30 15:06:07 +08:00 |
|