572 Commits

Author SHA1 Message Date
root
727df8fe11 try to use merge transform for converting 3d to 2d 2025-05-14 14:10:43 +00:00
root
f2b4d315e4 Add a sample to do element_wise add for 3D inputs 2025-05-11 18:44:26 +00:00
Clement Lin
9d64738f88 Change the warmup number of add example 2025-05-06 09:41:56 +08:00
ClementLinCF
9156e9f11d Merge pull request #2087 from ROCm/ck_tile_hello_world
Added a much simpler example and commented 01_add
2025-05-05 12:26:48 +08:00
bobofang11235
e74c3800eb Add cache aware option 2025-05-05 12:07:59 +08:00
ClementLinCF
4a42b328de Update README.md 2025-05-05 09:11:46 +08:00
MHYang
1b0885c207 Add QK swizzle option 2025-04-29 17:00:01 +00:00
MHYang
c204cdc382 Fix o_spans 2025-04-29 11:03:49 +00:00
ClementLinCF
89bf0765fb Update README.md 2025-04-25 07:15:12 +08:00
Clement Lin
04540f2eac Fix typo 2025-04-25 05:18:37 +08:00
Clement Lin
89e33ed5ad Add new instances 2025-04-25 01:35:45 +08:00
MHYang
e0bfe71854 Fix unexpected errors 2025-04-24 16:50:22 +00:00
AviralGoelAMD
6bd124b43e clang format 2025-04-24 14:33:44 +00:00
AviralGoelAMD
06b33aa7ff Resolve conflict by accepting toy branch version 2025-04-24 14:29:57 +00:00
AviralGoelAMD
aaa757e363 fixed function and struct names 2025-04-24 14:24:01 +00:00
MHYang
7e0ca8c2c7 Remove unused flag 2025-04-24 10:41:34 +00:00
MHYang
34df43db86 Fix generate.py 2025-04-24 09:50:22 +00:00
MHYang
0e6a23258e Fix clang-format 2025-04-24 09:09:18 +00:00
MHYang
c4b2d5074a Implement prefetch and instruction schedule 2025-04-23 23:06:23 +00:00
Clement Lin
257a06ef54 Add the warp gemm option 2025-04-23 15:04:24 +08:00
Clement Lin
11d45259d4 Add more warp gemm policies for FA 2025-04-23 14:41:09 +08:00
Clement Lin
ce4061847b Remove unused code 2025-04-23 13:59:36 +08:00
YC Lin
ef085f402d [GEMM] Add define macro for unused a/b blk window 2025-04-23 13:27:32 +00:00
Clement Lin
35de33c57b Add codegen instances
The following examples have been tested for 04_codegen:

./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 256 256
./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 64 64
./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 32 32
./bin/codegen_basic_flash_attention_fwd 1 1 64 4096 4096 128 128
./bin/codegen_basic_flash_attention_fwd 1 1 64 2048 2048 128 128
./bin/codegen_basic_flash_attention_fwd 1 1 64 512 512 128 128
2025-04-23 11:51:35 +08:00
BoboFang
068d9fdbf7 Add MakeBlock2TileMap in 04_codegen_flash_attention_fwd 2025-04-23 09:47:37 +00:00
BoboFang
1a91c220a1 Change the permission of FA CMakeList.txt to 644 2025-04-22 15:37:03 +00:00
BoboFang
0a1227a5e7 Fix error after clang-format 2025-04-22 15:24:55 +00:00
BoboFang
95a8ac00c6 Run clang-format in toy_example 2025-04-22 14:55:57 +00:00
MHYang
537c519224 Initialize instruction schedule 2025-04-22 14:47:23 +00:00
BoboFang
de097a54d6 Add cache-aware in flash attention 2025-04-22 13:55:21 +00:00
MHYang
30e4b12ef4 Merge fix for bank conflict into codegen FA 2025-04-21 16:50:56 +00:00
MHYang
252b72ec30 Fix bank conflict 2025-04-21 16:47:51 +00:00
YC Lin
8a6cc0e94b [GEMM] Fix bWarpTile issue and remove redundant pipeline in BlockGemmPipeline 2025-04-21 16:44:23 +00:00
MHYang
77a96c7a82 Fix register spilling and K0 tile size issues 2025-04-18 10:15:17 +00:00
YC Lin
918d5b21bc [GEMM] Fix num_loop issues 2025-04-16 03:06:54 +00:00
AviralGoelAMD
b26f00cab2 clang format 2025-04-15 14:50:51 +00:00
AviralGoelAMD
094c3696b7 commented the 01_add example 2025-04-15 14:45:34 +00:00
AviralGoelAMD
40058afae6 added explanation comments 2025-04-15 14:35:19 +00:00
AviralGoelAMD
0fd0e7b706 added a 1d vector elementwise example 2025-04-15 14:12:04 +00:00
Clement Lin
67bd9e4bb3 Add generate.py for codegen 2025-04-14 18:13:44 +08:00
YC Lin
a664958e9d [GEMM] Remove redundant GetBlockGemm 2025-04-14 03:29:27 +00:00
YC Lin
b5e48d5459 [GEMM] Implement local prefetch and refactor block gemm pipeline 2025-04-13 02:54:44 +00:00
Clement Lin
08b175fc91 Refactor flash_attention_fwd_traits_ for codegen 2025-04-12 02:17:53 +08:00
YC Lin
b7eedac71a [GEMM] Merge universal_block_gemm into block_gemm 2025-04-11 16:09:29 +00:00
mhYang
44eaa337f6 Fix flash attention 1 tile case 2025-04-11 15:44:53 +00:00
Clement Lin
bfadc59277 Refactor FlashAttnArgs usage for codegen 2025-04-11 13:46:25 +08:00
Clement Lin
f3db7e4fb7 Add codegen test example 2025-04-10 22:30:59 +08:00
YC Lin
6fdf2bd896 [GEMM] Refactor GetStaticLdsSize and remove GetSmemSize 2025-04-10 14:22:22 +00:00
Clement Lin
04199bc0aa Fix indentation 2025-04-10 09:13:12 +08:00
Clement Lin
3e61925277 Remove unused code 2025-04-09 15:08:09 +08:00