AviralGoelAMD
|
aaa757e363
|
fixed function and struct names
|
2025-04-24 14:24:01 +00:00 |
|
AviralGoelAMD
|
b26f00cab2
|
clang format
|
2025-04-15 14:50:51 +00:00 |
|
AviralGoelAMD
|
094c3696b7
|
commented the 01_add example
|
2025-04-15 14:45:34 +00:00 |
|
AviralGoelAMD
|
40058afae6
|
added explanation comments
|
2025-04-15 14:35:19 +00:00 |
|
AviralGoelAMD
|
0fd0e7b706
|
added a 1d vector elementwise example
|
2025-04-15 14:12:04 +00:00 |
|
Clement Lin
|
67bd9e4bb3
|
Add generate.py for codegen
|
2025-04-14 18:13:44 +08:00 |
|
YC Lin
|
a664958e9d
|
[GEMM] Remove redundant GetBlockGemm
|
2025-04-14 03:29:27 +00:00 |
|
YC Lin
|
b5e48d5459
|
[GEMM] Implement local prefetch and refactor block gemm pipeline
|
2025-04-13 02:54:44 +00:00 |
|
Clement Lin
|
08b175fc91
|
Refactor flash_attention_fwd_traits_ for codegen
|
2025-04-12 02:17:53 +08:00 |
|
YC Lin
|
b7eedac71a
|
[GEMM] Merge universal_block_gemm into block_gemm
|
2025-04-11 16:09:29 +00:00 |
|
mhYang
|
44eaa337f6
|
Fix flash attention 1 tile case
|
2025-04-11 15:44:53 +00:00 |
|
Clement Lin
|
bfadc59277
|
Refactor FlashAttnArgs usage for codegen
|
2025-04-11 13:46:25 +08:00 |
|
Clement Lin
|
f3db7e4fb7
|
Add codegen test example
|
2025-04-10 22:30:59 +08:00 |
|
YC Lin
|
6fdf2bd896
|
[GEMM] Refactor GetStaticLdsSize and remove GetSmemSize
|
2025-04-10 14:22:22 +00:00 |
|
Clement Lin
|
04199bc0aa
|
Fix indentation
|
2025-04-10 09:13:12 +08:00 |
|
Clement Lin
|
3e61925277
|
Remove unused code
|
2025-04-09 15:08:09 +08:00 |
|
YC Lin
|
fd26846d61
|
[GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt
|
2025-04-09 03:19:10 +00:00 |
|
YC Lin
|
fe61498468
|
[Add] Add build option for generating assembly
|
2025-04-06 23:50:26 +00:00 |
|
YC Lin
|
aac02a92ac
|
[GEMM] Refactor block gemm and pipeline policy of instruction schedule
|
2025-04-06 23:37:29 +00:00 |
|
Clement Lin
|
9151a1fb42
|
Add flash_attention_fwd toy_example
|
2025-04-04 00:39:02 +08:00 |
|
mhYang
|
f28aabb42f
|
Update tile size and use slc
|
2025-04-02 12:23:23 +00:00 |
|
mhYang
|
04be0fb437
|
Make buffer coherence configurable in tensor view
|
2025-04-02 11:51:23 +00:00 |
|
mhYang
|
d5531ab9c9
|
Fix add flops calculation
|
2025-04-01 19:13:18 +00:00 |
|
ClementLinCF
|
04513ca683
|
Create README.md
|
2025-04-01 09:11:29 +08:00 |
|
mhYang
|
d1dbc69eda
|
Use mfma 16x16x32
|
2025-03-31 23:18:22 +00:00 |
|
mhYang
|
ee28e965f2
|
Fix KERNEL_D config
|
2025-03-31 17:53:57 +00:00 |
|
YC Lin
|
68cd6609eb
|
[GEMM] Add pragma message for different MFMA options
|
2025-03-30 20:05:35 +00:00 |
|
YC Lin
|
a8027a5b2f
|
[GEMM] Fix print typos
|
2025-03-30 19:55:13 +00:00 |
|
Clement Lin
|
5af7efdec5
|
Fix indentation typo
|
2025-03-30 15:06:07 +08:00 |
|
Clement Lin
|
de9385ba51
|
[GEMM] Fix MFMA condition checks
|
2025-03-30 14:02:30 +08:00 |
|
Clement Lin
|
5dd8e4ae0c
|
[GEMM] Add new macro options check
|
2025-03-30 10:07:21 +08:00 |
|
Clement Lin
|
7bc473835e
|
[GEMM] Add macros for multiple optimization options
|
2025-03-29 22:58:51 +08:00 |
|
YC Lin
|
428bcdeb40
|
[GEMM] default MFMA config
|
2025-03-29 21:11:29 +00:00 |
|
YC Lin
|
9a0d9dfc0a
|
git push test
|
2025-03-29 21:09:00 +00:00 |
|
root
|
a3c6ca1761
|
[GEMM] fix MFMA configurations
|
2025-03-29 21:05:21 +00:00 |
|
mhYang
|
16a4e1585a
|
Adjust mfma schedule order
|
2025-03-28 18:32:12 +00:00 |
|
Clement Lin
|
5428f17ca2
|
[GEMM] Replace const auto with constexpr index_t
|
2025-03-28 17:39:49 +08:00 |
|
Clement Lin
|
4eb246f20c
|
[GEMM] Update cache-aware wg schedule
|
2025-03-28 17:31:19 +08:00 |
|
bobofang
|
8f1aa6fc6f
|
Add MFMA M16N16K16 and M16N16K32 methods
These two methods are off by default.
|
2025-03-28 16:31:53 +00:00 |
|
YC Lin
|
f1562d5911
|
[GEMM] remove a_col_major/b_row_major case
|
2025-03-26 16:31:24 +00:00 |
|
root
|
6d2b55914e
|
[GEMM] modify if-else locations
|
2025-03-26 13:35:00 +00:00 |
|
mhYang
|
9ecde871a3
|
Fix AccDataType and CDataType
1. Fix AccDataType and CDataType
2. Remove indent
3. Align merge_transform for tutorial
|
2025-03-25 20:08:59 +00:00 |
|
mhYang
|
67072b3ba9
|
Fix build error
|
2025-03-25 18:55:34 +00:00 |
|
MHYang-gh
|
b0b5827673
|
Fix A/B lds transform (#2007)
|
2025-03-25 18:52:41 +00:00 |
|
root
|
7d08f99b02
|
[GEMM] disable/enable instruction scheduling
|
2025-03-25 16:53:20 +00:00 |
|
mhYang
|
4494c54dcd
|
Fix missing message
|
2025-03-21 15:06:29 +00:00 |
|
mhYang
|
8f3b534d29
|
Fix xor transform dim.
|
2025-03-21 15:00:05 +00:00 |
|
Clement Lin
|
1f604e9b0a
|
[GEMM] Add cache-aware WG schedule and adjust block tile
113 -> 121.7 TFLOPS
|
2025-03-21 09:15:17 +08:00 |
|
mhYang
|
93193e42ea
|
Add LDS bank conflict solutions
|
2025-03-20 21:36:56 +00:00 |
|
bobofang
|
d635209d59
|
Fix add accuracy issue
2673 GB/s -> 3271 GB/s
Perf: 0.0512898 ms, 3271.06 GB/s
|
2025-03-19 12:26:30 +08:00 |
|