MHYang | 4a264eb9ed | Fix register spilling and K0 tile size issues | 2025-07-28 14:54:51 -04:00
YC Lin | eb737b8f82 | [GEMM] Fix num_loop issues | 2025-07-28 14:54:51 -04:00
Clement Lin | f71b2c7e55 | Add generate.py for codegen | 2025-07-28 14:54:51 -04:00
YC Lin | 9e020272c4 | [GEMM] Remove redundant GetBlockGemm | 2025-07-28 14:54:51 -04:00
YC Lin | 4e6a792b82 | [GEMM] Implement local prefetch and refactor block gemm pipeline | 2025-07-28 14:54:51 -04:00
Clement Lin | cbc660acc7 | Refactor flash_attention_fwd_traits_ for codegen | 2025-07-28 14:54:51 -04:00
YC Lin | f30071289c | [GEMM] Merge universal_block_gemm into block_gemm | 2025-07-28 14:54:51 -04:00
mhYang | 26b73c0ed1 | Fix flash attention 1 tile case | 2025-07-28 14:54:51 -04:00
Clement Lin | e614acfdd8 | Refactor FlashAttnArgs usage for codegen | 2025-07-28 14:54:51 -04:00
Clement Lin | 0cc5130818 | Add codegen test example | 2025-07-28 14:54:51 -04:00
YC Lin | 5b1c397806 | [GEMM] Refactor GetStaticLdsSize and remove GetSmemSize | 2025-07-28 14:54:51 -04:00
Clement Lin | 2499b8d401 | Fix indentation | 2025-07-28 14:54:51 -04:00
Clement Lin | d98fb3e0b5 | Remove unused code | 2025-07-28 14:54:51 -04:00
YC Lin | ae275aa105 | [GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt | 2025-07-28 14:54:51 -04:00
YC Lin | 6113ca8062 | [Add] Add build option for generating assembly | 2025-07-28 14:54:51 -04:00
YC Lin | 97a960042b | [GEMM] Refactor block gemm and pipeline policy of instruction schedule | 2025-07-28 14:54:51 -04:00
Clement Lin | 8785e6599e | Add flash_attention_fwd toy_example | 2025-07-28 14:54:51 -04:00
mhYang | a949b82c9f | Update tile size and use slc | 2025-07-28 14:54:51 -04:00
mhYang | 9158612a9f | Fix add flops calculation | 2025-07-28 14:54:51 -04:00
ClementLinCF | 88a4c7414f | Create README.md | 2025-07-28 14:54:51 -04:00
mhYang | ac972bfd11 | Use mfma 16x16x32 | 2025-07-28 14:54:51 -04:00
mhYang | 5326d403e4 | Fix KERNEL_D config | 2025-07-28 14:54:51 -04:00
YC Lin | fe319b97ae | [GEMM] Add pragma message for different MFMA options | 2025-07-28 14:54:51 -04:00
YC Lin | 76751567b5 | [GEMM] Fix print typos | 2025-07-28 14:54:51 -04:00
Clement Lin | 4c526ab140 | Fix indentation typo | 2025-07-28 14:54:51 -04:00
Clement Lin | 5b10e9f3dd | [GEMM] Fix MFMA condition checks | 2025-07-28 14:54:51 -04:00
Clement Lin | a95665a6af | [GEMM] Add new macro options check | 2025-07-28 14:54:51 -04:00
Clement Lin | 1099762267 | [GEMM] Add macros for multiple optimization options | 2025-07-28 14:54:51 -04:00
YC Lin | 890a159877 | [GEMM] default MFMA config | 2025-07-28 14:54:51 -04:00
YC Lin | 8d75ae7c96 | git push test | 2025-07-28 14:54:51 -04:00
root | a36d246cc0 | [GEMM] fix MFMA configurations | 2025-07-28 14:54:51 -04:00
mhYang | 15e6f36f66 | Adjust mfma schedule order | 2025-07-28 14:54:51 -04:00
Clement Lin | e9f7c9bf42 | [GEMM] Replace const auto with constexpr index_t | 2025-07-28 14:54:51 -04:00
Clement Lin | cef77c1dcb | [GEMM] Update cache-aware wg schedule | 2025-07-28 14:54:51 -04:00
bobofang | 127e742e96 | Add MFMA M16N16K16 and M16N16K32 methods | 2025-07-28 14:54:51 -04:00
  These two methods are off by default.
YC Lin | e866f814f9 | [GEMM] remove a_col_major/b_row_major case | 2025-07-28 14:54:51 -04:00
root | bf69235cfb | [GEMM] modify if-else locations | 2025-07-28 14:54:51 -04:00
mhYang | ba8b5112c4 | Fix AccDataType and CDataType | 2025-07-28 14:54:51 -04:00
  1. Fix AccDataType and CDataType
  2. Remove indent
  3. Align merge_transform for tutorial
mhYang | d6fd468603 | Fix build error | 2025-07-28 14:54:51 -04:00
root | b3986c32a6 | [GEMM] disable/enable instruction scheduling | 2025-07-28 14:54:51 -04:00
mhYang | 42f2e21865 | Fix missing message | 2025-07-28 14:54:51 -04:00
mhYang | 38ce4dd8c3 | Fix xor transform dim. | 2025-07-28 14:54:51 -04:00
Clement Lin | b03668fe8a | [GEMM] Add cache-aware WG schedule and adjust block tile | 2025-07-28 14:54:51 -04:00
  113 -> 121.7 TFLOPS
mhYang | 39ca852330 | Add LDS bank conflict solutions | 2025-07-28 14:54:51 -04:00
bobofang | 22147ace51 | Fix add accuracy issue | 2025-07-28 14:54:51 -04:00
  2673 GB/s -> 3271 GB/s
  Perf: 0.0512898 ms, 3271.06 GB/s
root | d7d9fdaf1b | [GEMM] use mfma k8 warp gemm | 2025-07-28 14:54:51 -04:00
root | 1b8d7cd1b9 | [GEMM] disable/enable prefetch | 2025-07-28 14:54:50 -04:00
Clement Lin | 6a2036015e | [CK TILE] Toy example - basic gemm | 2025-07-28 14:54:50 -04:00
Clement Lin | 077056b32d | Adjust block shape | 2025-07-28 14:54:50 -04:00
  2673 GB/s -> 3647 GB/s
Clement Lin | 2ff691f3f2 | Utilize vectorized memory access | 2025-07-28 14:54:50 -04:00
  1998.24 GB/s -> 2673 GB/s