mhYang
44eaa337f6
Fix flash attention 1 tile case
2025-04-11 15:44:53 +00:00
Clement Lin
bfadc59277
Refactor FlashAttnArgs usage for codegen
2025-04-11 13:46:25 +08:00
Clement Lin
f3db7e4fb7
Add codegen test example
2025-04-10 22:30:59 +08:00
YC Lin
6fdf2bd896
[GEMM] Refactor GetStaticLdsSize and remove GetSmemSize
2025-04-10 14:22:22 +00:00
Clement Lin
04199bc0aa
Fix indentation
2025-04-10 09:13:12 +08:00
Clement Lin
3e61925277
Remove unused code
2025-04-09 15:08:09 +08:00
YC Lin
fd26846d61
[GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt
2025-04-09 03:19:10 +00:00
YC Lin
fe61498468
[Add] Add build option for generating assembly
2025-04-06 23:50:26 +00:00
YC Lin
aac02a92ac
[GEMM] Refactor block gemm and pipeline policy of instruction schedule
2025-04-06 23:37:29 +00:00
Clement Lin
9151a1fb42
Add flash_attention_fwd toy_example
2025-04-04 00:39:02 +08:00
mhYang
f28aabb42f
Update tile size and use slc
2025-04-02 12:23:23 +00:00
mhYang
d5531ab9c9
Fix add flops calculation
2025-04-01 19:13:18 +00:00
ClementLinCF
04513ca683
Create README.md
2025-04-01 09:11:29 +08:00
mhYang
d1dbc69eda
Use mfma 16x16x32
2025-03-31 23:18:22 +00:00
mhYang
ee28e965f2
Fix KERNEL_D config
2025-03-31 17:53:57 +00:00
YC Lin
68cd6609eb
[GEMM] Add pragma message for different MFMA options
2025-03-30 20:05:35 +00:00
YC Lin
a8027a5b2f
[GEMM] Fix print typos
2025-03-30 19:55:13 +00:00
Clement Lin
5af7efdec5
Fix indentation typo
2025-03-30 15:06:07 +08:00
Clement Lin
de9385ba51
[GEMM] Fix MFMA condition checks
2025-03-30 14:02:30 +08:00
Clement Lin
5dd8e4ae0c
[GEMM] Add new macro options check
2025-03-30 10:07:21 +08:00
Clement Lin
7bc473835e
[GEMM] Add macros for multiple optimization options
2025-03-29 22:58:51 +08:00
YC Lin
428bcdeb40
[GEMM] default MFMA config
2025-03-29 21:11:29 +00:00
YC Lin
9a0d9dfc0a
git push test
2025-03-29 21:09:00 +00:00
root
a3c6ca1761
[GEMM] fix MFMA configurations
2025-03-29 21:05:21 +00:00
mhYang
16a4e1585a
Adjust mfma schedule order
2025-03-28 18:32:12 +00:00
Clement Lin
5428f17ca2
[GEMM] Replace const auto with constexpr index_t
2025-03-28 17:39:49 +08:00
Clement Lin
4eb246f20c
[GEMM] Update cache-aware wg schedule
2025-03-28 17:31:19 +08:00
bobofang
8f1aa6fc6f
Add MFMA M16N16K16 and M16N16K32 methods
...
these two methods are default off
2025-03-28 16:31:53 +00:00
YC Lin
f1562d5911
[GEMM] remove a_col_major/b_row_major case
2025-03-26 16:31:24 +00:00
root
6d2b55914e
[GEMM] modify if-else locations
2025-03-26 13:35:00 +00:00
mhYang
9ecde871a3
Fix AccDataType and CDataType
...
1. Fix AccDataType and CDataType
2. Remove indent
3. Align merge_transform for tutorial
2025-03-25 20:08:59 +00:00
mhYang
67072b3ba9
Fix build error
2025-03-25 18:55:34 +00:00
root
7d08f99b02
[GEMM] disable/enable instruction scheduling
2025-03-25 16:53:20 +00:00
mhYang
4494c54dcd
Fix missing message
2025-03-21 15:06:29 +00:00
mhYang
8f3b534d29
Fix xor transform dim.
2025-03-21 15:00:05 +00:00
Clement Lin
1f604e9b0a
[GEMM] Add cache-aware WG schedule and adjust block tile
...
113 -> 121.7 TFLOPS
2025-03-21 09:15:17 +08:00
mhYang
93193e42ea
Add LDS bank conflict solutions
2025-03-20 21:36:56 +00:00
bobofang
d635209d59
Fix add accuracy issue
...
2673 GB/s -> 3271 GB/s
Perf: 0.0512898 ms, 3271.06 GB/s
2025-03-19 12:26:30 +08:00
root
ff15e2da7a
[GEMM] use mfma k8 warp gemm
2025-03-17 16:01:29 +00:00
root
10033c1cdc
[GEMM] disable/enable prefetch
2025-03-17 14:22:49 +00:00
Clement Lin
803ecb93d8
[CK TILE] Toy example - basic gemm
2025-03-12 15:55:47 +08:00
Clement Lin
1afc32c59c
Adjust block shape
...
2673 GB/s -> 3647 GB/s
2025-03-12 14:58:06 +08:00
Clement Lin
58bc69aa99
Utilize vectorized memory access
...
1998.24 GB/s -> 2673 GB/s
2025-03-12 14:44:41 +08:00
Clement Lin
399cdb6f9f
Adjust the size of thread block
...
1968.42 GB/s -> 1998.24 GB/s
2025-03-12 14:33:36 +08:00
Clement Lin
712d96cef5
[CK TILE] Toy example - basic add
2025-03-11 16:23:12 +08:00
Mingtao Gu
0db7c8f0b2
Ck int4 moe develop (#1949)
...
* Add Gemm fp8xint4 example and kernel, function pass.
* Init Gemm_fp8xint4 Bpreshuffle
* Added gemm_fp8xint4_Bpreshuffle files, function not checked yet
* General fix.
* fp8xint4 bpreshuffle function pass
* fix.
* init b preshuffle dequant in VGPR.
* fix bug, function pass.
* move b thread dequant copy to blockwise.
* fix bug, function now passes.
* modified the tile size to 256, 128x128x128.
* fixed a bug.
* Initial int4 moe, compile pass, function not checked.
* fix bug in moe_gemm1.cpp, now function pass.
* test expert = 8 and function pass.
* Added moe_pk_i4_gemm2, function pass.
* Added b preshuffle pipeline v3 support.
* fixed merge issue. fp8xint4 and fp8xint4_bpreshuffle function pass.
* Split the blockwise pipeline for fp8xint4.
* commit missing files
* opt gemm2 to 2x2 wave
* fix swizzle = false
* update int4 moe with latest input changes.
* update tile size.
* enable pipeline v3.
* fix nswizzle = true
* commit a version for compiler debug.
* Updated transfer_v3r1_gather to support pk_i4_t type.
* for int4 moe2 for type_convert support.
* remove some values between mfma instructions.
* fix int4 moe
* Updated transfer_v3r1_gather to support pk_i4_t type.
* i4 support lds multiple shuffle
* fixed int4 moe tflops calculation.
* Modified CshuffleCShuffleMXdlPerWavePerShuffle to 1 to suit C multiple shuffle
* updated gemm2.
* change int4 moe example names
* fix and format code.
* format.
* format codes.
* update fp8xint4 example tile size.
* add <unordered_map> header
* fixed.
* format.
* Added conditional compilation for int4 -> fp8 conversion kernels
---------
Co-authored-by: mtgu0705 <mtgu@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
2025-03-10 11:16:44 +08:00
Thomas Ning
c954bd0cfa
Add the instance of MBlock=144 for GemmMultiplyMultiply (#1955)
...
* tempsave, not selected
* finish the feature and merge with develop
---------
Co-authored-by: aska-0096 <haocwang@amd.com>
2025-03-07 13:44:06 -08:00
Max Podkorytov
9e132eb77c
refactor ck-tile kernel launch (#1925)
2025-03-07 08:29:40 -08:00
kylasa
66c5f5b0b6
Addressing (Post Merge) code review comments for PR 1845 (#1883)
...
* Addressing code review comments.
* Addressing code review comments.
* Reorganized code for better readability.
* add ck_tile gemms for new types in CI
* fix jenkins syntax
* fix script syntax
* Add the test cases back
* Address the review comments
* Address review comments
* clang format
* Solve the merging issues
* Addressed the comments
* clang format
---------
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-03-06 11:40:30 -08:00
Illia Silin
9b51c08bf7
remove support for gfx940 and gfx941 targets (#1944)
...
* remove support for gfx940 and gfx941 targets
* update changelog
2025-03-05 11:07:33 -08:00