Commit Graph

523 Commits

Author SHA1 Message Date
Clement Lin
3e61925277 Remove unused code 2025-04-09 15:08:09 +08:00
YC Lin
fd26846d61 [GEMM] Refactor block gemm, pipeline, and policy of instruction schedule opt 2025-04-09 03:19:10 +00:00
YC Lin
fe61498468 [Add] Add build option for generating assembly 2025-04-06 23:50:26 +00:00
YC Lin
aac02a92ac [GEMM] Refactor block gemm and pipeline policy of instruction schedule 2025-04-06 23:37:29 +00:00
Clement Lin
9151a1fb42 Add flash_attention_fwd toy_example 2025-04-04 00:39:02 +08:00
mhYang
f28aabb42f Update tile size and use slc 2025-04-02 12:23:23 +00:00
mhYang
d5531ab9c9 Fix add flops calculation 2025-04-01 19:13:18 +00:00
ClementLinCF
04513ca683 Create README.md 2025-04-01 09:11:29 +08:00
mhYang
d1dbc69eda Use mfma 16x16x32 2025-03-31 23:18:22 +00:00
mhYang
ee28e965f2 Fix KERNEL_D config 2025-03-31 17:53:57 +00:00
YC Lin
68cd6609eb [GEMM] Add pragma message for different MFMA options 2025-03-30 20:05:35 +00:00
YC Lin
a8027a5b2f [GEMM] Fix print typos 2025-03-30 19:55:13 +00:00
Clement Lin
5af7efdec5 Fix indentation typo 2025-03-30 15:06:07 +08:00
Clement Lin
de9385ba51 [GEMM] Fix MFMA condition checks 2025-03-30 14:02:30 +08:00
Clement Lin
5dd8e4ae0c [GEMM] Add new macro options check 2025-03-30 10:07:21 +08:00
Clement Lin
7bc473835e [GEMM] Add macros for multiple optimization options 2025-03-29 22:58:51 +08:00
YC Lin
428bcdeb40 [GEMM] default MFMA config 2025-03-29 21:11:29 +00:00
YC Lin
9a0d9dfc0a git push test 2025-03-29 21:09:00 +00:00
root
a3c6ca1761 [GEMM] fix MFMA configurations 2025-03-29 21:05:21 +00:00
mhYang
16a4e1585a Adjust mfma schedule order 2025-03-28 18:32:12 +00:00
Clement Lin
5428f17ca2 [GEMM] Replace const auto with constexpr index_t 2025-03-28 17:39:49 +08:00
Clement Lin
4eb246f20c [GEMM] Update cache-aware wg schedule 2025-03-28 17:31:19 +08:00
bobofang
8f1aa6fc6f Add MFMA M16N16K16 and M16N16K32 methods
these two methods are default off
2025-03-28 16:31:53 +00:00
YC Lin
f1562d5911 [GEMM] remove a_col_major/b_row_major case 2025-03-26 16:31:24 +00:00
root
6d2b55914e [GEMM] modify if-else locations 2025-03-26 13:35:00 +00:00
mhYang
9ecde871a3 Fix AccDataType and CDataType
1. Fix AccDataType and CDataType
2. Remove indent
3. Align merge_transform for tutorial
2025-03-25 20:08:59 +00:00
mhYang
67072b3ba9 Fix build error 2025-03-25 18:55:34 +00:00
root
7d08f99b02 [GEMM] disable/enable instruction scheduling 2025-03-25 16:53:20 +00:00
mhYang
4494c54dcd Fix missing message 2025-03-21 15:06:29 +00:00
mhYang
8f3b534d29 Fix xor transform dim. 2025-03-21 15:00:05 +00:00
Clement Lin
1f604e9b0a [GEMM] Add cache-aware WG schedule and adjust block tile
113 -> 121.7 TFLOPS
2025-03-21 09:15:17 +08:00
mhYang
93193e42ea Add LDS bank conflict solutions 2025-03-20 21:36:56 +00:00
bobofang
d635209d59 Fix add accuracy issue
2673 GB/s -> 3271 GB/s
Perf: 0.0512898 ms, 3271.06 GB/s
2025-03-19 12:26:30 +08:00
root
ff15e2da7a [GEMM] use mfma k8 warp gemm 2025-03-17 16:01:29 +00:00
root
10033c1cdc [GEMM] disable/enable prefetch 2025-03-17 14:22:49 +00:00
Clement Lin
803ecb93d8 [CK TILE] Toy example - basic gemm 2025-03-12 15:55:47 +08:00
Clement Lin
1afc32c59c Adjust block shape
2673 GB/s -> 3647 GB/s
2025-03-12 14:58:06 +08:00
Clement Lin
58bc69aa99 Utilize vectorized memory access
1998.24 GB/s -> 2673 GB/s
2025-03-12 14:44:41 +08:00
Clement Lin
399cdb6f9f Adjust the size of thread block
1968.42 GB/s -> 1998.24 GB/s
2025-03-12 14:33:36 +08:00
Clement Lin
712d96cef5 [CK TILE] Toy example - basic add 2025-03-11 16:23:12 +08:00
Mingtao Gu
0db7c8f0b2 Ck int4 moe develop (#1949)
* Add Gemm fp8xint4 example and kernel, function pass.

* Init Gemm_fp8xint4 Bpreshuffle

* Added gemm_fp8xint4_Bpreshuffle files, function not checked yet

* General fix.

* fp8xint4 bpreshuffle function pass

* fix.

* init b preshuffle dequant in VGPR.

* fix bug, function pass.

* move b thread dequant copy to blockwise.

* fix bug, function now passes.

* modified the tile size to 256, 128x128x128.

* fixed a bug.

* Initial int4 moe, compile pass, function not checked.

* fix bug in moe_gemm1.cpp, now function pass.

* test expert = 8 and function pass.

* Added moe_pk_i4_gemm2, function pass.

* Added b preshuffle pipeline v3 support.

* fixed merge issue. fp8xint4 and fp8xint4_bpreshuffle function pass.

* Split the blockwise pipeline for fp8xint4.

* commit missing files

* opt gemm2 to 2x2 wave

* fix swizzle = false

* update int4 moe with latest input changes.

* update tile size.

* enable pipeline v3.

* fix nswizzle = true

* commit a version for compiler debug.

* Updated transfer_v3r1_gather to support pk_i4_t type.

* for int4 moe2 for type_convert support.

* remove some values between mfma instructions.

* fix int4 moe

* Updated transfer_v3r1_gather to support pk_i4_t type.

* i4 support lds multiple shuffle

* fixed int4 moe tflops calculation.

* Modified CshuffleCShuffleMXdlPerWavePerShuffle to 1 to suit C multiple shuffle

* updated gemm2.

* change int4 moe example names

* fix and format code.

* format.

* format codes.

* update fp8xint4 example tile size.

* add <unordered_map> header

* fixed.

* format.

* Added conditional compilation for int4 -> fp8 conversion kernels

---------

Co-authored-by: mtgu0705 <mtgu@amd.com>
Co-authored-by: coderfeli <coderfeli@163.com>
2025-03-10 11:16:44 +08:00
Thomas Ning
c954bd0cfa Add the instance of MBlock=144 for GemmMultiplyMultiply (#1955)
* tempsave, not selected

* finish the feature and merge with develop

---------

Co-authored-by: aska-0096 <haocwang@amd.com>
2025-03-07 13:44:06 -08:00
Max Podkorytov
9e132eb77c refactor ck-tile kernel launch (#1925) 2025-03-07 08:29:40 -08:00
kylasa
66c5f5b0b6 Addressing (Post Merge) code review comments for PR 1845 (#1883)
* Addressing code review comments.

* Addressing code review comments.

* Reorganized code for better readability.

* add ck_tile gemms for new types in CI

* fix jenkins syntax

* fix script syntax

* Add the test cases back

* Address the review comments

* Address review comments

* clang format

* Solve the merging issues

* Addressed the comments

* clang format

---------

Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: ThomasNing <thomas.ning@amd.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
2025-03-06 11:40:30 -08:00
Illia Silin
9b51c08bf7 remove support for gfx940 and gfx941 targets (#1944)
* remove support for gfx940 and gfx941 targets

* update changelog
2025-03-05 11:07:33 -08:00
feli
3786e16375 ck moe gemm implement (#1936)
* port all moe changes from ck_moe_gemm branch

* refine codes in the pr

* fix tail odd

* fix clang format

* fix clang format2

* make hot loop scheduler compatible with 16x16 and 32x32

* clang format

* fix per token quant

* rename moe example

* clang format

---------

Co-authored-by: coderfeli <coderfeli@163.com>
2025-03-05 15:56:55 +08:00
jefyang1
c95bda93ba Remove CK_USE_AMD_MFMA_GFX950 (#1935)
* Add runtime check in example_gemm_xdl_streamk for gfx950

* Add runtime check in grouped conv fwd examples for gfx950

* Disable CK_USE_AMD_MFMA_GFX950

* Add new instances for gfx950

* Fix test_gemm_universal on gfx950
2025-03-04 10:32:25 -08:00
asleepzzz
ef16010273 Revert "[BlockScale GEMM] FP8 Blockscale GEMM optimization and ckProfiler (#1913)" (#1933)
This reverts commit 020148d0f7.
2025-03-03 07:17:39 -08:00
rocking
faa2235dad explicitly show no feature in kernel name (#1920) 2025-02-28 14:23:30 +08:00
slippedJim
a9bcd3c98d make fmha bwd api template for v2 & v3 (#1918)
* use template fmha_bwd function

* update

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2025-02-27 19:26:19 +08:00