Commit Graph

656 Commits

Author SHA1 Message Date
coderfeli
9ff2394e26 fix swizzle = false 2025-02-18 17:43:00 +08:00
mtgu0705
854cd8b4a1 commit missing files 2025-02-18 16:29:26 +08:00
mtgu0705
182e7480ba Split the blockwise pipeline for fp8xint4. 2025-02-18 15:38:05 +08:00
mtgu0705
966f9051c7 fixed merge issue. fp8xint4 and fp8xint4_bpreshuffle function pass. 2025-02-18 13:54:15 +08:00
mtgu0705
49bac8cef7 Added b preshuffle pipeline v3 support. 2025-02-18 13:34:55 +08:00
mtgu0705
a0432459e7 Added moe_pk_i4_gemm2, function pass. 2025-02-18 13:34:41 +08:00
mtgu0705
be79b63bfe fix bug in moe_gemm1.cpp, now function pass. 2025-02-18 13:32:46 +08:00
mtgu0705
1b0b7810cd Initial int4 moe, compile pass, function not check. 2025-02-18 13:32:25 +08:00
mtgu0705
fba3d780f2 fix bug, function now passes. 2025-02-18 13:25:18 +08:00
mtgu0705
c0ef46ff14 move b thread dequant copy to blockwise. 2025-02-18 13:25:06 +08:00
mtgu0705
a316dff966 fix bug, function pass. 2025-02-18 13:24:48 +08:00
mtgu0705
bee790ec5d init b preshuffle dequant in VGPR. 2025-02-18 13:22:16 +08:00
mtgu0705
9a3f75eeb8 fp8xint4 bpreshuffle function pass 2025-02-18 13:21:36 +08:00
mtgu0705
ba5a6a2477 General fix. 2025-02-18 13:21:06 +08:00
mtgu0705
e0391df785 Added gemm_fp8xint4_Bpreshuffle files, function not checked yet 2025-02-18 13:20:36 +08:00
mtgu0705
8df8a17943 Add Gemm fp8xint4 example and kernel, function pass. 2025-02-18 13:14:18 +08:00
coderfeli
45d1c52ef5 hotfix moegemm2 nswizzle 2025-02-18 04:10:58 +00:00
coderfeli
bca3f14c7c fix nswizzle=0 2025-02-18 03:36:13 +00:00
coderfeli
e78fbf8785 merge 2 moegemm pipe together 2025-02-18 03:23:56 +00:00
coderfeli
1687fc988e change ktile 2025-02-17 14:26:43 +00:00
coderfeli
4404984abc 2x2 ok 2025-02-17 09:52:22 +00:00
coderfeli
f64b137521 merge haocong branch 2025-02-17 09:30:02 +00:00
coderfeli
4b91d1ce17 revert gemm2 swizz 2025-02-17 06:19:58 +00:00
coderfeli
fcc2c867af impl gemm2 swizzle 2025-02-17 02:33:52 +00:00
coderfeli
aecd6a38e4 rm err print 2025-02-17 01:50:12 +00:00
coderfeli
96047cab6f impl e swizzle 2025-02-17 01:26:42 +00:00
coderfeli
7572a6916c merge develop 2025-02-15 03:23:00 +00:00
coderfeli
7796fc738b fix gemm2 scale, gemm2 ok now 2025-02-15 03:09:47 +00:00
coderfeli
61e3c23851 fix moe gemm2 2025-02-15 01:48:56 +00:00
coderfeli
db53dba4a0 hotfix:gemm1 use real tokens and gemm2 ok 2025-02-14 15:08:28 +00:00
coderfeli
58db931ec5 fix topk id 2025-02-14 09:50:57 +00:00
coderfeli
84b27d7504 merge max_token_id and fix err 2025-02-14 08:19:54 +00:00
coderfeli
83be79ba58 add max_token_id 2025-02-14 06:22:17 +00:00
coderfeli
1078d22916 add logics and debug 2025-02-14 05:23:15 +00:00
coderfeli
d4b8f1e3b0 add codes for a scatter 2025-02-14 11:05:26 +08:00
Haocong WANG
f18cfec43c Merge branch 'develop' into update_cka8w8_uc 2025-02-14 10:52:39 +08:00
jefyang1
7b826807cd Fix KPack and enable existing instances on gfx950 (#1871) 2025-02-12 09:46:38 -08:00
coderfeli
418baed327 moe gemm1 scaleready 2025-02-12 05:19:01 +00:00
JonathanLichtnerAMD
3c7fef7f80 Conditionally log a DeviceGroupedConvBwdWeightTwoStage_Xdl_CShuffle warning (#1860)
The code was emitting a warning if MIOpen did not create a workspace
prior to invoking the IsSupportedArgument method, but the
condition for MIOpen to create a workspace was not met, and so this
condition was not really an error but more of a log message. This commit
addresses this issue by using the CK_LOGGING facility to only generate the
log message if the CK_LOGGING environment variable is set.
2025-02-11 17:25:00 -07:00
Mirza Halilčević
b5ca008d62 Introduce gemm_softmax_gemm to codegen (#1542)
* Introduce ck_host library and gemm_softmax_gemm.

* Minor refactor.

* Add descriptor to gemm_softmax_gemm.

* Bugfix.

* Revert ck_host library.

* fix clang format

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-02-11 08:07:24 -08:00
coderfeli
b02c0b8257 gemm1 scale debug 2025-02-11 14:52:01 +00:00
coderfeli
e4ca61f9e7 moe gemm2 scales ok 2025-02-11 12:01:39 +00:00
Haocong WANG
d6e3e83a80 Merge branch 'develop' into update_cka8w8_uc 2025-02-11 16:06:08 +08:00
coderfeli
66d08ea327 impl topk weight scatter 2025-02-11 07:43:59 +00:00
coderfeli
a8a82e0cfc fix warnings and impl scale for gemm2, build ok 2025-02-11 01:54:08 +00:00
coderfeli
69f54ee822 impl 3ds epilog ok 2025-02-10 14:50:56 +00:00
coderfeli
72752420e9 merge gemm1 gemm2 together and run ok 2025-02-10 09:06:22 +00:00
coderfeli
66cff9103f merge gemm1 and gemm2 2025-02-10 07:52:32 +00:00
coderfeli
aa15c49a67 add moegemm in device and grid 2025-02-10 07:51:55 +00:00
Mingtao Gu
d9f1ead347 Added Int4 mixed batch gemm support (#1839)
* remove redundant kernels.

* added batched_gemm_xdl_fp16int4_b_scale_v3

* Enabled the split K.

* added the batched_gemm_b_scale ckProfiler, meet function issue

* fix some typo

* fix ckProfiler build issue

* fix some bugs

* updated some debug info

* comment some code

* Fix

* fixed some bugs and refactor the code

* fixed a function bug.

* formatted files.

* formatted

* uncommented the ckProfiler CMakeLists

* fixed.

* fix ckProfiler for batched_gemm_b_scale

---------

Co-authored-by: mtgu0705 <mtgu@amd.com>
Co-authored-by: aska-0096 <haocwang@amd.com>
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
2025-02-10 11:17:02 +08:00