coderfeli
9ff2394e26
fix swizzle = false
2025-02-18 17:43:00 +08:00
mtgu0705
854cd8b4a1
commit missing files
2025-02-18 16:29:26 +08:00
mtgu0705
182e7480ba
Split the blockwise pipeline for fp8xint4.
2025-02-18 15:38:05 +08:00
mtgu0705
966f9051c7
fixed merge issue. fp8xint4 and fp8xint4_bpreshuffle function pass.
2025-02-18 13:54:15 +08:00
mtgu0705
49bac8cef7
Added b preshuffle pipeline v3 support.
2025-02-18 13:34:55 +08:00
mtgu0705
a0432459e7
Added moe_pk_i4_gemm2, function pass.
2025-02-18 13:34:41 +08:00
mtgu0705
be79b63bfe
fix bug in moe_gemm1.cpp, now function pass.
2025-02-18 13:32:46 +08:00
mtgu0705
1b0b7810cd
Initial int4 moe, compile pass, function not check.
2025-02-18 13:32:25 +08:00
mtgu0705
fba3d780f2
fix bug, function now passes.
2025-02-18 13:25:18 +08:00
mtgu0705
c0ef46ff14
move b thread dequant copy to blockwise.
2025-02-18 13:25:06 +08:00
mtgu0705
a316dff966
fix bug, function pass.
2025-02-18 13:24:48 +08:00
mtgu0705
bee790ec5d
init b preshuffle dequant in VGPR.
2025-02-18 13:22:16 +08:00
mtgu0705
9a3f75eeb8
fp8xint4 bpreshuffle function pass
2025-02-18 13:21:36 +08:00
mtgu0705
ba5a6a2477
General fix.
2025-02-18 13:21:06 +08:00
mtgu0705
e0391df785
Added gemm_fp8xint4_Bpreshuffle files, function not checked yet
2025-02-18 13:20:36 +08:00
mtgu0705
8df8a17943
Add Gemm fp8xint4 example and kernel, function pass.
2025-02-18 13:14:18 +08:00
coderfeli
45d1c52ef5
hotfix moegemm2 nswizzle
2025-02-18 04:10:58 +00:00
coderfeli
bca3f14c7c
fix nswizzle=0
2025-02-18 03:36:13 +00:00
coderfeli
e78fbf8785
merge 2 moegemm pipe together
2025-02-18 03:23:56 +00:00
coderfeli
1687fc988e
change ktile
2025-02-17 14:26:43 +00:00
coderfeli
4404984abc
2x2 ok
2025-02-17 09:52:22 +00:00
coderfeli
f64b137521
merge haocong branch
2025-02-17 09:30:02 +00:00
coderfeli
4b91d1ce17
revert gemm2 swizz
2025-02-17 06:19:58 +00:00
coderfeli
fcc2c867af
impl gemm2 swizzle
2025-02-17 02:33:52 +00:00
coderfeli
aecd6a38e4
rm err print
2025-02-17 01:50:12 +00:00
coderfeli
96047cab6f
impl e swizzle
2025-02-17 01:26:42 +00:00
coderfeli
7572a6916c
merge develop
2025-02-15 03:23:00 +00:00
coderfeli
7796fc738b
fix gemm2 scale, gemm2 ok now
2025-02-15 03:09:47 +00:00
coderfeli
61e3c23851
fix moe gemm2
2025-02-15 01:48:56 +00:00
coderfeli
db53dba4a0
hotfix:gemm1 use real tokens and gemm2 ok
2025-02-14 15:08:28 +00:00
coderfeli
58db931ec5
fix topk id
2025-02-14 09:50:57 +00:00
coderfeli
84b27d7504
merge max_token_id and fix err
2025-02-14 08:19:54 +00:00
coderfeli
83be79ba58
add max_token_id
2025-02-14 06:22:17 +00:00
coderfeli
1078d22916
add logics and debug
2025-02-14 05:23:15 +00:00
coderfeli
d4b8f1e3b0
add codes for a scatter
2025-02-14 11:05:26 +08:00
Haocong WANG
f18cfec43c
Merge branch 'develop' into update_cka8w8_uc
2025-02-14 10:52:39 +08:00
jefyang1
7b826807cd
Fix KPack and enable existing instances on gfx950 (#1871)
2025-02-12 09:46:38 -08:00
coderfeli
418baed327
moe gemm1 scaleready
2025-02-12 05:19:01 +00:00
JonathanLichtnerAMD
3c7fef7f80
Conditionally log a DeviceGroupedConvBwdWeightTwoStage_Xdl_CShuffle warning (#1860)
The code was emitting a warning if MIOpen did not create a workspace
prior to invoking the IsSupportedArgument method, but the
condition for MIOpen to create a workspace was not met, and so this
condition was not really an error but more of a log message. This commit
addresses this issue by using the CK_LOGGING facility to only generate the
log message if the CK_LOGGING environment variable is set.
2025-02-11 17:25:00 -07:00
Mirza Halilčević
b5ca008d62
Introduce gemm_softmax_gemm to codegen (#1542)
* Introduce ck_host library and gemm_softmax_gemm.
* Minor refactor.
* Add descriptor to gemm_softmax_gemm.
* Bugfix.
* Revert ck_host library.
* fix clang format
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-02-11 08:07:24 -08:00
coderfeli
b02c0b8257
gemm1 scale debug
2025-02-11 14:52:01 +00:00
coderfeli
e4ca61f9e7
moe gemm2 scales ok
2025-02-11 12:01:39 +00:00
Haocong WANG
d6e3e83a80
Merge branch 'develop' into update_cka8w8_uc
2025-02-11 16:06:08 +08:00
coderfeli
66d08ea327
impl topk weight scatter
2025-02-11 07:43:59 +00:00
coderfeli
a8a82e0cfc
fix warnings and impl scale for gemm2, build ok
2025-02-11 01:54:08 +00:00
coderfeli
69f54ee822
impl 3ds epilog ok
2025-02-10 14:50:56 +00:00
coderfeli
72752420e9
merge gemm1 gemm2 together and run ok
2025-02-10 09:06:22 +00:00
coderfeli
66cff9103f
merge gemm1 and gemm2
2025-02-10 07:52:32 +00:00
coderfeli
aa15c49a67
add moegemm in device and grid
2025-02-10 07:51:55 +00:00
Mingtao Gu
d9f1ead347
Added Int4 mixed batch gemm support (#1839)
* remove redundant kernels.
* added batched_gemm_xdl_fp16int4_b_scale_v3
* Enabled the split K.
* added the batched_gemm_b_scale ckProfiler, meet function issue
* fix some typo
* fix ckProfiler build issue
* fix some bugs
* updated some debug info
* comment some code
* Fix
* fixed some bugs and refactor the code
* fixed a function bug.
* formatted files.
* formatted
* uncommented the ckProfiler CMakeLists
* fixed.
* fix ckProfiler for batched_gemm_b_scale
---------
Co-authored-by: mtgu0705 <mtgu@amd.com>
Co-authored-by: aska-0096 <haocwang@amd.com>
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
2025-02-10 11:17:02 +08:00