root
fbf91ada78
merge from testx
2025-04-08 06:55:35 +00:00
lalala-sh
62f99e5cca
Merge branch 'develop' into moe_gemm_activation
2025-04-08 14:43:41 +08:00
slippedJim
5a22b61de5
Add new receipt (#2055)
2025-04-07 14:18:01 +08:00
Thomas Ning
50d1f8ff90
Add the MI355 support for CK TILE GEMM (#2046)
* Get the root cause of the ck tile gemm failing on mi355
* Fix the ck tile gemm on MI355
* delete the debug info
2025-04-03 11:48:54 -07:00
aledudek
9329432f6c
Post-merge changes for fully async args copy in ck grouped gemm (#1991)
* Post-merge changes for fully async args copy in ck grouped gemm
* Post-merge documentation and naming changes
* Build fix and updated changelog
* Revised comments
2025-04-03 13:35:43 +02:00
root
b2b34fffbb
fix fp8 16x16
2025-04-02 16:27:52 +00:00
root
85f83330b5
fuse moe activation
2025-04-02 07:02:09 +00:00
Illia Silin
dcfec66bc4
Merge branch 'develop' into moe_gemm_activation
2025-04-01 12:28:10 -07:00
Muhammed Emin Ozturk
dd4c12b155
f8/bf16 GEMM Stream-K (#1879)
2025-03-31 20:30:17 -06:00
root
ae3ec1f579
add the arch limit of int4 moe gemm
2025-03-31 05:46:52 +00:00
illsilin
d3fb5a9b8d
fix clang format
2025-03-28 11:52:40 -07:00
rocking
8a20b62e91
Reduce redundant space in bias tensor (#2024)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2025-03-28 21:58:06 +08:00
lalala-sh
656a3657cb
remove useless code change
2025-03-28 16:25:00 +08:00
root
7054d45579
remove useless changes
2025-03-28 08:21:42 +00:00
root
c9ce26837c
Merge remote-tracking branch 'origin/develop' into moe_gemm_activation
2025-03-28 07:57:53 +00:00
root
de65682298
int4 act ready
2025-03-28 07:45:39 +00:00
felix
a82f338fb9
hotfix fix sorting int64 (#2025)
* fix sorting int64
* clang format
* fix example issue
* update WA issue #
---------
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
2025-03-28 11:31:52 +08:00
root
529a1732cd
fp8 with act ready
2025-03-28 02:34:38 +00:00
lalala-sh
d0f3e87129
Merge remote-tracking branch 'origin/develop' into moe_gemm_activation
2025-03-28 10:02:27 +08:00
felix
36d50de50e
ckmoe: change cmake; use smaller shape for i4 (#2027)
* change cmake; use smaller shape for i4
* fix pki4 run
* fix typo
* fix runtime arch logic for moe_gemm2 example
---------
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
2025-03-27 09:04:31 -07:00
root
4f4bce30cf
fuse gelu silu act in moe gemm1
2025-03-27 16:59:53 +08:00
coderfeli
6654718ad2
change flops; change cshuffle dtype
2025-03-27 07:23:10 +00:00
coderfeli
f7a6598532
16x16 run ok
2025-03-27 07:08:11 +00:00
coderfeli
bd4a8c71d4
i4 gemm2 ok and i4 gemm1 build
2025-03-27 06:41:31 +00:00
Illia Silin
23a949706c
Disable all pk_i4 tests for all targets except gfx942/950. (#2022)
* only build gemm_fp8_pk_i4 examples for gfx942/950
* fix cmake logic
* moved the architecture check to IsSupported function
* Revert "moved the architecture check to IsSupported function"
This reverts commit 056d2a08b3.
* disable all pk_i4 tests for targets other than gfx942/950
* fix cmake logic
2025-03-26 15:15:57 -07:00
coderfeli
9729c9e3f7
i4 gemm2 ok
2025-03-26 11:41:21 +00:00
coderfeli
cd12af75af
support bf16 cshuffle
2025-03-26 06:50:21 +00:00
coderfeli
6a0cc4aad1
gu fusion for m32 m64 ok
2025-03-26 05:58:22 +00:00
coderfeli
74d8ac608f
gufusion compatible ok, fix warnings
2025-03-26 02:20:30 +00:00
Illia Silin
99b2bbc1d6
Make sure gemm_fp8_pk_i4 examples only build and run on gfx942/950. (#2010)
* only build gemm_fp8_pk_i4 examples for gfx942/950
* fix cmake logic
* moved the architecture check to IsSupported function
* Revert "moved the architecture check to IsSupported function"
This reverts commit 056d2a08b3.
2025-03-25 14:43:38 -07:00
Andriy Roshchenko
72d888821c
MX GEMM examples with FP8, FP16, and E8M0 scales (#2016)
* Add `scalar_type` specification for E8M0 exponent
* Specialize `nnvb_data_t_selector` for E8M0 exponent
* Remove partial specializations for `scalar_type` of `non_native_vector_base` template
* Reword command line helper string
* Create MX GEMM examples for different scales
2025-03-25 15:33:03 -06:00
coderfeli
6ca5892256
gemm2 ok
2025-03-25 15:01:10 +00:00
ruanjm
d49abdaa87
[CK_TILE] Improve RMS/Layer Normalization 2 Pass Pipeline Performance (#1861)
* 50ms -> 28ms
* Fix bug in non fuse_add_store cases
* Fine tuned setting for 2 pass pipeline
* adjust workload
* remove unnecessary change
* add layernorm
* Adding output quant and unquant results at the same time.
* fix test
* fix format
* tune for cases 128x640 and 128x1024
* bug fix
2025-03-25 20:09:45 +08:00
coderfeli
2b15b67b3f
scale ok
2025-03-25 02:52:04 +00:00
Andriy Roshchenko
6660dc6b8e
Introduce MX GEMM for FP8 data type (#2000)
2025-03-24 15:41:07 -06:00
coderfeli
b865e2cf83
silu ok
2025-03-22 14:03:45 +00:00
carlushuang
6c08c5c46d
add mask support in hdim=192/128 (#1999)
2025-03-21 18:28:43 +08:00
coderfeli
d69c1c9590
fuse silu
2025-03-21 07:31:49 +00:00
BingYuan.Zhou
5a0d693b86
fix ck_tile/basic_gemm build error (#1988)
2025-03-20 22:01:14 -07:00
felix
902dbe89ad
change cmake (#2006)
Co-authored-by: coderfeli <coderfeli@163.com>
2025-03-20 19:25:11 -07:00
carlushuang
e3c9886cdf
[CK_TILE] return value with macro in ck_tile::kernel_launch API (#1982)
* return value with macro and revert the return value
* [CK-TILE] no-macro launch api solution (#1992)
* no-macro solution
* address -Wcomma
---------
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
2025-03-20 11:00:29 -07:00
jakpiase
0e91d32c61
[CK_TILE] Switch to universal gemm for batched and grouped gemms (#1919)
* switch to universal gemm for batched and grouped gemms
* added reviewer comments
* fixed grouped gemm tests
2025-03-20 11:17:04 +01:00
rocking
b819c217e4
Sync the kname with instance name (#1989)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
2025-03-20 00:06:45 +08:00
coderfeli
e285c77c5f
fix build
2025-03-18 06:58:54 +00:00
coderfeli
1c90d50b5b
update moe api fix aiter build
2025-03-18 05:59:24 +00:00
coderfeli
5f49b91237
merge develop
2025-03-18 04:49:40 +00:00
Illia Silin
1342ecf7fb
Add a daily CI build on gfx908. (#1987)
* add one daily ci build on gfx908
* add redis invocation tag for gfx908
* make ci build for gfx908 conditional
* fix groovy logic
* add option to run perf tests for gfx908
* disable a few tests on mi100
2025-03-17 18:08:53 -07:00
aledudek
5095906975
Async grouped gemm v3 (#1940)
* Fully async grouped gemm
* Remove commented code
* Remove maybe_unused
* host kernel args
* Checkpoint segfault debugging...
* Working part1
* Working part2
* Remove comments...
* Use void ptr for gemm kernel host args
* Fix device_grouped_gemm_multiple_d_dl build issue
* Fix device_grouped_gemm_xdl build issue
2025-03-17 16:42:43 +01:00
coderfeli
7dbdff9f9f
moe sorting fix moebuf
2025-03-17 06:20:57 +00:00
coderfeli
ef8c1333b9
use uint32
2025-03-17 01:45:09 +00:00