Po Yen Chen
85becc24ee
Re-format old CK source files
2025-04-20 08:00:15 +00:00
Po Yen Chen
4f836f2f91
Re-enable some hdim pipelines
2025-04-20 07:57:47 +00:00
Po Yen Chen
889b2d33fd
Sync logits soft-capping across pipelines
2025-04-20 07:55:36 +00:00
Po Yen Chen
e0e9040f88
Align receipt used in Aiter
2025-04-20 07:18:12 +00:00
Po Yen Chen
6fb55c7e46
Do not generate non-verified kernels
2025-04-20 07:17:35 +00:00
Po Yen Chen
0200227d89
Support turn on/off logits_soft_cap in async pipeline
2025-04-20 06:56:20 +00:00
Po Yen Chen
87b22a7cff
Allow specifying logits_soft_cap through APIs
2025-04-20 06:23:38 +00:00
Po Yen Chen
4927305440
Re-format files
2025-04-20 04:28:58 +00:00
coderfeli
d2c653f177
fix bug
2025-04-18 10:45:53 +00:00
Bernard
838ff7034d
hack for cap logits
2025-04-18 08:38:25 +00:00
coderfeli
b20173a494
fix bug
2025-04-09 15:18:13 +00:00
coderfeli
867a4e527c
fix bugs
2025-04-04 13:40:10 +00:00
coderfeli
491178276f
fix fp8 scale
2025-04-03 11:10:37 +00:00
lalala-sh
3037d8bac1
mul int4 scale
2025-04-03 18:06:12 +08:00
root
20f6674bf6
fix no quant case
2025-04-03 02:46:01 +00:00
root
b2b34fffbb
fix fp8 16x16
2025-04-02 16:27:52 +00:00
root
85f83330b5
fuse moe activation
2025-04-02 07:02:09 +00:00
coderfeli
e285c77c5f
fix build
2025-03-18 06:58:54 +00:00
coderfeli
1c90d50b5b
update moe api fix aiter build
2025-03-18 05:59:24 +00:00
coderfeli
98cee8d02b
fix merge
2025-03-18 05:45:04 +00:00
coderfeli
5f49b91237
merge develop
2025-03-18 04:49:40 +00:00
Illia Silin
1342ecf7fb
Add a daily CI build on gfx908. (#1987)
* add one daily ci build on gfx908
* add redis invocation tag for gfx908
* make ci build for gfx908 conditional
* fix groovy logic
* add option to run perf tests for gfx908
* disable a few tests on mi100
2025-03-17 18:08:53 -07:00
Illia Silin
07f25186b2
disable ck_tile basic gemm (#1986)
2025-03-17 15:26:43 -07:00
aledudek
5095906975
Async grouped gemm v3 (#1940)
* Fully async grouped gemm
* Remove commented code
* Remove maybe_unused
* host kernel args
* Checkpoint segfault debugging...
* Working part1
* Working part2
* Remove comments...
* Use void ptr for gemm kernel host args
* Fix device_grouped_gemm_multiple_d_dl build issue
* Fix device_grouped_gemm_xdl build issue
2025-03-17 16:42:43 +01:00
Bartłomiej Kocot
c2e4898b4b
Grouped conv bwd data NGCHW (#1967)
* Grouped conv bwd data NGCHW
* fixes
* fix
* Improvements
* Fix
* Fix
* add client example
2025-03-17 13:32:00 +01:00
coderfeli
7dbdff9f9f
moe sorting fix moebuf
2025-03-17 06:20:57 +00:00
coderfeli
5eaa36be18
mork to support 13w tokens
2025-03-17 01:45:34 +00:00
coderfeli
ef8c1333b9
use uint32
2025-03-17 01:45:09 +00:00
coderfeli
6c0e021235
revert v1 test
2025-03-17 01:39:57 +00:00
coderfeli
bccc5192cf
fix uint32
2025-03-17 01:18:32 +00:00
coderfeli
da2659d502
input output all ok
2025-03-15 14:26:30 +00:00
coderfeli
d1e999c05c
int64 index ok now
2025-03-15 13:28:49 +00:00
coderfeli
f911cf7396
impl int64 but result not correct
2025-03-14 13:01:07 +00:00
coderfeli
d4925e1637
fix output oob
2025-03-14 03:19:26 +00:00
valarLip
52b1cd7780
hotfix fmoe build issue (#1976)
2025-03-13 15:11:59 +08:00
dependabot[bot]
de7a745ca6
Bump rocm-docs-core from 1.17.1 to 1.18.1 in /docs/sphinx (#1977)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.17.1 to 1.18.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.1...v1.18.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-12 23:36:36 -07:00
carlushuang
3e81279d26
Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" … (#1971)
* Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" (#1969)
This reverts commit 8cbcd3e0d0.
* fix codegen problem
* Update config.hpp
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-03-13 11:41:39 +08:00
illsilin
f8464d2087
fix clang format
2025-03-12 20:21:14 -07:00
coderfeli
d85c034977
fix2
2025-03-13 02:30:07 +00:00
coderfeli
8b05fa935d
fix coredump in e2e test
2025-03-13 02:12:18 +00:00
Illia Silin
d4a6d69643
disable tests that take too long to build for gfx90a (#1975)
2025-03-12 17:54:03 -07:00
feli
251afab3b7
ck_moe: fix useless code and remove useless oob (#1972)
* fix useless code and remove useless oob
* clang format
---------
Co-authored-by: coderfeli <coderfeli@163.com>
2025-03-12 09:22:42 -07:00
Illia Silin
4c97cc511e
use old intrinsics with staging compiler (#1970)
2025-03-12 07:29:09 -07:00
feli
2585c78940
Merge branch 'develop' into ck_moe_rm_oob
2025-03-12 16:05:59 +08:00
coderfeli
40542296de
clang format
2025-03-12 08:05:12 +00:00
Illia Silin
8cbcd3e0d0
Revert "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" (#1969)
This reverts commit 7a93b16ff6.
2025-03-11 10:40:18 -07:00
Haocong WANG
cbd74c2d12
[Block Scale GEMM] Optimized block scale gemm (#1950)
* Added two kernel for M=32 problem
* Comment the first one
* Enable multiply_multiply for Scale_Block_M = 1 for deepseek
* Modify the a_thread offset since the A data load is different from B.
* edit fp8 ab scale for Scale_Block_M=1
* edit GemmSpec to MNKPadding
* enable blockwise pipeline v1 and v2. v1 works for small K.
* add instance for gemm_ab_scale
* fix cmakelist of ckProfiler
* optimize blockscale gemm. todo: reduce vgpr usage
* fix a correctness bug
* sanity checked
* revert ckprofiler cmake changes
* clang format
* revert unnecessary changes.
* remove commented codes.
* split weight preshuffle library targets
* bring back enable-post-misched=0
* fix build issues for gemm_multiply_multiply_fp8 instances
* fix clang format
* add verbose build flag when building for all targets
* reduce path names for new instances
* fix paths in cmake
* refactor gemm_multiply_multiply library target
* fix a bug in example
* fix example 65 cmake
* reduce the number of threads when building libs for all targets to 50
* use ninja to build for all targets
* reduce the number of threads when building for all targets
* reduce the number of threads to 32 when building libs for all targets to 50
---------
Co-authored-by: mtgu0705 <mtgu@amd.com>
Co-authored-by: chenjun <junchen2@amd.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2025-03-11 10:11:21 -07:00
Haocong WANG
ba209b9dab
reduce test size to avoid timeout on specific silicon (#1966)
2025-03-11 09:15:26 -07:00
Illia Silin
aa42c3db06
disable example_moe_gemm2_xdl_pk_i4 on gfx950 (#1968)
2025-03-11 08:34:47 -07:00
carlushuang
7a93b16ff6
[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)
* support hdim=192/128 pair
* remove useless print
* update
2025-03-11 21:07:40 +08:00