fsx950223 | 7ce4f50da6 | fix bugs | 2025-06-03 07:17:56 +00:00
fsx950223 | 8e1dd4e7f9 | store | 2025-05-29 09:07:15 +00:00
fsx950223 | 4f823fd8f2 | store | 2025-05-28 11:15:38 +00:00
Po Yen Chen | 6070090258 | Mask out key values for each splits | 2025-05-27 09:50:50 +00:00
Po Yen Chen | 9512f78616 | Support logits_soft_cap in batch_decode() | 2025-04-21 23:33:52 +00:00
Po Yen Chen | fd84cf840a | Fix batch_decode() codegen | 2025-04-21 13:04:35 +00:00
Po Yen Chen | 9f509a0f74 | Merge branch 'dev/testx_fa_cap_logits' into poyenc/batch-prefill-decode | 2025-04-21 08:36:44 +00:00
coderfeli | 9aa77620b8 | fix perf | 2025-04-21 05:43:57 +00:00
Po Yen Chen | 3b8662f198 | Re-format old CK source files | 2025-04-21 11:39:33 +08:00
Po Yen Chen | 09c5ad5241 | Sync logits soft-capping across pipelines | 2025-04-21 11:39:33 +08:00
Po Yen Chen | 2632baa05b | Support turn on/off logits_soft_cap in async pipeline | 2025-04-21 11:39:33 +08:00
Po Yen Chen | b3829c11b5 | Allow specifying logits_soft_cap through APIs | 2025-04-21 11:39:33 +08:00
Po Yen Chen | 124f47a5e8 | Re-format files | 2025-04-21 11:39:33 +08:00
Po Yen Chen | 519c93bd08 | Add batch prefill/decode kernels | 2025-04-21 02:52:02 +00:00
Po Yen Chen | 85becc24ee | Re-format old CK source files | 2025-04-20 08:00:15 +00:00
Po Yen Chen | 889b2d33fd | Sync logits soft-capping across pipelines | 2025-04-20 07:55:36 +00:00
Po Yen Chen | 0200227d89 | Support turn on/off logits_soft_cap in async pipeline | 2025-04-20 06:56:20 +00:00
Po Yen Chen | 87b22a7cff | Allow specifying logits_soft_cap through APIs | 2025-04-20 06:23:38 +00:00
Po Yen Chen | 4927305440 | Re-format files | 2025-04-20 04:28:58 +00:00
coderfeli | d2c653f177 | fix bug | 2025-04-18 10:45:53 +00:00
Bernard | 838ff7034d | hack for cap logits | 2025-04-18 08:38:25 +00:00
coderfeli | b20173a494 | fix bug | 2025-04-09 15:18:13 +00:00
coderfeli | 867a4e527c | fix bugs | 2025-04-04 13:40:10 +00:00
coderfeli | 491178276f | fix fp8 scale | 2025-04-03 11:10:37 +00:00
lalala-sh | 3037d8bac1 | mul int4 scale | 2025-04-03 18:06:12 +08:00
root | 20f6674bf6 | fix no quant case | 2025-04-03 02:46:01 +00:00
root | b2b34fffbb | fix fp8 16x16 | 2025-04-02 16:27:52 +00:00
root | 85f83330b5 | fuse moe activation | 2025-04-02 07:02:09 +00:00
coderfeli | 98cee8d02b | fix merge | 2025-03-18 05:45:04 +00:00
coderfeli | 5f49b91237 | merge develop | 2025-03-18 04:49:40 +00:00
aledudek | 5095906975 | Async grouped gemm v3 (#1940) | 2025-03-17 16:42:43 +01:00
  * Fully async grouped gemm
  * Remove commented code
  * Remove maybe_unused
  * host kernel args
  * Checkpoint segfault debugging...
  * Working part1
  * Working part2
  * Remove comments...
  * Use void ptr for gemm kernel host args
  * Fix device_grouped_gemm_multiple_d_dl build issue
  * Fix device_grouped_gemm_xdl build issue
Bartłomiej Kocot | c2e4898b4b | Grouped conv bwd data NGCHW (#1967) | 2025-03-17 13:32:00 +01:00
  * Grouped conv bwd data NGCHW
  * fixes
  * fix
  * Improvements
  * Fix
  * Fix
  * add client example
coderfeli | 7dbdff9f9f | moe sorting fix moebuf | 2025-03-17 06:20:57 +00:00
coderfeli | 5eaa36be18 | mork to support 13w tokens | 2025-03-17 01:45:34 +00:00
coderfeli | ef8c1333b9 | use uint32 | 2025-03-17 01:45:09 +00:00
coderfeli | 6c0e021235 | revert v1 test | 2025-03-17 01:39:57 +00:00
coderfeli | bccc5192cf | fix uint32 | 2025-03-17 01:18:32 +00:00
coderfeli | da2659d502 | input output all ok | 2025-03-15 14:26:30 +00:00
coderfeli | d1e999c05c | int64 index ok now | 2025-03-15 13:28:49 +00:00
coderfeli | f911cf7396 | impl int64 but result not correct | 2025-03-14 13:01:07 +00:00
coderfeli | d4925e1637 | fix output oob | 2025-03-14 03:19:26 +00:00
carlushuang | 3e81279d26 | Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" … (#1971) | 2025-03-13 11:41:39 +08:00
  * Reapply "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" (#1969)
  This reverts commit 8cbcd3e0d0.
  * fix codegen problem
  * Update config.hpp
  ---------
  Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
illsilin | f8464d2087 | fix clang format | 2025-03-12 20:21:14 -07:00
coderfeli | d85c034977 | fix2 | 2025-03-13 02:30:07 +00:00
coderfeli | 8b05fa935d | fix coredump in e2e test | 2025-03-13 02:12:18 +00:00
feli | 251afab3b7 | ck_moe: fix useless code and remove useless oob (#1972) | 2025-03-12 09:22:42 -07:00
  * fix useless code and remove useless oob
  * clang format
  ---------
  Co-authored-by: coderfeli <coderfeli@163.com>
Illia Silin | 4c97cc511e | use old intrinsics with staging compiler (#1970) | 2025-03-12 07:29:09 -07:00
feli | 2585c78940 | Merge branch 'develop' into ck_moe_rm_oob | 2025-03-12 16:05:59 +08:00
coderfeli | 40542296de | clang format | 2025-03-12 08:05:12 +00:00
Illia Silin | 8cbcd3e0d0 | Revert "[CK_TILE] support hdim=192/128 pair for deepseekv3 (#1961)" (#1969) | 2025-03-11 10:40:18 -07:00
  This reverts commit 7a93b16ff6.