fsx950223
7ce4f50da6
fix bugs
2025-06-03 07:17:56 +00:00
fsx950223
8e1dd4e7f9
store
2025-05-29 09:07:15 +00:00
fsx950223
4f823fd8f2
store
2025-05-28 11:15:38 +00:00
Po Yen Chen
50320955f2
Enable hdim=256 batch_decode instances
2025-05-28 02:43:31 +00:00
Po Yen Chen
6070090258
Mask out key values for each split
2025-05-27 09:50:50 +00:00
Po Yen Chen
9512f78616
Support logits_soft_cap in batch_decode()
2025-04-21 23:33:52 +00:00
Po Yen Chen
fd84cf840a
Fix batch_decode() codegen
2025-04-21 13:04:35 +00:00
Po Yen Chen
69a3811c30
Fix missing handler
2025-04-21 09:49:06 +00:00
Po Yen Chen
9f509a0f74
Merge branch 'dev/testx_fa_cap_logits' into poyenc/batch-prefill-decode
2025-04-21 08:36:44 +00:00
coderfeli
9aa77620b8
fix perf
2025-04-21 05:43:57 +00:00
Po Yen Chen
3b8662f198
Re-format old CK source files
2025-04-21 11:39:33 +08:00
Po Yen Chen
e67f5f3d09
Re-enable some hdim pipelines
2025-04-21 11:39:33 +08:00
Po Yen Chen
09c5ad5241
Sync logits soft-capping across pipelines
2025-04-21 11:39:33 +08:00
Po Yen Chen
e0113b1c9d
Align receipt used in Aiter
2025-04-21 11:39:33 +08:00
Po Yen Chen
60ab71d780
Do not generate non-verified kernels
2025-04-21 11:39:33 +08:00
Po Yen Chen
2632baa05b
Support turn on/off logits_soft_cap in async pipeline
2025-04-21 11:39:33 +08:00
Po Yen Chen
b3829c11b5
Allow specifying logits_soft_cap through APIs
2025-04-21 11:39:33 +08:00
Po Yen Chen
124f47a5e8
Re-format files
2025-04-21 11:39:33 +08:00
Po Yen Chen
519c93bd08
Add batch prefill/decode kernels
2025-04-21 02:52:02 +00:00
Po Yen Chen
85becc24ee
Re-format old CK source files
2025-04-20 08:00:15 +00:00
Po Yen Chen
4f836f2f91
Re-enable some hdim pipelines
2025-04-20 07:57:47 +00:00
Po Yen Chen
889b2d33fd
Sync logits soft-capping across pipelines
2025-04-20 07:55:36 +00:00
Po Yen Chen
e0e9040f88
Align receipt used in Aiter
2025-04-20 07:18:12 +00:00
Po Yen Chen
6fb55c7e46
Do not generate non-verified kernels
2025-04-20 07:17:35 +00:00
Po Yen Chen
0200227d89
Support turn on/off logits_soft_cap in async pipeline
2025-04-20 06:56:20 +00:00
Po Yen Chen
87b22a7cff
Allow specifying logits_soft_cap through APIs
2025-04-20 06:23:38 +00:00
Po Yen Chen
4927305440
Re-format files
2025-04-20 04:28:58 +00:00
coderfeli
d2c653f177
fix bug
2025-04-18 10:45:53 +00:00
Bernard
838ff7034d
hack for cap logits
2025-04-18 08:38:25 +00:00
coderfeli
b20173a494
fix bug
2025-04-09 15:18:13 +00:00
coderfeli
867a4e527c
fix bugs
2025-04-04 13:40:10 +00:00
coderfeli
491178276f
fix fp8 scale
2025-04-03 11:10:37 +00:00
lalala-sh
3037d8bac1
mul int4 scale
2025-04-03 18:06:12 +08:00
root
20f6674bf6
fix no quant case
2025-04-03 02:46:01 +00:00
root
b2b34fffbb
fix fp8 16x16
2025-04-02 16:27:52 +00:00
root
85f83330b5
fuse moe activation
2025-04-02 07:02:09 +00:00
coderfeli
e285c77c5f
fix build
2025-03-18 06:58:54 +00:00
coderfeli
1c90d50b5b
update moe api fix aiter build
2025-03-18 05:59:24 +00:00
coderfeli
98cee8d02b
fix merge
2025-03-18 05:45:04 +00:00
coderfeli
5f49b91237
merge develop
2025-03-18 04:49:40 +00:00
Illia Silin
1342ecf7fb
Add a daily CI build on gfx908. (#1987)
* add one daily ci build on gfx908
* add redis invocation tag for gfx908
* make ci build for gfx908 conditional
* fix groovy logic
* add option to run perf tests for gfx908
* disable a few tests on mi100
2025-03-17 18:08:53 -07:00
Illia Silin
07f25186b2
disable ck_tile basic gemm (#1986)
2025-03-17 15:26:43 -07:00
aledudek
5095906975
Async grouped gemm v3 (#1940)
* Fully async grouped gemm
* Remove commented code
* Remove maybe_unused
* host kernel args
* Checkpoint segfault debugging...
* Working part1
* Working part2
* Remove comments...
* Use void ptr for gemm kernel host args
* Fix device_grouped_gemm_multiple_d_dl build issue
* Fix device_grouped_gemm_xdl build issue
2025-03-17 16:42:43 +01:00
Bartłomiej Kocot
c2e4898b4b
Grouped conv bwd data NGCHW (#1967)
* Grouped conv bwd data NGCHW
* fixes
* fix
* Improvements
* Fix
* Fix
* add client example
2025-03-17 13:32:00 +01:00
coderfeli
7dbdff9f9f
moe sorting fix moebuf
2025-03-17 06:20:57 +00:00
coderfeli
5eaa36be18
mork to support 13w tokens
2025-03-17 01:45:34 +00:00
coderfeli
ef8c1333b9
use uint32
2025-03-17 01:45:09 +00:00
coderfeli
6c0e021235
revert v1 test
2025-03-17 01:39:57 +00:00
coderfeli
bccc5192cf
fix uint32
2025-03-17 01:18:32 +00:00
coderfeli
da2659d502
input output all ok
2025-03-15 14:26:30 +00:00