Commit Graph

1808 Commits

Author SHA1 Message Date
fsx950223
7ce4f50da6 fix bugs 2025-06-03 07:17:56 +00:00
fsx950223
8e1dd4e7f9 store 2025-05-29 09:07:15 +00:00
fsx950223
4f823fd8f2 store 2025-05-28 11:15:38 +00:00
Po Yen Chen
50320955f2 Enable hdim=256 batch_decode instances 2025-05-28 02:43:31 +00:00
Po Yen Chen
6070090258 Mask out key values for each splits 2025-05-27 09:50:50 +00:00
Po Yen Chen
9512f78616 Support logits_soft_cap in batch_decode() 2025-04-21 23:33:52 +00:00
Po Yen Chen
fd84cf840a Fix batch_decode() codegen 2025-04-21 13:04:35 +00:00
Po Yen Chen
69a3811c30 Fix missing handler 2025-04-21 09:49:06 +00:00
Po Yen Chen
9f509a0f74 Merge branch 'dev/testx_fa_cap_logits' into poyenc/batch-prefill-decode 2025-04-21 08:36:44 +00:00
coderfeli
9aa77620b8 fix perf 2025-04-21 05:43:57 +00:00
Po Yen Chen
3b8662f198 Re-format old CK source files 2025-04-21 11:39:33 +08:00
Po Yen Chen
e67f5f3d09 Re-enable some hdim pipelines 2025-04-21 11:39:33 +08:00
Po Yen Chen
09c5ad5241 Sync logits soft-capping across pipelines 2025-04-21 11:39:33 +08:00
Po Yen Chen
e0113b1c9d Align receipt used in Aiter 2025-04-21 11:39:33 +08:00
Po Yen Chen
60ab71d780 Do not generate non-verified kernels 2025-04-21 11:39:33 +08:00
Po Yen Chen
2632baa05b Support turn on/off logits_soft_cap in async pipeline 2025-04-21 11:39:33 +08:00
Po Yen Chen
b3829c11b5 Allow specifying logits_soft_cap through APIs 2025-04-21 11:39:33 +08:00
Po Yen Chen
124f47a5e8 Re-format files 2025-04-21 11:39:33 +08:00
Po Yen Chen
519c93bd08 Add batch prefill/decode kernels 2025-04-21 02:52:02 +00:00
Po Yen Chen
85becc24ee Re-format old CK source files 2025-04-20 08:00:15 +00:00
Po Yen Chen
4f836f2f91 Re-enable some hdim pipelines 2025-04-20 07:57:47 +00:00
Po Yen Chen
889b2d33fd Sync logits soft-capping across pipelines 2025-04-20 07:55:36 +00:00
Po Yen Chen
e0e9040f88 Align receipt used in Aiter 2025-04-20 07:18:12 +00:00
Po Yen Chen
6fb55c7e46 Do not generate non-verified kernels 2025-04-20 07:17:35 +00:00
Po Yen Chen
0200227d89 Support turn on/off logits_soft_cap in async pipeline 2025-04-20 06:56:20 +00:00
Po Yen Chen
87b22a7cff Allow specifying logits_soft_cap through APIs 2025-04-20 06:23:38 +00:00
Po Yen Chen
4927305440 Re-format files 2025-04-20 04:28:58 +00:00
coderfeli
d2c653f177 fix bug 2025-04-18 10:45:53 +00:00
Bernard
838ff7034d hack for cap logits 2025-04-18 08:38:25 +00:00
coderfeli
b20173a494 fix bug 2025-04-09 15:18:13 +00:00
coderfeli
867a4e527c fix bugs 2025-04-04 13:40:10 +00:00
coderfeli
491178276f fix fp8 scale 2025-04-03 11:10:37 +00:00
lalala-sh
3037d8bac1 mul int4 scale 2025-04-03 18:06:12 +08:00
root
20f6674bf6 fix no quant case 2025-04-03 02:46:01 +00:00
root
b2b34fffbb fix fp8 16x16 2025-04-02 16:27:52 +00:00
root
85f83330b5 fuse moe activation 2025-04-02 07:02:09 +00:00
coderfeli
e285c77c5f fix build 2025-03-18 06:58:54 +00:00
coderfeli
1c90d50b5b update moe api fix aiter build 2025-03-18 05:59:24 +00:00
coderfeli
98cee8d02b fix merge 2025-03-18 05:45:04 +00:00
coderfeli
5f49b91237 merge develop 2025-03-18 04:49:40 +00:00
Illia Silin
1342ecf7fb Add a daily CI build on gfx908. (#1987)
* add one daily ci build on gfx908

* add redis invocation tag for gfx908

* make ci build for gfx908 conditional

* fix groovy logic

* add option to run perf tests for gfx908

* disable a few tests on mi100
2025-03-17 18:08:53 -07:00
Illia Silin
07f25186b2 disable ck_tile basic gemm (#1986) 2025-03-17 15:26:43 -07:00
aledudek
5095906975 Async grouped gemm v3 (#1940)
* Fully async grouped gemm

* Remove commented code

* Remove maybe_unused

* host kernel args

* Checkpoint segfault debugging...

* Working part1

* Working part2

* Remove comments...

* Use void ptr for gemm kernel host args

* Fix device_grouped_gemm_multiple_d_dl build issue

* Fix device_grouped_gemm_xdl build issue
2025-03-17 16:42:43 +01:00
Bartłomiej Kocot
c2e4898b4b Grouped conv bwd data NGCHW (#1967)
* Grouped conv bwd data NGCHW

* fixes

* fix

* Improvements

* Fix

* Fix

* add client example
2025-03-17 13:32:00 +01:00
coderfeli
7dbdff9f9f moe sorting fix moebuf 2025-03-17 06:20:57 +00:00
coderfeli
5eaa36be18 mork to support 13w tokens 2025-03-17 01:45:34 +00:00
coderfeli
ef8c1333b9 use uint32 2025-03-17 01:45:09 +00:00
coderfeli
6c0e021235 revert v1 test 2025-03-17 01:39:57 +00:00
coderfeli
bccc5192cf fix uint32 2025-03-17 01:18:32 +00:00
coderfeli
da2659d502 input output all ok 2025-03-15 14:26:30 +00:00