cctry
|
22cf7d2b42
|
[Fix] Handle nixlRemoteDisconnectError in NixlKVSender (#24296)
|
2026-05-05 17:23:42 -07:00 |
|
Lianmin Zheng
|
64f80eabbe
|
Register aten::rms_norm and aten::mm.dtype in batch invariant mode (#24459)
|
2026-05-05 17:21:34 -07:00 |
|
Lianmin Zheng
|
46bde1f426
|
Add fwd_occupancy metric to SchedulerStats and Prometheus collector (#24458)
|
2026-05-05 17:04:34 -07:00 |
|
Xinyuan Tong
|
1e404afec2
|
fix(req_pool): bump pool.size to match actual tensor row count after #24243 (#24439)
|
2026-05-05 16:58:26 -07:00 |
|
Lianmin Zheng
|
710fed10fb
|
Revert "[fix] /pause_generation and /continue_generation wrong for --tokenizer-worker-num > 1" (#24461)
|
2026-05-05 16:44:34 -07:00 |
|
maocheng23
|
431ca54334
|
[fix] /pause_generation and /continue_generation wrong for --tokenizer-worker-num > 1 (#24445)
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-05 16:27:58 -07:00 |
|
Liangsheng Yin
|
08d4c2072b
|
move topk capturers to srt/state_capturer/ (#24450)
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: Ziang Li <ziangli@umich.edu>
|
2026-05-05 15:54:01 -07:00 |
|
Liangsheng Yin
|
47a416fc62
|
add indexer-topk capture (V3.2 NSA + infra) (#24392)
|
2026-05-05 15:05:15 -07:00 |
|
Liangsheng Yin
|
c4c0376fcb
|
consolidate routed-experts capturer onto reusable base (#24403)
|
2026-05-05 12:41:49 -07:00 |
|
Mick
|
d23ef408f7
|
[diffusion] fix: fix RowParallel LoRA merged forwarding (#24410)
|
2026-05-06 00:30:16 +08:00 |
|
Mick
|
cc54d8e8d0
|
[diffusion] chore: clean CUDA cache only at explicit release points (#24397)
|
2026-05-05 22:30:43 +08:00 |
|
Polisetty V R K Jyothendra Varma
|
fdfc46f3a5
|
[Intel GPU] Enable DeepSeek V3.2 inference on XPU (#24356)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
|
2026-05-05 20:47:40 +08:00 |
|
Zhangheng
|
e299ec1bff
|
[UnifiedRadixTree]: Fix flaky ci (#24421)
|
2026-05-05 20:22:19 +08:00 |
|
Khoa Pham
|
d22853480d
|
Fix deterministic inference on models with SWAKVPool (#24395)
|
2026-05-05 20:20:46 +08:00 |
|
Bi Xue
|
9fb9a1cca6
|
[sgl] expose swa and mamba cache metrics (#24396)
|
2026-05-05 20:19:50 +08:00 |
|
Xiaoyu Zhang
|
67e8bd7a80
|
[codex] Optimize Helios fused norm modulation (#24059)
|
2026-05-05 19:28:37 +08:00 |
|
Xiaoyu Zhang
|
8c703f215e
|
Add HunyuanVideo ModelOpt FP8 diffusion support (#23199)
|
2026-05-05 19:27:28 +08:00 |
|
billishyahao
|
80ccb6b93c
|
[AMD] fix tbo specv2 seq_lens_cpu NoneType error (#24319)
|
2026-05-05 01:54:43 -07:00 |
|
Mick
|
177babcc38
|
[diffusion] optimize: fuse LTX2 split rotary embedding (#24411)
|
2026-05-05 16:07:40 +08:00 |
|
Hubert Lu
|
c2db19ffa4
|
[AMD] Enable EAGLE speculative decoding for Qwen3.5 FP8 and MXFP4 models with aiter's unified attention (#23146)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: sogalin <39478626+sogalin@users.noreply.github.com>
|
2026-05-05 00:09:40 -07:00 |
|
Mick
|
04926e1d9f
|
[diffusion] feat: cache encoder results for default negative prompt (#24304)
|
2026-05-05 11:56:01 +08:00 |
|
Mick
|
e483e60b72
|
[diffusion] CI: pin diffusion consistency GT revision (#24400)
|
2026-05-05 11:53:22 +08:00 |
|
Ethan (Yusheng) Su
|
2b769d37a4
|
(2/n - prefill optimize)perf(lora): remove GPU-CPU sync barrier (.item()) in MoE LoRA path and remove duplicate code (#24246)
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-04 18:11:28 -07:00 |
|
Mick
|
2f7d99b7f7
|
[diffusion] cli: support component attention backend overrides (#24320)
|
2026-05-05 08:39:27 +08:00 |
|
Xiaoyu Zhang
|
078f84d80d
|
[SKILL] Add diffusion benchmark presets for edit and Hunyuan3D models (#24288)
Co-authored-by: BBuf Codex <bbuf-codex@users.noreply.github.com>
|
2026-05-05 08:18:12 +08:00 |
|
Ji Zeng
|
4b487ca98b
|
[Fix] NGRAMWorker.update_weights_from_tensor — delegate to target worker (#24344)
|
2026-05-04 16:23:17 -07:00 |
|
Liangsheng Yin
|
6a62eabed6
|
consolidate NSA pool construction (#24389)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2026-05-04 16:04:31 -07:00 |
|
Tarushii Goel
|
d7c93e183b
|
[sgl] reduce specdec cpu overhead (#23321)
|
2026-05-04 15:02:03 -07:00 |
|
Liangsheng Yin
|
4743cf6051
|
misc: add marlin to moe runner choices; drop dead env var doc (#24384)
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
|
2026-05-04 15:01:47 -07:00 |
|
Lianmin Zheng
|
29dd3a36c0
|
Refactor device timer installation and rename prefill prealloc to bootstrap (#24341)
|
2026-05-04 13:57:13 -07:00 |
|
Vladislav Nosivskoy
|
60a1dacd89
|
[HiCache] return cached_tokens_details in sglext for streaming responses (#22055)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-05-04 12:30:17 -07:00 |
|
Ke Bao
|
6dd7aebb36
|
Minor scheduler fixes (#24359)
|
2026-05-05 02:01:23 +08:00 |
|
Sam Shleifer
|
e6f252e9b8
|
Cache FlashInfer autotune configs (#24156)
|
2026-05-05 02:00:40 +08:00 |
|
Yuan Luo
|
e5c58eb9d6
|
[VLM] Optimize Gemma4 VLM with PCG and fuse RMSNorm + residual add + scalar (#24048)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-05-04 09:36:26 -07:00 |
|
Mick
|
1be3163011
|
[diffusion] fix: use direct all-to-all for USP collectives (#24366)
|
2026-05-05 00:08:48 +08:00 |
|
Xiaoyu Zhang
|
4b6d44641b
|
[diffusion] chore: enable channels-last 3D VAE convs by default (#23200)
|
2026-05-04 22:59:31 +08:00 |
|
Zhangheng
|
05aed5e1d5
|
[UnifiedRadixTree]: Add KL accuracy CI for UnifiedTree with HiCache (#24346)
|
2026-05-04 20:18:10 +08:00 |
|
Liangsheng Yin
|
84f3b44916
|
[tiny] misc cleanups across configs, attention, jit_kernel (#24350)
|
2026-05-04 03:17:14 -07:00 |
|
Linzhang Li
|
952b3caf18
|
feat: use structural tags to enable strict tool calling and reasoning for more models (#21722)
Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Ubospica <ubospica@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-05-04 02:30:28 -07:00 |
|
Khoa Pham
|
ef2b1b6d89
|
Fix flashinfer workspace OOM (#24172)
|
2026-05-04 01:26:35 -07:00 |
|
Ke Bao
|
aea527afdc
|
Fix swa chunk req deferred (#24318)
|
2026-05-04 14:52:15 +08:00 |
|
Liangsheng Yin
|
a91ae6af9e
|
nextn subclass owns post_load_weights is_nextn (#24333)
|
2026-05-03 22:04:44 -07:00 |
|
Liangsheng Yin
|
1dd8f6d5ae
|
dedup state_kv_args setup into helper (#24340)
|
2026-05-03 20:45:26 -07:00 |
|
Liangsheng Yin
|
91fa2340ed
|
extract adjust_hybrid_swa_layers_for_pp (#24334)
|
2026-05-03 18:52:54 -07:00 |
|
Ethan (Yusheng) Su
|
b7fefc0e85
|
feat(lora): enable csgmv backend with virtual experts for MoE LoRA (#24007)
|
2026-05-03 18:44:17 -07:00 |
|
Mick
|
c611a3fb78
|
[diffusion] chore: disable VAE cpu offload by default (#24315)
|
2026-05-04 08:24:51 +08:00 |
|
Liangsheng Yin
|
00d620b77d
|
introduce arg_groups/ with nemotron_h hook (#24328)
|
2026-05-03 16:28:11 -07:00 |
|
Liangsheng Yin
|
c3b6d20a80
|
Register deepseek_v32 alias instead of rewriting config.json (#24295)
|
2026-05-03 16:02:17 -07:00 |
|
Zhangheng
|
9a5450ad73
|
[PD]: Support incremental transfer for mooncake transfer engine (#24257)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-05-04 00:57:59 +08:00 |
|
Chi McIsaac
|
62265ca7fc
|
[diffusion] feat: initial support for dynamic batching (#18764)
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
Co-authored-by: Junhao Liu <junhaoliu2023@gmail.com>
|
2026-05-04 00:44:42 +08:00 |
|