Commit Graph

7855 Commits

Author SHA1 Message Date
Shangming Cai
a58c7f381e [PD] Fix clip logic when state indices lengths mismatch (#23323) 2026-04-21 13:22:20 +08:00
Yuhao Yang
5595f6e988 Fix trtllm mla chunked-prefill zero-length bug (#22291) (#22688) 2026-04-20 22:10:13 -07:00
Liangsheng Yin
6cc2eee50d [misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305) 2026-04-20 21:16:24 -07:00
Lewis
0d0405273b [Fix] Solve the error caused by _commit_transfer_to_req() when using IntraNode NVLink in PD disaggregation (#23252)
Co-authored-by: 百麒 <yaozhong.lyz@alibaba-inc.com>
2026-04-21 11:02:18 +08:00
Ke Bao
50fc2c9e23 Fix hybrid swa chunked prefill oom (#23174) 2026-04-21 10:46:45 +08:00
Zhangheng
ab3ce02de9 [Hybrid-Cache]: Refactor hybrid_pool_assembler.py (#23243) 2026-04-21 10:45:23 +08:00
ishandhanani
3c007ee5d4 fix(hicache): emit KV events for L2 host cache insertions (#22894)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: Ishan Dhanani <ishandhanani@gmail.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
2026-04-20 19:07:03 -07:00
ChangLiu0709
ac08ebed65 [AMD] Resolve Qwen3.5 MTP (speculative decoding) radix cache conflict. (#22908) 2026-04-20 18:17:11 -07:00
Liangsheng Yin
c7a4ebf3c8 [Refactor] Replace page_align_keys helper with RadixKey.page_aligned method (#23107) 2026-04-20 18:10:42 -07:00
Tarushii Goel
3e367f9bcd [sgl] fix incorrect behavior in cuda graph draft extend (#22832) 2026-04-20 16:29:16 -07:00
Tarushii Goel
100b0f86dd [sgl] add support for weight update function in specdec (#22088) 2026-04-20 16:26:20 -07:00
Tarushii Goel
28f3a2d8ed [sgl] multilayereagleworkerv2 fix (#22954) 2026-04-20 16:22:16 -07:00
Thomas Wang
57ecce9807 [AMD] Enable MTP for GLM-5-mxfp4 model (#23219) 2026-04-20 16:09:07 -07:00
jsheng_Linkedin
575fdc2c4c [CI][LoRA] Drop flaky all-None batch from multi-LoRA parity test (#23287) 2026-04-20 14:43:25 -07:00
shuwenn
b65799cf83 [SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2026-04-20 14:25:04 -07:00
shuwenn
dbcf7459b5 fix: reset empty prefill batch fullness (#23138) 2026-04-20 14:14:00 -07:00
Liangsheng Yin
8cb957ccff [Perf] Make EAGLE bigram key an O(1) view on RadixKey (#23106) 2026-04-20 12:01:11 -07:00
Shunkangz
3dc1491c95 Support moe_dp_size = 1 for various attention_cp_size (#22003)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-04-20 11:58:19 -07:00
Lee Nau
b4bb036b73 fix legacy deepep path for flashinfer_cutedsl (#22925) 2026-04-20 11:49:33 -07:00
ishandhanani
b5d9a86e4c fix: add back priority as radix cache policy (#23275) 2026-04-20 10:04:35 -07:00
Makcum888e
39c720d1b9 [Diffusion][NPU][CI] update perf numbers (#23056)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2026-04-20 19:34:11 +03:00
Mick
9a0fd2ff0c [diffusion] optimize: default to in-memory loading for URL/base64 image inputs (#23118) 2026-04-20 23:29:02 +08:00
Mick
0be6ab04dd [diffusion] refactor: LTX2.3 code cleanup (#23207) 2026-04-20 19:02:05 +08:00
Vladislav Nosivskoy
4028a73c10 [KV-Events] Fix KV events publishing for CP (#22983)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
2026-04-20 17:34:38 +08:00
Bingxu Chen
69eb95f20c [AMD] Pin peft<0.19 in pyproject_other.toml to fix ROCm CI ImportError (#23161)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-19 23:43:56 -07:00
Liangsheng Yin
a2d30d27fe wait for reap in kill_process_tree (#23213) 2026-04-19 23:36:33 -07:00
Bingxu Chen
ab936ce694 Revert "perf: optimize PCG inductor path for FP8 models (#21734)" (#23159)
Feel free to PR again.
2026-04-19 23:32:50 -07:00
Alex Nails
10e17cc55e [gRPC] Native gRPC server: proto + Rust crate scaffold + server args (#22736) 2026-04-20 12:39:35 +08:00
Baizhou Zhang
c304d0d64d [Refactor] Deduplicate NSA utils.py into cp_utils.py for context parallel (#22914)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 21:35:35 -07:00
Liangsheng Yin
eb76aaba88 [core] Always-on StreamingSession in UnifiedRadixCache (#23202) 2026-04-19 21:19:43 -07:00
Shunkangz
e389a52cc8 Support allreduce fusion with cp (#21249)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-04-19 21:06:00 -07:00
Liangsheng Yin
a7276b623e integrate streaming session into UnifiedRadixCache (#23145) 2026-04-19 20:47:41 -07:00
Byron Hsu
1cff871c67 [Bugfix] Fix DeepEP timeout when compiling DeepGeMM in EP+DP+TP (#23185)
Co-authored-by: Byron Hsu <byronhsu@Byrons-MacBook-Pro.local>
Co-authored-by: Cheng Wan <ch-wan@users.noreply.github.com>
2026-04-19 17:36:11 -07:00
Liangsheng Yin
d3ce664612 move session to python/sglang/srt/session (#23144) 2026-04-19 17:34:19 -07:00
Baidu-AIAK
7ca3566130 Multi platform Plugin (#21388)
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0183.tjzj.baidu.com>
Co-authored-by: Alex Nails <alex.nails@radixark.ai>
Co-authored-by: Alex Nails <alexj.nails@gmail.com>
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0000.tjzj.baidu.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-19 17:23:51 -07:00
inkcherry
4fa3482180 [Bugfix] Add missing http_worker_ipc in session error path (#22766) 2026-04-19 12:47:09 -07:00
billishyahao
b74a9dd854 [AMD] fix tbo runtime error when initializing metadata for cuda graph (#22598) 2026-04-19 12:42:48 -07:00
Baizhou Zhang
6ecd6f84db [CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-19 05:32:36 -07:00
Thomas Wang
03828f4205 [AMD] Reduce NSA indexer kernels (weights_proj, k-cache store kernel fusion) (#22850) 2026-04-19 00:18:11 -07:00
Kehan Li
2a327f0877 Fix Qwen3.5 video processing when passing video_data in "processor_output" format (#22431) 2026-04-19 00:04:01 +08:00
Xiaoyu Zhang
cd6ad80c00 diffusion: add HunyuanVideo GroupNorm+SiLU fast path (#22814) 2026-04-18 23:38:49 +08:00
Xiaoyu Zhang
c6a45fab64 Qwen3next flashinfer allreduce auto enable (#22664) 2026-04-18 22:32:41 +08:00
Yisheng Gong
4839cecbb0 [main] chore: add bias for base layer with lora (#22169) 2026-04-18 02:07:02 -07:00
Mick
0d94c3366a [diffusion] feat: introduce ltx-2-two-stage device manager (#22869) 2026-04-18 11:04:33 +08:00
Xiaoyu Zhang
615d6c93b2 [codex] Add flashinfer TRTLLM backend for diffusion NVFP4 (#22717) 2026-04-18 09:06:28 +08:00
Lianmin Zheng
9c47bbad13 Clean up bench_one_batch warning and simplify norm dispatch (#23110) 2026-04-17 17:42:20 -07:00
Cheng Wan
5f7aee726a refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 17:05:13 -07:00
R0CKSTAR
26ae7b8bd7 [MLX] Support radix cache (#21509)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-04-18 07:00:50 +08:00
Liangsheng Yin
09b689b407 Apply HF transformers patches from sglang init (#23103) 2026-04-17 15:37:51 -07:00
Liangsheng Yin
573e12a7fc Merge /get_load into /v1/loads (#23010) 2026-04-17 13:36:51 -07:00