Shangming Cai
|
a58c7f381e
|
[PD] Fix clip logic when state indices lens are mismatched (#23323)
|
2026-04-21 13:22:20 +08:00 |
|
Yuhao Yang
|
5595f6e988
|
Fix trtllm mla chunked-prefill zero-length bug (#22291) (#22688)
|
2026-04-20 22:10:13 -07:00 |
|
Liangsheng Yin
|
6cc2eee50d
|
[misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305)
|
2026-04-20 21:16:24 -07:00 |
|
Lewis
|
0d0405273b
|
[Fix] Solve the error caused by _commit_transfer_to_req() when using IntraNode NVLink in PD disaggregation (#23252)
Co-authored-by: 百麒 <yaozhong.lyz@alibaba-inc.com>
|
2026-04-21 11:02:18 +08:00 |
|
Ke Bao
|
50fc2c9e23
|
Fix hybrid swa chunked prefill oom (#23174)
|
2026-04-21 10:46:45 +08:00 |
|
Zhangheng
|
ab3ce02de9
|
[Hybrid-Cache]: Refactor hybrid_pool_assembler.py (#23243)
|
2026-04-21 10:45:23 +08:00 |
|
ishandhanani
|
3c007ee5d4
|
fix(hicache): emit KV events for L2 host cache insertions (#22894)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: Ishan Dhanani <ishandhanani@gmail.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
|
2026-04-20 19:07:03 -07:00 |
|
ChangLiu0709
|
ac08ebed65
|
[AMD] Resolve Qwen3.5 MTP (speculative decoding) radix cache conflict. (#22908)
|
2026-04-20 18:17:11 -07:00 |
|
Liangsheng Yin
|
c7a4ebf3c8
|
[Refactor] Replace page_align_keys helper with RadixKey.page_aligned method (#23107)
|
2026-04-20 18:10:42 -07:00 |
|
Tarushii Goel
|
3e367f9bcd
|
[sgl] fix incorrect behavior in cuda graph draft extend (#22832)
|
2026-04-20 16:29:16 -07:00 |
|
Tarushii Goel
|
100b0f86dd
|
[sgl] add support for weight update function in spedec (#22088)
|
2026-04-20 16:26:20 -07:00 |
|
Tarushii Goel
|
28f3a2d8ed
|
[sgl] multilayereagleworkerv2 fix (#22954)
|
2026-04-20 16:22:16 -07:00 |
|
Thomas Wang
|
57ecce9807
|
[AMD] Enable MTP for GLM-5-mxfp4 model (#23219)
|
2026-04-20 16:09:07 -07:00 |
|
jsheng_Linkedin
|
575fdc2c4c
|
[CI][LoRA] Drop flaky all-None batch from multi-LoRA parity test (#23287)
|
2026-04-20 14:43:25 -07:00 |
|
shuwenn
|
b65799cf83
|
[SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
|
2026-04-20 14:25:04 -07:00 |
|
shuwenn
|
dbcf7459b5
|
fix: reset empty prefill batch fullness (#23138)
|
2026-04-20 14:14:00 -07:00 |
|
Liangsheng Yin
|
8cb957ccff
|
[Perf] Make EAGLE bigram key an O(1) view on RadixKey (#23106)
|
2026-04-20 12:01:11 -07:00 |
|
Shunkangz
|
3dc1491c95
|
Support moe_dp_size = 1 for various attention_cp_size (#22003)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-04-20 11:58:19 -07:00 |
|
Lee Nau
|
b4bb036b73
|
fix legacy deepep path for flashinfer_cutedsl (#22925)
|
2026-04-20 11:49:33 -07:00 |
|
ishandhanani
|
b5d9a86e4c
|
fix: add back priority as radix cache policy (#23275)
|
2026-04-20 10:04:35 -07:00 |
|
Makcum888e
|
39c720d1b9
|
[Diffusion][NPU][CI] update perf numbers (#23056)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-04-20 19:34:11 +03:00 |
|
Mick
|
9a0fd2ff0c
|
[diffusion] optimize: default to in-memory loading for URL/base64 image inputs (#23118)
|
2026-04-20 23:29:02 +08:00 |
|
Mick
|
0be6ab04dd
|
[diffusion] refactor: LTX2.3 code cleanup (#23207)
|
2026-04-20 19:02:05 +08:00 |
|
Vladislav Nosivskoy
|
4028a73c10
|
[KV-Events] Fix kv events publishing for CP (#22983)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-04-20 17:34:38 +08:00 |
|
Bingxu Chen
|
69eb95f20c
|
[AMD] Pin peft<0.19 in pyproject_other.toml to fix ROCm CI ImportError (#23161)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-19 23:43:56 -07:00 |
|
Liangsheng Yin
|
a2d30d27fe
|
wait for reap in kill_process_tree (#23213)
|
2026-04-19 23:36:33 -07:00 |
|
Bingxu Chen
|
ab936ce694
|
Revert "perf: optimize PCG inductor path for FP8 models (#21734)" (#23159)
Feel free to PR again.
|
2026-04-19 23:32:50 -07:00 |
|
Alex Nails
|
10e17cc55e
|
[gRPC] Native gRPC server: proto + Rust crate scaffold + server args (#22736)
|
2026-04-20 12:39:35 +08:00 |
|
Baizhou Zhang
|
c304d0d64d
|
[Refactor] Deduplicate NSA utils.py into cp_utils.py for context parallel (#22914)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-19 21:35:35 -07:00 |
|
Liangsheng Yin
|
eb76aaba88
|
[core] Always-on StreamingSession in UnifiedRadixCache (#23202)
|
2026-04-19 21:19:43 -07:00 |
|
Shunkangz
|
e389a52cc8
|
Support allreduce fusion with cp (#21249)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-04-19 21:06:00 -07:00 |
|
Liangsheng Yin
|
a7276b623e
|
integrate streaming session into UnifiedRadixCache (#23145)
|
2026-04-19 20:47:41 -07:00 |
|
Byron Hsu
|
1cff871c67
|
[Bugfix] Fix DeepEP timeout when compiling DeepGeMM in EP+DP+TP (#23185)
Co-authored-by: Byron Hsu <byronhsu@Byrons-MacBook-Pro.local>
Co-authored-by: Cheng Wan <ch-wan@users.noreply.github.com>
|
2026-04-19 17:36:11 -07:00 |
|
Liangsheng Yin
|
d3ce664612
|
move session to python/sglang/srt/session (#23144)
|
2026-04-19 17:34:19 -07:00 |
|
Baidu-AIAK
|
7ca3566130
|
Multi platform Plugin (#21388)
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0183.tjzj.baidu.com>
Co-authored-by: Alex Nails <alex.nails@radixark.ai>
Co-authored-by: Alex Nails <alexj.nails@gmail.com>
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0000.tjzj.baidu.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-19 17:23:51 -07:00 |
|
inkcherry
|
4fa3482180
|
[Bugfix] Add missing http_worker_ipc in session error path (#22766)
|
2026-04-19 12:47:09 -07:00 |
|
billishyahao
|
b74a9dd854
|
[AMD] fix tbo runtime error when initializing metadata for cuda graph (#22598)
|
2026-04-19 12:42:48 -07:00 |
|
Baizhou Zhang
|
6ecd6f84db
|
[CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-19 05:32:36 -07:00 |
|
Thomas Wang
|
03828f4205
|
[AMD] Reduce NSA indexer kernels (weights_proj, k-cache store kernel fusion) (#22850)
|
2026-04-19 00:18:11 -07:00 |
|
Kehan Li
|
2a327f0877
|
Fix Qwen3.5 video processing when passing video_data in "processor_output" format (#22431)
|
2026-04-19 00:04:01 +08:00 |
|
Xiaoyu Zhang
|
cd6ad80c00
|
diffusion: add HunyuanVideo GroupNorm+SiLU fast path (#22814)
|
2026-04-18 23:38:49 +08:00 |
|
Xiaoyu Zhang
|
c6a45fab64
|
Qwen3next flashinfer allreduce auto enable (#22664)
|
2026-04-18 22:32:41 +08:00 |
|
Yisheng Gong
|
4839cecbb0
|
[main] chore: add bias for base layer with lora (#22169)
|
2026-04-18 02:07:02 -07:00 |
|
Mick
|
0d94c3366a
|
[diffusion] feat: introduce ltx-2-two-stage device manager (#22869)
|
2026-04-18 11:04:33 +08:00 |
|
Xiaoyu Zhang
|
615d6c93b2
|
[codex] Add flashinfer TRTLLM backend for diffusion NVFP4 (#22717)
|
2026-04-18 09:06:28 +08:00 |
|
Lianmin Zheng
|
9c47bbad13
|
Clean up bench_one_batch warning and simplify norm dispatch (#23110)
|
2026-04-17 17:42:20 -07:00 |
|
Cheng Wan
|
5f7aee726a
|
refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-17 17:05:13 -07:00 |
|
R0CKSTAR
|
26ae7b8bd7
|
[MLX] Support radix cache (#21509)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
|
2026-04-18 07:00:50 +08:00 |
|
Liangsheng Yin
|
09b689b407
|
Apply HF transformers patches from sglang init (#23103)
|
2026-04-17 15:37:51 -07:00 |
|
Liangsheng Yin
|
573e12a7fc
|
Merge /get_load into /v1/loads (#23010)
|
2026-04-17 13:36:51 -07:00 |
|