Yi Zhang
|
21a8fa16ea
|
tiny optimize for bench serving (#12553)
|
2025-11-03 14:13:18 -08:00 |
|
Lianmin Zheng
|
7a21d8b276
|
Reduce the overhead of nccl symmetric memory (#12524)
Co-authored-by: Nicolas Castet <ncastet@nvidia.com>
|
2025-11-03 11:56:27 -08:00 |
|
Jonah Bernard
|
6ef23b9833
|
[Test] Add parameters to SRTRunner (#12227)
|
2025-11-03 11:20:56 -08:00 |
|
fzyzcjy
|
385599cb04
|
Fix error when calling quantization (#12548)
|
2025-11-03 10:17:43 -08:00 |
|
Yueyang Pan
|
952fbe47cb
|
fix: fix the bug which leads qwen2_5_vl to crash with mixed_chunk (#11330)
Signed-off-by: PanJason <pyyjason@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Yuan Luo <yuan.luo@hotmail.com>
|
2025-11-03 09:26:03 -08:00 |
|
Liangsheng Yin
|
edb2569356
|
[hot-fix] Fix broken CI (#12564)
|
2025-11-04 00:03:25 +08:00 |
|
Liangsheng Yin
|
3529c061bb
|
[spec v2] Fix output repetition by speculative sampling error (#12561)
|
2025-11-03 23:00:17 +08:00 |
|
harrisonlimh
|
ffb32a8548
|
Conditionally recapture cuda graph after model weight update from disk (#12060)
|
2025-11-03 05:51:27 -08:00 |
|
Atream
|
14d8064803
|
fix: Fix KTransformers hybrid inference with int8 quantization and format (#12536)
|
2025-11-03 04:59:39 -08:00 |
|
yinghui
|
de0b10cf5c
|
fix: move dummy format loader check before quantization checks (#12532)
|
2025-11-02 23:41:30 -08:00 |
|
Baizhou Zhang
|
6e29446e45
|
[hotfix] Remove flashinfer-jit-cache from pyproject (#12530)
|
2025-11-02 22:11:05 -08:00 |
|
Yineng Zhang
|
0c3543d7d5
|
chore: upgrade flashinfer 0.5.0 (#12523)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-11-02 20:54:12 -08:00 |
|
Haian Huang(深度眸)
|
65f1d065c5
|
[Bug] Fix Intern-S1 model accuracy and support /generate interface with input_ids (#12367)
|
2025-11-02 20:22:33 -08:00 |
|
Johnsonms
|
9434a0e50f
|
[Refact] Remove hardcoded KV cache dimension in MLATokenToKVPool (#12502)
|
2025-11-02 19:49:53 -08:00 |
|
Lianmin Zheng
|
20315697f4
|
move all get_stream in sgl_kernel to c++ to reduce the launch overhead (#12521)
|
2025-11-02 13:15:05 -08:00 |
|
fzyzcjy
|
c9db79117f
|
Super tiny fix naming in bench serving scripts (#12515)
|
2025-11-02 12:43:10 -08:00 |
|
Hanming Lu
|
66fb9b1307
|
[ServerArgs] allow --mamba-ssm-dtype extend (#12481)
|
2025-11-02 11:50:04 -08:00 |
|
Yuan Luo
|
819fc59123
|
Add prefix for torch symm mem (#12506)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-11-02 11:23:05 -08:00 |
|
kousakawang
|
7efd8b3d1f
|
[FEAT] Shared mem pool based cuda ipc for multi-modal data transport (#11917)
Co-authored-by: kousakawang <wanghanpei@bytedance.com>
Co-authored-by: Yuan Luo <4908075+yuan-luo@users.noreply.github.com>
|
2025-11-02 16:46:37 +08:00 |
|
Ho-Ren (Jack) Chuang
|
76196b3cbf
|
feat: Add FP4 (E2M1) KV Cache Support with Quantization Utilities for MLA (#10078)
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
Co-authored-by: Yichen Wang <yichen.wang@bytedance.com>
|
2025-11-01 22:24:58 -07:00 |
|
Binyao Jiang
|
3451fc3280
|
[Feature] Qwen3-Next & FLA: Support MTP topk>1; Up to 6% faster (#11133)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
|
2025-11-01 19:47:56 -07:00 |
|
Zhihao Lyu
|
c550ab9125
|
[Ascend] Add Ascend NPU support for sglang.check_env & rework proposal (#11052)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2025-11-01 19:26:45 -07:00 |
|
Xun Sun
|
0afd68321b
|
Update Mooncake EP's a2a interface (#12391)
|
2025-11-01 18:48:47 -07:00 |
|
Johnsonms
|
6f858930c8
|
[Bug] test_flashattn_mla_backend errors in Hopper #12487 (#12488)
|
2025-11-01 18:28:06 -07:00 |
|
hzh0425
|
6b634493c3
|
[HICache / PD]: Support offloading incremental KV cache in decode side. (#11966)
|
2025-11-01 14:59:37 -07:00 |
|
Xinyuan Tong
|
d2a8f71c2f
|
[feat] Add SGLANG_TOOL_STRICT_LEVEL for tool-call behavior control (#12423)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-11-01 13:15:02 -07:00 |
|
Ke Bao
|
69193f7122
|
Filter tokenizer warning for kimi models (#12485)
|
2025-11-01 16:27:31 +08:00 |
|
yinghui
|
d5b6e50fe8
|
perf: trtllm mla performance minor improvements (#12435)
|
2025-10-31 22:48:02 -07:00 |
|
Liangsheng Yin
|
9632e48f5d
|
[hot fix] Remove from python.sglang.xxx (#12483)
|
2025-11-01 11:00:05 +08:00 |
|
Qiaolin Yu
|
59cce5941a
|
Use sgl fp4 quant kernel by default (#12482)
|
2025-10-31 19:51:28 -07:00 |
|
Surya-Gunukula
|
795e98f8a6
|
Forward unknown tool calls instead of dropping (#12226)
|
2025-11-01 02:10:35 +00:00 |
|
Shangming Cai
|
358ae3563d
|
Tiny fix eos handling for PD disaggregation (#12334)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-31 17:57:10 -07:00 |
|
sglang-bot
|
41c10e67fc
|
chore: bump SGLang version to 0.5.4.post2 (#12439)
|
2025-10-31 17:38:50 -07:00 |
|
Xinyuan Tong
|
0bfe1d145c
|
fa3 & trtllm_mha spec overlap (#11874)
|
2025-10-31 17:38:13 -07:00 |
|
Ke Bao
|
a4bf5c6ad2
|
Support Kimi Linear (#12469)
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2025-10-31 14:03:35 -07:00 |
|
fzyzcjy
|
30ad107028
|
Try to allow NCCL cumem for multi node nvlink case (#11987)
|
2025-10-31 12:48:25 -07:00 |
|
Ke Bao
|
f7f9e41b36
|
Fix run benchmark (#12473)
|
2025-11-01 02:39:48 +08:00 |
|
ishandhanani
|
263eab9f5d
|
fix: dummy health check server not accessible on non-zero rank nodes (#12297)
|
2025-10-31 11:34:57 -07:00 |
|
fzyzcjy
|
25257d8e00
|
Tiny assert no running requests when releasing memory to avoid IMA (#12341)
|
2025-11-01 01:28:53 +08:00 |
|
daniel, chen
|
cf0c24150a
|
add served model name in bench serving (#12428)
|
2025-11-01 01:28:11 +08:00 |
|
huangtingwei
|
5538e05cb1
|
fix default env var for mooncake store (#12429)
|
2025-11-01 01:25:33 +08:00 |
|
Yuan Luo
|
c30ebb9300
|
[VLM] Optimize async mm data process mechanism (#12066)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-11-01 01:24:53 +08:00 |
|
ykcombat
|
41efcaeb45
|
[Feature] PD-Multiplexing Context and Scheduler, lazy import spatial. (#12275)
|
2025-11-01 00:40:01 +08:00 |
|
0xNullPath
|
70562969b9
|
[Bug] OOM (Out-of-Memory) errors for extreme testing scenarios (min_tokens=2) (#11757)
Signed-off-by: Yan Lu <luyan@nvidia.com>
|
2025-11-01 00:28:41 +08:00 |
|
Ke Bao
|
0095e01874
|
Fix lint in deepseek-ocr (#12470)
|
2025-11-01 00:08:19 +08:00 |
|
Xinyuan Tong
|
684864814b
|
Feat: deepseek-ocr logits processor (#12415)
Co-authored-by: xinyuant <xinyuant@usc.edu>
|
2025-10-31 23:35:22 +08:00 |
|
sjtu_shenhai
|
410225b719
|
[Bug fix] Fix severe memory waste issue with torch.empty pin_memory (#12266)
|
2025-10-31 21:30:37 +08:00 |
|
Liangsheng Yin
|
2c9aebea70
|
Simplify watchdog (#12463)
|
2025-10-31 21:17:38 +08:00 |
|
Kindyaa
|
bc741073a3
|
fix:watchdog thread exception (#12328)
|
2025-10-31 20:54:50 +08:00 |
|
Yuhong Guo
|
2f6af1a3de
|
Enable bailing_moe to support TP=16 (#12369)
|
2025-10-31 19:32:49 +08:00 |
|