Liangsheng Yin | 95fb722dd2 | Add registry for custom speculative algorithms (#23991) | 2026-05-07 16:11:45 -07:00 |
Revanth Reddy Airre | c2c57068da | fix(http): apply SGLANG_TIMEOUT_KEEP_ALIVE in common.py (#24323); Signed-off-by: Revanth Reddy Airre <revanthreddy@hippocraticai.com> | 2026-05-07 16:01:41 -07:00 |
Xinyuan Tong | 5b589ed2e7 | feat(constrained): two-phase reasoning grammar + --enable-strict-thinking (#23953) | 2026-05-07 14:21:51 -07:00 |
Xinyuan Tong | af2a2ac618 | fix(function_call): handle Kimi-K2.5 bare numeric tool call IDs (#23950) | 2026-05-07 14:20:02 -07:00 |
Xinyuan Tong | d8f9d32a05 | feat(reasoning): auto-detect reasoning/tool-call parser from chat template (#23952) | 2026-05-07 14:19:16 -07:00 |
Khoa Pham | d2c1034163 | [Gemma 4] Adding MTP support (#24436); Co-authored-by: Pengyu Chen <pychen96@gmail.com> | 2026-05-07 14:08:41 -07:00 |
Xinyuan Tong | f1395af543 | fix(openai): map reasoning.enabled to thinking AND enable_thinking (#23951) | 2026-05-07 14:01:35 -07:00 |
R0CKSTAR | 9cffa5ed6f | [MUSA] Bump torchada to 0.1.54 (#24592); Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> | 2026-05-07 11:45:49 -07:00 |
GXIN | 90a618e37b | [NPU][diffusion] add selectable parallel VAE decode strategies (#23248); Co-authored-by: 高鑫 <gaoxin@gaoxindeMacBook-Pro.local>; Co-authored-by: ronnie_zheng <zl19940307@163.com>; Co-authored-by: Cursor <cursoragent@cursor.com> | 2026-05-07 21:37:03 +03:00 |
Junlin Wu | 80a6014243 | ✨ [diffusion][npu][quant] Add MXFP8 quantization support for Wan2.2 Diffusion on Ascend NPU (#20922); Co-authored-by: ronnie_zheng <zl19940307@163.com>; Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> | 2026-05-07 21:30:56 +03:00 |
McZyWu | 7d397ad23d | [NPU]Support model Trinity-mini for Npu, accuracy 90% (#18172); Co-authored-by: sglang-npu-bot <sglangnpu@163.com> | 2026-05-07 20:58:18 +03:00 |
Mick | b0225a69dc | [diffusion] optimize: precompute LTX2 guidance perturbation states (#24494) | 2026-05-08 01:18:42 +08:00 |
ykcai-daniel | 9c41b1058f | [diffusion] refactor: refactor cfg parallelism framework to support multi-branch CFG for LTX2 (#23736); Co-authored-by: Mick <mickjagger19@icloud.com> | 2026-05-07 22:56:55 +08:00 |
Vladimir Serov | 263cb3b222 | [LoRA] Torch Native enhancement: embedding and graph optimization (#21885); Co-authored-by: ronnie_zheng <zl19940307@163.com> | 2026-05-07 17:28:38 +03:00 |
ovidiusm | 811d138c8a | Nixl async transfer (#23967); Signed-off-by: Ovidiu Mara <ovidium@nvidia.com> | 2026-05-07 22:05:43 +08:00 |
Yuxuan Zhang | ec4560304b | [Bug Fix] Preserve decode state across retract-resume of GLM-5.1 (#23346); Co-authored-by: Shangming Cai <csmthu@gmail.com> | 2026-05-07 21:37:53 +08:00 |
Shangming Cai | e264b5785d | [PD] Centralize per-room cleanup in common backend (#24601); Signed-off-by: Shangming Cai <csmthu@gmail.com> | 2026-05-07 18:47:55 +08:00 |
inkcherry | 3b2c730320 | [AMD] Enable dual-stream MoE on ROCm (#24005); Signed-off-by: inkcherry <mingzhi.liu@amd.com> | 2026-05-07 02:27:24 -07:00 |
Hanming Lu | 92f281f856 | [Spec][trtllm] use decode kernel for draft extend (#24566) | 2026-05-07 02:25:26 -07:00 |
weireweire | 684638e053 | Fix prefill batch iter logging under overlap (#20845) | 2026-05-07 02:10:42 -07:00 |
Polisetty V R K Jyothendra Varma | 9dfb1d2ebe | [Intel GPU] Fix flash_mla_get_workspace_size call in intel_xpu (#24372); Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com> | 2026-05-07 13:45:32 +08:00 |
Kangyan-Zhou | a2586f1c53 | [CI] pin NeMo-Skills install to known-good SHA in accuracy_test_runner (#24581); Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> | 2026-05-06 22:24:48 -07:00 |
Yanbin Jiang | f0368a6666 | [LoRA] Use deterministic lora_id for --lora-paths so multi-node ranks agree (#24555); Co-authored-by: gh1595 <278903827+gh1595@users.noreply.github.com> | 2026-05-06 22:20:15 -07:00 |
Jun Liu | 65ce9965ce | [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (#22715); Co-authored-by: Alex Nails <alex.nails@radixark.ai> | 2026-05-06 19:20:04 -07:00 |
Baizhou Zhang | ecb786c8d7 | [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (#24268) | 2026-05-06 18:59:01 -07:00 |
Liangsheng Yin | eaf074d50e | propagate pytest exit code from test __main__ entries (#24487) | 2026-05-06 18:46:52 -07:00 |
Yuzhen Zhou | 4a279d9c36 | [R3] Avoid implicit CUDA sync in routed experts DP slicing (#24550); Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> | 2026-05-06 18:37:36 -07:00 |
huangtingwei | 27445f9836 | Add ChatCompletionRequest-style support to /v1/tokenize (#23981) | 2026-05-06 18:35:20 -07:00 |
Brayden Zhong | 3fe8bc987e | Support Triton MLA FP8 KV cache (#20479); Co-authored-by: b8zhong <b8zhong@users.noreply.github.com> | 2026-05-06 18:32:39 -07:00 |
Mick | 2e642ea187 | [diffusion] chore: align LTX-2 with official (#24313) | 2026-05-07 08:46:28 +08:00 |
Xiaoyu Zhang | a9a8b20a90 | [codex] Optimize Z-Image packed QKV (#24117) | 2026-05-07 07:51:22 +08:00 |
gh1595 | ece7e95b65 | [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (#24420); Co-authored-by: Yanbin Jiang <jybsuper@gmail.com> | 2026-05-06 14:51:30 -07:00 |
Lianmin Zheng | b859f7ffba | Improve metrics, observability, and PD deploy tooling (#24521) | 2026-05-06 11:27:35 -07:00 |
Xiaoyu Zhang | d86f2916cc | Fix diffusion fallback guards and validation (#23335) | 2026-05-07 00:05:43 +08:00 |
Shangming Cai | 32d9998b9d | [PD] Prevent update_status to Failed from cleared entries (#24539); Signed-off-by: Shangming Cai <csmthu@gmail.com> | 2026-05-06 23:32:04 +08:00 |
sky | bfc1aeae13 | [CP] Register KV cache allgather buffer with symmetric memory (#24040); Signed-off-by: wangfakang <fakangwang@gmail.com> | 2026-05-06 23:24:36 +08:00 |
fzyzcjy | c4c5541618 | Support getting checksums in weight checker (#24537) | 2026-05-06 22:59:28 +08:00 |
fzyzcjy | ae5ae840f6 | Refactor buffer patterns in weight checker (#24538) | 2026-05-06 22:52:07 +08:00 |
Ke Bao | eb5f0fbeef | Support swa HiCache for unified radix cache (#23391); Co-authored-by: hzh0425 <hzh0425@apache.org> | 2026-05-06 22:19:25 +08:00 |
fzyzcjy | 491051c622 | Cherry pick weight_checker _weight_fp32 buffer skip from #22663 (#24534); Co-authored-by: JD <jaedon.guo@gmail.com> | 2026-05-06 21:21:12 +08:00 |
fzyzcjy | 0d40931b08 | Cherry pick weight_checker non-persistent buffer pattern list from #21278 (#24533); Co-authored-by: JD <jaedon.guo@gmail.com> | 2026-05-06 21:14:01 +08:00 |
fzyzcjy | 864f9633f2 | Cherry pick weight_checker fp8 dequant fix and non-persistent buffer skip from #21494 (#24532); Co-authored-by: Yueming Yuan <yym022502@gmail.com> | 2026-05-06 21:13:17 +08:00 |
Lianmin Zheng | d4d4b04d66 | [PD] Fix missing update_status call in abort() across all KV backends (#24522) | 2026-05-06 05:30:11 -07:00 |
cctry | 163bf1ba71 | [PD] Fix KV transfer metrics (#24416); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>; Co-authored-by: Shangming Cai <csmthu@gmail.com> | 2026-05-06 03:44:48 -07:00 |
Xiaoyu Zhang | b67df7cd1b | [Codex] Diffusion handle non-contiguous CFG communication (#24332); Co-authored-by: BBuf Codex <bbuf-codex@users.noreply.github.com> | 2026-05-06 17:27:14 +08:00 |
sky | c8bc23522f | Refactor: decouple segment tracking from comm registration (#21392); Signed-off-by: wangfakang <fakangwang@gmail.com> | 2026-05-06 17:07:58 +08:00 |
fzyzcjy | a858fda708 | Add e2e test with log snapshot in dumper grafter (#24513) | 2026-05-06 17:00:13 +08:00 |
fzyzcjy | 8527db0a91 | Enhance diff and tensor-info logging in dumper grafter (#24512) | 2026-05-06 16:58:08 +08:00 |
fzyzcjy | 75943cfbcf | Support per-call extras and dataclass transform input in dumper grafter (#24511) | 2026-05-06 16:57:44 +08:00 |
fzyzcjy | 833279eb2e | Support multi-rank exchange via all_gather_object in dumper grafter (#24510) | 2026-05-06 16:57:20 +08:00 |