Commit Graph

7855 Commits

Author SHA1 Message Date
Brayden Zhong
80a6b32703 [Perf] Optimize NSA backend metadata under MTP (#19536)
Co-authored-by: Baidu-AIAK <Baidu_AIAK@163.com>
Co-authored-by: zengpai <zengpai@baidu.com>
2026-03-01 01:59:26 -08:00
Mick
d098c8dab0 [diffusion] add .claude and update contributing with attitude towards vibe-pr (#19511) 2026-03-01 14:41:55 +08:00
Bingxu Chen
5fa6633485 [AMD] Fix MoRI EP warmup hang by restoring deepep_mode=normal default (#19498) 2026-02-28 22:05:22 -08:00
Kangyan-Zhou
dcf462cfba Revert "[HiCache] Enable spec v2 + decode KV cache offloading compatibility" (#19613) 2026-02-28 21:54:32 -08:00
Kangyan-Zhou
8167346609 [HiCache] Enable spec v2 + decode KV cache offloading compatibility (#19518)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:53:52 -08:00
Yi Zhong
894e887e4a [Blackwell] Make mxint4 flashinfer_trtllm moe gemm set by default on blackwell (#18136)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
2026-03-01 05:21:01 +00:00
Jun Liu
38dc372dae [Bugfix] Fix KeyError: 'prompt_tokens' when streaming requests are aborted (#19514) 2026-02-28 20:21:35 -08:00
Zhang Yiyang (SII)
4ec450e97b [diffusion][MOVA] fix: fix task type in MOVA pipeline and shared model placement (#19489) 2026-03-01 12:13:15 +08:00
Liangsheng Yin
5acb45cf32 [Session] Extract SessionController and clean up session logic in Scheduler (#19547) 2026-02-28 19:47:44 -08:00
Alison Shao
a45613f2a6 Revert "[SGL] sync patch: Remove sync points, prefill cudagraph for DP, disable cache reset in mem check (#19190)" (#19581)
Co-authored-by: Alison Shao <alisonshao@mac.lan>
2026-02-28 19:46:47 -08:00
fzyzcjy
e64095c3c7 Support data parallel attention in dump comparator (#19602) 2026-03-01 10:51:21 +08:00
fzyzcjy
ea6ff7b01f Support multi sharding group on the same dimension in dump comparator (#19601) 2026-03-01 10:36:48 +08:00
fzyzcjy
46960e65cf Add skip patterns, tee to file, tensor load warning in dump comparator (#19600) 2026-03-01 10:36:22 +08:00
fzyzcjy
b0b26a7ef1 Support concat mode in token aligner in dump comparator (#19599) 2026-03-01 10:35:50 +08:00
fzyzcjy
e78f1283f7 Support overriding and post-hoc providing metadata in dump comparator (#19598) 2026-03-01 10:35:06 +08:00
fzyzcjy
e41164af1c Enhance replicated tensor checker in dump comparator (#19597) 2026-03-01 10:34:34 +08:00
fzyzcjy
ec08240a6a Support data parallel in dump comparator (#19596) 2026-03-01 10:34:03 +08:00
fzyzcjy
003ad6daaa Support partial tensors waiting for reduction and pipeline parallel in dump comparator (#19595) 2026-03-01 10:33:39 +08:00
fzyzcjy
67810828cf Visualize per-token information in dump comparator (#19594) 2026-03-01 10:32:59 +08:00
fzyzcjy
f5a10e04cd Support arbitrary filtering in dumper (#19593) 2026-03-01 10:31:21 +08:00
Duyi-Wang
8240a87306 [AMD] MORI-EP support for EP4. (#19578) 2026-02-28 13:13:46 -08:00
Haodi Lei
f451664504 [Fix] Add --disable-draft-model-update to control draft model updates (especially in RL) (#15726)
Co-authored-by: leihaodi <haodilei@gmail.com>
2026-02-28 12:09:55 -08:00
Mohammad Miadh Angkad
9c81ce4707 [Anthropic API] Preserve image content in tool_result conversion (#19233)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2026-02-28 12:07:22 -08:00
zhangheng
a0d8a7ae6d [RadixTree][6/N Refactor]: Refactor SWARadixTree to simplify the computation and alignment of bigram keys. (#19427) 2026-02-28 20:01:39 +08:00
fzyzcjy
5705e02d28 Support singleton dimension squeezing in dump comparator (#19566) 2026-02-28 18:11:46 +08:00
fzyzcjy
80bbd30909 Visualize comparison detailed results in dump comparator (#19565) 2026-02-28 18:08:16 +08:00
fzyzcjy
40facdb28c Handle recompute and verify closeness in dumper (#19564) 2026-02-28 18:07:44 +08:00
fzyzcjy
63a4778542 Support non-intrusive arbitrary dumping in dumper and add e2e tests (#19563) 2026-02-28 18:06:55 +08:00
fzyzcjy
ccbc47d6be Update layer id extraction, diffing, empty handling and error sentinel in dump comparator (#19562) 2026-02-28 18:06:26 +08:00
fzyzcjy
4097eb5ce9 Support patching source code (#19561) 2026-02-28 18:05:45 +08:00
fzyzcjy
b73aa53d7e Enhance metrics in dump comparator (#19560) 2026-02-28 18:05:19 +08:00
fzyzcjy
706ab9296a Support method decorator for tagging and add minimalistic comparator in dumper (#19559) 2026-02-28 18:04:54 +08:00
fzyzcjy
9bf3638a25 Support handling arbitrary objects in dump comparator (#19558) 2026-02-28 18:04:13 +08:00
Michelle Wu
b7f13a7b73 [NPU] bugs fix for Deepseek models (#19544) 2026-02-28 17:26:15 +08:00
Shangming Cai
366574b2b8 [PD] Cleanup BootstrapServer init and ready check (#19551)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-02-28 16:41:42 +08:00
Hexq0210
4ebe9e1e2f [NPU] bugfix: resolve modelslim load weights bug (#19472) 2026-02-28 16:22:45 +08:00
Junhao Liu
53c767d224 [diffusion] Postprocess: implement frame interpolation using RIFE (#19384) 2026-02-28 14:13:20 +08:00
Yuhao Yang
b01b07aa16 [diffusion] CI: GT generation flow for diffusion CI (#19236)
Co-authored-by: Prozac614 <dwt614707404@163.com>
2026-02-28 14:07:45 +08:00
Shangming Cai
b01f3590be [PD] Support PD with context parallel after refactor (#19504)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
2026-02-28 13:11:15 +08:00
Yilong Zhao
79b1d2bac6 [loader] support presharded fused mlp loading (#19519) 2026-02-27 20:37:24 -08:00
Chang Su
71620122c9 feat(grpc): add multimodal TensorData parsing for vision inference (#19535)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-02-27 19:29:43 -08:00
Zheng Duan
a2ea5941d5 [feat] Support nvfp4 quantized model of Qwen3-Next (#17627) 2026-02-27 18:28:47 -08:00
Liangsheng Yin
ac400cb7bb [CLI] Add --model-type override and keep launch_server supported (#19523) 2026-02-27 18:16:31 -08:00
Liangsheng Yin
e08ef06758 [Session] Gate streaming sessions with --enable-streaming-session and spec v2 guard (#19531) 2026-02-27 18:14:55 -08:00
Leon Gao
b5a8e4179e [SGL] sync patch: Remove sync points, prefill cudagraph for DP, disable cache reset in mem check (#19190)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
2026-02-27 18:11:05 -08:00
chenxu214
5f07ff9271 Added the prefill delayer policy: The prefill delay range is expanded. (#17456)
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
2026-02-28 08:56:49 +08:00
Aurick Qiao
c6cb0c9649 [Session] Add streaming mode with SessionAwareCache fast path (#19171)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
2026-02-27 16:31:08 -08:00
Ziang Li
9469ad089b Fix nvfp4 weight update (#18085) 2026-02-27 14:55:08 -08:00
Alison Shao
6ca7da3e7c Fix nightly VLM accuracy: gemma3n TP fixes + removal, latency thresholds (#19401)
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
2026-02-27 14:24:02 -08:00
yrk111222
e6da514c2c CI: use 'sglang serve' in CI tests (#18597)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: sglang-bot <sglangbot@gmail.com>
2026-02-27 14:00:41 -08:00