Brayden Zhong
|
80a6b32703
|
[Perf] Optimize NSA backend metadata under MTP (#19536)
Co-authored-by: Baidu-AIAK <Baidu_AIAK@163.com>
Co-authored-by: zengpai <zengpai@baidu.com>
|
2026-03-01 01:59:26 -08:00 |
|
Mick
|
d098c8dab0
|
[diffusion] add .claude and update contributing with attitude towards vibe-pr (#19511)
|
2026-03-01 14:41:55 +08:00 |
|
Bingxu Chen
|
5fa6633485
|
[AMD] Fix MoRI EP warmup hang by restoring deepep_mode=normal default (#19498)
|
2026-02-28 22:05:22 -08:00 |
|
Kangyan-Zhou
|
dcf462cfba
|
Revert "[HiCache] Enable spec v2 + decode KV cache offloading compatibility" (#19613)
|
2026-02-28 21:54:32 -08:00 |
|
Kangyan-Zhou
|
8167346609
|
[HiCache] Enable spec v2 + decode KV cache offloading compatibility (#19518)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-28 21:53:52 -08:00 |
|
Yi Zhong
|
894e887e4a
|
[Blackwell] Make mxint4 flashinfer_trtllm moe gemm set by default on blackwell (#18136)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
|
2026-03-01 05:21:01 +00:00 |
|
Jun Liu
|
38dc372dae
|
[Bugfix] Fix KeyError: 'prompt_tokens' when streaming requests are aborted (#19514)
|
2026-02-28 20:21:35 -08:00 |
|
Zhang Yiyang (SII)
|
4ec450e97b
|
[diffusion][MOVA] fix: fix task type in MOVA pipeline and shared model placement (#19489)
|
2026-03-01 12:13:15 +08:00 |
|
Liangsheng Yin
|
5acb45cf32
|
[Session] Extract SessionController and clean up session logic in Scheduler (#19547)
|
2026-02-28 19:47:44 -08:00 |
|
Alison Shao
|
a45613f2a6
|
Revert "[SGL] sync patch: Remove sync points, prefill cudagraph for DP, disable cache reset in mem check (#19190)" (#19581)
Co-authored-by: Alison Shao <alisonshao@mac.lan>
|
2026-02-28 19:46:47 -08:00 |
|
fzyzcjy
|
e64095c3c7
|
Support data parallel attention in dump comparator (#19602)
|
2026-03-01 10:51:21 +08:00 |
|
fzyzcjy
|
ea6ff7b01f
|
Support multi sharding group on the same dimension in dump comparator (#19601)
|
2026-03-01 10:36:48 +08:00 |
|
fzyzcjy
|
46960e65cf
|
Add skip patterns, tee to file, tensor load warning in dump comparator (#19600)
|
2026-03-01 10:36:22 +08:00 |
|
fzyzcjy
|
b0b26a7ef1
|
Support concat mode in token aligner in dump comparator (#19599)
|
2026-03-01 10:35:50 +08:00 |
|
fzyzcjy
|
e78f1283f7
|
Support overriding and post-hoc providing metadata in dump comparator (#19598)
|
2026-03-01 10:35:06 +08:00 |
|
fzyzcjy
|
e41164af1c
|
Enhance replicated tensor checker in dump comparator (#19597)
|
2026-03-01 10:34:34 +08:00 |
|
fzyzcjy
|
ec08240a6a
|
Support data parallel in dump comparator (#19596)
|
2026-03-01 10:34:03 +08:00 |
|
fzyzcjy
|
003ad6daaa
|
Support partial tensors waiting for reduction and pipeline parallel in dump comparator (#19595)
|
2026-03-01 10:33:39 +08:00 |
|
fzyzcjy
|
67810828cf
|
Visualize per-token information in dump comparator (#19594)
|
2026-03-01 10:32:59 +08:00 |
|
fzyzcjy
|
f5a10e04cd
|
Support arbitrary filtering in dumper (#19593)
|
2026-03-01 10:31:21 +08:00 |
|
Duyi-Wang
|
8240a87306
|
[AMD] MORI-EP support for EP4. (#19578)
|
2026-02-28 13:13:46 -08:00 |
|
Haodi Lei
|
f451664504
|
[Fix] Add --disable-draft-model-update to control draft model updates (especially in RL) (#15726)
Co-authored-by: leihaodi <haodilei@gmail.com>
|
2026-02-28 12:09:55 -08:00 |
|
Mohammad Miadh Angkad
|
9c81ce4707
|
[Anthropic API] Preserve image content in tool_result conversion (#19233)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2026-02-28 12:07:22 -08:00 |
|
zhangheng
|
a0d8a7ae6d
|
[RadixTree][6/N Refactor]: Refactor SWARadixTree to simplify the computation and alignment of bigram keys. (#19427)
|
2026-02-28 20:01:39 +08:00 |
|
fzyzcjy
|
5705e02d28
|
Support singleton dimension squeezing in dump comparator (#19566)
|
2026-02-28 18:11:46 +08:00 |
|
fzyzcjy
|
80bbd30909
|
Visualize comparison detailed results in dump comparator (#19565)
|
2026-02-28 18:08:16 +08:00 |
|
fzyzcjy
|
40facdb28c
|
Handle recompute and verify closeness in dumper (#19564)
|
2026-02-28 18:07:44 +08:00 |
|
fzyzcjy
|
63a4778542
|
Support non-intrusive arbitrary dumping in dumper and add e2e tests (#19563)
|
2026-02-28 18:06:55 +08:00 |
|
fzyzcjy
|
ccbc47d6be
|
Update layer id extraction, diffing, empty handling and error sentinel in dump comparator (#19562)
|
2026-02-28 18:06:26 +08:00 |
|
fzyzcjy
|
4097eb5ce9
|
Support patching source code (#19561)
|
2026-02-28 18:05:45 +08:00 |
|
fzyzcjy
|
b73aa53d7e
|
Enhance metrics in dump comparator (#19560)
|
2026-02-28 18:05:19 +08:00 |
|
fzyzcjy
|
706ab9296a
|
Support method decorator for tagging and add minimalistic comparator in dumper (#19559)
|
2026-02-28 18:04:54 +08:00 |
|
fzyzcjy
|
9bf3638a25
|
Support handling arbitrary objects in dump comparator (#19558)
|
2026-02-28 18:04:13 +08:00 |
|
Michelle Wu
|
b7f13a7b73
|
[NPU] bugs fix for Deepseek models (#19544)
|
2026-02-28 17:26:15 +08:00 |
|
Shangming Cai
|
366574b2b8
|
[PD] Cleanup BootstrapServer init and ready check (#19551)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-28 16:41:42 +08:00 |
|
Hexq0210
|
4ebe9e1e2f
|
[NPU] bugfix: resolve modelslim load weights bug (#19472)
|
2026-02-28 16:22:45 +08:00 |
|
Junhao Liu
|
53c767d224
|
[diffusion] Postprocess: implement frame interpolation using RIFE (#19384)
|
2026-02-28 14:13:20 +08:00 |
|
Yuhao Yang
|
b01b07aa16
|
[diffusion] CI: GT generation flow for diffusion CI (#19236)
Co-authored-by: Prozac614 <dwt614707404@163.com>
|
2026-02-28 14:07:45 +08:00 |
|
Shangming Cai
|
b01f3590be
|
[PD] Support PD with context parallel after refactor (#19504)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-02-28 13:11:15 +08:00 |
|
Yilong Zhao
|
79b1d2bac6
|
[loader] support presharded fused mlp loading (#19519)
|
2026-02-27 20:37:24 -08:00 |
|
Chang Su
|
71620122c9
|
feat(grpc): add multimodal TensorData parsing for vision inference (#19535)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-02-27 19:29:43 -08:00 |
|
Zheng Duan
|
a2ea5941d5
|
[feat] Support nvfp4 quantized model of Qwen3-Next (#17627)
|
2026-02-27 18:28:47 -08:00 |
|
Liangsheng Yin
|
ac400cb7bb
|
[CLI] Add --model-type override and keep launch_server supported (#19523)
|
2026-02-27 18:16:31 -08:00 |
|
Liangsheng Yin
|
e08ef06758
|
[Session] Gate streaming sessions with --enable-streaming-session and spec v2 guard (#19531)
|
2026-02-27 18:14:55 -08:00 |
|
Leon Gao
|
b5a8e4179e
|
[SGL] sync patch: Remove sync points, prefill cudagraph for DP, disable cache reset in mem check (#19190)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
|
2026-02-27 18:11:05 -08:00 |
|
chenxu214
|
5f07ff9271
|
Added the prefill delayer policy: The prefill delay range is expanded. (#17456)
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-02-28 08:56:49 +08:00 |
|
Aurick Qiao
|
c6cb0c9649
|
[Session] Add streaming mode with SessionAwareCache fast path (#19171)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
|
2026-02-27 16:31:08 -08:00 |
|
Ziang Li
|
9469ad089b
|
Fix nvfp4 weight update (#18085)
|
2026-02-27 14:55:08 -08:00 |
|
Alison Shao
|
6ca7da3e7c
|
Fix nightly VLM accuracy: gemma3n TP fixes + removal, latency thresholds (#19401)
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
|
2026-02-27 14:24:02 -08:00 |
|
yrk111222
|
e6da514c2c
|
CI: use 'sglang serve' in CI tests (#18597)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: sglang-bot <sglangbot@gmail.com>
|
2026-02-27 14:00:41 -08:00 |
|