6645 Commits

Author SHA1 Message Date
fzyzcjy
e64095c3c7 Support data parallel attention in dump comparator (#19602) 2026-03-01 10:51:21 +08:00
fzyzcjy
ea6ff7b01f Support multi sharding group on the same dimension in dump comparator (#19601) 2026-03-01 10:36:48 +08:00
fzyzcjy
46960e65cf Add skip patterns, tee to file, tensor load warning in dump comparator (#19600) 2026-03-01 10:36:22 +08:00
fzyzcjy
b0b26a7ef1 Support concat mode in token aligner in dump comparator (#19599) 2026-03-01 10:35:50 +08:00
fzyzcjy
e78f1283f7 Support overriding and post-hoc providing metadata in dump comparator (#19598) 2026-03-01 10:35:06 +08:00
fzyzcjy
e41164af1c Enhance replicated tensor checker in dump comparator (#19597) 2026-03-01 10:34:34 +08:00
fzyzcjy
ec08240a6a Support data parallel in dump comparator (#19596) 2026-03-01 10:34:03 +08:00
fzyzcjy
003ad6daaa Support partial tensors waiting for reduction and pipeline parallel in dump comparator (#19595) 2026-03-01 10:33:39 +08:00
fzyzcjy
67810828cf Visualize per-token information in dump comparator (#19594) 2026-03-01 10:32:59 +08:00
fzyzcjy
f5a10e04cd Support arbitrary filtering in dumper (#19593) 2026-03-01 10:31:21 +08:00
Duyi-Wang
8240a87306 [AMD] MORI-EP support for EP4. (#19578) 2026-02-28 13:13:46 -08:00
Haodi Lei
f451664504 [Fix] Add --disable-draft-model-update to control draft model updates (especially in RL) (#15726)
Co-authored-by: leihaodi <haodilei@gmail.com>
2026-02-28 12:09:55 -08:00
Mohammad Miadh Angkad
9c81ce4707 [Anthropic API] Preserve image content in tool_result conversion (#19233)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2026-02-28 12:07:22 -08:00
zhangheng
a0d8a7ae6d [RadixTree][6/N Refactor]: Refactor SWARadixTree to simplify the computation and alignment of bigram keys. (#19427) 2026-02-28 20:01:39 +08:00
fzyzcjy
5705e02d28 Support singleton dimension squeezing in dump comparator (#19566) 2026-02-28 18:11:46 +08:00
fzyzcjy
80bbd30909 Visualize comparison detailed results in dump comparator (#19565) 2026-02-28 18:08:16 +08:00
fzyzcjy
40facdb28c Handle recompute and verify closeness in dumper (#19564) 2026-02-28 18:07:44 +08:00
fzyzcjy
63a4778542 Support non-intrusive arbitrary dumping in dumper and add e2e tests (#19563) 2026-02-28 18:06:55 +08:00
fzyzcjy
ccbc47d6be Update layer id extraction, diffing, empty handling and error sentinel in dump comparator (#19562) 2026-02-28 18:06:26 +08:00
fzyzcjy
4097eb5ce9 Support patching source code (#19561) 2026-02-28 18:05:45 +08:00
fzyzcjy
b73aa53d7e Enhance metrics in dump comparator (#19560) 2026-02-28 18:05:19 +08:00
fzyzcjy
706ab9296a Support method decorator for tagging and add minimalistic comparator in dumper (#19559) 2026-02-28 18:04:54 +08:00
fzyzcjy
9bf3638a25 Support handling arbitrary objects in dump comparator (#19558) 2026-02-28 18:04:13 +08:00
Michelle Wu
b7f13a7b73 [NPU] bugs fix for Deepseek models (#19544) 2026-02-28 17:26:15 +08:00
Shangming Cai
366574b2b8 [PD] Cleanup BootstrapServer init and ready check (#19551)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-02-28 16:41:42 +08:00
Hexq0210
4ebe9e1e2f [NPU] bugfix: resolve modelslim load weights bug (#19472) 2026-02-28 16:22:45 +08:00
Junhao Liu
53c767d224 [diffusion] Postprocess: implement frame interpolation using RIFE (#19384) 2026-02-28 14:13:20 +08:00
Yuhao Yang
b01b07aa16 [diffusion] CI: GT generation flow for diffusion CI (#19236)
Co-authored-by: Prozac614 <dwt614707404@163.com>
2026-02-28 14:07:45 +08:00
Shangming Cai
b01f3590be [PD] Support PD with context parallel after refactor (#19504)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
2026-02-28 13:11:15 +08:00
Yilong Zhao
79b1d2bac6 [loader] support presharded fused mlp loading (#19519) 2026-02-27 20:37:24 -08:00
Chang Su
71620122c9 feat(grpc): add multimodal TensorData parsing for vision inference (#19535)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-02-27 19:29:43 -08:00
Zheng Duan
a2ea5941d5 [feat] Support nvfp4 quantized model of Qwen3-Next (#17627) 2026-02-27 18:28:47 -08:00
Liangsheng Yin
ac400cb7bb [CLI] Add --model-type override and keep launch_server supported (#19523) 2026-02-27 18:16:31 -08:00
Liangsheng Yin
e08ef06758 [Session] Gate streaming sessions with --enable-streaming-session and spec v2 guard (#19531) 2026-02-27 18:14:55 -08:00
Leon Gao
b5a8e4179e [SGL] sync patch: Remove sync points, prefill cudagraph for DP, disable cache reset in mem check (#19190)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
2026-02-27 18:11:05 -08:00
chenxu214
5f07ff9271 Added the prefill delayer policy: The prefill delay range is expanded. (#17456)
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
2026-02-28 08:56:49 +08:00
Aurick Qiao
c6cb0c9649 [Session] Add streaming mode with SessionAwareCache fast path (#19171)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
2026-02-27 16:31:08 -08:00
Ziang Li
9469ad089b Fix nvfp4 weight update (#18085) 2026-02-27 14:55:08 -08:00
Alison Shao
6ca7da3e7c Fix nightly VLM accuracy: gemma3n TP fixes + removal, latency thresholds (#19401)
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
2026-02-27 14:24:02 -08:00
yrk111222
e6da514c2c CI: use 'sglang serve' in CI tests (#18597)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: sglang-bot <sglangbot@gmail.com>
2026-02-27 14:00:41 -08:00
Baizhou Zhang
776709efe8 [3/n] deepseek_v2.py Refactor: Migrate MLA forward method in deepseek_v2.py (#19122) 2026-02-27 13:37:29 -08:00
wufann
7e46aafebb [AMD] Enable cudagraph for aiter nsa backend and add aiter impl for nsa pr… (#18526) 2026-02-27 13:18:32 -08:00
Shu Wang
1b75d0d1a9 Fix BatchMLAPagedAttentionWrapper query/qo_inptr mismatch for EAGLE (#15601) 2026-02-27 11:35:45 -08:00
ishandhanani
6a1480ce45 Fix HiCacheNixl TypeError: mem_pool_host passed as file_path (#19517) 2026-02-27 10:59:32 -08:00
Mohammad Miadh Angkad
35ef38c61b Remove gpt-oss hybrid swa gate for trtllm_mha (#19079) 2026-02-27 10:30:00 -08:00
Michael
1b79934d34 [AMD] Fix AMD CI test of TestToolChoiceLfm2Moe (#19113)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Co-authored-by: yctseng0211 <yctseng@amd.com>
2026-02-27 10:18:15 -08:00
R0CKSTAR
fe4bc8ebd5 [diffusion] fix: MulAdd 4D path (shift indexing) (#18673)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2026-02-28 01:52:57 +08:00
Makcum888e
b1249ac909 [Diffusion] [NPU] [CI] fix CI performance (#19486) 2026-02-27 18:23:02 +03:00
Yuan Luo
d2885a9094 [Qwen3-Next] Support gdn fused_rms_norm_gated (#19434)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-02-27 23:08:08 +08:00
joesun
ca5f2e2ed1 [diffusion] fix: Support default response_format=url in /v1/images/generations to avoid 400 errors when response_format is omitted (#19360)
Co-authored-by: Makcum888e <79456407+Makcum888e@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-27 19:47:38 +08:00