fzyzcjy
|
e64095c3c7
|
Support data parallel attention in dump comparator (#19602)
|
2026-03-01 10:51:21 +08:00 |
|
fzyzcjy
|
ea6ff7b01f
|
Support multi sharding group on the same dimension in dump comparator (#19601)
|
2026-03-01 10:36:48 +08:00 |
|
fzyzcjy
|
46960e65cf
|
Add skip patterns, tee to file, tensor load warning in dump comparator (#19600)
|
2026-03-01 10:36:22 +08:00 |
|
fzyzcjy
|
b0b26a7ef1
|
Support concat mode in token aligner in dump comparator (#19599)
|
2026-03-01 10:35:50 +08:00 |
|
fzyzcjy
|
e78f1283f7
|
Support overriding and post-hoc providing metadata in dump comparator (#19598)
|
2026-03-01 10:35:06 +08:00 |
|
fzyzcjy
|
e41164af1c
|
Enhance replicated tensor checker in dump comparator (#19597)
|
2026-03-01 10:34:34 +08:00 |
|
fzyzcjy
|
ec08240a6a
|
Support data parallel in dump comparator (#19596)
|
2026-03-01 10:34:03 +08:00 |
|
fzyzcjy
|
003ad6daaa
|
Support partial tensors waiting for reduction and pipeline parallel in dump comparator (#19595)
|
2026-03-01 10:33:39 +08:00 |
|
fzyzcjy
|
67810828cf
|
Visualize per-token information in dump comparator (#19594)
|
2026-03-01 10:32:59 +08:00 |
|
fzyzcjy
|
f5a10e04cd
|
Support arbitrary filtering in dumper (#19593)
|
2026-03-01 10:31:21 +08:00 |
|
Duyi-Wang
|
8240a87306
|
[AMD] MORI-EP support for EP4. (#19578)
|
2026-02-28 13:13:46 -08:00 |
|
Haodi Lei
|
f451664504
|
[Fix] Add --disable-draft-model-update to control draft model updates (especially in RL) (#15726)
Co-authored-by: leihaodi <haodilei@gmail.com>
|
2026-02-28 12:09:55 -08:00 |
|
Mohammad Miadh Angkad
|
9c81ce4707
|
[Anthropic API] Preserve image content in tool_result conversion (#19233)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2026-02-28 12:07:22 -08:00 |
|
zhangheng
|
a0d8a7ae6d
|
[RadixTree][6/N Refactor]: Refactor SWARadixTree to simplify the computation and alignment of bigram keys. (#19427)
|
2026-02-28 20:01:39 +08:00 |
|
fzyzcjy
|
5705e02d28
|
Support singleton dimension squeezing in dump comparator (#19566)
|
2026-02-28 18:11:46 +08:00 |
|
fzyzcjy
|
80bbd30909
|
Visualize comparison detailed results in dump comparator (#19565)
|
2026-02-28 18:08:16 +08:00 |
|
fzyzcjy
|
40facdb28c
|
Handle recompute and verify closeness in dumper (#19564)
|
2026-02-28 18:07:44 +08:00 |
|
fzyzcjy
|
63a4778542
|
Support non-intrusive arbitrary dumping in dumper and add e2e tests (#19563)
|
2026-02-28 18:06:55 +08:00 |
|
fzyzcjy
|
ccbc47d6be
|
Update layer id extraction, diffing, empty handling and error sentinel in dump comparator (#19562)
|
2026-02-28 18:06:26 +08:00 |
|
fzyzcjy
|
4097eb5ce9
|
Support patching source code (#19561)
|
2026-02-28 18:05:45 +08:00 |
|
fzyzcjy
|
b73aa53d7e
|
Enhance metrics in dump comparator (#19560)
|
2026-02-28 18:05:19 +08:00 |
|
fzyzcjy
|
706ab9296a
|
Support method decorator for tagging and add minimalistic comparator in dumper (#19559)
|
2026-02-28 18:04:54 +08:00 |
|
fzyzcjy
|
9bf3638a25
|
Support handling arbitrary objects in dump comparator (#19558)
|
2026-02-28 18:04:13 +08:00 |
|
Michelle Wu
|
b7f13a7b73
|
[NPU] bugs fix for Deepseek models (#19544)
|
2026-02-28 17:26:15 +08:00 |
|
Shangming Cai
|
366574b2b8
|
[PD] Cleanup BootstrapServer init and ready check (#19551)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-28 16:41:42 +08:00 |
|
Hexq0210
|
4ebe9e1e2f
|
[NPU] bugfix: resolve modelslim load weights bug (#19472)
|
2026-02-28 16:22:45 +08:00 |
|
Junhao Liu
|
53c767d224
|
[diffusion] Postprocess: implement frame interpolation using RIFE (#19384)
|
2026-02-28 14:13:20 +08:00 |
|
Yuhao Yang
|
b01b07aa16
|
[diffusion] CI: GT generation flow for diffusion CI (#19236)
Co-authored-by: Prozac614 <dwt614707404@163.com>
|
2026-02-28 14:07:45 +08:00 |
|
Shangming Cai
|
b01f3590be
|
[PD] Support PD with context parallel after refactor (#19504)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-02-28 13:11:15 +08:00 |
|
Yilong Zhao
|
79b1d2bac6
|
[loader] support presharded fused mlp loading (#19519)
|
2026-02-27 20:37:24 -08:00 |
|
Chang Su
|
71620122c9
|
feat(grpc): add multimodal TensorData parsing for vision inference (#19535)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-02-27 19:29:43 -08:00 |
|
Zheng Duan
|
a2ea5941d5
|
[feat] Support nvfp4 quantized model of Qwen3-Next (#17627)
|
2026-02-27 18:28:47 -08:00 |
|
Liangsheng Yin
|
ac400cb7bb
|
[CLI] Add --model-type override and keep launch_server supported (#19523)
|
2026-02-27 18:16:31 -08:00 |
|
Liangsheng Yin
|
e08ef06758
|
[Session] Gate streaming sessions with --enable-streaming-session and spec v2 guard (#19531)
|
2026-02-27 18:14:55 -08:00 |
|
Leon Gao
|
b5a8e4179e
|
[SGL] sync patch: Remove sync points, prefill cudagraph for DP, disable cache reset in mem check (#19190)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
|
2026-02-27 18:11:05 -08:00 |
|
chenxu214
|
5f07ff9271
|
Added the prefill delayer policy: The prefill delay range is expanded. (#17456)
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-02-28 08:56:49 +08:00 |
|
Aurick Qiao
|
c6cb0c9649
|
[Session] Add streaming mode with SessionAwareCache fast path (#19171)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
|
2026-02-27 16:31:08 -08:00 |
|
Ziang Li
|
9469ad089b
|
Fix nvfp4 weight update (#18085)
|
2026-02-27 14:55:08 -08:00 |
|
Alison Shao
|
6ca7da3e7c
|
Fix nightly VLM accuracy: gemma3n TP fixes + removal, latency thresholds (#19401)
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
|
2026-02-27 14:24:02 -08:00 |
|
yrk111222
|
e6da514c2c
|
CI: use 'sglang serve' in CI tests (#18597)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: sglang-bot <sglangbot@gmail.com>
|
2026-02-27 14:00:41 -08:00 |
|
Baizhou Zhang
|
776709efe8
|
[3/n] deepseek_v2.py Refactor: Migrate MLA forward method in deepseek_v2.py (#19122)
|
2026-02-27 13:37:29 -08:00 |
|
wufann
|
7e46aafebb
|
[AMD] Enable cudagraph for aiter nsa backend and add aiter impl for nsa pr… (#18526)
|
2026-02-27 13:18:32 -08:00 |
|
Shu Wang
|
1b75d0d1a9
|
Fix BatchMLAPagedAttentionWrapper query/qo_inptr mismatch for EAGLE (#15601)
|
2026-02-27 11:35:45 -08:00 |
|
ishandhanani
|
6a1480ce45
|
Fix HiCacheNixl TypeError: mem_pool_host passed as file_path (#19517)
|
2026-02-27 10:59:32 -08:00 |
|
Mohammad Miadh Angkad
|
35ef38c61b
|
Remove gpt-oss hybrid swa gate for trtllm_mha (#19079)
|
2026-02-27 10:30:00 -08:00 |
|
Michael
|
1b79934d34
|
[AMD] Fix AMD CI test of TestToolChoiceLfm2Moe (#19113)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Co-authored-by: yctseng0211 <yctseng@amd.com>
|
2026-02-27 10:18:15 -08:00 |
|
R0CKSTAR
|
fe4bc8ebd5
|
[diffusion] fix: MulAdd 4D path (shift indexing) (#18673)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-02-28 01:52:57 +08:00 |
|
Makcum888e
|
b1249ac909
|
[Diffusion] [NPU] [CI] fix CI performance (#19486)
|
2026-02-27 18:23:02 +03:00 |
|
Yuan Luo
|
d2885a9094
|
[Qwen3-Next] Support gdn fused_rms_norm_gated (#19434)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-02-27 23:08:08 +08:00 |
|
joesun
|
ca5f2e2ed1
|
[diffusion] fix: Support default response_format=url in /v1/images/generations to avoid 400 errors when response_format is omitted (#19360)
Co-authored-by: Makcum888e <79456407+Makcum888e@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-27 19:47:38 +08:00 |
|