Lianmin Zheng
|
44e67c6835
|
Remove deprecated double sparsity feature (#23009)
|
2026-04-17 13:33:12 -07:00 |
|
andyluo7
|
9df6107dca
|
[AMD] Enable DFLASH speculative decoding on ROCm (#22342)
Signed-off-by: Andy Luo <andyluo7@users.noreply.github.com>
Co-authored-by: Andy Luo <andyluo7@users.noreply.github.com>
|
2026-04-17 13:10:14 -07:00 |
|
shuwenn
|
90c76d665e
|
[HiCache] fix: HiCacheFile component key suffixing (#22891)
Co-authored-by: Zhangheng <hzh0425@apache.org>
|
2026-04-17 13:06:28 -07:00 |
|
YC Yen-Ching Tseng
|
5d4e899477
|
[AMD] Fix AMD Multimodal Test - skip nvfp4 tests (#23045)
|
2026-04-17 09:02:39 -07:00 |
|
Jincong Chen
|
2bac219d0c
|
[Perf] Precompute gemma_weight to avoid redundant add on every forward (#22673)
|
2026-04-17 23:37:41 +08:00 |
|
Xiaoyu Zhang
|
83c5119d01
|
[diffusion] CI: fix ModelOpt B200 CI artifact coverage (#22955)
|
2026-04-17 23:33:42 +08:00 |
|
Mick
|
5de89ea942
|
[diffusion] CI: fix auto-partition (#23076)
|
2026-04-17 22:37:24 +08:00 |
|
Opher Lieber
|
6e3bbef568
|
expose num_embeddings in VocabParallelEmbeddingWithLoRA (#22547)
|
2026-04-17 02:35:13 -07:00 |
|
Jonah Bernard
|
0d031335ed
|
[Pipeline Parallelism][Bug] Fix scheduler hang in pipeline parallelism setup (#23006)
|
2026-04-17 14:50:47 +08:00 |
|
Duyi-Wang
|
8c190f6b91
|
[AMD] Add SGLANG_MORI_MOE_MAX_INPUT_TOKENS to truncate dispatch before MoE. (#22952)
|
2026-04-16 23:40:15 -07:00 |
|
RichardoMu
|
7390eddf28
|
feat(observability): add OpenTelemetry tracing for speculative decoding (#19545)
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
|
2026-04-17 14:01:58 +08:00 |
|
narutolhy
|
5fa0c6a52e
|
Allow piecewise CUDA graph with speculative decoding (#22128)
Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-17 13:39:30 +08:00 |
|
Xiaoyu Zhang
|
91679d935d
|
[codex] Update diffusion skills (#23028)
|
2026-04-17 13:29:26 +08:00 |
|
blzheng
|
0dcfae5553
|
[CPU] Add gemma4_rmsnorm_cpu kernel (#22842)
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-17 13:03:16 +08:00 |
|
YC Yen-Ching Tseng
|
f0f0148167
|
Revert "feat: Support MXFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#19143)" (#23031)
|
2026-04-16 21:53:25 -07:00 |
|
Zhangheng
|
7d47f40a96
|
[UnifiedRadixTree]: Add HiCache hook interface for TreeComponent (#22924)
|
2026-04-17 12:09:41 +08:00 |
|
Byron Hsu
|
cf9845f8e3
|
[Bug Fix] Ensure prefill_info_table is populated before honoring disagg_prefill_dp_rank (#22990)
Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
|
2026-04-17 11:10:31 +08:00 |
|
Jan Bernlöhr
|
04a53955b9
|
feat: add coordinated checkpoint prefetch for network filesystem loading (#20843)
|
2026-04-16 20:08:19 -07:00 |
|
Yuhao Yang
|
a77abbe005
|
[VLM] Reduce GPU memory footprint of CUDA IPC MM feature transport (#22662)
|
2026-04-17 10:38:36 +08:00 |
|
Yuxuan Zhang
|
16d11c2a10
|
Fix for the low-probability garbled output issue in the GLM-5 series models. (#22811)
|
2026-04-17 09:52:13 +08:00 |
|
Makcum888e
|
e353630b57
|
[Diffusion] [NPU] Fix multimodal gen CI (#22879)
|
2026-04-17 04:09:44 +03:00 |
|
Egor Filimonov
|
ba850d3a9d
|
[Bugfix] [NPU] Fix check_env on Ascend for CANN 8.5 (#22888)
|
2026-04-17 04:05:20 +03:00 |
|
Mick
|
3d2d57c6cc
|
[diffusion] refactor: extract LTX2 image encoding from denoising stage (#22976)
|
2026-04-17 08:35:15 +08:00 |
|
Daifeng Li
|
2cc52d8326
|
feat: Support MXFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#19143)
|
2026-04-16 16:51:32 -07:00 |
|
pdasgup
|
f639425ff0
|
add check for none status code in FinishAbort (#22535)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
|
2026-04-16 16:21:07 -07:00 |
|
Tarushii Goel
|
2211b4d9c6
|
[sgl] improve accuracy of additional page requirement during spec decode (#22406)
|
2026-04-16 15:50:51 -07:00 |
|
Liangsheng Yin
|
db7a751d48
|
refactor: extract FanOutCommunicator and use declarative spec table (#22967)
|
2026-04-16 15:37:19 -07:00 |
|
mqhc2020
|
52f0b86f5d
|
[AMD] Qwen3.5 MXFP4 breaks after shared expert fusion is enabled (#22948)
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
|
2026-04-16 15:25:33 -07:00 |
|
Liangsheng Yin
|
c83ef4fdb6
|
use envs in server_args (#22994)
|
2026-04-16 15:01:33 -07:00 |
|
Xinyu Zhang
|
c0172aef6e
|
[Ray] Bind scheduler actors to GPU-local NUMA node (#22989)
Co-authored-by: xyuzh <xyuzh@users.noreply.github.com>
|
2026-04-16 14:52:15 -07:00 |
|
Xinyu Zhang
|
d430034bde
|
[Ray] Support multi-replica serving by making scheduler actor names unique (#22917)
|
2026-04-16 14:51:01 -07:00 |
|
Qiaolin Yu
|
a87806a65f
|
[misc] refine outdated comments for chain-style multi-layer MTP (#22996)
|
2026-04-16 14:49:43 -07:00 |
|
ybyang
|
41258f874d
|
[PD]feat(bench): add --fake-prefill flag for decode-only stress testing (#22973)
|
2026-04-16 13:57:55 -07:00 |
|
Yuhao Yang
|
9da998a882
|
[diffusion] feat: disaggregated diffusion (#21701)
|
2026-04-16 23:51:32 +08:00 |
|
Liangsheng Yin
|
62309f09db
|
fix(loads): preserve include filtering after watching mode switch (#22959)
|
2026-04-16 03:04:53 -07:00 |
|
ybyang
|
03fef357a6
|
fix(loads): switch get_loads_communicator to watching mode (#22919)
|
2026-04-16 02:12:22 -07:00 |
|
ybyang
|
fbd6dc3565
|
fix: normalize tool message content for GLM5.1 chat template (#22595)
|
2026-04-16 16:48:38 +08:00 |
|
Aleksi Vesanto
|
aaa682346e
|
[diffusion] model: Properly validate device for Mistral 3 attention (#22690)
|
2026-04-16 00:29:23 -07:00 |
|
Lianmin Zheng
|
35da90cb76
|
[misc] Configure logging before ServerArgs.__post_init__ (#22926)
|
2026-04-15 23:53:15 -07:00 |
|
yuefeng Wu
|
65bc839a5f
|
[Fix] eagle/eagle3 speculative decoding conflicts with xgrammar in NPU (#20989)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-15 23:34:23 -07:00 |
|
Bi Xue
|
c43716a357
|
[sgl] provide an option to send control req to all dp ranks rank0 (#22758)
|
2026-04-16 14:24:26 +08:00 |
|
Byron Hsu
|
3600465e81
|
[Bug Fix] Remove follow_bootstrap_room fast path in PD disaggregation DP rank resolution (#22901)
|
2026-04-15 22:53:29 -07:00 |
|
LHXuuu
|
e7ad7c587a
|
[EPD][VLM] Support Kimi VL EPD (#22490)
Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>
|
2026-04-16 12:40:02 +08:00 |
|
CYYYC0310
|
58c6b871b2
|
Remove compatibility restriction between Pipeline Parallelism and Mixed Chunked Prefill (#22920)
Co-authored-by: cyy <cy02433585@alibaba-inc.com>
|
2026-04-16 11:25:31 +08:00 |
|
Xinyuan Tong
|
34fef07a15
|
Upgrade transformers to 5.5.3 and refactor hf_transformers_utils into subpackage (#21569)
|
2026-04-15 20:03:44 -07:00 |
|
JINZ
|
14e122cdee
|
[BugFix][RadixTree]:Fix stale eviction assertion in HiMambaRadixCache host eviction path (#22592)
Co-authored-by: Zhangheng <hzh0425@apache.org>
|
2026-04-16 10:49:30 +08:00 |
|
Yuhao Yang
|
b8794baa6d
|
[Step3p5] Optimize allreduce in MoE layers (#22773)
|
2026-04-16 09:33:12 +08:00 |
|
Liangsheng Yin
|
a4cf2ea128
|
streaming session: spec v2 bonus accounting + comprehensive test matrix (#22651)
|
2026-04-15 17:12:41 -07:00 |
|
Xinyu Zhang
|
e8c6e5466c
|
[Ray] Auto-create placement group in RayEngine when none is detected (#22898)
|
2026-04-15 15:17:52 -07:00 |
|
Qiaolin Yu
|
0b1b07db72
|
[misc] fix ray folder lint (#22905)
|
2026-04-15 15:08:18 -07:00 |
|