Aurick Qiao
|
bfccc8e504
|
Allow configuring NIXL backend parameters from env (#24169)
|
2026-05-01 18:30:43 -07:00 |
|
Mick
|
193b977572
|
[diffusion] chore: clean scheduler (#24229)
|
2026-05-02 09:30:06 +08:00 |
|
Liangsheng Yin
|
cb8fbd53fc
|
Reserve slot 0 as padding in all req pools (#24243)
|
2026-05-01 16:41:36 -07:00 |
|
Cheng Wan
|
b47fab6f5d
|
[bugfix] Support MIXED forward mode in TBO splitter for DP attention (#24241)
|
2026-05-01 16:01:23 -07:00 |
|
Lucia Fang
|
05de73efd1
|
[core/model] Use explicit model arch for Llama4 attention backend auto-selection (#24232)
|
2026-05-01 15:49:30 -07:00 |
|
Liangsheng Yin
|
8a530468fd
|
[Bug] Size mamba mappings from req pool, not mamba pool (#24244)
|
2026-05-01 15:45:20 -07:00 |
|
Yuxuan Zhang
|
79bc2505a5
|
[Bug Fix] Resolve EAGLE cuda graph IMA under PD + DP + MTP with GLM-5.1 (#23037)
|
2026-05-01 13:53:52 -07:00 |
|
Lucia Fang
|
b58fa60a1f
|
[core/attention] Add SGLANG_FLASHINFER_USE_PAGED env to force paged wrapper (#24165)
|
2026-05-01 12:52:46 -07:00 |
|
Lianmin Zheng
|
ece8a1a788
|
Refactor device timer, clean up metrics collector, and add fwd occupancy metric (#24197)
|
2026-05-01 10:25:25 -07:00 |
|
JINZ
|
4a50cd781e
|
[BugFix][HiMamba] Fix host-protected node deletion in HiMamba tombstone del (#23696)
Co-authored-by: diemchai <diemchai@tencent.com>
Co-authored-by: Zhangheng <hzh0425@apache.org>
|
2026-05-01 21:57:47 +08:00 |
|
ishandhanani
|
5b7ce417d0
|
[P/D disagg] - support decode side radix cache (#19746)
|
2026-05-01 21:55:34 +08:00 |
|
Cheng Wan
|
d48095ba53
|
Bypass torch.cuda.use_mem_pool generator-CM in SymmetricMemoryContext (#24190)
|
2026-05-01 01:25:49 -07:00 |
|
Lianmin Zheng
|
d9e8a4a7f8
|
[SWA] Ensure we use pre-computed SWA cache location during prefill (#24138)
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Yinghai Lu <yinghai@meta.com>
|
2026-05-01 00:01:49 -07:00 |
|
Yanbin Jiang
|
8975479f87
|
[LoRA][MOE] Fix EP correctness in MoE LoRA slicing and virtual-experts kernels (#24171)
|
2026-04-30 22:42:10 -07:00 |
|
Mick
|
9d84268705
|
[diffusion] refactor: introduce component residency manager (#23771)
|
2026-05-01 11:10:41 +08:00 |
|
Cheng Wan
|
108bfd8b6a
|
[MoE] Add Aiter MoE runner backend and purge aiter.fused_moe from quant methods (#23597)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-30 19:50:52 -07:00 |
|
Yilong Zhao
|
f67292539f
|
spec: gate dp mlp sync with server args (#24177)
|
2026-04-30 16:29:41 -07:00 |
|
Polisetty V R K Jyothendra Varma
|
da7f890788
|
[Intel GPU] Integrate flash_mla_decode in Intel XPU attention backend (#23557)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-01 07:21:28 +08:00 |
|
shubham singhal
|
e35ac95cdc
|
[Test] Add XPU device support to unit tests (#22236)
Co-authored-by: vshekhawat-hlab <vshekhawat@habana.ai>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-01 07:18:51 +08:00 |
|
Roopak Srivastava
|
9c5cad3914
|
Use device-agnostic helpers for Mamba tests and core ops (#20234)
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-01 07:14:53 +08:00 |
|
Kalyan Kumar
|
8a9e424faa
|
Replace hardcoded CUDA device with get_device() for XPU support (#13599)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-01 07:13:46 +08:00 |
|
Lawrence Wu
|
f75a8b6220
|
fix: support HybridLinearAttnBackend in TboAttnBackend (#20114)
|
2026-04-30 15:40:13 -07:00 |
|
Hubert Lu
|
d57671527a
|
Fix LFM2 ShortConv Mamba State Indexing (#23975)
|
2026-04-30 15:23:39 -07:00 |
|
Xinyuan Tong
|
989a16187d
|
[Bench] Fix bench_serving missing reasoning_content stream chunks (#23954)
|
2026-04-30 15:00:27 -07:00 |
|
Erik Wijmans
|
c04b20dc88
|
Fix KeyError in prepare_lora_batch when lora_ids contains None (#21974)
|
2026-04-30 14:50:16 -04:00 |
|
ori
|
71e89e9003
|
[MUSA][19/N] Support qwen series models (#23654)
Co-authored-by: zhiguo.qin <zhiguo.qin@mthreads.com>
|
2026-04-30 11:26:47 -07:00 |
|
Zhonghua Deng
|
651af06a0b
|
[Feature] Xiaomi MiMo-V2.5 day0 support (#23811)
Co-authored-by: 张袁 <zhangyuan36@xiaomi.com>
Co-authored-by: 刘安岐 <liuanqi6@xiaomi.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2026-05-01 00:02:26 +08:00 |
|
jianzhao-xu
|
aa74911448
|
[NPU] fix some npu error with OffloaderV2 (#19541)
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-04-30 15:05:35 +03:00 |
|
Yaochen Han
|
577dbc4ab9
|
[4/N] Quantization Refactor: AWQ schemes and Kernel call and weight init split (#21126)
|
2026-04-30 14:51:01 +03:00 |
|
Qiaolin Yu
|
583929c0a1
|
fix the compatibility between --moe-dense-tp-size 1 and piecewise cuda graph (#23972)
|
2026-04-30 02:12:13 -07:00 |
|
Opher Lieber
|
99c0b62f1e
|
allow requests with exactly context_len total tokens (#22546)
Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>
|
2026-04-30 01:12:06 -07:00 |
|
Ethan (Yusheng) Su
|
125f75db72
|
fix(lora): avoid CUDA graph-breaking scalar assignment in seg_indptr (#23738)
|
2026-04-30 01:11:45 -07:00 |
|
billishyahao
|
692979a8d9
|
[AMD] Support sdma path for moriep (#23929)
|
2026-04-29 23:57:00 -07:00 |
|
Shaojun Zhou
|
4f0b44c5c6
|
[fix] moss-vl: use Conv3dLayer and remove no-op flat_encoder_result (#23932)
|
2026-04-30 14:19:45 +08:00 |
|
kkyyxhll
|
936c9c2355
|
fix(qwen3_5): broadcast per-tensor scale in _make_packed_weight_loader for FP8 models (#23062)
|
2026-04-30 14:16:57 +08:00 |
|
Jay Thakur
|
bcb34da9f9
|
Add deterministic mode for XPU operations (#16793)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-30 13:39:06 +08:00 |
|
Opher Lieber
|
c8c1c9261d
|
LoRA support for qwen3.5 and nemotron3 (#23594)
Co-authored-by: Yanbin Jiang <jybsuper@gmail.com>
|
2026-04-29 21:51:53 -07:00 |
|
Mick
|
0b1fbdba15
|
[diffusion] CI: change ground truth upload path and improve publish script (#24120)
|
2026-04-30 12:26:10 +08:00 |
|
Yuxuan Zhang
|
d040333c95
|
[Bug Fix] missing index/KV transfer for MTP layer in NSA disaggregation (#23539)
|
2026-04-30 11:55:45 +08:00 |
|
yudian0504
|
2d2be5d7b2
|
[PD][Bugfix] fix mamba cache capping (#22462)
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2026-04-30 10:57:55 +08:00 |
|
MingxuZh
|
62136073f9
|
pin the version of xgrammar to v0.1.32 (#24010)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-30 10:13:08 +08:00 |
|
heziiop
|
3553fd0322
|
[NPU] add split_qkv_tp_rmsnorm_rope ops for minimax2 & fix eagle3 hidden states capture in dp attn mode (#23190)
|
2026-04-30 08:51:22 +08:00 |
|
Lianmin Zheng
|
e60c60eff0
|
[SWA] Fix missing mamba_indices parameter in cpu copy interface (#24026)
|
2026-04-29 17:33:38 -07:00 |
|
Kangyan-Zhou
|
6575aea128
|
[CI] Fix black formatting on main (unblocks PR #21247 lint) (#24093)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-29 14:59:17 -07:00 |
|
Jimmy Shong
|
3d31ac2672
|
[Fix] FP8 Qwen3-Next quant error by removing fallback fused shards (#23973)
|
2026-04-29 17:33:47 -04:00 |
|
jsheng_Linkedin
|
850021378a
|
[Score API] Hoist query placeholder scan and specialize PositionalEmbeds stacking (#23513)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-29 13:51:53 -07:00 |
|
Qiaolin Yu
|
79dbfe4505
|
Use spec v2 by default (#21062)
|
2026-04-29 13:40:42 -07:00 |
|
Zhongdongming Dai
|
7389743d85
|
feat: Support modelexpress p2p RDMA transfer (#23105)
|
2026-04-29 12:57:40 -07:00 |
|
jsheng_Linkedin
|
db84a8ebbb
|
[Model] Qwen3ForPooledOutput: forward get_input_embeddings to inner model (#23434)
|
2026-04-29 12:25:06 -07:00 |
|
Chang Min Bark
|
3272af2f00
|
[Apple Silicon] [MLX] MLX decode partial overlap scheduling for generation (async eval) (#22416)
Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>
Co-authored-by: Alex Nails <alex.nails@radixark.ai>
|
2026-04-29 12:21:14 -07:00 |
|