sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 22:07:12 +00:00

Author	SHA1	Message	Date
Aurick Qiao	bfccc8e504	Allow configuring NIXL backend parameters from env (#24169 )	2026-05-01 18:30:43 -07:00
Mick	193b977572	[diffusion] chore: clean scheduler (#24229 )	2026-05-02 09:30:06 +08:00
Liangsheng Yin	cb8fbd53fc	Reserve slot 0 as padding in all req pools (#24243 )	2026-05-01 16:41:36 -07:00
Cheng Wan	b47fab6f5d	[bugfix] Support MIXED forward mode in TBO splitter for DP attention (#24241 )	2026-05-01 16:01:23 -07:00
Lucia Fang	05de73efd1	[core/model] Use explicit model arch for Llama4 attention backend auto-selection (#24232 )	2026-05-01 15:49:30 -07:00
Liangsheng Yin	8a530468fd	[Bug] Size mamba mappings from req pool, not mamba pool (#24244 )	2026-05-01 15:45:20 -07:00
Yuxuan Zhang	79bc2505a5	[Bug Fix] Resolve EAGLE cuda graph IMA under PD + DP + MTP with GLM-5.1 (#23037 )	2026-05-01 13:53:52 -07:00
Lucia Fang	b58fa60a1f	[core/attention] Add SGLANG_FLASHINFER_USE_PAGED env to force paged wrapper (#24165 )	2026-05-01 12:52:46 -07:00
Lianmin Zheng	ece8a1a788	Refactor device timer, clean up metrics collector, and add fwd occupancy metric (#24197 )	2026-05-01 10:25:25 -07:00
JINZ	4a50cd781e	[BugFix][HiMamba] Fix host-protected node deletion in HiMamba tombstone del (#23696 ) Co-authored-by: diemchai <diemchai@tencent.com> Co-authored-by: Zhangheng <hzh0425@apache.org>	2026-05-01 21:57:47 +08:00
ishandhanani	5b7ce417d0	[P/D disagg] - support decode side radix cache (#19746 )	2026-05-01 21:55:34 +08:00
Cheng Wan	d48095ba53	Bypass torch.cuda.use_mem_pool generator-CM in SymmetricMemoryContext (#24190 )	2026-05-01 01:25:49 -07:00
Lianmin Zheng	d9e8a4a7f8	[SWA] Ensure we use pre-computed SWA cache location during prefill (#24138 ) Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Yinghai Lu <yinghai@meta.com>	2026-05-01 00:01:49 -07:00
Yanbin Jiang	8975479f87	[LoRA][MOE] Fix EP correctness in MoE LoRA slicing and virtual-experts kernels (#24171 )	2026-04-30 22:42:10 -07:00
Mick	9d84268705	[diffusion] refactor: introduce component residency manager (#23771 )	2026-05-01 11:10:41 +08:00
Cheng Wan	108bfd8b6a	[MoE] Add Aiter MoE runner backend and purge aiter.fused_moe from quant methods (#23597 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 19:50:52 -07:00
Yilong Zhao	f67292539f	spec: gate dp mlp sync with server args (#24177 )	2026-04-30 16:29:41 -07:00
Polisetty V R K Jyothendra Varma	da7f890788	[Intel GPU] Integrate flash_mla_decode in Intel XPU attention backend (#23557 ) Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-05-01 07:21:28 +08:00
shubham singhal	e35ac95cdc	[Test] Add XPU device support to unit tests (#22236 ) Co-authored-by: vshekhawat-hlab <vshekhawat@habana.ai> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-05-01 07:18:51 +08:00
Roopak Srivastava	9c5cad3914	Use device-agnostic helpers for Mamba tests and core ops (#20234 ) Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-05-01 07:14:53 +08:00
Kalyan Kumar	8a9e424faa	Replace hardcoded CUDA device with get_device() for XPU support (#13599 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-05-01 07:13:46 +08:00
Lawrence Wu	f75a8b6220	fix: support HybridLinearAttnBackend in TboAttnBackend (#20114 )	2026-04-30 15:40:13 -07:00
Hubert Lu	d57671527a	Fix LFM2 ShortConv Mamba State Indexing (#23975 )	2026-04-30 15:23:39 -07:00
Xinyuan Tong	989a16187d	[Bench] Fix bench_serving missing reasoning_content stream chunks (#23954 )	2026-04-30 15:00:27 -07:00
Erik Wijmans	c04b20dc88	Fix KeyError in prepare_lora_batch when lora_ids contains None (#21974 )	2026-04-30 14:50:16 -04:00
ori	71e89e9003	[MUSA][19/N] Support qwen series models (#23654 ) Co-authored-by: zhiguo.qin <zhiguo.qin@mthreads.com>	2026-04-30 11:26:47 -07:00
Zhonghua Deng	651af06a0b	[Feature] Xiaomi MiMo-V2.5 day0 support (#23811 ) Co-authored-by: 张袁 <zhangyuan36@xiaomi.com> Co-authored-by: 刘安岐 <liuanqi6@xiaomi.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2026-05-01 00:02:26 +08:00
jianzhao-xu	aa74911448	[NPU] fix some npu error with OffloaderV2 (#19541 ) Co-authored-by: Jianzhao Xu <xujianchao@huawei.com> Co-authored-by: sglang-npu-bot <sglangnpu@163.com>	2026-04-30 15:05:35 +03:00
Yaochen Han	577dbc4ab9	[4/N] Quantization Refactor: AWQ schemes and Kernel call and weight init split (#21126 )	2026-04-30 14:51:01 +03:00
Qiaolin Yu	583929c0a1	fix the compatibility between --moe-dense-tp-size 1 and piecewise cuda graph (#23972 )	2026-04-30 02:12:13 -07:00
Opher Lieber	99c0b62f1e	allow requests with exactly context_len total tokens (#22546 ) Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>	2026-04-30 01:12:06 -07:00
Ethan (Yusheng) Su	125f75db72	fix(lora): avoid CUDA graph-breaking scalar assignment in seg_indptr (#23738 )	2026-04-30 01:11:45 -07:00
billishyahao	692979a8d9	[AMD] Support sdma path for moriep (#23929 )	2026-04-29 23:57:00 -07:00
Shaojun Zhou	4f0b44c5c6	[fix] moss-vl: use Conv3dLayer and remove no-op flat_encoder_result (#23932 )	2026-04-30 14:19:45 +08:00
kkyyxhll	936c9c2355	fix(qwen3_5): broadcast per-tensor scale in _make_packed_weight_loader for FP8 models (#23062 )	2026-04-30 14:16:57 +08:00
Jay Thakur	bcb34da9f9	Add deterministic mode for XPU operations (#16793 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-04-30 13:39:06 +08:00
Opher Lieber	c8c1c9261d	LoRA support for qwen3.5 and nemotron3 (#23594 ) Co-authored-by: Yanbin Jiang <jybsuper@gmail.com>	2026-04-29 21:51:53 -07:00
Mick	0b1fbdba15	[diffusion] CI: change ground truth upload path and improve publish script (#24120 )	2026-04-30 12:26:10 +08:00
Yuxuan Zhang	d040333c95	[Bug Fix] missing index/KV transfer for MTP layer in NSA disaggregation (#23539 )	2026-04-30 11:55:45 +08:00
yudian0504	2d2be5d7b2	[PD][Bugfix] fix mamba cache capping (#22462 ) Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: yizhang2077 <1109276519@qq.com>	2026-04-30 10:57:55 +08:00
MingxuZh	62136073f9	pin the version of xgrammar to v0.1.32 (#24010 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-04-30 10:13:08 +08:00
heziiop	3553fd0322	[NPU] add split_qkv_tp_rmsnorm_rope ops for minimax2 & fix eagle3 hidden states capture in dp attn mode (#23190 )	2026-04-30 08:51:22 +08:00
Lianmin Zheng	e60c60eff0	[SWA] Fix missing mamba_indices parameter in cpu copy interface (#24026 )	2026-04-29 17:33:38 -07:00
Kangyan-Zhou	6575aea128	[CI] Fix black formatting on main (unblocks PR #21247 lint) (#24093 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 14:59:17 -07:00
Jimmy Shong	3d31ac2672	[Fix] FP8 Qwen3-Next quant error by removing fallback fused shards (#23973 )	2026-04-29 17:33:47 -04:00
jsheng_Linkedin	850021378a	[Score API] Hoist query placeholder scan and specialize PositionalEmbeds stacking (#23513 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 13:51:53 -07:00
Qiaolin Yu	79dbfe4505	Use spec v2 by default (#21062 )	2026-04-29 13:40:42 -07:00
Zhongdongming Dai	7389743d85	feat: Support modelexpress p2p RDMA transfer (#23105 )	2026-04-29 12:57:40 -07:00
jsheng_Linkedin	db84a8ebbb	[Model] Qwen3ForPooledOutput: forward get_input_embeddings to inner model (#23434 )	2026-04-29 12:25:06 -07:00
Chang Min Bark	3272af2f00	[Apple Silicon] [MLX] MLX decode partial overlap scheduling for generation (async eval) (#22416 ) Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com> Co-authored-by: Alex Nails <alex.nails@radixark.ai>	2026-04-29 12:21:14 -07:00

8078 Commits