sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 13:57:04 +00:00

Author	SHA1	Message	Date
Liangsheng Yin	00d620b77d	introduce arg_groups/ with nemotron_h hook (#24328 )	2026-05-03 16:28:11 -07:00
Liangsheng Yin	c3b6d20a80	Register deepseek_v32 alias instead of rewriting config.json (#24295 )	2026-05-03 16:02:17 -07:00
Zhangheng	9a5450ad73	[PD]: Support incremental transfer for mooncake transfer engine (#24257 ) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-05-04 00:57:59 +08:00
Chi McIsaac	62265ca7fc	[diffusion] feat: initial support for dynamic batching (#18764 ) Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com> Co-authored-by: Junhao Liu <junhaoliu2023@gmail.com>	2026-05-04 00:44:42 +08:00
Xiaoyu Zhang	f2d1390909	[Diffusion] Add Qwen Image ModelOpt FP8 support (#23155 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-05-04 00:24:22 +08:00
Mick	5925572c95	[diffusion] CI: switch CI data references to sgl-project/ci-data (#24299 )	2026-05-03 23:05:12 +08:00
Zhangheng	c0f5950636	[UnifiedRadixTree]: Support HiCache Framework for UnifiedRadixTree (#23316 ) Co-authored-by: JINZ <1023553676@qq.com> Co-authored-by: diemchai <diemchai@tencent.com>	2026-05-03 22:13:22 +08:00
GXIN	e37f46fcf7	[NPU] Fix Z-Image negative-branch rotary embeddings for CFG (#23538 ) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-05-03 16:18:26 +03:00
Zhangheng	44ca2d01fc	[pd]: (Bug Fix) Incorrect out_cache_loc slicing in prepare_for_prebuilt (#24230 ) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-05-03 18:35:16 +08:00
Mick	2bfc5d3bb1	[diffusion] optimize LTX2.3 HQ denoising split passes (#24298 )	2026-05-03 16:37:46 +08:00
Liangsheng Yin	fcc8b7b126	Rename SGLANG_USE_JIT_ALL_REDUCE to SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2 (#24297 ) Co-authored-by: DarkSharpness <2040703891@qq.com>	2026-05-02 23:59:46 -07:00
Glen Liu	76b9c8de6f	[Feature] add LoRADrainer to address high P99 TTFT (#17913 )	2026-05-02 16:13:43 -07:00
Brayden Zhong	88bb5dffe4	[Dependency] Upgrade to Torch 2.11.0 (#21247 ) Co-authored-by: Kangyan Zhou <zky314343421@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> Co-authored-by: b8zhong <b8zhong@users.noreply.github.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-05-02 12:25:36 -07:00
Glen Liu	e0474fdd9b	throw ValueError for DoRA adapters (#22125 ) Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>	2026-05-02 14:54:19 +00:00
Xiaoyu Zhang	4128f1ffe2	[SKILLS] Tiny upgrade diffusion skills (#24273 )	2026-05-02 22:04:05 +08:00
Xiaoyu Zhang	b712dd48fe	[codex] diffusion: enable group norm silu fuse by default (#23148 )	2026-05-02 20:55:51 +08:00
Xiaoyu Zhang	1360848ee1	Optimize large GroupNorm SiLU apply (#23938 )	2026-05-02 20:54:46 +08:00
egvenediktov	83bf5d6869	[NPU]TP Communications compression For Qwen3 models for NPU (#20520 ) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-05-02 14:29:11 +03:00
Elizaveta Martirosian	ebbaab5597	[NPU] Add GitHub test summary and deduplicate test code. Part 1 (#23835 ) Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com> Co-authored-by: root <root@localhost.localdomain> Co-authored-by: Elizaveta Martirosian <you@example.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-05-02 14:18:18 +03:00
Liangsheng Yin	3259a2c789	Encode routed_experts in the detokenizer, off the tokenizer hot path (#24263 ) Co-authored-by: fzyzcjy <ch271828n@outlook.com> Co-authored-by: Yueming Yuan <yym022502@gmail.com>	2026-05-02 02:44:32 -07:00
Xiaoyu Zhang	589f90b368	[diffusion] chore: use lmsys as org for modelopt checkpoints (#23924 )	2026-05-02 17:18:58 +08:00
Alison Shao	f3dbadb82b	fix: accept 0-indexed safetensors shard names in CI weight validator (#24237 )	2026-05-02 00:58:15 -07:00
Kangyan-Zhou	2e72a36420	[CI] Restore SMG e2e on 2-gpu-h100 / 4-gpu-h100 runners (#24222 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 23:55:20 -07:00
Clay	5ec3b26799	[diffusion] model: support JoyAI-Image-Edit (#22625 ) Co-authored-by: chengyusong1 <chengyusong1@jd.com>	2026-05-02 14:08:57 +08:00
Kangyan-Zhou	cd27baaffd	[ci][cu13] Bump torch_memory_saver to 0.0.9.post1; restore manual tests (#23182 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 22:50:38 -07:00
Mick	b7d4647568	[diffusion] CI: change ground truth repo (#24219 )	2026-05-01 21:25:40 -07:00
Sam Shleifer	63f225ca2e	[session] fix mamba pool leak in StreamingSession.release_session + plumb idle leak check (#23496 )	2026-05-02 11:38:08 +08:00
Sam Shleifer	d41e8c459d	Support RunAI loading for quantized checkpoints (#23850 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Sam Shleifer <sam@thinkingmachines.ai>	2026-05-02 11:11:40 +08:00
Johnsonms	4c2ed9a254	Flux2 nvfp4 quantization correctness on Blackwell (B200) (#23625 )	2026-05-02 09:57:35 +08:00
Aurick Qiao	bfccc8e504	Allow configuring NIXL backend parameters from env (#24169 )	2026-05-01 18:30:43 -07:00
Mick	193b977572	[diffusion] chore: clean scheduler (#24229 )	2026-05-02 09:30:06 +08:00
Liangsheng Yin	cb8fbd53fc	Reserve slot 0 as padding in all req pools (#24243 )	2026-05-01 16:41:36 -07:00
Cheng Wan	b47fab6f5d	[bugfix] Support MIXED forward mode in TBO splitter for DP attention (#24241 )	2026-05-01 16:01:23 -07:00
Lucia Fang	05de73efd1	[core/model] Use explicit model arch for Llama4 attention backend auto-selection (#24232 )	2026-05-01 15:49:30 -07:00
Liangsheng Yin	8a530468fd	[Bug] Size mamba mappings from req pool, not mamba pool (#24244 )	2026-05-01 15:45:20 -07:00
Yuxuan Zhang	79bc2505a5	[Bug Fix] Resolve EAGLE cuda graph IMA under PD + DP + MTP with GLM-5.1 (#23037 )	2026-05-01 13:53:52 -07:00
Lucia Fang	b58fa60a1f	[core/attention] Add SGLANG_FLASHINFER_USE_PAGED env to force paged wrapper (#24165 )	2026-05-01 12:52:46 -07:00
Lianmin Zheng	ece8a1a788	Refactor device timer, clean up metrics collector, and add fwd occupancy metric (#24197 )	2026-05-01 10:25:25 -07:00
JINZ	4a50cd781e	[BugFix][HiMamba] Fix host-protected node deletion in HiMamba tombstone del (#23696 ) Co-authored-by: diemchai <diemchai@tencent.com> Co-authored-by: Zhangheng <hzh0425@apache.org>	2026-05-01 21:57:47 +08:00
ishandhanani	5b7ce417d0	[P/D disagg] - support decode side radix cache (#19746 )	2026-05-01 21:55:34 +08:00
Cheng Wan	d48095ba53	Bypass torch.cuda.use_mem_pool generator-CM in SymmetricMemoryContext (#24190 )	2026-05-01 01:25:49 -07:00
Lianmin Zheng	d9e8a4a7f8	[SWA] Ensure we use pre-computed SWA cache location during prefill (#24138 ) Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Yinghai Lu <yinghai@meta.com>	2026-05-01 00:01:49 -07:00
Yanbin Jiang	8975479f87	[LoRA][MOE] Fix EP correctness in MoE LoRA slicing and virtual-experts kernels (#24171 )	2026-04-30 22:42:10 -07:00
Mick	9d84268705	[diffusion] refactor: introduce component residency manager (#23771 )	2026-05-01 11:10:41 +08:00
Cheng Wan	108bfd8b6a	[MoE] Add Aiter MoE runner backend and purge aiter.fused_moe from quant methods (#23597 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 19:50:52 -07:00
Yilong Zhao	f67292539f	spec: gate dp mlp sync with server args (#24177 )	2026-04-30 16:29:41 -07:00
Polisetty V R K Jyothendra Varma	da7f890788	[Intel GPU] Integrate flash_mla_decode in Intel XPU attention backend (#23557 ) Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-05-01 07:21:28 +08:00
shubham singhal	e35ac95cdc	[Test] Add XPU device support to unit tests (#22236 ) Co-authored-by: vshekhawat-hlab <vshekhawat@habana.ai> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-05-01 07:18:51 +08:00
Roopak Srivastava	9c5cad3914	Use device-agnostic helpers for Mamba tests and core ops (#20234 ) Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-05-01 07:14:53 +08:00
Kalyan Kumar	8a9e424faa	Replace hardcoded CUDA device with get_device() for XPU support (#13599 ) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-05-01 07:13:46 +08:00

8107 Commits