sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 13:57:04 +00:00

Author	SHA1	Message	Date
cctry	22cf7d2b42	[Fix] Handle nixlRemoteDisconnectError in NixlKVSender (#24296 )	2026-05-05 17:23:42 -07:00
Lianmin Zheng	64f80eabbe	Register aten::rms_norm and aten::mm.dtype in batch invariant mode (#24459 )	2026-05-05 17:21:34 -07:00
Lianmin Zheng	46bde1f426	Add fwd_occupancy metric to SchedulerStats and Prometheus collector (#24458 )	2026-05-05 17:04:34 -07:00
Xinyuan Tong	1e404afec2	fix(req_pool): bump pool.size to match actual tensor row count after #24243 (#24439 )	2026-05-05 16:58:26 -07:00
Lianmin Zheng	710fed10fb	Revert "[fix] /pause_generation and /continue_generation wrong for --tokenizer-worker-num > 1" (#24461 )	2026-05-05 16:44:34 -07:00
maocheng23	431ca54334	[fix] /pause_generation and /continue_generation wrong for --tokenizer-worker-num > 1 (#24445 ) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-05 16:27:58 -07:00
Liangsheng Yin	08d4c2072b	move topk capturers to srt/state_capturer/ (#24450 ) Co-authored-by: Yueming Yuan <yym022502@gmail.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Co-authored-by: Ziang Li <ziangli@umich.edu>	2026-05-05 15:54:01 -07:00
Liangsheng Yin	47a416fc62	add indexer-topk capture (V3.2 NSA + infra) (#24392 )	2026-05-05 15:05:15 -07:00
Liangsheng Yin	c4c0376fcb	consolidate routed-experts capturer onto reusable base (#24403 )	2026-05-05 12:41:49 -07:00
Mick	d23ef408f7	[diffusion] fix: fix RowParallel LoRA merged forwarding (#24410 )	2026-05-06 00:30:16 +08:00
Mick	cc54d8e8d0	[diffusion] chore: clean CUDA cache only at explicit release points (#24397 )	2026-05-05 22:30:43 +08:00
Polisetty V R K Jyothendra Varma	fdfc46f3a5	[Intel GPU] Enable DeepSeek V3.2 inference on XPU (#24356 ) Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>	2026-05-05 20:47:40 +08:00
Zhangheng	e299ec1bff	[UnifiedRadixTree]: Fix flaky ci (#24421 )	2026-05-05 20:22:19 +08:00
Khoa Pham	d22853480d	Fix deterministic inference on models with `SWAKVPool` (#24395 )	2026-05-05 20:20:46 +08:00
Bi Xue	9fb9a1cca6	[sgl] expose swa and mamba cache metrics (#24396 )	2026-05-05 20:19:50 +08:00
Xiaoyu Zhang	67e8bd7a80	[codex] Optimize Helios fused norm modulation (#24059 )	2026-05-05 19:28:37 +08:00
Xiaoyu Zhang	8c703f215e	Add HunyuanVideo ModelOpt FP8 diffusion support (#23199 )	2026-05-05 19:27:28 +08:00
billishyahao	80ccb6b93c	[AMD] fix tbo specv2 seq_lens_cpu NoneType error (#24319 )	2026-05-05 01:54:43 -07:00
Mick	177babcc38	[diffusion] optimize: fuse LTX2 split rotary embedding (#24411 )	2026-05-05 16:07:40 +08:00
Hubert Lu	c2db19ffa4	[AMD] Enable EAGLE speculative decoding for Qwen3.5 FP8 and MXFP4 models with aiter's unified attention (#23146 ) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: sogalin <39478626+sogalin@users.noreply.github.com>	2026-05-05 00:09:40 -07:00
Mick	04926e1d9f	[diffusion] feat: cache encoder results for default negative prompt (#24304 )	2026-05-05 11:56:01 +08:00
Mick	e483e60b72	[diffusion] CI: pin diffusion consistency GT revision (#24400 )	2026-05-05 11:53:22 +08:00
Ethan (Yusheng) Su	2b769d37a4	(2/n - prefill optimize)perf(lora): remove GPU-CPU sync barrier (.item()) in MoE LoRA path and remove duplicate code (#24246 ) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-04 18:11:28 -07:00
Mick	2f7d99b7f7	[diffusion] cli: support component attention backend overrides (#24320 )	2026-05-05 08:39:27 +08:00
Xiaoyu Zhang	078f84d80d	[SKILL] Add diffusion benchmark presets for edit and Hunyuan3D models (#24288 ) Co-authored-by: BBuf Codex <bbuf-codex@users.noreply.github.com>	2026-05-05 08:18:12 +08:00
Ji Zeng	4b487ca98b	[Fix] NGRAMWorker.update_weights_from_tensor — delegate to target worker (#24344 )	2026-05-04 16:23:17 -07:00
Liangsheng Yin	6a62eabed6	consolidate NSA pool construction (#24389 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2026-05-04 16:04:31 -07:00
Tarushii Goel	d7c93e183b	[sgl] reduce specdec cpu overhead (#23321 )	2026-05-04 15:02:03 -07:00
Liangsheng Yin	4743cf6051	misc: add marlin to moe runner choices; drop dead env var doc (#24384 ) Co-authored-by: fzyzcjy <ch271828n@outlook.com>	2026-05-04 15:01:47 -07:00
Lianmin Zheng	29dd3a36c0	Refactor device timer installation and rename prefill prealloc to bootstrap (#24341 )	2026-05-04 13:57:13 -07:00
Vladislav Nosivskoy	60a1dacd89	[HiCache] return cached_tokens_details in sglext for streaming responses (#22055 ) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>	2026-05-04 12:30:17 -07:00
Ke Bao	6dd7aebb36	Minor scheduler fixes (#24359 )	2026-05-05 02:01:23 +08:00
Sam Shleifer	e6f252e9b8	Cache FlashInfer autotune configs (#24156 )	2026-05-05 02:00:40 +08:00
Yuan Luo	e5c58eb9d6	[VLM] Optimize Gemma4 VLM with PCG and fuse RMSNorm + residual add + scalar (#24048 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-05-04 09:36:26 -07:00
Mick	1be3163011	[diffusion] fix: use direct all-to-all for USP collectives (#24366 )	2026-05-05 00:08:48 +08:00
Xiaoyu Zhang	4b6d44641b	[diffusion] chore: enable channels-last 3D VAE convs by default (#23200 )	2026-05-04 22:59:31 +08:00
Zhangheng	05aed5e1d5	[UnifiedRadixTree]: Add KL accuracy CI for UnifiedTree with HiCache (#24346 )	2026-05-04 20:18:10 +08:00
Liangsheng Yin	84f3b44916	[tiny] misc cleanups across configs, attention, jit_kernel (#24350 )	2026-05-04 03:17:14 -07:00
Linzhang Li	952b3caf18	feat: use structural tags to enable strict tool calling and reasoning for more models (#21722 ) Signed-off-by: Yuchuan <yuchuan.7streams@gmail.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Ubospica <ubospica@gmail.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-05-04 02:30:28 -07:00
Khoa Pham	ef2b1b6d89	Fix flashinfer workspace OOM (#24172 )	2026-05-04 01:26:35 -07:00
Ke Bao	aea527afdc	Fix swa chunk req deferred (#24318 )	2026-05-04 14:52:15 +08:00
Liangsheng Yin	a91ae6af9e	nextn subclass owns post_load_weights is_nextn (#24333 )	2026-05-03 22:04:44 -07:00
Liangsheng Yin	1dd8f6d5ae	dedup state_kv_args setup into helper (#24340 )	2026-05-03 20:45:26 -07:00
Liangsheng Yin	91fa2340ed	extract adjust_hybrid_swa_layers_for_pp (#24334 )	2026-05-03 18:52:54 -07:00
Ethan (Yusheng) Su	b7fefc0e85	feat(lora): enable csgmv backend with virtual experts for MoE LoRA (#24007 )	2026-05-03 18:44:17 -07:00
Mick	c611a3fb78	[diffusion] chore: disable VAE cpu offload by default (#24315 )	2026-05-04 08:24:51 +08:00
Liangsheng Yin	00d620b77d	introduce arg_groups/ with nemotron_h hook (#24328 )	2026-05-03 16:28:11 -07:00
Liangsheng Yin	c3b6d20a80	Register deepseek_v32 alias instead of rewriting config.json (#24295 )	2026-05-03 16:02:17 -07:00
Zhangheng	9a5450ad73	[PD]: Support incremental transfer for mooncake transfer engine (#24257 ) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-05-04 00:57:59 +08:00
Chi McIsaac	62265ca7fc	[diffusion] feat: initial support for dynamic batching (#18764 ) Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com> Co-authored-by: Junhao Liu <junhaoliu2023@gmail.com>	2026-05-04 00:44:42 +08:00


8153 Commits