sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 04:08:10 +00:00

Author	SHA1	Message	Date
Mick	9a0fd2ff0c	[diffusion] optimize: default to in-memory loading for URL/base64 image inputs (#23118)	2026-04-20 23:29:02 +08:00
Mick	0be6ab04dd	[diffusion] refactor: LTX2.3 code cleanup (#23207)	2026-04-20 19:02:05 +08:00
Vladislav Nosivskoy	4028a73c10	[KV-Events] Fix kv events events publishing for CP (#22983) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>	2026-04-20 17:34:38 +08:00
Bingxu Chen	69eb95f20c	[AMD] Pin peft<0.19 in pyproject_other.toml to fix ROCm CI ImportError (#23161) Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-19 23:43:56 -07:00
Liangsheng Yin	a2d30d27fe	wait for reap in kill_process_tree (#23213)	2026-04-19 23:36:33 -07:00
Bingxu Chen	ab936ce694	Revert "perf: optimize PCG inductor path for FP8 models (#21734)" (#23159) Feel free to PR again.	2026-04-19 23:32:50 -07:00
Alex Nails	10e17cc55e	[gRPC] Native gRPC server: proto + Rust crate scaffold + server args (#22736)	2026-04-20 12:39:35 +08:00
Baizhou Zhang	c304d0d64d	[Refactor] Deduplicate NSA utils.py into cp_utils.py for context parallel (#22914) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-19 21:35:35 -07:00
Liangsheng Yin	eb76aaba88	[core] Always-on `StreamingSession` in `UnifiedRadixCache` (#23202)	2026-04-19 21:19:43 -07:00
Shunkangz	e389a52cc8	Support allreduce fusion with cp (#21249) Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2026-04-19 21:06:00 -07:00
Liangsheng Yin	a7276b623e	integrate streaming session into UnifiedRadixCache (#23145)	2026-04-19 20:47:41 -07:00
Byron Hsu	1cff871c67	[Bugfix] Fix DeepEP timeout when compiling DeepGeMM in EP+DP+TP (#23185) Co-authored-by: Byron Hsu <byronhsu@Byrons-MacBook-Pro.local> Co-authored-by: Cheng Wan <ch-wan@users.noreply.github.com>	2026-04-19 17:36:11 -07:00
Liangsheng Yin	d3ce664612	move session to python/sglang/srt/session (#23144)	2026-04-19 17:34:19 -07:00
Baidu-AIAK	7ca3566130	Multi platform Plugin (#21388) Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0183.tjzj.baidu.com> Co-authored-by: Alex Nails <alex.nails@radixark.ai> Co-authored-by: Alex Nails <alexj.nails@gmail.com> Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0000.tjzj.baidu.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-19 17:23:51 -07:00
inkcherry	4fa3482180	[Bugfix] Add missing http_worker_ipc in session error path (#22766)	2026-04-19 12:47:09 -07:00
billishyahao	b74a9dd854	[AMD] fix tbo runtime error when initializing metadata for cuda graph (#22598)	2026-04-19 12:42:48 -07:00
Baizhou Zhang	6ecd6f84db	[CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119) Co-authored-by: Kangyan Zhou <zky314343421@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alison Shao <a.shao@wustl.edu> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-19 05:32:36 -07:00
Thomas Wang	03828f4205	[AMD] Reduce NSA indexer kernels (weights_proj, k-cache store kernel fusion) (#22850)	2026-04-19 00:18:11 -07:00
Kehan Li	2a327f0877	Fix Qwen3.5 video processing when passing video_data in "processor_output" format (#22431)	2026-04-19 00:04:01 +08:00
Xiaoyu Zhang	cd6ad80c00	diffusion: add HunyuanVideo GroupNorm+SiLU fast path (#22814)	2026-04-18 23:38:49 +08:00
Xiaoyu Zhang	c6a45fab64	Qwen3next flashinfer allreduce auto enable (#22664)	2026-04-18 22:32:41 +08:00
Yisheng Gong	4839cecbb0	[main] chore: add bias for base layer with lora (#22169)	2026-04-18 02:07:02 -07:00
Mick	0d94c3366a	[diffusion] feat: introduce ltx-2-two-stage device manager (#22869)	2026-04-18 11:04:33 +08:00
Xiaoyu Zhang	615d6c93b2	[codex] Add flashinfer TRTLLM backend for diffusion NVFP4 (#22717)	2026-04-18 09:06:28 +08:00
Lianmin Zheng	9c47bbad13	Clean up bench_one_batch warning and simplify norm dispatch (#23110)	2026-04-17 17:42:20 -07:00
Cheng Wan	5f7aee726a	refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 17:05:13 -07:00
R0CKSTAR	26ae7b8bd7	[MLX] Support radix cache (#21509) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>	2026-04-18 07:00:50 +08:00
Liangsheng Yin	09b689b407	Apply HF transformers patches from sglang init (#23103)	2026-04-17 15:37:51 -07:00
Liangsheng Yin	573e12a7fc	Merge /get_load into /v1/loads (#23010)	2026-04-17 13:36:51 -07:00
Lianmin Zheng	44e67c6835	Remove deprecated double sparsity feature (#23009)	2026-04-17 13:33:12 -07:00
andyluo7	9df6107dca	[AMD] Enable DFLASH speculative decoding on ROCm (#22342) Signed-off-by: Andy Luo <andyluo7@users.noreply.github.com> Co-authored-by: Andy Luo <andyluo7@users.noreply.github.com>	2026-04-17 13:10:14 -07:00
shuwenn	90c76d665e	[HiCache] fix: HiCacheFile component key suffixing (#22891) Co-authored-by: Zhangheng <hzh0425@apache.org>	2026-04-17 13:06:28 -07:00
YC Yen-Ching Tseng	5d4e899477	[AMD] Fix AMD Multimodal Test - skip nvfp4 tests (#23045)	2026-04-17 09:02:39 -07:00
Jincong Chen	2bac219d0c	[Perf] Precompute gemma_weight to avoid redundant add on every forward (#22673)	2026-04-17 23:37:41 +08:00
Xiaoyu Zhang	83c5119d01	[diffusion] CI: fix ModelOpt B200 CI artifact coverage (#22955)	2026-04-17 23:33:42 +08:00
Mick	5de89ea942	[diffusion] CI: fix auto-partition (#23076)	2026-04-17 22:37:24 +08:00
Opher Lieber	6e3bbef568	expose num_embeddings in VocabParallelEmbeddingWithLoRA (#22547)	2026-04-17 02:35:13 -07:00
Jonah Bernard	0d031335ed	[Pipeline Parallelism][Bug] Fix scheduler hang in pipeline parallelism setup (#23006)	2026-04-17 14:50:47 +08:00
Duyi-Wang	8c190f6b91	[AMD] Add SGLANG_MORI_MOE_MAX_INPUT_TOKENS to truncate dispatch before MoE. (#22952)	2026-04-16 23:40:15 -07:00
RichardoMu	7390eddf28	feat(observability): add OpenTelemetry tracing for speculative decoding (#19545) Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>	2026-04-17 14:01:58 +08:00
narutolhy	5fa0c6a52e	Allow piecewise CUDA graph with speculative decoding (#22128) Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 13:39:30 +08:00
Xiaoyu Zhang	91679d935d	[codex] Update diffusion skills (#23028)	2026-04-17 13:29:26 +08:00
blzheng	0dcfae5553	[CPU] Add gemma4_rmsnorm_cpu kernel (#22842) Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-04-17 13:03:16 +08:00
YC Yen-Ching Tseng	f0f0148167	Revert "feat: Support MXFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#19143)" (#23031)	2026-04-16 21:53:25 -07:00
Zhangheng	7d47f40a96	[UnifiedRadixTree]: Add HiCache hook interface for TreeComponent (#22924)	2026-04-17 12:09:41 +08:00
Byron Hsu	cf9845f8e3	[Bug Fix] Ensure prefill_info_table is populated before honoring disagg_prefill_dp_rank (#22990) Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>	2026-04-17 11:10:31 +08:00
Jan Bernlöhr	04a53955b9	feat: add coordinated checkpoint prefetch for network filesystem loading (#20843)	2026-04-16 20:08:19 -07:00
Yuhao Yang	a77abbe005	[VLM] Reduce GPU memory footprint of CUDA IPC MM feature transport (#22662)	2026-04-17 10:38:36 +08:00
Yuxuan Zhang	16d11c2a10	Fix for the low-probability garbled output issue in the GLM-5 series models. (#22811)	2026-04-17 09:52:13 +08:00
Makcum888e	e353630b57	[Diffusion] [NPU] Fix multimodal gen CI (#22879)	2026-04-17 04:09:44 +03:00

7834 Commits