sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 04:08:10 +00:00

Author	SHA1	Message	Date
Liangsheng Yin	cd2d45e220	Isolate spec V1 path in decode post-processing (#22146 )	2026-04-05 03:16:56 -07:00
R0CKSTAR	10b18b8b29	[diffusion] Add is_float64_supported to Platform (#22112 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-04-05 18:12:28 +08:00
iLeGend	5a35316417	Enable IndexCache for DeepSeek V3.2 (#21405 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-04-05 02:45:58 -07:00
Baizhou Zhang	088203454b	[Fix] Fix nightly tests (#22140 )	2026-04-05 02:26:42 -07:00
Liangsheng Yin	bd6a585605	Consolidate reasoning tests into test/registered/reasoning/ (#22139 )	2026-04-05 01:09:11 -07:00
Yuhao Yang	2b119ba388	[diffusion] fix: fix accuracy for flux series (#22059 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-05 16:03:17 +08:00
Xiancheng Meng	71544f0341	[model] support voxtral (speech-to-text) (#21635 ) Co-authored-by: mengxiancheng03 <mengxiancheng03@kuaishou.com>	2026-04-05 15:46:30 +08:00
Liangsheng Yin	904bb476d8	Migrate reasoning_tokens tests to existing server fixtures (#22102 )	2026-04-05 00:30:57 -07:00
RoyWang	dd49127fe6	[AMD]: Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5… (#21213 ) Co-authored-by: RoyWang <RoyWang@amd.com>	2026-04-04 22:13:29 -07:00
Ricardo-M-L	70658bfeb5	fix: add missing f-string prefixes in warning and assert messages (#22067 ) Co-authored-by: yuj <yuj@ztjzsoft.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 17:20:16 -07:00
Mick	efee62efa6	[diffusion] CI: improve diffusion comparison benchmark setting for realistic perf and auto-discover ut (#22086 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-04 23:20:37 +08:00
Xiaoyu Zhang	0f0f004f1f	[Benchmark] Add auto benchmark tool with YAML-driven server flag search and canonical dataset format (#21736 )	2026-04-04 21:46:58 +08:00
Xiaoyu Zhang	da25b471e3	Align diffusion nightly presets and broaden skill discovery (#22099 )	2026-04-04 21:43:52 +08:00
Liangsheng Yin	abc297521f	Fix killall_sglang missing the main sglang serve process (#22103 )	2026-04-04 03:43:08 -07:00
Muqi Li	1ad6839659	[Feature] Add Reasoning Tokens Usage (#15562 ) Signed-off-by: Muqi Li <muqi1029@gmail.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Mufeez Amjad <mufeez.amjad@outlook.com> Co-authored-by: cklxx <1293822641@qq.com> Co-authored-by: hnyls2002 <lsyincs@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2026-04-04 02:18:10 -07:00
Baizhou Zhang	bf984ae65d	Revert "[Bugfix] Temporarily skip TRTLLM attention on (G)B300 (SM103) to avoid high-concurrency hang" (#22098 )	2026-04-04 02:17:19 -07:00
sglang-bot	46bf19cdab	chore: bump flashinfer version to 0.6.7.post2 (#22097 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-04-04 02:16:25 -07:00
narutolhy	24763256b9	[Speculative Decoding] Add FA4-based Spec Support (#21080 ) Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>	2026-04-04 02:09:45 -07:00
Yuhao Yang	34d5765e2f	[VLM] Chunk-aware ViT encoding with per-image cache and lazy device transfer (#22038 )	2026-04-04 16:55:17 +08:00
Piotr Mazurek	b5e8c4b9e3	model: support LFM2-VL (Liquid Foundation Model 2 Vision-Language) (#21230 ) Co-authored-by: Piotr Mazurek <piotr.mazurek@liquid.ai>	2026-04-04 16:36:04 +08:00
R0CKSTAR	1fb4bf3558	[diffusion] fix: validate attention backend for Ring Attention in USPAttention (#21828 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-04-04 16:24:02 +08:00
harrisonlimh	9fa12d605a	Add dsv3 router gemm benchmark on blackwell (#17707 )	2026-04-04 01:18:01 -07:00
Xiaoyu Zhang	82ea4906cf	[diffusion] Default NVFP4 to CUTLASS and add all-model shape benchmarks (#22091 )	2026-04-04 16:14:38 +08:00
Ethan (Yusheng) Su	ff8e47edf9	[5/n] Lora support cuda graph (#21647 )	2026-04-04 00:31:46 -07:00
Douglas Yang	a94c3804c2	fix: mistral embedding regression fix (#21913 )	2026-04-04 00:11:51 -07:00
Chi McIsaac	005e582d06	[diffusion] improve: norm fusion for z-image (#18762 ) Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com> Co-authored-by: yihanc <yingluosanqian@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-04 14:01:01 +08:00
Qiaolin Yu	ef13031243	Tiny fix step3.5-flash launch crash (#22076 )	2026-04-03 22:25:25 -07:00
Ziang Li	990c7590b8	[RL] Support mxfp8 DeepSeek V3 (#21280 )	2026-04-03 21:57:45 -07:00
faceless void	de9859073f	Add `--stream-response-default-include-usage` server flag (#16711 )	2026-04-03 21:36:00 -07:00
CHEN Xi	31c9d8e885	[Diffusion] Fix weight scale swizzle and add large-M kernel config for FLUX.2-dev-NVFP4 (#22064 )	2026-04-04 11:50:30 +08:00
Yilong Zhao	fe92f3563c	dp: add profile req hook (#22083 )	2026-04-03 20:47:09 -07:00
Yuxuan Zhang	b7ae3b5a9a	GLM-4.7 and GLM-4.7-Flash Loading and import format (#21851 )	2026-04-03 20:44:08 -07:00
Prozac614	db3d4f4b76	[diffusion] model: support two stage pipeline of LTX-2 (#20707 ) Co-authored-by: daiweitao <dwti614707404@163.com> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: GMI Xiao Jin <xiao.j@gmicloud.ai>	2026-04-04 09:37:28 +08:00
Liangsheng Yin	95cdbce34f	[Test] Extract common PD server setup into base fixture (#22080 )	2026-04-03 16:37:12 -07:00
Lawrence Wu	9593d434c4	fix: pause_generation should not populate running_batch on prefill nodes (#20273 )	2026-04-03 16:16:06 -07:00
Sundara Raman Ramachandran	90e86800f4	[Score API] Implement EngineScoreMixin for scoring functionality and refactor Tok… (#21342 )	2026-04-03 15:17:42 -07:00
Baizhou Zhang	ac1e437f6a	Revert "[Feature] JIT activation and update skills (by codex)" (#22078 )	2026-04-03 15:04:15 -07:00
Mohammad Miadh Angkad	8cb337c8ea	[Bugfix] Temporarily skip TRTLLM attention on (G)B300 (SM103) to avoid high-concurrency hang (#21906 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-04-03 14:19:13 -07:00
Yz Xiao	1d7a53dd03	[Fix] XGrammarGrammarBackend reset to clear inherited cache (#22054 )	2026-04-03 14:17:59 -07:00
sglang-bot	84118acf50	chore: bump sglang-kernel version to 0.4.1 (#22009 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-04-03 13:58:35 -07:00
Lianmin Zheng	eb407b80f3	[Kernel] Make FA3/FA4 imports lazy in FlashAttentionBackend (#22028 )	2026-04-03 13:49:00 -07:00
Brayden Zhong	6aafe756b9	Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+… (#22047 )	2026-04-03 13:12:30 -07:00
Shiyan Deng	0c9dc098e7	Fix DP attention worker port binding for IPv6 support (#21917 ) Signed-off-by: Shiyan Deng <dsy842974287@meta.com>	2026-04-03 12:39:39 -07:00
Zhangheng	ed3435e37f	[HiSparse]: Optimize server args checking-HiSparse is temporarily only available for DSA models. (#22065 )	2026-04-04 02:23:56 +08:00
Mick	151f727163	[diffusion] fix: fix gated repo failing the generate cmd (#22040 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-04 00:43:11 +08:00
DarkSharpness	44e5d35703	[Feature][JIT Kernel] JIT activation and update skills (by codex) (#21766 ) Co-authored-by: weiminc <tnwilly@gmail.com>	2026-04-03 23:28:54 +08:00
Mick	030fb1c4b1	refactor: replace mm_inputs dict with MultimodalProcessorOutput (#21738 )	2026-04-03 23:26:37 +08:00
Ke Bao	9f409d0749	[CI] Adjust CI server launch timeout (#22045 )	2026-04-03 22:38:07 +08:00
Xiaoyu Zhang	ee9d922f5a	Revert "[Kernel] Fuse temperature + softmax in sampling for decode speedup" (#22046 )	2026-04-03 21:32:08 +08:00
Kangyan-Zhou	56ac9c9932	[Fix] Add _MOE_TP to graph_capture for MoE models with ep>1 (#21907 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-04-03 02:33:16 -07:00

7855 Commits