sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-02 21:37:11 +00:00

Author	SHA1	Message	Date
Lianmin Zheng	a80961333b	Clean up req_time_stats: reduce overhead and simplify (#22186 )	2026-04-06 14:20:51 -07:00
Qiaolin Yu	93f38fe410	tiny fix chain-style multi layer eagle comments (#22206 )	2026-04-06 13:49:03 -07:00
Tarushii Goel	8f337682bd	[sgl] potential chained spec v2 fixes (#22041 ) Co-authored-by: Mook <Godmook@users.noreply.github.com> Co-authored-by: yudian0504 <yudian0504@users.noreply.github.com>	2026-04-06 13:38:04 -07:00
Ratish P	7f2fcc0b08	[VLM]: allow Qwen3.5 models for encoder disaggregation (#21849 )	2026-04-07 02:07:24 +08:00
Aurick Qiao	3178f3959f	Align incremental streaming logprobs with streamed output tokens (#21583 )	2026-04-06 00:30:02 -07:00
Khoa Pham	12272b6791	[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton (#21425 )	2026-04-06 00:11:14 -07:00
Liangsheng Yin	6de2ff2a80	[Spec][Ngram] Followup fixes for `MatchState` incremental advance (#22180 )	2026-04-05 23:04:28 -07:00
YAMY	dc125afffb	Add staging buffer CI test and documentation for heterogeneous TP (#21921 ) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-04-06 14:00:20 +08:00
Khoa Pham	b2008bf9e0	[Spec][Ngram] 5/N: Store and advance anchor match state across decode steps (#21243 )	2026-04-05 22:21:05 -07:00
Mick	82c41a2d9e	[diffusion] model: support LTX2.3 (#22111 )	2026-04-06 12:26:30 +08:00
Qiaolin Yu	f407461ec8	Tiny fix trtllm_fp8_per_tensor_scale_moe_wrapper router_logits dtype (#22006 )	2026-04-05 21:11:45 -07:00
Lianmin Zheng	e835601fb7	Cache gfx95 quant format detection in DeepseekV2DecoderLayer (#22143 )	2026-04-05 20:20:54 -07:00
Bi Xue	52801ff20c	[sgl] two potential spec_v2 bug fixes (#21589 ) Co-authored-by: yilian49 <yilian49@users.noreply.github.com>	2026-04-05 19:41:43 -07:00
Prozac614	2f00e42555	[diffusion] CI: apply diffusers backend in lora case (#22157 ) Co-authored-by: daiweitao <dwti614707404@163.com>	2026-04-06 10:14:35 +08:00
Zhiqiang Xie	41c7c97ff3	fix hisparse LRU policy (#22170 ) Co-authored-by: huangtingwei9988 <huangtingwei9988@users.noreply.github.com> Co-authored-by: hzh0425 <hzh0425@users.noreply.github.com>	2026-04-05 18:47:58 -07:00
Kangyan-Zhou	93109cc89b	[Fix] Fix setuptools-scm version resolution for rc tags (#22165 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-04-05 16:55:32 -07:00
Zhiqiang Xie	30ba1f78b0	Hisparse Minor Fix (#22131 ) Co-authored-by: huangtingwei9988 <141888744+huangtingwei9988@users.noreply.github.com> Co-authored-by: hzh0425 <58988019+hzh0425@users.noreply.github.com>	2026-04-05 16:15:47 -07:00
Kangyan-Zhou	5dd2c243eb	fix: TRT-LLM MHA CUDA illegal address with EAGLE v2 + DP attention (#21649 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-04-05 09:41:14 -07:00
Baizhou Zhang	c5fa364b80	[Hotfix] Fix router gemm on sm103 (#22134 )	2026-04-05 09:33:14 -07:00
Zhangheng	51b276de74	[BugFix][RadixTree]: Fix backup invariant violation in Hi-MambaRadixTree (#22062 ) Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: linjianyu77@foxmail.com	2026-04-05 23:19:50 +08:00
Shangming Cai	dccb11881f	[PD] Fix staging warmup for GQA prefill decode different tp (#22153 )	2026-04-05 23:13:06 +08:00
Liangsheng Yin	df9c831ab8	Unify think_end_id to model_config as single source of truth (#22148 )	2026-04-05 03:35:38 -07:00
Liangsheng Yin	aeff9fb7c1	Add dump_metric to MMMU, lm-eval, and NeMo Skills eval paths (#22147 )	2026-04-05 03:23:52 -07:00
Liangsheng Yin	cd2d45e220	Isolate spec V1 path in decode post-processing (#22146 )	2026-04-05 03:16:56 -07:00
R0CKSTAR	10b18b8b29	[diffusion] Add is_float64_supported to Platform (#22112 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-04-05 18:12:28 +08:00
iLeGend	5a35316417	Enable IndexCache for DeepSeek V3.2 (#21405 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-04-05 02:45:58 -07:00
Baizhou Zhang	088203454b	[Fix] Fix nightly tests (#22140 )	2026-04-05 02:26:42 -07:00
Liangsheng Yin	bd6a585605	Consolidate reasoning tests into test/registered/reasoning/ (#22139 )	2026-04-05 01:09:11 -07:00
Yuhao Yang	2b119ba388	[diffusion] fix: fix accuracy for flux series (#22059 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-05 16:03:17 +08:00
Xiancheng Meng	71544f0341	[model] support voxtral (speech-to-text) (#21635 ) Co-authored-by: mengxiancheng03 <mengxiancheng03@kuaishou.com>	2026-04-05 15:46:30 +08:00
Liangsheng Yin	904bb476d8	Migrate reasoning_tokens tests to existing server fixtures (#22102 )	2026-04-05 00:30:57 -07:00
RoyWang	dd49127fe6	[AMD]: Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5… (#21213 ) Co-authored-by: RoyWang <RoyWang@amd.com>	2026-04-04 22:13:29 -07:00
Ricardo-M-L	70658bfeb5	fix: add missing f-string prefixes in warning and assert messages (#22067 ) Co-authored-by: yuj <yuj@ztjzsoft.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 17:20:16 -07:00
Mick	efee62efa6	[diffusion] CI: improve diffusion comparison benchmark setting for realistic perf and auto-discover ut (#22086 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-04 23:20:37 +08:00
Xiaoyu Zhang	0f0f004f1f	[Benchmark] Add auto benchmark tool with YAML-driven server flag search and canonical dataset format (#21736 )	2026-04-04 21:46:58 +08:00
Xiaoyu Zhang	da25b471e3	Align diffusion nightly presets and broaden skill discovery (#22099 )	2026-04-04 21:43:52 +08:00
Liangsheng Yin	abc297521f	Fix killall_sglang missing the main sglang serve process (#22103 )	2026-04-04 03:43:08 -07:00
Muqi Li	1ad6839659	[Feature] Add Reasoning Tokens Usage (#15562 ) Signed-off-by: Muqi Li <muqi1029@gmail.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Mufeez Amjad <mufeez.amjad@outlook.com> Co-authored-by: cklxx <1293822641@qq.com> Co-authored-by: hnyls2002 <lsyincs@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2026-04-04 02:18:10 -07:00
Baizhou Zhang	bf984ae65d	Revert "[Bugfix] Temporarily skip TRTLLM attention on (G)B300 (SM103) to avoid high-concurrency hang" (#22098 )	2026-04-04 02:17:19 -07:00
sglang-bot	46bf19cdab	chore: bump flashinfer version to 0.6.7.post2 (#22097 ) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-04-04 02:16:25 -07:00
narutolhy	24763256b9	[Speculative Decoding] Add FA4-based Spec Support (#21080 ) Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>	2026-04-04 02:09:45 -07:00
Yuhao Yang	34d5765e2f	[VLM] Chunk-aware ViT encoding with per-image cache and lazy device transfer (#22038 )	2026-04-04 16:55:17 +08:00
Piotr Mazurek	b5e8c4b9e3	model: support LFM2-VL (Liquid Foundation Model 2 Vision-Language) (#21230 ) Co-authored-by: Piotr Mazurek <piotr.mazurek@liquid.ai>	2026-04-04 16:36:04 +08:00
R0CKSTAR	1fb4bf3558	[diffusion] fix: validate attention backend for Ring Attention in USPAttention (#21828 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-04-04 16:24:02 +08:00
harrisonlimh	9fa12d605a	Add dsv3 router gemm benchmark on blackwell (#17707 )	2026-04-04 01:18:01 -07:00
Xiaoyu Zhang	82ea4906cf	[diffusion] Default NVFP4 to CUTLASS and add all-model shape benchmarks (#22091 )	2026-04-04 16:14:38 +08:00
Ethan (Yusheng) Su	ff8e47edf9	[5/n] Lora support cuda graph (#21647 )	2026-04-04 00:31:46 -07:00
Douglas Yang	a94c3804c2	fix: mistral embedding regression fix (#21913 )	2026-04-04 00:11:51 -07:00
Chi McIsaac	005e582d06	[diffusion] improve: norm fusion for z-image (#18762 ) Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com> Co-authored-by: yihanc <yingluosanqian@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-04 14:01:01 +08:00
Qiaolin Yu	ef13031243	Tiny fix step3.5-flash launch crash (#22076 )	2026-04-03 22:25:25 -07:00

7528 Commits