sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 04:08:10 +00:00

Author	SHA1	Message	Date
Thomas Wang	671fe73961	Reduce unnecessary kernels and copies in the NSA indexer (#22232 )	2026-04-07 15:37:08 -07:00
David Wang	f08726fd56	[Feature] Add DFLASH speculative decoding support (#22077 ) Co-authored-by: Jian Chen <141193260+jianc99@users.noreply.github.com> Co-authored-by: Zhijian Liu <5782437+zhijian-liu@users.noreply.github.com> Co-authored-by: Richard Gong <8001209+gongy@users.noreply.github.com> Co-authored-by: David Wang <21328423+dcw02@users.noreply.github.com> Co-authored-by: yilian49 <43861414+yilian49@users.noreply.github.com> Co-authored-by: xm:D <38322020+xiaomin-d@users.noreply.github.com>	2026-04-07 14:48:51 -07:00
Liangsheng Yin	cc35714b03	[tiny] migrate /get_server_info; print accept length in accuracy tests (#22282 )	2026-04-07 13:08:35 -07:00
Rain Jiang	1a8eb890f6	Kernels community fa3 (#20796 )	2026-04-07 12:48:44 -07:00
huangtingwei	0c204fbd57	[HiSparse] Optimize the scheduling of decode backup. (#21932 ) Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2026-04-07 10:34:58 -07:00
khalilzhk	6131fb5882	[NPU] enable mla prepare fused kernel only when being mla attn (#22024 )	2026-04-08 00:49:16 +08:00
Ke Bao	be42fbbbd7	Support HTTP2 server (#21700 )	2026-04-08 00:42:52 +08:00
shuwenn	ec5742f4ab	fix: Auto-correct page_size for Mamba no_buffer radix cache mode (#20538 )	2026-04-08 00:19:31 +08:00
Henson-Zh-Ali	727a182067	[Mamba] eliminate D2H if tracking mamba states (#20522 ) Co-authored-by: hzh0425 <hzh0425@apache.org>	2026-04-08 00:17:26 +08:00
YAMY	5ae00ecd48	[Disagg][NIXL] Support Mamba state slice transfer for heterogeneous TP (Step 2/2 for Qwen3.5) (#22240 )	2026-04-07 23:47:31 +08:00
Mick	e7bc23cdab	[diffusion] CI: fix consistency check (#22251 )	2026-04-07 23:43:18 +08:00
Yujun Dong	233f3e31bf	fix(pcg,mm): fix zeroing of input_embeds when replay PCG (#22229 )	2026-04-07 20:33:17 +08:00
Xingyu Liu	98f38b14df	Add registration API for external linear attention backend (#21983 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>	2026-04-07 02:47:40 -07:00
Nicolas Castet	490fa9fa44	[Perf] Restore torch.compile fusion for topk postprocessing (#21771 )	2026-04-07 01:38:38 -07:00
YAMY	3148742ddb	[Disagg][NIXL] Fix heterogeneous TP KV transfer for non-MLA models (same logic with mooncake, Step 1/2 for Qwen3.5 support) (#22145 )	2026-04-07 14:52:02 +08:00
amote-i	3f7dfba419	fix qwen2_5_math_rm_72b (#21295 )	2026-04-07 14:36:57 +08:00
Aditya Sharma	f6e85676b5	model: support qwen3-asr (#22073 ) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2026-04-07 13:27:05 +08:00
Chang Min Bark	a757c1e3fb	[Apple Silicon] [MLX] Add mlx and mlx-lm dependencies (#22162 ) Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>	2026-04-07 11:36:43 +08:00
Xinyuan Tong	2813cb6d9a	[New Model] Gemma 4 (#21952 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Pengyu Chen <pychen96@gmail.com> Co-authored-by: kpham-sgl <khoa.pham@radixark.ai> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Andy Luo <andy.luo@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: adarshxs <adarsh.shirawalmath@gmail.com>	2026-04-06 20:24:44 -07:00
jianzhao-xu	73fc87a74f	fix(grok): adapt huihui-ai/grok-2 (#21522 ) Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>	2026-04-07 10:04:41 +08:00
Lianmin Zheng	494bb86169	Cache sub-objects in __getitem__ to ensure identity stability (#22184 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 18:53:38 -07:00
Prozac614	ef2d4013d7	[diffusion] CI: add consistency test (#15236 ) Co-authored-by: daiweitao <dwti614707404@163.com>	2026-04-07 09:50:23 +08:00
Liangsheng Yin	e4b1366a46	[Spec][Ngram] Support multiple SAMs with dynamic HTTP API (#22203 )	2026-04-06 18:49:22 -07:00
Liangsheng Yin	49cb7d546e	Move hash utils out of hicache_storage to break CUDA import chain (#22214 )	2026-04-06 18:16:40 -07:00
AichenF	5e2b0f860c	[diffusion] perf: replace Conv3d with reshape + F.linear in PatchEmbed (#21014 )	2026-04-07 09:12:59 +08:00
shadowxz109	ae38b24cc3	[NPU] Support dp-attention for MiniMax2.5 (#20919 )	2026-04-07 08:55:37 +08:00
Trevor Morris	5cc246e095	Fix extra calls to get_numa_node_if_available to clean up logs (#21781 )	2026-04-06 16:18:40 -07:00
Lianmin Zheng	a80961333b	Clean up req_time_stats: reduce overhead and simplify (#22186 )	2026-04-06 14:20:51 -07:00
Qiaolin Yu	93f38fe410	tiny fix chain-style multi layer eagle comments (#22206 )	2026-04-06 13:49:03 -07:00
Tarushii Goel	8f337682bd	[sgl] potential chained spec v2 fixes (#22041 ) Co-authored-by: Mook <Godmook@users.noreply.github.com> Co-authored-by: yudian0504 <yudian0504@users.noreply.github.com>	2026-04-06 13:38:04 -07:00
Ratish P	7f2fcc0b08	[VLM]: allow Qwen3.5 models for encoder disaggregation (#21849 )	2026-04-07 02:07:24 +08:00
Aurick Qiao	3178f3959f	Align incremental streaming logprobs with streamed output tokens (#21583 )	2026-04-06 00:30:02 -07:00
Khoa Pham	12272b6791	[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton (#21425 )	2026-04-06 00:11:14 -07:00
Liangsheng Yin	6de2ff2a80	[Spec][Ngram] Followup fixes for `MatchState` incremental advance (#22180 )	2026-04-05 23:04:28 -07:00
YAMY	dc125afffb	Add staging buffer CI test and documentation for heterogeneous TP (#21921 ) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-04-06 14:00:20 +08:00
Khoa Pham	b2008bf9e0	[Spec][Ngram] 5/N: Store and advance anchor match state across decode steps (#21243 )	2026-04-05 22:21:05 -07:00
Mick	82c41a2d9e	[diffusion] model: support LTX2.3 (#22111 )	2026-04-06 12:26:30 +08:00
Qiaolin Yu	f407461ec8	Tiny fix trtllm_fp8_per_tensor_scale_moe_wrapper router_logits dtype (#22006 )	2026-04-05 21:11:45 -07:00
Lianmin Zheng	e835601fb7	Cache gfx95 quant format detection in DeepseekV2DecoderLayer (#22143 )	2026-04-05 20:20:54 -07:00
Bi Xue	52801ff20c	[sgl] two potential spec_v2 bug fixes (#21589 ) Co-authored-by: yilian49 <yilian49@users.noreply.github.com>	2026-04-05 19:41:43 -07:00
Prozac614	2f00e42555	[diffusion] CI: apply diffusers backend in lora case (#22157 ) Co-authored-by: daiweitao <dwti614707404@163.com>	2026-04-06 10:14:35 +08:00
Zhiqiang Xie	41c7c97ff3	fix hisparse LRU policy (#22170 ) Co-authored-by: huangtingwei9988 <huangtingwei9988@users.noreply.github.com> Co-authored-by: hzh0425 <hzh0425@users.noreply.github.com>	2026-04-05 18:47:58 -07:00
Kangyan-Zhou	93109cc89b	[Fix] Fix setuptools-scm version resolution for rc tags (#22165 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-04-05 16:55:32 -07:00
Zhiqiang Xie	30ba1f78b0	Hisparse Minor Fix (#22131 ) Co-authored-by: huangtingwei9988 <141888744+huangtingwei9988@users.noreply.github.com> Co-authored-by: hzh0425 <58988019+hzh0425@users.noreply.github.com>	2026-04-05 16:15:47 -07:00
Kangyan-Zhou	5dd2c243eb	fix: TRT-LLM MHA CUDA illegal address with EAGLE v2 + DP attention (#21649 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-04-05 09:41:14 -07:00
Baizhou Zhang	c5fa364b80	[Hotfix] Fix router gemm on sm103 (#22134 )	2026-04-05 09:33:14 -07:00
Zhangheng	51b276de74	[BugFix][RadixTree]: Fix backup invariant violation in Hi-MambaRadixTree (#22062 ) Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: linjianyu77@foxmail.com	2026-04-05 23:19:50 +08:00
Shangming Cai	dccb11881f	[PD] Fix staging warmup for GQA prefill decode different tp (#22153 )	2026-04-05 23:13:06 +08:00
Liangsheng Yin	df9c831ab8	Unify think_end_id to model_config as single source of truth (#22148 )	2026-04-05 03:35:38 -07:00
Liangsheng Yin	aeff9fb7c1	Add dump_metric to MMMU, lm-eval, and NeMo Skills eval paths (#22147 )	2026-04-05 03:23:52 -07:00

7855 Commits