sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-04 06:17:17 +00:00

Author	SHA1	Message	Date
PiteXChen	dc7bdc7329	bugfix[schedule]: Excessive preemption occurs when preempting running requests to schedule new prefill requests. (#12494 ) Signed-off-by: CLFutureX <chenyongqyl@163.com>	2025-11-30 22:29:26 +08:00
Liangsheng Yin	0a9d64530d	Support grammar + spec + reasoning (#14163 )	2025-11-30 21:19:57 +08:00
fzyzcjy	340c613ab5	Support numactl bind for CPU and memory before process starts (#14156 )	2025-11-30 17:00:33 +08:00
fzyzcjy	36b729c2b8	Implement profiler v2 and fix stage mixture bug (#14148 )	2025-11-30 16:59:52 +08:00
Tianhao Zhou	67e6ef4b2d	feat: longcat flash add aux layers capture for eagle3 (#14161 )	2025-11-30 00:50:55 -08:00
strgrb	65ba5ab8b1	add cpp files for cpp_radix_tree to pyproject.toml. (#14052 )	2025-11-30 13:05:04 +08:00
WenhaoZhang	990023e59b	[diffusion] lora: Fix LoRA weight merging for torch.nn.Linear layers from diffusers modules (#14150 ) Co-authored-by: niehen6174 <niehen.6174@gmail.com>	2025-11-30 12:44:12 +08:00
fzyzcjy	0ae4b1ad81	Show errors when misusing env variables (#14154 )	2025-11-30 10:57:35 +08:00
fzyzcjy	94cd64a7b0	Support checking fp8 params in weight_checker (#14147 )	2025-11-30 09:08:59 +08:00
fzyzcjy	b870271a50	Fix spec v2 does not support RL update weights from tensor (#14146 )	2025-11-30 09:08:05 +08:00
fzyzcjy	22ee9b0111	Super tiny add more info in dumper (#14145 )	2025-11-30 09:07:39 +08:00
fzyzcjy	9d0e5f1f74	Tiny fix DeepGEMM precompile rank check (#14136 )	2025-11-30 09:07:17 +08:00
Kangyan-Zhou	1d3d8b3418	Fix Minimax M2 loading issue (#13956 )	2025-11-29 17:07:19 -05:00
Lianmin Zheng	155a9e7237	Fix condition for streaming output_ids in tokenizer manager (#13759 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-11-29 13:56:15 -08:00
gongwei-130	3339c81072	fix RuntimeError: RMSNorm failed with error code an illegal memory access was encountered (#14135 )	2025-11-29 12:17:41 -08:00
Yuhao Yang	f03ea34a3d	add runtime check for PyTorch 2.9.1 + CuDNN < 9.15 to prevent Conv3d performance issues (#14119 )	2025-11-29 10:05:54 -05:00
fzyzcjy	4cafc835d3	Super tiny fix typo (#14131 )	2025-11-29 21:08:31 +08:00
Mick	c6a52f4411	[diffusion] chore: add resolution shortcuts for sampling params (#14129 )	2025-11-29 18:00:21 +08:00
elvischenv	848ee57067	feat: support flashinfer kernel autotune (#12306 ) Co-authored-by: Qiaolin Yu <liin1211@outlook.com> Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>	2025-11-29 00:05:37 -08:00
Cheng Wan	0fe74af563	Remove incorrect deep_gemm assertions from server_args.py (#14113 ) Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>	2025-11-28 20:25:39 -08:00
Mick	0a362d653f	[diffusion] log: unify generation performance logging (#14117 )	2025-11-29 12:21:59 +08:00
fjybiocs	143b57b805	enable piecewise cuda graph for prefill server (#13377 ) Co-authored-by: serverance.fu <serverance.fu@temu.com>	2025-11-29 12:09:26 +08:00
Yan Ru Pei	f446b51c41	fix: malformed KV events for NVIDIA Dynamo (#13488 ) Signed-off-by: PeaBrane <yanrpei@gmail.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-11-28 14:55:20 -08:00
Tomer Shmilovich	11b6217aee	Fix NIXL OBJ desciptors (#10712 ) Co-authored-by: Tomer Shmilovich <tshmilovich@login-eos01.eos.clusters.nvidia.com>	2025-11-28 11:32:07 -08:00
Yuhao Yang	841eb29d3d	[diffusion] model: support z-image (#14067 )	2025-11-28 21:48:31 +08:00
fzyzcjy	45cf575852	Fix overlap scheduler not take effect when outputing logprobs (#14096 )	2025-11-28 18:15:56 +08:00
Mick	0e8ce1e832	[diffusion] refactor: clean useless files (#14094 )	2025-11-28 18:14:00 +08:00
Lzhang-hub	ea1e9f6b3c	feat: support qwen3_vl vision model dp (#13724 )	2025-11-28 17:29:07 +08:00
Lzhang-hub	f6e37d3edb	[Bugfix] qwen2.5-vl spec decode accept_len low (#13904 ) Co-authored-by: Yuan Luo <yuan.luo@hotmail.com>	2025-11-28 17:26:32 +08:00
vipwangerxiao	ab9a46d462	Support configuring the request limit per receiving poll (#14076 ) Co-authored-by: Peng Wang <peng_wang@linux.alibaba.com> Co-authored-by: Feng Su <225349073+sufeng-buaa@users.noreply.github.com>	2025-11-28 16:14:21 +08:00
shuwenn	621061f017	[Bugfix] input prompt was not logged (#13936 )	2025-11-28 16:00:51 +08:00
Aleksandr Krotov	7daddcdb58	Fix structural_tag tool call with null schema (#14006 )	2025-11-27 23:04:16 -08:00
Mick	951028968c	[diffusion] refactor: refactor ComponentLoader and support loading native models from diffusers and transformers (#13205 )	2025-11-28 14:17:32 +08:00
Mick	3543a04a48	[diffusion] refactor: refactor condition image resize logic (#14079 )	2025-11-28 14:06:34 +08:00
fzyzcjy	21af8e73ad	Super tiny add comments to SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK (#14048 )	2025-11-27 22:16:43 +08:00
Baizhou Zhang	7ab548ef64	[2/2] Refactor DeepGeem requant for FP8 FusedMoE on Blackwell (#13960 )	2025-11-27 09:00:26 -05:00
Yixin Dong	6350042696	feat: Naive support Spec V2 + Constrained Decoding (#13425 ) Signed-off-by: Ubospica <ubospica@gmail.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2025-11-27 20:31:46 +08:00
fzyzcjy	25758647b1	Support sanity checking weight consistency especially for RL (#13854 )	2025-11-27 20:25:12 +08:00
fzyzcjy	2bc8ee8b74	Tiny support 3D tensors in inverse_transform_scale_ue8m0 (#14002 )	2025-11-27 20:20:45 +08:00
Jimmy	ab843ced31	[Feat]Add scheduler recv skipper weights to environment configuration (#13855 )	2025-11-27 18:16:11 +08:00
Mick	6edffc6391	[diffusion] perf: improve black-forest-labs/FLUX.2-dev (#14040 )	2025-11-27 14:49:52 +08:00
gaopengff	077ca70ee4	[Intel XPU]Add xpu support for get_device_memory_capacity (#13895 )	2025-11-26 20:55:52 -08:00
Qiaolin Yu	7cb04dc0e5	Use trtllm mha decode kernel for target_verify in speculative decoding (#13976 ) Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>	2025-11-26 20:40:34 -08:00
sunxxuns	5443db8759	fix: Fix AMD CI failures with HIP layernorm and PyPI connectivity (#13814 ) Co-authored-by: root <root@mi300x8-005.atl1.do.cpe.ice.amd.com>	2025-11-27 11:30:37 +08:00
Stefan He	9f340ab1fb	[Piecewise] support disable decode cuda graph when enable piecewise cuda graph (#13965 )	2025-11-26 18:35:59 -08:00
alisonshao	6330d6641b	Fix flashinfer cutlass MoE output shape for non-FP4-packed inputs (#14028 )	2025-11-26 18:09:02 -07:00
Sam	91e8dc371a	[Feat][NVFP4] Enable NVFP4 MoE for Qwen series models (eg. Qwen3-Next) #13761 (#13761 ) Co-authored-by: Kaixi Hou <kaixih@nvidia.com>	2025-11-26 17:53:45 -07:00
Lianmin Zheng	231df4b0d4	Cleanup server args (#14027 )	2025-11-26 16:32:41 -08:00
ShawnY112358	5155016b56	[feat] update bucketed weights from distributed (#13824 ) Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-11-26 15:30:45 -08:00
Netanel Haber	082b54c689	Support nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 (and nvidia/C-RADIOv2-H) (#12277 )	2025-11-26 16:28:52 -07:00

4840 Commits