sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-04 06:17:17 +00:00

Author	SHA1	Message	Date
zackyoray	d275d47973	[NIXL] Add custom NIXL backend selection for KVManager (#17146) Signed-off-by: Yoray Zack <yorayz@nvidia.com>	2026-01-26 14:35:38 +08:00
Yuan Luo	1e8db18290	[Kimi-Linear] Remove duplicated code in kimi-linear (#17731)	2026-01-26 14:20:24 +08:00
chenxu214	444b9521e4	[Bugfix]Repeated add modelslim quant_config and bugfix with "enable-piecewise-cuda-graph" on NPU (#17511)	2026-01-26 09:51:07 +08:00
Kangyan-Zhou	592603d77b	Fix flaky streaming logprobs test by handling detokenizer text buffering (#17687) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2026-01-25 15:09:06 -08:00
Kangyan-Zhou	344eeaee90	Upload nightly test metrics to GH artifacts (#17696)	2026-01-25 14:35:14 -08:00
Kangyan-Zhou	8d3e1ac0c8	Add an all type in pyproject.tml to include diffusion support (#17697)	2026-01-25 12:52:13 -08:00
Kangyan-Zhou	9123491430	A few updates to the night tests (#17694)	2026-01-25 11:20:17 -08:00
HandH1998	a883906a24	Support mxint4 flashinfer_trtllm moe gemm (#16892)	2026-01-26 00:15:53 +08:00
Mick	b105dad5da	[diffusion] refactor: remove useless lazy-import cache-dit codes (#17659)	2026-01-25 22:43:22 +08:00
Zhengbo Wang	fb61164f27	[Refactor] Use is_in_ci() utility in JIT kernel benchmarks (#17118)	2026-01-25 20:40:47 +08:00
xjx471258437	9bd92ba0f6	Support PD disaggregation with different TP/DP size for Qwen3-Next (#16056) Co-authored-by: xjx392321 <xjx392321@alibaba-inc.com>	2026-01-25 15:34:02 +08:00
Ke Bao	30ece5e1d6	Fix swa memory pool size with spec (#17630)	2026-01-25 14:10:43 +08:00
Mohammad Miadh Angkad	1674b9ef44	[DeepSeek-V3.2] Fix TRT-LLM NSA in target_verify/draft_extend (#17662)	2026-01-25 13:10:14 +08:00
Alison Shao	9121f22656	Add PyTorch .bin file validation to CI weight validation (#17533)	2026-01-24 19:18:15 -08:00
Chen Shen	59f027a8c8	[diffusion]: Fix ZImage SP sharding for caption and latent (#17301) Co-authored-by: rhyshen <rhyshen@tencent.com> Co-authored-by: florianzhao <florianzhao@tencent.com>	2026-01-25 10:10:48 +08:00
Xinyuan Tong	37c04c2245	fix: Refactor register_image_processor to use kwarg instead of positional arg (#17685) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-01-24 15:31:01 -08:00
Trevor Morris	2c2c4e446b	[NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668)	2026-01-24 22:59:55 +08:00
TMC	458a43d4ac	[NPU] torch_npu profiler tensorboard path type fix (#17545)	2026-01-24 22:55:49 +08:00
Yuan Luo	0c8165ffbd	[Kimi-Linear] Refactor Kimi-Linear to support RadixLinearAttention (#17506) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-01-24 21:27:13 +08:00
yunkchen	bf19d20d89	[Bugfix] fix TypeError when log-requests-level >=2 in prefill node warmup (#17129)	2026-01-24 19:16:22 +08:00
Lianmin Zheng	0834f9afeb	[Auto Sync] Update test_deterministic.py (20260124) (#17665) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>	2026-01-24 02:52:47 -08:00
Glen Liu	a6280b2a23	add documentation example for LoRA overlap loading and cleanup unused function (#17464)	2026-01-24 15:33:16 +08:00
Xiaoyu Zhang	3992a023e6	Move fa4 from sgl-kernel to jit kernel (#17353)	2026-01-24 15:25:03 +08:00
Xiaoyu Zhang	7a4bb0d516	[Diffusion] Add diffusion time embedding to jit kernel (#17658)	2026-01-24 14:27:08 +08:00
Ke Bao	fb683be6eb	Use attn tp group in embedding for more models (#17570)	2026-01-24 13:37:44 +08:00
strgrb	176da1bbdd	Fix: mistake sigmoid in kda (#17508)	2026-01-24 13:35:14 +08:00
Qi Yuhang	4c512a7d1d	[JIT Kernel]Add Some CUDA Runtime API Wrapper for JIT Kernel Header (#17588)	2026-01-24 12:57:58 +08:00
GMI Xiao Jin	d0919be733	[diffusion] model: LTX-2 Support (2/2) (#17496) Co-authored-by: Fan Yin <1106310035@qq.com> Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>	2026-01-24 12:51:37 +08:00
GMI Xiao Jin	797a9811a2	[diffusion] model: LTX-2 (1/2) (#17495) Co-authored-by: FlamingoPg <1106310035@qq.com>	2026-01-24 11:59:48 +08:00
Ananya	894928a951	Refactor: Extract DeepSeek common utilities into shared module (#16969)	2026-01-24 11:29:52 +08:00
Lianmin Zheng	bc6f0b5ce7	[Auto Sync] Update logits_processor.py, test_logprobs.py (20260124) (#17664) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: yehu-ux <yehu@x.ai>	2026-01-23 17:57:41 -08:00
McZyWu	b4a611fb33	[NPU] solve accuracy problem for stablelm-2-1-6b for npu (#17470)	2026-01-24 08:27:38 +08:00
McZyWu	8a5ed2434f	[NPU]support model MiniCPM3-4B for npu (#16866)	2026-01-24 08:25:12 +08:00
Douglas Yang	4c7136bb36	feature: adding openai compatible API request to bench_serving (#17219)	2026-01-23 16:04:28 -08:00
Nan Jiang	ad05782160	fix post_residual_addition more generally (#17286)	2026-01-23 15:43:37 -08:00
R0CKSTAR	a77729a276	[MUSA][1/N] sglang.check_env (#16959) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-01-23 14:41:17 -08:00
Mansoor	bdaa3de075	Add return routed experts to the completions and chat/completions endpoints (#17434)	2026-01-23 12:12:36 -08:00
Tiwei Bie	5438cd20ce	[DLLM] Remove cuda graph batch size limitation (#17458)	2026-01-23 09:52:39 -08:00
Jerry Ji	010c17a133	[Refactor] Algebraic data type for nextn config + some basic refactors (#17347)	2026-01-24 01:16:55 +08:00
Yi Zhong	08fcda2f63	add the fa4 mm backend and varlen func (#13539) Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2026-01-23 23:12:06 +08:00
akhilg-nv	2fb328109f	[DeepSeek V3.2] Enable trtllm NSA with bf16 kvcache (#16758) Co-authored-by: DarkSharpness <76582120+DarkSharpness@users.noreply.github.com>	2026-01-23 20:26:21 +08:00
Nicolas Castet	48e9daadff	Support symmetric memory pre-allocation to avoid fragmentation (#17089)	2026-01-23 17:57:04 +08:00
Yuzhen Zhou	2169025b77	turn off dit_layerwise_offload for wan on rocm (#17569)	2026-01-23 15:22:42 +08:00
Lianmin Zheng	56e6652d1d	Lazy import torchao (#17626)	2026-01-22 22:04:51 -08:00
JiaruiChang5268	c0b5a180fe	[NPU]bugfix: fix for dsv3.2 and dsvl2 (#17007) Co-authored-by: Hexq0210 <893781835@qq.com> Co-authored-by: liupeng374 <782420244@qq.com> Co-authored-by: cy <chenyang08056032@163.com>	2026-01-23 11:15:15 +08:00
Ke Bao	7ace64d1d8	Update mamba env setting (#17566)	2026-01-23 11:02:32 +08:00
siyu	62e6a749b0	Skip mm feature pool init to avoid EPD OOM (#16388)	2026-01-23 10:53:45 +08:00
MMuzzammil1	2399af5557	Bugfix: Writing to storage when write-back method is chosen (#14718)	2026-01-22 15:08:25 -08:00
hxie	13f88045b3	configuration file support and nixl integration augmentation for hicache-storage-backend-extra-config (#16602)	2026-01-22 14:31:48 -08:00
wufann	a921029b97	[AMD] Support ds3.2 on gfx942 platform (#17504) Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>	2026-01-22 13:57:08 -08:00

6437 Commits