sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 13:57:04 +00:00

Author	SHA1	Message	Date
Duyi-Wang	5ddc84e33e	[AMD] MORI-EP inter kernel type switch (#18437) Co-authored-by: HAI <hixiao@gmail.com>	2026-02-15 20:59:39 -08:00
shuwenn	3299c4f9c1	[CI] feat: add early exit to wait_for_server when process dies (#18602)	2026-02-13 16:46:09 -08:00
Liangsheng Yin	e6f7a372ef	Rename request timeout env vars for waiting/running stages (#18766)	2026-02-12 22:58:40 -08:00
qianyue76	f06ab17a73	[diffusion] docs: consolidate diffusion documentation into docs (#18095) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: JiaxinD <djx2048@gmail.com>	2026-02-11 16:55:07 -08:00
Rishit Shivam	c850a8a41a	[Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888) Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com> Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2026-02-06 13:17:51 -08:00
rinbaro	de6a03260f	[docs] fix misspellings & typos (#18276)	2026-02-05 03:35:29 +00:00
Teng Ma	c8212b9fac	[PD] doc: Document SGLANG_MOONCAKE_CUSTOM_MEM_POOL and supported values (#18259) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-02-05 03:03:33 +00:00
Baizhou Zhang	d279520ba5	[DeepGemm] Add a flag for fast warmup (#18111)	2026-02-04 14:12:13 +08:00
Yuan Luo	afebb7ab78	Optimize custom-all-reduce (#17674) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-01 18:59:31 +08:00
StonyPort	2b3408ff14	feat: add forward timeout (#17831) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>	2026-01-30 08:52:29 +08:00
Joe Redmond	0ff0d181ca	feat: add custom request header logging (#17786)	2026-01-28 19:33:08 -08:00
kk	f1384f5293	Integration mori backend for EP a2a data communication (#17012) Co-authored-by: Duyi-Wang <duyi.wang@amd.com> Co-authored-by: billishyahao <bill.he@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com>	2026-01-28 19:07:34 -08:00
Trevor Morris	2c2c4e446b	[NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668)	2026-01-24 22:59:55 +08:00
Nicolas Castet	48e9daadff	Support symmetric memory pre-allocation to avoid fragmentation (#17089)	2026-01-23 17:57:04 +08:00
Baizhou Zhang	6ea491e439	Overlap shared experts with deepep dispatch for single batch overlap on Blackwell (#17289)	2026-01-21 02:56:55 +08:00
Baizhou Zhang	55c616427d	Add flag that enables NCCL mlp sync batch for overlap scheduler (#17288)	2026-01-20 23:06:55 +08:00
b8zhong	f374623fa9	[Refactor] Set `fp4-gemm-backend=auto` on SM100 and rename `fp4-gemm-backend` with `flashinfer_` prefix (#17309)	2026-01-19 20:09:07 +08:00
StonyPort	3355b6e21b	feat: add request queued timeout (#17143) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-01-16 17:55:09 +08:00
siyu	068abe7e40	add doc for #14386 (#14655)	2026-01-09 22:38:51 +08:00
Baizhou Zhang	7d757d6f17	Clean Some Environment Variables for DeepSeek V32 (#15938)	2026-01-07 14:00:16 +08:00
Huaixin Chang	c1dfbc777b	deprecate prefill-round-robin-balance (#16195) Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2025-12-31 22:25:33 +08:00
Baizhou Zhang	656f4d69a1	Refactor fp8 nextn layer for DeepSeek nvfp4 checkpoint (#15353)	2025-12-28 11:57:09 +08:00
Liangsheng Yin	f4e835af2f	Cleanup `ModelRunner` (#15802)	2025-12-25 18:13:30 +08:00
Huaixin Chang	0c39730b18	DP: support piggyback server load report (#11469) Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>	2025-12-25 11:35:05 +08:00
vincentzed	ac320a6f04	Move some quant args to its own section in environ variables doc (#15722) Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>	2025-12-23 20:11:08 -08:00
Feng Su	29e8f7f9e5	multimodal: precompute hash for MultimodalDataItem (#14354) Signed-off-by: Feng Su <sufeng@linux.alibaba.com> Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>	2025-12-18 15:27:59 -08:00
Huang Lin	6abdf73f4d	[Fix] Environment variable SGL_* is deprecated (#14943)	2025-12-13 19:55:43 -08:00
Liangsheng Yin	c660d8dfd0	Re-org eagle unit tests (#14909)	2025-12-12 12:25:39 +09:00
Tiance Wang	624725cb5e	Move and update MindSpore docs, make it appear on the online documentation (#14861) Co-authored-by: wangtiance <tiancew@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-10 23:03:50 -08:00
b8zhong	55504df2f7	Add FP8 Blockwise GEMM Backend Flag `--fp8-gemm-backend` (#14379)	2025-12-09 12:05:56 -08:00
Even Zhou	60d36e7be7	[NPU] chore: bump basic software version to 8.3.rc2 (#14614)	2025-12-09 09:14:27 +08:00
Baizhou Zhang	9dfa01a435	[Misc]Register and refactor some environs for dpsk-fp4 and DeepEp (#14538)	2025-12-06 12:29:16 -08:00
b8zhong	9d82340298	Revert "Revert "enable csgmv automatically on cuda"" (#14277)	2025-12-03 13:12:30 -08:00
Lianmin Zheng	bc3d2a85af	[Minor] update docs (#14212)	2025-12-01 02:33:58 -08:00
Richard Chen	4addb60274	Pull Request Instructions: RL and Training Framework Integrations (#14187) Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>	2025-11-30 21:58:54 -08:00
vipwangerxiao	ab9a46d462	Support configuring the request limit per receiving poll (#14076) Co-authored-by: Peng Wang <peng_wang@linux.alibaba.com> Co-authored-by: Feng Su <225349073+sufeng-buaa@users.noreply.github.com>	2025-11-28 16:14:21 +08:00
Jimmy	ab843ced31	[Feat]Add scheduler recv skipper weights to environment configuration (#13855)	2025-11-27 18:16:11 +08:00
Tiance Wang	75222bfed9	Update MindSpore documentation (#13656) Co-authored-by: wangtiance <tiancew@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-24 11:20:51 +08:00
Baizhou Zhang	8bfce9b08d	[Tiny] Renaming environ for NVFP4 dispatch (#13756)	2025-11-22 00:05:20 -08:00
Qiaolin Yu	681b9e6425	Revert "enable csgmv automatically on cuda" (#13707)	2025-11-21 10:37:41 -08:00
Lianmin Zheng	99e13d189b	Fix url: use https://roadmap.sglang.io for roadmap (#13733) Co-authored-by: sglang-bot <sglangbot@gmail.com>	2025-11-21 05:42:32 -08:00
b8zhong	42028af614	enable csgmv automatically on cuda (#13600)	2025-11-20 12:53:02 -08:00
Liangsheng Yin	196b940aed	[3/N] CI refactor: move some manually triggered tests. (#13448)	2025-11-19 23:06:53 +08:00
Chen Haozhe	6c2e5fcd91	[feat][Ascend][Mindspore]: support model-impl of mindspore (#9234)	2025-11-19 09:17:47 +08:00
Lianmin Zheng	7e626d12b7	Update docs (#13391) Co-authored-by: sglang-bot <sglangbot@gmail.com>	2025-11-16 19:36:33 -08:00
Kaixi Hou	5ae0ac4244	[NVIDIA] Fix use case of SGLANG_ENABLE_FLASHINFER_GEMM (#13274) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-11-14 12:51:11 -08:00
Taishi Nakamura	bfe638f7e8	Fix broken Markdown formatting in DeepEP documentation (#13210)	2025-11-13 11:34:57 -08:00
Shu Wang	6664083522	Replace [silu_and_mul_]scaled_fp4_group_quant by Flashinfer equivalent (#12376)	2025-11-13 00:26:00 -08:00
zhanghaotong	5ded5e2729	[Feature] Trace: Support http/protobuf span exporter protocol (#12396) Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>	2025-11-12 05:16:43 +00:00
vipwangerxiao	8f01a12d43	Improve overlap scheduling for better TTFT (#11856) Co-authored-by: Peng Wang <peng_wang@linux.alibaba.com>	2025-11-12 11:24:04 +08:00

272 Commits