272 Commits

Author SHA1 Message Date
Duyi-Wang
5ddc84e33e [AMD] MORI-EP inter kernel type switch (#18437)
Co-authored-by: HAI <hixiao@gmail.com>
2026-02-15 20:59:39 -08:00
shuwenn
3299c4f9c1 [CI] feat: add early exit to wait_for_server when process dies (#18602) 2026-02-13 16:46:09 -08:00
Liangsheng Yin
e6f7a372ef Rename request timeout env vars for waiting/running stages (#18766) 2026-02-12 22:58:40 -08:00
qianyue76
f06ab17a73 [diffusion] docs: consolidate diffusion documentation into docs (#18095)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>
2026-02-11 16:55:07 -08:00
Rishit Shivam
c850a8a41a [Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888)
Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2026-02-06 13:17:51 -08:00
rinbaro
de6a03260f [docs] fix misspellings & typos (#18276) 2026-02-05 03:35:29 +00:00
Teng Ma
c8212b9fac [PD] doc: Document SGLANG_MOONCAKE_CUSTOM_MEM_POOL and supported values (#18259)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2026-02-05 03:03:33 +00:00
Baizhou Zhang
d279520ba5 [DeepGemm] Add a flag for fast warmup (#18111) 2026-02-04 14:12:13 +08:00
Yuan Luo
afebb7ab78 Optimize custom-all-reduce (#17674)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-02-01 18:59:31 +08:00
StonyPort
2b3408ff14 feat: add forward timeout (#17831)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
2026-01-30 08:52:29 +08:00
Joe Redmond
0ff0d181ca feat: add custom request header logging (#17786) 2026-01-28 19:33:08 -08:00
kk
f1384f5293 Integration mori backend for EP a2a data communication (#17012)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
2026-01-28 19:07:34 -08:00
Trevor Morris
2c2c4e446b [NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668) 2026-01-24 22:59:55 +08:00
Nicolas Castet
48e9daadff Support symmetric memory pre-allocation to avoid fragmentation (#17089) 2026-01-23 17:57:04 +08:00
Baizhou Zhang
6ea491e439 Overlap shared experts with deepep dispatch for single batch overlap on Blackwell (#17289) 2026-01-21 02:56:55 +08:00
Baizhou Zhang
55c616427d Add flag that enables NCCL mlp sync batch for overlap scheduler (#17288) 2026-01-20 23:06:55 +08:00
b8zhong
f374623fa9 [Refactor] Set fp4-gemm-backend=auto on SM100 and rename fp4-gemm-backend with flashinfer_ prefix (#17309) 2026-01-19 20:09:07 +08:00
StonyPort
3355b6e21b feat: add request queued timeout (#17143)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-01-16 17:55:09 +08:00
siyu
068abe7e40 add doc for #14386 (#14655) 2026-01-09 22:38:51 +08:00
Baizhou Zhang
7d757d6f17 Clean Some Environment Variables for DeepSeek V32 (#15938) 2026-01-07 14:00:16 +08:00
Huaixin Chang
c1dfbc777b deprecate prefill-round-robin-balance (#16195)
Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-12-31 22:25:33 +08:00
Baizhou Zhang
656f4d69a1 Refactor fp8 nextn layer for DeepSeek nvfp4 checkpoint (#15353) 2025-12-28 11:57:09 +08:00
Liangsheng Yin
f4e835af2f Cleanup ModelRunner (#15802) 2025-12-25 18:13:30 +08:00
Huaixin Chang
0c39730b18 DP: support piggyback server load report (#11469)
Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
2025-12-25 11:35:05 +08:00
vincentzed
ac320a6f04 Move some quant args to its own section in environ variables doc (#15722)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
2025-12-23 20:11:08 -08:00
Feng Su
29e8f7f9e5 multimodal: precompute hash for MultimodalDataItem (#14354)
Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-12-18 15:27:59 -08:00
Huang Lin
6abdf73f4d [Fix] Environment variable SGL_* is deprecated (#14943) 2025-12-13 19:55:43 -08:00
Liangsheng Yin
c660d8dfd0 Re-org eagle unit tests (#14909) 2025-12-12 12:25:39 +09:00
Tiance Wang
624725cb5e Move and update MindSpore docs, make it appear on the online documentation (#14861)
Co-authored-by: wangtiance <tiancew@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-10 23:03:50 -08:00
b8zhong
55504df2f7 Add FP8 Blockwise GEMM Backend Flag --fp8-gemm-backend (#14379) 2025-12-09 12:05:56 -08:00
Even Zhou
60d36e7be7 [NPU] chore: bump basic software version to 8.3.rc2 (#14614) 2025-12-09 09:14:27 +08:00
Baizhou Zhang
9dfa01a435 [Misc]Register and refactor some environs for dpsk-fp4 and DeepEp (#14538) 2025-12-06 12:29:16 -08:00
b8zhong
9d82340298 Revert "Revert "enable csgmv automatically on cuda"" (#14277) 2025-12-03 13:12:30 -08:00
Lianmin Zheng
bc3d2a85af [Minor] update docs (#14212) 2025-12-01 02:33:58 -08:00
Richard Chen
4addb60274 Pull Request Instructions: RL and Training Framework Integrations (#14187)
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
2025-11-30 21:58:54 -08:00
vipwangerxiao
ab9a46d462 Support configuring the request limit per receiving poll (#14076)
Co-authored-by: Peng Wang <peng_wang@linux.alibaba.com>
Co-authored-by: Feng Su <225349073+sufeng-buaa@users.noreply.github.com>
2025-11-28 16:14:21 +08:00
Jimmy
ab843ced31 [Feat]Add scheduler recv skipper weights to environment configuration (#13855) 2025-11-27 18:16:11 +08:00
Tiance Wang
75222bfed9 Update MindSpore documentation (#13656)
Co-authored-by: wangtiance <tiancew@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-24 11:20:51 +08:00
Baizhou Zhang
8bfce9b08d [Tiny] Renaming environ for NVFP4 dispatch (#13756) 2025-11-22 00:05:20 -08:00
Qiaolin Yu
681b9e6425 Revert "enable csgmv automatically on cuda" (#13707) 2025-11-21 10:37:41 -08:00
Lianmin Zheng
99e13d189b Fix url: use https://roadmap.sglang.io for roadmap (#13733)
Co-authored-by: sglang-bot <sglangbot@gmail.com>
2025-11-21 05:42:32 -08:00
b8zhong
42028af614 enable csgmv automatically on cuda (#13600) 2025-11-20 12:53:02 -08:00
Liangsheng Yin
196b940aed [3/N] CI refactor: move some manually triggered tests. (#13448) 2025-11-19 23:06:53 +08:00
Chen Haozhe
6c2e5fcd91 [feat][Ascend][Mindspore]: support model-impl of mindspore (#9234) 2025-11-19 09:17:47 +08:00
Lianmin Zheng
7e626d12b7 Update docs (#13391)
Co-authored-by: sglang-bot <sglangbot@gmail.com>
2025-11-16 19:36:33 -08:00
Kaixi Hou
5ae0ac4244 [NVIDIA] Fix use case of SGLANG_ENABLE_FLASHINFER_GEMM (#13274)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-11-14 12:51:11 -08:00
Taishi Nakamura
bfe638f7e8 Fix broken Markdown formatting in DeepEP documentation (#13210) 2025-11-13 11:34:57 -08:00
Shu Wang
6664083522 Replace [silu_and_mul_]scaled_fp4_group_quant by Flashinfer equivalent (#12376) 2025-11-13 00:26:00 -08:00
zhanghaotong
5ded5e2729 [Feature] Trace: Support http/protobuf span exporter protocol (#12396)
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
2025-11-12 05:16:43 +00:00
vipwangerxiao
8f01a12d43 Improve overlap scheduling for better TTFT (#11856)
Co-authored-by: Peng Wang <peng_wang@linux.alibaba.com>
2025-11-12 11:24:04 +08:00