Duyi-Wang
|
5ddc84e33e
|
[AMD] MORI-EP inter kernel type switch (#18437)
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-15 20:59:39 -08:00 |
|
shuwenn
|
3299c4f9c1
|
[CI] feat: add early exit to wait_for_server when process dies (#18602)
|
2026-02-13 16:46:09 -08:00 |
|
Liangsheng Yin
|
e6f7a372ef
|
Rename request timeout env vars for waiting/running stages (#18766)
|
2026-02-12 22:58:40 -08:00 |
|
qianyue76
|
f06ab17a73
|
[diffusion] docs: consolidate diffusion documentation into docs (#18095)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>
|
2026-02-11 16:55:07 -08:00 |
|
Rishit Shivam
|
c850a8a41a
|
[Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888)
Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2026-02-06 13:17:51 -08:00 |
|
rinbaro
|
de6a03260f
|
[docs] fix misspellings & typos (#18276)
|
2026-02-05 03:35:29 +00:00 |
|
Teng Ma
|
c8212b9fac
|
[PD] doc: Document SGLANG_MOONCAKE_CUSTOM_MEM_POOL and supported values (#18259)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-05 03:03:33 +00:00 |
|
Baizhou Zhang
|
d279520ba5
|
[DeepGemm] Add a flag for fast warmup (#18111)
|
2026-02-04 14:12:13 +08:00 |
|
Yuan Luo
|
afebb7ab78
|
Optimize custom-all-reduce (#17674)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-02-01 18:59:31 +08:00 |
|
StonyPort
|
2b3408ff14
|
feat: add forward timeout (#17831)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
|
2026-01-30 08:52:29 +08:00 |
|
Joe Redmond
|
0ff0d181ca
|
feat: add custom request header logging (#17786)
|
2026-01-28 19:33:08 -08:00 |
|
kk
|
f1384f5293
|
Integration mori backend for EP a2a data communication (#17012)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
|
2026-01-28 19:07:34 -08:00 |
|
Trevor Morris
|
2c2c4e446b
|
[NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668)
|
2026-01-24 22:59:55 +08:00 |
|
Nicolas Castet
|
48e9daadff
|
Support symmetric memory pre-allocation to avoid fragmentation (#17089)
|
2026-01-23 17:57:04 +08:00 |
|
Baizhou Zhang
|
6ea491e439
|
Overlap shared experts with deepep dispatch for single batch overlap on Blackwell (#17289)
|
2026-01-21 02:56:55 +08:00 |
|
Baizhou Zhang
|
55c616427d
|
Add flag that enables NCCL mlp sync batch for overlap scheduler (#17288)
|
2026-01-20 23:06:55 +08:00 |
|
b8zhong
|
f374623fa9
|
[Refactor] Set fp4-gemm-backend=auto on SM100 and rename fp4-gemm-backend with flashinfer_ prefix (#17309)
|
2026-01-19 20:09:07 +08:00 |
|
StonyPort
|
3355b6e21b
|
feat: add request queued timeout (#17143)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-01-16 17:55:09 +08:00 |
|
siyu
|
068abe7e40
|
add doc for #14386 (#14655)
|
2026-01-09 22:38:51 +08:00 |
|
Baizhou Zhang
|
7d757d6f17
|
Clean Some Environment Variables for DeepSeek V32 (#15938)
|
2026-01-07 14:00:16 +08:00 |
|
Huaixin Chang
|
c1dfbc777b
|
deprecate prefill-round-robin-balance (#16195)
Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2025-12-31 22:25:33 +08:00 |
|
Baizhou Zhang
|
656f4d69a1
|
Refactor fp8 nextn layer for DeepSeek nvfp4 checkpoint (#15353)
|
2025-12-28 11:57:09 +08:00 |
|
Liangsheng Yin
|
f4e835af2f
|
Cleanup ModelRunner (#15802)
|
2025-12-25 18:13:30 +08:00 |
|
Huaixin Chang
|
0c39730b18
|
DP: support piggyback server load report (#11469)
Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
|
2025-12-25 11:35:05 +08:00 |
|
vincentzed
|
ac320a6f04
|
Move some quant args to its own section in environ variables doc (#15722)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
|
2025-12-23 20:11:08 -08:00 |
|
Feng Su
|
29e8f7f9e5
|
multimodal: precompute hash for MultimodalDataItem (#14354)
Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
|
2025-12-18 15:27:59 -08:00 |
|
Huang Lin
|
6abdf73f4d
|
[Fix] Environment variable SGL_* is deprecated (#14943)
|
2025-12-13 19:55:43 -08:00 |
|
Liangsheng Yin
|
c660d8dfd0
|
Re-org eagle unit tests (#14909)
|
2025-12-12 12:25:39 +09:00 |
|
Tiance Wang
|
624725cb5e
|
Move and update MindSpore docs, make it appear on the online documentation (#14861)
Co-authored-by: wangtiance <tiancew@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-10 23:03:50 -08:00 |
|
b8zhong
|
55504df2f7
|
Add FP8 Blockwise GEMM Backend Flag --fp8-gemm-backend (#14379)
|
2025-12-09 12:05:56 -08:00 |
|
Even Zhou
|
60d36e7be7
|
[NPU] chore: bump basic software version to 8.3.rc2 (#14614)
|
2025-12-09 09:14:27 +08:00 |
|
Baizhou Zhang
|
9dfa01a435
|
[Misc]Register and refactor some environs for dpsk-fp4 and DeepEp (#14538)
|
2025-12-06 12:29:16 -08:00 |
|
b8zhong
|
9d82340298
|
Revert "Revert "enable csgmv automatically on cuda"" (#14277)
|
2025-12-03 13:12:30 -08:00 |
|
Lianmin Zheng
|
bc3d2a85af
|
[Minor] update docs (#14212)
|
2025-12-01 02:33:58 -08:00 |
|
Richard Chen
|
4addb60274
|
Pull Request Instructions: RL and Training Framework Integrations (#14187)
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
|
2025-11-30 21:58:54 -08:00 |
|
vipwangerxiao
|
ab9a46d462
|
Support configuring the request limit per receiving poll (#14076)
Co-authored-by: Peng Wang <peng_wang@linux.alibaba.com>
Co-authored-by: Feng Su <225349073+sufeng-buaa@users.noreply.github.com>
|
2025-11-28 16:14:21 +08:00 |
|
Jimmy
|
ab843ced31
|
[Feat]Add scheduler recv skipper weights to environment configuration (#13855)
|
2025-11-27 18:16:11 +08:00 |
|
Tiance Wang
|
75222bfed9
|
Update MindSpore documentation (#13656)
Co-authored-by: wangtiance <tiancew@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-24 11:20:51 +08:00 |
|
Baizhou Zhang
|
8bfce9b08d
|
[Tiny] Renaming environ for NVFP4 dispatch (#13756)
|
2025-11-22 00:05:20 -08:00 |
|
Qiaolin Yu
|
681b9e6425
|
Revert "enable csgmv automatically on cuda" (#13707)
|
2025-11-21 10:37:41 -08:00 |
|
Lianmin Zheng
|
99e13d189b
|
Fix url: use https://roadmap.sglang.io for roadmap (#13733)
Co-authored-by: sglang-bot <sglangbot@gmail.com>
|
2025-11-21 05:42:32 -08:00 |
|
b8zhong
|
42028af614
|
enable csgmv automatically on cuda (#13600)
|
2025-11-20 12:53:02 -08:00 |
|
Liangsheng Yin
|
196b940aed
|
[3/N] CI refactor: move some manually triggered tests. (#13448)
|
2025-11-19 23:06:53 +08:00 |
|
Chen Haozhe
|
6c2e5fcd91
|
[feat][Ascend][Mindspore]: support model-impl of mindspore (#9234)
|
2025-11-19 09:17:47 +08:00 |
|
Lianmin Zheng
|
7e626d12b7
|
Update docs (#13391)
Co-authored-by: sglang-bot <sglangbot@gmail.com>
|
2025-11-16 19:36:33 -08:00 |
|
Kaixi Hou
|
5ae0ac4244
|
[NVIDIA] Fix use case of SGLANG_ENABLE_FLASHINFER_GEMM (#13274)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2025-11-14 12:51:11 -08:00 |
|
Taishi Nakamura
|
bfe638f7e8
|
Fix broken Markdown formatting in DeepEP documentation (#13210)
|
2025-11-13 11:34:57 -08:00 |
|
Shu Wang
|
6664083522
|
Replace [silu_and_mul_]scaled_fp4_group_quant by Flashinfer equivalent (#12376)
|
2025-11-13 00:26:00 -08:00 |
|
zhanghaotong
|
5ded5e2729
|
[Feature] Trace: Support http/protobuf span exporter protocol (#12396)
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
|
2025-11-12 05:16:43 +00:00 |
|
vipwangerxiao
|
8f01a12d43
|
Improve overlap scheduling for better TTFT (#11856)
Co-authored-by: Peng Wang <peng_wang@linux.alibaba.com>
|
2025-11-12 11:24:04 +08:00 |
|