301 Commits

Author SHA1 Message Date
Duyi-Wang
8c190f6b91 [AMD] Add SGLANG_MORI_MOE_MAX_INPUT_TOKENS to truncate dispatch before MoE. (#22952) 2026-04-16 23:40:15 -07:00
Jan Bernlöhr
04a53955b9 feat: add coordinated checkpoint prefetch for network filesystem loading (#20843) 2026-04-16 20:08:19 -07:00
Baizhou Zhang
d14d368191 [Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467) 2026-04-11 01:59:57 -07:00
billishyahao
1df9f4e2f6 [AMD] Add prealloc token env for mori-ep (#22329) 2026-04-09 09:34:35 -07:00
Nicolas Castet
e379befbac Add symmetric debug mode to print stack trace of comm ops with unregistered tensors (#18569) 2026-04-08 22:34:58 -07:00
Rain Jiang
1a8eb890f6 Kernels community fa3 (#20796) 2026-04-07 12:48:44 -07:00
YAMY
dc125afffb Add staging buffer CI test and documentation for heterogeneous TP (#21921)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2026-04-06 14:00:20 +08:00
Brayden Zhong
6aafe756b9 Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+… (#22047) 2026-04-03 13:12:30 -07:00
Mook
991f3aa5b3 [Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+) (#19652) 2026-04-03 10:48:15 +08:00
Aishwarya Ramasethu
c32ee48886 MFU metrics in Prometheus (#19395) 2026-03-29 23:40:06 -07:00
Shu Wang
efebcab43e Support skip-softmax attention (#19089) 2026-03-28 15:55:48 -07:00
Baizhou Zhang
edd4d54023 [Clean] Remove deprecated environs (#21536) 2026-03-28 00:35:44 -07:00
Duyi-Wang
61a902ce88 [AMD][MoRI] Auto-select dispatch quantization type from MoE weight dtype. (#21040) 2026-03-24 22:53:57 -07:00
Jiaxin(Jackson) Deng
c4db64c16b Add Lychee Doc Links Check to Local and CI (#19742)
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
2026-03-24 13:48:26 -07:00
Xiaoyu Zhang
766d225fcc Add SGLang CUDA crash API logging inspired by FlashInfer (#20910) 2026-03-22 16:39:40 +08:00
Qiaolin Yu
c5d2528bff Revert "[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars." (#20797) 2026-03-17 17:28:09 -07:00
Duyi-Wang
385a35bd11 [AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars. (#20647) 2026-03-17 01:13:42 -07:00
AnonTokyo
e9fae69e5f docs: align environment variable reference with environ defaults (#20419) 2026-03-15 18:07:29 -07:00
Baizhou Zhang
39008955ff Revert "[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars." (#20602) 2026-03-14 12:12:42 -07:00
Duyi-Wang
0eea80bc00 [AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars. (#20453) 2026-03-13 14:03:17 -07:00
StonyPort
d4e68ead1d [quant] Ignore FP8 quantization layers (#20340)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-13 13:59:39 +08:00
Baizhou Zhang
be63f982b7 [V32/GLM5] Control the threshold of applying dense attention with an environ (#20062) 2026-03-09 14:36:10 -07:00
AMD-yanfeiwang
f0153ad225 [AMD][Feature] support fp4 dispatch and fp8 combine in moriep (#19757)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
2026-03-09 12:52:05 -07:00
yuyu5333
230fb55899 [Performance] Decode Offload improves the long texts performance 100% through dynamic block offload. (#17216)
Co-authored-by: zhangheng <hzh0425@apache.org>
2026-03-08 17:16:53 +08:00
shuwenn
7bd3dd9270 fix: image URL in notebook to use raw.githubusercontent.com (#20100) 2026-03-07 13:28:20 -08:00
StonyPort
806d41ab65 [quant] fix fp32 downcasting (#19844)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
2026-03-05 17:54:59 +08:00
Brayden Zhong
e2af840c3d Various SM120 improvements (#19721) 2026-03-03 16:46:13 -08:00
Feng Su
3b89302277 Refactor: observability code cleanup (#17862)
Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
2026-02-24 18:07:29 -08:00
shuwenn
0c224c3c62 docs: Embed release lookup tool into Sphinx documentation site (#19264) 2026-02-24 11:11:27 -08:00
Duyi-Wang
5ddc84e33e [AMD] MORI-EP inter kernel type switch (#18437)
Co-authored-by: HAI <hixiao@gmail.com>
2026-02-15 20:59:39 -08:00
shuwenn
3299c4f9c1 [CI] feat: add early exit to wait_for_server when process dies (#18602) 2026-02-13 16:46:09 -08:00
Liangsheng Yin
e6f7a372ef Rename request timeout env vars for waiting/running stages (#18766) 2026-02-12 22:58:40 -08:00
qianyue76
f06ab17a73 [diffusion] docs: consolidate diffusion documentation into docs (#18095)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>
2026-02-11 16:55:07 -08:00
Rishit Shivam
c850a8a41a [Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888)
Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2026-02-06 13:17:51 -08:00
rinbaro
de6a03260f [docs] fix misspellings & typos (#18276) 2026-02-05 03:35:29 +00:00
Teng Ma
c8212b9fac [PD] doc: Document SGLANG_MOONCAKE_CUSTOM_MEM_POOL and supported values (#18259)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2026-02-05 03:03:33 +00:00
Baizhou Zhang
d279520ba5 [DeepGemm] Add a flag for fast warmup (#18111) 2026-02-04 14:12:13 +08:00
Yuan Luo
afebb7ab78 Optimize custom-all-reduce (#17674)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-02-01 18:59:31 +08:00
StonyPort
2b3408ff14 feat: add forward timeout (#17831)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
2026-01-30 08:52:29 +08:00
Joe Redmond
0ff0d181ca feat: add custom request header logging (#17786) 2026-01-28 19:33:08 -08:00
kk
f1384f5293 Integration mori backend for EP a2a data communication (#17012)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
2026-01-28 19:07:34 -08:00
Trevor Morris
2c2c4e446b [NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668) 2026-01-24 22:59:55 +08:00
Nicolas Castet
48e9daadff Support symmetric memory pre-allocation to avoid fragmentation (#17089) 2026-01-23 17:57:04 +08:00
Baizhou Zhang
6ea491e439 Overlap shared experts with deepep dispatch for single batch overlap on Blackwell (#17289) 2026-01-21 02:56:55 +08:00
Baizhou Zhang
55c616427d Add flag that enables NCCL mlp sync batch for overlap scheduler (#17288) 2026-01-20 23:06:55 +08:00
b8zhong
f374623fa9 [Refactor] Set fp4-gemm-backend=auto on SM100 and rename fp4-gemm-backend with flashinfer_ prefix (#17309) 2026-01-19 20:09:07 +08:00
StonyPort
3355b6e21b feat: add request queued timeout (#17143)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-01-16 17:55:09 +08:00
siyu
068abe7e40 add doc for #14386 (#14655) 2026-01-09 22:38:51 +08:00
Baizhou Zhang
7d757d6f17 Clean Some Environment Variables for DeepSeek V32 (#15938) 2026-01-07 14:00:16 +08:00
Huaixin Chang
c1dfbc777b deprecate prefill-round-robin-balance (#16195)
Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-12-31 22:25:33 +08:00