Duyi-Wang | 8c190f6b91 | [AMD] Add SGLANG_MORI_MOE_MAX_INPUT_TOKENS to truncate dispatch before MoE. (#22952) | 2026-04-16 23:40:15 -07:00

Jan Bernlöhr | 04a53955b9 | feat: add coordinated checkpoint prefetch for network filesystem loading (#20843) | 2026-04-16 20:08:19 -07:00

Baizhou Zhang | d14d368191 | [Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467) | 2026-04-11 01:59:57 -07:00

billishyahao | 1df9f4e2f6 | [AMD] Add prealloc token env for mori-ep (#22329) | 2026-04-09 09:34:35 -07:00

Nicolas Castet | e379befbac | Add symmetric debug mode to print stack trace of comm ops with unregistered tensors (#18569) | 2026-04-08 22:34:58 -07:00

Rain Jiang | 1a8eb890f6 | Kernels community fa3 (#20796) | 2026-04-07 12:48:44 -07:00

YAMY | dc125afffb | Add staging buffer CI test and documentation for heterogeneous TP (#21921) | 2026-04-06 14:00:20 +08:00
  Co-authored-by: Shangming Cai <csmthu@gmail.com>

Brayden Zhong | 6aafe756b9 | Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+… (#22047) | 2026-04-03 13:12:30 -07:00

Mook | 991f3aa5b3 | [Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+) (#19652) | 2026-04-03 10:48:15 +08:00

Aishwarya Ramasethu | c32ee48886 | MFU metrics in Prometheus (#19395) | 2026-03-29 23:40:06 -07:00

Shu Wang | efebcab43e | Support skip-softmax attention (#19089) | 2026-03-28 15:55:48 -07:00

Baizhou Zhang | edd4d54023 | [Clean] Remove deprecated environs (#21536) | 2026-03-28 00:35:44 -07:00

Duyi-Wang | 61a902ce88 | [AMD][MoRI] Auto-select dispatch quantization type from MoE weight dtype. (#21040) | 2026-03-24 22:53:57 -07:00

Jiaxin(Jackson) Deng | c4db64c16b | Add Lychee Doc Links Check to Local and CI (#19742) | 2026-03-24 13:48:26 -07:00
  Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
  Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com>
  Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>

Xiaoyu Zhang | 766d225fcc | Add SGLang CUDA crash API logging inspired by FlashInfer (#20910) | 2026-03-22 16:39:40 +08:00

Qiaolin Yu | c5d2528bff | Revert "[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars." (#20797) | 2026-03-17 17:28:09 -07:00

Duyi-Wang | 385a35bd11 | [AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars. (#20647) | 2026-03-17 01:13:42 -07:00

AnonTokyo | e9fae69e5f | docs: align environment variable reference with environ defaults (#20419) | 2026-03-15 18:07:29 -07:00

Baizhou Zhang | 39008955ff | Revert "[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars." (#20602) | 2026-03-14 12:12:42 -07:00

Duyi-Wang | 0eea80bc00 | [AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars. (#20453) | 2026-03-13 14:03:17 -07:00

StonyPort | d4e68ead1d | [quant] Ignore FP8 quantization layers (#20340) | 2026-03-13 13:59:39 +08:00
  Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Baizhou Zhang | be63f982b7 | [V32/GLM5] Control the threshold of applying dense attention with an environ (#20062) | 2026-03-09 14:36:10 -07:00

AMD-yanfeiwang | f0153ad225 | [AMD][Feature] support fp4 dispatch and fp8 combine in moriep (#19757) | 2026-03-09 12:52:05 -07:00
  Co-authored-by: Duyi-Wang <duyi.wang@amd.com>

yuyu5333 | 230fb55899 | [Performance] Decode Offload improves the long texts performance 100% through dynamic block offload. (#17216) | 2026-03-08 17:16:53 +08:00
  Co-authored-by: zhangheng <hzh0425@apache.org>

shuwenn | 7bd3dd9270 | fix: image URL in notebook to use raw.githubusercontent.com (#20100) | 2026-03-07 13:28:20 -08:00

StonyPort | 806d41ab65 | [quant] fix fp32 downcasting (#19844) | 2026-03-05 17:54:59 +08:00
  Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>

Brayden Zhong | e2af840c3d | Various SM120 improvements (#19721) | 2026-03-03 16:46:13 -08:00

Feng Su | 3b89302277 | Refactor: observability code cleanup (#17862) | 2026-02-24 18:07:29 -08:00
  Signed-off-by: Feng Su <sufeng@linux.alibaba.com>

shuwenn | 0c224c3c62 | docs: Embed release lookup tool into Sphinx documentation site (#19264) | 2026-02-24 11:11:27 -08:00

Duyi-Wang | 5ddc84e33e | [AMD] MORI-EP inter kernel type switch (#18437) | 2026-02-15 20:59:39 -08:00
  Co-authored-by: HAI <hixiao@gmail.com>

shuwenn | 3299c4f9c1 | [CI] feat: add early exit to wait_for_server when process dies (#18602) | 2026-02-13 16:46:09 -08:00

Liangsheng Yin | e6f7a372ef | Rename request timeout env vars for waiting/running stages (#18766) | 2026-02-12 22:58:40 -08:00

qianyue76 | f06ab17a73 | [diffusion] docs: consolidate diffusion documentation into docs (#18095) | 2026-02-11 16:55:07 -08:00
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  Co-authored-by: JiaxinD <djx2048@gmail.com>

Rishit Shivam | c850a8a41a | [Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888) | 2026-02-06 13:17:51 -08:00
  Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
  Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
  Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>

rinbaro | de6a03260f | [docs] fix misspellings & typos (#18276) | 2026-02-05 03:35:29 +00:00

Teng Ma | c8212b9fac | [PD] doc: Document SGLANG_MOONCAKE_CUSTOM_MEM_POOL and supported values (#18259) | 2026-02-05 03:03:33 +00:00
  Co-authored-by: Shangming Cai <csmthu@gmail.com>

Baizhou Zhang | d279520ba5 | [DeepGemm] Add a flag for fast warmup (#18111) | 2026-02-04 14:12:13 +08:00

Yuan Luo | afebb7ab78 | Optimize custom-all-reduce (#17674) | 2026-02-01 18:59:31 +08:00
  Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

StonyPort | 2b3408ff14 | feat: add forward timeout (#17831) | 2026-01-30 08:52:29 +08:00
  Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>

Joe Redmond | 0ff0d181ca | feat: add custom request header logging (#17786) | 2026-01-28 19:33:08 -08:00

kk | f1384f5293 | Integration mori backend for EP a2a data communication (#17012) | 2026-01-28 19:07:34 -08:00
  Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
  Co-authored-by: billishyahao <bill.he@amd.com>
  Co-authored-by: HaiShaw <hixiao@gmail.com>

Trevor Morris | 2c2c4e446b | [NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668) | 2026-01-24 22:59:55 +08:00

Nicolas Castet | 48e9daadff | Support symmetric memory pre-allocation to avoid fragmentation (#17089) | 2026-01-23 17:57:04 +08:00

Baizhou Zhang | 6ea491e439 | Overlap shared experts with deepep dispatch for single batch overlap on Blackwell (#17289) | 2026-01-21 02:56:55 +08:00

Baizhou Zhang | 55c616427d | Add flag that enables NCCL mlp sync batch for overlap scheduler (#17288) | 2026-01-20 23:06:55 +08:00

b8zhong | f374623fa9 | [Refactor] Set fp4-gemm-backend=auto on SM100 and rename fp4-gemm-backend with flashinfer_ prefix (#17309) | 2026-01-19 20:09:07 +08:00

StonyPort | 3355b6e21b | feat: add request queued timeout (#17143) | 2026-01-16 17:55:09 +08:00
  Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
  Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

siyu | 068abe7e40 | add doc for #14386 (#14655) | 2026-01-09 22:38:51 +08:00

Baizhou Zhang | 7d757d6f17 | Clean Some Environment Variables for DeepSeek V32 (#15938) | 2026-01-07 14:00:16 +08:00

Huaixin Chang | c1dfbc777b | deprecate prefill-round-robin-balance (#16195) | 2025-12-31 22:25:33 +08:00
  Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
  Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>