billishyahao
|
fbb6098487
|
[AMD] support two batch overlapping for mori ep (#17953)
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com>
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-20 08:45:55 -08:00 |
|
Mohammad Miadh Angkad
|
2f592c3b18
|
[Doc] Add flashinfer_deepgemm to --fp8-gemm-backend (#18982)
|
2026-02-18 14:45:47 -05:00 |
|
Mengyang Liu
|
4f980f6f23
|
[Feature] Implement update_weights_from_disk for SGLang-D (Diffusion … (#18306)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2026-02-18 11:24:07 -08:00 |
|
Estrella-xx
|
1b3513a7e4
|
refactor FAKE transfer backend and remove --disaggregation-decode-enable-fake-auto parameter (#18345)
|
2026-02-16 17:27:02 +03:00 |
|
Rain Jiang
|
0ffd0a3995
|
Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389)
|
2026-02-16 09:29:54 +08:00 |
|
SoluMilken
|
07a24f1a38
|
update pre-commit config (#18860)
|
2026-02-16 00:18:31 +08:00 |
|
shuwenn
|
4cf4f0859f
|
[Doc] Convert the speculative decoding notebook to markdown (#18395)
|
2026-02-14 18:18:56 -08:00 |
|
shuwenn
|
3299c4f9c1
|
[CI] feat: add early exit to wait_for_server when process dies (#18602)
|
2026-02-13 16:46:09 -08:00 |
|
dongjiyingdjy
|
8b4c364960
|
refactor context parallel state (#17213)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-02-13 23:18:17 +08:00 |
|
danielafrimi
|
e422bcaed8
|
[Mamba] Add float16 support for SSM cache dtype (#18444)
|
2026-02-12 11:27:47 +08:00 |
|
qianyue76
|
f06ab17a73
|
[diffusion] docs: consolidate diffusion documentation into docs (#18095)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>
|
2026-02-11 16:55:07 -08:00 |
|
Baizhou Zhang
|
947927bdb5
|
[V3.2] Change default CP token split method to --round-robin-split (#18613)
|
2026-02-11 20:14:35 +08:00 |
|
赵晨阳
|
a2c38f7796
|
Enhance SMG guide with RL rollout systems benefits (#18588)
|
2026-02-10 20:20:45 -08:00 |
|
AlexZhao
|
3167bcc01c
|
[Doc] Comprehensive Guide: Navigating DP, DPA, and SMG Best Practices (#18096)
Co-authored-by: 赵海源 <zhaohaiyuan@xiaohongshu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2026-02-10 18:31:28 -08:00 |
|
Zack Yu
|
54589a2f2d
|
docs: expand and update modelopt documentation (#18479)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-09 23:09:52 +00:00 |
|
Mohammad Miadh Angkad
|
fddef76619
|
[Doc] Fix outdated --fp4-gemm-backend documentation (#18350)
|
2026-02-07 20:42:47 +08:00 |
|
shuwenn
|
ef1d0ea885
|
[Doc] add a summary section for spec decode document (#18323)
|
2026-02-05 16:34:31 -05:00 |
|
shuwenn
|
8b21dd4b77
|
[Doc] refine spec decode docs for SpecV2/STANDALONE/NGRAM (#18321)
|
2026-02-05 15:12:33 -05:00 |
|
rinbaro
|
de6a03260f
|
[docs] fix misspellings & typos (#18276)
|
2026-02-05 03:35:29 +00:00 |
|
Teng Ma
|
c8212b9fac
|
[PD] doc: Document SGLANG_MOONCAKE_CUSTOM_MEM_POOL and supported values (#18259)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-05 03:03:33 +00:00 |
|
Viacheslav
|
74f716dbd7
|
Gigachat 3 tool parser and tests (#14765)
|
2026-02-02 22:28:34 -08:00 |
|
husf
|
c52578c7fd
|
[docs][NPU] Update Expert Parallelism docs for Ascend NPU (#17940)
|
2026-01-30 23:42:11 -05:00 |
|
Ziang Li
|
3c9cc44ff5
|
Add mxfp8 support for online quantization, Triton dense linear, and CUTLASS MoE (#17449)
|
2026-01-29 21:33:57 +08:00 |
|
kk
|
f1384f5293
|
Integration mori backend for EP a2a data communication (#17012)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
|
2026-01-28 19:07:34 -08:00 |
|
Baizhou Zhang
|
832c756549
|
[Doc] Tiny update description on torch compile (#17819)
|
2026-01-27 18:59:04 +08:00 |
|
shuwenn
|
fd3b179ffd
|
[HiCache][HA 1/N] Support HiCache storage runtime attach/detach (#15892)
|
2026-01-26 19:33:19 -08:00 |
|
zijiexia
|
dd97e1fe38
|
[Docs] Add RL documentation (#17663)
Co-authored-by: JD <jaedon.guo@gmail.com>
|
2026-01-26 12:16:54 -08:00 |
|
zackyoray
|
d275d47973
|
[NIXL] Add custom NIXL backend selection for KVManager (#17146)
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
|
2026-01-26 14:35:38 +08:00 |
|
Trevor Morris
|
2c2c4e446b
|
[NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668)
|
2026-01-24 22:59:55 +08:00 |
|
Glen Liu
|
a6280b2a23
|
add documentation example for LoRA overlap loading and cleanup unused function (#17464)
|
2026-01-24 15:33:16 +08:00 |
|
Yi Zhong
|
08fcda2f63
|
add the fa4 mm backend and varlen func (#13539)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2026-01-23 23:12:06 +08:00 |
|
hxie
|
13f88045b3
|
configuration file support and nixl integration augmentation for hicache-storage-backend-extra-config (#16602)
|
2026-01-22 14:31:48 -08:00 |
|
zijiexia
|
4ecd9afde9
|
[Docs] Rename SGLang Router to SGLang Model Gateway (#17436)
|
2026-01-20 12:31:10 -08:00 |
|
Ruihuan He
|
eb38d64413
|
[Docs] Explain CUDA attention backend choices and aiter FP8 KV cache support (#17428)
|
2026-01-20 11:10:36 -08:00 |
|
Shangming Cai
|
23d765d1f2
|
[Doc] Update pipeline parallelism documentation (#17414)
|
2026-01-20 20:10:56 +08:00 |
|
b8zhong
|
f374623fa9
|
[Refactor] Set fp4-gemm-backend=auto on SM100 and rename fp4-gemm-backend with flashinfer_ prefix (#17309)
|
2026-01-19 20:09:07 +08:00 |
|
Glen Liu
|
ad1b4e4728
|
[Feature] overlap LoRA weight loading with compute (#15512)
|
2026-01-19 10:43:17 +08:00 |
|
Xinyuan Tong
|
2069050d3f
|
fix: Handle multiple named chat templates in HuggingFace tokenizers (#17236)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-01-18 17:20:04 +08:00 |
|
zijiexia
|
166396ca4c
|
[Docs] minor update on ep docs (#17242)
|
2026-01-16 18:20:47 -08:00 |
|
shuwenn
|
8ec160ed46
|
feature: support uvicorn access log filter (disable logging /metrics) (#15513)
|
2026-01-15 20:00:06 -08:00 |
|
Yi Zhong
|
d1110e1c3e
|
docs only add kimi k2 thinking and kimi linear (#15789)
|
2026-01-15 12:09:52 -05:00 |
|
shuwenn
|
9227d9f60c
|
[Docs] sort and update server_arguments.md (#17163)
|
2026-01-15 12:07:18 -05:00 |
|
Glen Liu
|
6b065298b5
|
[Docs] add routing-key to schedule-policy in docs (#17101)
|
2026-01-14 22:22:07 -05:00 |
|
fxmarty-amd
|
5af84c8af5
|
[AMD][Quantization] Add int4fp8_moe online quantization on ROCm (#7392)
Co-authored-by: Dehua Tang <dehtang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: YC Tseng <yctseng@amd.com>
|
2026-01-14 01:44:40 -08:00 |
|
shuwenn
|
cd33694585
|
feat: add --admin-api-key for finer-grained endpoint auth (#15908)
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
|
2026-01-13 20:21:55 -08:00 |
|
James
|
ae0baefb94
|
[NPU] upgrade npu mf_apater plugin (#15853)
|
2026-01-13 09:02:10 +08:00 |
|
Wenyi Xu
|
3c16c58619
|
[model-gateway] Add Redis support as a history backend (#16300)
|
2026-01-11 01:03:00 -08:00 |
|
Ratish P
|
c0248d6f37
|
[dpc]: unify DP controller load balancing and simplify dispatch logic (#16258)
|
2026-01-11 12:38:03 +08:00 |
|
Shangming Cai
|
973116e6bb
|
[Doc] Optimize pipeline parallelism doc (#16630)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-01-07 14:52:42 +08:00 |
|
Huapeng Zhou
|
078270473a
|
[Doc] Default lora backend: csgmv (#16444)
|
2026-01-05 12:45:49 +08:00 |
|