Lianmin Zheng
|
44e67c6835
|
Remove deprecated double sparsity feature (#23009)
|
2026-04-17 13:33:12 -07:00 |
|
Jan Bernlöhr
|
04a53955b9
|
feat: add coordinated checkpoint prefetch for network filesystem loading (#20843)
|
2026-04-16 20:08:19 -07:00 |
|
Khoa Pham
|
f836658077
|
[Spec][Ngram] 4/N: Remove max_match_window_size and min_match_window_size, matching all suffixes of the Trie (#21225)
|
2026-04-01 22:09:46 -07:00 |
|
David Cheung
|
ed427e1299
|
Migrate all callers from /get_server_info to /server_info (#21463)
|
2026-04-01 21:17:50 -07:00 |
|
Noa Neria
|
8d9145d97e
|
Direct model loading from object storage with Runai Model Streamer (#17948)
Signed-off-by: Noa Neria <noa@run.ai>
|
2026-04-01 18:41:22 -07:00 |
|
Aishwarya Ramasethu
|
c32ee48886
|
MFU metrics in Prometheus (#19395)
|
2026-03-29 23:40:06 -07:00 |
|
Baizhou Zhang
|
edd4d54023
|
[Clean] Remove deprecated environs (#21536)
|
2026-03-28 00:35:44 -07:00 |
|
kpham-sgl
|
bc4aaab6a1
|
[Spec][Ngram] 2/N: Rename branch length to max trie depth (#21181)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-22 23:35:25 -07:00 |
|
kpham-sgl
|
6d160b42bb
|
[Spec][Ngram] 1/N: Reference based Speculative Decoding refactor (#20393)
|
2026-03-22 00:55:10 -07:00 |
|
Kangyan-Zhou
|
3d8fc9a0ca
|
Revert "[Nvidia] Add trtllm mnnvl allreduce with unified flashinfer allreduce fusion api" (#20792)
|
2026-03-17 11:59:02 -07:00 |
|
Shu Wang
|
d35fea1b2b
|
[Nvidia] Add trtllm mnnvl allreduce with unified flashinfer allreduce fusion api (#12787)
|
2026-03-17 10:02:45 -07:00 |
|
Teng Ma
|
7c498a6538
|
[DOC] add documents for encoder global mm cache (#20636)
|
2026-03-15 16:44:21 -07:00 |
|
Liangsheng Yin
|
fc7f9c1de7
|
Rename --stream-output to --incremental-streaming-output (#20614)
|
2026-03-14 23:22:33 -07:00 |
|
Yoray Zack
|
9991debde3
|
[Feature] Integrate Elastic NIXL-EP into SGLang (#19248)
Signed-off-by: Barak Biber <bbiber@nvidia.com>
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Barak Biber <bbiber@nvidia.com>
|
2026-03-11 17:37:43 +08:00 |
|
Ziang Li
|
76ee4bb98c
|
[FlashInfer v0.6.4] [RL] Integrate FlashInfer mxfp8 gemm, MoE, and routed MoE (#19537)
|
2026-03-10 15:37:57 -07:00 |
|
Brayden Zhong
|
e2af840c3d
|
Various SM120 improvements (#19721)
|
2026-03-03 16:46:13 -08:00 |
|
Yuwei An
|
0abb9f4176
|
Piecewise Cuda Graph Docs (#19738)
Signed-off-by: yuweia <ayw.sirius19@gmail.com>
Co-authored-by: Wenyao Gao <wgao11@u.rochester.edu>
|
2026-03-03 11:51:17 +08:00 |
|
Shangming Cai
|
0a6678bf3a
|
[PD] Remove unused server args for disaggregation (#19618)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-03-02 11:38:50 +08:00 |
|
ympcMark
|
43fade5f69
|
[4/N] (Elastic EP) Back up Expert Weights in DRAM (#17374)
Co-authored-by: UNIDY2002 <unidy2002@outlook.com>
|
2026-02-27 15:59:13 +08:00 |
|
billishyahao
|
60eeef7370
|
[AMD][with CI Fix] support two batch overlapping for mori ep (#19216)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-25 02:14:08 -08:00 |
|
Hubert Lu
|
17b0affbdf
|
[AMD] Support --enable-aiter-allreduce-fusion on AMD GPUs (#13747)
Co-authored-by: yctseng0211 <yctseng@amd.com>
|
2026-02-24 23:11:55 -08:00 |
|
Baizhou Zhang
|
43f83525c0
|
Revert "[AMD] support two batch overlapping for mori ep #17953" (#19161)
|
2026-02-23 01:19:23 +08:00 |
|
billishyahao
|
fbb6098487
|
[AMD] support two batch overlapping for mori ep (#17953)
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com>
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-20 08:45:55 -08:00 |
|
Mohammad Miadh Angkad
|
2f592c3b18
|
[Doc] Add flashinfer_deepgemm to --fp8-gemm-backend (#18982)
|
2026-02-18 14:45:47 -05:00 |
|
Estrella-xx
|
1b3513a7e4
|
refactor FAKE transfer backend and remove --disaggregation-decode-enable-fake-auto parameter (#18345)
|
2026-02-16 17:27:02 +03:00 |
|
Rain Jiang
|
0ffd0a3995
|
Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389)
|
2026-02-16 09:29:54 +08:00 |
|
dongjiyingdjy
|
8b4c364960
|
refactor context parallel state (#17213)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-02-13 23:18:17 +08:00 |
|
danielafrimi
|
e422bcaed8
|
[Mamba] Add float16 support for SSM cache dtype (#18444)
|
2026-02-12 11:27:47 +08:00 |
|
qianyue76
|
f06ab17a73
|
[diffusion] docs: consolidate diffusion documentation into docs (#18095)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>
|
2026-02-11 16:55:07 -08:00 |
|
Baizhou Zhang
|
947927bdb5
|
[V3.2] Change default CP token split method to --round-robin-split (#18613)
|
2026-02-11 20:14:35 +08:00 |
|
Mohammad Miadh Angkad
|
fddef76619
|
[Doc] Fix outdated --fp4-gemm-backend documentation (#18350)
|
2026-02-07 20:42:47 +08:00 |
|
rinbaro
|
de6a03260f
|
[docs] fix misspellings & typos (#18276)
|
2026-02-05 03:35:29 +00:00 |
|
Viacheslav
|
74f716dbd7
|
Gigachat 3 tool parser and tests (#14765)
|
2026-02-02 22:28:34 -08:00 |
|
Ziang Li
|
3c9cc44ff5
|
Add mxfp8 support for online quantization, Triton dense linear, and CUTLASS MoE (#17449)
|
2026-01-29 21:33:57 +08:00 |
|
Baizhou Zhang
|
832c756549
|
[Doc] Tiny update description on torch compile (#17819)
|
2026-01-27 18:59:04 +08:00 |
|
Yi Zhong
|
08fcda2f63
|
add the fa4 mm backend and varlen func (#13539)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2026-01-23 23:12:06 +08:00 |
|
hxie
|
13f88045b3
|
configuration file support and nixl integration augmentation for hicache-storage-backend-extra-config (#16602)
|
2026-01-22 14:31:48 -08:00 |
|
zijiexia
|
4ecd9afde9
|
[Docs] Rename SGLang Router to SGLang Model Gateway (#17436)
|
2026-01-20 12:31:10 -08:00 |
|
b8zhong
|
f374623fa9
|
[Refactor] Set fp4-gemm-backend=auto on SM100 and rename fp4-gemm-backend with flashinfer_ prefix (#17309)
|
2026-01-19 20:09:07 +08:00 |
|
Glen Liu
|
ad1b4e4728
|
[Feature] overlap LoRA weight loading with compute (#15512)
|
2026-01-19 10:43:17 +08:00 |
|
Xinyuan Tong
|
2069050d3f
|
fix: Handle multiple named chat templates in HuggingFace tokenizers (#17236)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-01-18 17:20:04 +08:00 |
|
shuwenn
|
8ec160ed46
|
feature: support uvicorn access log filter(disable logging /metrics) (#15513)
|
2026-01-15 20:00:06 -08:00 |
|
shuwenn
|
9227d9f60c
|
[Docs] sort and update server_arguments.md (#17163)
|
2026-01-15 12:07:18 -05:00 |
|
Glen Liu
|
6b065298b5
|
[Docs] add routing-key to schedule-policy in docs (#17101)
|
2026-01-14 22:22:07 -05:00 |
|
shuwenn
|
cd33694585
|
feat: add --admin-api-key for finer-grained endpoint auth (#15908)
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
|
2026-01-13 20:21:55 -08:00 |
|
Ratish P
|
c0248d6f37
|
[dpc]: unify DP controller load balancing and simplify dispatch logic (#16258)
|
2026-01-11 12:38:03 +08:00 |
|
Huapeng Zhou
|
078270473a
|
[Doc] Default lora backend: csgmv (#16444)
|
2026-01-05 12:45:49 +08:00 |
|
Yongfei Xu
|
0d244116d2
|
[DeepSeek v3.2] opt Context Parallelism: support fused moe, multi batch and fp8 kvcache (#13959)
|
2026-01-02 23:49:14 +08:00 |
|
Huaixin Chang
|
c1dfbc777b
|
deprecate prefill-round-robin-balance (#16195)
Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2025-12-31 22:25:33 +08:00 |
|
Mufeez Amjad
|
cbff7ad985
|
dp-attention: add follow_bootstrap_room + auto load-balance; drop decode_round_robin (#16110)
|
2025-12-30 22:33:06 +08:00 |
|