shuwenn
|
b65799cf83
|
[SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
|
2026-04-20 14:25:04 -07:00 |
|
Lianmin Zheng
|
44e67c6835
|
Remove deprecated double sparsity feature (#23009)
|
2026-04-17 13:33:12 -07:00 |
|
Jan Bernlöhr
|
04a53955b9
|
feat: add coordinated checkpoint prefetch for network filesystem loading (#20843)
|
2026-04-16 20:08:19 -07:00 |
|
Liangsheng Yin
|
db7a751d48
|
refactor: extract FanOutCommunicator and use declarative spec table (#22967)
|
2026-04-16 15:37:19 -07:00 |
|
hhwxw
|
2480cc2a16
|
docs: fix incorrect default max-payload-size in gateway config reference (#22923)
|
2026-04-16 13:25:27 +08:00 |
|
cctry
|
f855a0bde6
|
Introduce CUDA graph debug mode with breakable CUDA graph (#19102)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Cheng Wan <chwan@rice.edu>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-11 00:36:56 -07:00 |
|
Zhangheng
|
5ba7d4e523
|
[HiSparse]: Update HiSparse's user-guide (#22499)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
|
2026-04-10 15:06:43 +08:00 |
|
Zhangheng
|
3d3a32c0b9
|
[HiSparse]: Add readme docs for HiSparse Feature (#22238)
|
2026-04-07 00:39:24 -07:00 |
|
Khoa Pham
|
12272b6791
|
[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton (#21425)
|
2026-04-06 00:11:14 -07:00 |
|
YAMY
|
dc125afffb
|
Add staging buffer CI test and documentation for heterogeneous TP (#21921)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-04-06 14:00:20 +08:00 |
|
narutolhy
|
24763256b9
|
[Speculative Decoding] Add FA4-based Spec Support (#21080)
Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>
|
2026-04-04 02:09:45 -07:00 |
|
Liangsheng Yin
|
f25bf86065
|
Fix ngram doc for speculative_num_draft_tokens default (#21910)
|
2026-04-01 22:18:24 -07:00 |
|
Khoa Pham
|
f836658077
|
[Spec][Ngram] 4/N: Remove max_match_window_size and min_match_window_size, matching all suffixes of the Trie (#21225)
|
2026-04-01 22:09:46 -07:00 |
|
David Cheung
|
ed427e1299
|
Migrate all callers from /get_server_info to /server_info (#21463)
|
2026-04-01 21:17:50 -07:00 |
|
Noa Neria
|
8d9145d97e
|
Direct model loading from object storage with Runai Model Streamer (#17948)
Signed-off-by: Noa Neria <noa@run.ai>
|
2026-04-01 18:41:22 -07:00 |
|
Brayden Zhong
|
6a9b09847c
|
CUTLASS NVFP4 GEMM improvement of SM120 (#21314)
|
2026-04-01 09:04:34 +08:00 |
|
Aishwarya Ramasethu
|
c32ee48886
|
MFU metrics in Prometheus (#19395)
|
2026-03-29 23:40:06 -07:00 |
|
Артем Савкин
|
27071e0a43
|
[NPU] Update quantization&CI documentation (#21100)
Co-authored-by: Tamir Baydasov <41994229+TamirBaydasov@users.noreply.github.com>
|
2026-03-28 21:42:21 +03:00 |
|
Baizhou Zhang
|
edd4d54023
|
[Clean] Remove deprecated environs (#21536)
|
2026-03-28 00:35:44 -07:00 |
|
Jiaxin(Jackson) Deng
|
c4db64c16b
|
Add Lychee Doc Links Check to Local and CI (#19742)
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
|
2026-03-24 13:48:26 -07:00 |
|
kpham-sgl
|
bc4aaab6a1
|
[Spec][Ngram] 2/N: Rename branch length to max trie depth (#21181)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-22 23:35:25 -07:00 |
|
kpham-sgl
|
6d160b42bb
|
[Spec][Ngram] 1/N: Reference based Speculative Decoding refactor (#20393)
|
2026-03-22 00:55:10 -07:00 |
|
Xinyuan Tong
|
d1e95af282
|
Upgrade transformers==5.3.0 (#17784)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Alison Shao <alisonshao@mac.lan>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-18 13:50:43 -07:00 |
|
ishandhanani
|
8f0f36c64b
|
[1/2] Add ModelExpress coordination for remote instance weight loading - matching TP (#19920)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishan Dhanani <ishan@dhanani.dev>
|
2026-03-18 13:38:32 -07:00 |
|
Kangyan-Zhou
|
3d8fc9a0ca
|
Revert "[Nvidia] Add trtllm mnnvl allreduce with unified flashinfer allreduce fusion api" (#20792)
|
2026-03-17 11:59:02 -07:00 |
|
Shu Wang
|
d35fea1b2b
|
[Nvidia] Add trtllm mnnvl allreduce with unified flashinfer allreduce fusion api (#12787)
|
2026-03-17 10:02:45 -07:00 |
|
Teng Ma
|
7c498a6538
|
[DOC] add documents for encoder global mm cache (#20636)
|
2026-03-15 16:44:21 -07:00 |
|
Mook
|
23c191afb6
|
fix(docs): correct quantization documentation (#20301) (#20619)
|
2026-03-15 12:33:12 -04:00 |
|
Liangsheng Yin
|
fc7f9c1de7
|
Rename --stream-output to --incremental-streaming-output (#20614)
|
2026-03-14 23:22:33 -07:00 |
|
Matt Van Horn
|
d093e70067
|
[Doc] Add DSA/NSA attention backend to support matrix (#20326)
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-11 13:40:35 -04:00 |
|
Yoray Zack
|
9991debde3
|
[Feature] Integrate Elastic NIXL-EP into SGLang (#19248)
Signed-off-by: Barak Biber <bbiber@nvidia.com>
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Barak Biber <bbiber@nvidia.com>
|
2026-03-11 17:37:43 +08:00 |
|
Liangsheng Yin
|
50953aea8d
|
[Scheduler] Unify idle checks into is_fully_idle() and fix weight update test (#20296)
|
2026-03-10 17:50:23 -07:00 |
|
Ziang Li
|
76ee4bb98c
|
[FlashInfer v0.6.4] [RL] Integrate FlashInfer mxfp8 gemm, MoE, and routed MoE (#19537)
|
2026-03-10 15:37:57 -07:00 |
|
shuwenn
|
5a11ae19c1
|
[CI] fix: notebook ci often OOM (#20199)
|
2026-03-09 22:32:41 -07:00 |
|
Brayden Zhong
|
591e61245a
|
[Doc] Add smal table for GEMM backends (#20213)
|
2026-03-09 22:19:57 -07:00 |
|
YEJIN KIM
|
0fd9a57d80
|
[Doc] Verify and Modify some attention backend specs (#20210)
|
2026-03-09 23:05:52 +00:00 |
|
shuwenn
|
7bd3dd9270
|
fix: image URL in notebook to use raw.githubusercontent.com (#20100)
|
2026-03-07 13:28:20 -08:00 |
|
Bruce Changlong Xu
|
feda2b11c4
|
[AMD] Add AWQ AMD CI coverage and quantization platform compatibility docs (#19550)
|
2026-03-04 19:50:55 -08:00 |
|
Brayden Zhong
|
e2af840c3d
|
Various SM120 improvements (#19721)
|
2026-03-03 16:46:13 -08:00 |
|
Sam (Kesen Li)
|
5b2e2750b5
|
Enable XQA for SM90 and SM120 (#17115)
Co-authored-by: Xiaowei Wang <100599594+xiaoweiw-nv@users.noreply.github.com>
|
2026-03-03 14:09:44 -08:00 |
|
zwang86
|
d6ac5f23cc
|
[Docs] Add GDN attention backends matrix documentation (#19755)
Co-authored-by: Zeyu Wang <zeyu.wang@yahooinc.com>
|
2026-03-03 13:00:34 -08:00 |
|
Jasonzhang517
|
d939e26585
|
[model gateway][0/N] router EPD support: add encoder grpc server backend support (#16552)
Co-authored-by: Zongyao Chen <ZongYao.Chen@linux.alibaba.com>
Co-authored-by: Zongyao Chen <solar1s@163.com>
|
2026-03-03 19:38:15 +08:00 |
|
Yuwei An
|
0abb9f4176
|
Piecewise Cuda Graph Docs (#19738)
Signed-off-by: yuweia <ayw.sirius19@gmail.com>
Co-authored-by: Wenyao Gao <wgao11@u.rochester.edu>
|
2026-03-03 11:51:17 +08:00 |
|
shuwenn
|
bdffb027a8
|
[CI] fix: handle missing repo in lora notebook (#19700)
|
2026-03-02 10:27:32 -08:00 |
|
Shangming Cai
|
0a6678bf3a
|
[PD] Remove unused server args for disaggregation (#19618)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-03-02 11:38:50 +08:00 |
|
shuwenn
|
e3e71f275a
|
docs: refactor speculative decoding doc (#19186)
|
2026-03-01 22:03:20 -05:00 |
|
zwang86
|
f51ddba131
|
feat: add FA4 SM90 paged KV decode support & update attention docs (#18442)
Co-authored-by: Zeyu Wang <zeyu.wang@yahooinc.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2026-03-02 09:12:19 +08:00 |
|
ympcMark
|
43fade5f69
|
[4/N] (Elastic EP) Back up Expert Weights in DRAM (#17374)
Co-authored-by: UNIDY2002 <unidy2002@outlook.com>
|
2026-02-27 15:59:13 +08:00 |
|
billishyahao
|
60eeef7370
|
[AMD][with CI Fix] support two batch overlapping for mori ep (#19216)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-25 02:14:08 -08:00 |
|
huangtingwei
|
d40cb2f725
|
[HiCache] Support heterogeneous tp for hicache storage (#18541)
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2026-02-25 00:13:57 -08:00 |
|