101 Commits

Author SHA1 Message Date
Xinyuan Tong
6d03861476 support Hy3 preview (#23533)
Co-authored-by: pengmeng <pengmeng@tencent.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
Co-authored-by: chengvjiang <chengvjiang@tencent.com>
Co-authored-by: russellfeng <russellfeng@tencent.com>
2026-04-24 12:03:24 -07:00
Mohammad Miadh Angkad
bcc0c65aa8 [DSA] Hopper FP8 FlashMLA KV padding (#22372) 2026-04-12 02:19:17 -07:00
Zhangheng
3d3a32c0b9 [HiSparse]: Add readme docs for HiSparse Feature (#22238) 2026-04-07 00:39:24 -07:00
Mohammad Miadh Angkad
b311db2e49 [Doc] Fix and improve DeepSeek V3.2/GLM-5 documentation (#22179) 2026-04-05 23:26:42 -07:00
Baizhou Zhang
106baedbfb [Doc] Update GLM-5 instructions in sglang documentation (#21716) 2026-04-05 03:13:07 -07:00
David Cheung
ed427e1299 Migrate all callers from /get_server_info to /server_info (#21463) 2026-04-01 21:17:50 -07:00
Артем Савкин
27071e0a43 [NPU] Update quantization&CI documentation (#21100)
Co-authored-by: Tamir Baydasov <41994229+TamirBaydasov@users.noreply.github.com>
2026-03-28 21:42:21 +03:00
SevenJ
2e65c27b29 Api add flush cache timeout (#21413)
Signed-off-by: root <wenjun7j@gmail.com>
2026-03-26 14:44:37 -07:00
Jiaxin(Jackson) Deng
c4db64c16b Add Lychee Doc Links Check to Local and CI (#19742)
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
2026-03-24 13:48:26 -07:00
Mook
2720ea2667 [Typo] Fix H200 doc links pointing to H20 section in deepseek_v3.md (#20383) 2026-03-11 13:35:20 -07:00
shuwenn
5a11ae19c1 [CI] fix: notebook ci often OOM (#20199) 2026-03-09 22:32:41 -07:00
shuwenn
7bd3dd9270 fix: image URL in notebook to use raw.githubusercontent.com (#20100) 2026-03-07 13:28:20 -08:00
Baidu-AIAK
6851613b93 [Bugfix] For cp: Fixed hang problem in prefix cache and kvcache support fp8 in-seq-split mode (#19656)
Co-authored-by: vincent <vincent@vincentdeMacBook-Pro.local>
2026-03-03 19:19:46 -08:00
Michael
6b8e62f94f [AMD] [Qwen 3.5 Day 0] Add Qwen 3.5 nightly accuracy tests (#19479) 2026-03-02 19:42:42 -08:00
Michael
403195d59d [AMD] [MiniMax-M2.5 Day 0] Add MiniMax-M2.5 nightly accuracy test (#19443) 2026-02-27 02:39:33 -08:00
赵晨阳
e239f8aa85 Remove error dllm and diffusion doc in basic_useage (#19105) 2026-02-20 20:28:00 -08:00
Rain Jiang
0ffd0a3995 Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389) 2026-02-16 09:29:54 +08:00
SoluMilken
07a24f1a38 update pre-commit config (#18860) 2026-02-16 00:18:31 +08:00
shuwenn
3299c4f9c1 [CI] feat: add early exit to wait_for_server when process dies (#18602) 2026-02-13 16:46:09 -08:00
dongjiyingdjy
8b4c364960 refactor context parallel state (#17213)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-02-13 23:18:17 +08:00
qianyue76
f06ab17a73 [diffusion] docs: consolidate diffusion documentation into docs (#18095)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>
2026-02-11 16:55:07 -08:00
Baizhou Zhang
947927bdb5 [V3.2] Change default CP token split method to --round-robin-split (#18613) 2026-02-11 20:14:35 +08:00
Rishit Shivam
c850a8a41a [Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888)
Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2026-02-06 13:17:51 -08:00
rinbaro
de6a03260f [docs] fix misspellings & typos (#18276) 2026-02-05 03:35:29 +00:00
sglang-bot
c971852ffc docs: move deepseek_ocr to popular model usage and add cookbook reference (#18120) 2026-02-02 05:45:41 -08:00
baonudesifeizhai
84ab611af8 model: support DeepSeek-OCR-2 (#17897) 2026-01-30 09:49:51 +08:00
Baizhou Zhang
1d942e4eef [DeepSeek] Update tests and document for DeepSeek V3.2 NVFP4 checkpoint (#17657) 2026-01-27 22:10:57 +08:00
Hubert Lu
df42f4d386 [AMD] Update dsv3.2 AMD GPU docs and unify ROCm TileLang build (#17783)
Co-authored-by: wufann <715544327@qq.com>
2026-01-26 21:10:32 -08:00
Mansoor
bdaa3de075 Add return routed experts to the completions and chat/completions endpoints (#17434) 2026-01-23 12:12:36 -08:00
Yi Zhong
458fe5a337 [docs] Show user the fastAPI docs available (#17510)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
2026-01-21 14:26:25 +00:00
b8zhong
3d72944fb8 [Doc] Add tip on how to use Spec V2 (#15455) 2026-01-16 05:30:18 +08:00
Guy Stone
cd23c2f0a3 [Docs] add v1/score api to native api documentation (#16568) 2026-01-15 12:29:40 -05:00
ybyang
2122fea3c4 Update deepseekV32 Cp doc (#17054) 2026-01-14 11:19:26 +08:00
ybyang
aab640c99f add doc for dsv32 cp+pp (#16916) 2026-01-12 19:14:07 +08:00
hlu1
aeb480c11f Add top-p to run_eval.py (#16844) 2026-01-10 17:10:37 +08:00
Ke Bao
3aa11ca722 Remove hybrid_kvcache_ratio in server args (#16399) 2026-01-06 13:13:13 +08:00
Baizhou Zhang
f07e76b229 Multiple refactors of DeepSeek V32 and context parallel (#16305) 2026-01-03 02:21:22 +08:00
Yongfei Xu
0d244116d2 [DeepSeek v3.2] opt Context Parallelism: support fused moe, multi batch and fp8 kvcache (#13959) 2026-01-02 23:49:14 +08:00
Roger Young
5c64a20da7 Update MiniMax-M2 ToolCall and add MiniMax-M2.1 in Docs (#15538)
Co-authored-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-12-23 15:11:52 -08:00
mlmz
1f1f05a85e vlm: refactor engine vlm params and support processor output as input (#14091)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: BenYao21 <cyao22@asu.edu>
Co-authored-by: minleminzui <minleminzui@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
2025-12-20 18:31:24 +08:00
Yuxuan Zhang
b82c7a0ae7 [GLM-4.7] GLM-4.7 Tool Parser and Doc Update (#15333) 2025-12-19 20:30:44 -08:00
Yi Zhang
9d4f066fb9 Add doc for qwen3 next (#15337) 2025-12-17 17:53:07 -08:00
b8zhong
d20699a33c [Deepseek V3.2] Support Overlap Spec + NSA (#15307)
Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
2025-12-17 13:35:39 -08:00
Ashton Chew
2bdbaef18e [DeepSeekV3.2] Add pure TP+MTP test (#15088)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-12-16 21:48:12 -08:00
Alison Shao
31d48d7f6f Add Ollama-compatible API endpoints + Smart Router (#14376)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-12-16 20:43:38 -08:00
almaslof
d0f756aec9 [docs] Fix kernel name (#14887) 2025-12-11 10:48:16 -05:00
Binyao Jiang
cf0478d602 [Glm46v] Bug fix for accuracy drop and unable to launch server (#14585)
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
2025-12-07 23:45:02 -08:00
George Armstrong
91c9c14c28 DOC update nemo-skills in docs (#14555)
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2025-12-06 19:03:08 -08:00
Lee Nau
5f6f550af8 Update DeepSeek V3 docs to use B200 (#14447) 2025-12-06 17:22:11 -08:00
Baizhou Zhang
42fcf5438f Revert "tiny remove deprecated endpoint call" (#14533) 2025-12-05 23:48:54 -08:00