1089 Commits

Author SHA1 Message Date
Mick
6503f94211 [diffusion] feat: support passing component path via server args (#19108) 2026-02-21 21:22:47 +08:00
Mick
b89ca65789 [diffusion] refactor: reduce redundancy and improve stage api (#19060) 2026-02-21 16:35:47 +08:00
赵晨阳
e239f8aa85 Remove error dllm and diffusion doc in basic_useage (#19105) 2026-02-20 20:28:00 -08:00
billishyahao
fbb6098487 [AMD] support two batch overlapping for mori ep (#17953)
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com>
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
2026-02-20 08:45:55 -08:00
chengshuang18
295bc17576 Feature/sdar support (#19044)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
2026-02-19 21:58:15 -08:00
Cheng Wan
73a7f0d049 Revert "Add SDAR model support" (#19032) 2026-02-19 16:03:56 -08:00
chengshuang18
44ab752b7a Add SDAR model support (#18318)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
2026-02-19 11:20:32 -08:00
Mohammad Miadh Angkad
2f592c3b18 [Doc] Add flashinfer_deepgemm to --fp8-gemm-backend (#18982) 2026-02-18 14:45:47 -05:00
Mengyang Liu
4f980f6f23 [Feature] Implement update_weights_from_disk for SGLang-D (Diffusion … (#18306)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2026-02-18 11:24:07 -08:00
HAI
934b36693c Reasoning models fix docs (#18963) 2026-02-17 23:05:55 -08:00
Makcum888e
14c95d255c [Diffusion] [NPU] [Doc] Add NPU documentation for sglang-diffusion (#18894)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-17 10:12:20 +03:00
Estrella-xx
1b3513a7e4 refactor FAKE transfer backend and remove --disaggregation-decode-enable-fake-auto parameter (#18345) 2026-02-16 17:27:02 +03:00
RAY
d85884ca57 Update ascend_npu_qwen3_5_examples.md (#18888) 2026-02-16 10:24:27 +03:00
Douglas Yang
f1efb46bdd fix: adding performance logging for nightly diffusion (#18023) 2026-02-16 14:09:00 +08:00
Duyi-Wang
5ddc84e33e [AMD] MORI-EP inter kernel type switch (#18437)
Co-authored-by: HAI <hixiao@gmail.com>
2026-02-15 20:59:39 -08:00
Rain Jiang
0ffd0a3995 Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389) 2026-02-16 09:29:54 +08:00
chenxu214
fd5a45d5cf Update ascend_npu_support.rst (#18868) 2026-02-16 01:41:38 +08:00
chenxu214
f2d72866e9 Create ascend_npu_qwen3_5_examples.md (#18864) 2026-02-16 01:15:20 +08:00
SoluMilken
07a24f1a38 update pre-commit config (#18860) 2026-02-16 00:18:31 +08:00
Bhavneek Singh
1ce3420784 Model: Support IBM Granite (Dense/Mamba + MoE) (#18040) 2026-02-15 11:24:41 +08:00
shuwenn
4cf4f0859f [Doc] Convert the speculative decoding notebook to markdown (#18395) 2026-02-14 18:18:56 -08:00
Kangyan-Zhou
3a1c388b43 Update performance dashboard for nightly tests (#18824) 2026-02-14 09:28:28 +08:00
shuwenn
3299c4f9c1 [CI] feat: add early exit to wait_for_server when process dies (#18602) 2026-02-13 16:46:09 -08:00
dongjiyingdjy
8b4c364960 refactor context parallel state (#17213)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-02-13 23:18:17 +08:00
Xinwei Qiang
356e338607 [diffusion] feat: support SparseVideoGen2 attention backend (#17507)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-02-13 16:20:46 +08:00
Liangsheng Yin
e6f7a372ef Rename request timeout env vars for waiting/running stages (#18766) 2026-02-12 22:58:40 -08:00
HuangJi
f4d80f9d42 [diffusion] feat: allows quality adjustment of generated images/videos (#17937) 2026-02-13 11:56:20 +08:00
BourneSun0527
f65c885e7c Modify glm5 readme on npu (#18768) 2026-02-13 11:42:40 +08:00
shuwenn
bc2405e6c1 feat: support release lookup (#18450) 2026-02-13 10:47:02 +08:00
danielafrimi
e422bcaed8 [Mamba] Add float16 support for SSM cache dtype (#18444) 2026-02-12 11:27:47 +08:00
fy
123f57b84b update glm5 readme on npu (#18657) 2026-02-12 10:37:12 +08:00
liupeng374
c34832c02c glm5 md (#18655) 2026-02-12 10:11:59 +08:00
qianyue76
f06ab17a73 [diffusion] docs: consolidate diffusion documentation into docs (#18095)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>
2026-02-11 16:55:07 -08:00
Baizhou Zhang
947927bdb5 [V3.2] Change default CP token split method to --round-robin-split (#18613) 2026-02-11 20:14:35 +08:00
赵晨阳
a2c38f7796 Enhance SMG guide with RL rollout systems benefits (#18588) 2026-02-10 20:20:45 -08:00
AlexZhao
3167bcc01c [Doc] Comprehensive Guide: Navigating DP, DPA, and SMG Best Practices (#18096)
Co-authored-by: 赵海源 <zhaohaiyuan@xiaohongshu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2026-02-10 18:31:28 -08:00
husf
573ff55814 [NPU][docs]fix bug about hyperlink for best practice for ascend npu (#18561) 2026-02-10 20:03:28 +03:00
Hexq0210
d0d387dea1 [NPU] update npu doc (#18474) 2026-02-10 16:59:13 +03:00
husf
99101ce30b [NPU][docs] improve docs for Best Practice on Ascend NPU (#18360) 2026-02-10 16:52:27 +03:00
Zack Yu
54589a2f2d docs: expand and update modelopt documentation (#18479)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-09 23:09:52 +00:00
brimon
ddbcfbaaab feature: support bidirectional attention for Gemma-3 (#10707) 2026-02-09 23:17:45 +08:00
Junlin Zhou
14652243bd [DLLM] Add JointThreshold algorithm for joint M2T and T2T decoding (#18171)
Signed-off-by: Junlin Zhou <zhoujunlin.zjl@antgroup.com>
Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com>
2026-02-09 14:20:45 +08:00
Mohammad Miadh Angkad
fddef76619 [Doc] Fix outdated --fp4-gemm-backend documentation (#18350) 2026-02-07 20:42:47 +08:00
Mohammad Miadh Angkad
c47c2f9466 [Doc] Update CUDA 13 install guide to install torch first (#18404)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-02-07 18:04:37 +08:00
Hexq0210
e834b85ab6 [NPU] update npu doc (#18344) 2026-02-07 16:38:05 +08:00
Rishit Shivam
c850a8a41a [Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888)
Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2026-02-06 13:17:51 -08:00
amote-i
92b8bd6833 fix npu best practice (#18330) 2026-02-05 21:14:46 -05:00
shuwenn
ef1d0ea885 [Doc] add a summary section for spec decode document (#18323) 2026-02-05 16:34:31 -05:00
shuwenn
8b21dd4b77 [Doc] refine spec decode docs for SpecV2/STANDALONE/NGRAM (#18321) 2026-02-05 15:12:33 -05:00
Kun Lin
e616d35847 Support Markdown/Notebook-Friendly Documentation Export for Downstream Integration(convert rat files to md files and save) (#18278) 2026-02-04 19:59:40 -08:00