Mick
|
6503f94211
|
[diffusion] feat: support passing component path via server args (#19108)
|
2026-02-21 21:22:47 +08:00 |
|
Mick
|
b89ca65789
|
[diffusion] refactor: reduce redundancy and improve stage api (#19060)
|
2026-02-21 16:35:47 +08:00 |
|
赵晨阳
|
e239f8aa85
|
Remove error dllm and diffusion doc in basic_useage (#19105)
|
2026-02-20 20:28:00 -08:00 |
|
billishyahao
|
fbb6098487
|
[AMD] support two batch overlapping for mori ep (#17953)
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com>
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-20 08:45:55 -08:00 |
|
chengshuang18
|
295bc17576
|
Feature/sdar support (#19044)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
|
2026-02-19 21:58:15 -08:00 |
|
Cheng Wan
|
73a7f0d049
|
Revert "Add SDAR model support" (#19032)
|
2026-02-19 16:03:56 -08:00 |
|
chengshuang18
|
44ab752b7a
|
Add SDAR model support (#18318)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
|
2026-02-19 11:20:32 -08:00 |
|
Mohammad Miadh Angkad
|
2f592c3b18
|
[Doc] Add flashinfer_deepgemm to --fp8-gemm-backend (#18982)
|
2026-02-18 14:45:47 -05:00 |
|
Mengyang Liu
|
4f980f6f23
|
[Feature] Implement update_weights_from_disk for SGLang-D (Diffusion … (#18306)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2026-02-18 11:24:07 -08:00 |
|
HAI
|
934b36693c
|
Reasoning models fix docs (#18963)
|
2026-02-17 23:05:55 -08:00 |
|
Makcum888e
|
14c95d255c
|
[Diffusion] [NPU] [Doc] Add NPU documentation for sglang-diffusion (#18894)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-17 10:12:20 +03:00 |
|
Estrella-xx
|
1b3513a7e4
|
refactor FAKE transfer backend and remove --disaggregation-decode-enable-fake-auto parameter (#18345)
|
2026-02-16 17:27:02 +03:00 |
|
RAY
|
d85884ca57
|
Update ascend_npu_qwen3_5_examples.md (#18888)
|
2026-02-16 10:24:27 +03:00 |
|
Douglas Yang
|
f1efb46bdd
|
fix: adding performance logging for nightly diffusion (#18023)
|
2026-02-16 14:09:00 +08:00 |
|
Duyi-Wang
|
5ddc84e33e
|
[AMD] MORI-EP inter kernel type switch (#18437)
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-15 20:59:39 -08:00 |
|
Rain Jiang
|
0ffd0a3995
|
Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389)
|
2026-02-16 09:29:54 +08:00 |
|
chenxu214
|
fd5a45d5cf
|
Update ascend_npu_support.rst (#18868)
|
2026-02-16 01:41:38 +08:00 |
|
chenxu214
|
f2d72866e9
|
Create ascend_npu_qwen3_5_examples.md (#18864)
|
2026-02-16 01:15:20 +08:00 |
|
SoluMilken
|
07a24f1a38
|
update pre-commit config (#18860)
|
2026-02-16 00:18:31 +08:00 |
|
Bhavneek Singh
|
1ce3420784
|
Model: Support IBM Granite (Dense/Mamba + MoE) (#18040)
|
2026-02-15 11:24:41 +08:00 |
|
shuwenn
|
4cf4f0859f
|
[Doc] Convert the speculative decoding notebook to markdown (#18395)
|
2026-02-14 18:18:56 -08:00 |
|
Kangyan-Zhou
|
3a1c388b43
|
Update performance dashboard for nightly tests (#18824)
|
2026-02-14 09:28:28 +08:00 |
|
shuwenn
|
3299c4f9c1
|
[CI] feat: add early exit to wait_for_server when process dies (#18602)
|
2026-02-13 16:46:09 -08:00 |
|
dongjiyingdjy
|
8b4c364960
|
refactor context parallel state (#17213)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-02-13 23:18:17 +08:00 |
|
Xinwei Qiang
|
356e338607
|
[diffusion] feat: support SparseVideoGen2 attention backend (#17507)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-02-13 16:20:46 +08:00 |
|
Liangsheng Yin
|
e6f7a372ef
|
Rename request timeout env vars for waiting/running stages (#18766)
|
2026-02-12 22:58:40 -08:00 |
|
HuangJi
|
f4d80f9d42
|
[diffusion] feat: allows quality adjustment of generated images/videos (#17937)
|
2026-02-13 11:56:20 +08:00 |
|
BourneSun0527
|
f65c885e7c
|
Modify glm5 readme on npu (#18768)
|
2026-02-13 11:42:40 +08:00 |
|
shuwenn
|
bc2405e6c1
|
feat: support release lookup (#18450)
|
2026-02-13 10:47:02 +08:00 |
|
danielafrimi
|
e422bcaed8
|
[Mamba] Add float16 support for SSM cache dtype (#18444)
|
2026-02-12 11:27:47 +08:00 |
|
fy
|
123f57b84b
|
update glm5 readme on npu (#18657)
|
2026-02-12 10:37:12 +08:00 |
|
liupeng374
|
c34832c02c
|
glm5 md (#18655)
|
2026-02-12 10:11:59 +08:00 |
|
qianyue76
|
f06ab17a73
|
[diffusion] docs: consolidate diffusion documentation into docs (#18095)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>
|
2026-02-11 16:55:07 -08:00 |
|
Baizhou Zhang
|
947927bdb5
|
[V3.2] Change default CP token split method to --round-robin-split (#18613)
|
2026-02-11 20:14:35 +08:00 |
|
赵晨阳
|
a2c38f7796
|
Enhance SMG guide with RL rollout systems benefits (#18588)
|
2026-02-10 20:20:45 -08:00 |
|
AlexZhao
|
3167bcc01c
|
[Doc] Comprehensive Guide: Navigating DP, DPA, and SMG Best Practices (#18096)
Co-authored-by: 赵海源 <zhaohaiyuan@xiaohongshu.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2026-02-10 18:31:28 -08:00 |
|
husf
|
573ff55814
|
[NPU][docs]fix bug about hyperlink for best practice for ascend npu (#18561)
|
2026-02-10 20:03:28 +03:00 |
|
Hexq0210
|
d0d387dea1
|
[NPU] update npu doc (#18474)
|
2026-02-10 16:59:13 +03:00 |
|
husf
|
99101ce30b
|
[NPU][docs] improve docs for Best Practice on Ascend NPU (#18360)
|
2026-02-10 16:52:27 +03:00 |
|
Zack Yu
|
54589a2f2d
|
docs: expand and update modelopt documentation (#18479)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-09 23:09:52 +00:00 |
|
brimon
|
ddbcfbaaab
|
feature: support bidirectional attention for Gemma-3 (#10707)
|
2026-02-09 23:17:45 +08:00 |
|
Junlin Zhou
|
14652243bd
|
[DLLM] Add JointThreshold algorithm for joint M2T and T2T decoding (#18171)
Signed-off-by: Junlin Zhou <zhoujunlin.zjl@antgroup.com>
Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com>
|
2026-02-09 14:20:45 +08:00 |
|
Mohammad Miadh Angkad
|
fddef76619
|
[Doc] Fix outdated --fp4-gemm-backend documentation (#18350)
|
2026-02-07 20:42:47 +08:00 |
|
Mohammad Miadh Angkad
|
c47c2f9466
|
[Doc] Update CUDA 13 install guide to install torch first (#18404)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-02-07 18:04:37 +08:00 |
|
Hexq0210
|
e834b85ab6
|
[NPU] update npu doc (#18344)
|
2026-02-07 16:38:05 +08:00 |
|
Rishit Shivam
|
c850a8a41a
|
[Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888)
Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2026-02-06 13:17:51 -08:00 |
|
amote-i
|
92b8bd6833
|
fix npu best practice (#18330)
|
2026-02-05 21:14:46 -05:00 |
|
shuwenn
|
ef1d0ea885
|
[Doc] add a summary section for spec decode document (#18323)
|
2026-02-05 16:34:31 -05:00 |
|
shuwenn
|
8b21dd4b77
|
[Doc] refine spec decode docs for SpecV2/STANDALONE/NGRAM (#18321)
|
2026-02-05 15:12:33 -05:00 |
|
Kun Lin
|
e616d35847
|
Support Markdown/Notebook-Friendly Documentation Export for Downstream Integration (convert rst files to md files and save) (#18278)
|
2026-02-04 19:59:40 -08:00 |
|