amote-i | 301604f953 | [NPU] [DOC] Quick start doc for Ascend NPU (#23238) | 2026-04-21 11:19:09 +08:00
shuwenn | b65799cf83 | [SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599); Co-authored-by: Qiaolin-Yu <liin1211@outlook.com> | 2026-04-20 14:25:04 -07:00
Baidu-AIAK | 7ca3566130 | Multi platform Plugin (#21388); Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0183.tjzj.baidu.com>; Co-authored-by: Alex Nails <alex.nails@radixark.ai>; Co-authored-by: Alex Nails <alexj.nails@gmail.com>; Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0000.tjzj.baidu.com>; Co-authored-by: Mick <mickjagger19@icloud.com> | 2026-04-19 17:23:51 -07:00
amote-i | ea20f1baa4 | [NPU] [DOC] Update npu best practice docs to match latest code (#23077) | 2026-04-18 14:17:00 +08:00
Mick | 0d94c3366a | [diffusion] feat: introduce ltx-2-two-stage device manager (#22869) | 2026-04-18 11:04:33 +08:00
Xiaoyu Zhang | 615d6c93b2 | [codex] Add flashinfer TRTLLM backend for diffusion NVFP4 (#22717) | 2026-04-18 09:06:28 +08:00
Lianmin Zheng | 44e67c6835 | Remove deprecated double sparsity feature (#23009) | 2026-04-17 13:33:12 -07:00
Mick | 0b2058853d | [diffusion] doc: update doc (#23052) | 2026-04-17 16:23:46 +08:00
Duyi-Wang | 8c190f6b91 | [AMD] Add SGLANG_MORI_MOE_MAX_INPUT_TOKENS to truncate dispatch before MoE. (#22952) | 2026-04-16 23:40:15 -07:00
xdtbynd | 53f87c463d | [Docs] [npu] change the feature support status (#23041) | 2026-04-17 14:34:54 +08:00
Jan Bernlöhr | 04a53955b9 | feat: add coordinated checkpoint prefetch for network filesystem loading (#20843) | 2026-04-16 20:08:19 -07:00
Liangsheng Yin | db7a751d48 | refactor: extract FanOutCommunicator and use declarative spec table (#22967) | 2026-04-16 15:37:19 -07:00
ybyang | 41258f874d | [PD]feat(bench): add --fake-prefill flag for decode-only stress testing (#22973) | 2026-04-16 13:57:55 -07:00
Zaire | 71377deda7 | [Docs] fix profiling endpoint (#22982); Signed-off-by: Zaire404 <3147879462@qq.com> | 2026-04-16 12:51:39 -04:00
Yuhao Yang | 9da998a882 | [diffusion] feat: disaggregated diffusion (#21701) | 2026-04-16 23:51:32 +08:00
amote-i | 78147306b7 | [NPU] [DOC] Update npu best practice docs to match latest code (#22975) | 2026-04-16 20:45:22 +08:00
hhwxw | 2480cc2a16 | docs: fix incorrect default max-payload-size in gateway config reference (#22923) | 2026-04-16 13:25:27 +08:00
Xiaoyu Zhang | 695ab705cb | [diffusion] quant: update modelopt quantization docs and CI coverage (#22772) | 2026-04-15 21:30:28 +08:00
jianzhao-xu | 45a83ffbe3 | [NPU] Offloading docs update (#22860); Co-authored-by: Jianzhao Xu <xujianchao@huawei.com> | 2026-04-15 15:04:41 +08:00
Po-Han Huang (NVIDIA) | ada52e5972 | [Docs] Move ptxas sm_103a workaround into For CUDA 13 section (#22852); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-04-14 22:30:21 -07:00
chx96642264 | 680bd4b429 | [NPU] Modify the parameter name and optional values, and add the parameter restrictions. Modify some parameters supported type. (#22804) | 2026-04-14 21:34:07 +08:00
McZyWu | 1588856e9b | [NPU] qwen3next low latency best practice docs. (#22808); Co-authored-by: root <root@localhost.localdomain> | 2026-04-14 21:21:37 +08:00
amote-i | ddc7daaf89 | [NPU] [DOC] Update NPU docs to match latest code (#22796) | 2026-04-14 21:10:28 +08:00
loading66 | 074c2a476d | fix:[NPU]correct the full name of then Kimi model (#22799) | 2026-04-14 20:15:22 +08:00
jianzhao-xu | 68dfffaaa3 | Offloading docs update (#22795); Co-authored-by: Jianzhao Xu <xujianchao@huawei.com> | 2026-04-14 20:03:29 +08:00
xdtbynd | 88253c39b0 | [Docs] Fix formatting of tool-call-parser options (#22793) | 2026-04-14 19:21:31 +08:00
amote-i | 368cdfbe2f | [NPU] [DOC] Fix outdated descriptions in the NPU documentation (#22707) | 2026-04-14 19:21:15 +08:00
Xiaoyu Zhang | f97c608caa | [diffusion] quant: add FLUX.1-dev modelopt nvfp4 support (#22672) | 2026-04-14 15:00:59 +08:00
看海的人 | 13a4aafdbe | [NPU] update glm5 running guide (#22712) | 2026-04-13 22:53:24 +08:00
chx96642264 | c6403a11cb | Modify the optional values and constraints of parameter. (#22705) | 2026-04-13 22:50:48 +08:00
jianzhao-xu | b6a91b1afe | [NPU] --attn-cp-size --init-expert-location --eplb-algorithm parameter docs update (#22704); Co-authored-by: Jianzhao Xu <xujianchao@huawei.com> | 2026-04-13 22:42:34 +08:00
Liwansi | 8d904e50f2 | [NPU]qwen3-8b and 32b md bugfix (#22687) | 2026-04-13 22:20:17 +08:00
loading66 | 2089ac86a7 | Improve parameters usage constraints for npu deployment (#22700); Co-authored-by: h30064329 <hanbing45@h-partners.com> | 2026-04-13 22:02:56 +08:00
看海的人 | 56c97c7738 | [NPU] update npu doc (#22697); Co-authored-by: zhsurpass <zhsurpass@users.noreply.github.com> | 2026-04-13 21:55:38 +08:00
xdtbynd | d01b2bf257 | [Docs] Fix default values and options in Ascend server arguments documentation (#22698); Co-authored-by: xdtbynd <supercluster@vip.qq.com> | 2026-04-13 21:22:37 +08:00
Polisetty V R K Jyothendra Varma | 7d2c11970c | [Intel GPU] Upgrade pytorch xpu version to 2.11 (#21908); Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>; Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> | 2026-04-13 13:16:24 +08:00
Mick | bf022e177c | Revert "[Diffusion] Add FLUX.1-dev ModelOpt NVFP4 support (#22574)" (#22649) | 2026-04-13 11:17:32 +08:00
Xiaoyu Zhang | 37fc47c645 | diffusion: fix layerwise offload for ModelOpt quantized DiTs (#22594) | 2026-04-13 08:01:54 +08:00
Xiaoyu Zhang | 03a1a7b81c | [Diffusion] Add FLUX.1-dev ModelOpt NVFP4 support (#22574) | 2026-04-13 07:57:41 +08:00
Mick | 495ef8ec64 | [diffusion] model: support LTX2.3 two stage (#22182) | 2026-04-12 22:15:57 +08:00
Mohammad Miadh Angkad | bcc0c65aa8 | [DSA] Hopper FP8 FlashMLA KV padding (#22372) | 2026-04-12 02:19:17 -07:00
Wenyao Gao | 4dfc8e1c3f | VLM: support passing --mm-process-config for all models (#18467) | 2026-04-12 17:08:05 +08:00
Baizhou Zhang | d14d368191 | [Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467) | 2026-04-11 01:59:57 -07:00
heziiop | 4f45472f34 | [NPU][Doc] add qwen3-30b-a3b low latency example (#22446) | 2026-04-11 15:52:47 +08:00
cctry | f855a0bde6 | Introduce CUDA graph debug mode with breakable CUDA graph (#19102); Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>; Co-authored-by: Cheng Wan <chwan@rice.edu>; Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> | 2026-04-11 00:36:56 -07:00
Xiaoyu Zhang | 1ff51555f2 | [Diffusion] modelopt diffusion fp8 support for flux1/flux2 and wan2.2 (#22365) | 2026-04-10 20:56:57 +08:00
Zhangheng | 5ba7d4e523 | [HiSparse]: Update HiSparse's user-guide (#22499); Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>; Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> | 2026-04-10 15:06:43 +08:00
billishyahao | 1df9f4e2f6 | [AMD] Add prealloc token env for mori-ep (#22329) | 2026-04-09 09:34:35 -07:00
amote-i | 7965573eb4 | fix issues for npu docs (#22307) | 2026-04-09 16:27:34 +08:00
Liwansi | 8ec0934f8f | [NPU]add Qwen3-32b and Qwen3-8b low latency md (#22429) | 2026-04-09 16:18:34 +08:00