amote-i
|
301604f953
|
[NPU] [DOC] Quick start doc for Ascend NPU (#23238)
|
2026-04-21 11:19:09 +08:00 |
|
Baidu-AIAK
|
7ca3566130
|
Multi platform Plugin (#21388)
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0183.tjzj.baidu.com>
Co-authored-by: Alex Nails <alex.nails@radixark.ai>
Co-authored-by: Alex Nails <alexj.nails@gmail.com>
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0000.tjzj.baidu.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-19 17:23:51 -07:00 |
|
amote-i
|
ea20f1baa4
|
[NPU] [DOC] Update npu best practice docs to match latest code (#23077)
|
2026-04-18 14:17:00 +08:00 |
|
Lianmin Zheng
|
44e67c6835
|
Remove deprecated double sparsity feature (#23009)
|
2026-04-17 13:33:12 -07:00 |
|
xdtbynd
|
53f87c463d
|
[Docs] [npu] change the feature support status (#23041)
|
2026-04-17 14:34:54 +08:00 |
|
amote-i
|
78147306b7
|
[NPU] [DOC] Update npu best practice docs to match latest code (#22975)
|
2026-04-16 20:45:22 +08:00 |
|
jianzhao-xu
|
45a83ffbe3
|
[NPU] Offloading docs update (#22860)
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
|
2026-04-15 15:04:41 +08:00 |
|
chx96642264
|
680bd4b429
|
[NPU] Modify the parameter name and optional values, and add the parameter restrictions. Modify some parameters supported type. (#22804)
|
2026-04-14 21:34:07 +08:00 |
|
McZyWu
|
1588856e9b
|
[NPU] qwen3next low latency best practice docs. (#22808)
Co-authored-by: root <root@localhost.localdomain>
|
2026-04-14 21:21:37 +08:00 |
|
amote-i
|
ddc7daaf89
|
[NPU] [DOC] Update NPU docs to match latest code (#22796)
|
2026-04-14 21:10:28 +08:00 |
|
loading66
|
074c2a476d
|
fix:[NPU]correct the full name of the Kimi model (#22799)
|
2026-04-14 20:15:22 +08:00 |
|
jianzhao-xu
|
68dfffaaa3
|
Offloading docs update (#22795)
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
|
2026-04-14 20:03:29 +08:00 |
|
xdtbynd
|
88253c39b0
|
[Docs] Fix formatting of tool-call-parser options (#22793)
|
2026-04-14 19:21:31 +08:00 |
|
amote-i
|
368cdfbe2f
|
[NPU] [DOC] Fix outdated descriptions in the NPU documentation (#22707)
|
2026-04-14 19:21:15 +08:00 |
|
看海的人
|
13a4aafdbe
|
[NPU] update glm5 running guide (#22712)
|
2026-04-13 22:53:24 +08:00 |
|
chx96642264
|
c6403a11cb
|
Modify the optional values and constraints of parameter. (#22705)
|
2026-04-13 22:50:48 +08:00 |
|
jianzhao-xu
|
b6a91b1afe
|
[NPU] --attn-cp-size --init-expert-location --eplb-algorithm parameter docs update (#22704)
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
|
2026-04-13 22:42:34 +08:00 |
|
Liwansi
|
8d904e50f2
|
[NPU]qwen3-8b and 32b md bugfix (#22687)
|
2026-04-13 22:20:17 +08:00 |
|
loading66
|
2089ac86a7
|
Improve parameters usage constraints for npu deployment (#22700)
Co-authored-by: h30064329 <hanbing45@h-partners.com>
|
2026-04-13 22:02:56 +08:00 |
|
看海的人
|
56c97c7738
|
[NPU] update npu doc (#22697)
Co-authored-by: zhsurpass <zhsurpass@users.noreply.github.com>
|
2026-04-13 21:55:38 +08:00 |
|
xdtbynd
|
d01b2bf257
|
[Docs] Fix default values and options in Ascend server arguments documentation (#22698)
Co-authored-by: xdtbynd <supercluster@vip.qq.com>
|
2026-04-13 21:22:37 +08:00 |
|
Polisetty V R K Jyothendra Varma
|
7d2c11970c
|
[Intel GPU] Upgrade pytorch xpu version to 2.11 (#21908)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-13 13:16:24 +08:00 |
|
heziiop
|
4f45472f34
|
[NPU][Doc] add qwen3-30b-a3b low latency example (#22446)
|
2026-04-11 15:52:47 +08:00 |
|
amote-i
|
7965573eb4
|
fix issues for npu docs (#22307)
|
2026-04-09 16:27:34 +08:00 |
|
Liwansi
|
8ec0934f8f
|
[NPU]add Qwen3-32b and Qwen3-8b low latency md (#22429)
|
2026-04-09 16:18:34 +08:00 |
|
amote-i
|
81efcc353a
|
[NPU] Optimized the wording in the npu docs (#21998)
|
2026-04-03 11:51:40 +08:00 |
|
yuefeng Wu
|
c9f5d1d502
|
[Diffusion][NPU] add ring sp performance benchmark page in npu (#21811)
|
2026-04-01 18:53:10 +03:00 |
|
amote-i
|
80b1bc5f56
|
[NPU] update ascend docs (#21807)
|
2026-04-01 17:14:26 +08:00 |
|
Michelle Wu
|
965f03cdc2
|
[NPU] Update DeepSeek-V3.2 model deployment instructions in documentation (#21468)
Co-authored-by: wuxue (C) <w00964934@china.huawei.com>
|
2026-03-30 15:51:42 +08:00 |
|
Артем Савкин
|
27071e0a43
|
[NPU] Update quantization&CI documentation (#21100)
Co-authored-by: Tamir Baydasov <41994229+TamirBaydasov@users.noreply.github.com>
|
2026-03-28 21:42:21 +03:00 |
|
amote-i
|
2d583799eb
|
Update ascend docs (#20846)
|
2026-03-25 09:58:44 +03:00 |
|
Jiaxin(Jackson) Deng
|
c4db64c16b
|
Add Lychee Doc Links Check to Local and CI (#19742)
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
|
2026-03-24 13:48:26 -07:00 |
|
kpham-sgl
|
bc4aaab6a1
|
[Spec][Ngram] 2/N: Rename branch length to max trie depth (#21181)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-22 23:35:25 -07:00 |
|
kpham-sgl
|
6d160b42bb
|
[Spec][Ngram] 1/N: Reference based Speculative Decoding refactor (#20393)
|
2026-03-22 00:55:10 -07:00 |
|
Cao E
|
274581fb77
|
Add support for more batch sizes in cpu_graph_runner (#13881)
|
2026-03-19 09:50:56 -07:00 |
|
blzheng
|
cbea9f6909
|
[CPU] improve numa memory binding (#19666)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-18 22:15:50 -07:00 |
|
amote-i
|
210d0fbaef
|
Update ascend docs (#20674)
|
2026-03-16 20:14:26 -07:00 |
|
Xiaoyu Zhang
|
15097c5c3b
|
Release sglang kernel 0.4.0 (#20440)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-16 20:34:58 +08:00 |
|
amote-i
|
da1793f63a
|
update ascend feature docs (#20506)
|
2026-03-15 20:09:20 -07:00 |
|
Mook
|
23c191afb6
|
fix(docs): correct quantization documentation (#20301) (#20619)
|
2026-03-15 12:33:12 -04:00 |
|
Liangsheng Yin
|
fc7f9c1de7
|
Rename --stream-output to --incremental-streaming-output (#20614)
|
2026-03-14 23:22:33 -07:00 |
|
Артем Савкин
|
ed42af99a9
|
[NPU] [Quantization] w4a4 MoE layer support (#18924)
|
2026-03-11 16:52:35 +03:00 |
|
Polisetty V R K Jyothendra Varma
|
b2dd104ade
|
[Intel GPU] Upgrade pytorch xpu version to 2.10 (#20254)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
|
2026-03-10 18:47:25 -07:00 |
|
R0CKSTAR
|
db97f193b7
|
[diffusion][llm] macOS support (#19549)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-10 13:11:07 -07:00 |
|
Артем Савкин
|
5297b02c88
|
[Diffusion] [NPU] Wan2.2-T2V-A14B-Diffusers modelslim quantization support (#17996)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-03-07 17:26:44 +03:00 |
|
Bruce Changlong Xu
|
feda2b11c4
|
[AMD] Add AWQ AMD CI coverage and quantization platform compatibility docs (#19550)
|
2026-03-04 19:50:55 -08:00 |
|
Mohammad Miadh Angkad
|
1b76eb9361
|
[Doc] Update version references and add automation (#18409)
|
2026-03-04 09:51:46 -08:00 |
|
amote-i
|
e33e833d11
|
update model names (#19870)
|
2026-03-04 14:27:37 +03:00 |
|
Brayden Zhong
|
e2af840c3d
|
Various SM120 improvements (#19721)
|
2026-03-03 16:46:13 -08:00 |
|
Hexq0210
|
56891e46bc
|
[Ascend ] Add qwen3.5 122B/35B/27B deployment examples on doc (#19339)
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-02-25 21:09:22 +08:00 |
|