Commit Graph

101 Commits

Author SHA1 Message Date
RAY
d85884ca57 Update ascend_npu_qwen3_5_examples.md (#18888) 2026-02-16 10:24:27 +03:00
chenxu214
fd5a45d5cf Update ascend_npu_support.rst (#18868) 2026-02-16 01:41:38 +08:00
chenxu214
f2d72866e9 Create ascend_npu_qwen3_5_examples.md (#18864) 2026-02-16 01:15:20 +08:00
BourneSun0527
f65c885e7c Modify glm5 readme on npu (#18768) 2026-02-13 11:42:40 +08:00
fy
123f57b84b update glm5 readme on npu (#18657) 2026-02-12 10:37:12 +08:00
liupeng374
c34832c02c glm5 md (#18655) 2026-02-12 10:11:59 +08:00
husf
573ff55814 [NPU][docs]fix bug about hyperlink for best practice for ascend npu (#18561) 2026-02-10 20:03:28 +03:00
Hexq0210
d0d387dea1 [NPU] update npu doc (#18474) 2026-02-10 16:59:13 +03:00
husf
99101ce30b [NPU][docs] improve docs for Best Practice on Ascend NPU (#18360) 2026-02-10 16:52:27 +03:00
Hexq0210
e834b85ab6 [NPU] update npu doc (#18344) 2026-02-07 16:38:05 +08:00
Rishit Shivam
c850a8a41a [Docs] Add Falcon H1, Hunyuan-Large, Qwen3-Omni support and update Diffusion usage (#17888)
Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2026-02-06 13:17:51 -08:00
amote-i
92b8bd6833 fix npu best practice (#18330) 2026-02-05 21:14:46 -05:00
rinbaro
de6a03260f [docs] fix misspellings & typos (#18276) 2026-02-05 03:35:29 +00:00
Zaili Wang
97593c9f41 [CPU] toml file update (#17861) 2026-01-31 13:16:06 -08:00
amote-i
9c2a468e2c update ascend docs (#17987) 2026-01-30 18:03:55 +08:00
RoyWang
30adf78f82 [diffusion]: align sglang diffusion AMD pyproject_other.toml diffusion dependency with pyproject.toml (#16225)
Co-authored-by: roywang <roywang@amd.com>
2026-01-29 01:50:57 -08:00
amote-i
1b22f2ee1c update ascend docs (#17741) 2026-01-29 11:43:48 +08:00
Артем Савкин
b77b0ffd60 [NPU] NZ for non-quantized MOE, Qwen3 MOE double memory consumption fix (#15904)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 00:55:08 +08:00
Makcum888e
bba6e38ff8 [NPU] Split pyproject npu from pyproject other (#17641) 2026-01-26 09:45:44 -08:00
R0CKSTAR
628ab5d57b [MUSA][2/N] sgl-kernel build (#17053)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2026-01-23 14:41:47 -08:00
Hexq0210
69470dbc1f [NPU] update doc for Ascend NPU (#17621) 2026-01-23 15:21:00 +08:00
amote-i
0fec8820d1 update dependence docs of npu (#17573) 2026-01-23 09:13:36 +08:00
Zaili Wang
672eb37534 [CPU][Fix CI] Solidate torch version for sgl-kernel-cpu and fix device orientation error (#17460) 2026-01-22 14:04:50 +08:00
amote-i
0a9099e137 update ascend docs (#17457) 2026-01-21 15:36:26 +08:00
khalilzhk
aca354bcb3 [NPU] remove features supported on Ascend NPU (#17455) 2026-01-21 11:00:04 +08:00
zijiexia
4ecd9afde9 [Docs] Rename SGLang Router to SGLang Model Gateway (#17436) 2026-01-20 12:31:10 -08:00
amote-i
603f386c6b update docs of Ascend plateform (#17358) 2026-01-20 19:31:23 +08:00
Hexq0210
1b192cf198 [NPU] Update NPU doc for model and features supported (#17385) 2026-01-20 12:28:06 +08:00
R0CKSTAR
a1dd3d48ac [diffusion] hardware: support diffusion (single GPU, 3/N) (#17105)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2026-01-16 17:01:09 +08:00
shuwenn
9227d9f60c [Docs] sort and update server_arguments.md (#17163) 2026-01-15 12:07:18 -05:00
Hexq0210
6586f44ad4 [NPU] Add Ascend NPU best practice in doc (#17103) 2026-01-15 15:21:45 +08:00
Артем Савкин
424a380077 [NPU] NPU quantization refactoring & more quantization formats support (#14504)
Co-authored-by: TamirBaydasov <mr.jeijy@gmail.com>
Co-authored-by: Tamir Baydasov <41994229+TamirBaydasov@users.noreply.github.com>
Co-authored-by: Савкин Артем <savkinartem@MacBook-Air-Viktoria.local>
Co-authored-by: Edward Shogulin <edward.shogulin@gmail.com>
2026-01-15 04:25:15 +08:00
Hubert Lu
8716589826 [AMD][Diffusion] support timestep embedding kernel for AMD GPUs (#16766) 2026-01-12 22:17:07 -08:00
James
ae0baefb94 [NPU] upgrade npu mf_apater plugin (#15853) 2026-01-13 09:02:10 +08:00
Hexq0210
c581b5ed79 [NPU] update feature supported on ascend NPU (#16915) 2026-01-12 11:47:58 +08:00
Ratish P
c0248d6f37 [dpc]: unify DP controller load balancing and simplify dispatch logic (#16258) 2026-01-11 12:38:03 +08:00
Hexq0210
20ca2c6e1e [NPU] update model and features supported (#16733) 2026-01-08 21:50:42 +08:00
Hexq0210
5e867f60cf [NPU] Update model and features supported (#16652) 2026-01-08 09:13:30 +08:00
Even Zhou
d4b717c01e [NPU] update docs (#16651) 2026-01-07 20:20:01 +08:00
Hexq0210
4c9ac8566c [NPU] fix command in npu best practice (#16576) 2026-01-07 09:37:27 +08:00
Hexq0210
abb06be946 [NPU] Fixed link redirection issue. (#16475) 2026-01-05 19:03:29 +08:00
Hexq0210
bf32cd8397 [npu] update model and feature supported for ascend npu (#16390) 2026-01-04 21:50:20 +08:00
Huaixin Chang
c1dfbc777b deprecate prefill-round-robin-balance (#16195)
Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-12-31 22:25:33 +08:00
Hexq0210
386e541520 Update document for Ascend NPU (#16214) 2025-12-31 19:03:53 +08:00
husf
7f9a3d0609 [docs][NPU]Update model and feature docs support (#16124) 2025-12-30 20:05:40 +08:00
Hexq0210
f784cbfa92 Update model and feature support for Ascend NPU (#16003) 2025-12-29 15:16:25 +08:00
Liwansi
30da2f0598 [NPU][eagle3] support qwen eagle3 on NPU (#14820) 2025-12-16 02:25:13 +08:00
Brian
a7a4b1755d [Doc][TPU]add sglang-jax tpu docs (#15056) 2025-12-14 09:29:41 +08:00
sglang-bot
5c8bd8b51b chore: bump SGLang version to 0.5.6.post2 (#14858)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-12-11 12:29:52 -08:00
Even Zhou
60d36e7be7 [NPU] chore: bump basic software version to 8.3.rc2 (#14614) 2025-12-09 09:14:27 +08:00