6437 Commits

Author SHA1 Message Date
Mick
1a006c2a0d [diffusion] refactor: split component_loader into component-wise files (#17820) 2026-01-31 20:22:31 +08:00
Lianmin Zheng
7412ceb4eb [Auto Sync] Update linear.py to assert shapes (20260130) (#17966)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
2026-01-31 01:01:55 -08:00
Lianmin Zheng
0e184609d3 Add launch_command assignment in crash dump (#17967)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Archit Patke <apatke@x.ai>
2026-01-31 01:00:40 -08:00
b8zhong
22498e10c0 [Fix] Triton TP MoE Dpsk V3/Qwen3 Coder with SwapAB (#17965) 2026-01-31 15:56:26 +08:00
Zheng Wengang
a4df95c15f [EPD][Perf] parallelize ZMQ send for encode server (#16487)
Co-authored-by: siyu <liusy58@linux.alibaba.com>
2026-01-31 14:30:11 +08:00
jeff
04efd03dbf Fix OOM in DeepSeek weight loading by deferring dict(weights) materialization (#17744) 2026-01-31 13:59:00 +08:00
Hudson Xing
c72bf50706 add reasoning_tokens usage test for tool call (#18022) 2026-01-30 21:09:23 -08:00
Mohammad Miadh Angkad
d0d9cecd1b Fix cuBLAS >=12.9 detection for cu12/cu13 package naming (#17766) 2026-01-31 12:01:52 +08:00
Xiaoyu Zhang
22aad4e2c4 [Diffusion] Fix FLUX.1-schnell time embedding argument mismatch (#17988) 2026-01-31 11:47:27 +08:00
Bi Xue
5d00150e99 [sglang] fix mm token padded value overlap with text token id (#17781) 2026-01-30 17:09:13 -08:00
JiaruiChang5268
e86476acfc [NPU] support llama-3.2-11B-vision-instruct mode for NPU (#17492)
Co-authored-by: McZyWu <zhuoyun.wu.23@ucl.ac.uk>
Co-authored-by: chenyang08056032 <chenyang08056032@163.com>
Co-authored-by: Hexq0210 <893781835@qq.com>
2026-01-31 08:49:38 +08:00
Siyuan Chen
578b119bc6 [BugFix] Fix server crashes when req.grammar and ngram spec are enabled (#17585) 2026-01-30 11:57:42 -08:00
Sam (Kesen Li)
81449b4bee Optimize GDN decode for Qwen3 Next (#17094) 2026-01-31 01:02:12 +08:00
Xiaoyu Zhang
abf13ccc11 [Diffusion] Fix lora default lora_scale bug (#17982) 2026-01-30 22:04:54 +08:00
Zheng Li
0c5a81acb8 [BUGFIX] Fix dp size > 1 for qwen3 vl model (#17624)
Co-authored-by: yizhang2077 <1109276519@qq.com>
2026-01-30 20:44:25 +08:00
Changhun Lee
c04efe030a [Model] Add K-EXAONE model support (#16294)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Co-authored-by: lgai-exaone <exaonemodels@lgresearch.ai>
Co-authored-by: lkm2835 <lkm2835@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2026-01-30 20:01:14 +08:00
Fan Yin
8ce9609fa2 fix: fix SHM pointer re-serialization in DP attention (#17930) 2026-01-30 17:03:30 +08:00
Ke Bao
77a27e728c Add cuda graph status to prefill log (#17836) 2026-01-30 16:56:53 +08:00
Haotong Zhang
c8dc543dc5 SGLang Tracing: Improve root span attributes (#17008)
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
2026-01-30 16:02:05 +08:00
jianzhao-xu
96584ab692 adapt MODELSCOPE download (#17922) 2026-01-29 23:26:54 -08:00
McZyWu
70db3398d1 [NPU] enhance accuracy for model kimi-vl-a3b-instruct (#17480)
Co-authored-by: cy <chenyang08056032@163.com>
2026-01-30 15:19:42 +08:00
jianan-gu
c35aa0238c [CPU][INT4] Add INT4 kernels for CPU (#8226)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 22:30:13 -08:00
jianan-gu
336dc4579e [CPU] Optimize Qwen3-next model on CPU (#12525)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Fan Yin <1106310035@qq.com>
2026-01-29 22:03:58 -08:00
Polisetty V R K Jyothendra Varma
71e4d3b6bc [Intel GPU] fix import error to run DeepSeek-V2-Lite model with BF16 on XPU (#10858) 2026-01-29 21:53:53 -08:00
gaopengff
7541da15d2 Fix prefill latency performance drop of bench serving (#14592) 2026-01-29 21:28:17 -08:00
Polisetty V R K Jyothendra Varma
858dc80aff [Intel GPU] fix device in DeepseekScalingRotaryEmbedding to run DeepSeek-V2-Lite BF16 on XPU (#10021)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-01-29 21:21:38 -08:00
LHXuuu
0e4d9ddbd6 Fix the scenario where eh_proj is quantized in the bailing moe nextn weights (#17808)
Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>
2026-01-29 21:11:47 -08:00
Kangyan-Zhou
606ff09ef8 [Fix] Remove unused Type import in gpt_j.py (#17975)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 21:11:11 -08:00
Koushik Dutta
632c7afa8c [Fix] add block size logic for sm120 smem size (#14311) 2026-01-29 21:01:57 -08:00
Wenchen Lo
046b29be16 GPTJForCausalLM Support (#7839)
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
2026-01-29 21:00:04 -08:00
b8zhong
22df62d586 add weightless qk norm to RMSNorm interface for Llama 4 (#12813)
Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
2026-01-29 19:09:55 -08:00
baonudesifeizhai
84ab611af8 model: support DeepSeek-OCR-2 (#17897) 2026-01-30 09:49:51 +08:00
StonyPort
2b3408ff14 feat: add forward timeout (#17831)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
2026-01-30 08:52:29 +08:00
Cheng Wan
6a6b36367e Fix logprob_start_len handling for prefill-only requests (#17395) 2026-01-29 15:14:43 -08:00
Kangyan-Zhou
c3bf53c7c1 Fix ci weight validation logic to check the safetensor completeness (#17917) 2026-01-29 13:00:42 -08:00
Cheng Wan
a416af4be7 Fix capture_sizes range for pcg (#17956) 2026-01-29 12:46:35 -08:00
EduardDurech
1b6798a6a4 Fix torch.__version__ for PEP440 (#15682) 2026-01-29 11:55:13 -08:00
Hudson Xing
d417c6809e Add tool call tests for DeepSeek V3.2 in nightly CI (#17951) 2026-01-29 09:50:54 -08:00
Ratish P
88fb927cc9 [diffusion]: add dummy device attribute to fix AttributeError (#17949) 2026-01-29 09:35:12 -08:00
Shivam jindal
0769de9b0f Support LightOnOCR-2-1B (#17806) 2026-01-29 23:03:41 +08:00
Ziang Li
3c9cc44ff5 Add mxfp8 support for online quantization, Triton dense linear, and CUTLASS MoE (#17449) 2026-01-29 21:33:57 +08:00
Yuhao Yang
3c2f4c7bbe [diffusion] model: sync with upstream z-Image (#17822) 2026-01-29 21:10:11 +08:00
RoyWang
30adf78f82 [diffusion]: align sglang diffusion AMD pyproject_other.toml diffusion dependency with pyproject.toml (#16225)
Co-authored-by: roywang <roywang@amd.com>
2026-01-29 01:50:57 -08:00
kk
ef1c512754 Add aiter bias moe support in gpt-oss mxfp4 model (#17735)
Co-authored-by: wunhuang <wunhuang@amd.com>
2026-01-29 01:50:11 -08:00
triple-mu
319f6886fe [diffusion] model: move tp_rmsnorm check to WanTransformerBlock (#17792) 2026-01-29 16:39:00 +08:00
Zhang Yiyang (SII)
cdedbf1486 [diffusion] fix: resolve library mismatch in scheduler and update dit offload method name (#17916) 2026-01-29 15:54:36 +08:00
22dimensions
7b79326751 [NPU] support GPTQ quantization on npu (#15203)
Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2026-01-29 15:48:18 +08:00
Niko Ma
cbf90d70ff [PD] Support KV transfer with MORI-IO (#14626)
Co-authored-by: cwortman-amd <cwortman@amd.com>
2026-01-28 23:22:41 -08:00
R0CKSTAR
d3cdee0a04 [MUSA][4/N] Add common device utilities, distributed backend, and custom op wiring (#17246)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2026-01-28 23:13:24 -08:00
Xinyuan Tong
9409c43593 Fix flaky tool calls in the Kimi K2.5 model (#17914)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2026-01-28 20:58:16 -08:00