Mick
|
1a006c2a0d
|
[diffusion] refactor: split component_loader into component-wise files (#17820)
|
2026-01-31 20:22:31 +08:00 |
|
Lianmin Zheng
|
7412ceb4eb
|
[Auto Sync] Update linear.py to assert shapes (20260130) (#17966)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
|
2026-01-31 01:01:55 -08:00 |
|
Lianmin Zheng
|
0e184609d3
|
Add launch_command assignment in crash dump (#17967)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Archit Patke <apatke@x.ai>
|
2026-01-31 01:00:40 -08:00 |
|
b8zhong
|
22498e10c0
|
[Fix] Triton TP MoE Dpsk V3/Qwen3 Coder with SwapAB (#17965)
|
2026-01-31 15:56:26 +08:00 |
|
Zheng Wengang
|
a4df95c15f
|
[EPD][Perf] parallelize ZMQ send for encode server (#16487)
Co-authored-by: siyu <liusy58@linux.alibaba.com>
|
2026-01-31 14:30:11 +08:00 |
|
jeff
|
04efd03dbf
|
Fix OOM in DeepSeek weight loading by deferring dict(weights) materialization (#17744)
|
2026-01-31 13:59:00 +08:00 |
|
Hudson Xing
|
c72bf50706
|
add reasoning_tokens usage test for tool call (#18022)
|
2026-01-30 21:09:23 -08:00 |
|
Mohammad Miadh Angkad
|
d0d9cecd1b
|
Fix cuBLAS >=12.9 detection for cu12/cu13 package naming (#17766)
|
2026-01-31 12:01:52 +08:00 |
|
Xiaoyu Zhang
|
22aad4e2c4
|
[Diffusion] Fix FLUX.1-schnell time embedding argument mismatch (#17988)
|
2026-01-31 11:47:27 +08:00 |
|
Bi Xue
|
5d00150e99
|
[sglang] fix mm token padded value overlap with text token id (#17781)
|
2026-01-30 17:09:13 -08:00 |
|
JiaruiChang5268
|
e86476acfc
|
[NPU] support llama-3.2-11B-vision-instruct mode for NPU (#17492)
Co-authored-by: McZyWu <zhuoyun.wu.23@ucl.ac.uk>
Co-authored-by: chenyang08056032 <chenyang08056032@163.com>
Co-authored-by: Hexq0210 <893781835@qq.com>
|
2026-01-31 08:49:38 +08:00 |
|
Siyuan Chen
|
578b119bc6
|
[BugFix] Fix server crashes when req.grammar and ngram spec are enabled (#17585)
|
2026-01-30 11:57:42 -08:00 |
|
Sam (Kesen Li)
|
81449b4bee
|
Optimize GDN decode for Qwen3 Next (#17094)
|
2026-01-31 01:02:12 +08:00 |
|
Xiaoyu Zhang
|
abf13ccc11
|
[Diffusion] Fix lora default lora_scale bug (#17982)
|
2026-01-30 22:04:54 +08:00 |
|
Zheng Li
|
0c5a81acb8
|
[BUGFIX] Fix dp size > 1 for qwen3 vl model (#17624)
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2026-01-30 20:44:25 +08:00 |
|
Changhun Lee
|
c04efe030a
|
[Model] Add K-EXAONE model support (#16294)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Co-authored-by: lgai-exaone <exaonemodels@lgresearch.ai>
Co-authored-by: lkm2835 <lkm2835@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-01-30 20:01:14 +08:00 |
|
Fan Yin
|
8ce9609fa2
|
fix: fix SHM pointer re-serialization in DP attention (#17930)
|
2026-01-30 17:03:30 +08:00 |
|
Ke Bao
|
77a27e728c
|
Add cuda graph status to prefill log (#17836)
|
2026-01-30 16:56:53 +08:00 |
|
Haotong Zhang
|
c8dc543dc5
|
SGLang Tracing: Improve root span attributes (#17008)
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
|
2026-01-30 16:02:05 +08:00 |
|
jianzhao-xu
|
96584ab692
|
adapt MODELSCOPE download (#17922)
|
2026-01-29 23:26:54 -08:00 |
|
McZyWu
|
70db3398d1
|
[NPU] enhance accuracy for model kimi-vl-a3b-instruct (#17480)
Co-authored-by: cy <chenyang08056032@163.com>
|
2026-01-30 15:19:42 +08:00 |
|
jianan-gu
|
c35aa0238c
|
[CPU][INT4] Add INT4 kernels for CPU (#8226)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 22:30:13 -08:00 |
|
jianan-gu
|
336dc4579e
|
[CPU] Optimize Qwen3-next model on CPU (#12525)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Fan Yin <1106310035@qq.com>
|
2026-01-29 22:03:58 -08:00 |
|
Polisetty V R K Jyothendra Varma
|
71e4d3b6bc
|
[Intel GPU] fix import error to run DeepSeek-V2-Lite model with BF16 on XPU (#10858)
|
2026-01-29 21:53:53 -08:00 |
|
gaopengff
|
7541da15d2
|
Fix prefill latency performance drop of bench serving (#14592)
|
2026-01-29 21:28:17 -08:00 |
|
Polisetty V R K Jyothendra Varma
|
858dc80aff
|
[Intel GPU] fix device in DeepseekScalingRotaryEmbedding to run DeepSeek-V2-Lite BF16 on XPU (#10021)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-01-29 21:21:38 -08:00 |
|
LHXuuu
|
0e4d9ddbd6
|
Fix the scenario where eh_proj is quantized in the bailing moe nextn weights (#17808)
Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>
|
2026-01-29 21:11:47 -08:00 |
|
Kangyan-Zhou
|
606ff09ef8
|
[Fix] Remove unused Type import in gpt_j.py (#17975)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-01-29 21:11:11 -08:00 |
|
Koushik Dutta
|
632c7afa8c
|
[Fix] add block size logic for sm120 smem size (#14311)
|
2026-01-29 21:01:57 -08:00 |
|
Wenchen Lo
|
046b29be16
|
GPTJForCausalLM Support (#7839)
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
|
2026-01-29 21:00:04 -08:00 |
|
b8zhong
|
22df62d586
|
add weightless qk norm to RMSNorm interface for Llama 4 (#12813)
Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
|
2026-01-29 19:09:55 -08:00 |
|
baonudesifeizhai
|
84ab611af8
|
model: support DeepSeek-OCR-2 (#17897)
|
2026-01-30 09:49:51 +08:00 |
|
StonyPort
|
2b3408ff14
|
feat: add forward timeout (#17831)
Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>
|
2026-01-30 08:52:29 +08:00 |
|
Cheng Wan
|
6a6b36367e
|
Fix logprob_start_len handling for prefill-only requests (#17395)
|
2026-01-29 15:14:43 -08:00 |
|
Kangyan-Zhou
|
c3bf53c7c1
|
Fix ci weight validation logic to check the safetensor completeness (#17917)
|
2026-01-29 13:00:42 -08:00 |
|
Cheng Wan
|
a416af4be7
|
Fix capture_sizes range for pcg (#17956)
|
2026-01-29 12:46:35 -08:00 |
|
EduardDurech
|
1b6798a6a4
|
Fix torch.__version__ for PEP440 (#15682)
|
2026-01-29 11:55:13 -08:00 |
|
Hudson Xing
|
d417c6809e
|
Add tool call tests for DeepSeek V3.2 in nightly CI (#17951)
|
2026-01-29 09:50:54 -08:00 |
|
Ratish P
|
88fb927cc9
|
[diffusion]: add dummy device attribute to fix AttributeError (#17949)
|
2026-01-29 09:35:12 -08:00 |
|
Shivam jindal
|
0769de9b0f
|
Support LightOnOCR-2-1B (#17806)
|
2026-01-29 23:03:41 +08:00 |
|
Ziang Li
|
3c9cc44ff5
|
Add mxfp8 support for online quantization, Triton dense linear, and CUTLASS MoE (#17449)
|
2026-01-29 21:33:57 +08:00 |
|
Yuhao Yang
|
3c2f4c7bbe
|
[diffusion] model: sync with upstream z-Image (#17822)
|
2026-01-29 21:10:11 +08:00 |
|
RoyWang
|
30adf78f82
|
[diffusion]: align sglang diffusion AMD pyproject_other.toml diffusion dependency with pyproject.toml (#16225)
Co-authored-by: roywang <roywang@amd.com>
|
2026-01-29 01:50:57 -08:00 |
|
kk
|
ef1c512754
|
Add aiter bias moe support in gpt-oss mxfp4 model (#17735)
Co-authored-by: wunhuang <wunhuang@amd.com>
|
2026-01-29 01:50:11 -08:00 |
|
triple-mu
|
319f6886fe
|
[diffusion] model: move tp_rmsnorm check to WanTransformerBlock (#17792)
|
2026-01-29 16:39:00 +08:00 |
|
Zhang Yiyang (SII)
|
cdedbf1486
|
[diffusion] fix: resolve library mismatch in scheduler and update dit offload method name (#17916)
|
2026-01-29 15:54:36 +08:00 |
|
22dimensions
|
7b79326751
|
[NPU] support GPTQ quantization on npu (#15203)
Signed-off-by: 22dimensions <waitingwind@foxmail.com>
|
2026-01-29 15:48:18 +08:00 |
|
Niko Ma
|
cbf90d70ff
|
[PD] Support KV transfer with MORI-IO (#14626)
Co-authored-by: cwortman-amd <cwortman@amd.com>
|
2026-01-28 23:22:41 -08:00 |
|
R0CKSTAR
|
d3cdee0a04
|
[MUSA][4/N] Add common device utilities, distributed backend, and custom op wiring (#17246)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-01-28 23:13:24 -08:00 |
|
Xinyuan Tong
|
9409c43593
|
Fix flaky tool calls in the Kimi K2.5 model (#17914)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-01-28 20:58:16 -08:00 |
|