Commit Graph

6437 Commits

Author SHA1 Message Date
Lianmin Zheng
9815ee934c [Auto Sync] Update weight_utils.py (20260212) (#18692)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dan Zheng <dzheng@x.ai>
2026-02-12 16:26:05 -08:00
Hao Jin
b48fe1d95e [Diffusion] [BUG] Fix missing initialization of GLM-Image text encoder config (#18704)
Co-authored-by: Hao Jin <Hao Jin>
2026-02-12 16:19:22 -08:00
Shangming Cai
2a8a48c0ca Reuse initialized transfer engine in mooncake store (#18460)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-02-13 01:21:35 +08:00
Yi Zhang
b168723424 [BUGFIX] fix bug in handle mamba radix cache in server_args (#18723) 2026-02-12 21:33:32 +08:00
Simo Lin
92c5749f41 refactor: replace local proto compilation with smg-grpc-proto package (#18682) 2026-02-12 05:29:24 -08:00
Scott Lee
c59b9223e6 Add spec_accept_histogram request statistic (#18332) 2026-02-12 21:09:21 +08:00
Hudson Xing
f3656432c7 add tool_choice=auto nightly test case (#18302) 2026-02-12 19:28:05 +08:00
Thomas Wang
e20e6c28b9 [AMD] Fix accuracy issue when running TP4 dsv3 model with mtp (#18607)
Co-authored-by: YC Tseng <yctseng@amd.com>
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
2026-02-12 01:13:16 -08:00
chenxu214
1edc69be08 [Ascend]Support qwen3.5 (#18544)
This PR affects only the NPU. If any issues arise, please contact iforgetmyname.
2026-02-12 15:22:47 +08:00
Vinh H. Pham
feaa9e7e00 [diffusion] fix: replace TextEncoderConfig with Qwen3TextConfig for Z-Image (#18560)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-02-12 14:31:23 +08:00
JooYeon
c9297661b9 fix: /metrics endpoint always reports engine_type="unified" in PD disaggregation mode (#18552)
Co-authored-by: joo_yeon.lee <joo_yeon.lee@samsung.com>
2026-02-12 14:20:43 +08:00
Li Jinliang
d91ce176bf Update README commands to include model-path option (#18557)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-02-12 14:15:26 +08:00
Zheng Li
4ed2548427 [Qwen3_5] Refactor Qwen3_5ForCausalLMMTP class implementation (#18538) 2026-02-12 13:38:26 +08:00
YAMY
454676811e [Flashinfer Autotune] Fix FlashInfer FP4 MoE autotuning crash by removing incorrect flatten on hidden_states_scale (#18500) 2026-02-12 13:31:27 +08:00
YC Tseng
20554a0a4f [AMD] rocm 7.2 image release, PR test, Nightly Test (#17799)
Co-authored-by: Alan Kao <akao@amd.com>
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Co-authored-by: Michael <13900043+michaelzhang-ai@users.noreply.github.com>
2026-02-11 21:29:25 -08:00
danielafrimi
e422bcaed8 [Mamba] Add float16 support for SSM cache dtype (#18444) 2026-02-12 11:27:47 +08:00
Zhiyu
7e262b6496 Update modelopt quantization config parsing (#13919)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
2026-02-12 11:08:29 +08:00
R0CKSTAR
41e1fd0be7 [diffusion] fix: webui cannot correctly display generated video using wan2.2 (#18473)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-02-12 10:35:39 +08:00
Yi Zhong
dc1309fc7e Avoid kimi linear stream sync (#16186)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
2026-02-12 09:27:22 +08:00
Jiayi Yan
539bbf485c [Bugfix] fix config bug caused by PR #18273 (#18535) 2026-02-12 09:26:46 +08:00
Yuwei An
2bd8363486 [PCG] GPT OSS Triton Kernel Support (#18405)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
2026-02-12 09:23:55 +08:00
qianyue76
f06ab17a73 [diffusion] docs: consolidate diffusion documentation into docs (#18095)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>
2026-02-11 16:55:07 -08:00
Lianmin Zheng
5875ef0a34 Clean up noisy startup log messages and refactor loader.py (#18531) 2026-02-11 16:12:57 -08:00
Piotr Mazurek
ded068a76e Add LMF2 MoE model architecture (#17997) 2026-02-12 01:03:43 +08:00
Ke Bao
5d185efb78 Fix prefill stats for dllm (#18632) 2026-02-12 01:00:30 +08:00
Vedant V Jhaveri
98b5013d59 add support to enable lora with embedding models (#17780)
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
2026-02-11 23:19:40 +08:00
Baizhou Zhang
947927bdb5 [V3.2] Change default CP token split method to --round-robin-split (#18613) 2026-02-11 20:14:35 +08:00
McZyWu
4f7422f7ba [NPU] support model skywork-reward-gemma2-2-27B-v0.2 (#16947)
Co-authored-by: cy <chenyang08056032@163.com>
2026-02-11 15:34:53 +08:00
sky
72c1526657 Register cp-atten-allgather buffers with symm memory (#17756)
Signed-off-by: wangfakang <fakangwang@gmail.com>
2026-02-11 15:26:37 +08:00
Thomas Wang
a8eef53dc4 Fp8 prefill attn kernel integration (#18528)
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
2026-02-10 23:23:48 -08:00
BourneSun0527
2cc235e795 Fix Bug on dsv3.2 (#18553)
This PR affects only the NPU. If any issues arise, please contact iforgetmyname.
2026-02-11 14:39:01 +08:00
Michael
d84d2063d3 [AMD] Fix Janus-Pro crash and add Kimi-K2.5 nightly test (#18269) 2026-02-10 22:33:13 -08:00
Liangsheng Yin
cd90346a2b Add cache hit rate UT (#18566) 2026-02-10 21:27:41 -08:00
cutetocute
8d2892330c chore: fix some typos (#18577)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2026-02-10 20:47:41 -08:00
Liangsheng Yin
93fca0bbc3 Fix wrong prefill log. (#18570) 2026-02-10 15:54:03 -08:00
Yi-Chia Chen
2bfab1bb67 Fix radix cache key to include generated tokens in multi-turn (regression) (#16521)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-02-10 14:08:34 -08:00
Thomas Wang
4262f5259b Tilelang sparse decode fwd for dsv32 mi355 (#18488)
Co-authored-by: kk <43161300+kkHuang-amd@users.noreply.github.com>
2026-02-10 09:53:26 -08:00
Zheng Li
44603764d6 fix(config): Support setting Mamba state dtype via config file (#18532)
Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>
2026-02-11 00:20:06 +08:00
Mick
efcdda0176 [diffusion] fix: fix fsdp (#18187) 2026-02-10 20:22:20 +08:00
wxy
47978ee858 [diffusion] feat: support parallel wan-vae decode (#18179) 2026-02-10 18:32:00 +08:00
Zehuan Li
26f2b3798d [DLLM] Basic dLLM scheduling strategy and implementation (#17484)
Signed-off-by: Zehuan Li <lizehuan.lzh@antgroup.com>
2026-02-10 16:54:15 +08:00
shuwenn
8da14aea88 [HiCache] fix: StorageMetricsCollector was initialized twice (#18354) 2026-02-10 00:53:00 -08:00
Xinyuan Tong
398b81f78c Support GlmMoeDsaForCausalLM (#18521)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Signed-off-by: BBuf <1182563586@qq.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: BBuf <1182563586@qq.com>
2026-02-10 15:20:10 +08:00
Xinyuan Tong
e8a2c13380 Deepseekv32 compatibility with transformers v5 (#18297)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-02-10 14:50:40 +08:00
siyu
0b15f19927 [EPD] Add notification mechanism to fix server hang and add timeout env var (#18229) 2026-02-10 11:52:54 +08:00
maocheng23
1d366f1206 Make bench_one_batch_server compatible for more backends (#18512) 2026-02-10 10:36:39 +08:00
Qiaolin Yu
4a1b50bb2d Fix idle batch predict dtype in spec v2 (#18379) 2026-02-10 10:29:13 +08:00
Kartik Ramesh
26a006e47f Add cache_config_info metric. (#17273) 2026-02-09 16:09:09 -08:00
Lianmin Zheng
b027c5aca6 [Auto Sync] Update cache_init_params.py (20260209) (#18502)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-09 14:49:41 -08:00
Lianmin Zheng
ce95f203b0 [Auto Sync] Update logits_processor.py (20260209) (#18503)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2026-02-09 14:33:02 -08:00