Lianmin Zheng
|
9815ee934c
|
[Auto Sync] Update weight_utils.py (20260212) (#18692)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dan Zheng <dzheng@x.ai>
|
2026-02-12 16:26:05 -08:00 |
|
Hao Jin
|
b48fe1d95e
|
[Diffusion] [BUG] Fix missing initialization of GLM-Image text encoder config (#18704)
Co-authored-by: Hao Jin <Hao Jin>
|
2026-02-12 16:19:22 -08:00 |
|
Shangming Cai
|
2a8a48c0ca
|
Reuse initialized transfer engine in mooncake store (#18460)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-13 01:21:35 +08:00 |
|
Yi Zhang
|
b168723424
|
[BUGFIX] fix bug in handle mamba radix cache in server_args (#18723)
|
2026-02-12 21:33:32 +08:00 |
|
Simo Lin
|
92c5749f41
|
refactor: replace local proto compilation with smg-grpc-proto package (#18682)
|
2026-02-12 05:29:24 -08:00 |
|
Scott Lee
|
c59b9223e6
|
Add spec_accept_histogram request statistic (#18332)
|
2026-02-12 21:09:21 +08:00 |
|
Hudson Xing
|
f3656432c7
|
add tool_choice=auto nightly test case (#18302)
|
2026-02-12 19:28:05 +08:00 |
|
Thomas Wang
|
e20e6c28b9
|
[AMD] Fix accuracy issue when running TP4 dsv3 model with mtp (#18607)
Co-authored-by: YC Tseng <yctseng@amd.com>
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
|
2026-02-12 01:13:16 -08:00 |
|
chenxu214
|
1edc69be08
|
[Ascend]Support qwen3.5 (#18544)
This PR affects only the NPU. If any issues arise, please contact iforgetmyname.
|
2026-02-12 15:22:47 +08:00 |
|
Vinh H. Pham
|
feaa9e7e00
|
[diffusion] fix: replace TextEncoderConfig with Qwen3TextConfig for Z-Image (#18560)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-02-12 14:31:23 +08:00 |
|
JooYeon
|
c9297661b9
|
fix: /metrics endpoint always reports engine_type="unified" in PD disaggregation mode (#18552)
Co-authored-by: joo_yeon.lee <joo_yeon.lee@samsung.com>
|
2026-02-12 14:20:43 +08:00 |
|
Li Jinliang
|
d91ce176bf
|
Update README commands to include model-path option (#18557)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-02-12 14:15:26 +08:00 |
|
Zheng Li
|
4ed2548427
|
[Qwen3_5] Refactor Qwen3_5ForCausalLMMTP class implementation (#18538)
|
2026-02-12 13:38:26 +08:00 |
|
YAMY
|
454676811e
|
[Flashinfer Autotune] Fix FlashInfer FP4 MoE autotuning crash by removing incorrect flatten on hidden_states_scale (#18500)
|
2026-02-12 13:31:27 +08:00 |
|
YC Tseng
|
20554a0a4f
|
[AMD] rocm 7.2 image release, PR test, Nightly Test (#17799)
Co-authored-by: Alan Kao <akao@amd.com>
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Co-authored-by: Michael <13900043+michaelzhang-ai@users.noreply.github.com>
|
2026-02-11 21:29:25 -08:00 |
|
danielafrimi
|
e422bcaed8
|
[Mamba] Add float16 support for SSM cache dtype (#18444)
|
2026-02-12 11:27:47 +08:00 |
|
Zhiyu
|
7e262b6496
|
Update modelopt quantization config parsing (#13919)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
|
2026-02-12 11:08:29 +08:00 |
|
R0CKSTAR
|
41e1fd0be7
|
[diffusion] fix: webui cannot correctly display generated video using wan2.2 (#18473)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-02-12 10:35:39 +08:00 |
|
Yi Zhong
|
dc1309fc7e
|
Avoid kimi linear stream sync (#16186)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
|
2026-02-12 09:27:22 +08:00 |
|
Jiayi Yan
|
539bbf485c
|
[Bugfix] fix config bug caused by PR #18273 (#18535)
|
2026-02-12 09:26:46 +08:00 |
|
Yuwei An
|
2bd8363486
|
[PCG] GPT OSS Triton Kernel Support (#18405)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
|
2026-02-12 09:23:55 +08:00 |
|
qianyue76
|
f06ab17a73
|
[diffusion] docs: consolidate diffusion documentation into docs (#18095)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: JiaxinD <djx2048@gmail.com>
|
2026-02-11 16:55:07 -08:00 |
|
Lianmin Zheng
|
5875ef0a34
|
Clean up noisy startup log messages and refactor loader.py (#18531)
|
2026-02-11 16:12:57 -08:00 |
|
Piotr Mazurek
|
ded068a76e
|
Add LMF2 MoE model architecture (#17997)
|
2026-02-12 01:03:43 +08:00 |
|
Ke Bao
|
5d185efb78
|
Fix prefill stats for dllm (#18632)
|
2026-02-12 01:00:30 +08:00 |
|
Vedant V Jhaveri
|
98b5013d59
|
add support to enable lora with embedding models (#17780)
Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com>
|
2026-02-11 23:19:40 +08:00 |
|
Baizhou Zhang
|
947927bdb5
|
[V3.2] Change default CP token split method to --round-robin-split (#18613)
|
2026-02-11 20:14:35 +08:00 |
|
McZyWu
|
4f7422f7ba
|
[NPU] support model skywork-reward-gemma2-2-27B-v0.2 (#16947)
Co-authored-by: cy <chenyang08056032@163.com>
|
2026-02-11 15:34:53 +08:00 |
|
sky
|
72c1526657
|
Register cp-atten-allgather buffers with symm memory (#17756)
Signed-off-by: wangfakang <fakangwang@gmail.com>
|
2026-02-11 15:26:37 +08:00 |
|
Thomas Wang
|
a8eef53dc4
|
Fp8 prefill attn kernel integration (#18528)
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
|
2026-02-10 23:23:48 -08:00 |
|
BourneSun0527
|
2cc235e795
|
Fix Bug on dsv3.2 (#18553)
This PR affects only the NPU. If any issues arise, please contact iforgetmyname.
|
2026-02-11 14:39:01 +08:00 |
|
Michael
|
d84d2063d3
|
[AMD] Fix Janus-Pro crash and add Kimi-K2.5 nightly test (#18269)
|
2026-02-10 22:33:13 -08:00 |
|
Liangsheng Yin
|
cd90346a2b
|
Add cache hit rate UT (#18566)
|
2026-02-10 21:27:41 -08:00 |
|
cutetocute
|
8d2892330c
|
chore: fix some typos (#18577)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2026-02-10 20:47:41 -08:00 |
|
Liangsheng Yin
|
93fca0bbc3
|
Fix wrong prefill log. (#18570)
|
2026-02-10 15:54:03 -08:00 |
|
Yi-Chia Chen
|
2bfab1bb67
|
Fix radix cache key to include generated tokens in multi-turn (regression) (#16521)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
|
2026-02-10 14:08:34 -08:00 |
|
Thomas Wang
|
4262f5259b
|
Tilelang sparse decode fwd for dsv32 mi355 (#18488)
Co-authored-by: kk <43161300+kkHuang-amd@users.noreply.github.com>
|
2026-02-10 09:53:26 -08:00 |
|
Zheng Li
|
44603764d6
|
fix(config): Support setting Mamba state dtype via config file (#18532)
Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>
|
2026-02-11 00:20:06 +08:00 |
|
Mick
|
efcdda0176
|
[diffusion] fix: fix fsdp (#18187)
|
2026-02-10 20:22:20 +08:00 |
|
wxy
|
47978ee858
|
[diffusion] feat: support parallel wan-vae decode (#18179)
|
2026-02-10 18:32:00 +08:00 |
|
Zehuan Li
|
26f2b3798d
|
[DLLM] Basic dLLM scheduling strategy and implementation (#17484)
Signed-off-by: Zehuan Li <lizehuan.lzh@antgroup.com>
|
2026-02-10 16:54:15 +08:00 |
|
shuwenn
|
8da14aea88
|
[HiCache] fix: StorageMetricsCollector was initialized twice (#18354)
|
2026-02-10 00:53:00 -08:00 |
|
Xinyuan Tong
|
398b81f78c
|
Support GlmMoeDsaForCausalLM (#18521)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Signed-off-by: BBuf <1182563586@qq.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: BBuf <1182563586@qq.com>
|
2026-02-10 15:20:10 +08:00 |
|
Xinyuan Tong
|
e8a2c13380
|
Deepseekv32 compatibility with transformers v5 (#18297)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-02-10 14:50:40 +08:00 |
|
siyu
|
0b15f19927
|
[EPD] Add notification mechanism to fix server hang and add timeout env var (#18229)
|
2026-02-10 11:52:54 +08:00 |
|
maocheng23
|
1d366f1206
|
Make bench_one_batch_server compatible for more backends (#18512)
|
2026-02-10 10:36:39 +08:00 |
|
Qiaolin Yu
|
4a1b50bb2d
|
Fix idle batch predict dtype in spec v2 (#18379)
|
2026-02-10 10:29:13 +08:00 |
|
Kartik Ramesh
|
26a006e47f
|
Add cache_config_info metric. (#17273)
|
2026-02-09 16:09:09 -08:00 |
|
Lianmin Zheng
|
b027c5aca6
|
[Auto Sync] Update cache_init_params.py (20260209) (#18502)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-09 14:49:41 -08:00 |
|
Lianmin Zheng
|
ce95f203b0
|
[Auto Sync] Update logits_processor.py (20260209) (#18503)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
|
2026-02-09 14:33:02 -08:00 |
|