6437 Commits

Author SHA1 Message Date
ishandhanani
01e3f4682e feat(kv-events): Add medium field to KV event types for storage tier tracking (#18205) 2026-02-09 12:39:15 -08:00
Zheng Li
27c447653d model: support Qwen3.5 (#18489)
Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>
2026-02-10 00:27:59 +08:00
Kurt Shuster
006da22268 Pass quantize_config to _initialize_model (#18273) 2026-02-09 23:34:42 +08:00
brimon
ddbcfbaaab feature: support bidirectional attention for Gemma-3 (#10707) 2026-02-09 23:17:45 +08:00
Mick
4f7da5ad0f [diffusion] chore: fix unclean shutdown and resource leaks (#18477) 2026-02-09 22:32:08 +08:00
yrk111222
76eb1c8406 [diffusion] feat: add ModelScope support (#17924) 2026-02-09 19:23:45 +08:00
Baizhou Zhang
615a02dcd4 Revert "optimize get_topk_ragged by fusing get k and k_scale triton kernel" (#18471) 2026-02-09 16:37:19 +08:00
Liangsheng Yin
875ad6cf35 Tiny rename for spec related fields. (#18468) 2026-02-09 00:10:39 -08:00
LHXuuu
107958a489 Make compressed-tensors MoEs support ignored layers (#17828)
Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>
Co-authored-by: Peng Zhang <aniz1905@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-09 14:37:33 +08:00
Junlin Zhou
14652243bd [DLLM] Add JointThreshold algorithm for joint M2T and T2T decoding (#18171)
Signed-off-by: Junlin Zhou <zhoujunlin.zjl@antgroup.com>
Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com>
2026-02-09 14:20:45 +08:00
Bingxu Chen
3f3c201243 [AMD] Update aiter to v0.1.10.post2 (#18423)
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: YC Tseng <yctseng@amd.com>
2026-02-08 22:08:24 -08:00
Yingchun Lai
a1189068fa fix: fix the wrong return value type of draft model runner (#18105)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2026-02-08 20:51:35 -08:00
Zheng Wengang
68e31a3485 [BugFix][PD]Fix metadata_buffer_index leak when aborted in PD (#17483) 2026-02-09 11:34:29 +08:00
Shangming Cai
bffd765417 Refactoring Mooncake TE as a shared distributed component (#17810)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-02-09 10:53:11 +08:00
Yi Zhong
bf89cc3803 [ModelOPT] Support Qwen 3 Next Coder NVFP4 (#18224)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
2026-02-08 22:29:07 +00:00
Mohammad Miadh Angkad
071bf2ce09 [Kimi-K2.5] Fix missing quant_config in KimiK25 (#18440) 2026-02-08 12:02:45 -08:00
Piotr Mazurek
656a3d742e Add tensor parallelism support to LFM2 ShortConv layers (#17777) 2026-02-09 00:52:47 +08:00
Mick
6601bc24da [diffusion] chore: revise process title (#18446) 2026-02-09 00:14:06 +08:00
debo3
031a652b93 Fix TRT-LLM MLA backend applying k_scale to BF16 KV cache in BMM1 (#18396) 2026-02-08 23:11:16 +08:00
Mick
a41aff1243 [diffusion] refactor: group component loaders under the component_loaders/ directory (#18438) 2026-02-08 23:02:27 +08:00
Yi Zhong
ca36d88fa6 [ModelOpt] Fix broken Qwen3-235B-A22B-Instruct-2507-NVFP4 launch (#18189)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
2026-02-08 14:35:28 +00:00
wxy
43eecd8265 [diffusion] feat: support efficient sequence shard (#18161) 2026-02-08 21:09:39 +08:00
Zack Yu
d71ccd8860 fix: sync server_args.kv_cache_dtype when detecting FP8 KV cache (#18394) 2026-02-08 14:10:59 +08:00
DarkSharpness
8e2e835c2f [Fix] Fix backend selection after flashinfer version update (#18364) 2026-02-08 11:20:41 +08:00
Makcum888e
00248d85c7 [diffusion] platform: support WAN/FLUX/Qwen-Image/Qwen-Image-edit on Ascend (#13662)
Co-authored-by: dhx98 <haox.dai@gmail.com>
Co-authored-by: DHX98 <haoxiand@andrew.cmu.edu>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: DHX98 <DHX98@noreply.gitcode.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
2026-02-08 10:45:30 +08:00
Mohammad Miadh Angkad
7b83659310 fix: fix NVFP4 Kimi-K2.5 weight mapping and exclude list (#18370) 2026-02-08 10:23:48 +08:00
wxy
64950d8f97 [diffusion] feat: support saving videos directly on the server to avoid the overhead of tensor transfer (#18253) 2026-02-07 22:08:42 +08:00
Mick
31d4cd2ffd [diffusion] fix: respect dist_timeout option (#18386) 2026-02-07 20:56:04 +08:00
Mohammad Miadh Angkad
fddef76619 [Doc] Fix outdated --fp4-gemm-backend documentation (#18350) 2026-02-07 20:42:47 +08:00
Hao Jin
d792aa7618 [diffusion] fix: remove unnecessary norm_type argument from GLM-Image dits (#18382)
Co-authored-by: Hao Jin <Hao Jin>
2026-02-07 20:35:12 +08:00
Baizhou Zhang
eb4cf1dfc4 [CI] Skip some flaky subtests for test_multi_lora_backend.py (#18408) 2026-02-07 19:06:53 +08:00
Xiaoyu Zhang
baec650462 [Diffusion] Apply fused_norm_scale_shift to LTX2/MOVA (#18257)
Co-authored-by: yihanc <yingluosanqian@gmail.com>
2026-02-07 17:28:42 +08:00
赵晨阳
1552aab741 Support execute_shell_command for env var support (#18390) 2026-02-07 12:33:29 +08:00
hlu1
4637970dfb [Qwen3Next] Optimize fused_sigmoid_gating_delta_rule_update_kernel (#18271) 2026-02-07 11:59:42 +08:00
Neal Vaidya
f1ff697494 add hybrid model PD to NIXL connector (#16229)
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
2026-02-06 15:05:36 -08:00
Prozac614
e13b727e92 [diffusion] CI: update perf baseline (#17512) 2026-02-07 00:28:44 +08:00
shaharmor98
c6aa1863be Add Nemotron 3 Nano tests (#18119)
Signed-off-by: Shahar Mor <smor@nvidia.com>
2026-02-06 23:55:42 +08:00
xiaoye
79d409f210 [diffusion] fix: offload text encoder model in image encoding stage (#18317) 2026-02-06 22:55:56 +08:00
Xuchun Shang
3d68bd9d9b add hicache jit test (#17847)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
2026-02-06 16:54:33 +08:00
陈一涵
f798ab9775 [diffusion] fix: fix torch.compile graph break caused by torch._dynamo.disable (#18336) 2026-02-06 14:48:09 +08:00
Alison Shao
d0c39bc219 Fix cross-container HF download race condition in CI (#18328) 2026-02-05 21:01:41 -08:00
Linyu Wu
aa390d2762 [Kernel] Migrate GPTQ-Marlin GEMM kernel to JIT (#18067) 2026-02-06 08:31:42 +08:00
aaaandychen
6a4b81e2d9 Refactor(qwen3-vl) optimize position encoding interpolation (#16781)
Signed-off-by: chenzhenyang <andy271828@163.com>
Signed-off-by: chenzhenyang <chenzhenyang@moonshot.cn>
Co-authored-by: chenzhenyang <chenzhenyang@moonshot.cn>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2026-02-05 10:26:35 -08:00
ovidiusm
498d8d0680 NixlKVManager optimizations (#17654)
Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>
2026-02-06 00:25:23 +08:00
wxy
b639779dd8 [diffusion] feat: allow T5's TP Group to reuse the transformer's SP Group (#17818)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-02-06 00:12:19 +08:00
Glen Liu
3f32a5831d throw error if got adapter with added_tokens (#18046) 2026-02-05 23:55:43 +08:00
pansicheng
2eb4359ada [Kernel] Add JIT apply_rope_with_cos_sin_cache_inplace (#18155) 2026-02-05 21:49:37 +08:00
陈一涵
4aa03d91fd [diffusion] fix: fix accuracy bug caused by #14717 (#18296) 2026-02-05 20:36:18 +08:00
Shangming Cai
afae4c7178 [PD] Minor code cleanup for mooncake backend (#18279)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-02-05 17:38:09 +08:00
zhangheng
079fc8f3c5 [piecewise graph]: support MiniMax-M2 (#18217) 2026-02-04 23:24:38 -08:00