ishandhanani
|
01e3f4682e
|
feat(kv-events): Add medium field to KV event types for storage tier tracking (#18205)
|
2026-02-09 12:39:15 -08:00 |
|
Zheng Li
|
27c447653d
|
model: support Qwen3.5 (#18489)
Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>
|
2026-02-10 00:27:59 +08:00 |
|
Kurt Shuster
|
006da22268
|
Pass quantize_config to _initialize_model (#18273)
|
2026-02-09 23:34:42 +08:00 |
|
brimon
|
ddbcfbaaab
|
feature: support bidirectional attention for Gemma-3 (#10707)
|
2026-02-09 23:17:45 +08:00 |
|
Mick
|
4f7da5ad0f
|
[diffusion] chore: fix unclean shutdown and resource leaks (#18477)
|
2026-02-09 22:32:08 +08:00 |
|
yrk111222
|
76eb1c8406
|
[diffusion] feat: add ModelScope support (#17924)
|
2026-02-09 19:23:45 +08:00 |
|
Baizhou Zhang
|
615a02dcd4
|
Revert "optimize get_topk_ragged by fusing get k and k_scale triton kernel" (#18471)
|
2026-02-09 16:37:19 +08:00 |
|
Liangsheng Yin
|
875ad6cf35
|
Tiny rename for spec related fields. (#18468)
|
2026-02-09 00:10:39 -08:00 |
|
LHXuuu
|
107958a489
|
Make compressed-tensors MoEs support ignored layers (#17828)
Signed-off-by: LHXuuu <xulianhao.xlh@antgroup.com>
Co-authored-by: Peng Zhang <aniz1905@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-09 14:37:33 +08:00 |
|
Junlin Zhou
|
14652243bd
|
[DLLM] Add JointThreshold algorithm for joint M2T and T2T decoding (#18171)
Signed-off-by: Junlin Zhou <zhoujunlin.zjl@antgroup.com>
Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com>
|
2026-02-09 14:20:45 +08:00 |
|
Bingxu Chen
|
3f3c201243
|
[AMD] Update aiter to v0.1.10.post2 (#18423)
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: YC Tseng <yctseng@amd.com>
|
2026-02-08 22:08:24 -08:00 |
|
Yingchun Lai
|
a1189068fa
|
fix: fix the wrong return value type of draft model runner (#18105)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2026-02-08 20:51:35 -08:00 |
|
Zheng Wengang
|
68e31a3485
|
[BugFix][PD] Fix metadata_buffer_index leak when aborted in PD (#17483)
|
2026-02-09 11:34:29 +08:00 |
|
Shangming Cai
|
bffd765417
|
Refactoring Mooncake TE as a shared distributed component (#17810)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-09 10:53:11 +08:00 |
|
Yi Zhong
|
bf89cc3803
|
[ModelOpt] Support Qwen 3 Next Coder NVFP4 (#18224)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
|
2026-02-08 22:29:07 +00:00 |
|
Mohammad Miadh Angkad
|
071bf2ce09
|
[Kimi-K2.5] Fix missing quant_config in KimiK25 (#18440)
|
2026-02-08 12:02:45 -08:00 |
|
Piotr Mazurek
|
656a3d742e
|
Add tensor parallelism support to LFM2 ShortConv layers (#17777)
|
2026-02-09 00:52:47 +08:00 |
|
Mick
|
6601bc24da
|
[diffusion] chore: revise process title (#18446)
|
2026-02-09 00:14:06 +08:00 |
|
debo3
|
031a652b93
|
Fix TRT-LLM MLA backend applying k_scale to BF16 KV cache in BMM1 (#18396)
|
2026-02-08 23:11:16 +08:00 |
|
Mick
|
a41aff1243
|
[diffusion] refactor: group component loaders under the component_loaders/ directory (#18438)
|
2026-02-08 23:02:27 +08:00 |
|
Yi Zhong
|
ca36d88fa6
|
[ModelOpt] Fix broken Qwen3-235B-A22B-Instruct-2507-NVFP4 launch (#18189)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
|
2026-02-08 14:35:28 +00:00 |
|
wxy
|
43eecd8265
|
[diffusion] feat: support efficient sequence shard (#18161)
|
2026-02-08 21:09:39 +08:00 |
|
Zack Yu
|
d71ccd8860
|
fix: sync server_args.kv_cache_dtype when detecting FP8 KV cache (#18394)
|
2026-02-08 14:10:59 +08:00 |
|
DarkSharpness
|
8e2e835c2f
|
[Fix] Fix backend selection after flashinfer version update (#18364)
|
2026-02-08 11:20:41 +08:00 |
|
Makcum888e
|
00248d85c7
|
[diffusion] platform: support WAN/FLUX/Qwen-Image/Qwen-Image-edit on Ascend (#13662)
Co-authored-by: dhx98 <haox.dai@gmail.com>
Co-authored-by: DHX98 <haoxiand@andrew.cmu.edu>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: DHX98 <DHX98@noreply.gitcode.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
|
2026-02-08 10:45:30 +08:00 |
|
Mohammad Miadh Angkad
|
7b83659310
|
fix: fix NVFP4 Kimi-K2.5 weight mapping and exclude list (#18370)
|
2026-02-08 10:23:48 +08:00 |
|
wxy
|
64950d8f97
|
[diffusion] feat: support saving videos directly on the server to avoid the overhead of tensor transfer (#18253)
|
2026-02-07 22:08:42 +08:00 |
|
Mick
|
31d4cd2ffd
|
[diffusion] fix: respect dist_timeout option (#18386)
|
2026-02-07 20:56:04 +08:00 |
|
Mohammad Miadh Angkad
|
fddef76619
|
[Doc] Fix outdated --fp4-gemm-backend documentation (#18350)
|
2026-02-07 20:42:47 +08:00 |
|
Hao Jin
|
d792aa7618
|
[diffusion] fix: remove unnecessary norm_type argument from GLM-Image dits (#18382)
Co-authored-by: Hao Jin <Hao Jin>
|
2026-02-07 20:35:12 +08:00 |
|
Baizhou Zhang
|
eb4cf1dfc4
|
[CI] Skip some flaky subtests for test_multi_lora_backend.py (#18408)
|
2026-02-07 19:06:53 +08:00 |
|
Xiaoyu Zhang
|
baec650462
|
[Diffusion] Apply fused_norm_scale_shift to LTX2/MOVA (#18257)
Co-authored-by: yihanc <yingluosanqian@gmail.com>
|
2026-02-07 17:28:42 +08:00 |
|
赵晨阳
|
1552aab741
|
Support execute_shell_command for env var support (#18390)
|
2026-02-07 12:33:29 +08:00 |
|
hlu1
|
4637970dfb
|
[Qwen3Next] Optimize fused_sigmoid_gating_delta_rule_update_kernel (#18271)
|
2026-02-07 11:59:42 +08:00 |
|
Neal Vaidya
|
f1ff697494
|
add hybrid model PD to NIXL connector (#16229)
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
|
2026-02-06 15:05:36 -08:00 |
|
Prozac614
|
e13b727e92
|
[diffusion] CI: update perf baseline (#17512)
|
2026-02-07 00:28:44 +08:00 |
|
shaharmor98
|
c6aa1863be
|
Add Nemotron 3 Nano tests (#18119)
Signed-off-by: Shahar Mor <smor@nvidia.com>
|
2026-02-06 23:55:42 +08:00 |
|
xiaoye
|
79d409f210
|
[diffusion] fix: offload text encoder model in image encoding stage (#18317)
|
2026-02-06 22:55:56 +08:00 |
|
Xuchun Shang
|
3d68bd9d9b
|
add hicache jit test (#17847)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
|
2026-02-06 16:54:33 +08:00 |
|
陈一涵
|
f798ab9775
|
[diffusion] fix: fix torch.compile graph break caused by torch._dynamo.disable (#18336)
|
2026-02-06 14:48:09 +08:00 |
|
Alison Shao
|
d0c39bc219
|
Fix cross-container HF download race condition in CI (#18328)
|
2026-02-05 21:01:41 -08:00 |
|
Linyu Wu
|
aa390d2762
|
[Kernel] Migrate GPTQ-Marlin GEMM kernel to JIT (#18067)
|
2026-02-06 08:31:42 +08:00 |
|
aaaandychen
|
6a4b81e2d9
|
Refactor(qwen3-vl) optimize position encoding interpolation (#16781)
Signed-off-by: chenzhenyang <andy271828@163.com>
Signed-off-by: chenzhenyang <chenzhenyang@moonshot.cn>
Co-authored-by: chenzhenyang <chenzhenyang@moonshot.cn>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2026-02-05 10:26:35 -08:00 |
|
ovidiusm
|
498d8d0680
|
NixlKVManager optimizations (#17654)
Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>
|
2026-02-06 00:25:23 +08:00 |
|
wxy
|
b639779dd8
|
[diffusion] feat: allow T5's TP Group to reuse the transformer's SP Group (#17818)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-02-06 00:12:19 +08:00 |
|
Glen Liu
|
3f32a5831d
|
throw error if got adapter with added_tokens (#18046)
|
2026-02-05 23:55:43 +08:00 |
|
pansicheng
|
2eb4359ada
|
[Kernel] Add JIT apply_rope_with_cos_sin_cache_inplace (#18155)
|
2026-02-05 21:49:37 +08:00 |
|
陈一涵
|
4aa03d91fd
|
[diffusion] fix: fix accuracy bug caused by #14717 (#18296)
|
2026-02-05 20:36:18 +08:00 |
|
Shangming Cai
|
afae4c7178
|
[PD] Minor code cleanup for mooncake backend (#18279)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-05 17:38:09 +08:00 |
|
zhangheng
|
079fc8f3c5
|
[piecewise graph]: support MiniMax-M2 (#18217)
|
2026-02-04 23:24:38 -08:00 |
|