Xinyuan Tong | 3c34d2c3eb | 2026-01-28 19:47:09 -08:00
[FIX] kimi_k2 reasoning parser (#17901)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Joe Redmond | 0ff0d181ca | 2026-01-28 19:33:08 -08:00
feat: add custom request header logging (#17786)

kk | f1384f5293 | 2026-01-28 19:07:34 -08:00
Integration mori backend for EP a2a data communication (#17012)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>

Jerry Ji | 673dc09d9b | 2026-01-29 10:11:18 +08:00
[Fix][trtllm-mha] Canonicalize the strides when num_head = 1 (#17732)

Qi Yuhang | 0368ddf9ea | 2026-01-29 09:29:59 +08:00
[JIT Kernel]Support fused_add_rmsnorm in JIT Kernel (#17677)

Zhang Yiyang (SII) | 09a9147f59 | 2026-01-29 09:12:08 +08:00
[diffusion] model: support MOVA (#17704)
Co-authored-by: gaoyang07 <Gary1546308416AL@gmail.com>
Co-authored-by: cms42 <c@cms42.top>
Co-authored-by: cms42 <44895820+cms42@users.noreply.github.com>
Co-authored-by: Ruixiao Li <cgruixiao@outlook.com>
Co-authored-by: Li Ruixiao(SII) <80368770+Li-dongyang@users.noreply.github.com>

Prozac614 | 3fcda00e8c | 2026-01-28 17:09:45 -08:00
[CI] Fix CI timeouts by upgrading runai_model_streamer (related to #16937) (#17636)

Lianmin Zheng | d4180815a4 | 2026-01-28 16:24:23 -08:00
Make the functions in logits_processor.py and sampler.py more modular (#17885)

jackey hua | 0998de088b | 2026-01-28 14:06:46 -08:00
[Perf] Tune Llama-4-Scout-17B-16E-Instruct fused moe kernel (#17891)

gingerXue | e9d727cb92 | 2026-01-28 11:36:30 -08:00
[MUSA][7/N] Enhance CUDA / PyNccl wrapper to support MTLink connectivity detection (#17499)
Signed-off-by: jingzhi.xue <jingzhi.xue@mthreads.com>
Co-authored-by: jingzhi.xue <jingzhi.xue@mthreads.com>

Артем Савкин | b77b0ffd60 | 2026-01-29 00:55:08 +08:00
[NPU] NZ for non-quantized MOE, Qwen3 MOE double memory consumption fix (#15904)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Jinn | 1953efb60e | 2026-01-28 08:20:23 -08:00
[AMD] ROCm: route W4A16 MoE to Triton and fix packed-weight loading (#17863)

triple-mu | 1d1e72e516 | 2026-01-28 23:49:55 +08:00
[diffusion] fix: fix comfyui import typo (#17834)

Xiaoyu Zhang | c08b54a575 | 2026-01-28 15:09:47 +08:00
[JIT kernel] Update jit_kernel cache and develop doc (#17842)

Mick | 2573a262af | 2026-01-28 14:52:33 +08:00
[diffusion] doc: fix wrong docker run command (#17856)

Ziang Li | a8dda2aa57 | 2026-01-28 11:46:49 +08:00
[DSv32] Overlap indexer qk projection and activation quant (#17688)

Yisheng Gong | 1c4616a034 | 2026-01-28 09:50:49 +08:00
fix: add bias when enable mm fallback variant (#17690)

陈一涵 | 647428d8d6 | 2026-01-28 09:40:13 +08:00
[diffusion] perf: apply mul add fusion for Qwen-Image (#16299)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Yashika Gandhi - Google | 32ea7bcdd8 | 2026-01-28 09:38:56 +08:00
[diffusion] endpoint: fix vertex generate (#17611)

Mick | 88fcd8535f | 2026-01-28 09:34:27 +08:00
[diffusion] feat: add an arg for controlling the number of prefetched layers in layerwise-offload (#17693)

Mick | 1507dc6cdf | 2026-01-28 09:29:19 +08:00
[diffusion] fix: fix suppressing error log on non-main ranks (#17712)

Xiaoyu Zhang | 331a22427c | 2026-01-28 08:51:37 +08:00
[Diffusion] glm-image apply flashinfer rope (#17689)

siyu | 4d00bd17a3 | 2026-01-27 11:01:08 -08:00
use shared memory for multimodal feature transport between Tokenizer and Scheduler (#16402)
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>

Minglei Zhu | d90c0837e5 | 2026-01-27 10:37:58 -08:00
[hybrid-model] clean up and consolidate redundant fields in RadixLinearAttention (#17660)

fsygd | 547e2d037e | 2026-01-27 23:01:23 +08:00
[diffusion] refactor: add arg to control the precision of dit (#17751)

monkeyLoveding | d578b41bad | 2026-01-27 19:03:53 +08:00
[NPU] Adapt cann 8.5: use sfa and lightning indexer op from cann and CI update (#17615)
Co-authored-by: Kelon <kelonlu@163.com>

MikkoParkkola | c56d19b977 | 2026-01-27 18:43:17 +08:00
fix(quantization): add sgl_kernel fallback for FP4 quantize on Blackwell GPUs (#17816)

Xuchun Shang | dba264ac73 | 2026-01-27 17:41:17 +08:00
[PP] fix wrong weight logic for tie_word_embeddings model (#15890)
Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>

Yuxuan Zhang | 7106f6c8e1 | 2026-01-26 22:24:00 -08:00
[GLM-OCR] Support GLM-OCR Model (#17582)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

Taemin Jung | 81c0f5c5ad | 2026-01-27 14:08:24 +08:00
[Model] Add support for EXAONE-4.0 Model (#8205)
Signed-off-by: BoxBy <lute7071@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

laixin | 6c9b054ab7 | 2026-01-27 14:04:44 +08:00
[Bug Fix] Fix reasoning parser when continue_final_message=true (#17065)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>

shuwenn | 57e432d951 | 2026-01-26 22:48:24 -05:00
fix: preserve disconnect events in api key middleware (#17253)

shuwenn | fd3b179ffd | 2026-01-26 19:33:19 -08:00
[HiCache][HA 1/N] Support HiCache storage runtime attach/detach (#15892)

Zhongdongming Dai | 1b56a886bb | 2026-01-26 19:04:25 -08:00
[chore]: improve time tracing of model loading process (#15426)
Co-authored-by: Michael Shin <mmshin@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>

Yuhao Yang | 479ab7a4e7 | 2026-01-27 10:57:00 +08:00
model: support Kimi-K2.5 (#17789)
Co-authored-by: Mick <mickjagger19@icloud.com>

WenhaoZhang | 0519b0935f | 2026-01-27 10:06:42 +08:00
[diffusion] comfyui: support Qwen-Image, Multi-GPU Z-Image, and Enhanced ComfyUI Integration (#17678)
Co-authored-by: niehen6174 <niehen.6174@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

FlyPanda | 2d8c22a15e | 2026-01-26 18:01:50 -08:00
[bugfix] Internal processing of hf3fs crash # 16614 (#16938)

Mahdi-CV | 539924037f | 2026-01-26 13:02:54 -08:00
fix(processor): support InternS1 text_config in InternVL processor (#17040)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

ybyang | 5ab76ff220 | 2026-01-26 10:26:40 -08:00
Special logic for healthcheck (#17734)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>

Liangsheng Yin | 85d077f44d | 2026-01-26 10:26:20 -08:00
Introduce global alloc_len_per_decode & clean check decode memory (#15115)

Makcum888e | bba6e38ff8 | 2026-01-26 09:45:44 -08:00
[NPU] Split pyproject npu from pyproject other (#17641)

Yuan Luo | 7bb41989fa | 2026-01-26 22:44:13 +08:00
[1/N] Optimize All Reduce - Benchmark different AR operations (#13797)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

lawtherWu | b56366f827 | 2026-01-26 20:42:37 +08:00
[NPU]DeepSeek-V3.2 support npu mlaprolog (#15381)
Co-authored-by: Zhengda Qin <zhengdqin@gmail.com>
Co-authored-by: richhuan <huan_rz@qq.com>

Yi Zhang | 5844cb2fd8 | 2026-01-26 17:02:49 +08:00
refactor mamba radix cache logic in server_args (#17645)

shaharmor98 | f6f1b6d000 | 2026-01-26 16:50:06 +08:00
Bump FI version (#17700)
Signed-off-by: Shahar Mor <smor@nvidia.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>

McZyWu | 2734b23481 | 2026-01-26 16:14:35 +08:00
accuracy enhancement for baichuan2-13B for npu (#16868)
Co-authored-by: cy <chenyang08056032@163.com>

Prozac614 | 12f794e516 | 2026-01-26 15:47:10 +08:00
[diffusion] fix: fix missing backend argument in pipelines_core initialization (#17343)

Kangyan-Zhou | 48f4340b14 | 2026-01-25 23:32:39 -08:00
Exclude some diffusion package for ARM in docker release (#17745)

Alison Shao | 30b3192039 | 2026-01-25 22:49:19 -08:00
Merge performance/accuracy test suites into regular stage-b suites (#17609)

CSWYF3634076 | 1a19b3987d | 2026-01-25 22:36:29 -08:00
[Model] Add Ernie4.5 VL model support (#15679)
Signed-off-by: CSWYF3634076 <wangyafeng@baidu.com>
Signed-off-by: wangyafeng <wangyafeng@baidu.com>