Yuan Luo
|
26d95008b6
|
[apply][2/2] Fused qk_norm_rope for Qwen3-MoE (#13998)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-12-07 20:25:18 +08:00 |
|
Tiwei Bie
|
9abcab3ffa
|
[DLLM] feat: Add threshold based parallel decoding support (#14412)
Co-authored-by: Jinwei Yao <jinweiy@illinois.edu>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
|
2025-12-07 18:25:33 +08:00 |
|
Alison Shao
|
41d61faa99
|
[FLA] Add explicit kernel arguments to kda.py for Kimi Linear support (#14561)
|
2025-12-06 22:45:41 -08:00 |
|
Chen1022
|
3c7886ec4c
|
Fix attention backend logic for Qwen3-Next on SM100 (#14560)
|
2025-12-06 22:03:34 -08:00 |
|
b8zhong
|
6d5d76ad97
|
remove unnecessary dual stream token threshold from the rest of models (qwen moe, kimi linear, etc.) (#14337)
|
2025-12-06 19:57:26 -08:00 |
|
Rain H
|
32a32cf7d2
|
Enhance prefill PP node robustness (#14494)
|
2025-12-06 18:00:54 -08:00 |
|
Minglei Zhu
|
be4a3ec376
|
support piecewise cuda graph for Olmo models (#14476)
|
2025-12-06 17:57:44 -08:00 |
|
almaslof
|
ff6e3ea934
|
[docs] Add missing word in argument description (#14205)
|
2025-12-06 17:56:54 -08:00 |
|
sglang-bot
|
d2b42477c7
|
chore: bump sgl-kernel version to 0.3.18.post3 (#14518)
|
2025-12-06 13:15:16 -08:00 |
|
Baizhou Zhang
|
9dfa01a435
|
[Misc]Register and refactor some environs for dpsk-fp4 and DeepEp (#14538)
|
2025-12-06 12:29:16 -08:00 |
|
Hanming Lu
|
e592ee6545
|
[Qwen3-next] remove heuristics and add radix cache kl test (#14520)
|
2025-12-06 12:11:40 -08:00 |
|
Baizhou Zhang
|
bc388471d2
|
[1/n] Fix hanging during DeepGemm Warmup (#14493)
|
2025-12-06 10:44:02 -08:00 |
|
gongwei-130
|
3e40c63674
|
fix "GrammarMatcher has terminated after accepting the stop token, but is trying to find the next token mask" when both reasoning and spec are enabled (#14464)
|
2025-12-06 06:15:22 -08:00 |
|
WenhaoZhang
|
80122e4f4c
|
[diffusion] lora: fix LoRA dtype handling and weight attribute access for z-image model (#14543)
Co-authored-by: niehen6174 <nihen6174@gmail.com>
|
2025-12-06 22:14:44 +08:00 |
|
Xiaoyu Zhang
|
6d41791823
|
[diffusion] perf: add QKV fusion optimization for Flux models (#14505)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-06 20:44:16 +08:00 |
|
Mick
|
35a9a07370
|
[diffusion] refactor: simplify sampling params' override logic (#14539)
|
2025-12-06 20:23:49 +08:00 |
|
Rain Jiang
|
ea177372bd
|
support mtp with deepseek r1 nvfp4 model (#13115)
Co-authored-by: Trevor Morris <tmorris@nvidia.com>
|
2025-12-06 00:45:54 -08:00 |
|
Baizhou Zhang
|
42fcf5438f
|
Revert "tiny remove deprecated endpoint call" (#14533)
|
2025-12-05 23:48:54 -08:00 |
|
Mick
|
d881f31488
|
[diffusion] chore: temporarily upgrade diffusers to make Z-image compatible with Cache-DiT (#14530)
|
2025-12-06 12:39:37 +08:00 |
|
Vincent Zhong
|
2ac5b98395
|
fix: fix rmsnorm -> layernorm in qwen3 omni (#11791)
Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
|
2025-12-06 12:12:57 +08:00 |
|
Alison Shao
|
b988c18eae
|
Fix safetensors validation to catch corruption after download (#14465)
|
2025-12-05 16:04:00 -08:00 |
|
fzyzcjy
|
3d1b591aa1
|
Tiny use trtllm_mha as default when possible (#14291)
|
2025-12-05 14:26:03 -08:00 |
|
b8zhong
|
ec7b2c16d9
|
tiny remove deprecated endpoint call (#13607)
|
2025-12-05 09:54:49 -08:00 |
|
Hudson Xing
|
38daa29466
|
Add fused FP8 KV cache write kernel for TRTLLM MHA backend (#14093)
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
|
2025-12-06 00:53:55 +08:00 |
|
blahblah
|
66984a8b3d
|
[diffusion] feat: support cache-dit integration (#14234)
Co-authored-by: shuxiguo <shuxiguo@meituan.com>
Co-authored-by: DefTruth <qiustudent_r@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-06 00:52:22 +08:00 |
|
roikoren755
|
889b46ea50
|
[Spec] Mamba2 support in target models (#13434)
|
2025-12-06 00:50:46 +08:00 |
|
Mick
|
a89045603b
|
[diffusion] chore: set allowing overriding protected fields of sampling params as default behavior (#14471)
|
2025-12-06 00:22:42 +08:00 |
|
Alison Shao
|
662809874c
|
Add Mistral Large 3 to nightly CI tests (#14459)
|
2025-12-05 23:16:27 +08:00 |
|
elvischenv
|
205f041e96
|
Add Mistral Large 3 Eagle Support (#14466)
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-12-05 23:11:41 +08:00 |
|
Simo Lin
|
7235a7fbe9
|
[misc] add model arch and type to server info and use it for harmony (#14456)
|
2025-12-05 06:51:00 -08:00 |
|
Yuxuan Zhang
|
8fce9e7b2a
|
support GLM-V vision model dp (#14097)
|
2025-12-05 21:03:54 +08:00 |
|
Xiaoyu Zhang
|
5347732219
|
[diffusion] fix: Fix profiler trace missing Python stack in diffusion pipeline (#14499)
|
2025-12-05 12:12:35 +00:00 |
|
roikoren755
|
2ce121a1c3
|
Enable RadixCache for Mamba2 models (#13584)
|
2025-12-05 18:23:58 +08:00 |
|
WenhaoZhang
|
35ba6fe19e
|
[diffusion] fix: fix CLIP text encoder attention mask not used (#14364)
Co-authored-by: niehen6174 <niehen.6174@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-05 16:30:10 +08:00 |
|
GMI Xiao Jin
|
7c744d137d
|
[diffusion] cli: add argument --adjust-frames and --override-protected-fields (#13996)
Co-authored-by: dev <devnull@example.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-05 15:32:06 +08:00 |
|
zyksir
|
46b05ef58f
|
[diffusion] fix: fix bug about pin memory when offloading (#14472)
|
2025-12-05 15:26:30 +08:00 |
|
Mick
|
beec8eed6a
|
[diffusion] chore: further improve model searching logic (#14484)
|
2025-12-05 15:04:55 +08:00 |
|
Minglei Zhu
|
b76e303e6a
|
clean up gemlite usage (#14444)
|
2025-12-04 21:52:56 -08:00 |
|
Yinghai Lu
|
41429a8c10
|
[ez] Fix typing (#14473)
|
2025-12-05 12:23:13 +08:00 |
|
zyksir
|
fa0ca97694
|
[diffusion] improve: further optimize model load (#13836)
|
2025-12-05 10:45:20 +08:00 |
|
Junrong Lin
|
2ecee7571c
|
[Bug] fix not desired disable fused share experts caused by rocm logic (#14432)
|
2025-12-05 09:55:07 +08:00 |
|
Xinyuan Tong
|
6d37e70883
|
ministral3 (#14251)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Yueming Yuan <yy28@illinois.edu>
|
2025-12-04 14:31:26 -08:00 |
|
Sam
|
922756aaa1
|
[FIX] trtllm-moe-fp4-renorm for Qwen series models (#14350)
|
2025-12-04 12:52:21 -08:00 |
|
YAMY
|
7dfcc78155
|
[DeepseekV3.2][NSA][Indexer] Fix PAGED top-k transform for NSA indexer chunked execution on H200 (#14325)
|
2025-12-04 10:25:03 -08:00 |
|
Cherry_ming
|
1808df48fe
|
[NPU]add nightly-test-npu (#14143)
|
2025-12-05 00:43:35 +08:00 |
|
WenhaoZhang
|
788628b56f
|
[diffusion] feat: Add Configurable Generator Device and Seed Support via API (#14366)
Co-authored-by: niehen6174 <niehen.6174@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-05 00:25:09 +08:00 |
|
Raul Torres
|
29a2d4b59f
|
Add 'NPU' to the runtime exception message in get_device (#14225)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2025-12-04 17:34:31 +03:00 |
|
R0CKSTAR
|
079ac237da
|
[diffusion] fix: fix gen video doc (#14409)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-04 22:05:38 +08:00 |
|
Daniel Cámpora
|
8428078436
|
Add Mistral Large 3 support. (#14213)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-12-04 20:00:05 +08:00 |
|
Xuchun Shang
|
af35023e65
|
[bug fix] fix ima with get_mla_kv_buffer_kernel overflow (#14224)
Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
|
2025-12-04 01:20:11 -08:00 |
|