PiteXChen
|
dc7bdc7329
|
bugfix[schedule]: Excessive preemption occurs when preempting running requests to schedule new prefill requests. (#12494)
Signed-off-by: CLFutureX <chenyongqyl@163.com>
|
2025-11-30 22:29:26 +08:00 |
|
Liangsheng Yin
|
0a9d64530d
|
Support grammar + spec + reasoning (#14163)
|
2025-11-30 21:19:57 +08:00 |
|
fzyzcjy
|
340c613ab5
|
Support numactl bind for CPU and memory before process starts (#14156)
|
2025-11-30 17:00:33 +08:00 |
|
fzyzcjy
|
36b729c2b8
|
Implement profiler v2 and fix stage mixture bug (#14148)
|
2025-11-30 16:59:52 +08:00 |
|
Tianhao Zhou
|
67e6ef4b2d
|
feat: longcat flash add aux layers capture for eagle3 (#14161)
|
2025-11-30 00:50:55 -08:00 |
|
strgrb
|
65ba5ab8b1
|
add cpp files for cpp_radix_tree to pyproject.toml. (#14052)
|
2025-11-30 13:05:04 +08:00 |
|
WenhaoZhang
|
990023e59b
|
[diffusion] lora: Fix LoRA weight merging for torch.nn.Linear layers from diffusers modules (#14150)
Co-authored-by: niehen6174 <niehen.6174@gmail.com>
|
2025-11-30 12:44:12 +08:00 |
|
fzyzcjy
|
0ae4b1ad81
|
Show errors when misusing env variables (#14154)
|
2025-11-30 10:57:35 +08:00 |
|
fzyzcjy
|
94cd64a7b0
|
Support checking fp8 params in weight_checker (#14147)
|
2025-11-30 09:08:59 +08:00 |
|
fzyzcjy
|
b870271a50
|
Fix spec v2 does not support RL update weights from tensor (#14146)
|
2025-11-30 09:08:05 +08:00 |
|
fzyzcjy
|
22ee9b0111
|
Super tiny add more info in dumper (#14145)
|
2025-11-30 09:07:39 +08:00 |
|
fzyzcjy
|
9d0e5f1f74
|
Tiny fix DeepGEMM precompile rank check (#14136)
|
2025-11-30 09:07:17 +08:00 |
|
Kangyan-Zhou
|
1d3d8b3418
|
Fix Minimax M2 loading issue (#13956)
|
2025-11-29 17:07:19 -05:00 |
|
Lianmin Zheng
|
155a9e7237
|
Fix condition for streaming output_ids in tokenizer manager (#13759)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-11-29 13:56:15 -08:00 |
|
gongwei-130
|
3339c81072
|
fix RuntimeError: RMSNorm failed with error code an illegal memory access was encountered (#14135)
|
2025-11-29 12:17:41 -08:00 |
|
Yuhao Yang
|
f03ea34a3d
|
add runtime check for PyTorch 2.9.1 + CuDNN < 9.15 to prevent Conv3d performance issues (#14119)
|
2025-11-29 10:05:54 -05:00 |
|
fzyzcjy
|
4cafc835d3
|
Super tiny fix typo (#14131)
|
2025-11-29 21:08:31 +08:00 |
|
Mick
|
c6a52f4411
|
[diffusion] chore: add resolution shortcuts for sampling params (#14129)
|
2025-11-29 18:00:21 +08:00 |
|
elvischenv
|
848ee57067
|
feat: support flashinfer kernel autotune (#12306)
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
|
2025-11-29 00:05:37 -08:00 |
|
Cheng Wan
|
0fe74af563
|
Remove incorrect deep_gemm assertions from server_args.py (#14113)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
|
2025-11-28 20:25:39 -08:00 |
|
Mick
|
0a362d653f
|
[diffusion] log: unify generation performance logging (#14117)
|
2025-11-29 12:21:59 +08:00 |
|
fjybiocs
|
143b57b805
|
enable piecewise cuda graph for prefill server (#13377)
Co-authored-by: serverance.fu <serverance.fu@temu.com>
|
2025-11-29 12:09:26 +08:00 |
|
Yan Ru Pei
|
f446b51c41
|
fix: malformed KV events for NVIDIA Dynamo (#13488)
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-11-28 14:55:20 -08:00 |
|
Tomer Shmilovich
|
11b6217aee
|
Fix NIXL OBJ descriptors (#10712)
Co-authored-by: Tomer Shmilovich <tshmilovich@login-eos01.eos.clusters.nvidia.com>
|
2025-11-28 11:32:07 -08:00 |
|
Yuhao Yang
|
841eb29d3d
|
[diffusion] model: support z-image (#14067)
|
2025-11-28 21:48:31 +08:00 |
|
fzyzcjy
|
45cf575852
|
Fix overlap scheduler not take effect when outputting logprobs (#14096)
|
2025-11-28 18:15:56 +08:00 |
|
Mick
|
0e8ce1e832
|
[diffusion] refactor: clean useless files (#14094)
|
2025-11-28 18:14:00 +08:00 |
|
Lzhang-hub
|
ea1e9f6b3c
|
feat: support qwen3_vl vision model dp (#13724)
|
2025-11-28 17:29:07 +08:00 |
|
Lzhang-hub
|
f6e37d3edb
|
[Bugfix] qwen2.5-vl spec decode accept_len low (#13904)
Co-authored-by: Yuan Luo <yuan.luo@hotmail.com>
|
2025-11-28 17:26:32 +08:00 |
|
vipwangerxiao
|
ab9a46d462
|
Support configuring the request limit per receiving poll (#14076)
Co-authored-by: Peng Wang <peng_wang@linux.alibaba.com>
Co-authored-by: Feng Su <225349073+sufeng-buaa@users.noreply.github.com>
|
2025-11-28 16:14:21 +08:00 |
|
shuwenn
|
621061f017
|
[Bugfix] input prompt was not logged (#13936)
|
2025-11-28 16:00:51 +08:00 |
|
Aleksandr Krotov
|
7daddcdb58
|
Fix structural_tag tool call with null schema (#14006)
|
2025-11-27 23:04:16 -08:00 |
|
Mick
|
951028968c
|
[diffusion] refactor: refactor ComponentLoader and support loading native models from diffusers and transformers (#13205)
|
2025-11-28 14:17:32 +08:00 |
|
Mick
|
3543a04a48
|
[diffusion] refactor: refactor condition image resize logic (#14079)
|
2025-11-28 14:06:34 +08:00 |
|
fzyzcjy
|
21af8e73ad
|
Super tiny add comments to SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK (#14048)
|
2025-11-27 22:16:43 +08:00 |
|
Baizhou Zhang
|
7ab548ef64
|
[2/2] Refactor DeepGEMM requant for FP8 FusedMoE on Blackwell (#13960)
|
2025-11-27 09:00:26 -05:00 |
|
Yixin Dong
|
6350042696
|
feat: Naive support Spec V2 + Constrained Decoding (#13425)
Signed-off-by: Ubospica <ubospica@gmail.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2025-11-27 20:31:46 +08:00 |
|
fzyzcjy
|
25758647b1
|
Support sanity checking weight consistency especially for RL (#13854)
|
2025-11-27 20:25:12 +08:00 |
|
fzyzcjy
|
2bc8ee8b74
|
Tiny support 3D tensors in inverse_transform_scale_ue8m0 (#14002)
|
2025-11-27 20:20:45 +08:00 |
|
Jimmy
|
ab843ced31
|
[Feat]Add scheduler recv skipper weights to environment configuration (#13855)
|
2025-11-27 18:16:11 +08:00 |
|
Mick
|
6edffc6391
|
[diffusion] perf: improve black-forest-labs/FLUX.2-dev (#14040)
|
2025-11-27 14:49:52 +08:00 |
|
gaopengff
|
077ca70ee4
|
[Intel XPU]Add xpu support for get_device_memory_capacity (#13895)
|
2025-11-26 20:55:52 -08:00 |
|
Qiaolin Yu
|
7cb04dc0e5
|
Use trtllm mha decode kernel for target_verify in speculative decoding (#13976)
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
|
2025-11-26 20:40:34 -08:00 |
|
sunxxuns
|
5443db8759
|
fix: Fix AMD CI failures with HIP layernorm and PyPI connectivity (#13814)
Co-authored-by: root <root@mi300x8-005.atl1.do.cpe.ice.amd.com>
|
2025-11-27 11:30:37 +08:00 |
|
Stefan He
|
9f340ab1fb
|
[Piecewise] support disable decode cuda graph when enable piecewise cuda graph (#13965)
|
2025-11-26 18:35:59 -08:00 |
|
alisonshao
|
6330d6641b
|
Fix flashinfer cutlass MoE output shape for non-FP4-packed inputs (#14028)
|
2025-11-26 18:09:02 -07:00 |
|
Sam
|
91e8dc371a
|
[Feat][NVFP4] Enable NVFP4 MoE for Qwen series models (eg. Qwen3-Next) (#13761)
Co-authored-by: Kaixi Hou <kaixih@nvidia.com>
|
2025-11-26 17:53:45 -07:00 |
|
Lianmin Zheng
|
231df4b0d4
|
Cleanup server args (#14027)
|
2025-11-26 16:32:41 -08:00 |
|
ShawnY112358
|
5155016b56
|
[feat] update bucketed weights from distributed (#13824)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
|
2025-11-26 15:30:45 -08:00 |
|
Netanel Haber
|
082b54c689
|
Support nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 (and nvidia/C-RADIOv2-H) (#12277)
|
2025-11-26 16:28:52 -07:00 |
|