YAMY | cfead25bbf | [Qwen3.5] mamba slice fix (Prefill TP != Decode TP & decode TP size>1) (#20655); Co-authored-by: Shangming Cai <csmthu@gmail.com> | 2026-03-17 19:30:58 +08:00 |
AMD-yanfeiwang | 966ae87d02 | [AMD] avoid correction_bias_dtype dtype convert (#20692) | 2026-03-17 02:55:05 -07:00 |
Liangsheng Yin | 5270a06488 | [Disagg] Fix health check false-positive in disagg is_fully_idle (#20756) | 2026-03-17 17:18:54 +08:00 |
Duyi-Wang | 385a35bd11 | [AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars. (#20647) | 2026-03-17 01:13:42 -07:00 |
Junhao Liu | ee106757df | [diffusion] fix: fix Diffusers backend ignores model-specific sampling parameter (#20080); Co-authored-by: Mick <mickjagger19@icloud.com> | 2026-03-17 16:10:46 +08:00 |
akhilg-nv | 9a697ceabb | [Fix #20389] Illegal memory access in triton attention for large token counts (#20390) | 2026-03-17 00:42:11 -07:00 |
Ratish P | e3277b3be2 | [diffusion]: remove stale offload-manager in LTX2 AV denoising (#20624) | 2026-03-17 15:14:00 +08:00 |
DefTruth | 025691cd9e | [diffusion] chore: bump up cache-dit & support quant for diffusers backend (#20361) | 2026-03-17 12:51:31 +08:00 |
Rocky Song | 079a1fd35e | [Bugfix] Fix write-through events not processed when scheduler is idle (#20560) | 2026-03-16 21:49:59 -07:00 |
Shangming Cai | 5d5c31c6e4 | [PP] Add CP pyobj broadcasting when enable dynamic CPP (#20738) | 2026-03-17 12:20:11 +08:00 |
MMuzzammil1 | 855ec7017d | Add check to provide hicache-storage-backend when enabling kv caching on Decode Side in PD Disaggregation (#20732); Signed-off-by: Mohd Muzzammil <me.muzzammil@samsung.com> | 2026-03-17 11:25:14 +08:00 |
Hubert Lu | 943f34f642 | Add NCCL/RCCL pre-warming to reduce P99 TTFT cold-start latency (#20477); Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> | 2026-03-16 20:23:14 -07:00 |
Jay Shaik | e4d06b3db2 | Fix /generate JSON serialization for non-finite top_logprobs (#20714) | 2026-03-16 20:07:12 -07:00 |
shuwenn | 515b3a323d | feat: support human-readable suffixes (25.6k, 1M, 1Mi) for token CLI (#20577) | 2026-03-16 20:05:33 -07:00 |
psaab | 9f56b471aa | [Network] Use NetworkAddress for dist_init_method and loopback fallbacks (#20657); Co-authored-by: hnyls2002 <lsyincs@gmail.com> | 2026-03-16 19:59:49 -07:00 |
Jason Yao | 4dbec2dd2b | [typo] Fix typos in comments and log messages in common.py (#20723) | 2026-03-16 19:26:59 -07:00 |
Qiaolin Yu | 7d87a6a071 | Fix spec v1 token_ids_logprobs (#20718) | 2026-03-16 19:23:28 -07:00 |
Mick | 474a851ae3 | [diffusion] fix: fix sampling params incorrectly override in cli (#20689) | 2026-03-17 08:48:10 +08:00 |
Mick | 1eea744855 | [diffusion] CI: enable UT (#20690) | 2026-03-17 07:44:04 +08:00 |
roikoren755 | 5ef5806160 | [Nemotron] Small reasoning parser fix (#20284) | 2026-03-16 13:29:40 -07:00 |
Bruce Wu | 70a6fb53af | Enable embedding lookup/lora_a logic for chunked backend (#17692); Co-authored-by: Bruce Wu <mogicianwu@fb.com>; Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>; Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com> | 2026-03-16 11:37:58 -07:00 |
Douglas Yang | 061ec582bf | fix: adding teacache.params back to sampling params as intended (#20665) | 2026-03-16 11:27:06 -07:00 |
ybyang | 289cbcf482 | fix: support PP2+CP8+TP8 (PP with context parallelism) (#19548) | 2026-03-16 16:51:47 +00:00 |
Xiaoyu Zhang | 6489f77733 | [Diffusion] Fix compile graph broken by flashinfer rope (#20699) | 2026-03-16 23:14:27 +08:00 |
Du Bin | d3c0f4376a | Fix AssertionError crash in disagg prefill inflight queue with PP (#20686) | 2026-03-16 22:38:59 +08:00 |
Xiaoyu Zhang | 15097c5c3b | Release sglang kernel 0.4.0 (#20440); Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2026-03-16 20:34:58 +08:00 |
sky | 3d58cd16d9 | [DP Attention] Optimize dp_padding_mode selection for dp_size=1 in extend mode (#20406); Signed-off-by: wangfakang <fakangwang@gmail.com> | 2026-03-16 18:44:42 +08:00 |
Xun Sun | 549fbcc864 | [5/N] (Elastic EP) Use GPU P2P to exchange expert weights during EPLB as much as possible (#12068); Co-authored-by: Hank Han <hanhan.hank@bytedance.com>; Co-authored-by: Hank Han <hanhan7630@outlook.com> | 2026-03-16 18:40:58 +08:00 |
Xiaoyu Zhang | 3055b6906d | [Diffusion] Document torch.compile graph-break checks in diffusion benchmark skills (#20681) | 2026-03-16 17:41:40 +08:00 |
Mick | 485597e651 | [diffusion] fix: fix some sampling args passed via cli are omitted (#20630) | 2026-03-16 16:55:30 +08:00 |
Sugar920 | 895e56097c | Add NPU basic function testcases (#19382); Co-authored-by: cy <chenyang08056032@163.com>; Co-authored-by: Cherry_ming <136634645@qq.com> | 2026-03-16 15:09:56 +08:00 |
shuwenn | 42f18fe560 | [HiCache] fix: release write-through lock_ref during decode (#20049) | 2026-03-16 14:49:31 +08:00 |
Ke Bao | 39336f5812 | Precompute swa cache location (#20449) | 2026-03-16 14:38:08 +08:00 |
Zheng Wengang | 135af6dc92 | [EPD][VLM] support video/audio input (#17824); Co-authored-by: siyu <liusy58@linux.alibaba.com> | 2026-03-16 14:18:21 +08:00 |
Shangming Cai | 738cbde902 | [PD] Make pending reqs resolving more robust (#20505); Signed-off-by: Shangming Cai <csmthu@gmail.com> | 2026-03-16 14:12:13 +08:00 |
pansicheng | 97b2a89334 | [RadixTree][8/N Refactor]: unify lock interface (#20330) | 2026-03-16 11:49:51 +08:00 |
Liangsheng Yin | f0458e0b49 | [Utils] Move network/socket utilities from common.py to network.py (#20646) | 2026-03-15 20:35:24 -07:00 |
Javier Torres | afc71bae3a | feat: Add 'none' reasoning effort to ChatCompletionRequest (#20556) | 2026-03-15 20:25:48 -07:00 |
gaopengff | f4393bf3f6 | Fix correctness test issue for bench_one_batch (#20650) | 2026-03-15 20:05:36 -07:00 |
Xiaoyu Zhang | e1eb25880f | [Diffusion] Add a benchmark for rmsnorm/fuse_add_rmsnorm (#20632) | 2026-03-16 09:50:33 +08:00 |
Zhirui | 35c249b4de | [OpenAI] Log raw request payload for --log-requests (#20605) | 2026-03-15 17:45:00 -07:00 |
Liangsheng Yin | d852f26cb6 | Fix dual-stack socket handling: IPV6_V6ONLY, IPv4-first, is_port_available all-family check (#20643) | 2026-03-15 17:17:23 -07:00 |
jellysnack | 53f831691a | fix: propagate grammar errors and improve llguidance backend (#20467) | 2026-03-15 16:11:18 -07:00 |
psaab | 1145805e7d | Fix socket utilities and reserve_port for IPv6 dual-stack support (#20491); Co-authored-by: hnyls2002 <lsyincs@gmail.com> | 2026-03-15 14:29:10 -07:00 |
Ke Bao | e2be31824f | [CI] Add ut coverage tool (#20628) | 2026-03-15 21:13:45 +08:00 |
Yuhao Yang | 1c456a0af5 | VLM: add Conv2dLayer/Conv3dLayer to fix PyTorch 2.9.1 CuDNN Conv3d (#20282); Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com> | 2026-03-15 19:17:44 +08:00 |
Kit Fraser-Taliente | 7c773ddb0a | [Fix] Slice input_embeds to extend_input_len in prepare_for_extend (#20376) | 2026-03-15 00:07:05 -07:00 |
Juan Muneton | 7458407437 | Fix InternVL and vision attention for non-CUDA backends (e.g. XPU) (#19997); Co-authored-by: Yang Wang <mr.yang.wang@outlook.com> | 2026-03-14 23:24:41 -07:00 |
shuwenn | 1ac6a26464 | fix: Nemotron chunk size alias (#20458) | 2026-03-14 23:23:39 -07:00 |
Liangsheng Yin | fc7f9c1de7 | Rename --stream-output to --incremental-streaming-output (#20614) | 2026-03-14 23:22:33 -07:00 |