Jun Liu
|
65ce9965ce
|
[Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (#22715)
Co-authored-by: Alex Nails <alex.nails@radixark.ai>
|
2026-05-06 19:20:04 -07:00 |
|
Baizhou Zhang
|
ecb786c8d7
|
[Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (#24268)
|
2026-05-06 18:59:01 -07:00 |
|
Liangsheng Yin
|
eaf074d50e
|
propagate pytest exit code from test __main__ entries (#24487)
|
2026-05-06 18:46:52 -07:00 |
|
Yuzhen Zhou
|
4a279d9c36
|
[R3] Avoid implicit CUDA sync in routed experts DP slicing (#24550)
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
|
2026-05-06 18:37:36 -07:00 |
|
huangtingwei
|
27445f9836
|
Add ChatCompletionRequest-style support to /v1/tokenize (#23981)
|
2026-05-06 18:35:20 -07:00 |
|
Brayden Zhong
|
3fe8bc987e
|
Support Triton MLA FP8 KV cache (#20479)
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
|
2026-05-06 18:32:39 -07:00 |
|
Mick
|
2e642ea187
|
[diffusion] chore: align LTX-2 with official (#24313)
|
2026-05-07 08:46:28 +08:00 |
|
Xiaoyu Zhang
|
a9a8b20a90
|
[codex] Optimize Z-Image packed QKV (#24117)
|
2026-05-07 07:51:22 +08:00 |
|
gh1595
|
ece7e95b65
|
[LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (#24420)
Co-authored-by: Yanbin Jiang <jybsuper@gmail.com>
|
2026-05-06 14:51:30 -07:00 |
|
Lianmin Zheng
|
b859f7ffba
|
Improve metrics, observability, and PD deploy tooling (#24521)
|
2026-05-06 11:27:35 -07:00 |
|
Xiaoyu Zhang
|
d86f2916cc
|
Fix diffusion fallback guards and validation (#23335)
|
2026-05-07 00:05:43 +08:00 |
|
Shangming Cai
|
32d9998b9d
|
[PD] Prevent update_status to Failed from cleared entries (#24539)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-05-06 23:32:04 +08:00 |
|
sky
|
bfc1aeae13
|
[CP] Register KV cache allgather buffer with symmetric memory (#24040)
Signed-off-by: wangfakang <fakangwang@gmail.com>
|
2026-05-06 23:24:36 +08:00 |
|
fzyzcjy
|
c4c5541618
|
Support getting checksums in weight checker (#24537)
|
2026-05-06 22:59:28 +08:00 |
|
fzyzcjy
|
ae5ae840f6
|
Refactor buffer patterns in weight checker (#24538)
|
2026-05-06 22:52:07 +08:00 |
|
Ke Bao
|
eb5f0fbeef
|
Support swa HiCache for unified radix cache (#23391)
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2026-05-06 22:19:25 +08:00 |
|
fzyzcjy
|
491051c622
|
Cherry pick weight_checker _weight_fp32 buffer skip from #22663 (#24534)
Co-authored-by: JD <jaedon.guo@gmail.com>
|
2026-05-06 21:21:12 +08:00 |
|
fzyzcjy
|
0d40931b08
|
Cherry pick weight_checker non-persistent buffer pattern list from #21278 (#24533)
Co-authored-by: JD <jaedon.guo@gmail.com>
|
2026-05-06 21:14:01 +08:00 |
|
fzyzcjy
|
864f9633f2
|
Cherry pick weight_checker fp8 dequant fix and non-persistent buffer skip from #21494 (#24532)
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
|
2026-05-06 21:13:17 +08:00 |
|
Lianmin Zheng
|
d4d4b04d66
|
[PD] Fix missing update_status call in abort() across all KV backends (#24522)
|
2026-05-06 05:30:11 -07:00 |
|
cctry
|
163bf1ba71
|
[PD] Fix KV transfer metrics (#24416)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-05-06 03:44:48 -07:00 |
|
Xiaoyu Zhang
|
b67df7cd1b
|
[Codex] Diffusion handle non-contiguous CFG communication (#24332)
Co-authored-by: BBuf Codex <bbuf-codex@users.noreply.github.com>
|
2026-05-06 17:27:14 +08:00 |
|
sky
|
c8bc23522f
|
Refactor: decouple segment tracking from comm registration (#21392)
Signed-off-by: wangfakang <fakangwang@gmail.com>
|
2026-05-06 17:07:58 +08:00 |
|
fzyzcjy
|
a858fda708
|
Add e2e test with log snapshot in dumper grafter (#24513)
|
2026-05-06 17:00:13 +08:00 |
|
fzyzcjy
|
8527db0a91
|
Enhance diff and tensor-info logging in dumper grafter (#24512)
|
2026-05-06 16:58:08 +08:00 |
|
fzyzcjy
|
75943cfbcf
|
Support per-call extras and dataclass transform input in dumper grafter (#24511)
|
2026-05-06 16:57:44 +08:00 |
|
fzyzcjy
|
833279eb2e
|
Support multi-rank exchange via all_gather_object in dumper grafter (#24510)
|
2026-05-06 16:57:20 +08:00 |
|
fzyzcjy
|
ebd64f5d40
|
Support user-supplied recv-side transform in dumper grafter (#24509)
|
2026-05-06 16:56:52 +08:00 |
|
fzyzcjy
|
9a65f0ac26
|
Support t2b direction and overlap protection in dumper grafter (#24508)
|
2026-05-06 16:56:24 +08:00 |
|
fzyzcjy
|
58487e68e5
|
Support cross-system tensor grafting in dumper (#24507)
|
2026-05-06 16:55:40 +08:00 |
|
fzyzcjy
|
61104d7d0a
|
Add prefixed _log helper in dumper (#24506)
|
2026-05-06 16:54:20 +08:00 |
|
Mick
|
fbebfdec9a
|
[diffusion] fix: fix diffusion FSDP sharding (#24431)
|
2026-05-06 14:55:51 +08:00 |
|
cctry
|
660a77f221
|
Silence noisy health-check race log in TokenizerManager (#24466)
|
2026-05-05 21:06:43 -07:00 |
|
ybyang
|
3da87902d7
|
[HiSparse] Support FP8 KV cache by routing to flashmla_kv backend (#23013)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
|
2026-05-06 03:18:30 +00:00 |
|
Night
|
b2420d72ff
|
[RL] DeepEP support for --enable-return-routed-experts (#16859)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Junrong Lin <33685709+ocss884@users.noreply.github.com>
|
2026-05-05 20:01:07 -07:00 |
|
Xiaoyu Zhang
|
d7385b575f
|
[Diffusion] Optimize Hunyuan3D shape denoising (#24287)
|
2026-05-06 10:10:09 +08:00 |
|
Jianhong Zhang
|
c7019ff33d
|
[NIXL][XPU] Use np.uint64 for pointer/length arrays in disaggregation KV transfer (#24188)
|
2026-05-06 10:09:03 +08:00 |
|
Lianmin Zheng
|
b91b05ae27
|
Add --random-input-len to send_one.py (#24464)
|
2026-05-05 17:49:33 -07:00 |
|
cctry
|
22cf7d2b42
|
[Fix] Handle nixlRemoteDisconnectError in NixlKVSender (#24296)
|
2026-05-05 17:23:42 -07:00 |
|
Lianmin Zheng
|
64f80eabbe
|
Register aten::rms_norm and aten::mm.dtype in batch invariant mode (#24459)
|
2026-05-05 17:21:34 -07:00 |
|
Lianmin Zheng
|
46bde1f426
|
Add fwd_occupancy metric to SchedulerStats and Prometheus collector (#24458)
|
2026-05-05 17:04:34 -07:00 |
|
Xinyuan Tong
|
1e404afec2
|
fix(req_pool): bump pool.size to match actual tensor row count after #24243 (#24439)
|
2026-05-05 16:58:26 -07:00 |
|
Lianmin Zheng
|
710fed10fb
|
Revert "[fix] /pause_generation and /continue_generation wrong for --tokenizer-worker-num > 1" (#24461)
|
2026-05-05 16:44:34 -07:00 |
|
maocheng23
|
431ca54334
|
[fix] /pause_generation and /continue_generation wrong for --tokenizer-worker-num > 1 (#24445)
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-05 16:27:58 -07:00 |
|
Liangsheng Yin
|
08d4c2072b
|
move topk capturers to srt/state_capturer/ (#24450)
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: Ziang Li <ziangli@umich.edu>
|
2026-05-05 15:54:01 -07:00 |
|
Liangsheng Yin
|
47a416fc62
|
add indexer-topk capture (V3.2 NSA + infra) (#24392)
|
2026-05-05 15:05:15 -07:00 |
|
Liangsheng Yin
|
c4c0376fcb
|
consolidate routed-experts capturer onto reusable base (#24403)
|
2026-05-05 12:41:49 -07:00 |
|
Mick
|
d23ef408f7
|
[diffusion] fix: fix RowParallel LoRA merged forwarding (#24410)
|
2026-05-06 00:30:16 +08:00 |
|
Mick
|
cc54d8e8d0
|
[diffusion] chore: clean CUDA cache only at explicit release points (#24397)
|
2026-05-05 22:30:43 +08:00 |
|
Polisetty V R K Jyothendra Varma
|
fdfc46f3a5
|
[Intel GPU] Enable DeepSeek V3.2 inference on XPU (#24356)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
|
2026-05-05 20:47:40 +08:00 |
|