7834 Commits

Author SHA1 Message Date
Mick
9a0fd2ff0c [diffusion] optimize: default to in-memory loading for URL/base64 image inputs (#23118) 2026-04-20 23:29:02 +08:00
Mick
0be6ab04dd [diffusion] refactor: LTX2.3 code cleanup (#23207) 2026-04-20 19:02:05 +08:00
Vladislav Nosivskoy
4028a73c10 [KV-Events] Fix kv events events publishing for CP (#22983)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
2026-04-20 17:34:38 +08:00
Bingxu Chen
69eb95f20c [AMD] Pin peft<0.19 in pyproject_other.toml to fix ROCm CI ImportError (#23161)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-19 23:43:56 -07:00
Liangsheng Yin
a2d30d27fe wait for reap in kill_process_tree (#23213) 2026-04-19 23:36:33 -07:00
Bingxu Chen
ab936ce694 Revert "perf: optimize PCG inductor path for FP8 models (#21734)" (#23159)
Feel free to PR again.
2026-04-19 23:32:50 -07:00
Alex Nails
10e17cc55e [gRPC] Native gRPC server: proto + Rust crate scaffold + server args (#22736) 2026-04-20 12:39:35 +08:00
Baizhou Zhang
c304d0d64d [Refactor] Deduplicate NSA utils.py into cp_utils.py for context parallel (#22914)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 21:35:35 -07:00
Liangsheng Yin
eb76aaba88 [core] Always-on StreamingSession in UnifiedRadixCache (#23202) 2026-04-19 21:19:43 -07:00
Shunkangz
e389a52cc8 Support allreduce fusion with cp (#21249)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-04-19 21:06:00 -07:00
Liangsheng Yin
a7276b623e integrate streaming session into UnifiedRadixCache (#23145) 2026-04-19 20:47:41 -07:00
Byron Hsu
1cff871c67 [Bugfix] Fix DeepEP timeout when compiling DeepGeMM in EP+DP+TP (#23185)
Co-authored-by: Byron Hsu <byronhsu@Byrons-MacBook-Pro.local>
Co-authored-by: Cheng Wan <ch-wan@users.noreply.github.com>
2026-04-19 17:36:11 -07:00
Liangsheng Yin
d3ce664612 move session to python/sglang/srt/session (#23144) 2026-04-19 17:34:19 -07:00
Baidu-AIAK
7ca3566130 Multi platform Plugin (#21388)
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0183.tjzj.baidu.com>
Co-authored-by: Alex Nails <alex.nails@radixark.ai>
Co-authored-by: Alex Nails <alexj.nails@gmail.com>
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0000.tjzj.baidu.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-19 17:23:51 -07:00
inkcherry
4fa3482180 [Bugfix] Add missing http_worker_ipc in session error path (#22766) 2026-04-19 12:47:09 -07:00
billishyahao
b74a9dd854 [AMD] fix tbo runtime error when initializing metadata for cuda graph (#22598) 2026-04-19 12:42:48 -07:00
Baizhou Zhang
6ecd6f84db [CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-19 05:32:36 -07:00
Thomas Wang
03828f4205 [AMD] Reduce NSA indexer kernels (weights_proj, k-cache store kernel fusion) (#22850) 2026-04-19 00:18:11 -07:00
Kehan Li
2a327f0877 Fix Qwen3.5 video processing when passing video_data in "processor_output" format (#22431) 2026-04-19 00:04:01 +08:00
Xiaoyu Zhang
cd6ad80c00 diffusion: add HunyuanVideo GroupNorm+SiLU fast path (#22814) 2026-04-18 23:38:49 +08:00
Xiaoyu Zhang
c6a45fab64 Qwen3next flashinfer allreduce auto enable (#22664) 2026-04-18 22:32:41 +08:00
Yisheng Gong
4839cecbb0 [main] chore: add bias for base layer with lora (#22169) 2026-04-18 02:07:02 -07:00
Mick
0d94c3366a [diffusion] feat: introduce ltx-2-two-stage device manager (#22869) 2026-04-18 11:04:33 +08:00
Xiaoyu Zhang
615d6c93b2 [codex] Add flashinfer TRTLLM backend for diffusion NVFP4 (#22717) 2026-04-18 09:06:28 +08:00
Lianmin Zheng
9c47bbad13 Clean up bench_one_batch warning and simplify norm dispatch (#23110) 2026-04-17 17:42:20 -07:00
Cheng Wan
5f7aee726a refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 17:05:13 -07:00
R0CKSTAR
26ae7b8bd7 [MLX] Support radix cache (#21509)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-04-18 07:00:50 +08:00
Liangsheng Yin
09b689b407 Apply HF transformers patches from sglang init (#23103) 2026-04-17 15:37:51 -07:00
Liangsheng Yin
573e12a7fc Merge /get_load into /v1/loads (#23010) 2026-04-17 13:36:51 -07:00
Lianmin Zheng
44e67c6835 Remove deprecated double sparsity feature (#23009) 2026-04-17 13:33:12 -07:00
andyluo7
9df6107dca [AMD] Enable DFLASH speculative decoding on ROCm (#22342)
Signed-off-by: Andy Luo <andyluo7@users.noreply.github.com>
Co-authored-by: Andy Luo <andyluo7@users.noreply.github.com>
2026-04-17 13:10:14 -07:00
shuwenn
90c76d665e [HiCache] fix: HiCacheFile component key suffixing (#22891)
Co-authored-by: Zhangheng <hzh0425@apache.org>
2026-04-17 13:06:28 -07:00
YC Yen-Ching Tseng
5d4e899477 [AMD] Fix AMD Multimodal Test - skip nvfp4 tests (#23045) 2026-04-17 09:02:39 -07:00
Jincong Chen
2bac219d0c [Perf] Precompute gemma_weight to avoid redundant add on every forward (#22673) 2026-04-17 23:37:41 +08:00
Xiaoyu Zhang
83c5119d01 [diffusion] CI: fix ModelOpt B200 CI artifact coverage (#22955) 2026-04-17 23:33:42 +08:00
Mick
5de89ea942 [diffusion] CI: fix auto-partition (#23076) 2026-04-17 22:37:24 +08:00
Opher Lieber
6e3bbef568 expose num_embeddings in VocabParallelEmbeddingWithLoRA (#22547) 2026-04-17 02:35:13 -07:00
Jonah Bernard
0d031335ed [Pipeline Parallelism][Bug] Fix scheduler hang in pipeline parallelism setup (#23006) 2026-04-17 14:50:47 +08:00
Duyi-Wang
8c190f6b91 [AMD] Add SGLANG_MORI_MOE_MAX_INPUT_TOKENS to truncate dispatch before MoE. (#22952) 2026-04-16 23:40:15 -07:00
RichardoMu
7390eddf28 feat(observability): add OpenTelemetry tracing for speculative decoding (#19545)
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
2026-04-17 14:01:58 +08:00
narutolhy
5fa0c6a52e Allow piecewise CUDA graph with speculative decoding (#22128)
Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 13:39:30 +08:00
Xiaoyu Zhang
91679d935d [codex] Update diffusion skills (#23028) 2026-04-17 13:29:26 +08:00
blzheng
0dcfae5553 [CPU] Add gemma4_rmsnorm_cpu kernel (#22842)
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-04-17 13:03:16 +08:00
YC Yen-Ching Tseng
f0f0148167 Revert "feat: Support MXFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#19143)" (#23031) 2026-04-16 21:53:25 -07:00
Zhangheng
7d47f40a96 [UnifiedRadixTree]: Add HiCache hook interface for TreeComponent (#22924) 2026-04-17 12:09:41 +08:00
Byron Hsu
cf9845f8e3 [Bug Fix] Ensure prefill_info_table is populated before honoring disagg_prefill_dp_rank (#22990)
Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
2026-04-17 11:10:31 +08:00
Jan Bernlöhr
04a53955b9 feat: add coordinated checkpoint prefetch for network filesystem loading (#20843) 2026-04-16 20:08:19 -07:00
Yuhao Yang
a77abbe005 [VLM] Reduce GPU memory footprint of CUDA IPC MM feature transport (#22662) 2026-04-17 10:38:36 +08:00
Yuxuan Zhang
16d11c2a10 Fix for the low-probability garbled output issue in the GLM-5 series models. (#22811) 2026-04-17 09:52:13 +08:00
Makcum888e
e353630b57 [Diffusion] [NPU] Fix multimodal gen CI (#22879) 2026-04-17 04:09:44 +03:00