Mick
|
9a0fd2ff0c
|
[diffusion] optimize: default to in-memory loading for URL/base64 image inputs (#23118)
|
2026-04-20 23:29:02 +08:00 |
|
Mick
|
0be6ab04dd
|
[diffusion] refactor: LTX2.3 code cleanup (#23207)
|
2026-04-20 19:02:05 +08:00 |
|
Vladislav Nosivskoy
|
4028a73c10
|
[KV-Events] Fix kv events publishing for CP (#22983)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-04-20 17:34:38 +08:00 |
|
Bingxu Chen
|
69eb95f20c
|
[AMD] Pin peft<0.19 in pyproject_other.toml to fix ROCm CI ImportError (#23161)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-19 23:43:56 -07:00 |
|
Liangsheng Yin
|
a2d30d27fe
|
wait for reap in kill_process_tree (#23213)
|
2026-04-19 23:36:33 -07:00 |
|
Bingxu Chen
|
ab936ce694
|
Revert "perf: optimize PCG inductor path for FP8 models (#21734)" (#23159)
Feel free to PR again.
|
2026-04-19 23:32:50 -07:00 |
|
Alex Nails
|
10e17cc55e
|
[gRPC] Native gRPC server: proto + Rust crate scaffold + server args (#22736)
|
2026-04-20 12:39:35 +08:00 |
|
Baizhou Zhang
|
c304d0d64d
|
[Refactor] Deduplicate NSA utils.py into cp_utils.py for context parallel (#22914)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-19 21:35:35 -07:00 |
|
Liangsheng Yin
|
eb76aaba88
|
[core] Always-on StreamingSession in UnifiedRadixCache (#23202)
|
2026-04-19 21:19:43 -07:00 |
|
Shunkangz
|
e389a52cc8
|
Support allreduce fusion with cp (#21249)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-04-19 21:06:00 -07:00 |
|
Liangsheng Yin
|
a7276b623e
|
integrate streaming session into UnifiedRadixCache (#23145)
|
2026-04-19 20:47:41 -07:00 |
|
Byron Hsu
|
1cff871c67
|
[Bugfix] Fix DeepEP timeout when compiling DeepGeMM in EP+DP+TP (#23185)
Co-authored-by: Byron Hsu <byronhsu@Byrons-MacBook-Pro.local>
Co-authored-by: Cheng Wan <ch-wan@users.noreply.github.com>
|
2026-04-19 17:36:11 -07:00 |
|
Liangsheng Yin
|
d3ce664612
|
move session to python/sglang/srt/session (#23144)
|
2026-04-19 17:34:19 -07:00 |
|
Baidu-AIAK
|
7ca3566130
|
Multi platform Plugin (#21388)
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0183.tjzj.baidu.com>
Co-authored-by: Alex Nails <alex.nails@radixark.ai>
Co-authored-by: Alex Nails <alexj.nails@gmail.com>
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0000.tjzj.baidu.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-19 17:23:51 -07:00 |
|
inkcherry
|
4fa3482180
|
[Bugfix] Add missing http_worker_ipc in session error path (#22766)
|
2026-04-19 12:47:09 -07:00 |
|
billishyahao
|
b74a9dd854
|
[AMD] fix tbo runtime error when initializing metadata for cuda graph (#22598)
|
2026-04-19 12:42:48 -07:00 |
|
Baizhou Zhang
|
6ecd6f84db
|
[CI] Add per-job uv venv isolation and upgrade CI version to CUDA 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-19 05:32:36 -07:00 |
|
Thomas Wang
|
03828f4205
|
[AMD] Reduce NSA indexer kernels (weights_proj, k-cache store kernel fusion) (#22850)
|
2026-04-19 00:18:11 -07:00 |
|
Kehan Li
|
2a327f0877
|
Fix Qwen3.5 video processing when passing video_data in "processor_output" format (#22431)
|
2026-04-19 00:04:01 +08:00 |
|
Xiaoyu Zhang
|
cd6ad80c00
|
diffusion: add HunyuanVideo GroupNorm+SiLU fast path (#22814)
|
2026-04-18 23:38:49 +08:00 |
|
Xiaoyu Zhang
|
c6a45fab64
|
Qwen3next flashinfer allreduce auto enable (#22664)
|
2026-04-18 22:32:41 +08:00 |
|
Yisheng Gong
|
4839cecbb0
|
[main] chore: add bias for base layer with lora (#22169)
|
2026-04-18 02:07:02 -07:00 |
|
Mick
|
0d94c3366a
|
[diffusion] feat: introduce ltx-2-two-stage device manager (#22869)
|
2026-04-18 11:04:33 +08:00 |
|
Xiaoyu Zhang
|
615d6c93b2
|
[codex] Add flashinfer TRTLLM backend for diffusion NVFP4 (#22717)
|
2026-04-18 09:06:28 +08:00 |
|
Lianmin Zheng
|
9c47bbad13
|
Clean up bench_one_batch warning and simplify norm dispatch (#23110)
|
2026-04-17 17:42:20 -07:00 |
|
Cheng Wan
|
5f7aee726a
|
refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-17 17:05:13 -07:00 |
|
R0CKSTAR
|
26ae7b8bd7
|
[MLX] Support radix cache (#21509)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
|
2026-04-18 07:00:50 +08:00 |
|
Liangsheng Yin
|
09b689b407
|
Apply HF transformers patches from sglang init (#23103)
|
2026-04-17 15:37:51 -07:00 |
|
Liangsheng Yin
|
573e12a7fc
|
Merge /get_load into /v1/loads (#23010)
|
2026-04-17 13:36:51 -07:00 |
|
Lianmin Zheng
|
44e67c6835
|
Remove deprecated double sparsity feature (#23009)
|
2026-04-17 13:33:12 -07:00 |
|
andyluo7
|
9df6107dca
|
[AMD] Enable DFLASH speculative decoding on ROCm (#22342)
Signed-off-by: Andy Luo <andyluo7@users.noreply.github.com>
Co-authored-by: Andy Luo <andyluo7@users.noreply.github.com>
|
2026-04-17 13:10:14 -07:00 |
|
shuwenn
|
90c76d665e
|
[HiCache] fix: HiCacheFile component key suffixing (#22891)
Co-authored-by: Zhangheng <hzh0425@apache.org>
|
2026-04-17 13:06:28 -07:00 |
|
YC Yen-Ching Tseng
|
5d4e899477
|
[AMD] Fix AMD Multimodal Test - skip nvfp4 tests (#23045)
|
2026-04-17 09:02:39 -07:00 |
|
Jincong Chen
|
2bac219d0c
|
[Perf] Precompute gemma_weight to avoid redundant add on every forward (#22673)
|
2026-04-17 23:37:41 +08:00 |
|
Xiaoyu Zhang
|
83c5119d01
|
[diffusion] CI: fix ModelOpt B200 CI artifact coverage (#22955)
|
2026-04-17 23:33:42 +08:00 |
|
Mick
|
5de89ea942
|
[diffusion] CI: fix auto-partition (#23076)
|
2026-04-17 22:37:24 +08:00 |
|
Opher Lieber
|
6e3bbef568
|
expose num_embeddings in VocabParallelEmbeddingWithLoRA (#22547)
|
2026-04-17 02:35:13 -07:00 |
|
Jonah Bernard
|
0d031335ed
|
[Pipeline Parallelism][Bug] Fix scheduler hang in pipeline parallelism setup (#23006)
|
2026-04-17 14:50:47 +08:00 |
|
Duyi-Wang
|
8c190f6b91
|
[AMD] Add SGLANG_MORI_MOE_MAX_INPUT_TOKENS to truncate dispatch before MoE. (#22952)
|
2026-04-16 23:40:15 -07:00 |
|
RichardoMu
|
7390eddf28
|
feat(observability): add OpenTelemetry tracing for speculative decoding (#19545)
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
|
2026-04-17 14:01:58 +08:00 |
|
narutolhy
|
5fa0c6a52e
|
Allow piecewise CUDA graph with speculative decoding (#22128)
Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-17 13:39:30 +08:00 |
|
Xiaoyu Zhang
|
91679d935d
|
[codex] Update diffusion skills (#23028)
|
2026-04-17 13:29:26 +08:00 |
|
blzheng
|
0dcfae5553
|
[CPU] Add gemma4_rmsnorm_cpu kernel (#22842)
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-17 13:03:16 +08:00 |
|
YC Yen-Ching Tseng
|
f0f0148167
|
Revert "feat: Support MXFP4 quantized dense models on AMD CDNA2/CDNA3 GPUs (#19143)" (#23031)
|
2026-04-16 21:53:25 -07:00 |
|
Zhangheng
|
7d47f40a96
|
[UnifiedRadixTree]: Add HiCache hook interface for TreeComponent (#22924)
|
2026-04-17 12:09:41 +08:00 |
|
Byron Hsu
|
cf9845f8e3
|
[Bug Fix] Ensure prefill_info_table is populated before honoring disagg_prefill_dp_rank (#22990)
Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
|
2026-04-17 11:10:31 +08:00 |
|
Jan Bernlöhr
|
04a53955b9
|
feat: add coordinated checkpoint prefetch for network filesystem loading (#20843)
|
2026-04-16 20:08:19 -07:00 |
|
Yuhao Yang
|
a77abbe005
|
[VLM] Reduce GPU memory footprint of CUDA IPC MM feature transport (#22662)
|
2026-04-17 10:38:36 +08:00 |
|
Yuxuan Zhang
|
16d11c2a10
|
Fix for the low-probability garbled output issue in the GLM-5 series models. (#22811)
|
2026-04-17 09:52:13 +08:00 |
|
Makcum888e
|
e353630b57
|
[Diffusion] [NPU] Fix multimodal gen CI (#22879)
|
2026-04-17 04:09:44 +03:00 |
|