Lianmin Zheng
|
8536d4b402
|
Clean up noisy startup warnings from third-party deps (#23669)
|
2026-04-27 03:10:46 -07:00 |
|
Shenxiu Liu
|
a3fc982ba7
|
[Whisper] Automatic language detection via structured generation (#22997)
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-04-27 15:54:41 +08:00 |
|
Xiaoyu Zhang
|
5f47cae1a0
|
add H100 configs for GLM-4.7-Flash (#23719)
|
2026-04-27 15:07:39 +08:00 |
|
Colin Z
|
d49561b8ae
|
[AMD] Fix Kimi-K2.6 Quark MXFP4 loading prefix and packed module mapping (#23408)
|
2026-04-26 23:56:15 -07:00 |
|
Praneth Paruchuri
|
b7113cadb1
|
[Bug Fix] Reject pp_max_micro_batch_size=0 to prevent silent deadlock on generate() (#23799)
|
2026-04-27 13:36:04 +08:00 |
|
Xinyuan Tong
|
e5198386bd
|
Upgrade transformers from 5.5.4 to 5.6.0 (#23525)
|
2026-04-26 22:33:54 -07:00 |
|
Zheng Wengang
|
91825b8808
|
[FEAT][EPD] support encoder real health (#23343)
|
2026-04-27 13:21:28 +08:00 |
|
AMD-yanfeiwang
|
5141d8ae21
|
[AMD]fix: use CUDA event for targeted draft-to-verify sync in EAGLE overlap (#21940)
|
2026-04-26 21:58:34 -07:00 |
|
Bingxu Chen
|
d84470079d
|
[AMD] Fix Grok-2 nightly: avoid multimodal misdetection from auto-populated vision_config (#23383)
|
2026-04-26 21:54:36 -07:00 |
|
Jia Guo
|
bead2e3470
|
perf: optimize PCG inductor path for FP8 models (redo of #21734) (#23227)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-26 20:34:27 -07:00 |
|
Byron Hsu
|
85376a6119
|
refactor(moe): centralize post-experts all-reduce skip predicate (#23748)
Co-authored-by: Byron Hsu <byron@periodiclabs.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-26 20:29:59 -07:00 |
|
iridiumine
|
32c3513816
|
[NPU] Support MTP for Qwen3.5 (#20918)
|
2026-04-27 10:44:17 +08:00 |
|
Kangyan-Zhou
|
35591c7d51
|
fix(lora): don't assert on non-LoRA lm_head adapter weights (#23433)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-26 12:10:07 -07:00 |
|
Mick
|
a392ae8879
|
[diffusion] feat: accelerate multiple-outputs generation (#23759)
|
2026-04-27 01:47:33 +08:00 |
|
jianan-gu
|
10fd0faccd
|
[CPU] Add Qwen3.5 model optimization for CPU (#19484)
Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2026-04-26 10:12:36 -07:00 |
|
Liwansi
|
7d49564431
|
[NPU]Fix support_triton bug (#23604)
|
2026-04-26 21:34:56 +08:00 |
|
Cheng Wan
|
c7878dbb6d
|
[MoE] Deprecate act_and_mul_triton; fold filter_expert into JIT silu/gelu_and_mul (#23707)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-26 01:41:35 -07:00 |
|
Mick
|
d49a0377de
|
[diffusion] refactor: make timestep scheduler request-local (#23716)
|
2026-04-26 15:59:53 +08:00 |
|
sglang-bot
|
9003f24e2b
|
chore: bump sglang-kernel version to 0.4.1.post1 (#23733)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-25 23:23:49 -07:00 |
|
Byron Hsu
|
ba4e9d2ac2
|
Apply should_use_dp_reduce_scatterv guard to remaining MoE models (follow-up to #23731) (#23732)
Co-authored-by: Byron Hsu <byronhsu@noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
|
2026-04-25 20:36:16 -07:00 |
|
Byron Hsu
|
71029abd64
|
Fix Qwen3 MoE: also guard EP all-reduce with not use_reduce_scatter (follow-up to #23731) (#23734)
Co-authored-by: Byron Hsu <byron@periodiclabs.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-25 20:35:52 -07:00 |
|
Byron Hsu
|
99b59b279c
|
Fix Qwen3 MoE double-reduce when DP attention + EP + reduce_scatterv (#23729) (#23731)
Co-authored-by: Byron Hsu <byronhsu@noreply.github.com>
|
2026-04-25 15:28:28 -07:00 |
|
AlbeeSo
|
e0a4522370
|
[typo] fix typo in parallel_state (#23710)
|
2026-04-25 09:33:33 -07:00 |
|
Mick
|
03849496ad
|
jit_kernel: tolerate FA3 kernels without out arg (#23717)
|
2026-04-25 23:42:33 +08:00 |
|
1874.
|
046c14a3ed
|
[NPU] Support GGUF quantization for Ascend NPU (dense + MoE) (#17883)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-04-25 17:16:47 +03:00 |
|
gjsheu
|
e708ea6d94
|
[diffusion] fix: restore cache-dit support for LTX2 (#23235)
Co-authored-by: gengjinsong <gengjinsong@huawei.com>
|
2026-04-25 18:10:43 +08:00 |
|
Aleksi Vesanto
|
50ce2708ca
|
[diffusion] fix: Fix FLUX.1/2 graph breaks (#23648)
|
2026-04-25 17:54:52 +08:00 |
|
kk
|
393252f514
|
[AMD] fused qk gemma norm kernels to reduce four kernels (#23575)
Co-authored-by: root <root@smci355-ccs-aus-g12-26.cs-aus.dcgpu>
|
2026-04-25 00:30:01 -07:00 |
|
Артем Савкин
|
bd523dd60d
|
[NPU] [Bugfix] [Diffusion] Fixed gray images at the generation output (#23266)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-04-25 10:20:38 +03:00 |
|
Yujing
|
6175946db7
|
[Feature]Add MSProbe dump support in SGLang (#18349)
|
2026-04-25 10:12:50 +03:00 |
|
Yujun Dong
|
21835fb0af
|
[HiCache] Prevent move_hybrid_indices from polluting radix-tree node host state (#23427)
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2026-04-25 14:27:42 +08:00 |
|
DarkSharpness
|
82254bd9c5
|
[JIT Kernel] Reland JIT activation (#22094)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Cheng Wan <chwan@rice.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-24 23:00:28 -07:00 |
|
YC Yen-Ching Tseng
|
adc59325bc
|
[AMD] Optimize MiniMax-M2.5 - enable fused Triton kernel for FP8 KV cache write in aiter decode path (#23620)
|
2026-04-24 22:23:49 -07:00 |
|
YC Yen-Ching Tseng
|
fb272d27db
|
[AMD] Optimize MiniMax-M2.5 - use aiter biased_grouped_topk for sigmoid scoring in MoE routing (#23611)
|
2026-04-24 22:18:08 -07:00 |
|
Shenxiu Liu
|
8471c9ebe6
|
Skip torch.cuda.empty_cache() in weight update flush path (#22998)
|
2026-04-25 12:41:39 +08:00 |
|
Yuhao Yang
|
4a3fe2a091
|
model: support parakeet nemotron encoder (#23568)
Co-authored-by: trangdough <trangtdo22@gmail.com>
|
2026-04-25 11:00:23 +08:00 |
|
Jackey Hua
|
465abadd3c
|
Add fused moe triton config for Qwen3.5-397B-A17B-FP8 (#23682)
|
2026-04-24 18:35:32 -07:00 |
|
Xinyi Song
|
76da28f6d6
|
[AMD][bugfix] add gate rocm >= 7.2 for bpreshuffle (#23671)
|
2026-04-24 13:26:16 -07:00 |
|
Jia Guo
|
587fd15bd2
|
perf: eliminate attention DtoD copy by passing pre-allocated output to FA (#21985)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-24 12:05:16 -07:00 |
|
Xinyuan Tong
|
6d03861476
|
support Hy3 preview (#23533)
Co-authored-by: pengmeng <pengmeng@tencent.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
Co-authored-by: chengvjiang <chengvjiang@tencent.com>
Co-authored-by: russellfeng <russellfeng@tencent.com>
|
2026-04-24 12:03:24 -07:00 |
|
Lianmin Zheng
|
6344b546c8
|
Deprecate --collect-tokens-histogram, auto-collect with --enable-metrics (#23595)
|
2026-04-24 12:00:16 -07:00 |
|
Mick
|
05696527ea
|
[diffusion] feat: support LoRA for LTX2.3 (#23649)
|
2026-04-25 01:52:41 +08:00 |
|
Kang Yifei
|
baa0aa670f
|
[HiCache & HybridModel] 3FS backend support DSA & mamba model (#23241)
Co-authored-by: 墨已 <kangyifei.kyf@alibaba-inc.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2026-04-25 00:48:01 +08:00 |
|
Kangrui Du
|
92d262f710
|
[diffusion] RL: add per-step rollout options for SDE and trajectory capture (#23151)
|
2026-04-24 23:26:16 +08:00 |
|
Siju Samuel
|
bca3dd958a
|
[Intel GPU] Enable pipeline parallelism on XPU (#23645)
|
2026-04-24 19:52:44 +08:00 |
|
Yuwei An
|
60bbb800db
|
[Experimental] Breakable Piecewise Cuda Graph (#22218)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-24 04:33:05 -07:00 |
|
Mick
|
b3b03369a5
|
[diffusion] fix: unify LTX-2.3 HQ codepath gates for all LTX-2.3 variants (#23624)
|
2026-04-24 17:44:08 +08:00 |
|
Shangming Cai
|
b8d883398d
|
Revert "[Intel GPU] Enable pipeline parallelism on XPU" (#23641)
|
2026-04-24 17:36:35 +08:00 |
|
Hubert Lu
|
4cb0c4e1f3
|
[AMD] Fix memory access fault when --page-size > 1 with speculative decoding on AMD GPUs (#23596)
|
2026-04-23 23:56:36 -07:00 |
|
Mick
|
cd1fa7506a
|
[diffusion] model: support LTX2.3 high quality pipeline (#23366)
|
2026-04-24 14:18:20 +08:00 |
|