Lianmin Zheng
|
f81b6e8f51
|
[Misc] Add @cache_once to is_arch_support_pdl in jit_kernel (#22724)
|
2026-04-13 14:42:49 -07:00 |
|
Baizhou Zhang
|
b441317aa4
|
Revert "Upgrade CI default CUDA version from 12.9 to 13.0" (#22727)
|
2026-04-13 14:39:24 -07:00 |
|
Lianmin Zheng
|
ba7bcca6b3
|
Use reshape instead of contiguous().view() in TRTLLMHAAttnBackend (#22517)
|
2026-04-13 14:29:12 -07:00 |
|
Kurt Shuster
|
ff13dfee45
|
[lora][moe] Virtual experts for LoRA MoE (#22122)
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
|
2026-04-13 21:19:30 +00:00 |
|
ishandhanani
|
6b2bf66cd9
|
fix[glm4.7 flash]: properly detect gfx95_quant_format (#22720)
|
2026-04-13 13:10:07 -07:00 |
|
Asish Kumar
|
39810762d2
|
fix: use describe mode for SGLang version detection (#22600)
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
|
2026-04-13 09:45:45 -07:00 |
|
DarkSharpness
|
314d6ecf08
|
[Feature][JIT Kernel] Fused TP QK norm For Minimax (#20673)
Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
|
2026-04-13 20:29:47 +08:00 |
|
Xiaole Guo
|
4df60434d7
|
[diffusion] model: support stable-diffusion-3-medium-diffusers (#19225)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: Kangrui Du <kangruidu@gmail.com>
Co-authored-by: Xiaole Guo <gxlvera@gmail.com>
|
2026-04-13 16:07:06 +08:00 |
|
Chandrakant Khandelwal
|
1e9eecfa36
|
[Intel GPU] Enable sgl-kernel-xpu fused_experts MoE kernel path for GPT-OSS bf16 models. (#22417)
|
2026-04-13 13:45:48 +08:00 |
|
Mick
|
d524f110ac
|
[diffusion] refactor: streamline denoising stages (#22633)
|
2026-04-13 13:34:37 +08:00 |
|
Polisetty V R K Jyothendra Varma
|
7d2c11970c
|
[Intel GPU] Upgrade pytorch xpu version to 2.11 (#21908)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-13 13:16:24 +08:00 |
|
Zhangheng
|
5549d910c6
|
[hisparse]: Adding ci for hisparse kvcache-swap-in jit-kernel (#22155)
|
2026-04-13 12:50:29 +08:00 |
|
Zhangheng
|
305b42935a
|
[HiSparse]: Add benchmark for hisparse kernel (#22187)
|
2026-04-13 12:49:18 +08:00 |
|
Alison Shao
|
3f4fbc165d
|
Upgrade CI default CUDA version from 12.9 to 13.0 (#21441)
|
2026-04-12 21:48:40 -07:00 |
|
Mohammad Miadh Angkad
|
4dbd59850b
|
Add bfloat16 KV cache validation for HiSparse (#22505)
|
2026-04-13 12:41:42 +08:00 |
|
Xiaoyu Zhang
|
fae0a2fc3c
|
[codex] Add LTX-2.3 benchmark skill recipes (#22631)
|
2026-04-13 12:23:32 +08:00 |
|
Mick
|
bf022e177c
|
Revert "[Diffusion] Add FLUX.1-dev ModelOpt NVFP4 support (#22574)" (#22649)
|
2026-04-13 11:17:32 +08:00 |
|
Zhangheng
|
bc59cc0f96
|
[RadixTree Refactor]: Support Unified HybridRadixTree V2 (#21206)
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: xiezhq-hermann <xiezhq@stanford.edu>
|
2026-04-13 10:28:22 +08:00 |
|
Ziang Li
|
5593539942
|
[RL] Refactor NVFP4 shuffling/swizzling to in-place replacement (#22204)
|
2026-04-12 19:08:45 -07:00 |
|
blzheng
|
934e19a610
|
[CPU] Fix argument issues in qkv_proj_with_rope_fused_weight and bmm… (#21367)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-13 09:59:13 +08:00 |
|
Liangsheng Yin
|
da6b8e1448
|
Extract pause_resume_in_place kit; rename test_abort to test_scheduler_control (#22647)
|
2026-04-12 18:49:37 -07:00 |
|
Lawrence Wu
|
28e40d873c
|
fix(PD): respect pause_generation in disagg event loops (#20908)
|
2026-04-12 18:07:51 -07:00 |
|
ishandhanani
|
c1ab68b45e
|
fix: streaming session race condition + some metrics (#21875)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-04-12 18:05:23 -07:00 |
|
Xiaoyu Zhang
|
37fc47c645
|
diffusion: fix layerwise offload for ModelOpt quantized DiTs (#22594)
|
2026-04-13 08:01:54 +08:00 |
|
Xiaoyu Zhang
|
03a1a7b81c
|
[Diffusion] Add FLUX.1-dev ModelOpt NVFP4 support (#22574)
|
2026-04-13 07:57:41 +08:00 |
|
Kurt Shuster
|
f81b6df3a3
|
[lora] Fix partial MoE rank loading, VL lm_head, strict loading, deepseek on-demand (#21864)
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
|
2026-04-12 16:25:02 -07:00 |
|
Khoa Pham
|
1f8df97054
|
Fix broken streaming response with --incremental-streaming-output (#22549)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-12 15:05:58 -07:00 |
|
Zhiyu
|
d4ad30b94c
|
[diffusion] quant: enable modelopt quantized FLUX deployment (#20082)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-12 23:35:33 +08:00 |
|
Mick
|
495ef8ec64
|
[diffusion] model: support LTX2.3 two stage (#22182)
|
2026-04-12 22:15:57 +08:00 |
|
Ziang Li
|
31453bb76a
|
[RL] Fix weight update for mxfp8 flashinfer_cutlass gemm backend (#22484)
|
2026-04-12 13:02:17 +00:00 |
|
Mohammad Miadh Angkad
|
bcc0c65aa8
|
[DSA] Hopper FP8 FlashMLA KV padding (#22372)
|
2026-04-12 02:19:17 -07:00 |
|
Kurt Shuster
|
0e0091c6c8
|
[server] Add --quantization unquant to explicitly opt out of quantization (#21863)
|
2026-04-12 02:17:22 -07:00 |
|
Wenyao Gao
|
4dfc8e1c3f
|
VLM: support passing --mm-process-config for all models (#18467)
|
2026-04-12 17:08:05 +08:00 |
|
Liangsheng Yin
|
f1eb4ca90c
|
Fix streaming session busy check double-counting; add compat CI tests (#22213)
|
2026-04-12 01:48:16 -07:00 |
|
Ke Bao
|
bc1bfbf607
|
Fix swa input length limitation (#22597)
|
2026-04-12 16:03:35 +08:00 |
|
Liangsheng Yin
|
f2377a00cb
|
Add SWA support for runtime busy memory check (#21499)
|
2026-04-12 00:39:51 -07:00 |
|
wufann
|
19cb918653
|
[Not-Merge][AMD] GLM-5 performance optimization (#21166)
|
2026-04-11 23:58:11 -07:00 |
|
Hubert Lu
|
edaa5973d4
|
[AMD][No-Merge] Simplify fused allreduce + RMSNorm and remove hidden_dim allowlist (#21986)
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-04-11 23:47:08 -07:00 |
|
Xinyuan Tong
|
9a4e8089ff
|
[Whisper] Batch encoder forward for concurrent prefill requests (#22361)
|
2026-04-12 14:15:14 +08:00 |
|
Zhai Feiyue
|
52750129ef
|
fix prefill tps log accuracy (#22497)
|
2026-04-11 23:07:30 -07:00 |
|
Alex Nails
|
c6fd9a00c7
|
[tokenizer] eliminate O(n²) copy in non-incremental streaming (#22567)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-11 23:05:36 -07:00 |
|
Prozac614
|
45472d70cc
|
[diffusion] CI: dynamic load-balanced partitioning for diffusion CI (#15528)
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: SGLang CI <ci@sglang.ai>
|
2026-04-12 13:02:43 +08:00 |
|
Aurick Qiao
|
cd2b2364ff
|
[Bugfix] fix model_config deletion (#22281)
|
2026-04-12 11:24:26 +08:00 |
|
Kurt Shuster
|
8da1cfb30d
|
[lora][moe] Decoupled LoRA MoE backend with Marlin support (#21858)
|
2026-04-11 14:59:27 -07:00 |
|
Ziang Li
|
78043d4448
|
[Misc] [MXFP8] Drop sm100 mxfp8 warning (#21881)
|
2026-04-11 11:11:28 +00:00 |
|
Liangsheng Yin
|
61a62c6503
|
[mem] Flatten memory checkers into composable per-pool invariant checks (#22562)
|
2026-04-11 02:56:22 -07:00 |
|
dyhsup
|
8cca9747f5
|
[diffusion] model: support ERNIE-Image (#22439)
|
2026-04-11 17:18:11 +08:00 |
|
Baizhou Zhang
|
d14d368191
|
[Kernel] Set sgl_per_token_group_quant_8bit_v2 as default choice (#22467)
|
2026-04-11 01:59:57 -07:00 |
|
cctry
|
f855a0bde6
|
Introduce CUDA graph debug mode with breakable CUDA graph (#19102)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Cheng Wan <chwan@rice.edu>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-11 00:36:56 -07:00 |
|
Liangsheng Yin
|
d11da2403c
|
Add hisparse staging + decode offload guards to is_fully_idle() (#22577)
|
2026-04-11 00:12:10 -07:00 |
|