Yuxuan Zhang
|
d49fc092cb
|
[Bug Fix] GLM-5.1: drop constexpr on page_indice_batch_offset, skip offloader post_init on draft worker, support N=32 in copy_to_gpu_no_ce (#23550)
|
2026-05-09 15:43:45 +08:00 |
|
Liangsheng Yin
|
78da0d3106
|
[Spec] Move accept_tokens off EagleDraftInput; pass via method arg (#24735)
|
2026-05-08 23:24:18 -07:00 |
|
Chi McIsaac
|
8e534e8f15
|
[diffusion] fix: fix diffusers executor crash when component residency manager is absent (#24573)
|
2026-05-09 11:45:06 +08:00 |
|
storyicon
|
590b13b513
|
[diffusion] fix: fix NCCL deadlock in ulysses sp when sequence length has remainder (#24694)
Signed-off-by: storyicon <storyicon@foxmail.com>
|
2026-05-09 11:05:37 +08:00 |
|
Polisetty V R K Jyothendra Varma
|
50ed01674e
|
fix is_arch_support_pdl function usage (#24600)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-09 09:39:34 +08:00 |
|
Liangsheng Yin
|
1613bae412
|
[Spec] Disambiguate verified_id into bonus_token(s) / accept_tokens (#24724)
|
2026-05-08 18:24:33 -07:00 |
|
Yuan Luo
|
a61a14f416
|
[KDA] Optimize prefill kernels with diagonal and recompute fuse (#24271)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-05-09 08:52:51 +08:00 |
|
Brayden Zhong
|
9ee830346f
|
Disable Custom AR V2 when in multi-node (#24729)
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
|
2026-05-08 17:50:05 -07:00 |
|
Cheng Wan
|
d1c5937428
|
env: add SGLANG_RADIX_FORCE_MISS to force radix prefix-cache miss (#24726)
Co-authored-by: sihan-zzz <228612289+sihan-zzz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-08 17:46:38 -07:00 |
|
YAMY
|
560829a171
|
feat(scheduler): add adaptive queue-based prefill delayer trigger (#23189)
|
2026-05-08 16:54:30 -07:00 |
|
YAMY
|
6971a03fe6
|
fix(fa3): skip scheduler_metadata precompute under DP attention (#24632)
|
2026-05-08 16:19:20 -07:00 |
|
Niko Ma
|
62c2e091f6
|
[PD] MORI-IO: Add state transfer, inline transfer model, and high-concurrency fixes (#22665)
|
2026-05-08 16:07:22 -07:00 |
|
Jimmy Shong
|
fa8985486e
|
[test/fix]: isolate VLM MMMU eval output dirs to fix nightly-4-gpu cross-test pollution (#24623)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-08 15:01:53 -07:00 |
|
Jimmy Shong
|
096ad02b06
|
[Model] Laguna-XS.2 Model Support (#24204)
|
2026-05-09 05:43:13 +08:00 |
|
Cheng Wan
|
7b707c9222
|
disable the combination of --enable-two-batch-overlap and --enforce-s… (#24720)
|
2026-05-08 14:27:35 -07:00 |
|
Yuhao Yang
|
09912fd89d
|
Remove unnecessary bf16 assert in rotate_activation (#24686)
|
2026-05-09 05:00:52 +08:00 |
|
Yilong Zhao
|
f30d1d0b0a
|
logits: remove blocking H2D copy (#24627)
|
2026-05-08 13:22:13 -07:00 |
|
Ethan Feng
|
672f778512
|
[NemotronH] Fix expert scale weight loading (#24434)
|
2026-05-08 12:37:06 -07:00 |
|
zhongdaor-nv
|
2cf1a4ab38
|
feat: Add KV events for Mamba radix cache (#23678)
Signed-off-by: zhongdaor-nv <220807034+zhongdaor-nv@users.noreply.github.com>
Co-authored-by: zhongdaor-nv <220807034+zhongdaor-nv@users.noreply.github.com>
|
2026-05-08 11:53:36 -07:00 |
|
Lianmin Zheng
|
e40e339c72
|
Filter non-int token ids in benchmark and observe decode-side bootstrap/alloc metrics (#24684)
|
2026-05-08 11:45:37 -07:00 |
|
Mick
|
73b8eda103
|
[diffusion] fix: fix FA3 varlen out argument handling (#24688)
|
2026-05-08 19:01:49 +08:00 |
|
fanxingran
|
7f8e7a9130
|
fix(aiter): drop FP8 KV upcast; use native FP8 path in paged_attentio… (#24129)
Co-authored-by: fanxingran <fanxingran@amd.com>
|
2026-05-08 02:47:48 -07:00 |
|
jacky.cheng
|
f21d4868dc
|
[AMD] Replace naive triton RMSNorm with aiter RMSNorm for diffusion model (#24360)
|
2026-05-08 02:44:13 -07:00 |
|
YC Yen-Ching Tseng
|
e1150f66db
|
[AMD][diffusion] Temporal-unfolded batched Conv2D for ROCm VAE decode (#22971)
|
2026-05-08 02:32:14 -07:00 |
|
Brayden Zhong
|
80d0226b68
|
Turn on JIT custom AR implementation by default (#24363)
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
|
2026-05-08 02:05:31 -07:00 |
|
HAI
|
73792629d4
|
[AMD] Intro SGLANG_DIFFUSION_AITER_FP8_ATTN (#24677)
|
2026-05-08 01:31:00 -07:00 |
|
jacky.cheng
|
b22d3cd606
|
[AMD] Support fp8 MLA for diffusion model (#20319)
|
2026-05-08 00:56:24 -07:00 |
|
Yibo Cai
|
55d8223c2b
|
[sgl-kernel/cpu] support w8a8 int8 model for arm cpu (#16045)
Skip the GPU test, as this change is not related to the GPU backend.
|
2026-05-08 14:47:06 +08:00 |
|
JoyFuture
|
e1bc001872
|
fix(mimo_v2): auto-disable multimodal when vision/audio configs are absent (#24652)
|
2026-05-08 13:40:08 +08:00 |
|
maocheng23
|
7deed98e1b
|
[fix] /pause_generation and /continue_generation wrong for --tokenizer-worker-num > 1 (#24462)
Co-authored-by: lawrence-harmonic <185285563+lawrence-harmonic@users.noreply.github.com>
|
2026-05-07 21:32:21 -07:00 |
|
Mick
|
2afb450501
|
[diffusion] optimize: optimize frame returns path (#24616)
|
2026-05-08 12:10:09 +08:00 |
|
johnnycxm
|
cdf5771f91
|
[MUSA][17/N] ci: Add MUSA diffusion, sgl-kernel tests, and CI workflow support (#20672)
Co-authored-by: ximin.chen <ximin.chen@mthreads.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
|
2026-05-07 20:45:21 -07:00 |
|
Brayden Zhong
|
5fa3bb2eaf
|
Enable flashinfer::trtllm_allreduce_fusion with PDL (#23765)
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-08 10:41:10 +08:00 |
|
shuwenn
|
d9dddd4d7d
|
[SPEC V2][2/N] feat: adaptive spec support spec v2 (#23336)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
|
2026-05-07 18:33:47 -07:00 |
|
Liangsheng Yin
|
35870d55ac
|
Deepseek V4 (#23882)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
Co-authored-by: DarkSharpness <2040703891@qq.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Qiaolin Yu <90088090+qiaolin-yu@users.noreply.github.com>
Co-authored-by: Ethan (Yusheng) Su <11704492+yushengsu-thu@users.noreply.github.com>
Co-authored-by: Mingyi <27337995+wisclmy0611@users.noreply.github.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Yihao Wang <42559837+againstentropy@users.noreply.github.com>
|
2026-05-07 18:32:21 -07:00 |
|
Mandepudi Rani Chowdary
|
55224fff08
|
Add Arm64 CPU Phase 1A CI bootstrap (#22123)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-08 09:28:23 +08:00 |
|
Lianmin Zheng
|
3c3f0bd55e
|
Cache empty MatchResult in RadixCache (#24470)
|
2026-05-07 17:13:20 -07:00 |
|
Baizhou Zhang
|
c4bb3ce273
|
Fix stuck when enabling MTP on DSA models (#24635)
|
2026-05-07 17:06:28 -07:00 |
|
Liangsheng Yin
|
95fb722dd2
|
Add registry for custom speculative algorithms (#23991)
|
2026-05-07 16:11:45 -07:00 |
|
Revanth Reddy Airre
|
c2c57068da
|
fix(http): apply SGLANG_TIMEOUT_KEEP_ALIVE in common.py (#24323)
Signed-off-by: Revanth Reddy Airre <revanthreddy@hippocraticai.com>
|
2026-05-07 16:01:41 -07:00 |
|
Xinyuan Tong
|
5b589ed2e7
|
feat(constrained): two-phase reasoning grammar + --enable-strict-thinking (#23953)
|
2026-05-07 14:21:51 -07:00 |
|
Xinyuan Tong
|
af2a2ac618
|
fix(function_call): handle Kimi-K2.5 bare numeric tool call IDs (#23950)
|
2026-05-07 14:20:02 -07:00 |
|
Xinyuan Tong
|
d8f9d32a05
|
feat(reasoning): auto-detect reasoning/tool-call parser from chat template (#23952)
|
2026-05-07 14:19:16 -07:00 |
|
Khoa Pham
|
d2c1034163
|
[Gemma 4] Adding MTP support (#24436)
Co-authored-by: Pengyu Chen <pychen96@gmail.com>
|
2026-05-07 14:08:41 -07:00 |
|
Xinyuan Tong
|
f1395af543
|
fix(openai): map reasoning.enabled to thinking AND enable_thinking (#23951)
|
2026-05-07 14:01:35 -07:00 |
|
R0CKSTAR
|
9cffa5ed6f
|
[MUSA] Bump torchada to 0.1.54 (#24592)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-05-07 11:45:49 -07:00 |
|
GXIN
|
90a618e37b
|
[NPU][diffusion] add selectable parallel VAE decode strategies (#23248)
Co-authored-by: 高鑫 <gaoxin@gaoxindeMacBook-Pro.local>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-07 21:37:03 +03:00 |
|
Junlin Wu
|
80a6014243
|
✨ [diffusion][npu][quant] Add MXFP8 quantization support for Wan2.2 Diffusion on Ascend NPU (#20922)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
|
2026-05-07 21:30:56 +03:00 |
|
McZyWu
|
7d397ad23d
|
[NPU]Support model Trinity-mini for Npu, accuracy 90% (#18172)
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-05-07 20:58:18 +03:00 |
|
Mick
|
b0225a69dc
|
[diffusion] optimize: precompute LTX2 guidance perturbation states (#24494)
|
2026-05-08 01:18:42 +08:00 |
|