Teng Ma
|
00126648e2
|
[PD] Add EFA disaggregation transport
|
2026-05-10 23:51:29 +08:00 |
|
Teng Ma
|
c49ee54e73
|
Merge remote-tracking branch 'origin/main' into pr-21859-support-multi-protocol
# Conflicts:
# python/sglang/srt/server_args.py
# test/registered/unit/server_args/test_server_args.py
|
2026-05-10 22:35:49 +08:00 |
|
RunningLeon
|
335dbd60b4
|
Support Intern-S2-Preview (#24875)
|
2026-05-10 22:17:30 +08:00 |
|
Ke Bao
|
59faf986b2
|
[PD] Unify dsv4 dispatch with swa (#24888)
|
2026-05-10 22:01:13 +08:00 |
|
Yuhao Yang
|
2f06867128
|
Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (#24775)
Co-authored-by: Cheng Wan <chwan@rice.edu>
Co-authored-by: Chunan Zeng <zcnrex@gmail.com>
|
2026-05-10 19:03:37 +08:00 |
|
Yuhao Yang
|
bd0aa22309
|
Fix PD bootstrap failure handling (#24772)
Co-authored-by: Cheng Wan <chwan@rice.edu>
|
2026-05-10 19:02:47 +08:00 |
|
Liangsheng Yin
|
8cc16c9974
|
[Spec] Cleanup idle stub and shape-check patterns (#24881)
|
2026-05-10 02:39:53 -07:00 |
|
Cheng Wan
|
c7f674e427
|
[Bug] Add dsv4 state_type branch to mooncake disaggregation (#24878)
Co-authored-by: Cheng Wan <cheng.wan@radixark.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-10 01:13:46 -07:00 |
|
Liangsheng Yin
|
d08744238a
|
[Spec V1] Split draft-extend phase from EagleDraftInput into new EagleDraftExtendInput (#24859)
|
2026-05-10 01:07:45 -07:00 |
|
Yuan Luo
|
d3fd91ed97
|
[Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (#24696)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-05-10 00:24:12 -07:00 |
|
Qiaolin Yu
|
a87fb399de
|
[spec decoding] support kimi-k2.5-eagle3-mla (#24826)
|
2026-05-09 23:57:39 -07:00 |
|
shuwenn
|
b4d347e86e
|
[SPEC V2] fix: skip stale state updates in spec-v2 overlap (#23456)
|
2026-05-09 23:56:24 -07:00 |
|
Byron Hsu
|
cfd3fd00d0
|
[RL] Call torch.cuda.empty_cache() for in-place pause mode to avoid OOM (#24854)
Co-authored-by: Byron Hsu <byron@periodiclabs.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-09 23:36:52 -07:00 |
|
Chi McIsaac
|
44efc23a9a
|
[diffusion] CI: add cache-dit CI tests (#19213)
|
2026-05-10 13:38:41 +08:00 |
|
Byron Hsu
|
1e6c6d1f07
|
[Utils] Make request dump robust to unpicklable server_args and large meta_info (#24767)
Co-authored-by: Byron Hsu <byron@periodiclabs.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-09 21:41:41 -07:00 |
|
Stefan He
|
9578ba1b57
|
[Utils] Refactor device cache emptying (#24861)
Co-authored-by: Biao He <biao@Biaos-MacBook-Air.local>
|
2026-05-09 21:28:00 -07:00 |
|
Byron Hsu
|
47483001b6
|
[PrefillDelayer] support NCCL all-gather for cross-DP info sync (#24768)
Co-authored-by: Byron Hsu <byron@periodiclabs.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-09 21:20:03 -07:00 |
|
Byron Hsu
|
7edb4c3cea
|
[NUMA+Ray] Fix NUMA NVML handle resolution under shuffled CUDA_VISIBLE_DEVICES (#24766)
Co-authored-by: Byron Hsu <byron@periodiclabs.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-05-09 21:18:39 -07:00 |
|
Liangsheng Yin
|
c95454b341
|
speculative: drop dead params/returns/no-ops (#24865)
|
2026-05-09 15:53:31 -07:00 |
|
Charles Chen
|
12f42f2e7e
|
Support Gemma3/4 + Eagle3 (#23976)
|
2026-05-09 13:34:56 -07:00 |
|
luchangli
|
8087e07d52
|
[UnifiedRadixTree]: Align cache_empty_result with RadixTree (#24779)
Co-authored-by: Zhangheng <hzh0425@apache.org>
|
2026-05-09 23:52:22 +08:00 |
|
Baizhou Zhang
|
ef5e9f8aba
|
[DSV4] Cherry pick missing commits from deepseek_v4 branch and enhance tests (#24793)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
|
2026-05-09 04:15:37 -07:00 |
|
Brayden Zhong
|
4b23f6bdc5
|
Fix performance regression on Deepseek V3 on moe-runner-backend=triton on SM90 (#24562)
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
|
2026-05-09 03:49:12 -07:00 |
|
Brayden Zhong
|
05d1ab51e8
|
Enable PDL for various kernels in DSV32/GLM5 (#23965)
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
|
2026-05-09 03:42:56 -07:00 |
|
shuwenn
|
d5564c2a96
|
fix(fa3): translate page table to SWA loc in EAGLE3 topk>1 spec metadata (#24617)
|
2026-05-09 18:22:45 +08:00 |
|
JoyFuture
|
a309f1f8f4
|
fix(cuda_graph): zero out_cache_loc_swa on pad and use int32 (hybrid-SWA accuracy fix) (#24743)
|
2026-05-09 18:22:12 +08:00 |
|
Brayden Zhong
|
f4b7e73699
|
Enable trtllm-gen BF16 MoE for MTP (#24260)
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
|
2026-05-09 03:14:17 -07:00 |
|
sglang-npu-bot
|
f1a9a455e0
|
Revert "[NPU] fix profiler on npu" (#24815)
|
2026-05-09 17:53:02 +08:00 |
|
zhaozx-cn
|
e2527df8b6
|
[NPU] fix profiler on npu (#24685)
Signed-off-by: zhaozx-cn <zhaozx2116@163.com>
|
2026-05-09 17:48:24 +08:00 |
|
Jia Guo
|
fd636410a2
|
Restrict fa_skip_kv_cache to non-MLA backends (#24097)
|
2026-05-09 09:25:02 +00:00 |
|
Brayden Zhong
|
8f33bee31b
|
Reland Cute-DSL FP4 dense GEMM (#23590)
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
|
2026-05-09 02:20:58 -07:00 |
|
Yuxuan Zhang
|
d49fc092cb
|
[Bug Fix] GLM-5.1: drop constexpr on page_indice_batch_offset, skip offloader post_init on draft worker, support N=32 in copy_to_gpu_no_ce (#23550)
|
2026-05-09 15:43:45 +08:00 |
|
Liangsheng Yin
|
78da0d3106
|
[Spec] Move accept_tokens off EagleDraftInput; pass via method arg (#24735)
|
2026-05-08 23:24:18 -07:00 |
|
Chi McIsaac
|
8e534e8f15
|
[diffusion] fix: fix diffusers executor crash when component residency manager is absent (#24573)
|
2026-05-09 11:45:06 +08:00 |
|
storyicon
|
590b13b513
|
[diffusion] fix: fix NCCL deadlock in ulysses sp when sequence length has remainder (#24694)
Signed-off-by: storyicon <storyicon@foxmail.com>
|
2026-05-09 11:05:37 +08:00 |
|
Polisetty V R K Jyothendra Varma
|
50ed01674e
|
fix is_arch_support_pdl function usage (#24600)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-09 09:39:34 +08:00 |
|
Liangsheng Yin
|
1613bae412
|
[Spec] Disambiguate verified_id into bonus_token(s) / accept_tokens (#24724)
|
2026-05-08 18:24:33 -07:00 |
|
Yuan Luo
|
a61a14f416
|
[KDA] Optimize prefill kernels with diagonal and recompute fuse (#24271)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-05-09 08:52:51 +08:00 |
|
Brayden Zhong
|
9ee830346f
|
Disable Custom AR V2 when in multi-node (#24729)
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
|
2026-05-08 17:50:05 -07:00 |
|
Cheng Wan
|
d1c5937428
|
env: add SGLANG_RADIX_FORCE_MISS to force radix prefix-cache miss (#24726)
Co-authored-by: sihan-zzz <228612289+sihan-zzz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-08 17:46:38 -07:00 |
|
YAMY
|
560829a171
|
feat(scheduler): add adaptive queue-based prefill delayer trigger (#23189)
|
2026-05-08 16:54:30 -07:00 |
|
YAMY
|
6971a03fe6
|
fix(fa3): skip scheduler_metadata precompute under DP attention (#24632)
|
2026-05-08 16:19:20 -07:00 |
|
Niko Ma
|
62c2e091f6
|
[PD] MORI-IO: Add state transfer, inline transfer model, and high-concurrency fixes (#22665)
|
2026-05-08 16:07:22 -07:00 |
|
Jimmy Shong
|
fa8985486e
|
[test/fix]: isolate VLM MMMU eval output dirs to fix nightly-4-gpu cross-test pollution (#24623)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-05-08 15:01:53 -07:00 |
|
Jimmy Shong
|
096ad02b06
|
[Model] Laguna-XS.2 Model Support (#24204)
|
2026-05-09 05:43:13 +08:00 |
|
Cheng Wan
|
7b707c9222
|
disable the combination of --enable-two-batch-overlap and --enforce-s… (#24720)
|
2026-05-08 14:27:35 -07:00 |
|
Yuhao Yang
|
09912fd89d
|
Remove unnecessary bf16 assert in rotate_activation (#24686)
|
2026-05-09 05:00:52 +08:00 |
|
Yilong Zhao
|
f30d1d0b0a
|
logits: remove blocking H2D copy (#24627)
|
2026-05-08 13:22:13 -07:00 |
|
Ethan Feng
|
672f778512
|
[NemotronH] Fix expert scale weight loading (#24434)
|
2026-05-08 12:37:06 -07:00 |
|
zhongdaor-nv
|
2cf1a4ab38
|
feat: Add KV events for Mamba radix cache (#23678)
Signed-off-by: zhongdaor-nv <220807034+zhongdaor-nv@users.noreply.github.com>
Co-authored-by: zhongdaor-nv <220807034+zhongdaor-nv@users.noreply.github.com>
|
2026-05-08 11:53:36 -07:00 |
|