sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 11:48:01 +00:00

Author	SHA1	Message	Date
Teng Ma	00126648e2	[PD] Add EFA disaggregation transport	2026-05-10 23:51:29 +08:00
Teng Ma	c49ee54e73	Merge remote-tracking branch 'origin/main' into pr-21859-support-multi-protocol # Conflicts: # python/sglang/srt/server_args.py # test/registered/unit/server_args/test_server_args.py	2026-05-10 22:35:49 +08:00
RunningLeon	335dbd60b4	Support Intern-S2-Preview (#24875)	2026-05-10 22:17:30 +08:00
Ke Bao	59faf986b2	[PD] Unify dsv4 dispatch with swa (#24888)	2026-05-10 22:01:13 +08:00
Yuhao Yang	2f06867128	Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (#24775) Co-authored-by: Cheng Wan <chwan@rice.edu> Co-authored-by: Chunan Zeng <zcnrex@gmail.com>	2026-05-10 19:03:37 +08:00
Yuhao Yang	bd0aa22309	Fix PD bootstrap failure handling (#24772) Co-authored-by: Cheng Wan <chwan@rice.edu>	2026-05-10 19:02:47 +08:00
Liangsheng Yin	8cc16c9974	[Spec] Cleanup idle stub and shape-check patterns (#24881)	2026-05-10 02:39:53 -07:00
Cheng Wan	c7f674e427	[Bug] Add dsv4 state_type branch to mooncake disaggregation (#24878) Co-authored-by: Cheng Wan <cheng.wan@radixark.ai> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 01:13:46 -07:00
Liangsheng Yin	d08744238a	[Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (#24859)	2026-05-10 01:07:45 -07:00
Yuan Luo	d3fd91ed97	[Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (#24696) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-05-10 00:24:12 -07:00
Qiaolin Yu	a87fb399de	[spec decoding] support kimi-k2.5-eagle3-mla (#24826)	2026-05-09 23:57:39 -07:00
shuwenn	b4d347e86e	[SPEC V2] fix: skip stale state updates in spec-v2 overlap (#23456)	2026-05-09 23:56:24 -07:00
Byron Hsu	cfd3fd00d0	[RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (#24854) Co-authored-by: Byron Hsu <byron@periodiclabs.ai> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-09 23:36:52 -07:00
Chi McIsaac	44efc23a9a	[diffusion] CI: add cache-dit CI tests (#19213)	2026-05-10 13:38:41 +08:00
Byron Hsu	1e6c6d1f07	[Utils] Make request dump robust to unpicklable server_args and large meta_info (#24767) Co-authored-by: Byron Hsu <byron@periodiclabs.ai> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-09 21:41:41 -07:00
Stefan He	9578ba1b57	[Utils] Refactor device cache emptying (#24861) Co-authored-by: Biao He <biao@Biaos-MacBook-Air.local>	2026-05-09 21:28:00 -07:00
Byron Hsu	47483001b6	[PrefillDelayer] support NCCL all-gather for cross-DP info sync (#24768) Co-authored-by: Byron Hsu <byron@periodiclabs.ai> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-09 21:20:03 -07:00
Byron Hsu	7edb4c3cea	[NUMA+Ray] Fix NUMA NVML handle resolution under shuffled CUDA_VISIBLE_DEVICES (#24766) Co-authored-by: Byron Hsu <byron@periodiclabs.ai> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-09 21:18:39 -07:00
Liangsheng Yin	c95454b341	speculative: drop dead params/returns/no-ops (#24865)	2026-05-09 15:53:31 -07:00
Charles Chen	12f42f2e7e	Support Gemma3/4 + Eagle3 (#23976)	2026-05-09 13:34:56 -07:00
luchangli	8087e07d52	[UnifiedRadixTree]: Align cache_empty_result with RadixTree (#24779) Co-authored-by: Zhangheng <hzh0425@apache.org>	2026-05-09 23:52:22 +08:00
Baizhou Zhang	ef5e9f8aba	[DSV4] Cherry pick missing commits from deepseek_v4 branch and enhance tests (#24793) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: yueming-yuan <yym022502@gmail.com>	2026-05-09 04:15:37 -07:00
Brayden Zhong	4b23f6bdc5	Fix performance regression on Deepseek V3 on `moe-runner-backend=triton` on SM90 (#24562) Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>	2026-05-09 03:49:12 -07:00
Brayden Zhong	05d1ab51e8	Enable PDL for various kernels in DSV32/GLM5 (#23965) Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>	2026-05-09 03:42:56 -07:00
shuwenn	d5564c2a96	fix(fa3): translate page table to SWA loc in EAGLE3 topk>1 spec metadata (#24617)	2026-05-09 18:22:45 +08:00
JoyFuture	a309f1f8f4	fix(cuda_graph): zero out_cache_loc_swa on pad and use int32 (hybrid-SWA accuracy fix) (#24743)	2026-05-09 18:22:12 +08:00
Brayden Zhong	f4b7e73699	Enable trtllm-gen BF16 MoE for MTP (#24260) Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>	2026-05-09 03:14:17 -07:00
sglang-npu-bot	f1a9a455e0	Revert "[NPU] fix profiler on npu" (#24815)	2026-05-09 17:53:02 +08:00
zhaozx-cn	e2527df8b6	[NPU] fix profiler on npu (#24685) Signed-off-by: zhaozx-cn <zhaozx2116@163.com>	2026-05-09 17:48:24 +08:00
Jia Guo	fd636410a2	Restrict fa_skip_kv_cache to non-MLA backends (#24097)	2026-05-09 09:25:02 +00:00
Brayden Zhong	8f33bee31b	Reland Cute-DSL FP4 dense GEMM (#23590) Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>	2026-05-09 02:20:58 -07:00
Yuxuan Zhang	d49fc092cb	[Bug Fix] GLM-5.1: drop constexpr on page_indice_batch_offset, skip offloader post_init on draft worker, support N=32 in copy_to_gpu_no_ce (#23550)	2026-05-09 15:43:45 +08:00
Liangsheng Yin	78da0d3106	[Spec] Move `accept_tokens` off `EagleDraftInput`; pass via method arg (#24735)	2026-05-08 23:24:18 -07:00
Chi McIsaac	8e534e8f15	[diffusion] fix: fix diffusers executor crash when component residency manager is absent (#24573)	2026-05-09 11:45:06 +08:00
storyicon	590b13b513	[diffusion] fix: fix NCCL deadlock in ulysses sp when sequence length has remainder (#24694) Signed-off-by: storyicon <storyicon@foxmail.com>	2026-05-09 11:05:37 +08:00
Polisetty V R K Jyothendra Varma	50ed01674e	fix is_arch_support_pdl function usage (#24600) Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-05-09 09:39:34 +08:00
Liangsheng Yin	1613bae412	[Spec] Disambiguate `verified_id` into `bonus_token(s)` / `accept_tokens` (#24724)	2026-05-08 18:24:33 -07:00
Yuan Luo	a61a14f416	[KDA] Optimize prefill kernels with diagonal and recompute fuse (#24271) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-05-09 08:52:51 +08:00
Brayden Zhong	9ee830346f	Disable Custom AR V2 when in multi-node (#24729) Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>	2026-05-08 17:50:05 -07:00
Cheng Wan	d1c5937428	env: add SGLANG_RADIX_FORCE_MISS to force radix prefix-cache miss (#24726) Co-authored-by: sihan-zzz <228612289+sihan-zzz@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 17:46:38 -07:00
YAMY	560829a171	feat(scheduler): add adaptive queue-based prefill delayer trigger (#23189)	2026-05-08 16:54:30 -07:00
YAMY	6971a03fe6	fix(fa3): skip scheduler_metadata precompute under DP attention (#24632)	2026-05-08 16:19:20 -07:00
Niko Ma	62c2e091f6	[PD] MORI-IO: Add state transfer, inline transfer model, and high-concurrency fixes (#22665)	2026-05-08 16:07:22 -07:00
Jimmy Shong	fa8985486e	[test/fix]: isolate VLM MMMU eval output dirs to fix nightly-4-gpu cross-test pollution (#24623) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-05-08 15:01:53 -07:00
Jimmy Shong	096ad02b06	[Model] Laguna-XS.2 Model Support (#24204)	2026-05-09 05:43:13 +08:00
Cheng Wan	7b707c9222	disable the combination of --enable-two-batch-overlap and --enforce-s… (#24720)	2026-05-08 14:27:35 -07:00
Yuhao Yang	09912fd89d	Remove unnecessary bf16 assert in rotate_activation (#24686)	2026-05-09 05:00:52 +08:00
Yilong Zhao	f30d1d0b0a	logits: remove blocking H2D copy (#24627)	2026-05-08 13:22:13 -07:00
Ethan Feng	672f778512	[NemotronH] Fix expert scale weight loading (#24434)	2026-05-08 12:37:06 -07:00
zhongdaor-nv	2cf1a4ab38	feat: Add KV events for Mamba radix cache (#23678) Signed-off-by: zhongdaor-nv <220807034+zhongdaor-nv@users.noreply.github.com> Co-authored-by: zhongdaor-nv <220807034+zhongdaor-nv@users.noreply.github.com>	2026-05-08 11:53:36 -07:00

8284 Commits