sglang

Mirror of https://github.com/kvcache-ai/sglang.git, synced 2026-06-30 11:48:01 +00:00

Author	SHA1	Message	Date
jsheng_Linkedin	6d47dc8f6d	[CI][MLA] Enable deterministic inference for MGSM MLA FP8 test (#23303)	2026-04-20 22:26:26 -07:00
Liangsheng Yin	6cc2eee50d	[misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305)	2026-04-20 21:16:24 -07:00
Ke Bao	50fc2c9e23	Fix hybrid swa chunked prefill oom (#23174)	2026-04-21 10:46:45 +08:00
Liangsheng Yin	c7a4ebf3c8	[Refactor] Replace `page_align_keys` helper with `RadixKey.page_aligned` method (#23107)	2026-04-20 18:10:42 -07:00
shuwenn	b65799cf83	[SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599) Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2026-04-20 14:25:04 -07:00
Liangsheng Yin	8cb957ccff	[Perf] Make EAGLE bigram key an O(1) view on `RadixKey` (#23106)	2026-04-20 12:01:11 -07:00
Shunkangz	3dc1491c95	Support moe_dp_size = 1 for various attention_cp_size (#22003) Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2026-04-20 11:58:19 -07:00
Lee Nau	b4bb036b73	fix legacy deepep path for flashinfer_cutedsl (#22925)	2026-04-20 11:49:33 -07:00
Vladislav Nosivskoy	4028a73c10	[KV-Events] Fix kv events events publishing for CP (#22983) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>	2026-04-20 17:34:38 +08:00
Kangyan-Zhou	97baf17557	Fix test_modelopt_export using stale ModelConfig kwargs (#23214) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 23:18:09 -07:00
Liangsheng Yin	eb76aaba88	[core] Always-on `StreamingSession` in `UnifiedRadixCache` (#23202)	2026-04-19 21:19:43 -07:00
Liangsheng Yin	d3ce664612	move session to python/sglang/srt/session (#23144)	2026-04-19 17:34:19 -07:00
Baidu-AIAK	7ca3566130	Multi platform Plugin (#21388) Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0183.tjzj.baidu.com> Co-authored-by: Alex Nails <alex.nails@radixark.ai> Co-authored-by: Alex Nails <alexj.nails@gmail.com> Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0000.tjzj.baidu.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-19 17:23:51 -07:00
billishyahao	b74a9dd854	[AMD] fix tbo runtime error when initializing metadata for cuda graph (#22598)	2026-04-19 12:42:48 -07:00
Baizhou Zhang	6ecd6f84db	[CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119) Co-authored-by: Kangyan Zhou <zky314343421@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alison Shao <a.shao@wustl.edu> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-04-19 05:32:36 -07:00
Cheng Wan	5f7aee726a	refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 17:05:13 -07:00
Liangsheng Yin	09b689b407	Apply HF transformers patches from sglang init (#23103)	2026-04-17 15:37:51 -07:00
Liangsheng Yin	573e12a7fc	Merge /get_load into /v1/loads (#23010)	2026-04-17 13:36:51 -07:00
Lianmin Zheng	44e67c6835	Remove deprecated double sparsity feature (#23009)	2026-04-17 13:33:12 -07:00
Liangsheng Yin	3df35ecc80	Lower TestPiecewiseCudaGraphQwen25VL gsm8k threshold to 0.80 (#23099)	2026-04-17 13:31:10 -07:00
CYYYC0310	a12ea979d4	[test] Add GSM8K accuracy test for PP with mixed chunk prefill (#23029) Co-authored-by: cyy <cy02433585@alibaba-inc.com>	2026-04-17 17:09:53 +08:00
narutolhy	5fa0c6a52e	Allow piecewise CUDA graph with speculative decoding (#22128) Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 13:39:30 +08:00
blzheng	0dcfae5553	[CPU] Add gemma4_rmsnorm_cpu kernel (#22842) Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-04-17 13:03:16 +08:00
Chunyuan WU	6c89214584	[CPU][sgl-kernel] `extend_attention_cpu` and `flash_attn_varlen_func`: fix `nan` for large seq (#22434) Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>	2026-04-17 13:01:01 +08:00
Byron Hsu	cf9845f8e3	[Bug Fix] Ensure prefill_info_table is populated before honoring disagg_prefill_dp_rank (#22990) Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>	2026-04-17 11:10:31 +08:00
Jan Bernlöhr	04a53955b9	feat: add coordinated checkpoint prefetch for network filesystem loading (#20843)	2026-04-16 20:08:19 -07:00
Khoa Pham	5a0eea8ac5	[CI] Adding Gemma 4 to Nightly CI (#22408)	2026-04-16 19:30:16 -07:00
Alison Shao	0052093178	test(4-gpu-b200): split test_qwen35_models.py + bump partitions 5→6 (#22913)	2026-04-16 18:51:59 -07:00
Liangsheng Yin	db7a751d48	refactor: extract FanOutCommunicator and use declarative spec table (#22967)	2026-04-16 15:37:19 -07:00
Xinyuan Tong	082eaed0a4	test: fix flaky required function calling assertion (#22890)	2026-04-16 09:44:26 -07:00
Zhangheng	14bcdfca21	[HiSparse]: Adding e2e ut for hisparse (#22979)	2026-04-16 23:20:07 +08:00
Liangsheng Yin	bbd8f9ba09	migrate CPU-only unit tests from openai_server to unit/ (#22965)	2026-04-16 03:53:33 -07:00
ybyang	fbd6dc3565	fix: normalize tool message content for GLM5.1 chat template (#22595)	2026-04-16 16:48:38 +08:00
Xinyuan Tong	34fef07a15	Upgrade transformers to 5.5.3 and refactor hf_transformers_utils into subpackage (#21569)	2026-04-15 20:03:44 -07:00
Liangsheng Yin	a4cf2ea128	streaming session: spec v2 bonus accounting + comprehensive test matrix (#22651)	2026-04-15 17:12:41 -07:00
Liangsheng Yin	f9792166c3	trim_overshoot: cap swa_evicted_seqlen + unit test (#22900)	2026-04-15 15:05:35 -07:00
Xinyu Zhang	13a2cd748d	[Ray] Add data parallel (DP) and DP attention support to RayEngine (#21887) Co-authored-by: xyuzh <xyuzh@users.noreply.github.com>	2026-04-15 15:00:48 -07:00
Sundara Raman Ramachandran	4927975427	[Score API] Add return_pooled_hidden_states to Scoring API for SequenceClassification / RewardModel (#22427)	2026-04-15 14:58:56 -07:00
Kurt Shuster	32d9fe5a32	[lora] Speedup triton backend `sgemm` calls with better grid (#22386)	2026-04-15 13:47:07 -07:00
Liangsheng Yin	aa78564e1a	Refactor streaming session abort handling (#22790)	2026-04-15 00:13:05 -07:00
Michael	39c6bf730c	[AMD][CI] Add GLM-5-MXFP4 accuracy and perf nightly tests for MI35x (#21773)	2026-04-14 18:55:36 -07:00
Ke Bao	3c0a6c6987	Add page_size and SWA coverage to unified radix cache bench test (#22815)	2026-04-14 23:58:05 +08:00
Bi Xue	070c6a2489	[sgl] perf optimization for eplb (#21232)	2026-04-14 22:52:17 +08:00
Ke Bao	9f9e0231bb	Refactor unified radix cache UT into parameterized test suite (#22812)	2026-04-14 22:34:33 +08:00
Jincong Chen	6760c790bd	[bugfix] avoid attention padding tokens computation in pcg (#17706)	2026-04-14 16:08:23 +08:00
Michael	eab045b2b7	[AMD] Add MiniMax-M2.7 accuracy and performance nightly tests (#22722) Co-authored-by: HaiShaw <hixiao@gmail.com>	2026-04-14 00:30:11 -07:00
huangtingwei	e9d6b9eb2d	[HiCache & HybridModel] mooncake backend support DSA & mamba model (#21259) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: ispobock <ispobaoke@gmail.com> Co-authored-by: Vladislav Nosivskoy <vladnosiv@gmail.com>	2026-04-13 18:47:36 -07:00
Liangsheng Yin	33a3ba256f	Delete dead rematch path in SessionAwareCache.release_session (#22735)	2026-04-13 17:02:40 -07:00
Kurt Shuster	ff13dfee45	[lora][moe] Virtual experts for LoRA MoE (#22122) Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>	2026-04-13 21:19:30 +00:00
Asish Kumar	39810762d2	fix: use describe mode for SGLang version detection (#22600) Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>	2026-04-13 09:45:45 -07:00

2351 Commits