Commit Graph

2351 Commits

Author
SHA1 Message Date
jsheng_Linkedin
6d47dc8f6d [CI][MLA] Enable deterministic inference for MGSM MLA FP8 test (#23303) 2026-04-20 22:26:26 -07:00
Liangsheng Yin
6cc2eee50d [misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305) 2026-04-20 21:16:24 -07:00
Ke Bao
50fc2c9e23 Fix hybrid swa chunked prefill oom (#23174) 2026-04-21 10:46:45 +08:00
Liangsheng Yin
c7a4ebf3c8 [Refactor] Replace page_align_keys helper with RadixKey.page_aligned method (#23107) 2026-04-20 18:10:42 -07:00
shuwenn
b65799cf83 [SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2026-04-20 14:25:04 -07:00
Liangsheng Yin
8cb957ccff [Perf] Make EAGLE bigram key an O(1) view on RadixKey (#23106) 2026-04-20 12:01:11 -07:00
Shunkangz
3dc1491c95 Support moe_dp_size = 1 for various attention_cp_size (#22003)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-04-20 11:58:19 -07:00
Lee Nau
b4bb036b73 fix legacy deepep path for flashinfer_cutedsl (#22925) 2026-04-20 11:49:33 -07:00
Vladislav Nosivskoy
4028a73c10 [KV-Events] Fix kv events events publishing for CP (#22983)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
2026-04-20 17:34:38 +08:00
Kangyan-Zhou
97baf17557 Fix test_modelopt_export using stale ModelConfig kwargs (#23214)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 23:18:09 -07:00
Liangsheng Yin
eb76aaba88 [core] Always-on StreamingSession in UnifiedRadixCache (#23202) 2026-04-19 21:19:43 -07:00
Liangsheng Yin
d3ce664612 move session to python/sglang/srt/session (#23144) 2026-04-19 17:34:19 -07:00
Baidu-AIAK
7ca3566130 Multi platform Plugin (#21388)
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0183.tjzj.baidu.com>
Co-authored-by: Alex Nails <alex.nails@radixark.ai>
Co-authored-by: Alex Nails <alexj.nails@gmail.com>
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0000.tjzj.baidu.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-19 17:23:51 -07:00
billishyahao
b74a9dd854 [AMD] fix tbo runtime error when initializing metadata for cuda graph (#22598) 2026-04-19 12:42:48 -07:00
Baizhou Zhang
6ecd6f84db [CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-19 05:32:36 -07:00
Cheng Wan
5f7aee726a refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 17:05:13 -07:00
Liangsheng Yin
09b689b407 Apply HF transformers patches from sglang init (#23103) 2026-04-17 15:37:51 -07:00
Liangsheng Yin
573e12a7fc Merge /get_load into /v1/loads (#23010) 2026-04-17 13:36:51 -07:00
Lianmin Zheng
44e67c6835 Remove deprecated double sparsity feature (#23009) 2026-04-17 13:33:12 -07:00
Liangsheng Yin
3df35ecc80 Lower TestPiecewiseCudaGraphQwen25VL gsm8k threshold to 0.80 (#23099) 2026-04-17 13:31:10 -07:00
CYYYC0310
a12ea979d4 [test] Add GSM8K accuracy test for PP with mixed chunk prefill (#23029)
Co-authored-by: cyy <cy02433585@alibaba-inc.com>
2026-04-17 17:09:53 +08:00
narutolhy
5fa0c6a52e Allow piecewise CUDA graph with speculative decoding (#22128)
Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 13:39:30 +08:00
blzheng
0dcfae5553 [CPU] Add gemma4_rmsnorm_cpu kernel (#22842)
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-04-17 13:03:16 +08:00
Chunyuan WU
6c89214584 [CPU][sgl-kernel] extend_attention_cpu and flash_attn_varlen_func: fix nan for large seq (#22434)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-04-17 13:01:01 +08:00
Byron Hsu
cf9845f8e3 [Bug Fix] Ensure prefill_info_table is populated before honoring disagg_prefill_dp_rank (#22990)
Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
2026-04-17 11:10:31 +08:00
Jan Bernlöhr
04a53955b9 feat: add coordinated checkpoint prefetch for network filesystem loading (#20843) 2026-04-16 20:08:19 -07:00
Khoa Pham
5a0eea8ac5 [CI] Adding Gemma 4 to Nightly CI (#22408) 2026-04-16 19:30:16 -07:00
Alison Shao
0052093178 test(4-gpu-b200): split test_qwen35_models.py + bump partitions 5→6 (#22913) 2026-04-16 18:51:59 -07:00
Liangsheng Yin
db7a751d48 refactor: extract FanOutCommunicator and use declarative spec table (#22967) 2026-04-16 15:37:19 -07:00
Xinyuan Tong
082eaed0a4 test: fix flaky required function calling assertion (#22890) 2026-04-16 09:44:26 -07:00
Zhangheng
14bcdfca21 [HiSparse]: Adding e2e ut for hisparse (#22979) 2026-04-16 23:20:07 +08:00
Liangsheng Yin
bbd8f9ba09 migrate CPU-only unit tests from openai_server to unit/ (#22965) 2026-04-16 03:53:33 -07:00
ybyang
fbd6dc3565 fix: normalize tool message content for GLM5.1 chat template (#22595) 2026-04-16 16:48:38 +08:00
Xinyuan Tong
34fef07a15 Upgrade transformers to 5.5.3 and refactor hf_transformers_utils into subpackage (#21569) 2026-04-15 20:03:44 -07:00
Liangsheng Yin
a4cf2ea128 streaming session: spec v2 bonus accounting + comprehensive test matrix (#22651) 2026-04-15 17:12:41 -07:00
Liangsheng Yin
f9792166c3 trim_overshoot: cap swa_evicted_seqlen + unit test (#22900) 2026-04-15 15:05:35 -07:00
Xinyu Zhang
13a2cd748d [Ray] Add data parallel (DP) and DP attention support to RayEngine (#21887)
Co-authored-by: xyuzh <xyuzh@users.noreply.github.com>
2026-04-15 15:00:48 -07:00
Sundara Raman Ramachandran
4927975427 [Score API] Add return_pooled_hidden_states to Scoring API for SequenceClassification / RewardModel (#22427) 2026-04-15 14:58:56 -07:00
Kurt Shuster
32d9fe5a32 [lora] Speedup triton backend sgemm calls with better grid (#22386) 2026-04-15 13:47:07 -07:00
Liangsheng Yin
aa78564e1a Refactor streaming session abort handling (#22790) 2026-04-15 00:13:05 -07:00
Michael
39c6bf730c [AMD][CI] Add GLM-5-MXFP4 accuracy and perf nightly tests for MI35x (#21773) 2026-04-14 18:55:36 -07:00
Ke Bao
3c0a6c6987 Add page_size and SWA coverage to unified radix cache bench test (#22815) 2026-04-14 23:58:05 +08:00
Bi Xue
070c6a2489 [sgl] perf optimization for eplb (#21232) 2026-04-14 22:52:17 +08:00
Ke Bao
9f9e0231bb Refactor unified radix cache UT into parameterized test suite (#22812) 2026-04-14 22:34:33 +08:00
Jincong Chen
6760c790bd [bugfix] avoid attention padding tokens computation in pcg (#17706) 2026-04-14 16:08:23 +08:00
Michael
eab045b2b7 [AMD] Add MiniMax-M2.7 accuracy and performance nightly tests (#22722)
Co-authored-by: HaiShaw <hixiao@gmail.com>
2026-04-14 00:30:11 -07:00
huangtingwei
e9d6b9eb2d [HiCache & HybridModel] mooncake backend support DSA & mamba model (#21259)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
2026-04-13 18:47:36 -07:00
Liangsheng Yin
33a3ba256f Delete dead rematch path in SessionAwareCache.release_session (#22735) 2026-04-13 17:02:40 -07:00
Kurt Shuster
ff13dfee45 [lora][moe] Virtual experts for LoRA MoE (#22122)
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
2026-04-13 21:19:30 +00:00
Asish Kumar
39810762d2 fix: use describe mode for SGLang version detection (#22600)
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
2026-04-13 09:45:45 -07:00