jsheng_Linkedin
|
6d47dc8f6d
|
[CI][MLA] Enable deterministic inference for MGSM MLA FP8 test (#23303)
|
2026-04-20 22:26:26 -07:00 |
|
Liangsheng Yin
|
6cc2eee50d
|
[misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305)
|
2026-04-20 21:16:24 -07:00 |
|
Ke Bao
|
50fc2c9e23
|
Fix hybrid swa chunked prefill oom (#23174)
|
2026-04-21 10:46:45 +08:00 |
|
Liangsheng Yin
|
c7a4ebf3c8
|
[Refactor] Replace page_align_keys helper with RadixKey.page_aligned method (#23107)
|
2026-04-20 18:10:42 -07:00 |
|
shuwenn
|
b65799cf83
|
[SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
|
2026-04-20 14:25:04 -07:00 |
|
Liangsheng Yin
|
8cb957ccff
|
[Perf] Make EAGLE bigram key an O(1) view on RadixKey (#23106)
|
2026-04-20 12:01:11 -07:00 |
|
Shunkangz
|
3dc1491c95
|
Support moe_dp_size = 1 for various attention_cp_size (#22003)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-04-20 11:58:19 -07:00 |
|
Lee Nau
|
b4bb036b73
|
fix legacy deepep path for flashinfer_cutedsl (#22925)
|
2026-04-20 11:49:33 -07:00 |
|
Vladislav Nosivskoy
|
4028a73c10
|
[KV-Events] Fix kv events publishing for CP (#22983)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-04-20 17:34:38 +08:00 |
|
Kangyan-Zhou
|
97baf17557
|
Fix test_modelopt_export using stale ModelConfig kwargs (#23214)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-19 23:18:09 -07:00 |
|
Liangsheng Yin
|
eb76aaba88
|
[core] Always-on StreamingSession in UnifiedRadixCache (#23202)
|
2026-04-19 21:19:43 -07:00 |
|
Liangsheng Yin
|
d3ce664612
|
move session to python/sglang/srt/session (#23144)
|
2026-04-19 17:34:19 -07:00 |
|
Baidu-AIAK
|
7ca3566130
|
Multi platform Plugin (#21388)
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0183.tjzj.baidu.com>
Co-authored-by: Alex Nails <alex.nails@radixark.ai>
Co-authored-by: Alex Nails <alexj.nails@gmail.com>
Co-authored-by: root <root@tjzj-inf-sci-k8s-bzz2-0000.tjzj.baidu.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-19 17:23:51 -07:00 |
|
billishyahao
|
b74a9dd854
|
[AMD] fix tbo runtime error when initializing metadata for cuda graph (#22598)
|
2026-04-19 12:42:48 -07:00 |
|
Baizhou Zhang
|
6ecd6f84db
|
[CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-19 05:32:36 -07:00 |
|
Cheng Wan
|
5f7aee726a
|
refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-17 17:05:13 -07:00 |
|
Liangsheng Yin
|
09b689b407
|
Apply HF transformers patches from sglang init (#23103)
|
2026-04-17 15:37:51 -07:00 |
|
Liangsheng Yin
|
573e12a7fc
|
Merge /get_load into /v1/loads (#23010)
|
2026-04-17 13:36:51 -07:00 |
|
Lianmin Zheng
|
44e67c6835
|
Remove deprecated double sparsity feature (#23009)
|
2026-04-17 13:33:12 -07:00 |
|
Liangsheng Yin
|
3df35ecc80
|
Lower TestPiecewiseCudaGraphQwen25VL gsm8k threshold to 0.80 (#23099)
|
2026-04-17 13:31:10 -07:00 |
|
CYYYC0310
|
a12ea979d4
|
[test] Add GSM8K accuracy test for PP with mixed chunk prefill (#23029)
Co-authored-by: cyy <cy02433585@alibaba-inc.com>
|
2026-04-17 17:09:53 +08:00 |
|
narutolhy
|
5fa0c6a52e
|
Allow piecewise CUDA graph with speculative decoding (#22128)
Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-17 13:39:30 +08:00 |
|
blzheng
|
0dcfae5553
|
[CPU] Add gemma4_rmsnorm_cpu kernel (#22842)
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-17 13:03:16 +08:00 |
|
Chunyuan WU
|
6c89214584
|
[CPU][sgl-kernel] extend_attention_cpu and flash_attn_varlen_func: fix nan for large seq (#22434)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-17 13:01:01 +08:00 |
|
Byron Hsu
|
cf9845f8e3
|
[Bug Fix] Ensure prefill_info_table is populated before honoring disagg_prefill_dp_rank (#22990)
Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
|
2026-04-17 11:10:31 +08:00 |
|
Jan Bernlöhr
|
04a53955b9
|
feat: add coordinated checkpoint prefetch for network filesystem loading (#20843)
|
2026-04-16 20:08:19 -07:00 |
|
Khoa Pham
|
5a0eea8ac5
|
[CI] Adding Gemma 4 to Nightly CI (#22408)
|
2026-04-16 19:30:16 -07:00 |
|
Alison Shao
|
0052093178
|
test(4-gpu-b200): split test_qwen35_models.py + bump partitions 5→6 (#22913)
|
2026-04-16 18:51:59 -07:00 |
|
Liangsheng Yin
|
db7a751d48
|
refactor: extract FanOutCommunicator and use declarative spec table (#22967)
|
2026-04-16 15:37:19 -07:00 |
|
Xinyuan Tong
|
082eaed0a4
|
test: fix flaky required function calling assertion (#22890)
|
2026-04-16 09:44:26 -07:00 |
|
Zhangheng
|
14bcdfca21
|
[HiSparse]: Adding e2e ut for hisparse (#22979)
|
2026-04-16 23:20:07 +08:00 |
|
Liangsheng Yin
|
bbd8f9ba09
|
migrate CPU-only unit tests from openai_server to unit/ (#22965)
|
2026-04-16 03:53:33 -07:00 |
|
ybyang
|
fbd6dc3565
|
fix: normalize tool message content for GLM5.1 chat template (#22595)
|
2026-04-16 16:48:38 +08:00 |
|
Xinyuan Tong
|
34fef07a15
|
Upgrade transformers to 5.5.3 and refactor hf_transformers_utils into subpackage (#21569)
|
2026-04-15 20:03:44 -07:00 |
|
Liangsheng Yin
|
a4cf2ea128
|
streaming session: spec v2 bonus accounting + comprehensive test matrix (#22651)
|
2026-04-15 17:12:41 -07:00 |
|
Liangsheng Yin
|
f9792166c3
|
trim_overshoot: cap swa_evicted_seqlen + unit test (#22900)
|
2026-04-15 15:05:35 -07:00 |
|
Xinyu Zhang
|
13a2cd748d
|
[Ray] Add data parallel (DP) and DP attention support to RayEngine (#21887)
Co-authored-by: xyuzh <xyuzh@users.noreply.github.com>
|
2026-04-15 15:00:48 -07:00 |
|
Sundara Raman Ramachandran
|
4927975427
|
[Score API] Add return_pooled_hidden_states to Scoring API for SequenceClassification / RewardModel (#22427)
|
2026-04-15 14:58:56 -07:00 |
|
Kurt Shuster
|
32d9fe5a32
|
[lora] Speedup triton backend sgemm calls with better grid (#22386)
|
2026-04-15 13:47:07 -07:00 |
|
Liangsheng Yin
|
aa78564e1a
|
Refactor streaming session abort handling (#22790)
|
2026-04-15 00:13:05 -07:00 |
|
Michael
|
39c6bf730c
|
[AMD][CI] Add GLM-5-MXFP4 accuracy and perf nightly tests for MI35x (#21773)
|
2026-04-14 18:55:36 -07:00 |
|
Ke Bao
|
3c0a6c6987
|
Add page_size and SWA coverage to unified radix cache bench test (#22815)
|
2026-04-14 23:58:05 +08:00 |
|
Bi Xue
|
070c6a2489
|
[sgl] perf optimization for eplb (#21232)
|
2026-04-14 22:52:17 +08:00 |
|
Ke Bao
|
9f9e0231bb
|
Refactor unified radix cache UT into parameterized test suite (#22812)
|
2026-04-14 22:34:33 +08:00 |
|
Jincong Chen
|
6760c790bd
|
[bugfix] avoid attention padding tokens computation in pcg (#17706)
|
2026-04-14 16:08:23 +08:00 |
|
Michael
|
eab045b2b7
|
[AMD] Add MiniMax-M2.7 accuracy and performance nightly tests (#22722)
Co-authored-by: HaiShaw <hixiao@gmail.com>
|
2026-04-14 00:30:11 -07:00 |
|
huangtingwei
|
e9d6b9eb2d
|
[HiCache & HybridModel] mooncake backend support DSA & mamba model (#21259)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-04-13 18:47:36 -07:00 |
|
Liangsheng Yin
|
33a3ba256f
|
Delete dead rematch path in SessionAwareCache.release_session (#22735)
|
2026-04-13 17:02:40 -07:00 |
|
Kurt Shuster
|
ff13dfee45
|
[lora][moe] Virtual experts for LoRA MoE (#22122)
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
|
2026-04-13 21:19:30 +00:00 |
|
Asish Kumar
|
39810762d2
|
fix: use describe mode for SGLang version detection (#22600)
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
|
2026-04-13 09:45:45 -07:00 |
|