Liangsheng Yin
|
0cb7295698
|
Fix streaming session busy-check double-counting via active_pool_idxs (#22753)
|
2026-04-14 13:11:06 -07:00 |
|
mingyue300
|
b4616dcbf5
|
[BugFix] Fix EAGLE speculative decoding missing grammar-based finish … (#21723)
|
2026-04-14 12:43:50 -07:00 |
|
Mick
|
d2f479e544
|
[diffusion] chore: auto-enable best parallel setting if unspecified (#22763)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-15 00:02:05 +08:00 |
|
Bi Xue
|
070c6a2489
|
[sgl] perf optimization for eplb (#21232)
|
2026-04-14 22:52:17 +08:00 |
|
Mick
|
c5e95080d2
|
[diffusion] model: support Ltx 2.3 two stage ti2v (#22667)
|
2026-04-14 22:10:08 +08:00 |
|
lawtherWu
|
454228e071
|
hicache storage backend mooncake support ascend hixl (#20016)
|
2026-04-14 20:51:06 +08:00 |
|
Jia Guo
|
6da3aba6a5
|
perf: optimize PCG inductor path for FP8 models (#21734)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-14 17:51:27 +08:00 |
|
xutizhou
|
3cb3f7c018
|
fix: EPLB dispatch OOB when shared experts fusion enabled under DeepEP (#22525)
|
2026-04-14 02:33:27 -07:00 |
|
Jincong Chen
|
6760c790bd
|
[bugfix] avoid attention padding tokens computation in pcg (#17706)
|
2026-04-14 16:08:23 +08:00 |
|
Michael
|
eab045b2b7
|
[AMD] Add MiniMax-M2.7 accuracy and performance nightly tests (#22722)
Co-authored-by: HaiShaw <hixiao@gmail.com>
|
2026-04-14 00:30:11 -07:00 |
|
xiaobochen-amd
|
d7ecab5113
|
[ROCm]fix(aiter): cast fp8 prefill output back to model dtype (#22626)
Co-authored-by: kk <43161300+kkHuang-amd@users.noreply.github.com>
|
2026-04-14 00:25:09 -07:00 |
|
Xiaoyu Zhang
|
f97c608caa
|
[diffusion] quant: add FLUX.1-dev modelopt nvfp4 support (#22672)
|
2026-04-14 15:00:59 +08:00 |
|
Colin Z
|
b10f852118
|
GLM-5/5.1 MXFP4 Checkpoint Inference Compatibility Fix (#22543)
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-04-13 23:56:48 -07:00 |
|
YAMY
|
657945c338
|
Replace all-reduce + dp_scatter with reduce_scatterv for DP attention (#22642)
|
2026-04-13 21:51:10 -07:00 |
|
ishandhanani
|
520ce526b9
|
Restore Qwen3 rope config fallback (#22739)
|
2026-04-13 21:47:37 -07:00 |
|
Xuwei
|
a9a2ae4a68
|
[Anthropic] Fix clock mismatch in received_time causing negative Prometheus metrics (#22247)
Signed-off-by: Xuwei Li <lixuwei.xy@gmail.com>
|
2026-04-13 21:22:00 -07:00 |
|
huangtingwei
|
e9d6b9eb2d
|
[HiCache & HybridModel] mooncake backend support DSA & mamba model (#21259)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-04-13 18:47:36 -07:00 |
|
ishandhanani
|
cc449ac4e5
|
feat(metrics): expose raw KV cache pool token counts as prometheus gauges (#22726)
|
2026-04-13 18:30:36 -07:00 |
|
huangtingwei
|
945d73824f
|
[HiSparse] Clarify decode token usage logs (#22331)
|
2026-04-13 18:03:25 -07:00 |
|
yuki-brook
|
1ec018f27a
|
[Feature] Add SiMM as sglang HiCache Storage backend (#18016)
|
2026-04-13 17:12:37 -07:00 |
|
Liangsheng Yin
|
33a3ba256f
|
Delete dead rematch path in SessionAwareCache.release_session (#22735)
|
2026-04-13 17:02:40 -07:00 |
|
Lianmin Zheng
|
9fb00ede15
|
Clean up TokenizerManager and req_time_stats: reduce overhead and simplify (#21646)
|
2026-04-13 16:47:32 -07:00 |
|
Jia Guo
|
a2b5111962
|
perf: skip KV cache in FA backend for embedding mode (#21971)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-13 16:27:52 -07:00 |
|
Lianmin Zheng
|
8f9553bccb
|
[Misc] Migrate SGLANG_SET_CPU_AFFINITY to envs and refactor model config building (#22730)
|
2026-04-13 16:10:31 -07:00 |
|
mqhc2020
|
f4f9e68189
|
[AMD] Add MoE weights and scales padding (#21097)
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-04-13 15:50:15 -07:00 |
|
Yilong Zhao
|
b1efce342c
|
env: add knob to control SWA eviction interval (#22645)
|
2026-04-13 15:37:59 -07:00 |
|
Lianmin Zheng
|
f81b6e8f51
|
[Misc] Add @cache_once to is_arch_support_pdl in jit_kernel (#22724)
|
2026-04-13 14:42:49 -07:00 |
|
Baizhou Zhang
|
b441317aa4
|
Revert "Upgrade CI default CUDA version from 12.9 to 13.0" (#22727)
|
2026-04-13 14:39:24 -07:00 |
|
Lianmin Zheng
|
ba7bcca6b3
|
Use reshape instead of contiguous().view() in TRTLLMHAAttnBackend (#22517)
|
2026-04-13 14:29:12 -07:00 |
|
Kurt Shuster
|
ff13dfee45
|
[lora][moe] Virtual experts for LoRA MoE (#22122)
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
|
2026-04-13 21:19:30 +00:00 |
|
ishandhanani
|
6b2bf66cd9
|
fix[glm4.7 flash]: properly detect gfx95_quant_format (#22720)
|
2026-04-13 13:10:07 -07:00 |
|
Asish Kumar
|
39810762d2
|
fix: use describe mode for SGLang version detection (#22600)
Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
|
2026-04-13 09:45:45 -07:00 |
|
DarkSharpness
|
314d6ecf08
|
[Feature][JIT Kernel] Fused TP QK norm For Minimax (#20673)
Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
|
2026-04-13 20:29:47 +08:00 |
|
Xiaole Guo
|
4df60434d7
|
[diffusion] model: support stable-diffusion-3-medium-diffusers (#19225)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: Kangrui Du <kangruidu@gmail.com>
Co-authored-by: Xiaole Guo <gxlvera@gmail.com>
|
2026-04-13 16:07:06 +08:00 |
|
Chandrakant Khandelwal
|
1e9eecfa36
|
[Intel GPU] Enable sgl-kernel-xpu fused_experts MoE kernel path for GPT-OSS bf16 models. (#22417)
|
2026-04-13 13:45:48 +08:00 |
|
Mick
|
d524f110ac
|
[diffusion] refactor: streamline denoising stages (#22633)
|
2026-04-13 13:34:37 +08:00 |
|
Polisetty V R K Jyothendra Varma
|
7d2c11970c
|
[Intel GPU] Upgrade pytorch xpu version to 2.11 (#21908)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-13 13:16:24 +08:00 |
|
Zhangheng
|
5549d910c6
|
[hisparse]: Adding ci for hisparse kvcache-swap-in jit-kernel (#22155)
|
2026-04-13 12:50:29 +08:00 |
|
Zhangheng
|
305b42935a
|
[HiSparse]: Add benchmark for hisparse kernel (#22187)
|
2026-04-13 12:49:18 +08:00 |
|
Alison Shao
|
3f4fbc165d
|
Upgrade CI default CUDA version from 12.9 to 13.0 (#21441)
|
2026-04-12 21:48:40 -07:00 |
|
Mohammad Miadh Angkad
|
4dbd59850b
|
Add bfloat16 KV cache validation for HiSparse (#22505)
|
2026-04-13 12:41:42 +08:00 |
|
Xiaoyu Zhang
|
fae0a2fc3c
|
[codex] Add LTX-2.3 benchmark skill recipes (#22631)
|
2026-04-13 12:23:32 +08:00 |
|
Mick
|
bf022e177c
|
Revert "[Diffusion] Add FLUX.1-dev ModelOpt NVFP4 support (#22574)" (#22649)
|
2026-04-13 11:17:32 +08:00 |
|
Zhangheng
|
bc59cc0f96
|
[RaidxTree Refactor]: Support Unified HybridRadixTree V2 (#21206)
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: xiezhq-hermann <xiezhq@stanford.edu>
|
2026-04-13 10:28:22 +08:00 |
|
Ziang Li
|
5593539942
|
[RL] Refactor NVFP4 shuffling/swizzling to in-place replacement (#22204)
|
2026-04-12 19:08:45 -07:00 |
|
blzheng
|
934e19a610
|
[CPU] Fix argument issues in qkv_proj_with_rope_fused_weight and bmm… (#21367)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-04-13 09:59:13 +08:00 |
|
Liangsheng Yin
|
da6b8e1448
|
Extract pause_resume_in_place kit; rename test_abort to test_scheduler_control (#22647)
|
2026-04-12 18:49:37 -07:00 |
|
Lawrence Wu
|
28e40d873c
|
fix(PD): respect pause_generation in disagg event loops (#20908)
|
2026-04-12 18:07:51 -07:00 |
|
ishandhanani
|
c1ab68b45e
|
fix: streaming session race condition + some metrics (#21875)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-04-12 18:05:23 -07:00 |
|
Xiaoyu Zhang
|
37fc47c645
|
diffusion: fix layerwise offload for ModelOpt quantized DiTs (#22594)
|
2026-04-13 08:01:54 +08:00 |
|