R0CKSTAR
87e50f20f6
[Apple Silicon][MLX] Cache seq_lens-derived tensors in BatchedDecodeContext (#23470)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-04-23 18:12:26 -07:00

Mick
c0166355ae
[diffusion] CI: minor refactor CI (#23576)
2026-04-24 08:48:31 +08:00

Cheng Wan
d9c72bdd2b
Skip unselected experts in flashinfer_trtllm (#23493)
2026-04-23 17:30:19 -07:00

Cheng Wan
000a2525e1
Move expert_mask_gpu from FusedMoE layer to StandardDispatcher (#23585)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 17:17:27 -07:00

Lianmin Zheng
95d021b523
Pre-set SWA cache location in CudaGraphRunner (#23552)
2026-04-23 16:51:29 -07:00

Lianmin Zheng
bb962b0046
Fix MoE no_combine: skip router weight in down projection (#23545)
2026-04-23 16:47:58 -07:00

Sundara Raman Ramachandran
cf88fdcc9c
Expose child process PIDs from Engine for health check support (#23320)
2026-04-23 16:44:49 -07:00

sglang-bot
f3b88e080a
chore: bump flashinfer version to 0.6.8.post1 (#23281)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-23 15:23:03 -07:00

Byron Hsu
17210350fd
[PD+DP] Allow PrefillDelayer in disaggregated-prefill mode (#23588)
2026-04-23 14:51:16 -07:00

Alex Nails
579bd0b152
[bug fix] has_fp8_weights_in_checkpoint: handle HF repo IDs, not just local paths (#23542)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 12:56:49 -07:00

WangHao-hw
80125febb1
[BUGFIX]Fix Ascend backend pre-allocated range in NPU Graph Mode. (#22778)
2026-04-24 01:23:35 +08:00

Jinghong Li
c6872fc8fb
Fix: fallback to torch API when NVML memory query is not supported (#23426)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2026-04-23 19:26:04 +03:00

Jie Hao
86ed0680d7
feat: add OpenTelemetry tracing to DiffGenerator (#21254)
2026-04-23 09:25:23 -07:00

Arseniy Mironov
76e4c5a1f8
[Diffusion][NPU][Bugfix] Ascend_fa crashes when sequence parallelism is used. (#23572)
Co-authored-by: Napkin-AI <arseniy.mironov.dev@gmail.com>
2026-04-23 19:21:30 +03:00

Baichuan
54e21bb3a5
[fix] Fix dynamic chunking profiling crash on GLM-5 models (#23060)
Co-authored-by: liubaichuan <liubaichuan@infini-ai.com>
2026-04-23 19:30:57 +08:00

Xinyi Song
cd459af4e2
[AMD] Use bpreshuffle FP8 blockscale GEMM to replace ABScale GEMM (#23319)
Co-authored-by: HaiShaw <hixiao@gmail.com>
2026-04-23 01:51:30 -07:00

Ethan (Yusheng) Su
2ef1a21d5e
[bug fix] fix: detect FP8 weights from safetensors header instead of ass… (#23414)
2026-04-23 14:49:57 +08:00

Kangyan-Zhou
f1a70b4666
[Observability] Add HTTP sidecar endpoints and FlushCache gRPC RPC for gRPC mode (#22500)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 23:06:10 -07:00

mispa-ms
3c5b1f0810
[diffusion] fix: fix --warmup-resolutions hang with --enable-cfg-parallel (#23198)
2026-04-23 13:39:20 +08:00

Kangyan-Zhou
18359aadc8
[CI] Lower GSM8K baselines for B200 nightly after eval unification (#22136)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-22 22:30:54 -07:00

HuangJi
9716599383
[diffusion] fix: avoid illegal memory access in qwen image (#22953)
2026-04-23 12:41:26 +08:00

Mick
4d3c7e781a
[diffusion] CI: do not retry consistency failures (#23517)
2026-04-23 12:39:37 +08:00

maocheng23
d3aa9128be
Change SGLANG_SIMULATE_ACC_METHOD to 'match-expected' (#23527)
2026-04-22 21:26:08 -07:00

Jimmy Shong
68a8ed9b11
[Fix/Kernel] Add JIT rmsnorm_hf kernel to fix transformers backend MMLU accuracy regression (#22931)
Co-authored-by: SGLang CI <ci@sglang.ai>
2026-04-23 12:00:31 +08:00

Liangsheng Yin
0f21fe924a
fix ngram greedy verify kwarg (#23521)
2026-04-22 20:49:54 -07:00

ori
887d380ace
[MUSA] Resolve output garbage in Context Parallel on MusaFlashAttentionBackend (#23270)
Co-authored-by: zhiguo.qin <zhiguo.qin@mthreads.com>
2026-04-22 20:22:20 -07:00

Liangsheng Yin
f611dd24f1
fix retrive -> retrieve typo (#23503)
Co-authored-by: SoluMilken <19161836+solumilken@users.noreply.github.com>
2026-04-22 16:35:04 -07:00

Yanbin Jiang
917d2aa1dc
[LoRA] Fix EP + per-expert MoE LoRA illegal memory access (#23178)
2026-04-22 14:22:32 -07:00

Sam Shleifer
b9e33d6a5b
Dual MoE CUDA graph capture for lora/nolora batches (#22809)
2026-04-22 14:11:11 -07:00

jianan-gu
ad0fc88810
[CPU] [Quantization] Add GPTQ/AWQ 4bits quantization support for CPU (#22685)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-04-22 13:34:02 -07:00

Byron Hsu
0b77284587
[minor] Make DEFAULT_FORCE_STREAM_INTERVAL configurable via SGLANG_FORCE_STREAM_INTERVAL (#23215)
2026-04-22 13:05:40 -07:00

JasonHe-WQ
f85e3140bf
Fix:fix(timeout): fix timeout not propagated (#21944)
2026-04-22 12:48:48 -07:00

Yuxuan Zhang
28cfd3d272
Support defer_loading field at function level for Chat Completions API (#22702)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2026-04-22 10:09:54 -07:00

Todobe
92f28e9ba8
[NPU]Fix GLM-4.7-Flash failed on NPU (#22509)
2026-04-23 01:06:58 +08:00

cctry
0addd185af
Fix /generate endpoint crash when sampling params contain null values (#23401)
2026-04-22 09:56:10 -07:00

Aleksi Vesanto
ac351c1f04
[diffusion] [AMD] model: allow AITER backends in Flux 2 pipeline (#22802)
2026-04-22 08:15:44 -07:00

Shenxiu Liu
8b78e0888c
Skip mamba_pool_idx revert for session requests in _get_new_batch_prefill_raw (#23327)
2026-04-22 22:28:06 +08:00

Mick
4323fce82a
fix: dot-boundary match in is_layer_skipped for FP8 modules_to_not_convert (#23467)
2026-04-22 22:16:22 +08:00

Shangming Cai
1c06a3d072
[CI] Move disaggregation basic CI back to 2-gpu suite (#23447)
2026-04-22 17:50:33 +08:00

Ming Yang
7b10f01d1c
[model_runner] Label forward steps in profile traces with mode and token counts (#23419)
2026-04-22 02:31:18 -07:00

inkcherry
1e34cd0ba5
PD streaming: batch notify + SSE fast path (#22658)
2026-04-22 02:21:02 -07:00

Fengyuan Yu
5c245d978f
[Diffusion] Add mixed-resolution benchmark support (for #20762) (#20863)
Signed-off-by: Fengyuan Yu <15fengyuan@gmail.com>
Co-authored-by: Fengyuan Yu <15fengyuan@gmail.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2026-04-22 09:22:19 +03:00

cctry
e39f0f4ff3
Use libdevice tanh and support 2D-strided tensors in fused softcap kernel (#23157)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-21 22:54:37 -07:00

Wenxuan Tan
c3ea2d7b92
Rename mixed_with_decode_tokens in mixed chunk prefill adder (#6506)
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2026-04-21 22:48:34 -07:00

Tarushii Goel
7607e4d180
py-spy without --native for ARM devices (#23410)
2026-04-21 20:45:52 -07:00

shuwenn
4befc31408
fix: pass v_head_dim to MHA KV pools and validate MiMo HiCache geometry (#23173)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 19:48:45 -07:00

MARATRIX
bf5e71dcec
[MUSA][19/N] Support HiCache with pin_memory allocator (#23361)
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
2026-04-21 19:45:53 -07:00

Piotr Mazurek
6cf0b004ca
[MoE] Add LFM2 MoE tuning support + tuned configs for H100/B200/MI325X (#22791)
Co-authored-by: Piotr Mazurek <piotr.mazurek@liquid.ai>
2026-04-21 18:32:05 -07:00

Byron Hsu
c090f71bf2
feat: enable SGLANG_PATCH_TOKENIZER by default (#23409)
2026-04-21 17:53:43 -07:00

hlu1
415f64e763
Add MambaPool kvcache offloading during retraction (#22493)
2026-04-22 08:51:03 +08:00