Liangsheng Yin
cd2d45e220
Isolate spec V1 path in decode post-processing ( #22146 )
2026-04-05 03:16:56 -07:00
R0CKSTAR
10b18b8b29
[diffusion] Add is_float64_supported to Platform ( #22112 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2026-04-05 18:12:28 +08:00
iLeGend
5a35316417
Enable IndexCache for DeepSeek V3.2 ( #21405 )
...
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-04-05 02:45:58 -07:00
Baizhou Zhang
088203454b
[Fix] Fix nightly tests ( #22140 )
2026-04-05 02:26:42 -07:00
Liangsheng Yin
bd6a585605
Consolidate reasoning tests into test/registered/reasoning/ ( #22139 )
2026-04-05 01:09:11 -07:00
Yuhao Yang
2b119ba388
[diffusion] fix: fix accuracy for flux series ( #22059 )
...
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-05 16:03:17 +08:00
Xiancheng Meng
71544f0341
[model] support voxtral (speech-to-text) ( #21635 )
...
Co-authored-by: mengxiancheng03 <mengxiancheng03@kuaishou.com>
2026-04-05 15:46:30 +08:00
Liangsheng Yin
904bb476d8
Migrate reasoning_tokens tests to existing server fixtures ( #22102 )
2026-04-05 00:30:57 -07:00
RoyWang
dd49127fe6
[AMD]: Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5… ( #21213 )
...
Co-authored-by: RoyWang <RoyWang@amd.com>
2026-04-04 22:13:29 -07:00
Ricardo-M-L
70658bfeb5
fix: add missing f-string prefixes in warning and assert messages ( #22067 )
...
Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 17:20:16 -07:00
Mick
efee62efa6
[diffusion] CI: improve diffusion comparison benchmark setting for realistic perf and auto-discover ut ( #22086 )
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 23:20:37 +08:00
Xiaoyu Zhang
0f0f004f1f
[Benchmark] Add auto benchmark tool with YAML-driven server flag search and canonical dataset format ( #21736 )
2026-04-04 21:46:58 +08:00
Xiaoyu Zhang
da25b471e3
Align diffusion nightly presets and broaden skill discovery ( #22099 )
2026-04-04 21:43:52 +08:00
Liangsheng Yin
abc297521f
Fix killall_sglang missing the main sglang serve process ( #22103 )
2026-04-04 03:43:08 -07:00
Muqi Li
1ad6839659
[Feature] Add Reasoning Tokens Usage ( #15562 )
...
Signed-off-by: Muqi Li <muqi1029@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Mufeez Amjad <mufeez.amjad@outlook.com>
Co-authored-by: cklxx <1293822641@qq.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2026-04-04 02:18:10 -07:00
Baizhou Zhang
bf984ae65d
Revert "[Bugfix] Temporarily skip TRTLLM attention on (G)B300 (SM103) to avoid high-concurrency hang" ( #22098 )
2026-04-04 02:17:19 -07:00
sglang-bot
46bf19cdab
chore: bump flashinfer version to 0.6.7.post2 ( #22097 )
...
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-04 02:16:25 -07:00
narutolhy
24763256b9
[Speculative Decoding] Add FA4-based Spec Support ( #21080 )
...
Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>
2026-04-04 02:09:45 -07:00
Yuhao Yang
34d5765e2f
[VLM] Chunk-aware ViT encoding with per-image cache and lazy device transfer ( #22038 )
2026-04-04 16:55:17 +08:00
Piotr Mazurek
b5e8c4b9e3
model: support LFM2-VL (Liquid Foundation Model 2 Vision-Language) ( #21230 )
...
Co-authored-by: Piotr Mazurek <piotr.mazurek@liquid.ai>
2026-04-04 16:36:04 +08:00
R0CKSTAR
1fb4bf3558
[diffusion] fix: validate attention backend for Ring Attention in USPAttention ( #21828 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2026-04-04 16:24:02 +08:00
harrisonlimh
9fa12d605a
Add dsv3 router gemm benchmark on blackwell ( #17707 )
2026-04-04 01:18:01 -07:00
Xiaoyu Zhang
82ea4906cf
[diffusion] Default NVFP4 to CUTLASS and add all-model shape benchmarks ( #22091 )
2026-04-04 16:14:38 +08:00
Ethan (Yusheng) Su
ff8e47edf9
[5/n] Lora support cuda graph ( #21647 )
2026-04-04 00:31:46 -07:00
Douglas Yang
a94c3804c2
fix: mistral embedding regression fix ( #21913 )
2026-04-04 00:11:51 -07:00
Chi McIsaac
005e582d06
[diffusion] improve: norm fusion for z-image ( #18762 )
...
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
Co-authored-by: yihanc <yingluosanqian@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-04 14:01:01 +08:00
Qiaolin Yu
ef13031243
Tiny fix step3.5-flash launch crash ( #22076 )
2026-04-03 22:25:25 -07:00
Ziang Li
990c7590b8
[RL] Support mxfp8 DeepSeek V3 ( #21280 )
2026-04-03 21:57:45 -07:00
faceless void
de9859073f
Add --stream-response-default-include-usage server flag ( #16711 )
2026-04-03 21:36:00 -07:00
CHEN Xi
31c9d8e885
[Diffusion] Fix weight scale swizzle and add large-M kernel config for FLUX.2-dev-NVFP4 ( #22064 )
2026-04-04 11:50:30 +08:00
Yilong Zhao
fe92f3563c
dp: add profile req hook ( #22083 )
2026-04-03 20:47:09 -07:00
Yuxuan Zhang
b7ae3b5a9a
GLM-4.7 and GLM-4.7-Flash Loading and import format ( #21851 )
2026-04-03 20:44:08 -07:00
Prozac614
db3d4f4b76
[diffusion] model: support two stage pipeline of LTX-2 ( #20707 )
...
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: GMI Xiao Jin <xiao.j@gmicloud.ai>
2026-04-04 09:37:28 +08:00
Liangsheng Yin
95cdbce34f
[Test] Extract common PD server setup into base fixture ( #22080 )
2026-04-03 16:37:12 -07:00
Lawrence Wu
9593d434c4
fix: pause_generation should not populate running_batch on prefill nodes ( #20273 )
2026-04-03 16:16:06 -07:00
Sundara Raman Ramachandran
90e86800f4
[Score API] Implement EngineScoreMixin for scoring functionality and refactor Tok… ( #21342 )
2026-04-03 15:17:42 -07:00
Baizhou Zhang
ac1e437f6a
Revert "[Feature] JIT activation and update skills (by codex)" ( #22078 )
2026-04-03 15:04:15 -07:00
Mohammad Miadh Angkad
8cb337c8ea
[Bugfix] Temporarily skip TRTLLM attention on (G)B300 (SM103) to avoid high-concurrency hang ( #21906 )
...
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-04-03 14:19:13 -07:00
Yz Xiao
1d7a53dd03
[Fix] XGrammarGrammarBackend reset to clear inherited cache ( #22054 )
2026-04-03 14:17:59 -07:00
sglang-bot
84118acf50
chore: bump sglang-kernel version to 0.4.1 ( #22009 )
...
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-03 13:58:35 -07:00
Lianmin Zheng
eb407b80f3
[Kernel] Make FA3/FA4 imports lazy in FlashAttentionBackend ( #22028 )
2026-04-03 13:49:00 -07:00
Brayden Zhong
6aafe756b9
Revert "[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+… ( #22047 )
2026-04-03 13:12:30 -07:00
Shiyan Deng
0c9dc098e7
Fix DP attention worker port binding for IPv6 support ( #21917 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
2026-04-03 12:39:39 -07:00
Zhangheng
ed3435e37f
[HiSparse]: Optimize server args checking-HiSparse is temporarily only available for DSA models. ( #22065 )
2026-04-04 02:23:56 +08:00
Mick
151f727163
[diffusion] fix: fix gated repo failing the generate cmd ( #22040 )
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 00:43:11 +08:00
DarkSharpness
44e5d35703
[Feature][JIT Kernel] JIT activation and update skills (by codex) ( #21766 )
...
Co-authored-by: weiminc <tnwilly@gmail.com>
2026-04-03 23:28:54 +08:00
Mick
030fb1c4b1
refactor: replace mm_inputs dict with MultimodalProcessorOutput ( #21738 )
2026-04-03 23:26:37 +08:00
Ke Bao
9f409d0749
[CI] Adjust CI server launch timeout ( #22045 )
2026-04-03 22:38:07 +08:00
Xiaoyu Zhang
ee9d922f5a
Revert "[Kernel] Fuse temperature + softmax in sampling for decode speedup" ( #22046 )
2026-04-03 21:32:08 +08:00
Kangyan-Zhou
56ac9c9932
[Fix] Add _MOE_TP to graph_capture for MoE models with ep>1 ( #21907 )
...
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-04-03 02:33:16 -07:00