Commit Graph

7528 Commits

Author SHA1 Message Date
Lianmin Zheng
a80961333b Clean up req_time_stats: reduce overhead and simplify (#22186) 2026-04-06 14:20:51 -07:00
Qiaolin Yu
93f38fe410 tiny fix chain-style multi layer eagle comments (#22206) 2026-04-06 13:49:03 -07:00
Tarushii Goel
8f337682bd [sgl] potential chained spec v2 fixes (#22041)
Co-authored-by: Mook <Godmook@users.noreply.github.com>
Co-authored-by: yudian0504 <yudian0504@users.noreply.github.com>
2026-04-06 13:38:04 -07:00
Ratish P
7f2fcc0b08 [VLM]: allow Qwen3.5 models for encoder disaggregation (#21849) 2026-04-07 02:07:24 +08:00
Aurick Qiao
3178f3959f Align incremental streaming logprobs with streamed output tokens (#21583) 2026-04-06 00:30:02 -07:00
Khoa Pham
12272b6791 [Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton (#21425) 2026-04-06 00:11:14 -07:00
Liangsheng Yin
6de2ff2a80 [Spec][Ngram] Followup fixes for MatchState incremental advance (#22180) 2026-04-05 23:04:28 -07:00
YAMY
dc125afffb Add staging buffer CI test and documentation for heterogeneous TP (#21921)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2026-04-06 14:00:20 +08:00
Khoa Pham
b2008bf9e0 [Spec][Ngram] 5/N: Store and advance anchor match state across decode steps (#21243) 2026-04-05 22:21:05 -07:00
Mick
82c41a2d9e [diffusion] model: support LTX2.3 (#22111) 2026-04-06 12:26:30 +08:00
Qiaolin Yu
f407461ec8 Tiny fix trtllm_fp8_per_tensor_scale_moe_wrapper router_logits dtype (#22006) 2026-04-05 21:11:45 -07:00
Lianmin Zheng
e835601fb7 Cache gfx95 quant format detection in DeepseekV2DecoderLayer (#22143) 2026-04-05 20:20:54 -07:00
Bi Xue
52801ff20c [sgl] two potential spec_v2 bug fixes (#21589)
Co-authored-by: yilian49 <yilian49@users.noreply.github.com>
2026-04-05 19:41:43 -07:00
Prozac614
2f00e42555 [diffusion] CI: apply diffusers backend in lora case (#22157)
Co-authored-by: daiweitao <dwti614707404@163.com>
2026-04-06 10:14:35 +08:00
Zhiqiang Xie
41c7c97ff3 fix hisparse LRU policy (#22170)
Co-authored-by: huangtingwei9988 <huangtingwei9988@users.noreply.github.com>
Co-authored-by: hzh0425 <hzh0425@users.noreply.github.com>
2026-04-05 18:47:58 -07:00
Kangyan-Zhou
93109cc89b [Fix] Fix setuptools-scm version resolution for rc tags (#22165)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-04-05 16:55:32 -07:00
Zhiqiang Xie
30ba1f78b0 Hisparse Minor Fix (#22131)
Co-authored-by: huangtingwei9988 <141888744+huangtingwei9988@users.noreply.github.com>
Co-authored-by: hzh0425 <58988019+hzh0425@users.noreply.github.com>
2026-04-05 16:15:47 -07:00
Kangyan-Zhou
5dd2c243eb fix: TRT-LLM MHA CUDA illegal address with EAGLE v2 + DP attention (#21649)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-04-05 09:41:14 -07:00
Baizhou Zhang
c5fa364b80 [Hotfix] Fix router gemm on sm103 (#22134) 2026-04-05 09:33:14 -07:00
Zhangheng
51b276de74 [BugFix][RadixTree]: Fix backup invariant violation in Hi-MambaRadixTree (#22062)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: linjianyu77@foxmail.com
2026-04-05 23:19:50 +08:00
Shangming Cai
dccb11881f [PD] Fix staging warmup for GQA prefill decode different tp (#22153) 2026-04-05 23:13:06 +08:00
Liangsheng Yin
df9c831ab8 Unify think_end_id to model_config as single source of truth (#22148) 2026-04-05 03:35:38 -07:00
Liangsheng Yin
aeff9fb7c1 Add dump_metric to MMMU, lm-eval, and NeMo Skills eval paths (#22147) 2026-04-05 03:23:52 -07:00
Liangsheng Yin
cd2d45e220 Isolate spec V1 path in decode post-processing (#22146) 2026-04-05 03:16:56 -07:00
R0CKSTAR
10b18b8b29 [diffusion] Add is_float64_supported to Platform (#22112)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2026-04-05 18:12:28 +08:00
iLeGend
5a35316417 Enable IndexCache for DeepSeek V3.2 (#21405)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-04-05 02:45:58 -07:00
Baizhou Zhang
088203454b [Fix] Fix nightly tests (#22140) 2026-04-05 02:26:42 -07:00
Liangsheng Yin
bd6a585605 Consolidate reasoning tests into test/registered/reasoning/ (#22139) 2026-04-05 01:09:11 -07:00
Yuhao Yang
2b119ba388 [diffusion] fix: fix accuracy for flux series (#22059)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-05 16:03:17 +08:00
Xiancheng Meng
71544f0341 [model] support voxtral (speech-to-text) (#21635)
Co-authored-by: mengxiancheng03 <mengxiancheng03@kuaishou.com>
2026-04-05 15:46:30 +08:00
Liangsheng Yin
904bb476d8 Migrate reasoning_tokens tests to existing server fixtures (#22102) 2026-04-05 00:30:57 -07:00
RoyWang
dd49127fe6 [AMD]: Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5… (#21213)
Co-authored-by: RoyWang <RoyWang@amd.com>
2026-04-04 22:13:29 -07:00
Ricardo-M-L
70658bfeb5 fix: add missing f-string prefixes in warning and assert messages (#22067)
Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 17:20:16 -07:00
Mick
efee62efa6 [diffusion] CI: improve diffusion comparison benchmark setting for realistic perf and auto-discover ut (#22086)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 23:20:37 +08:00
Xiaoyu Zhang
0f0f004f1f [Benchmark] Add auto benchmark tool with YAML-driven server flag search and canonical dataset format (#21736) 2026-04-04 21:46:58 +08:00
Xiaoyu Zhang
da25b471e3 Align diffusion nightly presets and broaden skill discovery (#22099) 2026-04-04 21:43:52 +08:00
Liangsheng Yin
abc297521f Fix killall_sglang missing the main sglang serve process (#22103) 2026-04-04 03:43:08 -07:00
Muqi Li
1ad6839659 [Feature] Add Reasoning Tokens Usage (#15562)
Signed-off-by: Muqi Li <muqi1029@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Mufeez Amjad <mufeez.amjad@outlook.com>
Co-authored-by: cklxx <1293822641@qq.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2026-04-04 02:18:10 -07:00
Baizhou Zhang
bf984ae65d Revert "[Bugfix] Temporarily skip TRTLLM attention on (G)B300 (SM103) to avoid high-concurrency hang" (#22098) 2026-04-04 02:17:19 -07:00
sglang-bot
46bf19cdab chore: bump flashinfer version to 0.6.7.post2 (#22097)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2026-04-04 02:16:25 -07:00
narutolhy
24763256b9 [Speculative Decoding] Add FA4-based Spec Support (#21080)
Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>
2026-04-04 02:09:45 -07:00
Yuhao Yang
34d5765e2f [VLM] Chunk-aware ViT encoding with per-image cache and lazy device transfer (#22038) 2026-04-04 16:55:17 +08:00
Piotr Mazurek
b5e8c4b9e3 model: support LFM2-VL (Liquid Foundation Model 2 Vision-Language) (#21230)
Co-authored-by: Piotr Mazurek <piotr.mazurek@liquid.ai>
2026-04-04 16:36:04 +08:00
R0CKSTAR
1fb4bf3558 [diffusion] fix: validate attention backend for Ring Attention in USPAttention (#21828)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2026-04-04 16:24:02 +08:00
harrisonlimh
9fa12d605a Add dsv3 router gemm benchmark on blackwell (#17707) 2026-04-04 01:18:01 -07:00
Xiaoyu Zhang
82ea4906cf [diffusion] Default NVFP4 to CUTLASS and add all-model shape benchmarks (#22091) 2026-04-04 16:14:38 +08:00
Ethan (Yusheng) Su
ff8e47edf9 [5/n] Lora support cuda graph (#21647) 2026-04-04 00:31:46 -07:00
Douglas Yang
a94c3804c2 fix: mistral embedding regression fix (#21913) 2026-04-04 00:11:51 -07:00
Chi McIsaac
005e582d06 [diffusion] improve: norm fusion for z-image (#18762)
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
Co-authored-by: yihanc <yingluosanqian@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-04 14:01:01 +08:00
Qiaolin Yu
ef13031243 Tiny fix step3.5-flash launch crash (#22076) 2026-04-03 22:25:25 -07:00