Lianmin Zheng
|
a80961333b
|
Clean up req_time_stats: reduce overhead and simplify (#22186)
|
2026-04-06 14:20:51 -07:00 |
|
Qiaolin Yu
|
93f38fe410
|
tiny fix chain-style multi layer eagle comments (#22206)
|
2026-04-06 13:49:03 -07:00 |
|
Tarushii Goel
|
8f337682bd
|
[sgl] potential chained spec v2 fixes (#22041)
Co-authored-by: Mook <Godmook@users.noreply.github.com>
Co-authored-by: yudian0504 <yudian0504@users.noreply.github.com>
|
2026-04-06 13:38:04 -07:00 |
|
Ratish P
|
7f2fcc0b08
|
[VLM]: allow Qwen3.5 models for encoder disaggregation (#21849)
|
2026-04-07 02:07:24 +08:00 |
|
Aurick Qiao
|
3178f3959f
|
Align incremental streaming logprobs with streamed output tokens (#21583)
|
2026-04-06 00:30:02 -07:00 |
|
Khoa Pham
|
12272b6791
|
[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton (#21425)
|
2026-04-06 00:11:14 -07:00 |
|
Liangsheng Yin
|
6de2ff2a80
|
[Spec][Ngram] Followup fixes for MatchState incremental advance (#22180)
|
2026-04-05 23:04:28 -07:00 |
|
YAMY
|
dc125afffb
|
Add staging buffer CI test and documentation for heterogeneous TP (#21921)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-04-06 14:00:20 +08:00 |
|
Khoa Pham
|
b2008bf9e0
|
[Spec][Ngram] 5/N: Store and advance anchor match state across decode steps (#21243)
|
2026-04-05 22:21:05 -07:00 |
|
Mick
|
82c41a2d9e
|
[diffusion] model: support LTX2.3 (#22111)
|
2026-04-06 12:26:30 +08:00 |
|
Qiaolin Yu
|
f407461ec8
|
Tiny fix trtllm_fp8_per_tensor_scale_moe_wrapper router_logits dtype (#22006)
|
2026-04-05 21:11:45 -07:00 |
|
Lianmin Zheng
|
e835601fb7
|
Cache gfx95 quant format detection in DeepseekV2DecoderLayer (#22143)
|
2026-04-05 20:20:54 -07:00 |
|
Bi Xue
|
52801ff20c
|
[sgl] two potential spec_v2 bug fixes (#21589)
Co-authored-by: yilian49 <yilian49@users.noreply.github.com>
|
2026-04-05 19:41:43 -07:00 |
|
Prozac614
|
2f00e42555
|
[diffusion] CI: apply diffusers backend in lora case (#22157)
Co-authored-by: daiweitao <dwti614707404@163.com>
|
2026-04-06 10:14:35 +08:00 |
|
Zhiqiang Xie
|
41c7c97ff3
|
fix hisparse LRU policy (#22170)
Co-authored-by: huangtingwei9988 <huangtingwei9988@users.noreply.github.com>
Co-authored-by: hzh0425 <hzh0425@users.noreply.github.com>
|
2026-04-05 18:47:58 -07:00 |
|
Kangyan-Zhou
|
93109cc89b
|
[Fix] Fix setuptools-scm version resolution for rc tags (#22165)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-04-05 16:55:32 -07:00 |
|
Zhiqiang Xie
|
30ba1f78b0
|
Hisparse Minor Fix (#22131)
Co-authored-by: huangtingwei9988 <141888744+huangtingwei9988@users.noreply.github.com>
Co-authored-by: hzh0425 <58988019+hzh0425@users.noreply.github.com>
|
2026-04-05 16:15:47 -07:00 |
|
Kangyan-Zhou
|
5dd2c243eb
|
fix: TRT-LLM MHA CUDA illegal address with EAGLE v2 + DP attention (#21649)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-04-05 09:41:14 -07:00 |
|
Baizhou Zhang
|
c5fa364b80
|
[Hotfix] Fix router gemm on sm103 (#22134)
|
2026-04-05 09:33:14 -07:00 |
|
Zhangheng
|
51b276de74
|
[BugFix][RadixTree]: Fix backup invariant violation in Hi-MambaRadixTree (#22062)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: linjianyu77@foxmail.com
|
2026-04-05 23:19:50 +08:00 |
|
Shangming Cai
|
dccb11881f
|
[PD] Fix staging warmup for GQA prefill decode different tp (#22153)
|
2026-04-05 23:13:06 +08:00 |
|
Liangsheng Yin
|
df9c831ab8
|
Unify think_end_id to model_config as single source of truth (#22148)
|
2026-04-05 03:35:38 -07:00 |
|
Liangsheng Yin
|
aeff9fb7c1
|
Add dump_metric to MMMU, lm-eval, and NeMo Skills eval paths (#22147)
|
2026-04-05 03:23:52 -07:00 |
|
Liangsheng Yin
|
cd2d45e220
|
Isolate spec V1 path in decode post-processing (#22146)
|
2026-04-05 03:16:56 -07:00 |
|
R0CKSTAR
|
10b18b8b29
|
[diffusion] Add is_float64_supported to Platform (#22112)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-04-05 18:12:28 +08:00 |
|
iLeGend
|
5a35316417
|
Enable IndexCache for DeepSeek V3.2 (#21405)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-04-05 02:45:58 -07:00 |
|
Baizhou Zhang
|
088203454b
|
[Fix] Fix nightly tests (#22140)
|
2026-04-05 02:26:42 -07:00 |
|
Liangsheng Yin
|
bd6a585605
|
Consolidate reasoning tests into test/registered/reasoning/ (#22139)
|
2026-04-05 01:09:11 -07:00 |
|
Yuhao Yang
|
2b119ba388
|
[diffusion] fix: fix accuracy for flux series (#22059)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-05 16:03:17 +08:00 |
|
Xiancheng Meng
|
71544f0341
|
[model] support voxtral (speech-to-text) (#21635)
Co-authored-by: mengxiancheng03 <mengxiancheng03@kuaishou.com>
|
2026-04-05 15:46:30 +08:00 |
|
Liangsheng Yin
|
904bb476d8
|
Migrate reasoning_tokens tests to existing server fixtures (#22102)
|
2026-04-05 00:30:57 -07:00 |
|
RoyWang
|
dd49127fe6
|
[AMD]: Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5… (#21213)
Co-authored-by: RoyWang <RoyWang@amd.com>
|
2026-04-04 22:13:29 -07:00 |
|
Ricardo-M-L
|
70658bfeb5
|
fix: add missing f-string prefixes in warning and assert messages (#22067)
Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-04 17:20:16 -07:00 |
|
Mick
|
efee62efa6
|
[diffusion] CI: improve diffusion comparison benchmark setting for realistic perf and auto-discover ut (#22086)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-04 23:20:37 +08:00 |
|
Xiaoyu Zhang
|
0f0f004f1f
|
[Benchmark] Add auto benchmark tool with YAML-driven server flag search and canonical dataset format (#21736)
|
2026-04-04 21:46:58 +08:00 |
|
Xiaoyu Zhang
|
da25b471e3
|
Align diffusion nightly presets and broaden skill discovery (#22099)
|
2026-04-04 21:43:52 +08:00 |
|
Liangsheng Yin
|
abc297521f
|
Fix killall_sglang missing the main sglang serve process (#22103)
|
2026-04-04 03:43:08 -07:00 |
|
Muqi Li
|
1ad6839659
|
[Feature] Add Reasoning Tokens Usage (#15562)
Signed-off-by: Muqi Li <muqi1029@gmail.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Mufeez Amjad <mufeez.amjad@outlook.com>
Co-authored-by: cklxx <1293822641@qq.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-04-04 02:18:10 -07:00 |
|
Baizhou Zhang
|
bf984ae65d
|
Revert "[Bugfix] Temporarily skip TRTLLM attention on (G)B300 (SM103) to avoid high-concurrency hang" (#22098)
|
2026-04-04 02:17:19 -07:00 |
|
sglang-bot
|
46bf19cdab
|
chore: bump flashinfer version to 0.6.7.post2 (#22097)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2026-04-04 02:16:25 -07:00 |
|
narutolhy
|
24763256b9
|
[Speculative Decoding] Add FA4-based Spec Support (#21080)
Co-authored-by: luhongyu.4869 <luhongyu.4869@bytedance.com>
|
2026-04-04 02:09:45 -07:00 |
|
Yuhao Yang
|
34d5765e2f
|
[VLM] Chunk-aware ViT encoding with per-image cache and lazy device transfer (#22038)
|
2026-04-04 16:55:17 +08:00 |
|
Piotr Mazurek
|
b5e8c4b9e3
|
model: support LFM2-VL (Liquid Foundation Model 2 Vision-Language) (#21230)
Co-authored-by: Piotr Mazurek <piotr.mazurek@liquid.ai>
|
2026-04-04 16:36:04 +08:00 |
|
R0CKSTAR
|
1fb4bf3558
|
[diffusion] fix: validate attention backend for Ring Attention in USPAttention (#21828)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-04-04 16:24:02 +08:00 |
|
harrisonlimh
|
9fa12d605a
|
Add dsv3 router gemm benchmark on blackwell (#17707)
|
2026-04-04 01:18:01 -07:00 |
|
Xiaoyu Zhang
|
82ea4906cf
|
[diffusion] Default NVFP4 to CUTLASS and add all-model shape benchmarks (#22091)
|
2026-04-04 16:14:38 +08:00 |
|
Ethan (Yusheng) Su
|
ff8e47edf9
|
[5/n] Lora support cuda graph (#21647)
|
2026-04-04 00:31:46 -07:00 |
|
Douglas Yang
|
a94c3804c2
|
fix: mistral embedding regression fix (#21913)
|
2026-04-04 00:11:51 -07:00 |
|
Chi McIsaac
|
005e582d06
|
[diffusion] improve: norm fusion for z-image (#18762)
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
Co-authored-by: yihanc <yingluosanqian@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-04 14:01:01 +08:00 |
|
Qiaolin Yu
|
ef13031243
|
Tiny fix step3.5-flash launch crash (#22076)
|
2026-04-03 22:25:25 -07:00 |
|