Thomas Wang
|
671fe73961
|
Reduce unnecessary kernels and copies in the NSA indexer (#22232)
|
2026-04-07 15:37:08 -07:00 |
|
David Wang
|
f08726fd56
|
[Feature] Add DFLASH speculative decoding support (#22077)
Co-authored-by: Jian Chen <141193260+jianc99@users.noreply.github.com>
Co-authored-by: Zhijian Liu <5782437+zhijian-liu@users.noreply.github.com>
Co-authored-by: Richard Gong <8001209+gongy@users.noreply.github.com>
Co-authored-by: David Wang <21328423+dcw02@users.noreply.github.com>
Co-authored-by: yilian49 <43861414+yilian49@users.noreply.github.com>
Co-authored-by: xm:D <38322020+xiaomin-d@users.noreply.github.com>
|
2026-04-07 14:48:51 -07:00 |
|
Liangsheng Yin
|
cc35714b03
|
[tiny] migrate /get_server_info; print accept length in accuracy tests (#22282)
|
2026-04-07 13:08:35 -07:00 |
|
Rain Jiang
|
1a8eb890f6
|
Kernels community fa3 (#20796)
|
2026-04-07 12:48:44 -07:00 |
|
huangtingwei
|
0c204fbd57
|
[HiSparse] Optimize the scheduling of decode backup. (#21932)
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2026-04-07 10:34:58 -07:00 |
|
khalilzhk
|
6131fb5882
|
[NPU] enable mla prepare fused kernel only when being mla attn (#22024)
|
2026-04-08 00:49:16 +08:00 |
|
Ke Bao
|
be42fbbbd7
|
Support HTTP2 server (#21700)
|
2026-04-08 00:42:52 +08:00 |
|
shuwenn
|
ec5742f4ab
|
fix: Auto-correct page_size for Mamba no_buffer radix cache mode (#20538)
|
2026-04-08 00:19:31 +08:00 |
|
Henson-Zh-Ali
|
727a182067
|
[Mamba] eliminate D2H if tracking mamba states (#20522)
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2026-04-08 00:17:26 +08:00 |
|
YAMY
|
5ae00ecd48
|
[Disagg][NIXL] Support Mamba state slice transfer for heterogeneous TP (Step 2/2 for Qwen3.5) (#22240)
|
2026-04-07 23:47:31 +08:00 |
|
Mick
|
e7bc23cdab
|
[diffusion] CI: fix consistency check (#22251)
|
2026-04-07 23:43:18 +08:00 |
|
Yujun Dong
|
233f3e31bf
|
fix(pcg,mm): fix zeroing of input_embeds when replay PCG (#22229)
|
2026-04-07 20:33:17 +08:00 |
|
Xingyu Liu
|
98f38b14df
|
Add registration API for external linear attention backend (#21983)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
|
2026-04-07 02:47:40 -07:00 |
|
Nicolas Castet
|
490fa9fa44
|
[Perf] Restore torch.compile fusion for topk postprocessing (#21771)
|
2026-04-07 01:38:38 -07:00 |
|
YAMY
|
3148742ddb
|
[Disagg][NIXL] Fix heterogeneous TP KV transfer for non-MLA models (same logic with mooncake, Step 1/2 for Qwen3.5 support) (#22145)
|
2026-04-07 14:52:02 +08:00 |
|
amote-i
|
3f7dfba419
|
fix qwen2_5_math_rm_72b (#21295)
|
2026-04-07 14:36:57 +08:00 |
|
Aditya Sharma
|
f6e85676b5
|
model: support qwen3-asr (#22073)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2026-04-07 13:27:05 +08:00 |
|
Chang Min Bark
|
a757c1e3fb
|
[Apple Silicon] [MLX] Add mlx and mlx-lm dependencies (#22162)
Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>
|
2026-04-07 11:36:43 +08:00 |
|
Xinyuan Tong
|
2813cb6d9a
|
[New Model] Gemma 4 (#21952)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Pengyu Chen <pychen96@gmail.com>
Co-authored-by: kpham-sgl <khoa.pham@radixark.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Andy Luo <andy.luo@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: adarshxs <adarsh.shirawalmath@gmail.com>
|
2026-04-06 20:24:44 -07:00 |
|
jianzhao-xu
|
73fc87a74f
|
fix(grok): adapt huihui-ai/grok-2 (#21522)
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
|
2026-04-07 10:04:41 +08:00 |
|
Lianmin Zheng
|
494bb86169
|
Cache sub-objects in __getitem__ to ensure identity stability (#22184)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-06 18:53:38 -07:00 |
|
Prozac614
|
ef2d4013d7
|
[diffusion] CI: add consistency test (#15236)
Co-authored-by: daiweitao <dwti614707404@163.com>
|
2026-04-07 09:50:23 +08:00 |
|
Liangsheng Yin
|
e4b1366a46
|
[Spec][Ngram] Support multiple SAMs with dynamic HTTP API (#22203)
|
2026-04-06 18:49:22 -07:00 |
|
Liangsheng Yin
|
49cb7d546e
|
Move hash utils out of hicache_storage to break CUDA import chain (#22214)
|
2026-04-06 18:16:40 -07:00 |
|
AichenF
|
5e2b0f860c
|
[diffusion] perf: replace Conv3d with reshape + F.linear in PatchEmbed (#21014)
|
2026-04-07 09:12:59 +08:00 |
|
shadowxz109
|
ae38b24cc3
|
[NPU] Support dp-attention for MiniMax2.5 (#20919)
|
2026-04-07 08:55:37 +08:00 |
|
Trevor Morris
|
5cc246e095
|
Fix extra calls to get_numa_node_if_available to clean up logs (#21781)
|
2026-04-06 16:18:40 -07:00 |
|
Lianmin Zheng
|
a80961333b
|
Clean up req_time_stats: reduce overhead and simplify (#22186)
|
2026-04-06 14:20:51 -07:00 |
|
Qiaolin Yu
|
93f38fe410
|
tiny fix chain-style multi layer eagle comments (#22206)
|
2026-04-06 13:49:03 -07:00 |
|
Tarushii Goel
|
8f337682bd
|
[sgl] potential chained spec v2 fixes (#22041)
Co-authored-by: Mook <Godmook@users.noreply.github.com>
Co-authored-by: yudian0504 <yudian0504@users.noreply.github.com>
|
2026-04-06 13:38:04 -07:00 |
|
Ratish P
|
7f2fcc0b08
|
[VLM]: allow Qwen3.5 models for encoder disaggregation (#21849)
|
2026-04-07 02:07:24 +08:00 |
|
Aurick Qiao
|
3178f3959f
|
Align incremental streaming logprobs with streamed output tokens (#21583)
|
2026-04-06 00:30:02 -07:00 |
|
Khoa Pham
|
12272b6791
|
[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton (#21425)
|
2026-04-06 00:11:14 -07:00 |
|
Liangsheng Yin
|
6de2ff2a80
|
[Spec][Ngram] Followup fixes for MatchState incremental advance (#22180)
|
2026-04-05 23:04:28 -07:00 |
|
YAMY
|
dc125afffb
|
Add staging buffer CI test and documentation for heterogeneous TP (#21921)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-04-06 14:00:20 +08:00 |
|
Khoa Pham
|
b2008bf9e0
|
[Spec][Ngram] 5/N: Store and advance anchor match state across decode steps (#21243)
|
2026-04-05 22:21:05 -07:00 |
|
Mick
|
82c41a2d9e
|
[diffusion] model: support LTX2.3 (#22111)
|
2026-04-06 12:26:30 +08:00 |
|
Qiaolin Yu
|
f407461ec8
|
Tiny fix trtllm_fp8_per_tensor_scale_moe_wrapper router_logits dtype (#22006)
|
2026-04-05 21:11:45 -07:00 |
|
Lianmin Zheng
|
e835601fb7
|
Cache gfx95 quant format detection in DeepseekV2DecoderLayer (#22143)
|
2026-04-05 20:20:54 -07:00 |
|
Bi Xue
|
52801ff20c
|
[sgl] two potential spec_v2 bug fixes (#21589)
Co-authored-by: yilian49 <yilian49@users.noreply.github.com>
|
2026-04-05 19:41:43 -07:00 |
|
Prozac614
|
2f00e42555
|
[diffusion] CI: apply diffusers backend in lora case (#22157)
Co-authored-by: daiweitao <dwti614707404@163.com>
|
2026-04-06 10:14:35 +08:00 |
|
Zhiqiang Xie
|
41c7c97ff3
|
fix hisparse LRU policy (#22170)
Co-authored-by: huangtingwei9988 <huangtingwei9988@users.noreply.github.com>
Co-authored-by: hzh0425 <hzh0425@users.noreply.github.com>
|
2026-04-05 18:47:58 -07:00 |
|
Kangyan-Zhou
|
93109cc89b
|
[Fix] Fix setuptools-scm version resolution for rc tags (#22165)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-04-05 16:55:32 -07:00 |
|
Zhiqiang Xie
|
30ba1f78b0
|
Hisparse Minor Fix (#22131)
Co-authored-by: huangtingwei9988 <141888744+huangtingwei9988@users.noreply.github.com>
Co-authored-by: hzh0425 <58988019+hzh0425@users.noreply.github.com>
|
2026-04-05 16:15:47 -07:00 |
|
Kangyan-Zhou
|
5dd2c243eb
|
fix: TRT-LLM MHA CUDA illegal address with EAGLE v2 + DP attention (#21649)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-04-05 09:41:14 -07:00 |
|
Baizhou Zhang
|
c5fa364b80
|
[Hotfix] Fix router gemm on sm103 (#22134)
|
2026-04-05 09:33:14 -07:00 |
|
Zhangheng
|
51b276de74
|
[BugFix][RadixTree]: Fix backup invariant violation in Hi-MambaRadixTree (#22062)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: linjianyu77@foxmail.com
|
2026-04-05 23:19:50 +08:00 |
|
Shangming Cai
|
dccb11881f
|
[PD] Fix staging warmup for GQA prefill decode different tp (#22153)
|
2026-04-05 23:13:06 +08:00 |
|
Liangsheng Yin
|
df9c831ab8
|
Unify think_end_id to model_config as single source of truth (#22148)
|
2026-04-05 03:35:38 -07:00 |
|
Liangsheng Yin
|
aeff9fb7c1
|
Add dump_metric to MMMU, lm-eval, and NeMo Skills eval paths (#22147)
|
2026-04-05 03:23:52 -07:00 |
|