Baizhou Zhang | 776709efe8 | [3/n] deepseek_v2.py Refactor: Migrate MLA forward method in deepseek_v2.py (#19122) | 2026-02-27 13:37:29 -08:00
wufann | 7e46aafebb | [AMD] Enable cudagraph for aiter nsa backend and add aiter impl for nsa pr… (#18526) | 2026-02-27 13:18:32 -08:00
Shu Wang | 1b75d0d1a9 | Fix BatchMLAPagedAttentionWrapper query/qo_inptr mismatch for EAGLE (#15601) | 2026-02-27 11:35:45 -08:00
ishandhanani | 6a1480ce45 | Fix HiCacheNixl TypeError: mem_pool_host passed as file_path (#19517) | 2026-02-27 10:59:32 -08:00
Mohammad Miadh Angkad | 35ef38c61b | Remove gpt-oss hybrid swa gate for trtllm_mha (#19079) | 2026-02-27 10:30:00 -08:00
Michael | 1b79934d34 | [AMD] Fix AMD CI test of TestToolChoiceLfm2Moe (#19113); Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>; Co-authored-by: bingxche <Bingxu.Chen@amd.com>; Co-authored-by: yctseng0211 <yctseng@amd.com> | 2026-02-27 10:18:15 -08:00
R0CKSTAR | fe4bc8ebd5 | [diffusion] fix: MulAdd 4D path (shift indexing) (#18673); Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> | 2026-02-28 01:52:57 +08:00
Makcum888e | b1249ac909 | [Diffusion] [NPU] [CI] fix CI performance (#19486) | 2026-02-27 18:23:02 +03:00
Yuan Luo | d2885a9094 | [Qwen3-Next] Support gdn fused_rms_norm_gated (#19434); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2026-02-27 23:08:08 +08:00
joesun | ca5f2e2ed1 | [diffusion] fix: Support default response_format=url in /v1/images/generations to avoid 400 errors when response_format is omitted (#19360); Co-authored-by: Makcum888e <79456407+Makcum888e@users.noreply.github.com>; Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2026-02-27 19:47:38 +08:00
AMD-yanfeiwang | f69ca93d49 | [AMD] remove redundancy H2D op in aiter attention backend (#19416); Co-authored-by: root <root@mia1-p01-g28.mia.tensorwave.lan> | 2026-02-27 02:15:55 -08:00
Yilong Zhao | 07da4bed7b | [cache] add conservative estimation (#19482) | 2026-02-27 18:14:46 +08:00
fxmarty-amd | 9496bbd7b1 | [AMD] Use tilelang as default NSA attention backend dispatch on AMD Instinct (#18319) | 2026-02-27 01:43:34 -08:00
ympcMark | 43fade5f69 | [4/N] (Elastic EP) Back up Expert Weights in DRAM (#17374); Co-authored-by: UNIDY2002 <unidy2002@outlook.com> | 2026-02-27 15:59:13 +08:00
KnightLTC | eef44ec916 | [NPU]kimi k2 thinking bugfix (#19387); Co-authored-by: sglang-npu-bot <sglangnpu@163.com> | 2026-02-27 15:36:38 +08:00
Cao E | 4f0f6cd9d0 | Add torch.compile support for qwen3-next on CPU (#12444) | 2026-02-26 23:28:03 -08:00
KnightLTC | bc9190435b | llama4 npu adapt (#17123); Co-authored-by: cy <chenyang08056032@163.com>; Co-authored-by: sglang-npu-bot <sglangnpu@163.com> | 2026-02-27 14:52:37 +08:00
Ken J | f0c2089597 | [vlm][internVL] Support processor and embedding inputs for InternVL (#19127) | 2026-02-26 22:46:48 -08:00
triple-mu | 8a56cc5836 | [diffusion] model: Fix a performance bug in the Wan model about usp (#19340) | 2026-02-27 14:36:23 +08:00
elvischenv | af3eccc9ab | [Perf] Eliminate the slice op for Flashinfer trtllm_fp4_block_scale_moe (#15731); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2026-02-26 22:23:40 -08:00
wingedge777 | d566816d83 | fix qwen3_vl visual module loading (#19333) | 2026-02-26 19:44:05 -08:00
Alison Shao | 52c8a3632a | Fix missing StandardCombineInput import in BF16 flashinfer_trtllm MoE (#19400); Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local> | 2026-02-26 18:32:08 -08:00
Alison Shao | c2dce06d9f | Fix parallel tool call test for speculative decoding variants (#19370); Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net>; Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local> | 2026-02-26 18:31:20 -08:00
littleyellowbicycle | 5b5c509480 | [NPU][feature adapt]remote load weight feature adp npu (#17968); Co-authored-by: littleYellowBicycle <liguo29@huawei.com>; Co-authored-by: sglang-npu-bot <sglangnpu@163.com> | 2026-02-27 10:10:36 +08:00
Yizhong Cao | e567215e44 | [bugfix]fix fa4 decoding (#19388); Co-authored-by: caoyizhong.cyz <caoyizhong.cyz@alibaba-inc.com> | 2026-02-27 09:48:21 +08:00
Sam (Kesen Li) | 5194eef88b | [Fix] A followup fix for TRTLLM BF16 MoE (#15303) | 2026-02-26 17:14:20 -08:00
billishyahao | f0d78f2e20 | [AMD] Fix weight load shape mismatch for amd dsr1 0528 mxfp4 (#19425) | 2026-02-26 17:11:59 -08:00
fzyzcjy | 4add6ec0f6 | Enhance displaying and debuggability in dump comparator (#19466) | 2026-02-27 09:06:19 +08:00
fzyzcjy | 23cbbd6d41 | Integrate packed data context parallel in dump comparator (#19464) | 2026-02-27 08:28:36 +08:00
Liangsheng Yin | 1a32c0db4d | [Minor] Rename misleading chunked to reusing in ReqToTokenPool.alloc() (#19465) | 2026-02-26 16:24:03 -08:00
fzyzcjy | 8293a914a6 | Support token align with packed CP data in dump comparator (#19463) | 2026-02-27 08:12:54 +08:00
fzyzcjy | 695e93b91f | Make reorderer support packed format with CP in dump comparator (#19462) | 2026-02-27 08:12:18 +08:00
fzyzcjy | e3cdf6b1a3 | Support CP packed format in unsharder in dump comparator (#19461) | 2026-02-27 08:11:47 +08:00
fzyzcjy | f2d1b7c4ad | Support sequence parallel and trivial dims in dump comparator (#19460) | 2026-02-27 08:11:15 +08:00
fzyzcjy | eb2ada3804 | Support non-packed format when aligning tokens in dump comparator (#19459) | 2026-02-27 08:10:29 +08:00
fzyzcjy | e1e0cfd856 | Use named tensors in dump comparator (#19458) | 2026-02-27 08:09:55 +08:00
fzyzcjy | eb0e905fc3 | Update token layout and cleanup printer in dump comparator (#19457) | 2026-02-27 08:09:15 +08:00
fzyzcjy | 8ac64e1487 | Support unifying axis ordering in dump comparator (#19456) | 2026-02-27 08:08:32 +08:00
fzyzcjy | 425d333ee3 | Support token dim in arbitrary location in dump comparator (#19455) | 2026-02-27 08:07:38 +08:00
Thomas Wang | 5172c37845 | [AMD] Use fused GEMM with FP8 cast for FP8 prefill (#19422) | 2026-02-26 15:14:52 -08:00
RoyWang | a1ef8e2cc0 | [AMD] optimize Kimi K2.5 fused_moe_triton performance by tuning (#19228) | 2026-02-26 11:50:13 -08:00
Shangming Cai | 288300aafd | [PD] Tiny code cleanup for prefill info registering (#19414); Signed-off-by: Shangming Cai <csmthu@gmail.com> | 2026-02-27 01:35:26 +08:00
zhangheng | e4b708d3e9 | [Spec V2] Support specV2 for mamba hybrid attention (#18808); Co-authored-by: Yi Zhong <207368749+vincentzed@users.noreply.github.com>; Co-authored-by: yizhang2077 <1109276519@qq.com>; Co-authored-by: Hanming Lu <hanming@x.ai> | 2026-02-27 00:36:01 +08:00
DefTruth | 78d6674c45 | [diffusion] feat: support hybrid parallelism for diffusers backend (#19405) | 2026-02-27 00:06:08 +08:00
Shangming Cai | e55e65535e | [Bugfix] Add rids to the batch filtering for two batch overlap (#19418) | 2026-02-26 06:57:25 -08:00
Shangming Cai | 97f1fa5e6b | [NPU] Fix disaggregation metadata buffer bootstrap_room_dtype for npu backend (#19423) | 2026-02-26 21:10:50 +08:00
khalilzhk | 86eb80007e | [NPU] support Kimi-K2.5 on NPU (#19331) | 2026-02-26 20:41:44 +08:00
AlfredYong | bdc1e46e5a | [Qwen3.5] Qwen3.5-27B inference repeat bug fix (#19411) | 2026-02-26 20:11:29 +08:00
Xiaoyu Zhang | 74c8e7b215 | refactor(jit_kernel): reduce duplication and separate test code (#19323) | 2026-02-26 18:30:49 +08:00
Junhao Liu | a7152df2e3 | [diffusion ] CLI: Fix typo in CLI usage doc string (#19316) | 2026-02-26 13:24:14 +03:00