6584 Commits

Author SHA1 Message Date
Alison Shao
52c8a3632a Fix missing StandardCombineInput import in BF16 flashinfer_trtllm MoE (#19400)
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
2026-02-26 18:32:08 -08:00
Alison Shao
c2dce06d9f Fix parallel tool call test for speculative decoding variants (#19370)
Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net>
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
2026-02-26 18:31:20 -08:00
littleyellowbicycle
5b5c509480 [NPU][feature adapt]remote load weight feature adp npu (#17968)
Co-authored-by: littleYellowBicycle <liguo29@huawei.com>
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
2026-02-27 10:10:36 +08:00
Yizhong Cao
e567215e44 [bugfix]fix fa4 decoding (#19388)
Co-authored-by: caoyizhong.cyz <caoyizhong.cyz@alibaba-inc.com>
2026-02-27 09:48:21 +08:00
Sam (Kesen Li)
5194eef88b [Fix] A followup fix for TRTLLM BF16 MoE (#15303) 2026-02-26 17:14:20 -08:00
billishyahao
f0d78f2e20 [AMD] Fix weight load shape mismatch for amd dsr1 0528 mxfp4 (#19425) 2026-02-26 17:11:59 -08:00
fzyzcjy
4add6ec0f6 Enhance displaying and debuggability in dump comparator (#19466) 2026-02-27 09:06:19 +08:00
fzyzcjy
23cbbd6d41 Integrate packed data context parallel in dump comparator (#19464) 2026-02-27 08:28:36 +08:00
Liangsheng Yin
1a32c0db4d [Minor] Rename misleading chunked to reusing in ReqToTokenPool.alloc() (#19465) 2026-02-26 16:24:03 -08:00
fzyzcjy
8293a914a6 Support token align with packed CP data in dump comparator (#19463) 2026-02-27 08:12:54 +08:00
fzyzcjy
695e93b91f Make reorderer support packed format with CP in dump comparator (#19462) 2026-02-27 08:12:18 +08:00
fzyzcjy
e3cdf6b1a3 Support CP packed format in unsharder in dump comparator (#19461) 2026-02-27 08:11:47 +08:00
fzyzcjy
f2d1b7c4ad Support sequence parallel and trivial dims in dump comparator (#19460) 2026-02-27 08:11:15 +08:00
fzyzcjy
eb2ada3804 Support non-packed format when aligning tokens in dump comparator (#19459) 2026-02-27 08:10:29 +08:00
fzyzcjy
e1e0cfd856 Use named tensors in dump comparator (#19458) 2026-02-27 08:09:55 +08:00
fzyzcjy
eb0e905fc3 Update token layout and cleanup printer in dump comparator (#19457) 2026-02-27 08:09:15 +08:00
fzyzcjy
8ac64e1487 Support unifying axis ordering in dump comparator (#19456) 2026-02-27 08:08:32 +08:00
fzyzcjy
425d333ee3 Support token dim in arbitrary location in dump comparator (#19455) 2026-02-27 08:07:38 +08:00
Thomas Wang
5172c37845 [AMD] Use fused GEMM with FP8 cast for FP8 prefill (#19422) 2026-02-26 15:14:52 -08:00
RoyWang
a1ef8e2cc0 [AMD] optimize Kimi K2.5 fused_moe_triton performance by tuning (#19228) 2026-02-26 11:50:13 -08:00
Shangming Cai
288300aafd [PD] Tiny code cleanup for prefill info registering (#19414)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-02-27 01:35:26 +08:00
zhangheng
e4b708d3e9 [Spec V2] Support specV2 for mamba hybrid attention (#18808)
Co-authored-by: Yi Zhong <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: Hanming Lu <hanming@x.ai>
2026-02-27 00:36:01 +08:00
DefTruth
78d6674c45 [diffusion] feat: support hybrid parallelism for diffusers backend (#19405) 2026-02-27 00:06:08 +08:00
Shangming Cai
e55e65535e [Bugfix] Add rids to the batch filtering for two batch overlap (#19418) 2026-02-26 06:57:25 -08:00
Shangming Cai
97f1fa5e6b [NPU] Fix disaggregation metadata buffer bootstrap_room_dtype for npu backend (#19423) 2026-02-26 21:10:50 +08:00
khalilzhk
86eb80007e [NPU] support Kimi-K2.5 on NPU (#19331) 2026-02-26 20:41:44 +08:00
AlfredYong
bdc1e46e5a [Qwen3.5] Qwen3.5-27B inference repeat bug fix (#19411) 2026-02-26 20:11:29 +08:00
Xiaoyu Zhang
74c8e7b215 refactor(jit_kernel): reduce duplication and separate test code (#19323) 2026-02-26 18:30:49 +08:00
Junhao Liu
a7152df2e3 [diffusion ] CLI: Fix typo in CLI usage doc string (#19316) 2026-02-26 13:24:14 +03:00
Shangming Cai
27fd014726 [PD] Add kv_cache_dtype consistency check for PD Disaggregation (#19407)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-02-26 17:15:58 +08:00
Yilong Zhao
de3d1e7669 [misc] use ORJSONResponse in http-server generate (#19191) 2026-02-25 21:26:25 -08:00
Alison Shao
0fd44ff342 Fix NSA CP positions mismatch in eagle NextN model (#19367) 2026-02-25 20:14:33 -08:00
Xinyu Zhang
119c91cb8b Skip signal handler registration when not on main thread (#18752) 2026-02-25 19:30:05 -08:00
Minglei Zhu
b3202fe6d0 [PCG] fix piecewise cuda graph for Qwen3.5 (#19220) 2026-02-26 11:16:52 +08:00
Alison Shao
a0a8f1473c [Benchmark] Fix generated_shared_prefix attribute naming and remove args dependency (#19363)
Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net>
Co-authored-by: sglang-bot <sglangbot@gmail.com>
2026-02-25 18:45:54 -08:00
sglang-bot
6e82183f5a [Disagg] Route disagg prefill results through process_batch_result (#19364) 2026-02-25 18:38:39 -08:00
fzyzcjy
265eb56d44 Support multi-step alignment and pipeline integration in dump comparator (#19378) 2026-02-26 10:23:22 +08:00
Yuan Luo
4e843f1216 [DeepSeek-V3.2][JIT-kernel] Support nsa fuse store indexer k cache (#19148)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: DarkSharpness <76582120+darksharpness@users.noreply.github.com>
2026-02-26 10:23:10 +08:00
fzyzcjy
f9a2f0398f Support token aligner planning and execution in dump comparator (#19377) 2026-02-26 10:04:33 +08:00
fzyzcjy
d34d5aca07 Support loading token aligner data in dump comparator (#19376) 2026-02-26 10:03:56 +08:00
fzyzcjy
e8dd14519d Add aligner entrypoint and bundle handler in dump comparator (#19375) 2026-02-26 10:03:22 +08:00
pansicheng
2ad475b4ed use flashinfer.sampling (#18696) 2026-02-26 10:02:38 +08:00
fzyzcjy
2739d7df62 Reorganize modules and pipeline in dump comparator (#19374) 2026-02-26 10:00:13 +08:00
fzyzcjy
508b8e3387 Handle warnings via sink for structured output and add pair in dump comparator (#19373) 2026-02-26 09:59:15 +08:00
fzyzcjy
46321ee70e Support dumping rid for correlation across passes in dump comparator (#19372) 2026-02-26 09:57:57 +08:00
Yuan Luo
7c9e8e2def [Re-land][jit kernel] Support per_token_group_quant_8bit jit kernel (#19140)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu>
2026-02-26 09:53:57 +08:00
Linyu Wu
beabaa8d37 [Kernel Slimming] Migrate marlin moe kernel to JIT (#19181)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-02-26 09:05:13 +08:00
Daniel Cámpora
350190487b Flashinfer MOE FP8 support for Mistral Large 3. (#15422)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-02-25 15:00:37 -08:00
Liangsheng Yin
c60dcc40bb [Logging] Guard log_prefill_stats against idle batches in disagg prefill (#19361) 2026-02-25 13:31:52 -08:00
YAMY
08957c88ea [Logging] Fix prefill side logging in pd disagg (#19350) 2026-02-25 12:42:18 -08:00