Alison Shao
|
52c8a3632a
|
Fix missing StandardCombineInput import in BF16 flashinfer_trtllm MoE (#19400)
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
|
2026-02-26 18:32:08 -08:00 |
|
Alison Shao
|
c2dce06d9f
|
Fix parallel tool call test for speculative decoding variants (#19370)
Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net>
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
|
2026-02-26 18:31:20 -08:00 |
|
littleyellowbicycle
|
5b5c509480
|
[NPU][feature adapt]remote load weight feature adp npu (#17968)
Co-authored-by: littleYellowBicycle <liguo29@huawei.com>
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-02-27 10:10:36 +08:00 |
|
Yizhong Cao
|
e567215e44
|
[bugfix]fix fa4 decoding (#19388)
Co-authored-by: caoyizhong.cyz <caoyizhong.cyz@alibaba-inc.com>
|
2026-02-27 09:48:21 +08:00 |
|
Sam (Kesen Li)
|
5194eef88b
|
[Fix] A followup fix for TRTLLM BF16 MoE (#15303)
|
2026-02-26 17:14:20 -08:00 |
|
billishyahao
|
f0d78f2e20
|
[AMD] Fix weight load shape mismatch for amd dsr1 0528 mxfp4 (#19425)
|
2026-02-26 17:11:59 -08:00 |
|
fzyzcjy
|
4add6ec0f6
|
Enhance displaying and debuggability in dump comparator (#19466)
|
2026-02-27 09:06:19 +08:00 |
|
fzyzcjy
|
23cbbd6d41
|
Integrate packed data context parallel in dump comparator (#19464)
|
2026-02-27 08:28:36 +08:00 |
|
Liangsheng Yin
|
1a32c0db4d
|
[Minor] Rename misleading chunked to reusing in ReqToTokenPool.alloc() (#19465)
|
2026-02-26 16:24:03 -08:00 |
|
fzyzcjy
|
8293a914a6
|
Support token align with packed CP data in dump comparator (#19463)
|
2026-02-27 08:12:54 +08:00 |
|
fzyzcjy
|
695e93b91f
|
Make reorderer support packed format with CP in dump comparator (#19462)
|
2026-02-27 08:12:18 +08:00 |
|
fzyzcjy
|
e3cdf6b1a3
|
Support CP packed format in unsharder in dump comparator (#19461)
|
2026-02-27 08:11:47 +08:00 |
|
fzyzcjy
|
f2d1b7c4ad
|
Support sequence parallel and trivial dims in dump comparator (#19460)
|
2026-02-27 08:11:15 +08:00 |
|
fzyzcjy
|
eb2ada3804
|
Support non-packed format when aligning tokens in dump comparator (#19459)
|
2026-02-27 08:10:29 +08:00 |
|
fzyzcjy
|
e1e0cfd856
|
Use named tensors in dump comparator (#19458)
|
2026-02-27 08:09:55 +08:00 |
|
fzyzcjy
|
eb0e905fc3
|
Update token layout and cleanup printer in dump comparator (#19457)
|
2026-02-27 08:09:15 +08:00 |
|
fzyzcjy
|
8ac64e1487
|
Support unifying axis ordering in dump comparator (#19456)
|
2026-02-27 08:08:32 +08:00 |
|
fzyzcjy
|
425d333ee3
|
Support token dim in arbitrary location in dump comparator (#19455)
|
2026-02-27 08:07:38 +08:00 |
|
Thomas Wang
|
5172c37845
|
[AMD] Use fused GEMM with FP8 cast for FP8 prefill (#19422)
|
2026-02-26 15:14:52 -08:00 |
|
RoyWang
|
a1ef8e2cc0
|
[AMD] optimize Kimi K2.5 fused_moe_triton performance by tuning (#19228)
|
2026-02-26 11:50:13 -08:00 |
|
Shangming Cai
|
288300aafd
|
[PD] Tiny code cleanup for prefill info registering (#19414)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-27 01:35:26 +08:00 |
|
zhangheng
|
e4b708d3e9
|
[Spec V2] Support specV2 for mamba hybrid attention (#18808)
Co-authored-by: Yi Zhong <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: Hanming Lu <hanming@x.ai>
|
2026-02-27 00:36:01 +08:00 |
|
DefTruth
|
78d6674c45
|
[diffusion] feat: support hybrid parallelism for diffusers backend (#19405)
|
2026-02-27 00:06:08 +08:00 |
|
Shangming Cai
|
e55e65535e
|
[Bugfix] Add rids to the batch filtering for two batch overlap (#19418)
|
2026-02-26 06:57:25 -08:00 |
|
Shangming Cai
|
97f1fa5e6b
|
[NPU] Fix disaggregation metadata buffer bootstrap_room_dtype for npu backend (#19423)
|
2026-02-26 21:10:50 +08:00 |
|
khalilzhk
|
86eb80007e
|
[NPU] support Kimi-K2.5 on NPU (#19331)
|
2026-02-26 20:41:44 +08:00 |
|
AlfredYong
|
bdc1e46e5a
|
[Qwen3.5] Qwen3.5-27B inference repeat bug fix (#19411)
|
2026-02-26 20:11:29 +08:00 |
|
Xiaoyu Zhang
|
74c8e7b215
|
refactor(jit_kernel): reduce duplication and separate test code (#19323)
|
2026-02-26 18:30:49 +08:00 |
|
Junhao Liu
|
a7152df2e3
|
[diffusion ] CLI: Fix typo in CLI usage doc string (#19316)
|
2026-02-26 13:24:14 +03:00 |
|
Shangming Cai
|
27fd014726
|
[PD] Add kv_cache_dtype consistency check for PD Disaggregation (#19407)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-26 17:15:58 +08:00 |
|
Yilong Zhao
|
de3d1e7669
|
[misc] use ORJSONResponse in http-server generate (#19191)
|
2026-02-25 21:26:25 -08:00 |
|
Alison Shao
|
0fd44ff342
|
Fix NSA CP positions mismatch in eagle NextN model (#19367)
|
2026-02-25 20:14:33 -08:00 |
|
Xinyu Zhang
|
119c91cb8b
|
Skip signal handler registration when not on main thread (#18752)
|
2026-02-25 19:30:05 -08:00 |
|
Minglei Zhu
|
b3202fe6d0
|
[PCG] fix piecewise cuda graph for Qwen3.5 (#19220)
|
2026-02-26 11:16:52 +08:00 |
|
Alison Shao
|
a0a8f1473c
|
[Benchmark] Fix generated_shared_prefix attribute naming and remove args dependency (#19363)
Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net>
Co-authored-by: sglang-bot <sglangbot@gmail.com>
|
2026-02-25 18:45:54 -08:00 |
|
sglang-bot
|
6e82183f5a
|
[Disagg] Route disagg prefill results through process_batch_result (#19364)
|
2026-02-25 18:38:39 -08:00 |
|
fzyzcjy
|
265eb56d44
|
Support multi-step alignment and pipeline integration in dump comparator (#19378)
|
2026-02-26 10:23:22 +08:00 |
|
Yuan Luo
|
4e843f1216
|
[DeepSeek-V3.2][JIT-kernel] Support nsa fuse store indexer k cache (#19148)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: DarkSharpness <76582120+darksharpness@users.noreply.github.com>
|
2026-02-26 10:23:10 +08:00 |
|
fzyzcjy
|
f9a2f0398f
|
Support token aligner planning and execution in dump comparator (#19377)
|
2026-02-26 10:04:33 +08:00 |
|
fzyzcjy
|
d34d5aca07
|
Support loading token aligner data in dump comparator (#19376)
|
2026-02-26 10:03:56 +08:00 |
|
fzyzcjy
|
e8dd14519d
|
Add aligner entrypoint and bundle handler in dump comparator (#19375)
|
2026-02-26 10:03:22 +08:00 |
|
pansicheng
|
2ad475b4ed
|
use flashinfer.sampling (#18696)
|
2026-02-26 10:02:38 +08:00 |
|
fzyzcjy
|
2739d7df62
|
Reorganize modules and pipeline in dump comparator (#19374)
|
2026-02-26 10:00:13 +08:00 |
|
fzyzcjy
|
508b8e3387
|
Handle warnings via sink for structured output and add pair in dump comparator (#19373)
|
2026-02-26 09:59:15 +08:00 |
|
fzyzcjy
|
46321ee70e
|
Support dumping rid for correlation across passes in dump comparator (#19372)
|
2026-02-26 09:57:57 +08:00 |
|
Yuan Luo
|
7c9e8e2def
|
[Re-land][jit kernel] Support per_token_group_quant_8bit jit kernel (#19140)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu>
|
2026-02-26 09:53:57 +08:00 |
|
Linyu Wu
|
beabaa8d37
|
[Kernel Slimming] Migrate marlin moe kernel to JIT (#19181)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-02-26 09:05:13 +08:00 |
|
Daniel Cámpora
|
350190487b
|
Flashinfer MOE FP8 support for Mistral Large 3. (#15422)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-02-25 15:00:37 -08:00 |
|
Liangsheng Yin
|
c60dcc40bb
|
[Logging] Guard log_prefill_stats against idle batches in disagg prefill (#19361)
|
2026-02-25 13:31:52 -08:00 |
|
YAMY
|
08957c88ea
|
[Logging] Fix prefill side logging in pd disagg (#19350)
|
2026-02-25 12:42:18 -08:00 |
|