DefTruth
|
78d6674c45
|
[diffusion] feat: support hybrid parallelism for diffusers backend (#19405)
|
2026-02-27 00:06:08 +08:00 |
|
Shangming Cai
|
e55e65535e
|
[Bugfix] Add rids to the batch filtering for two batch overlap (#19418)
|
2026-02-26 06:57:25 -08:00 |
|
Shangming Cai
|
97f1fa5e6b
|
[NPU] Fix disaggregation metadata buffer bootstrap_room_dtype for npu backend (#19423)
|
2026-02-26 21:10:50 +08:00 |
|
khalilzhk
|
86eb80007e
|
[NPU] support Kimi-K2.5 on NPU (#19331)
|
2026-02-26 20:41:44 +08:00 |
|
AlfredYong
|
bdc1e46e5a
|
[Qwen3.5] Qwen3.5-27B inference repeat bug fix (#19411)
|
2026-02-26 20:11:29 +08:00 |
|
Xiaoyu Zhang
|
74c8e7b215
|
refactor(jit_kernel): reduce duplication and separate test code (#19323)
|
2026-02-26 18:30:49 +08:00 |
|
Junhao Liu
|
a7152df2e3
|
[diffusion] CLI: Fix typo in CLI usage doc string (#19316)
|
2026-02-26 13:24:14 +03:00 |
|
Shangming Cai
|
27fd014726
|
[PD] Add kv_cache_dtype consistency check for PD Disaggregation (#19407)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-26 17:15:58 +08:00 |
|
Yilong Zhao
|
de3d1e7669
|
[misc] use ORJSONResponse in http-server generate (#19191)
|
2026-02-25 21:26:25 -08:00 |
|
Alison Shao
|
0fd44ff342
|
Fix NSA CP positions mismatch in eagle NextN model (#19367)
|
2026-02-25 20:14:33 -08:00 |
|
Xinyu Zhang
|
119c91cb8b
|
Skip signal handler registration when not on main thread (#18752)
|
2026-02-25 19:30:05 -08:00 |
|
Minglei Zhu
|
b3202fe6d0
|
[PCG] fix piecewise cuda graph for Qwen3.5 (#19220)
|
2026-02-26 11:16:52 +08:00 |
|
Alison Shao
|
a0a8f1473c
|
[Benchmark] Fix generated_shared_prefix attribute naming and remove args dependency (#19363)
Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net>
Co-authored-by: sglang-bot <sglangbot@gmail.com>
|
2026-02-25 18:45:54 -08:00 |
|
sglang-bot
|
6e82183f5a
|
[Disagg] Route disagg prefill results through process_batch_result (#19364)
|
2026-02-25 18:38:39 -08:00 |
|
fzyzcjy
|
265eb56d44
|
Support multi-step alignment and pipeline integration in dump comparator (#19378)
|
2026-02-26 10:23:22 +08:00 |
|
Yuan Luo
|
4e843f1216
|
[DeepSeek-V3.2][JIT-kernel] Support nsa fuse store indexer k cache (#19148)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: DarkSharpness <76582120+darksharpness@users.noreply.github.com>
|
2026-02-26 10:23:10 +08:00 |
|
fzyzcjy
|
f9a2f0398f
|
Support token aligner planning and execution in dump comparator (#19377)
|
2026-02-26 10:04:33 +08:00 |
|
fzyzcjy
|
d34d5aca07
|
Support loading token aligner data in dump comparator (#19376)
|
2026-02-26 10:03:56 +08:00 |
|
fzyzcjy
|
e8dd14519d
|
Add aligner entrypoint and bundle handler in dump comparator (#19375)
|
2026-02-26 10:03:22 +08:00 |
|
pansicheng
|
2ad475b4ed
|
use flashinfer.sampling (#18696)
|
2026-02-26 10:02:38 +08:00 |
|
fzyzcjy
|
2739d7df62
|
Reorganize modules and pipeline in dump comparator (#19374)
|
2026-02-26 10:00:13 +08:00 |
|
fzyzcjy
|
508b8e3387
|
Handle warnings via sink for structured output and add pair in dump comparator (#19373)
|
2026-02-26 09:59:15 +08:00 |
|
fzyzcjy
|
46321ee70e
|
Support dumping rid for correlation across passes in dump comparator (#19372)
|
2026-02-26 09:57:57 +08:00 |
|
Yuan Luo
|
7c9e8e2def
|
[Re-land][jit kernel] Support per_token_group_quant_8bit jit kernel (#19140)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu>
|
2026-02-26 09:53:57 +08:00 |
|
Linyu Wu
|
beabaa8d37
|
[Kernel Slimming] Migrate marlin moe kernel to JIT (#19181)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-02-26 09:05:13 +08:00 |
|
Daniel Cámpora
|
350190487b
|
Flashinfer MOE FP8 support for Mistral Large 3. (#15422)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-02-25 15:00:37 -08:00 |
|
Liangsheng Yin
|
c60dcc40bb
|
[Logging] Guard log_prefill_stats against idle batches in disagg prefill (#19361)
|
2026-02-25 13:31:52 -08:00 |
|
YAMY
|
08957c88ea
|
[Logging] Fix prefill side logging in pd disagg (#19350)
|
2026-02-25 12:42:18 -08:00 |
|
Kangyan-Zhou
|
306c552639
|
Revert "Fix HybridAttnBackend forward for linear attention" (#19356)
|
2026-02-25 11:49:50 -08:00 |
|
jacky.cheng
|
b2c46fc60b
|
[AMD] Support Qwen3-Coder-Next on AMD platform (#18355)
Co-authored-by: yichiche@amd.com <jacky.cheng>
|
2026-02-25 11:06:22 -08:00 |
|
Makcum888e
|
0217e82a08
|
[diffusion] Clean code (#19325)
|
2026-02-25 21:16:03 +03:00 |
|
Even Zhou
|
2fb239450e
|
Revert "bugfix: prioritize init_npu_backend to fix various initialization bugs" (#19343)
|
2026-02-25 23:04:30 +08:00 |
|
Yuhao Yang
|
c7c4a1cbbd
|
refactor linear attention backend (#18622)
Co-authored-by: yizhang2077 <1109276519@qq.com>
|
2026-02-25 23:02:44 +08:00 |
|
Mick
|
471acd98b9
|
[diffusion] logging: improve logging (#19312)
|
2026-02-25 23:00:35 +08:00 |
|
Qingfu Wen
|
59b9d1e86d
|
[diffusion] improve: improve fuse_scale_shift_kernel with non-blocking op (#18710)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-02-25 21:04:20 +08:00 |
|
akhilg-nv
|
c144e55462
|
Fix HybridAttnBackend forward for linear attention (#19006)
|
2026-02-25 21:02:37 +08:00 |
|
Zheng Li
|
d38c0e537d
|
fix(dense): fix Qwen3.5 dense model precision bug in TP_SIZE>1 (#19070)
|
2026-02-25 20:54:42 +08:00 |
|
Even Zhou
|
cdc411160b
|
[NPU] Fix a corner case where FusedMoE.top_k is not explicitly declared (#19287)
|
2026-02-25 20:49:59 +08:00 |
|
Mick
|
9840cd3f68
|
[diffusion] chore: enable sequence shard for wan by default (#19311)
|
2026-02-25 18:21:44 +08:00 |
|
billishyahao
|
60eeef7370
|
[AMD][with CI Fix] support two batch overlapping for mori ep (#19216)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-25 02:14:08 -08:00 |
|
GMI Xiao Jin
|
c4ef33862b
|
[diffusion] fix: fix bugs to let LTX-2 pipeline support latest SGLang Args pipelines (#19295)
|
2026-02-25 17:30:36 +08:00 |
|
Mohammad Miadh Angkad
|
671b595570
|
Fix trtllm_mha fp8 SWA KV index translation (#19107)
|
2026-02-25 17:02:17 +08:00 |
|
Julian Huang
|
a55f658835
|
[Misc] Normalize --host parameter to use plain hostname without scheme (#19309)
Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-02-25 00:37:24 -08:00 |
|
YAMY
|
f75abb4521
|
[Fix][Qwen3.5] Fix KV cache slice transfer for GQA models with replicated KV heads (#19086)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-25 16:26:44 +08:00 |
|
huangtingwei
|
d40cb2f725
|
[HiCache] Support heterogeneous tp for hicache storage (#18541)
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2026-02-25 00:13:57 -08:00 |
|
Wang, Yi
|
3d879c69e9
|
refactor: extract device-to-backend mapping into get_default_distributed_backend (#19202)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-02-24 23:50:26 -08:00 |
|
Hexq0210
|
d0bb140034
|
[NPU] bugfix for model Qwen3-Coder-Next at weight shape transpose for npu. (#18700)
Co-authored-by: McZyWu <zhuoyun.wu.23@ucl.ac.uk>
|
2026-02-25 15:46:20 +08:00 |
|
xutizhou
|
a1b39c1c26
|
Perf/fuse mamba state scatter mtp verify (#18088)
|
2026-02-25 15:40:55 +08:00 |
|
lw9527
|
4a3a787f1e
|
[Fix] Kimi K2.5 support pp (#18434)
Co-authored-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>
Co-authored-by: ybyang <10629930+whybeyoung@users.noreply.github.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-25 15:22:11 +08:00 |
|
Shangming Cai
|
8d9ee6669e
|
Fix comment for tp_rank calculation in dp_attention (#19306)
|
2026-02-25 15:19:10 +08:00 |
|