Michael
|
ca09d71cf0
|
Fix nightly grok failure on rotary embedding import (#19192)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
|
2026-02-25 13:25:16 +08:00 |
|
jacky.cheng
|
e138f7960a
|
[AMD] Fix accuracy while using --enable-dp-attention (#19247)
Co-authored-by: yichiche@amd.com <jacky.cheng>
|
2026-02-24 20:50:28 -08:00 |
|
Liangsheng Yin
|
ab0f608788
|
[PD-Disagg] Fix bootstrap server race condition when prefill workers not yet registered (#19288)
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-02-24 20:22:16 -08:00 |
|
Liangsheng Yin
|
539f772f54
|
[PD-Disagg] Fully support external DP dispatch w/ PD-disaggregation mode. (#19268)
Co-authored-by: Ratish P <114130421+ratish1@users.noreply.github.com>
|
2026-02-24 19:58:01 -08:00 |
|
Mick
|
241ee90164
|
[diffusion] chore: tiny fix pyproject.toml (#19256)
|
2026-02-25 11:57:53 +08:00 |
|
Shangming Cai
|
0fac2796b6
|
[PD-Disagg] Improve KVManager init across all backends (#19240)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-25 10:37:09 +08:00 |
|
siyu
|
c0fdfd4b92
|
Delete mm.feature after decode phase (#17324)
|
2026-02-24 18:13:03 -08:00 |
|
Xiaoyu Zhang
|
9dff933164
|
[Kernel Slimming] Remove sgl-kernel AOT marlin kernels (#19241)
|
2026-02-25 10:08:22 +08:00 |
|
Feng Su
|
3b89302277
|
Refactor: observability code cleanup (#17862)
Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
|
2026-02-24 18:07:29 -08:00 |
|
siyu
|
245430eaac
|
Encoder Global Cache Manager (#16137)
Co-authored-by: Zheng Wengang <zwg0606@gmail.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
|
2026-02-24 18:05:43 -08:00 |
|
fzyzcjy
|
b7af58b9af
|
Support replication axis in dump comparator (#19282)
|
2026-02-25 09:48:43 +08:00 |
|
fzyzcjy
|
2e2b18e870
|
Support context parallel zigzag reordering in dump comparator (#19281)
|
2026-02-25 09:46:17 +08:00 |
|
fzyzcjy
|
0de1f4b07b
|
Support multi axis unsharding in dump comparator (#19280)
|
2026-02-25 09:44:07 +08:00 |
|
fzyzcjy
|
4bb678f28a
|
Full test coverage in dump comparator (#19279)
|
2026-02-25 09:43:29 +08:00 |
|
fzyzcjy
|
94ca2ac5d7
|
Support TP unification and enhance tests in dump comparator (#19278)
|
2026-02-25 09:42:56 +08:00 |
|
fzyzcjy
|
39ba9b5ab5
|
Support simple unsharding in dumper comparator (#19277)
|
2026-02-25 09:42:21 +08:00 |
|
fzyzcjy
|
02ca107b2c
|
Support dims annotation and enhance dump loader in dumper (#19276)
|
2026-02-25 09:41:48 +08:00 |
|
fzyzcjy
|
8b1ab4aaf9
|
Support agent-friendly output formats in dump comparator (#19275)
|
2026-02-25 09:40:33 +08:00 |
|
fzyzcjy
|
d7578ce279
|
Implement simplest dump comparator v2 (#19274)
|
2026-02-25 09:37:21 +08:00 |
|
wxy
|
9cec98b445
|
[diffusion] fix: shard timestep_proj in sequence-sharded ti2v (#19237)
|
2026-02-25 09:33:45 +08:00 |
|
Mick
|
0ede5c54a8
|
[diffusion] logging: improve request and component load logs (#19253)
|
2026-02-25 09:32:36 +08:00 |
|
silencejade
|
15f2e36fb9
|
[NPU] forward_npu uses native impl by default in MultiPlatformOp (#18545)
|
2026-02-25 09:16:09 +08:00 |
|
Chen, Zhentao
|
b9cf1563de
|
[ROCm] Optimize Deepseek R1 on MI300X (#18242)
Co-authored-by: Chen, Todd <zhenchen@amd.com>
|
2026-02-24 17:01:14 -08:00 |
|
Vladimir221
|
f1088beb6a
|
[NPU] Optimization of forward_npu for UnquantizedFusedMoEMethod (#13158)
|
2026-02-25 08:55:20 +08:00 |
|
Jonah Bernard
|
d7a03c7ebf
|
[MoE Refactor] Refactor FlashInferFusedMoE into FusedMoE and flashinfer_trtllm.py (#19266)
|
2026-02-24 16:47:53 -08:00 |
|
Chen, Zhentao
|
c193a52fa2
|
[AMD] DSR1/V3 use fp8 bmm in MLA for MI300X (#18624)
Co-authored-by: Chen, Todd <zhenchen@amd.com>
|
2026-02-24 15:33:33 -08:00 |
|
Ratish P
|
ae6f6e1495
|
[Refactor] Benchmark: Add typed DatasetArgs/Loader registry and CPU dataset unit tests (#19147)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2026-02-24 12:22:01 -08:00 |
|
Glen Liu
|
a06425184f
|
[Fix] remove redundant +1 when getting tail_str for Req (#18584)
|
2026-02-24 10:35:26 -08:00 |
|
Makcum888e
|
9bce3b040c
|
[diffusion] [NPU] Update perf baselines (#19227)
|
2026-02-24 21:15:16 +03:00 |
|
Zihao Wang
|
e6ad58e5da
|
[diffusion] fix: webui not return data (#19244)
|
2026-02-24 23:39:06 +08:00 |
|
Adarsh Shirawalmath
|
1efc33c640
|
[diffusion] refactor: remove enums and unify attention backends (#19149)
|
2026-02-24 21:21:23 +08:00 |
|
triple-mu
|
ea1bc1c578
|
[diffusion] feat: optimize all_to_all (#19157)
|
2026-02-24 21:20:10 +08:00 |
|
Yuan Luo
|
31c7dc9d99
|
[VLM] Introduce FlashInfer CUDNN Prefill as ViT Backend (#19003)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-02-24 19:49:22 +08:00 |
|
xdtbynd
|
1a83b2c15d
|
fix: fix the bug blocking completion template application (#17010)
Co-authored-by: xdtbynd <supercluster@vip.qq.com>
Co-authored-by: cy <chenyang08056032@163.com>
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-02-24 18:50:08 +08:00 |
|
Cheng Wan
|
6e54361608
|
Refactor CUDA graph input buffers with shared buffer pool (#19180)
|
2026-02-24 02:24:40 -08:00 |
|
jianzhao-xu
|
94946764a4
|
bugfix: The default value of the environment variable is None, so that… (#18309)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-02-24 18:10:21 +08:00 |
|
GMI Xiao Jin
|
fcfd964d7d
|
[diffusion] model: LTX-2 Support PR3 (#19151)
|
2026-02-24 16:55:28 +08:00 |
|
Simo Lin
|
88a7e48108
|
fix(grpc): handle embedding requests in _handle_batch_output (#19221)
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
|
2026-02-23 22:32:16 -08:00 |
|
Mick
|
2e053d6eb6
|
[diffusion] quant: support quant for all dits (#19156)
Co-authored-by: zyzshishui <zyzshishui@gmail.com>
|
2026-02-24 14:20:54 +08:00 |
|
fy
|
4f25a48d7a
|
support xverse_moe on npu
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
|
2026-02-24 14:11:20 +08:00 |
|
Liangsheng Yin
|
5e80027ac7
|
[PD-Disagg] Support FAKE transfer backend for more cases (#19215)
|
2026-02-23 20:50:43 -08:00 |
|
jianzhao-xu
|
d8ef33a551
|
bugfix: prioritize init_npu_backend to fix various initialization bugs (#17652)
|
2026-02-24 12:25:20 +08:00 |
|
Liangsheng Yin
|
feb041f4e5
|
[PD-Disagg] Improve type hints across all conn.py (#19208)
|
2026-02-23 19:44:23 -08:00 |
|
Liangsheng Yin
|
ea7ef63e6d
|
[PD-Disagg] Deduplicate common KVManager methods into CommonKVManager (#19205)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-02-23 18:34:21 -08:00 |
|
Kangyan-Zhou
|
8aeb16f3fc
|
fix: add missing blank line after docstring in serving_transcription.py (#19206)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-23 18:29:38 -08:00 |
|
Xinyuan Tong
|
581bf53e03
|
Whisper model support & /v1/audio/transcriptions endpoint & benchmark (#16983)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-23 17:28:37 -08:00 |
|
Liangsheng Yin
|
d160c5b9bb
|
[PD-Disagg] Unify prefill info data transition flow, all with PrefillServerInfo (#19195)
|
2026-02-23 16:15:20 -08:00 |
|
Liangsheng Yin
|
9fac90d85d
|
[CI] Tiny enhance the dp attention load balance benchmark (#19194)
|
2026-02-23 14:33:32 -08:00 |
|
Liangsheng Yin
|
2aa3fe394e
|
[CI] fix the teardown output of disaggregation test (#19193)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-23 12:41:03 -08:00 |
|
Liangsheng Yin
|
2274bfebb1
|
[PD-Disagg] Support query dp rank from bootstrap server. (#19168)
Signed-off-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
Co-authored-by: Chang Huaixin (OpenAnolis) <changhuaixin@linux.alibaba.com>
|
2026-02-23 10:59:30 -08:00 |
|