Cao E
|
274581fb77
|
Add support for more batch sizes in cpu_graph_runner (#13881)
|
2026-03-19 09:50:56 -07:00 |
|
kk
|
c8f0122acf
|
Fix gpu-fault issue when run deepseek-r1 and enable dp (#20841)
Co-authored-by: wunhuang <wunhuang@amd.com>
|
2026-03-19 02:36:12 -07:00 |
|
khalilzhk
|
574572b21b
|
[BugFix] bug fix for DeepSeek eagle3 in Attn-DP mode (#20492)
|
2026-03-19 14:48:46 +08:00 |
|
Shangming Cai
|
fd05532da1
|
Add logging for BootstrapServer for CI diagnosis (#20844)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-03-19 14:42:12 +08:00 |
|
blzheng
|
a98b456c70
|
[CPU] Add frontend support for Gemma (#12590)
|
2026-03-18 23:02:26 -07:00 |
|
jianan-gu
|
8d4fcf2f7b
|
[CPU] Fix MoE layer support for DeepSeek-OCR models (#12555)
|
2026-03-18 22:57:55 -07:00 |
|
Matti Varjokallio
|
85fe8c6793
|
[AMD] Use aiter_dsv3_router_gemm kernel if number of experts <= 256. (#18451)
|
2026-03-18 22:40:48 -07:00 |
|
kk
|
126cd5cfae
|
gpt-oss decode performance optimization (#20392)
Co-authored-by: wunhuang <wunhuang@amd.com>
|
2026-03-18 22:30:03 -07:00 |
|
blzheng
|
cd22aa27a9
|
[CPU] Add FP8 Bmm support (#9744)
Co-authored-by: Fan Yin <1106310035@qq.com>
|
2026-03-18 22:19:48 -07:00 |
|
Zaili Wang
|
2f4babe32b
|
[CPU] support LayerNorm with 3D shape (#15075)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-03-18 22:15:24 -07:00 |
|
blzheng
|
dc6aa26ce9
|
[CPU] Add mrope kernel for Qwen3-vl (#12531)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-03-18 22:12:48 -07:00 |
|
Juan Muneton
|
4052b53227
|
fix scheduler for non-cuda devices and disable piecewise cuda graph f… (#19992)
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
|
2026-03-18 21:54:19 -07:00 |
|
Ling Zhang
|
f85455ab24
|
[Bugfix] fix qwen3vl hang when --mm-enable-dp-encoder is enabled (#20759)
|
2026-03-18 21:51:39 -07:00 |
|
Ethan (Yusheng) Su
|
7f6f1a3ab1
|
[LoRA][II] Add fused MOE LoRA Triton kernel and tests (#19711)
|
2026-03-18 19:58:14 -07:00 |
|
R0CKSTAR
|
7553b7dcb0
|
chore: extract diffusion_common in python/pyproject_other.toml (#20803)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
|
2026-03-19 10:39:16 +08:00 |
|
Qiaolin Yu
|
eea9e19c13
|
fix lint introduced in #20708 (#20886)
|
2026-03-18 15:38:52 -07:00 |
|
Chang Su
|
0d23a461a0
|
feat(mm)(grpc): compute M-RoPE positions for preprocessed VL inputs (#19973)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
|
2026-03-18 15:34:50 -07:00 |
|
Liangsheng Yin
|
8b9482e665
|
fix(dp-attn): consistent overlap disable decision across DP ranks (#20853)
|
2026-03-18 15:16:39 -07:00 |
|
maocheng23
|
4e8829e4cd
|
Replace topk_ids with curr_topk_ids in fused_moe.py (#20302)
|
2026-03-18 21:57:05 +00:00 |
|
Chad Voegele
|
a3196d08b8
|
[MiniMax M2] Fix KV cache scale loading (#20870)
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-18 14:54:43 -07:00 |
|
Xinyuan Tong
|
6b8a6545b2
|
Add Mistral Small 4 (Pixtral) support (#20708)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Alex Nails <alexnails@radixark.ai>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: dbari <dbari@users.noreply.github.com>
|
2026-03-18 14:15:32 -07:00 |
|
Trevor Morris
|
df1d046de2
|
Add packed_modules_mapping for MiniMax-M2 (#19995)
|
2026-03-18 14:10:01 -07:00 |
|
Xinyuan Tong
|
d1e95af282
|
Upgrade transformers==5.3.0 (#17784)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Alison Shao <alisonshao@mac.lan>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-18 13:50:43 -07:00 |
|
Bruce Wu
|
e5750a572c
|
Support TP for lora lm_head layer (#18511)
Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>
|
2026-03-18 13:48:03 -07:00 |
|
ishandhanani
|
8f0f36c64b
|
[1/2] Add ModelExpress coordination for remote instance weight loading - matching TP (#19920)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishan Dhanani <ishan@dhanani.dev>
|
2026-03-18 13:38:32 -07:00 |
|
Yaochen Han
|
c7a71740a5
|
[NPU][diffusion] npu support enable_torch_compile for torchair backend on diffusion models (#20687)
|
2026-03-18 22:40:35 +03:00 |
|
Vladislav Nosivskoy
|
b9dba851a0
|
Fix streaming token ids data loss under load (#19977)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
|
2026-03-18 12:23:45 -07:00 |
|
Gabriel Wu
|
70876ae93b
|
fix: guard configure_deep_gemm_num_sms when JIT disabled (#20868)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-18 11:15:20 -07:00 |
|
Jackie
|
a6c7bb54eb
|
[Perf] Optimize waiting queue update with set usage (#20503)
|
2026-03-18 09:56:24 -07:00 |
|
jianan-gu
|
21c4fc6334
|
[DP encoder] Fix pos_emb layer TP issue when DP encoder enabled for Qwen3 VL (#20788)
|
2026-03-18 17:14:47 +08:00 |
|
Thomas Wang
|
c0a4408f78
|
[AMD] Fix dpsk-v32 accuracy issue on mi355 (#20840)
|
2026-03-18 02:06:15 -07:00 |
|
billishyahao
|
f0d7a3f427
|
[AMD][TBO] Fix mori ep dual stream accuracy (#19888)
|
2026-03-18 02:00:55 -07:00 |
|
Shangming Cai
|
8b46f1f4ec
|
[PD] Add retry interval in ensure_prefill_info (#20832)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-03-18 16:02:20 +08:00 |
|
Chuan (Richard) Li
|
93422f27d6
|
[AMD][AITER] Guard _use_mla_ps_kernel with self.use_mla in draft_extend_v2 paths (#20409)
|
2026-03-18 00:45:22 -07:00 |
|
R0CKSTAR
|
ead9d7aa43
|
[diffusion] fix: fix vae model offload on mps (#20607)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-18 15:44:59 +08:00 |
|
chenxu214
|
532470bcca
|
[NPU] add new fusion operator DispatchFFNCombine (#20245)
|
2026-03-18 15:22:04 +08:00 |
|
jinke
|
ae15fca192
|
[Bugfix] fix hicache mooncake backend extra config loading (#16808)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: jinke15 <jinke15@jd.com>
|
2026-03-18 15:07:39 +08:00 |
|
xingsy97
|
d20e9a20fa
|
[JIT] Inject target architecture flag into JIT compilation (#20103)
|
2026-03-17 23:16:49 -07:00 |
|
xingsy97
|
f78d5c3b3c
|
[JIT Kernel] Add hadamard kernel test and benchmark (#20030)
|
2026-03-17 23:16:35 -07:00 |
|
Артем Савкин
|
c64681f162
|
[Bugfix] [diffusion] Fix cache-dit with sp-degree only (#19965)
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-03-18 14:05:12 +08:00 |
|
Kangyan-Zhou
|
b6055e59cd
|
[HiCache] Reduce per-request backup log noise (#20813)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-17 22:47:14 -07:00 |
|
Viacheslav
|
30a35ecd90
|
Add gigachat3.1 parser (#19886)
Signed-off-by: Viacheslav Barinov <vvadbarinov@sberbank.ru>
Signed-off-by: Viacheslav Bv <viacheslav.teh@gmail.com>
Co-authored-by: Viacheslav Barinov <vvadbarinov@sberbank.ru>
|
2026-03-17 22:45:01 -07:00 |
|
Evgueni Petrov
|
2e860233ca
|
rocm: fix oom when loading fp8 weights close to size of available vram (#19941)
|
2026-03-17 22:44:19 -07:00 |
|
shiyu7
|
0acc1d3c9a
|
fix: change qwen 3.5 linear attention a_log to fp32 (#19961)
Co-authored-by: sunqi.7 <sunqi.7@bytedance.com>
|
2026-03-17 22:42:06 -07:00 |
|
Brayden Zhong
|
88c40ec16d
|
Use Flashinfer for target_verify in GDN model for SM120 (#20604)
|
2026-03-17 22:40:56 -07:00 |
|
Brayden Zhong
|
97d5386a21
|
Use TRTLLM allreduce fusion for Qwen 3.5 (#19889)
|
2026-03-17 22:40:22 -07:00 |
|
Yuan Luo
|
9c87e137ee
|
[GDN] Support GDN packed decode (#20627)
|
2026-03-18 13:20:07 +08:00 |
|
Kaixi Hou
|
4cc19862ef
|
[NVIDIA] Integrate FlashInfer decode kernel (Blackwell) for Qwen3.5 (#19150)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-03-18 13:11:18 +08:00 |
|
hzh0425
|
c43d495dd5
|
[RadixTree][9/N Refactor]: Support unified init_load_back params (#20590)
|
2026-03-18 11:19:52 +08:00 |
|
Mick
|
f15b3338c9
|
Revert "[Bugfix] Fix GLM-4.6V vision regression in glm4v_moe and glm_ocr" (#20740)
|
2026-03-18 10:09:50 +08:00 |
|