Qiaolin Yu
|
d8db3077ca
|
Fix draft extend cuda graph when spec_step=1 (#21709)
|
2026-03-31 18:29:56 -07:00 |
|
Liangsheng Yin
|
e4c565f2f2
|
[Misc] Tiny: Add test network timeouts and dynamic max-parallel for 5090/2-gpu runners (#21800)
|
2026-03-31 18:27:39 -07:00 |
|
Chang Su
|
1389962f06
|
[gRPC] Preserve original ImportError in grpc_server.py (#21801)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-03-31 18:22:29 -07:00 |
|
Brayden Zhong
|
6a9b09847c
|
CUTLASS NVFP4 GEMM improvement of SM120 (#21314)
|
2026-04-01 09:04:34 +08:00 |
|
Johnsonms
|
5bbf347bb3
|
[jit_kernel] Optimize fused_qknorm_rope: deduplicate sincosf for interleave RoPE (#21654)
|
2026-04-01 09:04:13 +08:00 |
|
Xiaoyu Zhang
|
cdd7d6a227
|
Remove obsolete sgl-kernel legacy paths (#21528)
|
2026-04-01 09:00:20 +08:00 |
|
Liangsheng Yin
|
a8759dd9af
|
Fix killall.py crash when sglang is not yet installed (#21797)
|
2026-03-31 17:40:58 -07:00 |
|
Liangsheng Yin
|
7581d814ae
|
Add CompletionSampler for non-chat eval in run_eval (#21785)
|
2026-03-31 16:33:07 -07:00 |
|
Yilong Zhao
|
1f7cee81da
|
[moe] add customized option to moe-a2a-backend (#21786)
|
2026-03-31 16:32:47 -07:00 |
|
Baizhou Zhang
|
f60f2ccc10
|
[Fix] Fall back to triton MOE for GPT-OSS on Blackwell with driver >= 595 (#21780)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-31 15:52:10 -07:00 |
|
weireweire
|
9191b02eda
|
Fix cuda graph max bs capture upper bound (#21005)
|
2026-03-31 15:20:56 -07:00 |
|
Ethan (Yusheng) Su
|
3c91ebdf55
|
[2/n] lora - Shared outer experts and support qwen3_30b_a3b_instruct (#21466)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-31 14:06:23 -07:00 |
|
Liangsheng Yin
|
f4505e2ee3
|
Fix ineffective is_base_mistral CI patch for HF API rate limiting (#21729)
|
2026-03-31 12:54:34 -07:00 |
|
Trevor Morris
|
b91f78d255
|
[bugfix] Fix rope theta config for MiniMax after transformers v5 update (#21241)
|
2026-03-31 11:37:03 -07:00 |
|
Michael
|
8d919bbd44
|
[AMD] Fix Handle missing rope_theta in get_rope_config for Grok-1 (#21518)
|
2026-03-31 10:58:12 -07:00 |
|
Zhangheng
|
91048b2a8e
|
[HiMambaTree]: Optimize mamba host lock mechanism (#21750)
|
2026-03-31 21:52:24 +08:00 |
|
R0CKSTAR
|
e67dbf257a
|
[diffusion] fix: fix Wan2.2-I2V-A14B video max size issue (#21390)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-31 21:49:53 +08:00 |
|
Mick
|
7790645b82
|
[diffusion] UX: replace deprecated ORJSONResponse with orjson_response (#21755)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-31 21:41:33 +08:00 |
|
JD
|
20d07c4384
|
Fix remote weight info nnode>1 and dp>1 (#17389)
|
2026-03-31 21:17:18 +08:00 |
|
Shangming Cai
|
ca2b2130ba
|
[PD] Tiny cleanup after KVReceiver refactor (#21760)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-03-31 21:07:57 +08:00 |
|
Yuan Luo
|
c7adca9992
|
Fix kimi-linear launch server error (#21752)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-31 21:07:08 +08:00 |
|
Ke Bao
|
dbc97456ad
|
Enable evict swa with piecewise cuda graph (#21754)
|
2026-03-31 20:07:16 +08:00 |
|
weireweire
|
4455d17619
|
[PD] Refactor Disagg Conn and Fix Hang with total_request/total_tokens Balancing (#21299)
Co-authored-by: Weiliangl User <weiliangl@login-node.hosted.internal>
|
2026-03-31 18:01:50 +08:00 |
|
R0CKSTAR
|
6c03ae6fe2
|
[diffusion] fix: fix typo (#21746)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-03-31 17:51:46 +08:00 |
|
xiaoqi
|
a6a8b9b376
|
bugfix(model):fix deepstack index out of range error (#21727)
Co-authored-by: xiaoqi.31 <xiaoqi.31@jd.com>
|
2026-03-31 02:41:47 -07:00 |
|
Thomas Wang
|
5628e908ae
|
[AMD] Use tgemm.mm for MoEGate router gemm in deepseek_v2.py (#21657)
|
2026-03-31 00:55:40 -07:00 |
|
xiazhahe
|
b4cb31f698
|
[NPU] fix conflict between empty_cache and use_mem_pool (#21507)
|
2026-03-31 15:37:33 +08:00 |
|
Mohammad Miadh Angkad
|
dd9c9c1b8e
|
Add explicit disable flag for FlashInfer allreduce fusion (#21446)
|
2026-03-31 00:15:44 -07:00 |
|
Yuhao Yang
|
68a4573627
|
[diffusion] fix: fix Flux.2 with tp (#21664)
|
2026-03-31 14:14:59 +08:00 |
|
jacky.cheng
|
8ba992411d
|
[AMD] Fix CI multimodal-gen-test-1-gpu-amd for gen model (#21621)
|
2026-03-30 23:02:20 -07:00 |
|
Jincong Chen
|
03e4f2858d
|
[Perf] Remove H2D for Qwen3.5 SpecV2 (#20864)
|
2026-03-31 11:54:58 +08:00 |
|
Lewis
|
33e725b052
|
[Fix] Update supported custom_mem_pool types for mooncake (#21728)
Co-authored-by: 百麒 <yaozhong.lyz@alibaba-inc.com>
|
2026-03-31 11:18:30 +08:00 |
|
Xiaoyu Zhang
|
505eb312ec
|
Revert "DeepSeek-R1-0528-w4a8: DeepEP Low Latency Dispatch Adopts FP8 Communication" (#21719)
|
2026-03-31 10:22:01 +08:00 |
|
DarkSharpness
|
4e480982fa
|
[misc] multiprocess compilation to speed up test (#21483)
|
2026-03-31 08:56:37 +08:00 |
|
kk
|
67c295b5f5
|
[AMD] fix performance regression issue when run gpt-oss with "--context-length 13824" (#21691)
|
2026-03-30 16:30:16 -07:00 |
|
Zhai Feiyue
|
daf697afda
|
[AMD] Add SGLANG_DISAGGREGATION_NUM_PRE_ALLOCATE_REQS env var for configurable KV transfer overlap (#20410)
Co-authored-by: HaiShaw <hixiao@gmail.com>
|
2026-03-30 14:37:16 -07:00 |
|
Aditya Sharma
|
d6029de6ad
|
[Bugfix][NPU] Skip FRACTAL_NZ format for MoE weights with unaligned dimensions (#21209)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-03-30 23:22:17 +03:00 |
|
Vedant V Jhaveri
|
4a9ffc3ab6
|
fix nemotron capture for non attention layers (#21436)
|
2026-03-30 12:50:49 -07:00 |
|
Yuxuan Zhang
|
ad064c2f4e
|
[GLM-V and GLM-4.7] Cast to FP32 before gate projection for GLM model. (#21660)
|
2026-03-30 12:25:27 -07:00 |
|
Makcum888e
|
f4b0e9c64a
|
[diffusion] [NPU] support ring attention on NPU with FA (#21383)
|
2026-03-30 20:10:55 +03:00 |
|
GXIN
|
752d260c77
|
[NPU][diffusion]: support parallel decoding of qwen-image (#20757)
Co-authored-by: 高鑫 <gaoxin@gaoxindeMacBook-Pro.local>
|
2026-03-30 20:03:24 +03:00 |
|
cen121212
|
ba6d54d0f0
|
[NPU] GLM-5 optimize with fused kernels (#18617)
|
2026-03-30 22:48:15 +08:00 |
|
xieminghe1
|
7119d59747
|
DeepSeek-R1-0528-w4a8: DeepEP Low Latency Dispatch Adopts FP8 Communication (#14162)
Co-authored-by: undefined <zhouchen.arrebol@jd.com>
|
2026-03-30 22:27:28 +08:00 |
|
heziiop
|
673ffb3116
|
[NPU] fix eagle3 accept rate (#21255)
|
2026-03-30 21:58:25 +08:00 |
|
GXIN
|
c5c58c3349
|
[NPU][Diffusion] fix sp modulate for qwen-image-edit (#20974)
Co-authored-by: 高鑫 <gaoxin@gaoxindeMacBook-Pro.local>
|
2026-03-30 16:18:48 +03:00 |
|
Mick
|
0a1fb42869
|
[diffusion] CI: relax pr-test threshold (#21682)
|
2026-03-30 20:23:46 +08:00 |
|
Mick
|
b76730701b
|
[diffusion] feat: enhance overlay mechanism (#21648)
|
2026-03-30 19:45:34 +08:00 |
|
LiYomi
|
1d6424d5ad
|
fix: Mistral Small 4 fails to start due to config/weight format mismatch (#21620)
Co-authored-by: mengxiancheng03 <mengxiancheng03@kuaishou.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-30 01:57:35 -07:00 |
|
strgrb
|
b246269444
|
fix mamba cache leak when adder fails to add a matched req. (#21404)
|
2026-03-30 16:45:49 +08:00 |
|
Baizhou Zhang
|
62a63eeff7
|
[Fix] Fix weight_loader property assignment for qwen3-next FP8 models (#21662)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-30 01:35:59 -07:00 |
|