Baizhou Zhang
|
5e12c4e08e
|
[DSA] Support trtllm sparse mla kernel for prefill batches (#21783)
|
2026-04-01 13:55:05 -07:00 |
|
Trevor Morris
|
8950d129bd
|
[refactor] Clean up duplicate flashinfer trtllm moe code (#21233)
|
2026-04-01 13:52:22 -07:00 |
|
Liangsheng Yin
|
0138708576
|
[Misc] Add network timeout to eval dataset downloads (#21873)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-01 13:16:14 -07:00 |
|
Ziang Li
|
a19ef3a615
|
[FlashInfer v0.6.7] Integrate flashinfer_trtllm mxfp8 gemm (#21576)
|
2026-04-01 15:55:06 -04:00 |
|
shuwenn
|
a1c725bdc5
|
fix: pre-init tokenizer_manager to avoid AttributeError in shutdown (#21824)
|
2026-04-01 10:54:53 -07:00 |
|
R0CKSTAR
|
ca3286d2d5
|
[diffusion] hardware: support FA3 attention backend on MUSA (attn backend, 14/N) (#18648)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-01 10:49:34 -07:00 |
|
shuwenn
|
6098c51bc2
|
fix(MiMo-V2-Flash): add mimo reasoning parser (#21414)
|
2026-04-02 00:47:27 +08:00 |
|
DarkSharpness
|
20f4193589
|
[Feature] JIT rmsnorm update (with claude) (#21834)
|
2026-04-01 23:40:00 +08:00 |
|
Ratish P
|
4f5b55e379
|
[diffusion][CI]: Add individual component accuracy CI for diffusion models (#18709)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-04-01 21:51:36 +08:00 |
|
Cherry_ming
|
e67b95d66b
|
[NPU] Add a full test pipeline on NPU, resolve issues in the NPU test architecture (#20751)
|
2026-04-01 19:56:31 +08:00 |
|
Yuhao Yang
|
1aabe44b64
|
[VLM] remove AsyncMMDataProcessor wrapper (#21651)
|
2026-04-01 17:39:50 +08:00 |
|
Mick
|
7bba319f1e
|
[diffusion] fix: respect --prompt-path (#21756)
|
2026-04-01 16:47:59 +08:00 |
|
wduan-hai
|
95b881452e
|
Fix in-place mode in pause generation (#21705)
|
2026-04-01 01:36:28 -07:00 |
|
yunkchen
|
eec70286ec
|
[Bugfix] Fix effective_mamba_size over-allocation (#20858)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-04-01 16:17:14 +08:00 |
|
yudian0504
|
7d2b856ce7
|
[Bug][VLM] Fix shared memory race condition in ShmPointerMMData broadcast for multi-GPU VLM serving (#21655)
|
2026-04-01 16:15:14 +08:00 |
|
Zhiqiang Xie
|
9eb75211b1
|
style refinement for hisparse (#21198)
|
2026-04-01 01:03:17 -07:00 |
|
Yuxuan Zhang
|
57341b128f
|
glm_interleave for GLM-V (#21671)
|
2026-04-01 00:21:10 -07:00 |
|
khalilzhk
|
835e19656f
|
Bug fix for llama eagle3 (#21397)
|
2026-04-01 15:01:53 +08:00 |
|
Alex Nails
|
912494f596
|
[CI] Fix lint that was not applied in #21458 (#21818)
|
2026-03-31 23:58:12 -07:00 |
|
Wenyao Gao
|
2861596fc6
|
[Bugfix] Fix PP tied embeddings weight loading for qwen3.5 4B dense model (#21347)
|
2026-04-01 14:51:03 +08:00 |
|
YC Yen-Ching Tseng
|
a188208e9a
|
[AMD] Optimize Qwen3-VL decode - fuse QK-norm + 3D mRoPE + KV cache write (#21458)
Co-authored-by: Bingxu Chen <bingxche@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
|
2026-03-31 23:34:07 -07:00 |
|
sbeurnier
|
71baa025be
|
Fix added tokens config with sensible filter (#17905)
|
2026-03-31 23:32:21 -07:00 |
|
Xinyuan Tong
|
87a2768269
|
VLM: change default mm-attention backend from triton_attn to fa4 (on blackwell) (#21595)
|
2026-04-01 14:29:59 +08:00 |
|
Yuxuan Zhang
|
72d3d8f4cf
|
[Feature Restoration] repetition_penalty is essential for GLM-V models (#21258)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-03-31 23:29:49 -07:00 |
|
Ethan (Yusheng) Su
|
cffc95edf4
|
[3/n] lora moe - Support Qwen3-VL-30B-A3B-Instruct (#21469)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-31 23:15:16 -07:00 |
|
sglang-bot
|
ca3ba05a7a
|
chore: bump flashinfer version to 0.6.7 (#21422)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-31 21:18:16 -07:00 |
|
Yuan Luo
|
03a87068ea
|
[KDA] Fuse scaled_dot_kkt + solve_tril + recompute_w_u for KDA (#21604)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-31 20:57:27 -07:00 |
|
Karan Bansal
|
e9b6cce237
|
[MPS] Fix Triton stub sub-module imports on Python 3.12+ (#21551)
Co-authored-by: karanb192 <karan@example.com>
Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
|
2026-03-31 20:26:01 -07:00 |
|
KnightLTC
|
2488233ad5
|
[bugfix]GLM-4V model (#17122)
|
2026-04-01 10:37:40 +08:00 |
|
Mick
|
f9debd6514
|
[diffusion] CI: improve ci reliability (#21763)
|
2026-04-01 10:06:57 +08:00 |
|
Liangsheng Yin
|
09907795e1
|
Add latency and throughput metrics to run_eval (#21793)
|
2026-03-31 18:36:14 -07:00 |
|
shuwenn
|
8e84f846cc
|
[Diffusion] Add --uvicorn-access-log-exclude-prefixes to suppress noisy access logs (#20379)
|
2026-04-01 09:31:44 +08:00 |
|
Qiaolin Yu
|
d8db3077ca
|
Fix draft extend cuda graph when spec_step=1 (#21709)
|
2026-03-31 18:29:56 -07:00 |
|
Liangsheng Yin
|
e4c565f2f2
|
[Misc] Tiny: Add test network timeouts and dynamic max-parallel for 5090/2-gpu runners (#21800)
|
2026-03-31 18:27:39 -07:00 |
|
Chang Su
|
1389962f06
|
[gRPC] Preserve original ImportError in grpc_server.py (#21801)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
|
2026-03-31 18:22:29 -07:00 |
|
Brayden Zhong
|
6a9b09847c
|
CUTLASS NVFP4 GEMM improvement of SM120 (#21314)
|
2026-04-01 09:04:34 +08:00 |
|
Johnsonms
|
5bbf347bb3
|
[jit_kernel] Optimize fused_qknorm_rope: deduplicate sincosf for interleave RoPE (#21654)
|
2026-04-01 09:04:13 +08:00 |
|
Xiaoyu Zhang
|
cdd7d6a227
|
Remove obsolete sgl-kernel legacy paths (#21528)
|
2026-04-01 09:00:20 +08:00 |
|
Liangsheng Yin
|
a8759dd9af
|
Fix killall.py crash when sglang is not yet installed (#21797)
|
2026-03-31 17:40:58 -07:00 |
|
Liangsheng Yin
|
7581d814ae
|
Add CompletionSampler for non-chat eval in run_eval (#21785)
|
2026-03-31 16:33:07 -07:00 |
|
Yilong Zhao
|
1f7cee81da
|
[moe] add customized option to moe-a2a-backend (#21786)
|
2026-03-31 16:32:47 -07:00 |
|
Baizhou Zhang
|
f60f2ccc10
|
[Fix] Fall back to triton MOE for GPT-OSS on Blackwell with driver >= 595 (#21780)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-31 15:52:10 -07:00 |
|
weireweire
|
9191b02eda
|
Fix cuda graph max bs capture upper bound (#21005)
|
2026-03-31 15:20:56 -07:00 |
|
Ethan (Yusheng) Su
|
3c91ebdf55
|
[2/n] lora - Shared outer experts and support qwen3_30b_a3b_instruct (#21466)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-31 14:06:23 -07:00 |
|
Liangsheng Yin
|
f4505e2ee3
|
Fix ineffective is_base_mistral CI patch for HF API rate limiting (#21729)
|
2026-03-31 12:54:34 -07:00 |
|
Trevor Morris
|
b91f78d255
|
[bugfix] Fix rope theta config for MiniMax after transformers v5 update (#21241)
|
2026-03-31 11:37:03 -07:00 |
|
Michael
|
8d919bbd44
|
[AMD] Fix Handle missing rope_theta in get_rope_config for Grok-1 (#21518)
|
2026-03-31 10:58:12 -07:00 |
|
Zhangheng
|
91048b2a8e
|
[HiMambaTree]: Optimize mamba host lock mechanism (#21750)
|
2026-03-31 21:52:24 +08:00 |
|
R0CKSTAR
|
e67dbf257a
|
[diffusion] fix: fix Wan2.2-I2V-A14B video max size issue (#21390)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-31 21:49:53 +08:00 |
|
Mick
|
7790645b82
|
[diffusion] UX: replace deprecated ORJSONResponse with orjson_response (#21755)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-31 21:41:33 +08:00 |
|