Ho-Ren (Jack) Chuang
|
4b5f63e1b8
|
FIX: (NSA) Compute topk_indices_offset when NSA prefill flashmla_sparse is used with FP8 KV cache (#20606)
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
|
2026-03-26 12:50:50 -07:00 |
|
jianzhao-xu
|
3867c6431a
|
Fix bug in dbrx model (#21445)
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
|
2026-03-26 11:23:30 -07:00 |
|
shuwenn
|
646573e4e8
|
fix: use get_rope_config() to support models without rope_parameters (#21135)
|
2026-03-26 11:22:12 -07:00 |
|
McZyWu
|
0906e45cec
|
bugfix for weight loading for qwen3-next (#21313)
|
2026-03-26 21:21:00 +08:00 |
|
Mick
|
35720d9969
|
[diffusion] fix: fix qwen-image with nunchaku (#21415)
|
2026-03-26 16:31:44 +08:00 |
|
Anant Sharma
|
f289d173aa
|
[Deps] Bump xgrammar to 0.1.32 (#21032)
|
2026-03-26 01:22:37 -07:00 |
|
Chen, Zhentao
|
fd535942ac
|
[AMD] Integrate aiter's fused_topk for softmax scoring in topk function (#21421)
Co-authored-by: Chen, Todd <zhenchen@amd.com>
|
2026-03-26 00:57:56 -07:00 |
|
R0CKSTAR
|
a305964159
|
[MLX] Add native MLX execution backend for Apple Silicon Mac (#20342)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
|
2026-03-26 00:09:17 -07:00 |
|
Xiaoyu Zhang
|
7ca015fe65
|
[Diffusion] Refactor diffusion JIT kernel test layout and narrow CI triggers (#21385)
|
2026-03-26 15:02:02 +08:00 |
|
Liangsheng Yin
|
79db3bec34
|
[CI] Add PID namespace and ps auxf diagnostics to killall.py (#21401)
|
2026-03-25 23:57:15 -07:00 |
|
MARATRIX
|
01ccdb91b1
|
[Fix] Add EPLB rebalance support for Kimi K2.5 (#21004)
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
|
2026-03-25 21:01:40 -07:00 |
|
ori
|
f142608408
|
[MUSA] apply_vocab_mask support musa device (#21296)
|
2026-03-25 21:00:58 -07:00 |
|
MARATRIX
|
f420b9b4a5
|
[MUSA][Feature] Enable Piecewise CUDA Graph support for MUSA platform (#20758)
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>
|
2026-03-25 21:00:28 -07:00 |
|
R0CKSTAR
|
abf4f1a47a
|
[MPS] Add StreamContext stub (#20782)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
|
2026-03-25 20:59:51 -07:00 |
|
R0CKSTAR
|
02521420b3
|
[MPS] Support sglang.check_env (#20753)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
|
2026-03-25 20:59:25 -07:00 |
|
gjsheu
|
d9e96153de
|
[NPU] Support Hybrid KV Cache for Ascend backend (#18032)
Co-authored-by: gengjinsong <gengjinsong@huawei.com>
|
2026-03-26 11:27:36 +08:00 |
|
Simo Lin
|
b835309f0c
|
Reland: compute M-RoPE positions for preprocessed VL inputs (#21244)
|
2026-03-25 20:12:43 -07:00 |
|
DarkSharpness
|
bb29893689
|
[Fix] Try to fix nvcc compilation error (#21246)
|
2026-03-26 10:59:36 +08:00 |
|
Aurick Qiao
|
a34e9ed64a
|
Add adjusted_filter_batch (#21260)
|
2026-03-26 10:59:05 +08:00 |
|
Aurick Qiao
|
53c1d8e963
|
Fix customized_info offset truncation (#21262)
|
2026-03-26 10:57:51 +08:00 |
|
Sam Shleifer
|
1100b9865c
|
Fix MxInt4 MoE returning wrong output variable (#21348)
|
2026-03-26 10:57:09 +08:00 |
|
Xiaoyu Zhang
|
6f2b51ade1
|
[Diffusion] Optimize diffusion Triton rotary embedding by processing multiple heads per token (#21387)
|
2026-03-26 08:59:25 +08:00 |
|
Hubert Lu
|
7c7b2a8c97
|
[Bugfix] Lazy-import CuteDSL KDA kernel to fix AMD/ROCm startup crash (#21428)
|
2026-03-25 16:37:26 -07:00 |
|
Liangsheng Yin
|
75682f1d2f
|
Remove noisy streaming backlog warning log (#21432)
|
2026-03-25 16:25:16 -07:00 |
|
Liangsheng Yin
|
4dd4e06f1d
|
[CI] Fix resource leak when setUpClass fails (#21338)
|
2026-03-25 16:22:44 -07:00 |
|
Xiaoyu Zhang
|
68f7f00174
|
[Diffusion] Speed up Qwen select01 Triton modulation kernels (#21318)
|
2026-03-25 20:48:39 +08:00 |
|
Mick
|
04eb72801f
|
[diffusion] CI: add performance tracking job to nightly (#21091)
|
2026-03-25 19:01:33 +08:00 |
|
Xiaoyu Zhang
|
689e9ef05c
|
[Diffusion] Add AKO4ALL kernel optimization skill (#21323)
|
2026-03-25 18:46:21 +08:00 |
|
Xiaoyu Zhang
|
e4ad10520b
|
[diffusion] Skip automatic Wan/MOVA DiT layerwise offload on high-end GPUs (#21248)
|
2026-03-25 18:45:30 +08:00 |
|
DarkSharpness
|
3d2a61cbf6
|
[Chore] Clean up JIT compilation flags (#21022)
|
2026-03-25 18:08:40 +08:00 |
|
Liangsheng Yin
|
4480e6c237
|
[CI] Add retry loop to killall_sglang GPU cleanup verification (#21393)
|
2026-03-25 02:16:20 -07:00 |
|
YC Yen-Ching Tseng
|
c494e47843
|
[AMD] Fix stage-b-test-small-1-gpu-amd (test_tool_choice.py) (#19868)
|
2026-03-25 01:10:21 -07:00 |
|
Alison Shao
|
5297a3cb46
|
[CI] Rewrite killall_sglang as Python with CI/local dual mode (#21331)
Co-authored-by: Alison Shao <alison.shao@mac.lan>
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-03-24 23:54:01 -07:00 |
|
Mick
|
6cc5717e8a
|
[diffusion] doc: update quantization.md (#21356)
|
2026-03-25 14:48:38 +08:00 |
|
Alison Shao
|
17e41cfb21
|
Fix RDMA device mapping for non-zero GPU indices in disaggregation tests (#21303)
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
|
2026-03-24 22:56:57 -07:00 |
|
Duyi-Wang
|
61a902ce88
|
[AMD][MoRI] Auto-select dispatch quantization type from MoE weight dtype. (#21040)
|
2026-03-24 22:53:57 -07:00 |
|
kk
|
86e2622097
|
[AMD] Add mha fp8-kv support (#21253)
Co-authored-by: wunhuang <wunhuang@amd.com>
|
2026-03-24 22:38:02 -07:00 |
|
Baizhou Zhang
|
2b75fed0dd
|
Workaround of DSA performance drop on B200 + DP (#21337)
|
2026-03-24 22:21:07 -07:00 |
|
Ke Bao
|
92492896a5
|
Fix disaggregation test bootstrap port conflict (#21271)
|
2026-03-24 21:14:41 -07:00 |
|
Ke Bao
|
c1d930c028
|
Increase flush cache timeout in hicache CI (#21305)
|
2026-03-24 19:00:59 -07:00 |
|
Yuan Luo
|
f273ba1ccc
|
[KDA] Support CuTeDSL KDA decode kernel (#21203)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-25 09:47:09 +08:00 |
|
DarkSharpness
|
dfc15b78b0
|
[misc] clean up kernel API (#21325)
|
2026-03-25 09:10:23 +08:00 |
|
ykcai-daniel
|
281fe10b5e
|
[diffusion] quant: support nvfp4 for Flux.2 (#20137)
Co-authored-by: zcnrex <zcnrex@gmail.com>
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Yikang Cai <dcai@catalyst-fleet1.cs.cmu.edu>
Co-authored-by: CHEN Xi <78632976+RubiaCx@users.noreply.github.com>
Co-authored-by: RubiaCx <1084281732@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-25 08:28:25 +08:00 |
|
Liangsheng Yin
|
37420dce0b
|
[CI] Enable failfast (-f) by default in run_suite.py (#21330)
|
2026-03-24 17:04:42 -07:00 |
|
Baizhou Zhang
|
1046dbe038
|
[Fix] Fix trtllm fp4 moe kernel not found error (#21343)
|
2026-03-24 16:38:05 -07:00 |
|
Mohammad Miadh Angkad
|
bbe25b2412
|
Use FlashInfer tinygemm for GPT-OSS MoE router on SM90+ (#20755)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-03-24 15:00:18 -07:00 |
|
Jiaxin(Jackson) Deng
|
c4db64c16b
|
Add Lychee Doc Links Check to Local and CI (#19742)
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
|
2026-03-24 13:48:26 -07:00 |
|
Jonah Bernard
|
a32e0d57e7
|
[LoRA][III] Add LoRA support for MoE layers and enable TP (#14105)
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-24 13:14:14 -07:00 |
|
Zhang Yiyang (SII)
|
a3ed2e4d29
|
[diffusion][CI] Add CI for MOVA model inference (#20430)
Co-authored-by: Luo <139519292+0-693@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-24 21:28:16 +03:00 |
|
YC Yen-Ching Tseng
|
71f5ae3f9a
|
[AMD] Fix AMD Nightly Test - Transformers 5.3.0 incompatibility and gemma2-27b kv issue (#21193)
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
|
2026-03-24 10:41:44 -07:00 |
|