LiYomi | 1d6424d5ad | fix: Mistral Small 4 fails to start due to config/weight format mismatch (#21620); Co-authored-by: mengxiancheng03 <mengxiancheng03@kuaishou.com>; Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>; Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-03-30 01:57:35 -07:00 |
strgrb | b246269444 | fix mamba cache leak when adder fails to add a matched req. (#21404) | 2026-03-30 16:45:49 +08:00 |
Baizhou Zhang | 62a63eeff7 | [Fix] Fix weight_loader property assignment for qwen3-next FP8 models (#21662); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-03-30 01:35:59 -07:00 |
Hubert Lu | e6071e60c0 | [AMD] Support AMD MXFP4 Qwen3.5-397B-A17B model (#21234) | 2026-03-30 01:14:18 -07:00 |
kk | b9a68c304e | [AMD] Fused rope kv store (#21315); Co-authored-by: wunhuang <wunhuang@amd.com> | 2026-03-30 00:05:41 -07:00 |
blzheng | ed01e1d5d6 | [CPU] add kernel apply_rotary_pos_emb_cpu for Qwen3-VL and Qwen3-Omni (#13121); Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> | 2026-03-29 23:43:46 -07:00 |
Aishwarya Ramasethu | c32ee48886 | MFU metrics in Prometheus (#19395) | 2026-03-29 23:40:06 -07:00 |
Polisetty V R K Jyothendra Varma | f0303fd07e | [Intel GPU] Enable DeepSeek R1 inference on XPU (#18461); Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com> | 2026-03-29 22:35:59 -07:00 |
Feng Su | 9b4dd27478 | [Fix] Fix Qwen3.5 MoE model loading and Mamba cache sharding in PP mode (#21448); Co-authored-by: zhangxiaolei123456 <zhangxiaolei.666@bytedance.com> | 2026-03-30 11:57:26 +08:00 |
Liangsheng Yin | c06ca1526c | Fix circular reference in CustomTestCase.__init_subclass__ (#21650); Co-authored-by: wan4ch <wan4ch@gmail.com> | 2026-03-29 20:38:12 -07:00 |
Lianmin Zheng | 9f7792415a | Clean up TokenizerManager: remove dead code and improve rid validation (#21639); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-03-29 15:12:49 -07:00 |
Lianmin Zheng | f3970b17ef | [Cleanup] Remove unused BatchMultimodalOutput and BatchMultimodalDecodeReq (#21640); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-03-29 14:54:25 -07:00 |
Lianmin Zheng | 1d9c8e8c9e | Simplify routed experts test and move base64 encoding to tokenizer manager (#21634); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-03-29 12:44:01 -07:00 |
Mohammad Miadh Angkad | 2acdda1d85 | [Fix] Remove redundant allreduce fusion block and skip TP=1 (#20621) | 2026-03-29 12:30:40 -07:00 |
wili | bda94fc779 | [Fix] SGLANG_USE_CUDA_IPC_TRANSPORT=1 and SGLANG_ENABLE_MM_SPLITTING=1 do not work at the same time. (#19915) | 2026-03-30 01:15:26 +08:00 |
saatwiknagpal | d2440dcf58 | [VLM] perf: optimize CUDA IPC for multimodal transfer by caching IPC pool handles (#21418) | 2026-03-30 00:20:38 +08:00 |
wili | 5bb9ca0e63 | [Feature] Optimizations for JPEG input on NVIDIA GPU (#19749) | 2026-03-30 00:06:14 +08:00 |
Bi Xue | 42c46e6334 | [sgl] disable piecewise cuda graph when a model doesn't have layers (#21565) | 2026-03-29 23:04:20 +08:00 |
Hanlin Bi | aa9177152e | fix cuda graph capturing error in sm120 mxfp8 triton path (#19835) | 2026-03-29 01:59:24 -07:00 |
Liangsheng Yin | fec9961a1f | Clean up _wait_for_scheduler_ready implementation (#21626) | 2026-03-29 01:02:33 -07:00 |
psaab | d2fa8d67ba | Wrap IPv6 addresses in gRPC, bench_serving, and log messages (#21236); Co-authored-by: hnyls2002 <lsyincs@gmail.com>; Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> | 2026-03-29 00:36:31 -07:00 |
shuwenn | 18074e25dc | fix: scheduler launch hang when non-current rank dies (#20287) | 2026-03-29 00:28:45 -07:00 |
Simon (Jiyou) Li | 22e4733ab9 | Add subprocess liveness monitor to detect scheduler crashes (#18582); Co-authored-by: 继优 <jiyou.ljy@alibaba-inc.com>; Co-authored-by: shuwenn <47200617+alphabetc1@users.noreply.github.com> | 2026-03-29 00:09:13 -07:00 |
Kangyan-Zhou | 9d64a82173 | feat(ci): add GB300 nightly benchmark test suites (#21487); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-03-28 21:54:03 -07:00 |
Lianmin Zheng | ba6b501f3a | Clean up detokenizer and remove dead multimodal_gen code (#21588); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-03-28 21:44:40 -07:00 |
Xiaoyu Zhang | 516cff97a3 | [Diffusion] Align diffusion benchmark skill presets with nightly comparison cases (#21616); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2026-03-29 12:12:17 +08:00 |
Yuan Luo | 343a7ac652 | [GDN] Fuse GDN kkt + solve_tril into one kernel (#21411); Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> | 2026-03-29 12:02:07 +08:00 |
jacky.cheng | c86f6c2831 | [AMD] Add peft>=0.18.0 to diffusion_hip deps for transformers 5.x compat for AMD diffusion model (#21442); Co-authored-by: HaiShaw <hixiao@gmail.com> | 2026-03-28 20:28:05 -07:00 |
Yuhao Yang | 4e69f14b95 | fix bench_serving sglang backend to support image dataset (#21294) | 2026-03-29 10:02:11 +08:00 |
eigen | 3ab9afd653 | fix: piecewise_cuda_graph get correct qo_indptr (#21452); Co-authored-by: Avery Huang <averyh@nvidia.com> | 2026-03-28 15:57:29 -07:00 |
Shu Wang | efebcab43e | Support skip-softmax attention (#19089) | 2026-03-28 15:55:48 -07:00 |
Xinyuan Tong | ced69c9f84 | feat: enable CUDA graph and timestamp for the whisper model(#21190) | 2026-03-29 01:46:03 +08:00 |
Yuhao Yang | 57cf4790ca | [VLM] Optimize ShmPointerMMData for multi-pickle safety and deferred unwrap (#21465) | 2026-03-28 23:11:12 +08:00 |
Mick | fc9de157f9 | [diffusion] feat: support overlay model materialization (#21600) | 2026-03-28 23:02:38 +08:00 |
Aditya Sharma | 627e162335 | [diffusion] fix: fix Flux2-Klein prompt tokenization length to 512 and add regression coverage (#21407) | 2026-03-28 17:28:02 +08:00 |
Baizhou Zhang | edd4d54023 | [Clean] Remove deprecated environs (#21536) | 2026-03-28 00:35:44 -07:00 |
Liangsheng Yin | 402628e560 | Patch transformers is_base_mistral in CI to avoid HF 429 rate limiting (#21586) | 2026-03-27 22:19:36 -07:00 |
Jianying | daf02bde33 | Fix Piecewise CUDA Graph crash with -enable-mixed-chunk (#20441); Co-authored-by: jianyingzhu <joeyzhu@nvidia.com> | 2026-03-27 21:56:21 -07:00 |
Liangsheng Yin | 19b1f75186 | Fix HFRunner hang when subprocess dies during init (#21582) | 2026-03-27 21:22:42 -07:00 |
Yuhao Yang | 5ef56682b8 | reduce CPU peak memory in multimodal tensor hashing (#21123) | 2026-03-28 11:09:16 +08:00 |
Fengyuan Yu | 9fa7b974fd | [diffusion] chore: remove redundant identity preprocess_text functions(#20633); Co-authored-by: Fengyuan Yu <15fengyuan@gmail.com> | 2026-03-28 10:07:30 +08:00 |
Eitan Turok | e570ca96f6 | [diffusion] refactor: Unify TeaCacheParams and WanTeaCacheParams (#20706); Co-authored-by: Mick <mickjagger19@icloud.com> | 2026-03-28 09:51:44 +08:00 |
Mick | f0c68fbefd | [diffusion] UX: aggregate expected dtype-cast logs during weight loading (#21552) | 2026-03-28 09:50:40 +08:00 |
Trevor Morris | 7160b6cb76 | [NVIDIA] Enable automatic NUMA configuration (#19452) | 2026-03-27 18:44:13 -07:00 |
Vladislav Nosivskoy | c37200f5e4 | Scope streaming backlog coalescing to incremental_streaming_output mode (#21037); Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>; Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2026-03-27 17:29:54 -07:00 |
Qiaolin Yu | a27651d5e0 | Remove sync when enabling return_logprob (#20972) | 2026-03-27 16:36:28 -07:00 |
Ethan (Yusheng) Su | 6d48719e31 | [1/n] lora support - Auto detect lora target modules (#21439); Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> | 2026-03-27 16:08:36 -07:00 |
narutolhy | 9b29131961 | fix tp capture in vit cuda graph (#17255) | 2026-03-27 22:38:18 +00:00 |
Muqi Li | 38ad251738 | feat: add gc_threshold arg (#21481); Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> | 2026-03-27 13:42:46 -07:00 |
huangtingwei | d864622a68 | [Hicache & JIT_kernel] Support page first layout & mla jit kernel (#18311) | 2026-03-27 08:54:36 -07:00 |