shuwenn
|
18074e25dc
|
fix: scheduler launch hang when non-current rank dies (#20287)
|
2026-03-29 00:28:45 -07:00 |
|
Simon (Jiyou) Li
|
22e4733ab9
|
Add subprocess liveness monitor to detect scheduler crashes (#18582)
Co-authored-by: 继优 <jiyou.ljy@alibaba-inc.com>
Co-authored-by: shuwenn <47200617+alphabetc1@users.noreply.github.com>
|
2026-03-29 00:09:13 -07:00 |
|
Kangyan-Zhou
|
9d64a82173
|
feat(ci): add GB300 nightly benchmark test suites (#21487)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-28 21:54:03 -07:00 |
|
Lianmin Zheng
|
ba6b501f3a
|
Clean up detokenizer and remove dead multimodal_gen code (#21588)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-28 21:44:40 -07:00 |
|
Xiaoyu Zhang
|
516cff97a3
|
[Diffusion] Align diffusion benchmark skill presets with nightly comparison cases (#21616)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-29 12:12:17 +08:00 |
|
Yuan Luo
|
343a7ac652
|
[GDN] Fuse GDN kkt + solve_tril into one kernel (#21411)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-29 12:02:07 +08:00 |
|
jacky.cheng
|
c86f6c2831
|
[AMD] Add peft>=0.18.0 to diffusion_hip deps for transformers 5.x compat for AMD diffusion model (#21442)
Co-authored-by: HaiShaw <hixiao@gmail.com>
|
2026-03-28 20:28:05 -07:00 |
|
Yuhao Yang
|
4e69f14b95
|
fix bench_serving sglang backend to support image dataset (#21294)
|
2026-03-29 10:02:11 +08:00 |
|
eigen
|
3ab9afd653
|
fix: piecewise_cuda_graph get correct qo_indptr (#21452)
Co-authored-by: Avery Huang <averyh@nvidia.com>
|
2026-03-28 15:57:29 -07:00 |
|
Shu Wang
|
efebcab43e
|
Support skip-softmax attention (#19089)
|
2026-03-28 15:55:48 -07:00 |
|
Xinyuan Tong
|
ced69c9f84
|
feat: enable CUDA graph and timestamp for the whisper model (#21190)
|
2026-03-29 01:46:03 +08:00 |
|
Yuhao Yang
|
57cf4790ca
|
[VLM] Optimize ShmPointerMMData for multi-pickle safety and deferred unwrap (#21465)
|
2026-03-28 23:11:12 +08:00 |
|
Mick
|
fc9de157f9
|
[diffusion] feat: support overlay model materialization (#21600)
|
2026-03-28 23:02:38 +08:00 |
|
Aditya Sharma
|
627e162335
|
[diffusion] fix: fix Flux2-Klein prompt tokenization length to 512 and add regression coverage (#21407)
|
2026-03-28 17:28:02 +08:00 |
|
Baizhou Zhang
|
edd4d54023
|
[Clean] Remove deprecated environs (#21536)
|
2026-03-28 00:35:44 -07:00 |
|
Liangsheng Yin
|
402628e560
|
Patch transformers is_base_mistral in CI to avoid HF 429 rate limiting (#21586)
|
2026-03-27 22:19:36 -07:00 |
|
Jianying
|
daf02bde33
|
Fix Piecewise CUDA Graph crash with --enable-mixed-chunk (#20441)
Co-authored-by: jianyingzhu <joeyzhu@nvidia.com>
|
2026-03-27 21:56:21 -07:00 |
|
Liangsheng Yin
|
19b1f75186
|
Fix HFRunner hang when subprocess dies during init (#21582)
|
2026-03-27 21:22:42 -07:00 |
|
Yuhao Yang
|
5ef56682b8
|
reduce CPU peak memory in multimodal tensor hashing (#21123)
|
2026-03-28 11:09:16 +08:00 |
|
Fengyuan Yu
|
9fa7b974fd
|
[diffusion] chore: remove redundant identity preprocess_text functions (#20633)
Co-authored-by: Fengyuan Yu <15fengyuan@gmail.com>
|
2026-03-28 10:07:30 +08:00 |
|
Eitan Turok
|
e570ca96f6
|
[diffusion] refactor: Unify TeaCacheParams and WanTeaCacheParams (#20706)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-28 09:51:44 +08:00 |
|
Mick
|
f0c68fbefd
|
[diffusion] UX: aggregate expected dtype-cast logs during weight loading (#21552)
|
2026-03-28 09:50:40 +08:00 |
|
Trevor Morris
|
7160b6cb76
|
[NVIDIA] Enable automatic NUMA configuration (#19452)
|
2026-03-27 18:44:13 -07:00 |
|
Vladislav Nosivskoy
|
c37200f5e4
|
Scope streaming backlog coalescing to incremental_streaming_output mode (#21037)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2026-03-27 17:29:54 -07:00 |
|
Qiaolin Yu
|
a27651d5e0
|
Remove sync when enabling return_logprob (#20972)
|
2026-03-27 16:36:28 -07:00 |
|
Ethan (Yusheng) Su
|
6d48719e31
|
[1/n] lora support - Auto detect lora target modules (#21439)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-27 16:08:36 -07:00 |
|
narutolhy
|
9b29131961
|
fix tp capture in vit cuda graph (#17255)
|
2026-03-27 22:38:18 +00:00 |
|
Muqi Li
|
38ad251738
|
feat: add gc_threshold arg (#21481)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-27 13:42:46 -07:00 |
|
huangtingwei
|
d864622a68
|
[Hicache & JIT_kernel] Support page first layout & mla jit kernel (#18311)
|
2026-03-27 08:54:36 -07:00 |
|
Bi Xue
|
30397e0a1e
|
[rl][sgl] fix tensor mismatch after pause (#21514)
|
2026-03-27 23:02:30 +08:00 |
|
yang1002378395-cmyk
|
279e7738c5
|
[diffusion] fix: return None instead of raising RuntimeError when no model info found (#21319)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-27 22:42:39 +08:00 |
|
Xiaoyu Zhang
|
9238bd08a2
|
[CI] Register missing jit_kernel test files (#21547)
|
2026-03-27 19:39:08 +08:00 |
|
yang1002378395-cmyk
|
f83b1b73a8
|
[diffusion] feat: add --strict-ports option for predictable port assignment (#21320)
Co-authored-by: 阳虎 <yanghu@yanghudeMacBook-Pro.local>
|
2026-03-27 16:40:50 +08:00 |
|
zwang86
|
5fc5c18bed
|
fix(security): replace unsafe pickle.loads with SafeUnpickler for CVE-2026-3989 (#20904)
|
2026-03-27 00:43:41 -07:00 |
|
Khoa Pham
|
8d4fca5908
|
[Security] 1/N: Bind ZMQ sockets to localhost to prevent unauthenticated remote access (#21435)
|
2026-03-26 23:33:49 -07:00 |
|
Xiaoyu Zhang
|
d633ab7349
|
[Diffusion] Add qknorm rope fuse kernel (#21440)
|
2026-03-27 14:27:08 +08:00 |
|
Xiaoyu Zhang
|
e8d46f145c
|
Opt jit qknorm_across_heads cuda kernel (#21503)
|
2026-03-27 13:30:46 +08:00 |
|
Johnsonms
|
8a56a7b04d
|
[jit_kernel] Migrate cast (downcast_fp8) from sgl-kernel AOT to JIT (#19103)
|
2026-03-27 13:21:44 +08:00 |
|
Johnsonms
|
c531be455e
|
[jit_kernel] Add fused_qknorm_rope JIT kernel (#19059)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-03-27 13:21:28 +08:00 |
|
Mick
|
d7c4c57ace
|
[diffusion] refactor: move format-specific weight loading hooks (quant-related) to a dedicated file (#21366)
|
2026-03-27 09:58:49 +08:00 |
|
Liangsheng Yin
|
e1ee68d0fc
|
Release mm features on session close and support multiple /rerun-ut specs (#21501)
|
2026-03-26 18:31:29 -07:00 |
|
Aurick Qiao
|
c2b3e42ad6
|
Fix sessions with mm inputs (#21269)
|
2026-03-26 17:38:23 -07:00 |
|
Liangsheng Yin
|
8a4cdcd538
|
Simplify flush_cache: reject concurrent requests, remove client-side retry (#21490)
|
2026-03-26 16:31:04 -07:00 |
|
Liangsheng Yin
|
c580ddd19d
|
Fix benchmark generating empty prompts when random_input_len is small (#21492)
|
2026-03-26 16:24:35 -07:00 |
|
Baizhou Zhang
|
a93065679b
|
Revert "bugfix for weight loading for qwen3-next" (#21496)
|
2026-03-26 16:17:18 -07:00 |
|
SevenJ
|
2e65c27b29
|
Api add flush cache timeout (#21413)
Signed-off-by: root <wenjun7j@gmail.com>
|
2026-03-26 14:44:37 -07:00 |
|
Qiaolin Yu
|
8c3ccef2d9
|
Fix Kimi K2.5 dp attention + spec decoding launch crash (#21391)
|
2026-03-26 14:40:26 -07:00 |
|
satyamk7054
|
be0cca5596
|
Use torch.addmm instead of separate mm and add_ calls for LoRA torch.native (#20562)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
|
2026-03-26 14:35:20 -07:00 |
|
satyamk7054
|
e59ea4f6e9
|
fix: torch-native LoRA for multi-adapter case (#20564)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
|
2026-03-26 14:34:16 -07:00 |
|
Liangsheng Yin
|
fb90c9d298
|
[Test] Consolidate eval accuracy test mixins into eval_accuracy_kit (#21047)
|
2026-03-26 14:26:46 -07:00 |
|