Shu Wang
|
efebcab43e
|
Support skip-softmax attention (#19089)
|
2026-03-28 15:55:48 -07:00 |
|
Xinyuan Tong
|
ced69c9f84
|
feat: enable CUDA graph and timestamp for the whisper model (#21190)
|
2026-03-29 01:46:03 +08:00 |
|
Yuhao Yang
|
57cf4790ca
|
[VLM] Optimize ShmPointerMMData for multi-pickle safety and deferred unwrap (#21465)
|
2026-03-28 23:11:12 +08:00 |
|
Mick
|
fc9de157f9
|
[diffusion] feat: support overlay model materialization (#21600)
|
2026-03-28 23:02:38 +08:00 |
|
Aditya Sharma
|
627e162335
|
[diffusion] fix: fix Flux2-Klein prompt tokenization length to 512 and add regression coverage (#21407)
|
2026-03-28 17:28:02 +08:00 |
|
Baizhou Zhang
|
edd4d54023
|
[Clean] Remove deprecated environs (#21536)
|
2026-03-28 00:35:44 -07:00 |
|
Liangsheng Yin
|
402628e560
|
Patch transformers is_base_mistral in CI to avoid HF 429 rate limiting (#21586)
|
2026-03-27 22:19:36 -07:00 |
|
Jianying
|
daf02bde33
|
Fix Piecewise CUDA Graph crash with --enable-mixed-chunk (#20441)
Co-authored-by: jianyingzhu <joeyzhu@nvidia.com>
|
2026-03-27 21:56:21 -07:00 |
|
Liangsheng Yin
|
19b1f75186
|
Fix HFRunner hang when subprocess dies during init (#21582)
|
2026-03-27 21:22:42 -07:00 |
|
Yuhao Yang
|
5ef56682b8
|
reduce CPU peak memory in multimodal tensor hashing (#21123)
|
2026-03-28 11:09:16 +08:00 |
|
Fengyuan Yu
|
9fa7b974fd
|
[diffusion] chore: remove redundant identity preprocess_text functions (#20633)
Co-authored-by: Fengyuan Yu <15fengyuan@gmail.com>
|
2026-03-28 10:07:30 +08:00 |
|
Eitan Turok
|
e570ca96f6
|
[diffusion] refactor: Unify TeaCacheParams and WanTeaCacheParams (#20706)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-28 09:51:44 +08:00 |
|
Mick
|
f0c68fbefd
|
[diffusion] UX: aggregate expected dtype-cast logs during weight loading (#21552)
|
2026-03-28 09:50:40 +08:00 |
|
Trevor Morris
|
7160b6cb76
|
[NVIDIA] Enable automatic NUMA configuration (#19452)
|
2026-03-27 18:44:13 -07:00 |
|
Vladislav Nosivskoy
|
c37200f5e4
|
Scope streaming backlog coalescing to incremental_streaming_output mode (#21037)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2026-03-27 17:29:54 -07:00 |
|
Qiaolin Yu
|
a27651d5e0
|
Remove sync when enabling return_logprob (#20972)
|
2026-03-27 16:36:28 -07:00 |
|
Ethan (Yusheng) Su
|
6d48719e31
|
[1/n] lora support - Auto detect lora target modules (#21439)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-27 16:08:36 -07:00 |
|
narutolhy
|
9b29131961
|
fix tp capture in vit cuda graph (#17255)
|
2026-03-27 22:38:18 +00:00 |
|
Muqi Li
|
38ad251738
|
feat: add gc_threshold arg (#21481)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-27 13:42:46 -07:00 |
|
huangtingwei
|
d864622a68
|
[Hicache & JIT_kernel] Support page first layout & mla jit kernel (#18311)
|
2026-03-27 08:54:36 -07:00 |
|
Bi Xue
|
30397e0a1e
|
[rl][sgl] fix tensor mismatch after pause (#21514)
|
2026-03-27 23:02:30 +08:00 |
|
yang1002378395-cmyk
|
279e7738c5
|
[diffusion] fix: return None instead of raising RuntimeError when no model info found (#21319)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-27 22:42:39 +08:00 |
|
Xiaoyu Zhang
|
9238bd08a2
|
[CI] Register missing jit_kernel test files (#21547)
|
2026-03-27 19:39:08 +08:00 |
|
yang1002378395-cmyk
|
f83b1b73a8
|
[diffusion] feat: add --strict-ports option for predictable port assignment (#21320)
Co-authored-by: 阳虎 <yanghu@yanghudeMacBook-Pro.local>
|
2026-03-27 16:40:50 +08:00 |
|
zwang86
|
5fc5c18bed
|
fix(security): replace unsafe pickle.loads with SafeUnpickler for CVE-2026-3989 (#20904)
|
2026-03-27 00:43:41 -07:00 |
|
Khoa Pham
|
8d4fca5908
|
[Security] 1/N: Bind ZMQ sockets to localhost to prevent unauthenticated remote access (#21435)
|
2026-03-26 23:33:49 -07:00 |
|
Xiaoyu Zhang
|
d633ab7349
|
[Diffusion] Add qknorm rope fuse kernel (#21440)
|
2026-03-27 14:27:08 +08:00 |
|
Xiaoyu Zhang
|
e8d46f145c
|
Opt jit qknorm_across_heads cuda kernel (#21503)
|
2026-03-27 13:30:46 +08:00 |
|
Johnsonms
|
8a56a7b04d
|
[jit_kernel] Migrate cast (downcast_fp8) from sgl-kernel AOT to JIT (#19103)
|
2026-03-27 13:21:44 +08:00 |
|
Johnsonms
|
c531be455e
|
[jit_kernel] Add fused_qknorm_rope JIT kernel (#19059)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-03-27 13:21:28 +08:00 |
|
Mick
|
d7c4c57ace
|
[diffusion] refactor: move format-specific weight loading hooks (quant-related) to a dedicated file (#21366)
|
2026-03-27 09:58:49 +08:00 |
|
Liangsheng Yin
|
e1ee68d0fc
|
Release mm features on session close and support multiple /rerun-ut specs (#21501)
|
2026-03-26 18:31:29 -07:00 |
|
Aurick Qiao
|
c2b3e42ad6
|
Fix sessions with mm inputs (#21269)
|
2026-03-26 17:38:23 -07:00 |
|
Liangsheng Yin
|
8a4cdcd538
|
Simplify flush_cache: reject concurrent requests, remove client-side retry (#21490)
|
2026-03-26 16:31:04 -07:00 |
|
Liangsheng Yin
|
c580ddd19d
|
Fix benchmark generating empty prompts when random_input_len is small (#21492)
|
2026-03-26 16:24:35 -07:00 |
|
Baizhou Zhang
|
a93065679b
|
Revert "bugfix for weight loading for qwen3-next" (#21496)
|
2026-03-26 16:17:18 -07:00 |
|
SevenJ
|
2e65c27b29
|
Api add flush cache timeout (#21413)
Signed-off-by: root <wenjun7j@gmail.com>
|
2026-03-26 14:44:37 -07:00 |
|
Qiaolin Yu
|
8c3ccef2d9
|
Fix Kimi K2.5 dp attention + spec decoding launch crash (#21391)
|
2026-03-26 14:40:26 -07:00 |
|
satyamk7054
|
be0cca5596
|
Use torch.addmm instead of separate mm and add_ calls for LoRA torch.native (#20562)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
|
2026-03-26 14:35:20 -07:00 |
|
satyamk7054
|
e59ea4f6e9
|
fix: torch-native LoRA for multi-adapter case (#20564)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
|
2026-03-26 14:34:16 -07:00 |
|
Liangsheng Yin
|
fb90c9d298
|
[Test] Consolidate eval accuracy test mixins into eval_accuracy_kit (#21047)
|
2026-03-26 14:26:46 -07:00 |
|
Liangsheng Yin
|
e5b7650353
|
Fix UnboundLocalError when DetokenizerManager constructor fails (#21471)
|
2026-03-26 13:00:16 -07:00 |
|
Ho-Ren (Jack) Chuang
|
4b5f63e1b8
|
FIX: (NSA) Compute topk_indices_offset when NSA prefill flashmla_sparse is used with FP8 KV cache (#20606)
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
|
2026-03-26 12:50:50 -07:00 |
|
jianzhao-xu
|
3867c6431a
|
Fix bug in dbrx model (#21445)
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
|
2026-03-26 11:23:30 -07:00 |
|
shuwenn
|
646573e4e8
|
fix: use get_rope_config() to support models without rope_parameters (#21135)
|
2026-03-26 11:22:12 -07:00 |
|
McZyWu
|
0906e45cec
|
bugfix for weight loading for qwen3-next (#21313)
|
2026-03-26 21:21:00 +08:00 |
|
Mick
|
35720d9969
|
[diffusion] fix: fix qwen-image with nunchaku (#21415)
|
2026-03-26 16:31:44 +08:00 |
|
Anant Sharma
|
f289d173aa
|
[Deps] Bump xgrammar to 0.1.32 (#21032)
|
2026-03-26 01:22:37 -07:00 |
|
Chen, Zhentao
|
fd535942ac
|
[AMD] Integrate aiter's fused_topk for softmax scoring in topk function (#21421)
Co-authored-by: Chen, Todd <zhenchen@amd.com>
|
2026-03-26 00:57:56 -07:00 |
|
R0CKSTAR
|
a305964159
|
[MLX] Add native MLX execution backend for Apple Silicon Mac (#20342)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
|
2026-03-26 00:09:17 -07:00 |
|