Junhao Liu
|
051427c0a3
|
[diffusion] benchmark: add SLO metric for bench_serving (#18907)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-03-08 22:35:57 +08:00 |
|
liubiyongge
|
cc73355a1f
|
[Feature] Add SLRU eviction policy & fix RadixCache hit_count bug (#18843)
Co-authored-by: zhangheng <hzh0425@apache.org>
|
2026-03-08 21:30:55 +08:00 |
|
Mick
|
2c183350be
|
[diffusion] fix: fix wrong dit config for qwen-image-edit-plus-2511 (#20123)
|
2026-03-08 20:08:36 +08:00 |
|
Ratish P
|
ab9de886c5
|
[diffusion] reduce LayerwiseOffloadManager reserved GPU memory (#20042)
|
2026-03-08 19:26:17 +08:00 |
|
Liangsheng Yin
|
29f3a5396e
|
[Minor] Add SessionSlot.is_holding_kv property for readability (#20120)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-08 03:25:13 -07:00 |
|
Liangsheng Yin
|
36b557d2c9
|
Fix streaming session with paged KV cache (SWA/MLA) (#20070)
Co-authored-by: Yilong Zhao <74357408+happierpig@users.noreply.github.com>
Co-authored-by: Aurick Qiao <6137920+aurickq@users.noreply.github.com>
|
2026-03-08 03:00:32 -07:00 |
|
yuyu5333
|
230fb55899
|
[Performance] Decode Offload improves the long texts performance 100% through dynamic block offload. (#17216)
Co-authored-by: zhangheng <hzh0425@apache.org>
|
2026-03-08 17:16:53 +08:00 |
|
Yuan Luo
|
97a2a9be0f
|
[VLM] Replace conv3d proj with linear for GLM4V (#20033)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-07 22:50:47 -08:00 |
|
Fan Lin
|
7fb282a96f
|
[diffusion] fix: fix bug of copy_if (#20094)
Co-authored-by: Yihan Chen <yingluosanqian@gmail.com>
|
2026-03-08 14:27:58 +08:00 |
|
xingsy97
|
7f9f85d4c8
|
[diffusion] feat: make QwenImageLayered resolution configurable (#20044)
|
2026-03-08 14:26:05 +08:00 |
|
Lancer
|
a73369c39f
|
[diffusion] chore: ensure CFG Zero Star numerical stability for Helios model (#20091)
Signed-off-by: Lancer <maruixiang6688@gmail.com>
|
2026-03-08 14:25:14 +08:00 |
|
shuwenn
|
72f6dfcc31
|
fix: add ModelScope cache lookup and speculative path support (#20098)
|
2026-03-07 22:23:16 -08:00 |
|
Liangsheng Yin
|
d02c515ee8
|
Decouple scheduler log printing from metrics collection (#20107)
|
2026-03-07 22:09:10 -08:00 |
|
Baizhou Zhang
|
d28f35240a
|
[V32/GLM5] Change default setting of V32 nvfp4 on TP4 (#20086)
|
2026-03-07 15:13:25 -08:00 |
|
Alison Shao
|
0f62da6953
|
[CI] Show test partition assignments after checkout (#20085)
Co-authored-by: Alison Shao <alisonshao@mac.lan>
|
2026-03-07 13:50:49 -08:00 |
|
VDV1985
|
45bd30e29d
|
[NPU] make torch_native lora backend a little bit faster (#17228)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Egor Filimonov <44640852+ssshinigami@users.noreply.github.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-03-07 20:14:46 +03:00 |
|
Ke Bao
|
5867c3fa80
|
Support HiCache for MambaRadixCache (#19663)
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2026-03-08 00:36:25 +08:00 |
|
Bingxu Chen
|
17721b00fd
|
[AMD] Fix Tensor Memory Aliasing (#19928)
|
2026-03-07 08:06:10 -08:00 |
|
Yuan Luo
|
7da590d4d0
|
[Qwen3.5] Support Qwen3.5 Pipeline Parallelism (#19670)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-07 23:34:08 +08:00 |
|
YeChang Guo
|
13bdc7bf4a
|
[Feature][NPU]: add runtime support for AutoRound quantized models (#16699)
Co-authored-by: root <root@localhost.localdomain>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-03-07 18:03:55 +03:00 |
|
Артем Савкин
|
5297b02c88
|
[Diffusion] [NPU] Wan2.2-T2V-A14B-Diffusers modelslim quantization support (#17996)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-03-07 17:26:44 +03:00 |
|
xingsy97
|
f8d4eb7022
|
[Docs] Add docstrings to JIT kernel include headers (#19770)
|
2026-03-07 20:48:00 +08:00 |
|
Ratish P
|
ef6540b439
|
[diffusion]: add width/height passthrough for OpenAI image API (#19970)
|
2026-03-07 20:43:46 +08:00 |
|
David Wang
|
19c51fe2fa
|
fix(rope): restore K writeback in fused rope + kv store kernel (#19636)
|
2026-03-07 20:41:35 +08:00 |
|
Fan Yin
|
43d6a32045
|
[sgl-kernel] rebase FlashMLA 0217 (#18902)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-07 00:30:52 -08:00 |
|
danielafrimi
|
f8bbf56de7
|
Refactor NemotronHConfig to canonical layers_block_type and add MTP block-type support (#19950)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
|
2026-03-06 23:22:03 -08:00 |
|
Lancer
|
b91fb8393e
|
[diffusion] fix: fix multi-prompt generation and support multiple prompts in cli (#19960)
Signed-off-by: Lancer <maruixiang6688@gmail.com>
|
2026-03-07 13:01:59 +08:00 |
|
Eitan Turok
|
31e93e4486
|
[diffusion] fix: fix TeaCache silently fails with --enable-teacache (#19964)
|
2026-03-07 13:00:11 +08:00 |
|
Qiaolin Yu
|
925185f9ec
|
Fix flashinfer backend with pcg (#20061)
|
2026-03-06 20:01:43 -08:00 |
|
Feng Su
|
8a411a9a2a
|
[Tracing] Remove the deprecated tracing code from mini_lb (#19409)
|
2026-03-07 11:19:23 +08:00 |
|
Mohammad Miadh Angkad
|
f88acf8780
|
[JIT Kernel] Reland NVFP4 kernels to JIT (#20012)
|
2026-03-07 10:31:08 +08:00 |
|
Yilong Zhao
|
6ffc74efd7
|
[Metrics] Add overlap bubble timing, full KV usage gauge, and prefill cuda graph tracking (#19982)
|
2026-03-06 17:41:27 -08:00 |
|
shubham singhal
|
a0d085c16d
|
Adding correct path for module not found error while collecting test (#19778)
Co-authored-by: sys-lpot-val <sys_lpot_val@intel.com>
|
2026-03-06 16:26:16 -08:00 |
|
R0CKSTAR
|
e818f8219a
|
Fix none-comparison (E711) warnings (#19745)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
|
2026-03-06 16:15:21 -08:00 |
|
R0CKSTAR
|
0c4f98ed4e
|
[diffusion] hardware: add set_musa_arch on MUSA (misc, 15/N) (#19381)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-03-06 16:14:41 -08:00 |
|
MARATRIX
|
069d4c577b
|
Fix Kimi K2.5 PP layer range exposure for PD disaggregation (#19959)
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
|
2026-03-06 16:14:02 -08:00 |
|
Liangsheng Yin
|
ddcecdea49
|
[Core] Unify max_num_reqs dp_size division for pool sizing (#20063)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-06 16:12:59 -08:00 |
|
Kangyan-Zhou
|
7a12255b6e
|
fix: set first_token_time before computing decode_throughput for single-batch completions (#19984)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-06 16:11:41 -08:00 |
|
Aurick Qiao
|
5c8e28698c
|
Add cleanup for _ATTN_TP in parallel_state.py (#19978)
|
2026-03-06 15:43:31 -08:00 |
|
Shu Wang
|
61de303f0a
|
Fix fallback to default tactic (flashinfer autotuner) with trtllm_fp4_block_scale_moe (#19189)
|
2026-03-06 15:15:04 -08:00 |
|
Kangyan-Zhou
|
e89069ee64
|
Fallback to torch.cuda.mem_get_info() when nvidia-smi is unavailable (#18957)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-06 15:00:08 -08:00 |
|
Liangsheng Yin
|
604db4471d
|
[Core] Clarify memory variable naming in model runner (#20060)
|
2026-03-06 14:00:46 -08:00 |
|
Liangsheng Yin
|
7a6cf0e9ba
|
[Core] Extract _calculate_mamba_ratio and _init_pools from init_memory_pool (#20058)
|
2026-03-06 13:37:22 -08:00 |
|
Mohammad Miadh Angkad
|
759700c808
|
Fix SM120 triton_kernels MXFP4 block_k for GPT-OSS (#20040)
|
2026-03-06 10:53:08 -08:00 |
|
R0CKSTAR
|
de1a0afcbc
|
[MUSA][10/N] Add GGUF support (#18357)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-03-06 10:50:35 -08:00 |
|
JohnHerry
|
e8f2b80340
|
[diffusion] improve: improve code readability of DenoisingStage (#20003)
|
2026-03-06 23:23:44 +08:00 |
|
xingsy97
|
54634b9a40
|
[Kernel] Dispatch exp/sin/cos through dtype_trait (#19798)
|
2026-03-06 22:57:52 +08:00 |
|
Johnsonms
|
2d266c73ea
|
Migrate renorm kernels from sgl-kernel to FlashInfer JIT (#18854)
|
2026-03-06 22:53:28 +08:00 |
|
Xiaoyu Zhang
|
6d22c9f369
|
[Diffusion] Move hf kernels diffusion cuda kernels skills to SGLD (#20001)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-06 22:16:06 +08:00 |
|
Yuan Luo
|
f7de9375ac
|
[GDN][Qwen3-Next][Qwen3.5] Fuse fused_gdn_gating and fused_recurrent_gated_delta_rule_update in verify_target (#19775)
|
2026-03-06 21:42:44 +08:00 |
|