Makcum888e
|
5f81ec1ad5
|
[Diffusion] Fix get model name when model local path end with "/" (#18918)
|
2026-02-17 13:19:54 +03:00 |
|
Ratish P
|
f6cc02489f
|
[diffusion]: fix sparse video gen 2 backend being applied to cross-attention (#18900)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-02-17 13:17:46 +03:00 |
|
HAI
|
b158f5d4a2
|
Revert "[AMD] Fix RotaryEmbedding crash on AMD/ROCm (regression from #17934)" (#18922)
|
2026-02-17 01:07:50 -08:00 |
|
billishyahao
|
899e2be7d0
|
[TBO] fix cuda graph intermittently becomes disabled bug (#18320)
|
2026-02-16 22:18:57 -08:00 |
|
Michael
|
5e3103a787
|
[AMD] Fix RotaryEmbedding crash on AMD/ROCm (regression from #17934) (#18903)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
|
2026-02-17 12:59:40 +08:00 |
|
Mohammad Miadh Angkad
|
90a0d66e1e
|
[Tiny] Fix assert syntax warning in compressed_tensors_w4a4_mxint4_moe.py (#18899)
|
2026-02-17 12:54:30 +08:00 |
|
Yilong Zhao
|
d5307ce022
|
[misc] adding metadata field in UpdateWeightFromDiskReqInput (#18821)
|
2026-02-17 12:14:15 +08:00 |
|
triple-mu
|
26b2c63d03
|
[diffusion] operator: unify rotary embedding impl (#18164)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-17 12:02:48 +08:00 |
|
pansicheng
|
b21390f8f3
|
Adapt the Qwen2Model._update_causal_mask for transformers==4.57.1 (#18774)
|
2026-02-17 10:20:41 +08:00 |
|
Ratish P
|
50ca24aebb
|
[diffusion]: fix scheduler crash on ZMQ messages with unexpected frame counts (#17890)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-02-17 09:45:05 +08:00 |
|
Frank Minors
|
1b659bcb08
|
Fix GLM-5 fused shared expert (#18804)
Co-authored-by: FrankMinions <liuchen@shinemo.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
|
2026-02-16 19:50:39 +00:00 |
|
danielafrimi
|
0ff24159a5
|
Fix modelopt FP8 create weights (#18447)
Signed-off-by: root <dafrimi@nvidia.com>
|
2026-02-17 00:59:50 +08:00 |
|
Tamir Baydasov
|
eba6af385d
|
[2/N] Quantization Refactor: Compressed tensors MoE schemes (#17503)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
Co-authored-by: Peng Zhang <aniz1905@gmail.com>
|
2026-02-16 18:03:51 +03:00 |
|
Estrella-xx
|
1b3513a7e4
|
refactor FAKE transfer backend and remove --disaggregation-decode-enable-fake-auto parameter (#18345)
|
2026-02-16 17:27:02 +03:00 |
|
Ratish P
|
c1d1337afc
|
[diffusion][Wan]: fix sparse attention backends being applied to cross-attention (#17596)
|
2026-02-16 21:57:58 +08:00 |
|
Mohammad Miadh Angkad
|
b86c6491fa
|
[Perf] ~9.5x faster Blackwell MXFP4 MoE weight loading (#18858)
|
2026-02-16 19:47:09 +08:00 |
|
Shivam jindal
|
4f0409f8aa
|
[Model] Add Qwen3ForRewardModel and fix Qwen3ForSequenceClassification (#17992)
Co-authored-by: yes-its-shivam <yes-its-shivam@users.noreply.github.com>
|
2026-02-16 19:44:41 +08:00 |
|
Mick
|
de833f9e8e
|
Revert "[diffusion]: Improve layerwise offload buffer reuse and shared-storage handling" (#18866)
|
2026-02-16 18:00:58 +08:00 |
|
Mick
|
d0c94e136a
|
[diffusion] logging: improve peak vram logging (#18865)
|
2026-02-16 16:44:37 +08:00 |
|
Yi Zhong
|
ed22720c07
|
[JIT kernel] hd=512,1024 in JIT QK norm (cta based) (#17515)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
|
2026-02-16 16:07:24 +08:00 |
|
Alison Shao
|
206accd15d
|
Fix GLM-4V processor registration when glm_ocr is unavailable (#18885)
|
2026-02-16 16:02:31 +08:00 |
|
Changyi Yang
|
61da34ad0b
|
[diffusion] fix: fix LoRA weight snapshot aliasing in unmerge logic (#18883)
|
2026-02-16 15:39:45 +08:00 |
|
Alison Shao
|
86c181e335
|
Fix test_lora_qwen3 nightly failure: replace adapter with added_tokens (#18884)
|
2026-02-16 14:35:06 +08:00 |
|
Douglas Yang
|
f1efb46bdd
|
fix: adding performance logging for nightly diffusion (#18023)
|
2026-02-16 14:09:00 +08:00 |
|
fzyzcjy
|
f554b3c27b
|
Support dumping gradients, parameters, lazy values (#18881)
Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>
|
2026-02-16 13:34:06 +08:00 |
|
fzyzcjy
|
9a7d8d5eb0
|
Collect upper level metadata to dump output (#18880)
|
2026-02-16 13:31:19 +08:00 |
|
fzyzcjy
|
949792d0c6
|
Change dump output format to dict with value and metadata (#18879)
|
2026-02-16 13:30:47 +08:00 |
|
fzyzcjy
|
02816abc0d
|
Flip dumper to disable by default and refactor environment handling (#18878)
|
2026-02-16 13:29:32 +08:00 |
|
Duyi-Wang
|
5ddc84e33e
|
[AMD] MORI-EP inter kernel type switch (#18437)
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-15 20:59:39 -08:00 |
|
Johnsonms
|
bc79a64d3a
|
[Diff]: support SGLANG_TORCH_PROFILER_DIR environment variable for profiler log directory (#18454)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-16 12:47:29 +08:00 |
|
Mick
|
0af9dcc407
|
[diffusion] refactor: refactor server_args adjust and validate logics (#18863)
|
2026-02-16 11:49:06 +08:00 |
|
Mick
|
78b4c9e248
|
[diffusion] fix: avoid saving output for warmup requests (#18867)
|
2026-02-16 11:48:28 +08:00 |
|
Yuan Luo
|
8a82c70297
|
[VLM] Optimize Ernie4.5-VL rotary embedding with fused triton kernel (#18856)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-02-16 11:19:44 +08:00 |
|
Rain Jiang
|
0ffd0a3995
|
Nsa trtllm mla sparse fp8 support with Deepseek v3.2 NVFP4 (#18389)
|
2026-02-16 09:29:54 +08:00 |
|
Mike Qiu
|
b79808bee2
|
Fix libnuma.so does not exist (#15355)
Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com>
Co-authored-by: Mike_Qiu <qiudayu.qdy@antgroup.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2026-02-16 00:37:50 +08:00 |
|
akhilg-nv
|
48eac1b62d
|
Improve profiler options for bench_serving (#16991)
|
2026-02-16 00:36:01 +08:00 |
|
Chanh Nguyen
|
597d17dd18
|
Use ephemeral nccl port via get_free_port() (#18009)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
|
2026-02-16 00:32:47 +08:00 |
|
tjp_zju
|
7a607c4900
|
fix_get_quant_method_in_fused_moe_condition (#18459)
Signed-off-by: tom-zju <tanjianpingzju1990@gmail.com>
Co-authored-by: Peng Zhang <aniz1905@gmail.com>
|
2026-02-16 00:31:42 +08:00 |
|
WiwilZ
|
b2f74d660a
|
fix: add SM110 (Jetson AGX Thor) to Blackwell capability check (#18787)
|
2026-02-16 00:26:58 +08:00 |
|
blake-snc
|
57f7e06cb9
|
fix: update Blackwell log/error messages to include SM12x (#18751)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-16 00:23:51 +08:00 |
|
SoluMilken
|
07a24f1a38
|
update pre-commit config (#18860)
|
2026-02-16 00:18:31 +08:00 |
|
Ratish P
|
ddfe147377
|
[diffusion]: Improve layerwise offload buffer reuse and shared-storage handling (#18611)
|
2026-02-15 22:17:51 +08:00 |
|
Mick
|
3feb48139e
|
[diffusion] quant: add support for svdquant and nunchaku (#18549)
Co-authored-by: AichenF <aichenf@nvidia.com>
Co-authored-by: jianyingzhu <53300651@qq.com>
|
2026-02-15 20:43:00 +08:00 |
|
Michael
|
88010e9601
|
[AMD] Fix nightly 1-GPU test failures and bench_serving regression (#18761)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
|
2026-02-15 20:36:47 +08:00 |
|
fzyzcjy
|
4c7f986c6b
|
Extract dumper and prefill delayer tests common utils (#18857)
|
2026-02-15 18:33:23 +08:00 |
|
haowen-han
|
b992828ad2
|
fix: fix bug on kimi2.5 with dp2 and tp4 (#18604)
Co-authored-by: hanhaowen <hanhaowen@baidu.com>
|
2026-02-15 16:32:13 +08:00 |
|
Ratish P
|
274bf6607a
|
[diffusion] fix: enable torch.compile for UlyssesAttention (#18840)
|
2026-02-15 15:54:27 +08:00 |
|
zhangxiaolei123456
|
ad1bdb93df
|
perf: add minimax-2.5 fused_moe tuning config for h20 (#18833)
|
2026-02-15 15:46:56 +08:00 |
|
jackey hua
|
922fbc21e2
|
[Perf] Tune MiniMax M2 fused moe kernel on H100 GPU (#18851)
|
2026-02-15 15:30:52 +08:00 |
|
andyluo7
|
944a9f6fcf
|
Fix/qwen3 5 amd rope cutedsl fallback (#18753)
Co-authored-by: seungrokj <seungrok.jung@amd.com>
|
2026-02-14 22:09:44 -08:00 |
|