fzyzcjy
|
cc63c99f11
|
Enhance hook mechanism in dumper (#19073)
|
2026-02-22 16:13:38 +08:00 |
|
fzyzcjy
|
fdf80b5031
|
Extract framework plugins in dumper (#19072)
|
2026-02-22 16:10:43 +08:00 |
|
fzyzcjy
|
e32b5364a2
|
Auto annotate context in dumper (#19071)
|
2026-02-22 16:08:48 +08:00 |
|
fzyzcjy
|
8bc0751376
|
Support enabling partial non intrusive dump in dumper (#19069)
|
2026-02-22 16:07:45 +08:00 |
|
fzyzcjy
|
0384c459a7
|
Support non-intrusive dumping in dumper (#19068)
|
2026-02-22 16:04:02 +08:00 |
|
fzyzcjy
|
5eccc3cff9
|
Refactor dumper and change on_forward_pass_start API (#19065)
|
2026-02-22 16:03:27 +08:00 |
|
Shangming Cai
|
eccee4c48e
|
[PD] Change bootstrap_room metadata dtype from int64 to uint64 (#19141)
|
2026-02-22 14:20:16 +08:00 |
|
YAMY
|
5995bfec63
|
[Qwen3-Next] Enable fused_qkvzba_split_reshape_cat also for prefill (#18917)
|
2026-02-22 13:57:17 +08:00 |
|
Qiaolin Yu
|
8cf003c44b
|
Fix spec v2+dp attention in nsa backend (#19134)
|
2026-02-22 13:46:15 +08:00 |
|
Bhavneek Singh
|
8612358ee9
|
[BUG] [DLLM] Missing max_running_requests value (#18740)
|
2026-02-22 13:38:56 +08:00 |
|
fy1214
|
ae62898ffb
|
[diffusion] quant: adapt FP8 linear to sgld and support quant in flux (#17023)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-02-22 12:55:28 +08:00 |
|
YAMY
|
cef353f338
|
[Fix] Quick fix for int32 overflow in Mooncake's send_kvcache_slice (#19076)
|
2026-02-22 12:00:33 +08:00 |
|
Liangsheng Yin
|
4653939cda
|
Revert "[jit kernel] Support per_token_group_quant_8bit jit kernel" (#19131)
|
2026-02-22 07:54:24 +08:00 |
|
Liangsheng Yin
|
1f2da824dd
|
[Benchmark] Remove re-exports from bench_serving.py (#19130)
|
2026-02-21 14:30:30 -08:00 |
|
Ratish P
|
f158869c2c
|
[Refactor] Benchmark Phase 1: extract utils and datasets from bench_serving (#19077)
Co-authored-by: Xuchun Shang <107600043+xucsh@users.noreply.github.com>
|
2026-02-21 13:50:11 -08:00 |
|
Mohammad Miadh Angkad
|
7b0fb43c7a
|
[FlashInfer] Switch FlashInfer allreduce fusion to unified API (#18341)
|
2026-02-22 00:07:16 +08:00 |
|
Bi Xue
|
bf36aa4c31
|
[sgl] view could hold the memory too long and introduced large memory (#19109)
|
2026-02-21 23:40:56 +08:00 |
|
Xinyuan Tong
|
677b66af80
|
fix KimiK2Detector regex patterns with re.DOTALL (#19120)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-02-21 22:13:08 +08:00 |
|
Xinyuan Tong
|
4a362a0e04
|
fix tool handling in OpenAIServingChat (#18996)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-02-21 22:07:09 +08:00 |
|
Xiaoyu Zhang
|
66497ab0aa
|
[Diffusion] Restructure and clean up Diffusion rotary embedding (#19064)
|
2026-02-21 21:41:47 +08:00 |
|
DarkSharpness
|
d8d0208c63
|
[Feature] rewrite rope kernel; remove flashinfer dependencies (#18844)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-21 21:32:40 +08:00 |
|
Mick
|
6503f94211
|
[diffusion] feat: support passing component path via server args (#19108)
|
2026-02-21 21:22:47 +08:00 |
|
Xinyuan Tong
|
cc451671b5
|
[FEAT] Add Anthropic compatible API endpoint (#18630)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-02-21 19:37:38 +08:00 |
|
Mick
|
b89ca65789
|
[diffusion] refactor: reduce redundancy and improve stage api (#19060)
|
2026-02-21 16:35:47 +08:00 |
|
danielafrimi
|
33c33a7de9
|
[Quantization] Support config.json quantization_config format, fix exclude_modules matching, and fix KV cache scale loading for Nemotron (#18546)
Signed-off-by: root <dafrimi@nvidia.com>
|
2026-02-21 16:14:29 +08:00 |
|
Nicolas Castet
|
51b3ed02ca
|
Fix bug in symm mem pre-allocation default (#19082)
|
2026-02-21 11:51:28 +08:00 |
|
Vladislav Nosivskoy
|
afd91e8782
|
[DSv32] Fix MTP and CP compatibility (#19062)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-02-21 11:10:34 +08:00 |
|
Lianmin Zheng
|
463baafe10
|
[Auto Sync] Update batch_invariant_ops.py (20260221) (#19098)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>
|
2026-02-20 18:27:31 -08:00 |
|
Lianmin Zheng
|
2928dfb8fa
|
[Auto Sync] Update bench_one_batch_server_internal.py (20260221) (#19097)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
|
2026-02-20 18:19:14 -08:00 |
|
Cheng Wan
|
84c67c8be0
|
Refactor graph input buffers (#18991)
|
2026-02-20 18:09:31 -08:00 |
|
Minglei Zhu
|
4bffd3a232
|
[GPT-OSS] support fp8 online quantization for gpt-oss bf16 (#18988)
merge it as all required CI passed
|
2026-02-20 14:16:57 -08:00 |
|
Qiaolin Yu
|
96bae2355e
|
Add generated-shared-prefix dataset in bench_one_batch (#18986)
|
2026-02-20 13:33:10 -08:00 |
|
0xNullPath
|
ab18734375
|
[feat] feat: support swa in trtllm_mha (#18970)
|
2026-02-21 01:39:29 +08:00 |
|
billishyahao
|
fbb6098487
|
[AMD] support two batch overlapping for mori ep (#17953)
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com>
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-20 08:45:55 -08:00 |
|
Cheng Wan
|
38ee749dd9
|
Fix adjust_num_token_non_padded_for_attn_tp returning CPU tensor (#19051)
|
2026-02-20 23:23:38 +08:00 |
|
Nicolas Castet
|
3358ba8945
|
[Fix] Run FlashInfer autotune on non-default stream for NCCL 2.29+ compatibility (#18987)
|
2026-02-20 23:21:38 +08:00 |
|
DarkSharpness
|
52852404c8
|
[Fix] DO NOT skip save_kv_cache for dllm (#19020)
|
2026-02-20 23:20:29 +08:00 |
|
Mohammad Miadh Angkad
|
f23a23cc05
|
Fix NSA FP8 KV cache path for both-trtllm MHA one-shot (#18931)
Co-authored-by: rainj-me <96632942+rainj-me@users.noreply.github.com>
|
2026-02-20 22:00:09 +08:00 |
|
Mick
|
8d789b5c3d
|
[diffusion] feat: support nunchaku for Z-Image-Turbo and flux.1 (int4) (#18959)
|
2026-02-20 21:16:08 +08:00 |
|
Yuan Luo
|
7d953440ec
|
[jit kernel] Support per_token_group_quant_8bit jit kernel (#18905)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-02-20 21:01:05 +08:00 |
|
Mick
|
38a69652e6
|
[diffusion] logging: log available mem when each stage starts in debug level (#18998)
|
2026-02-20 19:57:06 +08:00 |
|
fzyzcjy
|
0d20cf5a66
|
Fix lint on main (#19054)
|
2026-02-20 15:45:24 +08:00 |
|
Cheng Wan
|
b59a22f781
|
fix lint on main (#19052)
|
2026-02-20 15:30:57 +08:00 |
|
Duyi-Wang
|
b0786cdf94
|
[AMD] Replace msgpack with msgspec in MORI-IO (#19007)
|
2026-02-19 23:04:15 -08:00 |
|
YAMY
|
8541b1118d
|
[Fix][Qwen3.5] Pass max_mamba_cache_size to mamba pool in disaggregation decode path (#19002)
|
2026-02-20 14:31:26 +08:00 |
|
chengshuang18
|
295bc17576
|
Feature/sdar support (#19044)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
|
2026-02-19 21:58:15 -08:00 |
|
fzyzcjy
|
046ef0aa35
|
Support using SGLang port in dumper (#19038)
|
2026-02-20 12:30:24 +08:00 |
|
fzyzcjy
|
2fecc2c075
|
Support resetting and enhance HTTP endpoints for dumper (#19046)
Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>
|
2026-02-20 12:29:09 +08:00 |
|
fzyzcjy
|
503bf3047a
|
Enhance configure and env parsing in dumper (#19034)
|
2026-02-20 12:28:10 +08:00 |
|
fzyzcjy
|
df995aab56
|
Support filtering labels in dumper (#19018)
|
2026-02-20 12:27:12 +08:00 |
|