Nicolas Castet
|
51b3ed02ca
|
Fix bug in symm mem pre-allocation default (#19082)
|
2026-02-21 11:51:28 +08:00 |
|
Vladislav Nosivskoy
|
afd91e8782
|
[DSv32] Fix MTP and CP compatibility (#19062)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-02-21 11:10:34 +08:00 |
|
Lianmin Zheng
|
463baafe10
|
[Auto Sync] Update batch_invariant_ops.py (20260221) (#19098)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>
|
2026-02-20 18:27:31 -08:00 |
|
Lianmin Zheng
|
2928dfb8fa
|
[Auto Sync] Update bench_one_batch_server_internal.py (20260221) (#19097)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
|
2026-02-20 18:19:14 -08:00 |
|
Cheng Wan
|
84c67c8be0
|
Refactor graph input buffers (#18991)
|
2026-02-20 18:09:31 -08:00 |
|
Minglei Zhu
|
4bffd3a232
|
[GPT-OSS] support fp8 online quantization for gpt-oss bf16 (#18988)
merge it as all required CI passed
|
2026-02-20 14:16:57 -08:00 |
|
Qiaolin Yu
|
96bae2355e
|
Add generated-shared-prefix dataset in bench_one_batch (#18986)
|
2026-02-20 13:33:10 -08:00 |
|
0xNullPath
|
ab18734375
|
[feat] feat: support swa in trtllm_mha (#18970)
|
2026-02-21 01:39:29 +08:00 |
|
billishyahao
|
fbb6098487
|
[AMD] support two batch overlapping for mori ep (#17953)
Co-authored-by: kkHuang-amd <wunhuang@amd.com>
Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com>
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-02-20 08:45:55 -08:00 |
|
Cheng Wan
|
38ee749dd9
|
Fix adjust_num_token_non_padded_for_attn_tp returning CPU tensor (#19051)
|
2026-02-20 23:23:38 +08:00 |
|
Nicolas Castet
|
3358ba8945
|
[Fix] Run FlashInfer autotune on non-default stream for NCCL 2.29+ compatibility (#18987)
|
2026-02-20 23:21:38 +08:00 |
|
DarkSharpness
|
52852404c8
|
[Fix] DO NOT skip save_kv_cache for dllm (#19020)
|
2026-02-20 23:20:29 +08:00 |
|
Mohammad Miadh Angkad
|
f23a23cc05
|
Fix NSA FP8 KV cache path for both-trtllm MHA one-shot (#18931)
Co-authored-by: rainj-me <96632942+rainj-me@users.noreply.github.com>
|
2026-02-20 22:00:09 +08:00 |
|
Mick
|
8d789b5c3d
|
[diffusion] feat: support nunchaku for Z-Image-Turbo and flux.1 (int4) (#18959)
|
2026-02-20 21:16:08 +08:00 |
|
Yuan Luo
|
7d953440ec
|
[jit kernel] Support per_token_group_quant_8bit jit kernel (#18905)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-02-20 21:01:05 +08:00 |
|
Mick
|
38a69652e6
|
[diffusion] logging: log available mem when each stage starts in debug level (#18998)
|
2026-02-20 19:57:06 +08:00 |
|
fzyzcjy
|
0d20cf5a66
|
Fix lint on main (#19054)
|
2026-02-20 15:45:24 +08:00 |
|
Cheng Wan
|
b59a22f781
|
fix lint on main (#19052)
|
2026-02-20 15:30:57 +08:00 |
|
Duyi-Wang
|
b0786cdf94
|
[AMD] Replace msgpack with msgspec in MORI-IO (#19007)
|
2026-02-19 23:04:15 -08:00 |
|
YAMY
|
8541b1118d
|
[Fix][Qwen3.5] Pass max_mamba_cache_size to mamba pool in disaggregation decode path (#19002)
|
2026-02-20 14:31:26 +08:00 |
|
chengshuang18
|
295bc17576
|
Feature/sdar support (#19044)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
|
2026-02-19 21:58:15 -08:00 |
|
fzyzcjy
|
046ef0aa35
|
Support using SGLang port in dumper (#19038)
|
2026-02-20 12:30:24 +08:00 |
|
fzyzcjy
|
2fecc2c075
|
Support resetting and enhance HTTP endpoints for dumper (#19046)
Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>
|
2026-02-20 12:29:09 +08:00 |
|
fzyzcjy
|
503bf3047a
|
Enhance configure and env parsing in dumper (#19034)
|
2026-02-20 12:28:10 +08:00 |
|
fzyzcjy
|
df995aab56
|
Support filtering labels in dumper (#19018)
|
2026-02-20 12:27:12 +08:00 |
|
fzyzcjy
|
261bca3c58
|
Support captured dump output and console output control in dumper (#19017)
|
2026-02-20 12:26:24 +08:00 |
|
fzyzcjy
|
fc1500adc6
|
Hint users when it is wrongly executed with partial ranks in dumper (#19014)
|
2026-02-20 12:25:54 +08:00 |
|
fzyzcjy
|
b41d412c3d
|
Support cleanup previous dumps in dumper (#19013)
|
2026-02-20 12:25:21 +08:00 |
|
Cheng Wan
|
13a4a0406e
|
Fix flashinfer autotune to only wrap run_once() (#19004)
|
2026-02-19 20:02:21 -08:00 |
|
Cheng Wan
|
64bca5315f
|
Fix long prompt KV allocation by falling back to torch native APIs when exceeding Triton tensor limit (#18250)
|
2026-02-19 19:15:05 -08:00 |
|
Nicolas Castet
|
99df920cdb
|
Register tensors with symmetric memory for qwen (#18643)
|
2026-02-20 09:32:32 +08:00 |
|
Cheng Wan
|
73a7f0d049
|
Revert "Add SDAR model support" (#19032)
|
2026-02-19 16:03:56 -08:00 |
|
Liangsheng Yin
|
db34c1cbfb
|
Tiny remove duplicate coredump env injection (#19023)
|
2026-02-19 13:26:30 -08:00 |
|
Liangsheng Yin
|
5ff5aa6923
|
[spec v2]Fix torch gc of future indices (#18958)
|
2026-02-19 11:38:25 -08:00 |
|
chengshuang18
|
44ab752b7a
|
Add SDAR model support (#18318)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
|
2026-02-19 11:20:32 -08:00 |
|
Mick
|
3207427d6d
|
[diffusion] CI: enable warmup as default (#19010)
|
2026-02-19 23:27:23 +08:00 |
|
Mick
|
d73f06f091
|
[diffusion] chore: improve memory usage on consumer-level GPU (#18997)
|
2026-02-19 21:59:49 +08:00 |
|
satyamk7054
|
963def7f26
|
Move lora request validation to tokenizer_manager from server (#18962)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
|
2026-02-19 21:03:19 +08:00 |
|
Makcum888e
|
d07e8aa4a3
|
[Diffusion] [NPU] Enable profiler on NPU (#17807)
|
2026-02-19 15:33:51 +03:00 |
|
Prozac614
|
e21fc78dbd
|
[diffusion] fix: fix rank used in parallel executor when enable_cfg_parallel is false (#18975)
Co-authored-by: daiweitao <dwti614707404@163.com>
|
2026-02-19 20:12:24 +08:00 |
|
Xiaoyu Zhang
|
19aa19b111
|
[diffusion] refactor: refactor diffusion triton kernels (#18966)
|
2026-02-19 17:03:44 +08:00 |
|
pansicheng
|
48642d5384
|
[RadixTree][4/N Refactor]: Move available_and_evictable_str to individual radix cache classes (#17852)
|
2026-02-19 17:03:15 +08:00 |
|
shaharmor98
|
82a0bafc1c
|
Feat/add fi selective state update kernel call (#18070)
Signed-off-by: Shahar Mor <smor@nvidia.com>
|
2026-02-19 16:56:06 +08:00 |
|
Yuwei An
|
0be30d4b0d
|
Fix PCG MoE Error (#17739)
|
2026-02-19 16:48:06 +08:00 |
|
hlu1
|
bba2fc49a1
|
[Qwen3.5] Enable nvfp4 checkpoint (#18937)
|
2026-02-19 12:24:05 +08:00 |
|
hxie
|
443b1a88d1
|
Add batched zero copy to NIXL backend (#18850)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2026-02-18 16:31:02 -08:00 |
|
Bingxu Chen
|
462267982b
|
[AMD] Fix mi35x dsv32 mtp nightly (#18978)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
|
2026-02-18 16:23:17 -08:00 |
|
Alison Shao
|
e2fccb2ee0
|
Fix flaky Qwen3-Next KL divergence tests by reverting mamba slot release (#18910)
|
2026-02-19 07:55:16 +08:00 |
|
Mengyang Liu
|
4f980f6f23
|
[Feature] Implement update_weights_from_disk for SGLang-D (Diffusion … (#18306)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
|
2026-02-18 11:24:07 -08:00 |
|
Tamir Baydasov
|
150ed881be
|
[4/N] Quantization Refactor: Quark MoE schemes (#18252)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Peng Zhang <aniz1905@gmail.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-02-18 19:44:30 +03:00 |
|