6403 Commits

Author SHA1 Message Date
Cheng Wan
38ee749dd9 Fix adjust_num_token_non_padded_for_attn_tp returning CPU tensor (#19051) 2026-02-20 23:23:38 +08:00
Nicolas Castet
3358ba8945 [Fix] Run FlashInfer autotune on non-default stream for NCCL 2.29+ compatibility (#18987) 2026-02-20 23:21:38 +08:00
DarkSharpness
52852404c8 [Fix] DO NOT skip save_kv_cache for dllm (#19020) 2026-02-20 23:20:29 +08:00
Mohammad Miadh Angkad
f23a23cc05 Fix NSA FP8 KV cache path for both-trtllm MHA one-shot (#18931)
Co-authored-by: rainj-me <96632942+rainj-me@users.noreply.github.com>
2026-02-20 22:00:09 +08:00
Mick
8d789b5c3d [diffusion] feat: support nunchaku for Z-Image-Turbo and flux.1 (int4) (#18959) 2026-02-20 21:16:08 +08:00
Yuan Luo
7d953440ec [jit kernel] Support per_token_group_quant_8bit jit kernel (#18905)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-02-20 21:01:05 +08:00
Mick
38a69652e6 [diffusion] logging: log available mem when each stage starts in debug level (#18998) 2026-02-20 19:57:06 +08:00
fzyzcjy
0d20cf5a66 Fix lint on main (#19054) 2026-02-20 15:45:24 +08:00
Cheng Wan
b59a22f781 fix lint on main (#19052) 2026-02-20 15:30:57 +08:00
Duyi-Wang
b0786cdf94 [AMD] Replace msgpack with msgspec in MORI-IO (#19007) 2026-02-19 23:04:15 -08:00
YAMY
8541b1118d [Fix][Qwen3.5] Pass max_mamba_cache_size to mamba pool in disaggregation decode path (#19002) 2026-02-20 14:31:26 +08:00
chengshuang18
295bc17576 Feature/sdar support (#19044)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
2026-02-19 21:58:15 -08:00
fzyzcjy
046ef0aa35 Support using SGLang port in dumper (#19038) 2026-02-20 12:30:24 +08:00
fzyzcjy
2fecc2c075 Support resetting and enhance HTTP endpoints for dumper (#19046)
Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>
2026-02-20 12:29:09 +08:00
fzyzcjy
503bf3047a Enhance configure and env parsing in dumper (#19034) 2026-02-20 12:28:10 +08:00
fzyzcjy
df995aab56 Support filtering labels in dumper (#19018) 2026-02-20 12:27:12 +08:00
fzyzcjy
261bca3c58 Support captured dump output and console output control in dumper (#19017) 2026-02-20 12:26:24 +08:00
fzyzcjy
fc1500adc6 Hint users when wrongly execute it with partial ranks in dumper (#19014) 2026-02-20 12:25:54 +08:00
fzyzcjy
b41d412c3d Support cleanup previous dumps in dumper (#19013) 2026-02-20 12:25:21 +08:00
Cheng Wan
13a4a0406e Fix flashinfer autotune to only wrap run_once() (#19004) 2026-02-19 20:02:21 -08:00
Cheng Wan
64bca5315f Fix long prompt KV allocation by falling back to torch native APIs when exceeding Triton tensor limit (#18250) 2026-02-19 19:15:05 -08:00
Nicolas Castet
99df920cdb Register tensors with symmetric memory for qwen (#18643) 2026-02-20 09:32:32 +08:00
Cheng Wan
73a7f0d049 Revert "Add SDAR model support" (#19032) 2026-02-19 16:03:56 -08:00
Liangsheng Yin
db34c1cbfb Tiny remove duplicate coredump env injection (#19023) 2026-02-19 13:26:30 -08:00
Liangsheng Yin
5ff5aa6923 [spec v2]Fix torch gc of future indices (#18958) 2026-02-19 11:38:25 -08:00
chengshuang18
44ab752b7a Add SDAR model support (#18318)
Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn>
Co-authored-by: chengshuang <chengshuang@pjlab.org.cn>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
2026-02-19 11:20:32 -08:00
Mick
3207427d6d [diffusion] CI: enable warmup as default (#19010) 2026-02-19 23:27:23 +08:00
Mick
d73f06f091 [diffusion] chore: improve memory usage on consumer-level GPU (#18997) 2026-02-19 21:59:49 +08:00
satyamk7054
963def7f26 Move lora request validation to tokenizer_manager from server (#18962)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
2026-02-19 21:03:19 +08:00
Makcum888e
d07e8aa4a3 [Diffusion] [NPU] Enable profiler on NPU (#17807) 2026-02-19 15:33:51 +03:00
Prozac614
e21fc78dbd [diffusion] fix: fix rank used in parallel executor when enable_cfg_parallel is false (#18975)
Co-authored-by: daiweitao <dwti614707404@163.com>
2026-02-19 20:12:24 +08:00
Xiaoyu Zhang
19aa19b111 [diffusion] refactor: refactor diffusion triton kernels (#18966) 2026-02-19 17:03:44 +08:00
pansicheng
48642d5384 [RadixTree][4/N Refactor]: Move available_and_evictable_str to individual radix cache classes (#17852) 2026-02-19 17:03:15 +08:00
shaharmor98
82a0bafc1c Feat/add fi selective state update kernel call (#18070)
Signed-off-by: Shahar Mor <smor@nvidia.com>
2026-02-19 16:56:06 +08:00
Yuwei An
0be30d4b0d Fix PCG MoE Error (#17739) 2026-02-19 16:48:06 +08:00
hlu1
bba2fc49a1 [Qwen3.5] Enable nvfp4 checkpoint (#18937) 2026-02-19 12:24:05 +08:00
hxie
443b1a88d1 Add batched zero copy to NIXL backend (#18850)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2026-02-18 16:31:02 -08:00
Bingxu Chen
462267982b [AMD] Fix mi35x dsv32 mtp nightly (#18978)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
2026-02-18 16:23:17 -08:00
Alison Shao
e2fccb2ee0 Fix flaky Qwen3-Next KL divergence tests by reverting mamba slot release (#18910) 2026-02-19 07:55:16 +08:00
Mengyang Liu
4f980f6f23 [Feature] Implement update_weights_from_disk for SGLang-D (Diffusion … (#18306)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2026-02-18 11:24:07 -08:00
Tamir Baydasov
150ed881be [4/N] Quantization Refactor: Quark MoE schemes (#18252)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Peng Zhang <aniz1905@gmail.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2026-02-18 19:44:30 +03:00
Ethan (Yusheng) Su
9c5aae4df5 [Fix] Add lora tied lm head support (for Qwen2.5, Gemma, etc model need) (#18634) 2026-02-19 00:34:51 +08:00
Yuhao Yang
5a7ae059e3 Add DP ViT support for Kimi K2.5 (#18689) 2026-02-18 23:03:07 +08:00
zijiexia
eb6ff4a940 [diffusion] fix: refactor task resolution logic in benchmark function for multimodal generation (#18948) 2026-02-18 20:57:42 +08:00
Xiaoyu Zhang
390c154306 [Tiny fix] Super tiny fix mul_add naive forward bug (#18964) 2026-02-18 16:18:43 +08:00
Xiaoyu Zhang
513c12d23f Remove unused fast-hadamard-transform PyTorch extension sources (#18927) 2026-02-18 15:51:07 +08:00
Mick
420a611275 [diffusion] refactor: unify SamplingParams construction and improve DiffGenerator return types (#18928) 2026-02-18 14:56:58 +08:00
DarkSharpness
9d138685c1 [Refactor] Fix test and clean up hicache code (#18555) 2026-02-18 14:37:46 +08:00
William Arnold
95c44cea29 [feat] Add return_routed_experts param to async_generate for parity with generate (#18508) 2026-02-17 22:11:19 -08:00
Neal Vaidya
ac0e493329 feat: add nsa and swa disagg support with nixl (#18939)
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
2026-02-18 13:13:26 +08:00