Commit Graph

6437 Commits

Author SHA1 Message Date
zackyoray
d275d47973 [NIXL] Add custom NIXL backend selection for KVManager (#17146)
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
2026-01-26 14:35:38 +08:00
Yuan Luo
1e8db18290 [Kimi-Linear] Remove duplicated code in kimi-linear (#17731) 2026-01-26 14:20:24 +08:00
chenxu214
444b9521e4 [Bugfix]Repeated add modelslim quant_config and bugfix with "enable-piecewise-cuda-graph" on NPU (#17511) 2026-01-26 09:51:07 +08:00
Kangyan-Zhou
592603d77b Fix flaky streaming logprobs test by handling detokenizer text buffering (#17687)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2026-01-25 15:09:06 -08:00
Kangyan-Zhou
344eeaee90 Upload nightly test metrics to GH artifacts (#17696) 2026-01-25 14:35:14 -08:00
Kangyan-Zhou
8d3e1ac0c8 Add an all type in pyproject.tml to include diffusion support (#17697) 2026-01-25 12:52:13 -08:00
Kangyan-Zhou
9123491430 A few updates to the night tests (#17694) 2026-01-25 11:20:17 -08:00
HandH1998
a883906a24 Support mxint4 flashinfer_trtllm moe gemm (#16892) 2026-01-26 00:15:53 +08:00
Mick
b105dad5da [diffusion] refactor: remove useless lazy-import cache-dit codes (#17659) 2026-01-25 22:43:22 +08:00
Zhengbo Wang
fb61164f27 [Refactor] Use is_in_ci() utility in JIT kernel benchmarks (#17118) 2026-01-25 20:40:47 +08:00
xjx471258437
9bd92ba0f6 Support PD disaggregation with different TP/DP size for Qwen3-Next (#16056)
Co-authored-by: xjx392321 <xjx392321@alibaba-inc.com>
2026-01-25 15:34:02 +08:00
Ke Bao
30ece5e1d6 Fix swa memory pool size with spec (#17630) 2026-01-25 14:10:43 +08:00
Mohammad Miadh Angkad
1674b9ef44 [DeepSeek-V3.2] Fix TRT-LLM NSA in target_verify/draft_extend (#17662) 2026-01-25 13:10:14 +08:00
Alison Shao
9121f22656 Add PyTorch .bin file validation to CI weight validation (#17533) 2026-01-24 19:18:15 -08:00
Chen Shen
59f027a8c8 [diffusion]: Fix ZImage SP sharding for caption and latent (#17301)
Co-authored-by: rhyshen <rhyshen@tencent.com>
Co-authored-by: florianzhao <florianzhao@tencent.com>
2026-01-25 10:10:48 +08:00
Xinyuan Tong
37c04c2245 fix: Refactor register_image_processor to use kwarg instead of positional arg (#17685)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2026-01-24 15:31:01 -08:00
Trevor Morris
2c2c4e446b [NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668) 2026-01-24 22:59:55 +08:00
TMC
458a43d4ac [NPU] torch_npu profiler tensorboard path type fix (#17545) 2026-01-24 22:55:49 +08:00
Yuan Luo
0c8165ffbd [Kimi-Linear] Refactor Kimi-Linear to support RadixLinearAttention (#17506)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-01-24 21:27:13 +08:00
yunkchen
bf19d20d89 [Bugfix] fix TypeError when log-requests-level >=2 in prefill node warmup (#17129) 2026-01-24 19:16:22 +08:00
Lianmin Zheng
0834f9afeb [Auto Sync] Update test_deterministic.py (20260124) (#17665)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>
2026-01-24 02:52:47 -08:00
Glen Liu
a6280b2a23 add documentation example for LoRA overlap loading and cleanup unused function (#17464) 2026-01-24 15:33:16 +08:00
Xiaoyu Zhang
3992a023e6 Move fa4 from sgl-kernel to jit kernel (#17353) 2026-01-24 15:25:03 +08:00
Xiaoyu Zhang
7a4bb0d516 [Diffusion] Add diffusion time embedding to jit kernel (#17658) 2026-01-24 14:27:08 +08:00
Ke Bao
fb683be6eb Use attn tp group in embedding for more models (#17570) 2026-01-24 13:37:44 +08:00
strgrb
176da1bbdd Fix: mistake sigmoid in kda (#17508) 2026-01-24 13:35:14 +08:00
Qi Yuhang
4c512a7d1d [JIT Kernel]Add Some CUDA Runtime API Wrapper for JIT Kernel Header (#17588) 2026-01-24 12:57:58 +08:00
GMI Xiao Jin
d0919be733 [diffusion] model: LTX-2 Support (2/2) (#17496)
Co-authored-by: Fan Yin <1106310035@qq.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
2026-01-24 12:51:37 +08:00
GMI Xiao Jin
797a9811a2 [diffusion] model: LTX-2 (1/2) (#17495)
Co-authored-by: FlamingoPg <1106310035@qq.com>
2026-01-24 11:59:48 +08:00
Ananya
894928a951 Refactor: Extract DeepSeek common utilities into shared module (#16969) 2026-01-24 11:29:52 +08:00
Lianmin Zheng
bc6f0b5ce7 [Auto Sync] Update logits_processor.py, test_logprobs.py (20260124) (#17664)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: yehu-ux <yehu@x.ai>
2026-01-23 17:57:41 -08:00
McZyWu
b4a611fb33 [NPU] solve accuracy problem for stablelm-2-1-6b for npu (#17470) 2026-01-24 08:27:38 +08:00
McZyWu
8a5ed2434f [NPU]support model MiniCPM3-4B for npu (#16866) 2026-01-24 08:25:12 +08:00
Douglas Yang
4c7136bb36 feature: adding openai compatible API request to bench_serving (#17219) 2026-01-23 16:04:28 -08:00
Nan Jiang
ad05782160 fix post_residual_addition more generally (#17286) 2026-01-23 15:43:37 -08:00
R0CKSTAR
a77729a276 [MUSA][1/N] sglang.check_env (#16959)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2026-01-23 14:41:17 -08:00
Mansoor
bdaa3de075 Add return routed experts to the completions and chat/completions endpoints (#17434) 2026-01-23 12:12:36 -08:00
Tiwei Bie
5438cd20ce [DLLM] Remove cuda graph batch size limitation (#17458) 2026-01-23 09:52:39 -08:00
Jerry Ji
010c17a133 [Refactor] Algebraic data type for nextn config + some basic refactors (#17347) 2026-01-24 01:16:55 +08:00
Yi Zhong
08fcda2f63 add the fa4 mm backend and varlen func (#13539)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2026-01-23 23:12:06 +08:00
akhilg-nv
2fb328109f [DeepSeek V3.2] Enable trtllm NSA with bf16 kvcache (#16758)
Co-authored-by: DarkSharpness <76582120+DarkSharpness@users.noreply.github.com>
2026-01-23 20:26:21 +08:00
Nicolas Castet
48e9daadff Support symmetric memory pre-allocation to avoid fragmentation (#17089) 2026-01-23 17:57:04 +08:00
Yuzhen Zhou
2169025b77 turn off dit_layerwise_offload for wan on rocm (#17569) 2026-01-23 15:22:42 +08:00
Lianmin Zheng
56e6652d1d Lazy import torchao (#17626) 2026-01-22 22:04:51 -08:00
JiaruiChang5268
c0b5a180fe [NPU]bugfix: fix for dsv3.2 and dsvl2 (#17007)
Co-authored-by: Hexq0210 <893781835@qq.com>
Co-authored-by: liupeng374 <782420244@qq.com>
Co-authored-by: cy <chenyang08056032@163.com>
2026-01-23 11:15:15 +08:00
Ke Bao
7ace64d1d8 Update mamba env setting (#17566) 2026-01-23 11:02:32 +08:00
siyu
62e6a749b0 Skip mm feature pool init to avoid EPD OOM (#16388) 2026-01-23 10:53:45 +08:00
MMuzzammil1
2399af5557 Bugfix: Writing to storage when write-back method is chosen (#14718) 2026-01-22 15:08:25 -08:00
hxie
13f88045b3 configuration file support and nixl integration augmentation for hicache-storage-backend-extra-config (#16602) 2026-01-22 14:31:48 -08:00
wufann
a921029b97 [AMD] Support ds3.2 on gfx942 platform (#17504)
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
2026-01-22 13:57:08 -08:00