zackyoray
|
d275d47973
|
[NIXL] Add custom NIXL backend selection for KVManager (#17146)
Signed-off-by: Yoray Zack <yorayz@nvidia.com>
|
2026-01-26 14:35:38 +08:00 |
|
Yuan Luo
|
1e8db18290
|
[Kimi-Linear] Remove duplicated code in kimi-linear (#17731)
|
2026-01-26 14:20:24 +08:00 |
|
chenxu214
|
444b9521e4
|
[Bugfix] Fix repeated adding of modelslim quant_config and a bug with "enable-piecewise-cuda-graph" on NPU (#17511)
|
2026-01-26 09:51:07 +08:00 |
|
Kangyan-Zhou
|
592603d77b
|
Fix flaky streaming logprobs test by handling detokenizer text buffering (#17687)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2026-01-25 15:09:06 -08:00 |
|
Kangyan-Zhou
|
344eeaee90
|
Upload nightly test metrics to GH artifacts (#17696)
|
2026-01-25 14:35:14 -08:00 |
|
Kangyan-Zhou
|
8d3e1ac0c8
|
Add an all type in pyproject.toml to include diffusion support (#17697)
|
2026-01-25 12:52:13 -08:00 |
|
Kangyan-Zhou
|
9123491430
|
A few updates to the nightly tests (#17694)
|
2026-01-25 11:20:17 -08:00 |
|
HandH1998
|
a883906a24
|
Support mxint4 flashinfer_trtllm moe gemm (#16892)
|
2026-01-26 00:15:53 +08:00 |
|
Mick
|
b105dad5da
|
[diffusion] refactor: remove useless lazy-import cache-dit codes (#17659)
|
2026-01-25 22:43:22 +08:00 |
|
Zhengbo Wang
|
fb61164f27
|
[Refactor] Use is_in_ci() utility in JIT kernel benchmarks (#17118)
|
2026-01-25 20:40:47 +08:00 |
|
xjx471258437
|
9bd92ba0f6
|
Support PD disaggregation with different TP/DP size for Qwen3-Next (#16056)
Co-authored-by: xjx392321 <xjx392321@alibaba-inc.com>
|
2026-01-25 15:34:02 +08:00 |
|
Ke Bao
|
30ece5e1d6
|
Fix swa memory pool size with spec (#17630)
|
2026-01-25 14:10:43 +08:00 |
|
Mohammad Miadh Angkad
|
1674b9ef44
|
[DeepSeek-V3.2] Fix TRT-LLM NSA in target_verify/draft_extend (#17662)
|
2026-01-25 13:10:14 +08:00 |
|
Alison Shao
|
9121f22656
|
Add PyTorch .bin file validation to CI weight validation (#17533)
|
2026-01-24 19:18:15 -08:00 |
|
Chen Shen
|
59f027a8c8
|
[diffusion]: Fix ZImage SP sharding for caption and latent (#17301)
Co-authored-by: rhyshen <rhyshen@tencent.com>
Co-authored-by: florianzhao <florianzhao@tencent.com>
|
2026-01-25 10:10:48 +08:00 |
|
Xinyuan Tong
|
37c04c2245
|
fix: Refactor register_image_processor to use kwarg instead of positional arg (#17685)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2026-01-24 15:31:01 -08:00 |
|
Trevor Morris
|
2c2c4e446b
|
[NVIDIA] Add flashinfer all-to-all MOE dispatcher (#14668)
|
2026-01-24 22:59:55 +08:00 |
|
TMC
|
458a43d4ac
|
[NPU] torch_npu profiler tensorboard path type fix (#17545)
|
2026-01-24 22:55:49 +08:00 |
|
Yuan Luo
|
0c8165ffbd
|
[Kimi-Linear] Refactor Kimi-Linear to support RadixLinearAttention (#17506)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-01-24 21:27:13 +08:00 |
|
yunkchen
|
bf19d20d89
|
[Bugfix] fix TypeError when log-requests-level >=2 in prefill node warmup (#17129)
|
2026-01-24 19:16:22 +08:00 |
|
Lianmin Zheng
|
0834f9afeb
|
[Auto Sync] Update test_deterministic.py (20260124) (#17665)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>
|
2026-01-24 02:52:47 -08:00 |
|
Glen Liu
|
a6280b2a23
|
add documentation example for LoRA overlap loading and cleanup unused function (#17464)
|
2026-01-24 15:33:16 +08:00 |
|
Xiaoyu Zhang
|
3992a023e6
|
Move fa4 from sgl-kernel to jit kernel (#17353)
|
2026-01-24 15:25:03 +08:00 |
|
Xiaoyu Zhang
|
7a4bb0d516
|
[Diffusion] Add diffusion time embedding to jit kernel (#17658)
|
2026-01-24 14:27:08 +08:00 |
|
Ke Bao
|
fb683be6eb
|
Use attn tp group in embedding for more models (#17570)
|
2026-01-24 13:37:44 +08:00 |
|
strgrb
|
176da1bbdd
|
Fix: mistake sigmoid in kda (#17508)
|
2026-01-24 13:35:14 +08:00 |
|
Qi Yuhang
|
4c512a7d1d
|
[JIT Kernel]Add Some CUDA Runtime API Wrapper for JIT Kernel Header (#17588)
|
2026-01-24 12:57:58 +08:00 |
|
GMI Xiao Jin
|
d0919be733
|
[diffusion] model: LTX-2 Support (2/2) (#17496)
Co-authored-by: Fan Yin <1106310035@qq.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
|
2026-01-24 12:51:37 +08:00 |
|
GMI Xiao Jin
|
797a9811a2
|
[diffusion] model: LTX-2 (1/2) (#17495)
Co-authored-by: FlamingoPg <1106310035@qq.com>
|
2026-01-24 11:59:48 +08:00 |
|
Ananya
|
894928a951
|
Refactor: Extract DeepSeek common utilities into shared module (#16969)
|
2026-01-24 11:29:52 +08:00 |
|
Lianmin Zheng
|
bc6f0b5ce7
|
[Auto Sync] Update logits_processor.py, test_logprobs.py (20260124) (#17664)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: yehu-ux <yehu@x.ai>
|
2026-01-23 17:57:41 -08:00 |
|
McZyWu
|
b4a611fb33
|
[NPU] solve accuracy problem for stablelm-2-1-6b for npu (#17470)
|
2026-01-24 08:27:38 +08:00 |
|
McZyWu
|
8a5ed2434f
|
[NPU]support model MiniCPM3-4B for npu (#16866)
|
2026-01-24 08:25:12 +08:00 |
|
Douglas Yang
|
4c7136bb36
|
feature: adding openai compatible API request to bench_serving (#17219)
|
2026-01-23 16:04:28 -08:00 |
|
Nan Jiang
|
ad05782160
|
fix post_residual_addition more generally (#17286)
|
2026-01-23 15:43:37 -08:00 |
|
R0CKSTAR
|
a77729a276
|
[MUSA][1/N] sglang.check_env (#16959)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2026-01-23 14:41:17 -08:00 |
|
Mansoor
|
bdaa3de075
|
Add return routed experts to the completions and chat/completions endpoints (#17434)
|
2026-01-23 12:12:36 -08:00 |
|
Tiwei Bie
|
5438cd20ce
|
[DLLM] Remove cuda graph batch size limitation (#17458)
|
2026-01-23 09:52:39 -08:00 |
|
Jerry Ji
|
010c17a133
|
[Refactor] Algebraic data type for nextn config + some basic refactors (#17347)
|
2026-01-24 01:16:55 +08:00 |
|
Yi Zhong
|
08fcda2f63
|
add the fa4 mm backend and varlen func (#13539)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2026-01-23 23:12:06 +08:00 |
|
akhilg-nv
|
2fb328109f
|
[DeepSeek V3.2] Enable trtllm NSA with bf16 kvcache (#16758)
Co-authored-by: DarkSharpness <76582120+DarkSharpness@users.noreply.github.com>
|
2026-01-23 20:26:21 +08:00 |
|
Nicolas Castet
|
48e9daadff
|
Support symmetric memory pre-allocation to avoid fragmentation (#17089)
|
2026-01-23 17:57:04 +08:00 |
|
Yuzhen Zhou
|
2169025b77
|
turn off dit_layerwise_offload for wan on rocm (#17569)
|
2026-01-23 15:22:42 +08:00 |
|
Lianmin Zheng
|
56e6652d1d
|
Lazy import torchao (#17626)
|
2026-01-22 22:04:51 -08:00 |
|
JiaruiChang5268
|
c0b5a180fe
|
[NPU]bugfix: fix for dsv3.2 and dsvl2 (#17007)
Co-authored-by: Hexq0210 <893781835@qq.com>
Co-authored-by: liupeng374 <782420244@qq.com>
Co-authored-by: cy <chenyang08056032@163.com>
|
2026-01-23 11:15:15 +08:00 |
|
Ke Bao
|
7ace64d1d8
|
Update mamba env setting (#17566)
|
2026-01-23 11:02:32 +08:00 |
|
siyu
|
62e6a749b0
|
Skip mm feature pool init to avoid EPD OOM (#16388)
|
2026-01-23 10:53:45 +08:00 |
|
MMuzzammil1
|
2399af5557
|
Bugfix: Writing to storage when write-back method is chosen (#14718)
|
2026-01-22 15:08:25 -08:00 |
|
hxie
|
13f88045b3
|
configuration file support and nixl integration augmentation for hicache-storage-backend-extra-config (#16602)
|
2026-01-22 14:31:48 -08:00 |
|
wufann
|
a921029b97
|
[AMD] Support ds3.2 on gfx942 platform (#17504)
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
|
2026-01-22 13:57:08 -08:00 |
|