Cherry_ming
|
1808df48fe
|
[NPU]add nightly-test-npu (#14143)
|
2025-12-05 00:43:35 +08:00 |
|
WenhaoZhang
|
788628b56f
|
[diffusion] feat: Add Configurable Generator Device and Seed Support via API (#14366)
Co-authored-by: niehen6174 <niehen.6174@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-05 00:25:09 +08:00 |
|
Raul Torres
|
29a2d4b59f
|
Add 'NPU' to the runtime exception message in get_device (#14225)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2025-12-04 17:34:31 +03:00 |
|
R0CKSTAR
|
079ac237da
|
[diffusion] fix: fix gen video doc (#14409)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-04 22:05:38 +08:00 |
|
Daniel Cámpora
|
8428078436
|
Add Mistral Large 3 support. (#14213)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-12-04 20:00:05 +08:00 |
|
Xuchun Shang
|
af35023e65
|
[bug fix] fix ima with get_mla_kv_buffer_kernel overflow (#14224)
Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
|
2025-12-04 01:20:11 -08:00 |
|
jianan-gu
|
70d2587324
|
[CPU] Optimize small oc GEMM for Qwen3-next on CPU (#12446)
Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
|
2025-12-04 00:38:47 -08:00 |
|
Even Zhou
|
894c0dc57c
|
[NPU][1/N] NPU basic functions refactor and new modelslim quant type (#13359)
|
2025-12-04 16:15:31 +08:00 |
|
yctseng0211
|
d6c490192d
|
[AMD] fix the regression issue for DeepseekV3 on MI300 (#14383)
|
2025-12-03 23:30:11 -08:00 |
|
Yuan Luo
|
b2b09f5f24
|
[VLM] Introduce Cache for positional embedding ids for Qwen-VL family (#14292)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-12-04 12:32:00 +08:00 |
|
Kevin Li
|
04df80a9a1
|
Support PP x PD decode with nixl backend (#14392)
|
2025-12-04 12:22:17 +08:00 |
|
b8zhong
|
9d82340298
|
Revert "Revert "enable csgmv automatically on cuda"" (#14277)
|
2025-12-03 13:12:30 -08:00 |
|
alisonshao
|
80518bea65
|
Fix validation to detect missing model files before loading (#14253)
|
2025-12-03 11:36:07 -08:00 |
|
Lianmin Zheng
|
46d7b35ec7
|
Move custom_ops under layers; move _custom_ops.py → custom_all_reduce_ops.py (#14326)
|
2025-12-03 10:33:37 -08:00 |
|
Sulfur6-L8972
|
20aad5b5ab
|
Single Batch Overlap for MoE Models (#9660)
Co-authored-by: Cheng Wan <wan4ch@gmail.com>
Co-authored-by: Zqy11 <841971412@qq.com>
Co-authored-by: AniZpZ <aniz1905@gmail.com>
Co-authored-by: TianyuZhang1214 <tianyuzhang1214@gmail.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
|
2025-12-03 10:07:42 -08:00 |
|
Dongjie Zou
|
aca0d01d3f
|
[diffusion] doc: add vae path to cli doc#14004 (#14355)
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-03 15:13:25 +00:00 |
|
Michelle Wu
|
443d7bcd83
|
[Ascend] fix AscendAttnMaskBuilder bug to support float16 models (#14271)
|
2025-12-03 19:37:16 +08:00 |
|
chenxu140
|
16d8de2284
|
[bugfix] NpuFuseEPMoE miss initialization parameters (#14295)
|
2025-12-03 19:36:41 +08:00 |
|
ZhengdQin
|
d122e32467
|
[NPU] bug fix: w_vc need contiguous for NPU batch_matmul_transpose ops (#13980)
|
2025-12-03 19:35:18 +08:00 |
|
Yuhao Yao
|
77512ae0d7
|
[bugfix] Fix prefill tbo disabled when --deepep-mode=auto (#14333)
Co-authored-by: Cheng Wan <wan4ch@gmail.com>
|
2025-12-03 01:20:33 -08:00 |
|
Shangming Cai
|
93452a8252
|
[PD] Support decode pp for PD disaggregation (#14265)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-12-03 14:35:29 +08:00 |
|
Vikram
|
42271376d1
|
[bug fix] use npu phy id in container env (#14266)
Co-authored-by: jinke15 <jinke15@jd.com>
|
2025-12-03 11:33:43 +08:00 |
|
Johnsonms
|
043f13171f
|
[Performance] Optimize NSA Indexer K/S Buffer Access with Fused Triton Kernels (#13812)
Co-authored-by: Johnsonms <johnson@together.ai>
|
2025-12-02 18:53:06 -08:00 |
|
Dongjie Zou
|
f764c6910d
|
[diffusion] feat: support distilled vae generic (#14195)
Co-authored-by: BBuf <1182563586@qq.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-03 10:27:31 +08:00 |
|
Even Zhou
|
7d1a130cde
|
Refactor custom allreduce logics (#13710)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-12-02 17:20:05 -08:00 |
|
sglang-bot
|
7ae368efde
|
chore: bump SGLang version to 0.5.6 (#14316)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-12-02 17:17:13 -08:00 |
|
Lianmin Zheng
|
ca52ed425f
|
Clean up imports and move files (#14317)
|
2025-12-02 16:31:54 -08:00 |
|
Eva20150932-atlascloud
|
7c38eca1e4
|
feat: DeepSeek new v3.2 encoding (#14249)
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
|
2025-12-02 11:41:05 -08:00 |
|
Quanfeng Li
|
427b08e24d
|
Init TBO with dp_padded batch (#11423)
Co-authored-by: Cheng Wan <wan4ch@gmail.com>
Co-authored-by: Yuhao Yao <37280700+yuhyao@users.noreply.github.com>
|
2025-12-02 10:34:26 -08:00 |
|
alisonshao
|
0141ca370f
|
Revert PR #14044: Restore separate memory pool for piecewise CUDA graph (#14278)
|
2025-12-02 09:53:16 -08:00 |
|
alisonshao
|
25a6be4930
|
Fix duplicate download log messages in multi-process environment (#14299)
|
2025-12-02 09:33:18 -08:00 |
|
Mick
|
9530b76630
|
[diffusion] refactor: simplify DmdDenoisingStage (#14269)
|
2025-12-02 18:59:40 +08:00 |
|
Jinyan Chen
|
3067b3f050
|
[diffusion] chore: improve model info registration and searching strategy (#14281)
Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-12-02 18:28:59 +08:00 |
|
Lianmin Zheng
|
64092c8b55
|
[Auto Sync] Rename is_hybrid to is_hybrid_swa (#14252)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
Co-authored-by: Hanming Lu <hanming@x.ai>
|
2025-12-01 23:24:24 -08:00 |
|
sglang-bot
|
63b9300f00
|
chore: bump sgl-kernel version to 0.3.18.post2 (#14244)
|
2025-12-01 23:14:12 -08:00 |
|
b8zhong
|
236a7c2370
|
fix trtllm mla spec (#13738)
Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
|
2025-12-01 22:16:25 -08:00 |
|
Roger Young
|
3dabd609fb
|
Optimize topk sigmoid in minimax_m2 (#14047)
Co-authored-by: xuebi <xuebi@minimaxi.com>
|
2025-12-02 14:07:12 +08:00 |
|
Kevin Li
|
c9e2090101
|
fix: Support PP for Mistral Small 3.1 (#14254)
|
2025-12-02 13:04:14 +08:00 |
|
kun-llfl
|
106df4eac5
|
Fix mrope_positions size when req is retracted (#13700)
Signed-off-by: Kun(llfl) <i@imux.top>
Co-authored-by: Xuchun Shang <xuchun.shang@gmail.com>
|
2025-12-02 11:38:20 +08:00 |
|
Mick
|
1f930cd23d
|
[diffusion] CI: add testcase-wise retry mechanism (#14261)
|
2025-12-02 11:06:12 +08:00 |
|
Kartik Ramesh
|
11ce05163d
|
Fix NIXL exception message (#14172)
|
2025-12-02 10:39:45 +08:00 |
|
Stefan He
|
8fe8b63576
|
Revert "Try to remove wrong logic about max total token in spec decoding" (#14259)
|
2025-12-01 18:18:03 -08:00 |
|
Yuan Luo
|
26aebf83d3
|
[VLM] Support Piecewise CUDA Graph for Qwen3-Omni-MOE (#14222)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-12-02 10:12:10 +08:00 |
|
Mick
|
3ab8ae6847
|
[diffusion] fix: fix Flux.2 condition image resize (#14232)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-02 10:05:44 +08:00 |
|
Lianmin Zheng
|
796d82b107
|
[Auto Sync] Add max_total_num_tokens metric: Update scheduler_metrics_mixin.py, collector.py (20251202) (#14256)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dan Zheng <dzheng@x.ai>
|
2025-12-01 16:34:34 -08:00 |
|
Lianmin Zheng
|
1da59e8304
|
[Auto Sync] optionally disable fake register in Update fp8_kernel.py (20251202) (#14255)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: gauravjain14 <41287729+gauravjain14@users.noreply.github.com>
|
2025-12-01 16:11:12 -08:00 |
|
TomerBN-Nvidia
|
02af51e4fc
|
Support fp4 fp8 non gated moe (#13794)
Co-authored-by: Roi Koren <roik@nvidia.com>
Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
|
2025-12-01 15:26:28 -08:00 |
|
Zhiyu
|
079b173853
|
Fix a distributed initialization error (#13843)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
|
2025-12-01 15:10:05 -08:00 |
|
YAMY
|
1f2b84d28d
|
Fix NSA Bug in Centralize NSA Dispatch Logic (#14245)
|
2025-12-01 13:18:18 -08:00 |
|
ishandhanani
|
07821352fb
|
Revert "Skip weight loading in deepgemm compilation" (#14241)
|
2025-12-01 12:59:09 -08:00 |
|