Khoa Pham
|
cd75d54fc5
|
[Bugfix] Fix CUDA graph replay issues in trtllm_mla draft_extend (#21987)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-03 01:45:13 -07:00 |
|
shuwenn
|
4f84ce5807
|
[CI] ci: add test_http_server_auth.py to CI (#21866)
|
2026-04-03 16:32:18 +08:00 |
|
Thomas Wang
|
7431db7392
|
[AMD] Enable FP8 KV cache and FP8 attention kernel for NSA on MI300/MI355 with TileLang backend (#21511)
|
2026-04-03 00:58:23 -07:00 |
|
Kelon
|
ad0516d9c1
|
[NPU] optimize glm4.7 (#19246)
|
2026-04-03 15:44:07 +08:00 |
|
Shangming Cai
|
d82097a0df
|
[PD] Tiny register info field cleanup for mooncake backend (#22016)
|
2026-04-03 15:13:44 +08:00 |
|
Ricardo-M-L
|
24f52e66d3
|
fix: remove duplicate words in comments (#22007)
|
2026-04-03 00:05:39 -07:00 |
|
Yuzhen Zhou
|
6b876a7710
|
[ROCM][RL] Shuffle Weight In-Place to Preserve Parameter Attributes (#21825)
|
2026-04-02 23:43:55 -07:00 |
|
Zhangheng
|
4d097047f2
|
[PD]: Add support for HiSparse to directly transfer the cache from Prefill to Decode DRAM. (#21591)
Co-authored-by: Tingwei Huang <huangtingwei9988@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2026-04-02 23:06:12 -07:00 |
|
kk
|
5bcbc9757c
|
[AMD] Resolve the performance regression when launch server with "--enable-aiter-allreduce-fusion" (#21947)
Co-authored-by: wunhuang <wunhuang@amd.com>
|
2026-04-02 22:10:24 -07:00 |
|
DarkSharpness
|
d1b7c3907d
|
[Parallel State Refactor 2/n] Unify code path of AMD deterministic all reduce (#20871)
|
2026-04-03 12:33:17 +08:00 |
|
Baizhou Zhang
|
efa7b2d5d3
|
Revert "[MUSA][9/N] Add FA3 attention backend support through MATE (MUSA AI Tensor Engine)" (#22002)
|
2026-04-02 20:42:13 -07:00 |
|
lviy
|
5f0df1e2ad
|
[Bugfix] Fix incorrect dp-attention parallel info in bench_one_batch (#21519)
|
2026-04-02 20:13:53 -07:00 |
|
Yuhao Yang
|
69e89a1fcc
|
[VLM] Enable per-image MM splitting by default and remove MULTI_IMAGES modality (#21899)
|
2026-04-03 11:04:41 +08:00 |
|
narutolhy
|
8897ac58f0
|
[PP] qwen3 vl skip layer id for pp (#19135)
|
2026-04-03 10:51:53 +08:00 |
|
Mook
|
991f3aa5b3
|
[Feature] NVFP4 Marlin fallback for non-Blackwell GPUs (SM75+) (#19652)
|
2026-04-03 10:48:15 +08:00 |
|
Khoa Pham
|
2b5aed94f5
|
Remove maxItems=1 restriction when tool_choice is specified (#20208)
|
2026-04-03 02:35:24 +00:00 |
|
Thomas
|
0539c62bc1
|
[Diffusion][NPU] Add support for MOVA (#21633)
Co-authored-by: zhangshuai (S) <z00836796@china.huawei.com>
|
2026-04-03 05:33:14 +03:00 |
|
Kangyan-Zhou
|
1f97714f9b
|
[CI] Add timeouts to Slack upload urlopen and WebClient (#21903)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-02 19:30:55 -07:00 |
|
Xiaoyu Zhang
|
89affff290
|
Skip broken AutoModel mapping entries when resolving Llava submodules (#21892)
|
2026-04-03 09:04:26 +08:00 |
|
Adarsh Shirawalmath
|
34ddf135fd
|
[Feature] Stronger transformers modeling backend with TP, PP, MoE, VLMs, and torch compile (#19163)
|
2026-04-02 16:02:33 -07:00 |
|
ori
|
939cf398a9
|
[MUSA][9/N] Add FA3 attention backend support through MATE (MUSA AI Tensor Engine) (#17985)
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
|
2026-04-02 15:04:31 -07:00 |
|
Ethan (Yusheng) Su
|
566b4a4f1c
|
[4/n] Support gpt oss 20b lora (#21570)
|
2026-04-02 12:57:38 -07:00 |
|
Lianmin Zheng
|
fe38410c3e
|
Remove logging for subprocess watchdog start (#21968)
|
2026-04-02 11:30:33 -07:00 |
|
Feng Su
|
8732b2e9c6
|
[CI] [Tracing] Add ci for tracing and fix bugs (#21740)
|
2026-04-02 10:50:50 -07:00 |
|
Mick
|
2278a321ca
|
[diffusion] chore: fix stage profiler for multi-stage denoising (#21955)
|
2026-04-03 01:16:38 +08:00 |
|
DarkSharpness
|
df94cdcebb
|
[Parallel State Refactor 1/n] Remove stream of PyNCCL (#20866)
|
2026-04-03 00:47:50 +08:00 |
|
Ke Bao
|
b21db86e2f
|
[CI] Fix gpu deps import in cpu test (#21950)
|
2026-04-03 00:06:31 +08:00 |
|
Todobe
|
083304ca44
|
[NPU] Support GLM-4.7-Flash on NPU (#21408)
|
2026-04-02 17:44:50 +08:00 |
|
Liangsheng Yin
|
9d9537fbd3
|
Migrate ngram corpus from torch cpp_extension to TVM FFI jit_kernel (#21920)
Co-authored-by: DarkSharpness <2040703891@qq.com>
|
2026-04-02 02:18:11 -07:00 |
|
Qiaolin Yu
|
b684b0b72f
|
Fix spec v2 + logprob when max_num_token is set (#20799)
|
2026-04-02 01:55:16 -07:00 |
|
Baizhou Zhang
|
fbc1f92453
|
[DSA] Set trtllm kernels as nsa default for Blackwell (#21914)
|
2026-04-02 00:22:27 -07:00 |
|
Yilong Zhao
|
f30df723bf
|
scheduler: add prefill-only update in merge batch (#21840)
|
2026-04-01 23:33:06 -07:00 |
|
Trevor Morris
|
d24ea24e18
|
[NVIDIA] Enable fp8 flashinfer_trtllm_routed MoE for MiniMax-M2.5 (#20394)
|
2026-04-01 23:02:06 -07:00 |
|
Khoa Pham
|
f836658077
|
[Spec][Ngram] 4/N: Remove max_match_window_size and min_match_window_size, matching all suffixes of the Trie (#21225)
|
2026-04-01 22:09:46 -07:00 |
|
Liangsheng Yin
|
269589ad71
|
Return HTTP 400 for streaming validation errors (#21900)
|
2026-04-01 21:58:12 -07:00 |
|
Khoa Pham
|
153359b4dd
|
Multi tool streaming fix (#20004)
|
2026-04-01 21:53:05 -07:00 |
|
Mook
|
7a59e05dd1
|
[Kernel] Fuse temperature + softmax in sampling for decode speedup (#20501)
|
2026-04-02 12:46:36 +08:00 |
|
Brayden Zhong
|
cb0c2cbfdb
|
Enable multi-thread weight loading by default (#20289)
|
2026-04-01 21:27:20 -07:00 |
|
Zhangheng
|
fae66b4050
|
Support PP key for file backend (#21901)
|
2026-04-02 12:23:58 +08:00 |
|
David Cheung
|
ed427e1299
|
Migrate all callers from /get_server_info to /server_info (#21463)
|
2026-04-01 21:17:50 -07:00 |
|
Prozac614
|
24997fe42c
|
[diffusion] CI: add initial nvfp4 ci test for b200 (#21767)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-02 11:31:08 +08:00 |
|
Yuhao Yang
|
2ef12073f4
|
[VLM] Add VLM TP=4 per-commit CI test and improve MMMU eval prompt/parser (#21841)
|
2026-04-01 20:09:47 -07:00 |
|
Hanlin Bi
|
0f6bedf6ed
|
fix pcg torch dynamo recompile in mxfp8 Triton path (#21888)
Co-authored-by: Hanlin Bi <hanlinbi@umich.edu>
|
2026-04-02 01:57:49 +00:00 |
|
Noa Neria
|
8d9145d97e
|
Direct model loading from object storage with Runai Model Streamer (#17948)
Signed-off-by: Noa Neria <noa@run.ai>
|
2026-04-01 18:41:22 -07:00 |
|
huangtingwei
|
6dd2f774de
|
[HiCache & PD] Fixed detailed cache hit breakdown in PD scenarios. (#21764)
|
2026-04-01 17:44:55 -07:00 |
|
shuwenn
|
9cb362f70e
|
[HiCache] fix: Clone host indices to avoid memory leak (#21624)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2026-04-01 17:42:07 -07:00 |
|
Liangsheng Yin
|
d7256eb69a
|
Unify GSM8K eval path to Chat API for regression CI readiness (#21667)
|
2026-04-01 17:12:19 -07:00 |
|
ishandhanani
|
1081a25983
|
revert: remove TTL-based hard pin from HiRadixCache (#21884)
|
2026-04-01 16:51:15 -07:00 |
|
Alison Shao
|
1ac74e652e
|
[Misc] Fix comparator e2e tests: add polars dep + fix dp-attention test (#21804)
Co-authored-by: Alison Shao <alison.shao@mac.lan>
|
2026-04-01 15:44:35 -07:00 |
|
YAMY
|
821a8a99fb
|
[Disagg] GPU staging buffer with dynamic ring allocator for heterogeneous TP KV transfer (#19890)
|
2026-04-01 14:09:18 -07:00 |
|