7437 Commits

Author SHA1 Message Date
Xiaoyu Zhang
89affff290 Skip broken AutoModel mapping entries when resolving Llava submodules (#21892) 2026-04-03 09:04:26 +08:00
Adarsh Shirawalmath
34ddf135fd [Feature] Stronger transformers modeling backend with TP, PP, MoE, VLMs, and torch compile (#19163) 2026-04-02 16:02:33 -07:00
ori
939cf398a9 [MUSA][9/N] Add FA3 attention backend support through MATE (MUSA AI Tensor Engine) (#17985)
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2026-04-02 15:04:31 -07:00
Ethan (Yusheng) Su
566b4a4f1c [4/n] Support gpt oss 20b lora (#21570) 2026-04-02 12:57:38 -07:00
Lianmin Zheng
fe38410c3e Remove logging for subprocess watchdog start (#21968) 2026-04-02 11:30:33 -07:00
Feng Su
8732b2e9c6 [CI] [Tracing] Add ci for tracing and fix bugs (#21740) 2026-04-02 10:50:50 -07:00
Mick
2278a321ca [diffusion] chore: fix stage profiler for multi-stage denoising (#21955) 2026-04-03 01:16:38 +08:00
DarkSharpness
df94cdcebb [Parallel State Refactor 1/n] Remove stream of PyNCCL (#20866) 2026-04-03 00:47:50 +08:00
Ke Bao
b21db86e2f [CI] Fix gpu deps import in cpu test (#21950) 2026-04-03 00:06:31 +08:00
Todobe
083304ca44 [NPU] Support GLM-4.7-Flash on NPU (#21408) 2026-04-02 17:44:50 +08:00
Liangsheng Yin
9d9537fbd3 Migrate ngram corpus from torch cpp_extension to TVM FFI jit_kernel (#21920)
Co-authored-by: DarkSharpness <2040703891@qq.com>
2026-04-02 02:18:11 -07:00
Qiaolin Yu
b684b0b72f Fix spec v2 + logprob when max_num_token is set (#20799) 2026-04-02 01:55:16 -07:00
Baizhou Zhang
fbc1f92453 [DSA] Set trtllm kernels as nsa default for Blackwell (#21914) 2026-04-02 00:22:27 -07:00
Yilong Zhao
f30df723bf scheduler: add prefill-only update in merge batch (#21840) 2026-04-01 23:33:06 -07:00
Trevor Morris
d24ea24e18 [NVIDIA] Enable fp8 flashinfer_trtllm_routed MoE for MiniMax-M2.5 (#20394) 2026-04-01 23:02:06 -07:00
Khoa Pham
f836658077 [Spec][Ngram] 4/N: Remove max_match_window_size and min_match_window_size, matching all suffixes of the Trie (#21225) 2026-04-01 22:09:46 -07:00
Liangsheng Yin
269589ad71 Return HTTP 400 for streaming validation errors (#21900) 2026-04-01 21:58:12 -07:00
Khoa Pham
153359b4dd Multi tool streaming fix (#20004) 2026-04-01 21:53:05 -07:00
Mook
7a59e05dd1 [Kernel] Fuse temperature + softmax in sampling for decode speedup (#20501) 2026-04-02 12:46:36 +08:00
Brayden Zhong
cb0c2cbfdb Enable multi-thread weight loading by default (#20289) 2026-04-01 21:27:20 -07:00
Zhangheng
fae66b4050 Support PP key for file backend (#21901) 2026-04-02 12:23:58 +08:00
David Cheung
ed427e1299 Migrate all callers from /get_server_info to /server_info (#21463) 2026-04-01 21:17:50 -07:00
Prozac614
24997fe42c [diffusion] CI: add initial nvfp4 ci test for b200 (#21767)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-02 11:31:08 +08:00
Yuhao Yang
2ef12073f4 [VLM] Add VLM TP=4 per-commit CI test and improve MMMU eval prompt/parser (#21841) 2026-04-01 20:09:47 -07:00
Hanlin Bi
0f6bedf6ed fix pcg torch dynamo recompile in mxfp8 Triton path (#21888)
Co-authored-by: Hanlin Bi <hanlinbi@umich.edu>
2026-04-02 01:57:49 +00:00
Noa Neria
8d9145d97e Direct model loading from object storage with Runai Model Streamer (#17948)
Signed-off-by: Noa Neria <noa@run.ai>
2026-04-01 18:41:22 -07:00
huangtingwei
6dd2f774de [HiCache & PD]Fixed detailed cache hit breakdown in PD scenarios. (#21764) 2026-04-01 17:44:55 -07:00
shuwenn
9cb362f70e [HiCache] fix: Clone host indices to avoid memory leak (#21624)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2026-04-01 17:42:07 -07:00
Liangsheng Yin
d7256eb69a Unify GSM8K eval path to Chat API for regression CI readiness (#21667) 2026-04-01 17:12:19 -07:00
ishandhanani
1081a25983 revert: remove TTL-based hard pin from HiRadixCache (#21884) 2026-04-01 16:51:15 -07:00
Alison Shao
1ac74e652e [Misc] Fix comparator e2e tests: add polars dep + fix dp-attention test (#21804)
Co-authored-by: Alison Shao <alison.shao@mac.lan>
2026-04-01 15:44:35 -07:00
YAMY
821a8a99fb [Disagg] GPU staging buffer with dynamic ring allocator for heterogeneous TP KV transfer (#19890) 2026-04-01 14:09:18 -07:00
Baizhou Zhang
5e12c4e08e [DSA] Support trtllm sparse mla kernel for prefill batches (#21783) 2026-04-01 13:55:05 -07:00
Trevor Morris
8950d129bd [refactor] Clean up duplicate flashinfer trtllm moe code (#21233) 2026-04-01 13:52:22 -07:00
Liangsheng Yin
0138708576 [Misc] Add network timeout to eval dataset downloads (#21873)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 13:16:14 -07:00
Ziang Li
a19ef3a615 [FlashInfer v0.6.7] Integrate flashinfer_trtllm mxfp8 gemm (#21576) 2026-04-01 15:55:06 -04:00
shuwenn
a1c725bdc5 fix: pre-init tokenizer_manager to avoid AttributeError in shutdown (#21824) 2026-04-01 10:54:53 -07:00
R0CKSTAR
ca3286d2d5 [diffusion] hardware: support FA3 attention backend on MUSA (attn backend, 14/N) (#18648)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-01 10:49:34 -07:00
shuwenn
6098c51bc2 fix(MiMo-V2-Flash): add mimo reasoning parser (#21414) 2026-04-02 00:47:27 +08:00
DarkSharpness
20f4193589 [Feature] JIT rmsnorm update (with claude) (#21834) 2026-04-01 23:40:00 +08:00
Ratish P
4f5b55e379 [diffusion][CI]: Add individual component accuracy CI for diffusion models (#18709)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-04-01 21:51:36 +08:00
Cherry_ming
e67b95d66b [NPU]Add a full test pipeline on NPU, resolve issues in the NPU test architecture (#20751) 2026-04-01 19:56:31 +08:00
Yuhao Yang
1aabe44b64 [VLM] remove AsyncMMDataProcessor wrapper (#21651) 2026-04-01 17:39:50 +08:00
Mick
7bba319f1e [diffusion] fix: respect --prompt-path (#21756) 2026-04-01 16:47:59 +08:00
wduan-hai
95b881452e Fix in-place mode in pause generation (#21705) 2026-04-01 01:36:28 -07:00
yunkchen
eec70286ec [Bugfix] Fix effective_mamba_size over-allocation (#20858)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2026-04-01 16:17:14 +08:00
yudian0504
7d2b856ce7 [Bug][VLM] Fix shared memory race condition in ShmPointerMMData broadcast for multi-GPU VLM serving (#21655) 2026-04-01 16:15:14 +08:00
Zhiqiang Xie
9eb75211b1 style refinement for hisparse (#21198) 2026-04-01 01:03:17 -07:00
Yuxuan Zhang
57341b128f glm_interleave for GLM-V (#21671) 2026-04-01 00:21:10 -07:00
khalilzhk
835e19656f Bug fix for llama eagle3 (#21397) 2026-04-01 15:01:53 +08:00