DarkSharpness
|
df94cdcebb
|
[Parallel State Refactor 1/n] Remove stream of PyNCCL (#20866)
|
2026-04-03 00:47:50 +08:00 |
|
Ke Bao
|
b21db86e2f
|
[CI] Fix gpu deps import in cpu test (#21950)
|
2026-04-03 00:06:31 +08:00 |
|
Todobe
|
083304ca44
|
[NPU] Support GLM-4.7-Flash on NPU (#21408)
|
2026-04-02 17:44:50 +08:00 |
|
Liangsheng Yin
|
9d9537fbd3
|
Migrate ngram corpus from torch cpp_extension to TVM FFI jit_kernel (#21920)
Co-authored-by: DarkSharpness <2040703891@qq.com>
|
2026-04-02 02:18:11 -07:00 |
|
Qiaolin Yu
|
b684b0b72f
|
Fix spec v2 + logprob when max_num_token is set (#20799)
|
2026-04-02 01:55:16 -07:00 |
|
Baizhou Zhang
|
fbc1f92453
|
[DSA] Set trtllm kernels as nsa default for Blackwell (#21914)
|
2026-04-02 00:22:27 -07:00 |
|
Yilong Zhao
|
f30df723bf
|
scheduler: add prefill-only update in merge batch (#21840)
|
2026-04-01 23:33:06 -07:00 |
|
Trevor Morris
|
d24ea24e18
|
[NVIDIA] Enable fp8 flashinfer_trtllm_routed MoE for MiniMax-M2.5 (#20394)
|
2026-04-01 23:02:06 -07:00 |
|
Khoa Pham
|
f836658077
|
[Spec][Ngram] 4/N: Remove max_match_window_size and min_match_window_size, matching all suffixes of the Trie (#21225)
|
2026-04-01 22:09:46 -07:00 |
|
Liangsheng Yin
|
269589ad71
|
Return HTTP 400 for streaming validation errors (#21900)
|
2026-04-01 21:58:12 -07:00 |
|
Khoa Pham
|
153359b4dd
|
Multi tool streaming fix (#20004)
|
2026-04-01 21:53:05 -07:00 |
|
Mook
|
7a59e05dd1
|
[Kernel] Fuse temperature + softmax in sampling for decode speedup (#20501)
|
2026-04-02 12:46:36 +08:00 |
|
Brayden Zhong
|
cb0c2cbfdb
|
Enable multi-thread weight loading by default (#20289)
|
2026-04-01 21:27:20 -07:00 |
|
Zhangheng
|
fae66b4050
|
Support PP key for file backend (#21901)
|
2026-04-02 12:23:58 +08:00 |
|
David Cheung
|
ed427e1299
|
Migrate all callers from /get_server_info to /server_info (#21463)
|
2026-04-01 21:17:50 -07:00 |
|
Prozac614
|
24997fe42c
|
[diffusion] CI: add initial nvfp4 ci test for b200 (#21767)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-02 11:31:08 +08:00 |
|
Yuhao Yang
|
2ef12073f4
|
[VLM] Add VLM TP=4 per-commit CI test and improve MMMU eval prompt/parser (#21841)
|
2026-04-01 20:09:47 -07:00 |
|
Hanlin Bi
|
0f6bedf6ed
|
fix pcg torch dynamo recompile in mxfp8 Triton path (#21888)
Co-authored-by: Hanlin Bi <hanlinbi@umich.edu>
|
2026-04-02 01:57:49 +00:00 |
|
Noa Neria
|
8d9145d97e
|
Direct model loading from object storage with Runai Model Streamer (#17948)
Signed-off-by: Noa Neria <noa@run.ai>
|
2026-04-01 18:41:22 -07:00 |
|
huangtingwei
|
6dd2f774de
|
[HiCache & PD]Fixed detailed cache hit breakdown in PD scenarios. (#21764)
|
2026-04-01 17:44:55 -07:00 |
|
shuwenn
|
9cb362f70e
|
[HiCache] fix: Clone host indices to avoid memory leak (#21624)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2026-04-01 17:42:07 -07:00 |
|
Liangsheng Yin
|
d7256eb69a
|
Unify GSM8K eval path to Chat API for regression CI readiness (#21667)
|
2026-04-01 17:12:19 -07:00 |
|
ishandhanani
|
1081a25983
|
revert: remove TTL-based hard pin from HiRadixCache (#21884)
|
2026-04-01 16:51:15 -07:00 |
|
Alison Shao
|
1ac74e652e
|
[Misc] Fix comparator e2e tests: add polars dep + fix dp-attention test (#21804)
Co-authored-by: Alison Shao <alison.shao@mac.lan>
|
2026-04-01 15:44:35 -07:00 |
|
YAMY
|
821a8a99fb
|
[Disagg] GPU staging buffer with dynamic ring allocator for heterogeneous TP KV transfer (#19890)
|
2026-04-01 14:09:18 -07:00 |
|
Baizhou Zhang
|
5e12c4e08e
|
[DSA] Support trtllm sparse mla kernel for prefill batches (#21783)
|
2026-04-01 13:55:05 -07:00 |
|
Trevor Morris
|
8950d129bd
|
[refactor] Clean up duplicate flashinfer trtllm moe code (#21233)
|
2026-04-01 13:52:22 -07:00 |
|
Liangsheng Yin
|
0138708576
|
[Misc] Add network timeout to eval dataset downloads (#21873)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-01 13:16:14 -07:00 |
|
Ziang Li
|
a19ef3a615
|
[FlashInfer v0.6.7] Integrate flashinfer_trtllm mxfp8 gemm (#21576)
|
2026-04-01 15:55:06 -04:00 |
|
shuwenn
|
a1c725bdc5
|
fix: pre-init tokenizer_manager to avoid AttributeError in shutdown (#21824)
|
2026-04-01 10:54:53 -07:00 |
|
R0CKSTAR
|
ca3286d2d5
|
[diffusion] hardware: support FA3 attention backend on MUSA (attn backend, 14/N) (#18648)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-01 10:49:34 -07:00 |
|
shuwenn
|
6098c51bc2
|
fix(MiMo-V2-Flash): add mimo reasoning parser (#21414)
|
2026-04-02 00:47:27 +08:00 |
|
DarkSharpness
|
20f4193589
|
[Feature] JIT rmsnorm update (with claude) (#21834)
|
2026-04-01 23:40:00 +08:00 |
|
Ratish P
|
4f5b55e379
|
[diffusion][CI]: Add individual component accuracy CI for diffusion models (#18709)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-04-01 21:51:36 +08:00 |
|
Cherry_ming
|
e67b95d66b
|
[NPU]Add a full test pipeline on NPU, resolve issues in the NPU test architecture (#20751)
|
2026-04-01 19:56:31 +08:00 |
|
Yuhao Yang
|
1aabe44b64
|
[VLM] remove AsyncMMDataProcessor wrapper (#21651)
|
2026-04-01 17:39:50 +08:00 |
|
Mick
|
7bba319f1e
|
[diffusion] fix: respect --prompt-path (#21756)
|
2026-04-01 16:47:59 +08:00 |
|
wduan-hai
|
95b881452e
|
Fix in-place mode in pause generation (#21705)
|
2026-04-01 01:36:28 -07:00 |
|
yunkchen
|
eec70286ec
|
[Bugfix] Fix effective_mamba_size over-allocation (#20858)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-04-01 16:17:14 +08:00 |
|
yudian0504
|
7d2b856ce7
|
[Bug][VLM] Fix shared memory race condition in ShmPointerMMData broadcast for multi-GPU VLM serving (#21655)
|
2026-04-01 16:15:14 +08:00 |
|
Zhiqiang Xie
|
9eb75211b1
|
style refinement for hisparse (#21198)
|
2026-04-01 01:03:17 -07:00 |
|
Yuxuan Zhang
|
57341b128f
|
glm_interleave for GLM-V (#21671)
|
2026-04-01 00:21:10 -07:00 |
|
khalilzhk
|
835e19656f
|
Bug fix for llama eagle3 (#21397)
|
2026-04-01 15:01:53 +08:00 |
|
Alex Nails
|
912494f596
|
[CI] Fix lint that was not applied in #21458 (#21818)
|
2026-03-31 23:58:12 -07:00 |
|
Wenyao Gao
|
2861596fc6
|
[Bugfix] Fix PP tied embeddings weight loading for qwen3.5 4B dense model (#21347)
|
2026-04-01 14:51:03 +08:00 |
|
YC Yen-Ching Tseng
|
a188208e9a
|
[AMD] Optimize Qwen3-VL decode - fuse QK-norm + 3D mRoPE + KV cache write (#21458)
Co-authored-by: Bingxu Chen <bingxche@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
|
2026-03-31 23:34:07 -07:00 |
|
sbeurnier
|
71baa025be
|
Fix added tokens config with sensible filter (#17905)
|
2026-03-31 23:32:21 -07:00 |
|
Xinyuan Tong
|
87a2768269
|
VLM: change default mm-attention backend from triton_attn to fa4 (on blackwell) (#21595)
|
2026-04-01 14:29:59 +08:00 |
|
Yuxuan Zhang
|
72d3d8f4cf
|
[Feature Restoration] repetition_penalty is essential for GLM-V models (#21258)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-03-31 23:29:49 -07:00 |
|
Ethan (Yusheng) Su
|
cffc95edf4
|
[3/n] lora moe - Support Qwen3-VL-30B-A3B-Instruct (#21469)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
|
2026-03-31 23:15:16 -07:00 |
|