Commit Graph

11824 Commits

Author SHA1 Message Date
Divyam Agrawal
cfd49e233c Fix formatting for ACM-VIT in README acknowledgements section (#23325) 2026-04-20 22:28:45 -07:00
jsheng_Linkedin
6d47dc8f6d [CI][MLA] Enable deterministic inference for MGSM MLA FP8 test (#23303) 2026-04-20 22:26:26 -07:00
Shangming Cai
a58c7f381e [PD] Fix clip logic when state indices lens are mismatch (#23323) 2026-04-21 13:22:20 +08:00
Yuhao Yang
5595f6e988 Fix trtllm mla chunked-prefill zero-length bug (#22291) (#22688) 2026-04-20 22:10:13 -07:00
Mingyi
82beaf1748 Docs/url redirect (#23312) 2026-04-20 21:26:18 -07:00
Liangsheng Yin
6cc2eee50d [misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305) 2026-04-20 21:16:24 -07:00
Alison Shao
6b19e8a452 ci: reduce scheduled PR test from 4x to 3x daily (#23313) 2026-04-20 20:53:13 -07:00
amote-i
301604f953 [NPU] [DOC] Quick start doc for Ascend NPU (#23238) 2026-04-21 11:19:09 +08:00
Lewis
0d0405273b [Fix] Solve the error lead by _commit_transfer_to_req() when using IntraNode NVLink in PD disaggregation (#23252)
Co-authored-by: 百麒 <yaozhong.lyz@alibaba-inc.com>
2026-04-21 11:02:18 +08:00
Ke Bao
50fc2c9e23 Fix hybrid swa chunked prefill oom (#23174) 2026-04-21 10:46:45 +08:00
Zhangheng
ab3ce02de9 [Hybrid-Cache]: Refactor hybrid_pool_assembler.py (#23243) 2026-04-21 10:45:23 +08:00
ishandhanani
3c007ee5d4 fix(hicache): emit KV events for L2 host cache insertions (#22894)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: Ishan Dhanani <ishandhanani@gmail.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
2026-04-20 19:07:03 -07:00
ChangLiu0709
ac08ebed65 [AMD] Resolve Qwen3.5 MTP (speculative decoding) radix cache conflict. (#22908) 2026-04-20 18:17:11 -07:00
Liangsheng Yin
c7a4ebf3c8 [Refactor] Replace page_align_keys helper with RadixKey.page_aligned method (#23107) 2026-04-20 18:10:42 -07:00
Mingyi
712b01d875 Update CODEOWNERS to include new documentation paths for docs and doc… (#23293) 2026-04-20 16:48:41 -07:00
Tarushii Goel
3e367f9bcd [sgl] fix incorrect behavior in cuda graph draft extend (#22832) 2026-04-20 16:29:16 -07:00
Tarushii Goel
100b0f86dd [sgl] add support for weight update function in spedec (#22088) 2026-04-20 16:26:20 -07:00
Tarushii Goel
28f3a2d8ed [sgl] multilayereagleworkerv2 fix (#22954) 2026-04-20 16:22:16 -07:00
Thomas Wang
57ecce9807 [AMD] Enable MTP for GLM-5-mxfp4 model (#23219) 2026-04-20 16:09:07 -07:00
Mingyi
a3291b5654 Add new Mintlify documentation site (docs_new/) (#23001)
Co-authored-by: AdityaVKochar <adityavardhankochar@gmail.com>
Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
Co-authored-by: adhyan-jain <adhyanjain2006@gmail.com>
Co-authored-by: Adhyan Jain <71976554+adhyan-jain@users.noreply.github.com>
Co-authored-by: Maitri-shah29 <maitrirajivshah@gmail.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: Maitri Shah <shah29maitri@gmail.com>
Co-authored-by: Aditya Vardhan Kochar <80113212+AdityaVKochar@users.noreply.github.com>
Co-authored-by: Rishit Shivam <164783543+pokymono@users.noreply.github.com>
Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
Co-authored-by: IshhanKheria <ishhankheria06@gmail.com>
Co-authored-by: Ishita Joshi <ishitata.joshi@gmail.com>
Co-authored-by: Richard Chen <104477092+Richardczl98@users.noreply.github.com>
Co-authored-by: longGGGGGG <553746008@qq.com>
Co-authored-by: Richard <richardchen@radixark.ai>
Co-authored-by: Nakul Sinha <nakul.new4socials@gmail.com>
Co-authored-by: Divyam Agrawal <ludicrouslytrue@gmail.com>
Co-authored-by: Richardczl98 <Zhenlinc@stanford.edu>
Co-authored-by: Krishang Zinzuwadia <krishangzinzuwadia@gmail.com>
Co-authored-by: nimeshas <nimesha.s106@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jignas Paturu <86356085+JignasP@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
2026-04-20 15:10:22 -07:00
jsheng_Linkedin
575fdc2c4c [CI][LoRA] Drop flaky all-None batch from multi-LoRA parity test (#23287) 2026-04-20 14:43:25 -07:00
shuwenn
b65799cf83 [SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2026-04-20 14:25:04 -07:00
shuwenn
dbcf7459b5 fix: reset empty prefill batch fullness (#23138) 2026-04-20 14:14:00 -07:00
mispa-ms
d8d9d32b29 [docker] Fix stray backslash dropping sgl-model-gateway COPY (#23097)
Signed-off-by: misunp <misunp@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:44:05 -07:00
ishandhanani
6f6843c582 [Docker] Move Rust toolchain install to torch_deps stage (#23278) 2026-04-20 13:13:10 -07:00
Yuhao Yang
fe9b9b254b Fix segfault in cudaMemcpyBatchAsync on CUDA 13.0 (#23136)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
2026-04-20 12:20:22 -07:00
Liangsheng Yin
8cb957ccff [Perf] Make EAGLE bigram key an O(1) view on RadixKey (#23106) 2026-04-20 12:01:11 -07:00
Shunkangz
3dc1491c95 Support moe_dp_size = 1 for various attention_cp_size (#22003)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-04-20 11:58:19 -07:00
ishandhanani
90d527195b [CI] Fix nightly docker builds failing on root-owned workspace leftovers (#23279)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:56:33 -07:00
Lee Nau
b4bb036b73 fix legacy deepep path for flashinfer_cutedsl (#22925) 2026-04-20 11:49:33 -07:00
Kangyan-Zhou
4698f4cd10 [CI] Fix wait-for-jobs hanging when matrix job skipped at job level (#23277)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:16:10 -07:00
ishandhanani
b5d9a86e4c fix: add back priorty as radix cache policy (#23275) 2026-04-20 10:04:35 -07:00
Alex Nails
332ec5e5ee [release] install rust toolchain in main dockerfile (#23014) 2026-04-20 09:50:08 -07:00
Makcum888e
39c720d1b9 [Diffusion][NPU][CI] update perf numbers (#23056)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2026-04-20 19:34:11 +03:00
Mick
9a0fd2ff0c [diffusion] optimize: default to in-memory loading for URL/base64 image inputs (#23118) 2026-04-20 23:29:02 +08:00
Mick
0be6ab04dd [diffusion] refactor: LTX2.3 code cleanup (#23207) 2026-04-20 19:02:05 +08:00
YC Yen-Ching Tseng
da62e90904 [AMD] Fix multimodal timeout issue : rocm7.2 PR Test (#23247) 2026-04-20 18:36:08 +08:00
YC Yen-Ching Tseng
cf4b84f839 [AMD] Update AMD workflow name (#23245)
Co-authored-by: bingxche <bingxche@amd.com>
2026-04-20 18:18:24 +08:00
Vladislav Nosivskoy
4028a73c10 [KV-Events] Fix kv events events publishing for CP (#22983)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
2026-04-20 17:34:38 +08:00
Alex Nails
bea4c895c1 [gRPC] Pass --experimental_allow_proto3_optional to protoc in build.rs (#23226)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 17:20:14 +08:00
Bingxu Chen
69eb95f20c [AMD] Pin peft<0.19 in pyproject_other.toml to fix ROCm CI ImportError (#23161)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-19 23:43:56 -07:00
Liangsheng Yin
a2d30d27fe wait for reap in kill_process_tree (#23213) 2026-04-19 23:36:33 -07:00
Bingxu Chen
ab936ce694 Revert "perf: optimize PCG inductor path for FP8 models (#21734)" (#23159)
Feel free to PR again.
2026-04-19 23:32:50 -07:00
Kangyan-Zhou
97baf17557 Fix test_modelopt_export using stale ModelConfig kwargs (#23214)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 23:18:09 -07:00
Kangyan-Zhou
1ebe1c57ed [CI] Partition stage-a-test-cpu into 4 matrix shards (#23208)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 22:07:37 -07:00
Alex Nails
10e17cc55e [gRPC] Native gRPC server: proto + Rust crate scaffold + server args (#22736) 2026-04-20 12:39:35 +08:00
Baizhou Zhang
c304d0d64d [Refactor] Deduplicate NSA utils.py into cp_utils.py for context parallel (#22914)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 21:35:35 -07:00
Liangsheng Yin
eb76aaba88 [core] Always-on StreamingSession in UnifiedRadixCache (#23202) 2026-04-19 21:19:43 -07:00
Shunkangz
e389a52cc8 Support allreduce fusion with cp (#21249)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2026-04-19 21:06:00 -07:00
Liangsheng Yin
a7276b623e integrate streaming session into UnifiedRadixCache (#23145) 2026-04-19 20:47:41 -07:00