Divyam Agrawal
|
cfd49e233c
|
Fix formatting for ACM-VIT in README acknowledgements section (#23325)
|
2026-04-20 22:28:45 -07:00 |
|
jsheng_Linkedin
|
6d47dc8f6d
|
[CI][MLA] Enable deterministic inference for MGSM MLA FP8 test (#23303)
|
2026-04-20 22:26:26 -07:00 |
|
Shangming Cai
|
a58c7f381e
|
[PD] Fix clip logic when state indices lengths mismatch (#23323)
|
2026-04-21 13:22:20 +08:00 |
|
Yuhao Yang
|
5595f6e988
|
Fix trtllm mla chunked-prefill zero-length bug (#22291) (#22688)
|
2026-04-20 22:10:13 -07:00 |
|
Mingyi
|
82beaf1748
|
Docs/url redirect (#23312)
|
2026-04-20 21:26:18 -07:00 |
|
Liangsheng Yin
|
6cc2eee50d
|
[misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305)
|
2026-04-20 21:16:24 -07:00 |
|
Alison Shao
|
6b19e8a452
|
ci: reduce scheduled PR test from 4x to 3x daily (#23313)
|
2026-04-20 20:53:13 -07:00 |
|
amote-i
|
301604f953
|
[NPU] [DOC] Quick start doc for Ascend NPU (#23238)
|
2026-04-21 11:19:09 +08:00 |
|
Lewis
|
0d0405273b
|
[Fix] Resolve the error caused by _commit_transfer_to_req() when using IntraNode NVLink in PD disaggregation (#23252)
Co-authored-by: 百麒 <yaozhong.lyz@alibaba-inc.com>
|
2026-04-21 11:02:18 +08:00 |
|
Ke Bao
|
50fc2c9e23
|
Fix hybrid swa chunked prefill oom (#23174)
|
2026-04-21 10:46:45 +08:00 |
|
Zhangheng
|
ab3ce02de9
|
[Hybrid-Cache]: Refactor hybrid_pool_assembler.py (#23243)
|
2026-04-21 10:45:23 +08:00 |
|
ishandhanani
|
3c007ee5d4
|
fix(hicache): emit KV events for L2 host cache insertions (#22894)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: Ishan Dhanani <ishandhanani@gmail.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
|
2026-04-20 19:07:03 -07:00 |
|
ChangLiu0709
|
ac08ebed65
|
[AMD] Resolve Qwen3.5 MTP (speculative decoding) radix cache conflict. (#22908)
|
2026-04-20 18:17:11 -07:00 |
|
Liangsheng Yin
|
c7a4ebf3c8
|
[Refactor] Replace page_align_keys helper with RadixKey.page_aligned method (#23107)
|
2026-04-20 18:10:42 -07:00 |
|
Mingyi
|
712b01d875
|
Update CODEOWNERS to include new documentation paths for docs and doc… (#23293)
|
2026-04-20 16:48:41 -07:00 |
|
Tarushii Goel
|
3e367f9bcd
|
[sgl] fix incorrect behavior in cuda graph draft extend (#22832)
|
2026-04-20 16:29:16 -07:00 |
|
Tarushii Goel
|
100b0f86dd
|
[sgl] add support for weight update function in spedec (#22088)
|
2026-04-20 16:26:20 -07:00 |
|
Tarushii Goel
|
28f3a2d8ed
|
[sgl] multilayereagleworkerv2 fix (#22954)
|
2026-04-20 16:22:16 -07:00 |
|
Thomas Wang
|
57ecce9807
|
[AMD] Enable MTP for GLM-5-mxfp4 model (#23219)
|
2026-04-20 16:09:07 -07:00 |
|
Mingyi
|
a3291b5654
|
Add new Mintlify documentation site (docs_new/) (#23001)
Co-authored-by: AdityaVKochar <adityavardhankochar@gmail.com>
Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
Co-authored-by: adhyan-jain <adhyanjain2006@gmail.com>
Co-authored-by: Adhyan Jain <71976554+adhyan-jain@users.noreply.github.com>
Co-authored-by: Maitri-shah29 <maitrirajivshah@gmail.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: Maitri Shah <shah29maitri@gmail.com>
Co-authored-by: Aditya Vardhan Kochar <80113212+AdityaVKochar@users.noreply.github.com>
Co-authored-by: Rishit Shivam <164783543+pokymono@users.noreply.github.com>
Co-authored-by: Rishitshivam <164783543+Rishitshivam@users.noreply.github.com>
Co-authored-by: IshhanKheria <ishhankheria06@gmail.com>
Co-authored-by: Ishita Joshi <ishitata.joshi@gmail.com>
Co-authored-by: Richard Chen <104477092+Richardczl98@users.noreply.github.com>
Co-authored-by: longGGGGGG <553746008@qq.com>
Co-authored-by: Richard <richardchen@radixark.ai>
Co-authored-by: Nakul Sinha <nakul.new4socials@gmail.com>
Co-authored-by: Divyam Agrawal <ludicrouslytrue@gmail.com>
Co-authored-by: Richardczl98 <Zhenlinc@stanford.edu>
Co-authored-by: Krishang Zinzuwadia <krishangzinzuwadia@gmail.com>
Co-authored-by: nimeshas <nimesha.s106@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jignas Paturu <86356085+JignasP@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
|
2026-04-20 15:10:22 -07:00 |
|
jsheng_Linkedin
|
575fdc2c4c
|
[CI][LoRA] Drop flaky all-None batch from multi-LoRA parity test (#23287)
|
2026-04-20 14:43:25 -07:00 |
|
shuwenn
|
b65799cf83
|
[SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
|
2026-04-20 14:25:04 -07:00 |
|
shuwenn
|
dbcf7459b5
|
fix: reset empty prefill batch fullness (#23138)
|
2026-04-20 14:14:00 -07:00 |
|
mispa-ms
|
d8d9d32b29
|
[docker] Fix stray backslash dropping sgl-model-gateway COPY (#23097)
Signed-off-by: misunp <misunp@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-20 13:44:05 -07:00 |
|
ishandhanani
|
6f6843c582
|
[Docker] Move Rust toolchain install to torch_deps stage (#23278)
|
2026-04-20 13:13:10 -07:00 |
|
Yuhao Yang
|
fe9b9b254b
|
Fix segfault in cudaMemcpyBatchAsync on CUDA 13.0 (#23136)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
|
2026-04-20 12:20:22 -07:00 |
|
Liangsheng Yin
|
8cb957ccff
|
[Perf] Make EAGLE bigram key an O(1) view on RadixKey (#23106)
|
2026-04-20 12:01:11 -07:00 |
|
Shunkangz
|
3dc1491c95
|
Support moe_dp_size = 1 for various attention_cp_size (#22003)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-04-20 11:58:19 -07:00 |
|
ishandhanani
|
90d527195b
|
[CI] Fix nightly docker builds failing on root-owned workspace leftovers (#23279)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-20 11:56:33 -07:00 |
|
Lee Nau
|
b4bb036b73
|
fix legacy deepep path for flashinfer_cutedsl (#22925)
|
2026-04-20 11:49:33 -07:00 |
|
Kangyan-Zhou
|
4698f4cd10
|
[CI] Fix wait-for-jobs hanging when matrix job skipped at job level (#23277)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-20 11:16:10 -07:00 |
|
ishandhanani
|
b5d9a86e4c
|
fix: add back priority as radix cache policy (#23275)
|
2026-04-20 10:04:35 -07:00 |
|
Alex Nails
|
332ec5e5ee
|
[release] install rust toolchain in main dockerfile (#23014)
|
2026-04-20 09:50:08 -07:00 |
|
Makcum888e
|
39c720d1b9
|
[Diffusion][NPU][CI] update perf numbers (#23056)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-04-20 19:34:11 +03:00 |
|
Mick
|
9a0fd2ff0c
|
[diffusion] optimize: default to in-memory loading for URL/base64 image inputs (#23118)
|
2026-04-20 23:29:02 +08:00 |
|
Mick
|
0be6ab04dd
|
[diffusion] refactor: LTX2.3 code cleanup (#23207)
|
2026-04-20 19:02:05 +08:00 |
|
YC Yen-Ching Tseng
|
da62e90904
|
[AMD] Fix multimodal timeout issue: rocm7.2 PR Test (#23247)
|
2026-04-20 18:36:08 +08:00 |
|
YC Yen-Ching Tseng
|
cf4b84f839
|
[AMD] Update AMD workflow name (#23245)
Co-authored-by: bingxche <bingxche@amd.com>
|
2026-04-20 18:18:24 +08:00 |
|
Vladislav Nosivskoy
|
4028a73c10
|
[KV-Events] Fix kv events publishing for CP (#22983)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2026-04-20 17:34:38 +08:00 |
|
Alex Nails
|
bea4c895c1
|
[gRPC] Pass --experimental_allow_proto3_optional to protoc in build.rs (#23226)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-20 17:20:14 +08:00 |
|
Bingxu Chen
|
69eb95f20c
|
[AMD] Pin peft<0.19 in pyproject_other.toml to fix ROCm CI ImportError (#23161)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-19 23:43:56 -07:00 |
|
Liangsheng Yin
|
a2d30d27fe
|
wait for reap in kill_process_tree (#23213)
|
2026-04-19 23:36:33 -07:00 |
|
Bingxu Chen
|
ab936ce694
|
Revert "perf: optimize PCG inductor path for FP8 models (#21734)" (#23159)
Feel free to PR again.
|
2026-04-19 23:32:50 -07:00 |
|
Kangyan-Zhou
|
97baf17557
|
Fix test_modelopt_export using stale ModelConfig kwargs (#23214)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-19 23:18:09 -07:00 |
|
Kangyan-Zhou
|
1ebe1c57ed
|
[CI] Partition stage-a-test-cpu into 4 matrix shards (#23208)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-19 22:07:37 -07:00 |
|
Alex Nails
|
10e17cc55e
|
[gRPC] Native gRPC server: proto + Rust crate scaffold + server args (#22736)
|
2026-04-20 12:39:35 +08:00 |
|
Baizhou Zhang
|
c304d0d64d
|
[Refactor] Deduplicate NSA utils.py into cp_utils.py for context parallel (#22914)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-19 21:35:35 -07:00 |
|
Liangsheng Yin
|
eb76aaba88
|
[core] Always-on StreamingSession in UnifiedRadixCache (#23202)
|
2026-04-19 21:19:43 -07:00 |
|
Shunkangz
|
e389a52cc8
|
Support allreduce fusion with cp (#21249)
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2026-04-19 21:06:00 -07:00 |
|
Liangsheng Yin
|
a7276b623e
|
integrate streaming session into UnifiedRadixCache (#23145)
|
2026-04-19 20:47:41 -07:00 |
|