6677 Commits

Author SHA1 Message Date
fzyzcjy
5bf3deb4bc Trace execution information in dump comparator (#19682) 2026-03-02 18:46:27 +08:00
fzyzcjy
abdc0ee71f Support directory detection in dump comparator (#19680) 2026-03-02 18:45:35 +08:00
fzyzcjy
6980416149 Support non orthogonal parallel axes and explicit replication annotation in dump comparator (#19679) 2026-03-02 18:44:33 +08:00
fzyzcjy
a70dd11011 Support flattened dims in dump comparator (#19678) 2026-03-02 18:43:01 +08:00
fzyzcjy
15e83eea61 Enhance replication check, matching pattern, logging in dump comparator (#19677) 2026-03-02 18:42:27 +08:00
fzyzcjy
ec44bc82ab Support presets and arbitrary skipping keys in dump comparator (#19676) 2026-03-02 18:41:49 +08:00
Mick
2e15c015c0 [diffusion] feat: Add --model-id for config resolution; deprecate model_detectors (#19607) 2026-03-02 16:39:53 +08:00
kk
15af26d1e8 Add aiter attention support in prefill-attention-backend of gpt-oss (#18282)
Co-authored-by: wunhuang <wunhuang@amd.com>
2026-03-01 23:39:24 -08:00
ishandhanani
f7da379b61 feat: TTL-based prefix pinning with refresh-on-hit for HiRadixCache (#18941)
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-01 23:27:22 -08:00
Leon Gao
07ef5f7be1 Remove sync points in mamba cache + prefill cudagraph plumbing for DP (#19639) 2026-03-02 15:03:42 +08:00
Baidu-AIAK
922aad2faa Cleanup disagg decode prebuilt flow and add cross-stream sync in merge_batch (#19568)
Co-authored-by: vincent <vincent@vincentdeMacBook-Pro.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
2026-03-01 21:52:27 -08:00
Prozac614
57c5c343d7 [diffusion] model: support Hunyuan3D-2 (#18170)
Co-authored-by: yingluosanqian <yingluosanqian@gmail.com>
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-02 12:28:05 +08:00
Yuan Luo
f6ee6dc8c3 [JIT-kernel] Add unit test for nsa indexer fused_store_k_cache (#19389)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-03-02 12:18:11 +08:00
Shangming Cai
0a6678bf3a [PD] Remove unused server args for disaggregation (#19618)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2026-03-02 11:38:50 +08:00
Henry
e5edf222cd [WIP]enable mxfp8 on nvidia sm120 (#19112)
Co-authored-by: Your Name <you@example.com>
2026-03-01 19:06:43 -08:00
SoluMilken
20282f5664 [fix typo] expert_indicies -> expert_indices (#19627)
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
2026-03-01 17:37:34 -08:00
zwang86
f51ddba131 feat: add FA4 SM90 paged KV decode support & update attention docs (#18442)
Co-authored-by: Zeyu Wang <zeyu.wang@yahooinc.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2026-03-02 09:12:19 +08:00
Kangyan-Zhou
98224de29b [Bugfix] Add missing auto_create_handle_loop to communicator methods (#19610)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 15:00:05 -08:00
SoluMilken
0b3ddbcf10 [fix typo] seperated_timestep -> separated_timestep (#19622) 2026-03-01 14:09:51 -08:00
Kangyan-Zhou
dc02e5bea7 [HiCache] Re-land spec v2 + decode KV cache offloading compatibility (#19615)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 13:58:31 -08:00
Ziang Li
0e86977811 [RL] Support per-layer mixed FP8/BF16 serving for FP8 checkpoints (#18742) 2026-03-01 21:59:22 +08:00
Mick
a75840b373 [diffusion] CI: create and refactor UT (#19619) 2026-03-01 19:38:20 +08:00
Brayden Zhong
80a6b32703 [Perf] Optimize NSA backend metadata under MTP (#19536)
Co-authored-by: Baidu-AIAK <Baidu_AIAK@163.com>
Co-authored-by: zengpai <zengpai@baidu.com>
2026-03-01 01:59:26 -08:00
Mick
d098c8dab0 [diffusion] add .claude and update contributing with attitude towards vibe-pr (#19511) 2026-03-01 14:41:55 +08:00
Bingxu Chen
5fa6633485 [AMD] Fix MoRI EP warmup hang by restoring deepep_mode=normal default (#19498) 2026-02-28 22:05:22 -08:00
Kangyan-Zhou
dcf462cfba Revert "[HiCache] Enable spec v2 + decode KV cache offloading compatibility" (#19613) 2026-02-28 21:54:32 -08:00
Kangyan-Zhou
8167346609 [HiCache] Enable spec v2 + decode KV cache offloading compatibility (#19518)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 21:53:52 -08:00
Yi Zhong
894e887e4a [Blackwell] Make mxint4 flashinfer_trtllm moe gemm set by default on blackwell (#18136)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
2026-03-01 05:21:01 +00:00
Jun Liu
38dc372dae [Bugfix] Fix KeyError: 'prompt_tokens' when streaming requests are aborted (#19514) 2026-02-28 20:21:35 -08:00
Zhang Yiyang (SII)
4ec450e97b [diffusion][MOVA] fix: fix task type in MOVA pipeline and shared model placement (#19489) 2026-03-01 12:13:15 +08:00
Liangsheng Yin
5acb45cf32 [Session] Extract SessionController and clean up session logic in Scheduler (#19547) 2026-02-28 19:47:44 -08:00
Alison Shao
a45613f2a6 Revert "[SGL] sync patch: Remove sync points, prefill cudagraph for DP, disable cache reset in mem check (#19190)" (#19581)
Co-authored-by: Alison Shao <alisonshao@mac.lan>
2026-02-28 19:46:47 -08:00
fzyzcjy
e64095c3c7 Support data parallel attention in dump comparator (#19602) 2026-03-01 10:51:21 +08:00
fzyzcjy
ea6ff7b01f Support multi sharding group on the same dimension in dump comparator (#19601) 2026-03-01 10:36:48 +08:00
fzyzcjy
46960e65cf Add skip patterns, tee to file, tensor load warning in dump comparator (#19600) 2026-03-01 10:36:22 +08:00
fzyzcjy
b0b26a7ef1 Support concat mode in token aligner in dump comparator (#19599) 2026-03-01 10:35:50 +08:00
fzyzcjy
e78f1283f7 Support overriding and post-hoc providing metadata in dump comparator (#19598) 2026-03-01 10:35:06 +08:00
fzyzcjy
e41164af1c Enhance replicated tensor checker in dump comparator (#19597) 2026-03-01 10:34:34 +08:00
fzyzcjy
ec08240a6a Support data parallel in dump comparator (#19596) 2026-03-01 10:34:03 +08:00
fzyzcjy
003ad6daaa Support partial tensors waiting for reduction and pipeline parallel in dump comparator (#19595) 2026-03-01 10:33:39 +08:00
fzyzcjy
67810828cf Visualize per-token information in dump comparator (#19594) 2026-03-01 10:32:59 +08:00
fzyzcjy
f5a10e04cd Support arbitrary filtering in dumper (#19593) 2026-03-01 10:31:21 +08:00
Duyi-Wang
8240a87306 [AMD] MORI-EP support for EP4. (#19578) 2026-02-28 13:13:46 -08:00
Haodi Lei
f451664504 [Fix] Add --disable-draft-model-update to control draft model updates(especially in RL) (#15726)
Co-authored-by: leihaodi <haodilei@gmail.com>
2026-02-28 12:09:55 -08:00
Mohammad Miadh Angkad
9c81ce4707 [Anthropic API] Preserve image content in tool_result conversion (#19233)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2026-02-28 12:07:22 -08:00
zhangheng
a0d8a7ae6d [RadixTree][6/N Refactor]: Refactor SWARadixTree to simplify the computation and alignment of bigram keys. (#19427) 2026-02-28 20:01:39 +08:00
fzyzcjy
5705e02d28 Support singleton dimension squeezing in dump comparator (#19566) 2026-02-28 18:11:46 +08:00
fzyzcjy
80bbd30909 Visualize comparison detailed results in dump comparator (#19565) 2026-02-28 18:08:16 +08:00
fzyzcjy
40facdb28c Handle recompute and verify closeness in dumper (#19564) 2026-02-28 18:07:44 +08:00
fzyzcjy
63a4778542 Support non-intrusive arbitrary dumping in dumper and add e2e tests (#19563) 2026-02-28 18:06:55 +08:00