fzyzcjy
|
5bf3deb4bc
|
Trace execution information in dump comparator (#19682)
|
2026-03-02 18:46:27 +08:00 |
|
fzyzcjy
|
abdc0ee71f
|
Support directory detection in dump comparator (#19680)
|
2026-03-02 18:45:35 +08:00 |
|
fzyzcjy
|
6980416149
|
Support non orthogonal parallel axes and explicit replication annotation in dump comparator (#19679)
|
2026-03-02 18:44:33 +08:00 |
|
fzyzcjy
|
a70dd11011
|
Support flattened dims in dump comparator (#19678)
|
2026-03-02 18:43:01 +08:00 |
|
fzyzcjy
|
15e83eea61
|
Enhance replication check, matching pattern, logging in dump comparator (#19677)
|
2026-03-02 18:42:27 +08:00 |
|
fzyzcjy
|
ec44bc82ab
|
Support presets and arbitrary skipping keys in dump comparator (#19676)
|
2026-03-02 18:41:49 +08:00 |
|
Mick
|
2e15c015c0
|
[diffusion] feat: Add --model-id for config resolution; deprecate model_detectors (#19607)
|
2026-03-02 16:39:53 +08:00 |
|
kk
|
15af26d1e8
|
Add aiter attention support in prefill-attention-backend of gpt-oss (#18282)
Co-authored-by: wunhuang <wunhuang@amd.com>
|
2026-03-01 23:39:24 -08:00 |
|
ishandhanani
|
f7da379b61
|
feat: TTL-based prefix pinning with refresh-on-hit for HiRadixCache (#18941)
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-03-01 23:27:22 -08:00 |
|
Leon Gao
|
07ef5f7be1
|
Remove sync points in mamba cache + prefill cudagraph plumbing for DP (#19639)
|
2026-03-02 15:03:42 +08:00 |
|
Baidu-AIAK
|
922aad2faa
|
Cleanup disagg decode prebuilt flow and add cross-stream sync in merge_batch (#19568)
Co-authored-by: vincent <vincent@vincentdeMacBook-Pro.local>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-03-01 21:52:27 -08:00 |
|
Prozac614
|
57c5c343d7
|
[diffusion] model: support Hunyuan3D-2 (#18170)
Co-authored-by: yingluosanqian <yingluosanqian@gmail.com>
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-02 12:28:05 +08:00 |
|
Yuan Luo
|
f6ee6dc8c3
|
[JIT-kernel] Add unit test for nsa indexer fused_store_k_cache (#19389)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-02 12:18:11 +08:00 |
|
Shangming Cai
|
0a6678bf3a
|
[PD] Remove unused server args for disaggregation (#19618)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2026-03-02 11:38:50 +08:00 |
|
Henry
|
e5edf222cd
|
[WIP]enable mxfp8 on nvidia sm120 (#19112)
Co-authored-by: Your Name <you@example.com>
|
2026-03-01 19:06:43 -08:00 |
|
SoluMilken
|
20282f5664
|
[fix typo] expert_indicies -> expert_indices (#19627)
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
|
2026-03-01 17:37:34 -08:00 |
|
zwang86
|
f51ddba131
|
feat: add FA4 SM90 paged KV decode support & update attention docs (#18442)
Co-authored-by: Zeyu Wang <zeyu.wang@yahooinc.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2026-03-02 09:12:19 +08:00 |
|
Kangyan-Zhou
|
98224de29b
|
[Bugfix] Add missing auto_create_handle_loop to communicator methods (#19610)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-01 15:00:05 -08:00 |
|
SoluMilken
|
0b3ddbcf10
|
[fix typo] seperated_timestep -> separated_timestep (#19622)
|
2026-03-01 14:09:51 -08:00 |
|
Kangyan-Zhou
|
dc02e5bea7
|
[HiCache] Re-land spec v2 + decode KV cache offloading compatibility (#19615)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-01 13:58:31 -08:00 |
|
Ziang Li
|
0e86977811
|
[RL] Support per-layer mixed FP8/BF16 serving for FP8 checkpoints (#18742)
|
2026-03-01 21:59:22 +08:00 |
|
Mick
|
a75840b373
|
[diffusion] CI: create and refactor UT (#19619)
|
2026-03-01 19:38:20 +08:00 |
|
Brayden Zhong
|
80a6b32703
|
[Perf] Optimize NSA backend metadata under MTP (#19536)
Co-authored-by: Baidu-AIAK <Baidu_AIAK@163.com>
Co-authored-by: zengpai <zengpai@baidu.com>
|
2026-03-01 01:59:26 -08:00 |
|
Mick
|
d098c8dab0
|
[diffusion] add .claude and update contributing with attitude towards vibe-pr (#19511)
|
2026-03-01 14:41:55 +08:00 |
|
Bingxu Chen
|
5fa6633485
|
[AMD] Fix MoRI EP warmup hang by restoring deepep_mode=normal default (#19498)
|
2026-02-28 22:05:22 -08:00 |
|
Kangyan-Zhou
|
dcf462cfba
|
Revert "[HiCache] Enable spec v2 + decode KV cache offloading compatibility" (#19613)
|
2026-02-28 21:54:32 -08:00 |
|
Kangyan-Zhou
|
8167346609
|
[HiCache] Enable spec v2 + decode KV cache offloading compatibility (#19518)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-28 21:53:52 -08:00 |
|
Yi Zhong
|
894e887e4a
|
[Blackwell] Make mxint4 flashinfer_trtllm moe gemm set by default on blackwell (#18136)
Signed-off-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
|
2026-03-01 05:21:01 +00:00 |
|
Jun Liu
|
38dc372dae
|
[Bugfix] Fix KeyError: 'prompt_tokens' when streaming requests are aborted (#19514)
|
2026-02-28 20:21:35 -08:00 |
|
Zhang Yiyang (SII)
|
4ec450e97b
|
[diffusion][MOVA] fix: fix task type in MOVA pipeline and shared model placement (#19489)
|
2026-03-01 12:13:15 +08:00 |
|
Liangsheng Yin
|
5acb45cf32
|
[Session] Extract SessionController and clean up session logic in Scheduler (#19547)
|
2026-02-28 19:47:44 -08:00 |
|
Alison Shao
|
a45613f2a6
|
Revert "[SGL] sync patch: Remove sync points, prefill cudagraph for DP, disable cache reset in mem check (#19190)" (#19581)
Co-authored-by: Alison Shao <alisonshao@mac.lan>
|
2026-02-28 19:46:47 -08:00 |
|
fzyzcjy
|
e64095c3c7
|
Support data parallel attention in dump comparator (#19602)
|
2026-03-01 10:51:21 +08:00 |
|
fzyzcjy
|
ea6ff7b01f
|
Support multi sharding group on the same dimension in dump comparator (#19601)
|
2026-03-01 10:36:48 +08:00 |
|
fzyzcjy
|
46960e65cf
|
Add skip patterns, tee to file, tensor load warning in dump comparator (#19600)
|
2026-03-01 10:36:22 +08:00 |
|
fzyzcjy
|
b0b26a7ef1
|
Support concat mode in token aligner in dump comparator (#19599)
|
2026-03-01 10:35:50 +08:00 |
|
fzyzcjy
|
e78f1283f7
|
Support overriding and post-hoc providing metadata in dump comparator (#19598)
|
2026-03-01 10:35:06 +08:00 |
|
fzyzcjy
|
e41164af1c
|
Enhance replicated tensor checker in dump comparator (#19597)
|
2026-03-01 10:34:34 +08:00 |
|
fzyzcjy
|
ec08240a6a
|
Support data parallel in dump comparator (#19596)
|
2026-03-01 10:34:03 +08:00 |
|
fzyzcjy
|
003ad6daaa
|
Support partial tensors waiting for reduction and pipeline parallel in dump comparator (#19595)
|
2026-03-01 10:33:39 +08:00 |
|
fzyzcjy
|
67810828cf
|
Visualize per-token information in dump comparator (#19594)
|
2026-03-01 10:32:59 +08:00 |
|
fzyzcjy
|
f5a10e04cd
|
Support arbitrary filtering in dumper (#19593)
|
2026-03-01 10:31:21 +08:00 |
|
Duyi-Wang
|
8240a87306
|
[AMD] MORI-EP support for EP4. (#19578)
|
2026-02-28 13:13:46 -08:00 |
|
Haodi Lei
|
f451664504
|
[Fix] Add --disable-draft-model-update to control draft model updates(especially in RL) (#15726)
Co-authored-by: leihaodi <haodilei@gmail.com>
|
2026-02-28 12:09:55 -08:00 |
|
Mohammad Miadh Angkad
|
9c81ce4707
|
[Anthropic API] Preserve image content in tool_result conversion (#19233)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2026-02-28 12:07:22 -08:00 |
|
zhangheng
|
a0d8a7ae6d
|
[RadixTree][6/N Refactor]: Refactor SWARadixTree to simplify the computation and alignment of bigram keys. (#19427)
|
2026-02-28 20:01:39 +08:00 |
|
fzyzcjy
|
5705e02d28
|
Support singleton dimension squeezing in dump comparator (#19566)
|
2026-02-28 18:11:46 +08:00 |
|
fzyzcjy
|
80bbd30909
|
Visualize comparison detailed results in dump comparator (#19565)
|
2026-02-28 18:08:16 +08:00 |
|
fzyzcjy
|
40facdb28c
|
Handle recompute and verify closeness in dumper (#19564)
|
2026-02-28 18:07:44 +08:00 |
|
fzyzcjy
|
63a4778542
|
Support non-intrusive arbitrary dumping in dumper and add e2e tests (#19563)
|
2026-02-28 18:06:55 +08:00 |
|