Michael
|
dc4380e33a
|
[AMD] [DeepSeek-OCR-2 Day 0] Enable DeepSeek-OCR-2 on AMD GPUs and add nightly test (#19732)
|
2026-03-10 17:04:35 -07:00 |
|
Qiaolin Yu
|
09a118fafe
|
Support return_logprob for spec v2 (overlap safe) (#19801)
Co-authored-by: Ratish1 <ratish1501@gmail.com>
Co-authored-by: Ratish1 <formula733@gmail.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
|
2026-03-10 15:38:27 -07:00 |
|
Ziang Li
|
76ee4bb98c
|
[FlashInfer v0.6.4] [RL] Integrate FlashInfer mxfp8 gemm, MoE, and routed MoE (#19537)
|
2026-03-10 15:37:57 -07:00 |
|
Qiaolin Yu
|
bd460e9565
|
add logprob related params in bench_serving (#20218)
|
2026-03-10 15:04:57 -07:00 |
|
R0CKSTAR
|
db97f193b7
|
[diffusion][llm] macOS support (#19549)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-10 13:11:07 -07:00 |
|
Qiaolin Yu
|
a3d88a247b
|
Enable piecewise-cuda-graph when logprob_start_len = -1 (#19453)
|
2026-03-10 12:50:57 -07:00 |
|
fxmarty-amd
|
031d0a2aad
|
[Qwen-MOE] Fix memory duplication issues in case layers weights are re-assigned during weight loading (#18255)
|
2026-03-10 17:34:56 +00:00 |
|
Xinyuan Tong
|
11d9c36c2f
|
Replace soundfile+torchaudio with torchcodec AudioDecoder in load_audio (#20190)
|
2026-03-10 17:26:29 +00:00 |
|
Mick
|
e1f0b3181a
|
[diffusion] fix: adjust convert_hf_to_fp8 to be compatible with more dits (#20281)
|
2026-03-11 01:21:54 +08:00 |
|
Xiaoyu Zhang
|
60cc06297e
|
[4/n jit_kernel restruct] speed up CI tests and add benchmark workflow (#20268)
|
2026-03-10 21:37:41 +08:00 |
|
JiaruiChang5268
|
5a7c1b8ec6
|
[NPU] replace swiglu with custom kernel
|
2026-03-10 21:08:37 +08:00 |
|
Hexq0210
|
9884957c07
|
[NPU] Bugfix for qwen35 on NPU (#19756)
|
2026-03-10 20:03:26 +08:00 |
|
heziiop
|
6ed996bf65
|
[bugfix] disable share input buffer feature on npu due to accuracy issue (#19507)
|
2026-03-10 19:26:46 +08:00 |
|
Xiaoyu Zhang
|
51d9d34977
|
[2/n jit_kernel restruct] unify rotary embedding entrypoints under rope.py (#20247)
|
2026-03-10 17:49:57 +08:00 |
|
Thomas Wang
|
6407891b4f
|
[AMD] Fp8 prefill integration with radix cache path for dpsk models (#20187)
|
2026-03-10 02:49:47 -07:00 |
|
Liangsheng Yin
|
ac07a6d439
|
Revert "[Scheduler] Decouple maybe_send_health_check_signal from process_batch_result" (#20259)
|
2026-03-10 01:58:48 -07:00 |
|
Lancer
|
8cd1de3354
|
[diffusion] fix: map each prompt to corresponding image in multi-prompt scenario (#20081)
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-10 16:58:21 +08:00 |
|
Lancer
|
2c2003158f
|
[diffusion] fix: fix flux2 lora (#20200)
Signed-off-by: Lancer <maruixiang6688@gmail.com>
|
2026-03-10 16:57:01 +08:00 |
|
Xiaoyu Zhang
|
8517da5d08
|
[3/n jit_kernel restruct] Clean up benchmark naming and benchmarking helpers (#20250)
|
2026-03-10 16:39:03 +08:00 |
|
Xiaoyu Zhang
|
c812504b92
|
[1/n jit_kernel restruct] unify cache usage and clean up naming in ngram_embedding (#20244)
|
2026-03-10 15:53:43 +08:00 |
|
Johnsonms
|
7cf0551014
|
Migrate norm kernels to FlashInfer JIT implementation (#18871)
|
2026-03-10 14:56:07 +08:00 |
|
Junrong Lin
|
69158e9d9f
|
[Bugfix] Skip _mamba_verify_update for idle batch (#20167)
|
2026-03-10 14:53:01 +08:00 |
|
liupeng374
|
9b2e5526fb
|
[NPU][Bug fix] context parallel bug fix (#19820)
|
2026-03-10 14:43:49 +08:00 |
|
khalilzhk
|
5f717913a0
|
support Kimi-K2.5-w4a8 on ascend
|
2026-03-10 14:43:27 +08:00 |
|
Jacob0226
|
dadd4dde83
|
[AMD] Skip the flaky test for lora ci test. (#20175)
Co-authored-by: YC Tseng <yctseng@amd.com>
|
2026-03-09 23:15:30 -07:00 |
|
shuwenn
|
5a11ae19c1
|
[CI] fix: notebook ci often OOM (#20199)
|
2026-03-09 22:32:41 -07:00 |
|
Liangsheng Yin
|
208f1428e9
|
[Scheduler] Decouple maybe_send_health_check_signal from process_batch_result (#20227)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-09 21:18:56 -07:00 |
|
sfiisf
|
08d37f6955
|
[diffusion] fix: add VAE tiling/slicing argument handling for diffusers backend (#17825)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-03-10 11:38:20 +08:00 |
|
huangtingwei
|
0a43229be1
|
Fix mooncake store write/read bandwidth logs (#18294)
|
2026-03-09 19:54:37 -07:00 |
|
Mook
|
9610944ae6
|
[Feature] Add SANA diffusion model (#19234)
|
2026-03-10 10:09:21 +08:00 |
|
huangtingwei
|
254d3cee0b
|
[HiCache] Supports Indexer layout for NSATokenToKVPoolHost (#19912)
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2026-03-09 18:24:06 -07:00 |
|
Mohammad Miadh Angkad
|
ca997b7ba9
|
Add min_p and chat-template kwargs support to run_eval (#19571)
|
2026-03-09 14:53:09 -07:00 |
|
Baizhou Zhang
|
be63f982b7
|
[V32/GLM5] Control the threshold of applying dense attention with an environ (#20062)
|
2026-03-09 14:36:10 -07:00 |
|
Martin Vit
|
d39ed074cf
|
fix: default FP4 GEMM backend to flashinfer_cudnn on SM120 (Blackwell) (#20047)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2026-03-09 14:13:08 -07:00 |
|
Baizhou Zhang
|
61d530e8ac
|
[CI] Fix lint (#20209)
|
2026-03-09 14:09:59 -07:00 |
|
ybyang
|
3e8abc71ca
|
[Disagg] Skip health check enqueue when PD disagg queues have backlog (#20191)
|
2026-03-09 12:58:10 -07:00 |
|
AMD-yanfeiwang
|
f0153ad225
|
[AMD][Feature] support fp4 dispatch and fp8 combine in moriep (#19757)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
|
2026-03-09 12:52:05 -07:00 |
|
Liangsheng Yin
|
ffb4b6f4c1
|
[Core] Replace server_args mutation hack with explicit MemoryPoolConfig for draft worker init (#20183)
|
2026-03-09 11:45:54 -07:00 |
|
Yuhao Yang
|
ecca8c553d
|
[diffusion] fix: fix diffusers backend issues in diffusion ci gt workflow (#20173)
|
2026-03-10 00:51:48 +08:00 |
|
Ke Bao
|
2e444bdced
|
Move stop words to args in send one (#20193)
|
2026-03-09 23:05:32 +08:00 |
|
sjqgogogogo
|
eb4ba1bde2
|
Feature/support longcat flash lite (#17838)
Co-authored-by: sunjiaqi11 <sunjiaqi11@meituan.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
|
2026-03-09 23:00:11 +08:00 |
|
wenxuewuhd
|
11b76d24dc
|
[NPU] [DLLM]DLLM LLaDA2.x graph mode support with NPU speedup modifications (#18485)
Co-authored-by: Zhang-Xiaoxue <xiaoxuezhang17@outlook.com>
Co-authored-by: dawncc <dawn.cc022@gmail.com>
Co-authored-by: lixinqi7 <li_xinqi7@163.com>
Co-authored-by: rangejay <rangejay1st@163.com>
|
2026-03-09 22:41:05 +08:00 |
|
Xinyuan Tong
|
d116a8cd94
|
[Bugfix] Fix load_audio: mono before resample + use torchaudio (#20054)
|
2026-03-09 19:24:20 +08:00 |
|
Xinyuan Tong
|
4a757990a1
|
[VLM] Replace decord with torchcodec for video decoding (#20055)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
|
2026-03-09 19:23:49 +08:00 |
|
Yuzhen Zhou
|
b719219de9
|
[ROCm] Use unreg path for aiter custom all-reduce during CUDA graph capture (#20155)
|
2026-03-09 01:09:04 -07:00 |
|
luoyuyan
|
cabe171b6c
|
Fix qwen3.5 mtp eplb related issues (#19767)
|
2026-03-09 16:05:32 +08:00 |
|
roikoren755
|
c76251f70c
|
Return intermediate Mamba states (#19716)
|
2026-03-09 16:04:36 +08:00 |
|
Mike Qiu
|
96724f490c
|
Add auto bind numa node (#15678)
Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com>
|
2026-03-08 23:46:09 -07:00 |
|
Liangsheng Yin
|
2ef00383ab
|
[Core] Refactor init_memory_pool into composable resolution helpers (#20142)
|
2026-03-08 21:46:27 -07:00 |
|
siyu
|
c6184b7dc0
|
Fix EPD OOM by offloading precomputed_embeddings during chunked prefill (#16503)
|
2026-03-08 20:10:40 -07:00 |
|