6915 Commits

Author SHA1 Message Date
Michael
dc4380e33a [AMD] [DeepSeek-OCR-2 Day 0] Enable DeepSeek-OCR-2 on AMD GPUs and add nightly test (#19732) 2026-03-10 17:04:35 -07:00
Qiaolin Yu
09a118fafe Support return_logprob for spec v2 (overlap safe) (#19801)
Co-authored-by: Ratish1 <ratish1501@gmail.com>
Co-authored-by: Ratish1 <formula733@gmail.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
2026-03-10 15:38:27 -07:00
Ziang Li
76ee4bb98c [FlashInfer v0.6.4] [RL] Integrate FlashInfer mxfp8 gemm, MoE, and routed MoE (#19537) 2026-03-10 15:37:57 -07:00
Qiaolin Yu
bd460e9565 add logprob related params in bench_serving (#20218) 2026-03-10 15:04:57 -07:00
R0CKSTAR
db97f193b7 [diffusion][llm] macOS support (#19549)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-10 13:11:07 -07:00
Qiaolin Yu
a3d88a247b Enable piecewise-cuda-graph when logprob_start_len = -1 (#19453) 2026-03-10 12:50:57 -07:00
fxmarty-amd
031d0a2aad [Qwen-MOE] Fix memory duplication issues in case layers weights are re-assigned during weight loading (#18255) 2026-03-10 17:34:56 +00:00
Xinyuan Tong
11d9c36c2f Replace soundfile+torchaudio with torchcodec AudioDecoder in load_audio (#20190) 2026-03-10 17:26:29 +00:00
Mick
e1f0b3181a [diffusion] fix: adjust convert_hf_to_fp8 to be compatible with more dits (#20281) 2026-03-11 01:21:54 +08:00
Xiaoyu Zhang
60cc06297e [4/n jit_kernel restruct] speed up CI tests and add benchmark workflow (#20268) 2026-03-10 21:37:41 +08:00
JiaruiChang5268
5a7c1b8ec6 [NPU] replace swiglu with custom kernel 2026-03-10 21:08:37 +08:00
Hexq0210
9884957c07 [NPU] Bugfix for qwen35 on NPU (#19756) 2026-03-10 20:03:26 +08:00
heziiop
6ed996bf65 [bugfix] disable share input buffer feature on npu due to accuracy issue (#19507) 2026-03-10 19:26:46 +08:00
Xiaoyu Zhang
51d9d34977 [2/n jit_kernel restruct] unify rotary embedding entrypoints under rope.py (#20247) 2026-03-10 17:49:57 +08:00
Thomas Wang
6407891b4f [AMD] Fp8 prefill integration with radix cache path for dpsk models (#20187) 2026-03-10 02:49:47 -07:00
Liangsheng Yin
ac07a6d439 Revert "[Scheduler] Decouple maybe_send_health_check_signal from process_batch_result" (#20259) 2026-03-10 01:58:48 -07:00
Lancer
8cd1de3354 [diffusion] fix: map each prompt to corresponding image in multi-prompt scenario (#20081)
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-10 16:58:21 +08:00
Lancer
2c2003158f [diffusion] fix: fix flux2 lora (#20200)
Signed-off-by: Lancer <maruixiang6688@gmail.com>
2026-03-10 16:57:01 +08:00
Xiaoyu Zhang
8517da5d08 [3/n jit_kernel restruct] Clean up benchmark naming and benchmarking helpers (#20250) 2026-03-10 16:39:03 +08:00
Xiaoyu Zhang
c812504b92 [1/n jit_kernel restruct] unify cache usage and clean up naming in ngram_embedding (#20244) 2026-03-10 15:53:43 +08:00
Johnsonms
7cf0551014 Migrate norm kernels to FlashInfer JIT implementation (#18871) 2026-03-10 14:56:07 +08:00
Junrong Lin
69158e9d9f [Bugfix] Skip _mamba_verify_update for idle batch (#20167) 2026-03-10 14:53:01 +08:00
liupeng374
9b2e5526fb [NPU][Bug fix] context parallel bug fix (#19820) 2026-03-10 14:43:49 +08:00
khalilzhk
5f717913a0 support Kimi-K2.5-w4a8 on ascend 2026-03-10 14:43:27 +08:00
Jacob0226
dadd4dde83 [AMD] Skip the flaky test for lora ci test. (#20175)
Co-authored-by: YC Tseng <yctseng@amd.com>
2026-03-09 23:15:30 -07:00
shuwenn
5a11ae19c1 [CI] fix: notebook ci often OOM (#20199) 2026-03-09 22:32:41 -07:00
Liangsheng Yin
208f1428e9 [Scheduler] Decouple maybe_send_health_check_signal from process_batch_result (#20227)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-09 21:18:56 -07:00
sfiisf
08d37f6955 [diffusion] fix: add VAE tiling/slicing argument handling for diffusers backend (#17825)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-10 11:38:20 +08:00
huangtingwei
0a43229be1 Fix mooncake store write/read bandwidth logs (#18294) 2026-03-09 19:54:37 -07:00
Mook
9610944ae6 [Feature] Add SANA diffusion model (#19234) 2026-03-10 10:09:21 +08:00
huangtingwei
254d3cee0b [HiCache] Supports Indexer layout for NSATokenToKVPoolHost (#19912)
Co-authored-by: hzh0425 <hzh0425@apache.org>
2026-03-09 18:24:06 -07:00
Mohammad Miadh Angkad
ca997b7ba9 Add min_p and chat-template kwargs support to run_eval (#19571) 2026-03-09 14:53:09 -07:00
Baizhou Zhang
be63f982b7 [V32/GLM5] Control the threshold of applying dense attention with an environ (#20062) 2026-03-09 14:36:10 -07:00
Martin Vit
d39ed074cf fix: default FP4 GEMM backend to flashinfer_cudnn on SM120 (Blackwell) (#20047)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2026-03-09 14:13:08 -07:00
Baizhou Zhang
61d530e8ac [CI] Fix lint (#20209) 2026-03-09 14:09:59 -07:00
ybyang
3e8abc71ca [Disagg] Skip health check enqueue when PD disagg queues have backlog (#20191) 2026-03-09 12:58:10 -07:00
AMD-yanfeiwang
f0153ad225 [AMD][Feature] support fp4 dispatch and fp8 combine in moriep (#19757)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
2026-03-09 12:52:05 -07:00
Liangsheng Yin
ffb4b6f4c1 [Core] Replace server_args mutation hack with explicit MemoryPoolConfig for draft worker init (#20183) 2026-03-09 11:45:54 -07:00
Yuhao Yang
ecca8c553d [diffusion] fix: fix diffusers backend issues in diffusion ci gt workflow (#20173) 2026-03-10 00:51:48 +08:00
Ke Bao
2e444bdced Move stop words to args in send one (#20193) 2026-03-09 23:05:32 +08:00
sjqgogogogo
eb4ba1bde2 Feature/support longcat flash lite (#17838)
Co-authored-by: sunjiaqi11 <sunjiaqi11@meituan.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
2026-03-09 23:00:11 +08:00
wenxuewuhd
11b76d24dc [NPU] [DLLM]DLLM LLaDA2.x graph mode support with NPU speedup modifications (#18485)
Co-authored-by: Zhang-Xiaoxue <xiaoxuezhang17@outlook.com>
Co-authored-by: dawncc <dawn.cc022@gmail.com>
Co-authored-by: lixinqi7 <li_xinqi7@163.com>
Co-authored-by: rangejay <rangejay1st@163.com>
2026-03-09 22:41:05 +08:00
Xinyuan Tong
d116a8cd94 [Bugfix] Fix load_audio: mono before resample + use torchaudio (#20054) 2026-03-09 19:24:20 +08:00
Xinyuan Tong
4a757990a1 [VLM] Replace decord with torchcodec for video decoding (#20055)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
2026-03-09 19:23:49 +08:00
Yuzhen Zhou
b719219de9 [ROCm] Use unreg path for aiter custom all-reduce during CUDA graph capture (#20155) 2026-03-09 01:09:04 -07:00
luoyuyan
cabe171b6c Fix qwen3.5 mtp eplb related issues (#19767) 2026-03-09 16:05:32 +08:00
roikoren755
c76251f70c Return intermediate Mamba states (#19716) 2026-03-09 16:04:36 +08:00
Mike Qiu
96724f490c Add auto bind numa node (#15678)
Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com>
2026-03-08 23:46:09 -07:00
Liangsheng Yin
2ef00383ab [Core] Refactor init_memory_pool into composable resolution helpers (#20142) 2026-03-08 21:46:27 -07:00
siyu
c6184b7dc0 Fix EPD OOM by offloading precomputed_embeddings during chunked prefill (#16503) 2026-03-08 20:10:40 -07:00