Commit Graph

7855 Commits

Author SHA1 Message Date
JiaruiChang5268
5a7c1b8ec6 [NPU] replace swiglu with custom kernel 2026-03-10 21:08:37 +08:00
Hexq0210
9884957c07 [NPU] Bugfix for qwen35 on NPU (#19756) 2026-03-10 20:03:26 +08:00
heziiop
6ed996bf65 [bugfix] disable share input buffer feature on npu due to accuracy issue (#19507) 2026-03-10 19:26:46 +08:00
Xiaoyu Zhang
51d9d34977 [2/n jit_kernel restruct] unify rotary embedding entrypoints under rope.py (#20247) 2026-03-10 17:49:57 +08:00
Thomas Wang
6407891b4f [AMD] Fp8 prefill integration with radix cache path for dpsk models (#20187) 2026-03-10 02:49:47 -07:00
Liangsheng Yin
ac07a6d439 Revert "[Scheduler] Decouple maybe_send_health_check_signal from process_batch_result" (#20259) 2026-03-10 01:58:48 -07:00
Lancer
8cd1de3354 [diffusion] fix: map each prompt to corresponding image in multi-prompt scenario (#20081)
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-10 16:58:21 +08:00
Lancer
2c2003158f [diffusion] fix: fix flux2 lora (#20200)
Signed-off-by: Lancer <maruixiang6688@gmail.com>
2026-03-10 16:57:01 +08:00
Xiaoyu Zhang
8517da5d08 [3/n jit_kernel restruct] Clean up benchmark naming and benchmarking helpers (#20250) 2026-03-10 16:39:03 +08:00
Xiaoyu Zhang
c812504b92 [1/n jit_kernel restruct] unify cache usage and clean up naming in ngram_embedding (#20244) 2026-03-10 15:53:43 +08:00
Johnsonms
7cf0551014 Migrate norm kernels to FlashInfer JIT implementation (#18871) 2026-03-10 14:56:07 +08:00
Junrong Lin
69158e9d9f [Bugfix] Skip _mamba_verify_update for idle batch (#20167) 2026-03-10 14:53:01 +08:00
liupeng374
9b2e5526fb [NPU][Bug fix] context parallel bug fix (#19820) 2026-03-10 14:43:49 +08:00
khalilzhk
5f717913a0 support Kimi-K2.5-w4a8 on ascend 2026-03-10 14:43:27 +08:00
Jacob0226
dadd4dde83 [AMD] Skip the flaky test for lora ci test. (#20175)
Co-authored-by: YC Tseng <yctseng@amd.com>
2026-03-09 23:15:30 -07:00
shuwenn
5a11ae19c1 [CI] fix: notebook ci often OOM (#20199) 2026-03-09 22:32:41 -07:00
Liangsheng Yin
208f1428e9 [Scheduler] Decouple maybe_send_health_check_signal from process_batch_result (#20227)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-09 21:18:56 -07:00
sfiisf
08d37f6955 [diffusion] fix: add VAE tiling/slicing argument handling for diffusers backend (#17825)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-03-10 11:38:20 +08:00
huangtingwei
0a43229be1 Fix mooncake store write/read bandwidth logs (#18294) 2026-03-09 19:54:37 -07:00
Mook
9610944ae6 [Feature] Add SANA diffusion model (#19234) 2026-03-10 10:09:21 +08:00
huangtingwei
254d3cee0b [HiCache] Supports Indexer layout for NSATokenToKVPoolHost (#19912)
Co-authored-by: hzh0425 <hzh0425@apache.org>
2026-03-09 18:24:06 -07:00
Mohammad Miadh Angkad
ca997b7ba9 Add min_p and chat-template kwargs support to run_eval (#19571) 2026-03-09 14:53:09 -07:00
Baizhou Zhang
be63f982b7 [V32/GLM5] Control the threshold of applying dense attention with an environ (#20062) 2026-03-09 14:36:10 -07:00
Martin Vit
d39ed074cf fix: default FP4 GEMM backend to flashinfer_cudnn on SM120 (Blackwell) (#20047)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2026-03-09 14:13:08 -07:00
Baizhou Zhang
61d530e8ac [CI] Fix lint (#20209) 2026-03-09 14:09:59 -07:00
ybyang
3e8abc71ca [Disagg] Skip health check enqueue when PD disagg queues have backlog (#20191) 2026-03-09 12:58:10 -07:00
AMD-yanfeiwang
f0153ad225 [AMD][Feature] support fp4 dispatch and fp8 combine in moriep (#19757)
Co-authored-by: Duyi-Wang <duyi.wang@amd.com>
2026-03-09 12:52:05 -07:00
Liangsheng Yin
ffb4b6f4c1 [Core] Replace server_args mutation hack with explicit MemoryPoolConfig for draft worker init (#20183) 2026-03-09 11:45:54 -07:00
Yuhao Yang
ecca8c553d [diffusion] fix: fix diffusers backend issues in diffusion ci gt workflow (#20173) 2026-03-10 00:51:48 +08:00
Ke Bao
2e444bdced Move stop words to args in send one (#20193) 2026-03-09 23:05:32 +08:00
sjqgogogogo
eb4ba1bde2 Feature/support longcat flash lite (#17838)
Co-authored-by: sunjiaqi11 <sunjiaqi11@meituan.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
2026-03-09 23:00:11 +08:00
wenxuewuhd
11b76d24dc [NPU] [DLLM]DLLM LLaDA2.x graph mode support with NPU speedup modifications (#18485)
Co-authored-by: Zhang-Xiaoxue <xiaoxuezhang17@outlook.com>
Co-authored-by: dawncc <dawn.cc022@gmail.com>
Co-authored-by: lixinqi7 <li_xinqi7@163.com>
Co-authored-by: rangejay <rangejay1st@163.com>
2026-03-09 22:41:05 +08:00
Xinyuan Tong
d116a8cd94 [Bugfix] Fix load_audio: mono before resample + use torchaudio (#20054) 2026-03-09 19:24:20 +08:00
Xinyuan Tong
4a757990a1 [VLM] Replace decord with torchcodec for video decoding (#20055)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
2026-03-09 19:23:49 +08:00
Yuzhen Zhou
b719219de9 [ROCm] Use unreg path for aiter custom all-reduce during CUDA graph capture (#20155) 2026-03-09 01:09:04 -07:00
luoyuyan
cabe171b6c Fix qwen3.5 mtp eplb related issues (#19767) 2026-03-09 16:05:32 +08:00
roikoren755
c76251f70c Return intermediate Mamba states (#19716) 2026-03-09 16:04:36 +08:00
Mike Qiu
96724f490c Add auto bind numa node (#15678)
Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com>
2026-03-08 23:46:09 -07:00
Liangsheng Yin
2ef00383ab [Core] Refactor init_memory_pool into composable resolution helpers (#20142) 2026-03-08 21:46:27 -07:00
siyu
c6184b7dc0 Fix EPD OOM by offloading precomputed_embeddings during chunked prefill (#16503) 2026-03-08 20:10:40 -07:00
Yuhao Yang
1cb86f5171 [diffusion] CI: fix CI script path and missing server arg in perf baseline generator (#20138) 2026-03-09 10:35:21 +08:00
cen121212
fc543df289 [NPU] qwen3_vl encoder support graph 2026-03-09 10:13:35 +08:00
Kaixi
8c5ca37aef Batch copy_ with torch._foreach_copy_ (#18558) 2026-03-08 19:09:02 -07:00
Yuhao Yang
57f28fda90 [diffusion] chore: add diffusion new model skill (#19605) 2026-03-09 09:45:23 +08:00
Simo Lin
3f3eb206fa feat(grpc): add SubscribeKvEvents RPC for KV cache event streaming (#20112)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2026-03-08 16:00:29 -07:00
Liangsheng Yin
7105bf3782 [Bug] Fix missing TTFT histogram for single-batch requests (#20122)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-08 15:18:51 -07:00
Junhao Liu
7662b8b919 [diffusion] feat: implement upscaling (#19723) 2026-03-09 02:06:40 +08:00
xingsy97
b77dd41db0 [diffusion] fix: fix temporary resolution workaround (#20046) 2026-03-09 02:05:35 +08:00
hzh0425
0ac6c63ae4 [SpecV2-Mamba]: Refactor additional_ratio calculation when init mamba pool (#19660) 2026-03-09 00:39:26 +08:00
Ke Bao
07359efce9 Fix missing clone in hicache (#20130) 2026-03-08 23:21:18 +08:00