| Author | Commit | Message | Date |
|---|---|---|---|
| JiaruiChang5268 | 5a7c1b8ec6 | [NPU] replace swiglu with custom kernel | 2026-03-10 21:08:37 +08:00 |
| Hexq0210 | 9884957c07 | [NPU] Bugfix for qwen35 on NPU (#19756) | 2026-03-10 20:03:26 +08:00 |
| heziiop | 6ed996bf65 | [bugfix] disable share input buffer feature on npu due to accuracy issue (#19507) | 2026-03-10 19:26:46 +08:00 |
| Xiaoyu Zhang | 51d9d34977 | [2/n jit_kernel restruct] unify rotary embedding entrypoints under rope.py (#20247) | 2026-03-10 17:49:57 +08:00 |
| Thomas Wang | 6407891b4f | [AMD] Fp8 prefill integration with radix cache path for dpsk models (#20187) | 2026-03-10 02:49:47 -07:00 |
| Liangsheng Yin | ac07a6d439 | Revert "[Scheduler] Decouple maybe_send_health_check_signal from process_batch_result" (#20259) | 2026-03-10 01:58:48 -07:00 |
| Lancer | 8cd1de3354 | [diffusion] fix: map each prompt to corresponding image in multi-prompt scenario (#20081); Signed-off-by: Lancer <maruixiang6688@gmail.com>; Co-authored-by: Mick <mickjagger19@icloud.com> | 2026-03-10 16:58:21 +08:00 |
| Lancer | 2c2003158f | [diffusion] fix: fix flux2 lora (#20200); Signed-off-by: Lancer <maruixiang6688@gmail.com> | 2026-03-10 16:57:01 +08:00 |
| Xiaoyu Zhang | 8517da5d08 | [3/n jit_kernel restruct] Clean up benchmark naming and benchmarking helpers (#20250) | 2026-03-10 16:39:03 +08:00 |
| Xiaoyu Zhang | c812504b92 | [1/n jit_kernel restruct] unify cache usage and clean up naming in ngram_embedding (#20244) | 2026-03-10 15:53:43 +08:00 |
| Johnsonms | 7cf0551014 | Migrate norm kernels to FlashInfer JIT implementation (#18871) | 2026-03-10 14:56:07 +08:00 |
| Junrong Lin | 69158e9d9f | [Bugfix] Skip _mamba_verify_update for idle batch (#20167) | 2026-03-10 14:53:01 +08:00 |
| liupeng374 | 9b2e5526fb | [NPU][Bug fix] context parallel bug fix (#19820) | 2026-03-10 14:43:49 +08:00 |
| khalilzhk | 5f717913a0 | support Kimi-K2.5-w4a8 on ascend | 2026-03-10 14:43:27 +08:00 |
| Jacob0226 | dadd4dde83 | [AMD] Skip the flaky test for lora ci test. (#20175); Co-authored-by: YC Tseng <yctseng@amd.com> | 2026-03-09 23:15:30 -07:00 |
| shuwenn | 5a11ae19c1 | [CI] fix: notebook ci often OOM (#20199) | 2026-03-09 22:32:41 -07:00 |
| Liangsheng Yin | 208f1428e9 | [Scheduler] Decouple maybe_send_health_check_signal from process_batch_result (#20227); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-03-09 21:18:56 -07:00 |
| sfiisf | 08d37f6955 | [diffusion] fix: add VAE tiling/slicing argument handling for diffusers backend (#17825); Co-authored-by: Mick <mickjagger19@icloud.com> | 2026-03-10 11:38:20 +08:00 |
| huangtingwei | 0a43229be1 | Fix mooncake store write/read bandwidth logs (#18294) | 2026-03-09 19:54:37 -07:00 |
| Mook | 9610944ae6 | [Feature] Add SANA diffusion model (#19234) | 2026-03-10 10:09:21 +08:00 |
| huangtingwei | 254d3cee0b | [HiCache] Supports Indexer layout for NSATokenToKVPoolHost (#19912); Co-authored-by: hzh0425 <hzh0425@apache.org> | 2026-03-09 18:24:06 -07:00 |
| Mohammad Miadh Angkad | ca997b7ba9 | Add min_p and chat-template kwargs support to run_eval (#19571) | 2026-03-09 14:53:09 -07:00 |
| Baizhou Zhang | be63f982b7 | [V32/GLM5] Control the threshold of applying dense attention with an environ (#20062) | 2026-03-09 14:36:10 -07:00 |
| Martin Vit | d39ed074cf | fix: default FP4 GEMM backend to flashinfer_cudnn on SM120 (Blackwell) (#20047); Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>; Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2026-03-09 14:13:08 -07:00 |
| Baizhou Zhang | 61d530e8ac | [CI] Fix lint (#20209) | 2026-03-09 14:09:59 -07:00 |
| ybyang | 3e8abc71ca | [Disagg] Skip health check enqueue when PD disagg queues have backlog (#20191) | 2026-03-09 12:58:10 -07:00 |
| AMD-yanfeiwang | f0153ad225 | [AMD][Feature] support fp4 dispatch and fp8 combine in moriep (#19757); Co-authored-by: Duyi-Wang <duyi.wang@amd.com> | 2026-03-09 12:52:05 -07:00 |
| Liangsheng Yin | ffb4b6f4c1 | [Core] Replace server_args mutation hack with explicit MemoryPoolConfig for draft worker init (#20183) | 2026-03-09 11:45:54 -07:00 |
| Yuhao Yang | ecca8c553d | [diffusion] fix: fix diffusers backend issues in diffusion ci gt workflow (#20173) | 2026-03-10 00:51:48 +08:00 |
| Ke Bao | 2e444bdced | Move stop words to args in send one (#20193) | 2026-03-09 23:05:32 +08:00 |
| sjqgogogogo | eb4ba1bde2 | Feature/support longcat flash lite (#17838); Co-authored-by: sunjiaqi11 <sunjiaqi11@meituan.com>; Co-authored-by: ispobock <ispobaoke@gmail.com> | 2026-03-09 23:00:11 +08:00 |
| wenxuewuhd | 11b76d24dc | [NPU] [DLLM]DLLM LLaDA2.x graph mode support with NPU speedup modifications (#18485); Co-authored-by: Zhang-Xiaoxue <xiaoxuezhang17@outlook.com>; Co-authored-by: dawncc <dawn.cc022@gmail.com>; Co-authored-by: lixinqi7 <li_xinqi7@163.com>; Co-authored-by: rangejay <rangejay1st@163.com> | 2026-03-09 22:41:05 +08:00 |
| Xinyuan Tong | d116a8cd94 | [Bugfix] Fix load_audio: mono before resample + use torchaudio (#20054) | 2026-03-09 19:24:20 +08:00 |
| Xinyuan Tong | 4a757990a1 | [VLM] Replace decord with torchcodec for video decoding (#20055); Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>; Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com> | 2026-03-09 19:23:49 +08:00 |
| Yuzhen Zhou | b719219de9 | [ROCm] Use unreg path for aiter custom all-reduce during CUDA graph capture (#20155) | 2026-03-09 01:09:04 -07:00 |
| luoyuyan | cabe171b6c | Fix qwen3.5 mtp eplb related issues (#19767) | 2026-03-09 16:05:32 +08:00 |
| roikoren755 | c76251f70c | Return intermediate Mamba states (#19716) | 2026-03-09 16:04:36 +08:00 |
| Mike Qiu | 96724f490c | Add auto bind numa node (#15678); Signed-off-by: Michael Qiu <qiudayu.qdy@antgroup.com> | 2026-03-08 23:46:09 -07:00 |
| Liangsheng Yin | 2ef00383ab | [Core] Refactor init_memory_pool into composable resolution helpers (#20142) | 2026-03-08 21:46:27 -07:00 |
| siyu | c6184b7dc0 | Fix EPD OOM by offloading precomputed_embeddings during chunked prefill (#16503) | 2026-03-08 20:10:40 -07:00 |
| Yuhao Yang | 1cb86f5171 | [diffusion] CI: fix CI script path and missing server arg in perf baseline generator (#20138) | 2026-03-09 10:35:21 +08:00 |
| cen121212 | fc543df289 | [NPU] qwen3_vl encoder support graph | 2026-03-09 10:13:35 +08:00 |
| Kaixi | 8c5ca37aef | Batch copy_ with torch._foreach_copy_ (#18558) | 2026-03-08 19:09:02 -07:00 |
| Yuhao Yang | 57f28fda90 | [diffusion] chore: add diffusion new model skill (#19605) | 2026-03-09 09:45:23 +08:00 |
| Simo Lin | 3f3eb206fa | feat(grpc): add SubscribeKvEvents RPC for KV cache event streaming (#20112); Co-authored-by: Chang Su <chang.s.su@oracle.com> | 2026-03-08 16:00:29 -07:00 |
| Liangsheng Yin | 7105bf3782 | [Bug] Fix missing TTFT histogram for single-batch requests (#20122); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-03-08 15:18:51 -07:00 |
| Junhao Liu | 7662b8b919 | [diffusion] feat: implement upscaling (#19723) | 2026-03-09 02:06:40 +08:00 |
| xingsy97 | b77dd41db0 | [diffusion] fix: fix temporary resolution workaround (#20046) | 2026-03-09 02:05:35 +08:00 |
| hzh0425 | 0ac6c63ae4 | [SpecV2-Mamba]: Refactor additional_ratio calculation when init mamba pool (#19660) | 2026-03-09 00:39:26 +08:00 |
| Ke Bao | 07359efce9 | Fix missing clone in hicache (#20130) | 2026-03-08 23:21:18 +08:00 |