Mick
|
355fcbcc17
|
[diffusion] fix: fix cache dit refresh none mask (#22374)
|
2026-04-09 11:58:24 +08:00 |
|
jsheng_Linkedin
|
6838a23226
|
[Feature] Add token embedding overrides for sparse embedding replacement (#20960)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 20:51:36 -07:00 |
|
Kurkur
|
a69be2e866
|
[Feature] Support eagle3 for qwen3-vl (#22230)
|
2026-04-09 11:45:36 +08:00 |
|
Lianmin Zheng
|
ddc8ef1038
|
Lazy import flash_attention_v4 to avoid loading flash_attn.cute at startup (#22306)
|
2026-04-08 20:40:25 -07:00 |
|
Khoa Pham
|
f127d67823
|
[Spec][Ngram] Misc enhance support for multiple SAMs (#22294)
|
2026-04-08 19:56:23 -07:00 |
|
Kangrui Du
|
1b7c33a5b7
|
[diffusion] rl: revamp rollout Log-Prob support with SDE/CPS for RL post-training (#21204)
Co-authored-by: MikukuOvO <mikukuovo@gmail.com>
|
2026-04-09 09:00:00 +08:00 |
|
Liangsheng Yin
|
1e3f6ebea6
|
[core] Extract pool sizing logic to pool_configurator.py (#22384)
|
2026-04-08 16:13:21 -07:00 |
|
Baizhou Zhang
|
4e5b8cb041
|
Fix get_version_tag.py to handle dot-separated post versions (#22385)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 15:18:22 -07:00 |
|
sglang-bot
|
df3275bd6c
|
chore: bump flashinfer version to 0.6.7.post3 (#22382)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2026-04-08 14:49:45 -07:00 |
|
Yufeng He
|
c89afaea7c
|
Fix hybrid_linear_attn_backend crash with ngram speculation (#20739)
Co-authored-by: kpham-sgl <khoa.pham@radixark.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 12:52:07 -07:00 |
|
YAMY
|
c26b8b4a4b
|
[GDN] Remove FlashInfer GDN decode + no_buffer guard and default to FlashInfer on SM100+ (#21861)
|
2026-04-08 11:59:15 -07:00 |
|
Kurt Shuster
|
db30a63a13
|
[sgl-kernel] support > 1024 experts in moe_align_block_size kernel (#21610)
|
2026-04-08 11:45:13 -07:00 |
|
Mick
|
4ac6fa0d87
|
[diffusion] fix: fix loading multiple ckpts with different precision for a same module (#22360)
|
2026-04-09 02:44:19 +08:00 |
|
Yihao Wang
|
a5ed507a16
|
[refactor] [asr] add transcription adapter for extensible ASR models support (#22181)
|
2026-04-09 01:19:37 +08:00 |
|
Yihao Wang
|
ae8da14ea3
|
[fix] [whisper] ensure inputs are moved to the correct device before processing. (#22293)
|
2026-04-08 23:45:42 +08:00 |
|
Xiaoyu Zhang
|
b5b2dbe05f
|
[Diffusion] Add diffusion NVFP4 scaled-mm correctness test (#22127)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-08 22:07:24 +08:00 |
|
zhaozx-cn
|
33c9cc8994
|
[NPU] fix qwen3.5 video processor (#22266)
|
2026-04-08 21:13:29 +08:00 |
|
Fergus
|
413913763f
|
fix: wrap _import_static_state in inference_mode to fix resume on Blackwell (#21035)
|
2026-04-08 02:03:39 -07:00 |
|
Vladislav Nosivskoy
|
79c82c5c42
|
[HiCache] Fix write_backup return type when parent not backed up (#22185)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2026-04-08 16:42:57 +08:00 |
|
Sundara Raman Ramachandran
|
712c8c5051
|
[Score API] Add SequenceClassification Model support (#22118)
|
2026-04-08 01:30:58 -07:00 |
|
HuangJi
|
c3c13dd5e3
|
[diffusion] fix: make warmup image initialization rank-safe (#21817)
|
2026-04-08 15:51:09 +08:00 |
|
Bingxu Chen
|
de0cfed159
|
[AMD] Fix DLPack Error in Aiter flydsl GEMM by Detaching MoE Gate Weight (#22262)
Co-authored-by: bingxche <binxche@amd.com>
|
2026-04-07 23:42:10 -07:00 |
|
Артем Савкин
|
cd373667cd
|
[Bugfix] [NPU] Qwen3.5 with quantization fix (#21692)
|
2026-04-08 09:15:48 +03:00 |
|
Thomas Wang
|
729b74d8dd
|
[AMD] Fix GLM-5 fp8 KV quant path dispatch on MI300 (#22314)
|
2026-04-07 21:16:02 -07:00 |
|
yuefeng Wu
|
4e4b4ac153
|
[NPU] enable index Cache for npu (#21502)
|
2026-04-08 11:45:17 +08:00 |
|
Alex Nails
|
493ec91cbe
|
[CI] Fix stage-b-test-1-gpu-large (0) timeout by reordering LoRA tests and using tokenizer from cache (#22292)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 20:00:44 -07:00 |
|
Liangsheng Yin
|
1c5c6dad5e
|
[tiny] Fix TOCTOU race in pause-aware weight update locking (#22304)
Co-authored-by: maocheng23 <maocheng@berkeley.edu>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 18:54:28 -07:00 |
|
Mick
|
eca62ab8f4
|
UX: clean loggings (#22174)
|
2026-04-08 09:46:38 +08:00 |
|
maocheng23
|
6c2a759a04
|
[fix] Fix writer lock deadlock in update_weights_from_ipc during pause_generation (#22290)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-07 18:32:56 -07:00 |
|
Trevor Morris
|
7546d04c81
|
[NVIDIA] Enable FP4 flashinfer trtllm routed moe (#21240)
|
2026-04-07 16:16:29 -07:00 |
|
Liangsheng Yin
|
0e2a0260a1
|
Add fast-fail to multimodal-gen CI (#22284)
|
2026-04-07 15:56:12 -07:00 |
|
Thomas Wang
|
671fe73961
|
Reduce unnecessary kernels and copies in the NSA indexer (#22232)
|
2026-04-07 15:37:08 -07:00 |
|
David Wang
|
f08726fd56
|
[Feature] Add DFLASH speculative decoding support (#22077)
Co-authored-by: Jian Chen <141193260+jianc99@users.noreply.github.com>
Co-authored-by: Zhijian Liu <5782437+zhijian-liu@users.noreply.github.com>
Co-authored-by: Richard Gong <8001209+gongy@users.noreply.github.com>
Co-authored-by: David Wang <21328423+dcw02@users.noreply.github.com>
Co-authored-by: yilian49 <43861414+yilian49@users.noreply.github.com>
Co-authored-by: xm:D <38322020+xiaomin-d@users.noreply.github.com>
|
2026-04-07 14:48:51 -07:00 |
|
Liangsheng Yin
|
cc35714b03
|
[tiny] migrate /get_server_info; print accept length in accuracy tests (#22282)
|
2026-04-07 13:08:35 -07:00 |
|
Rain Jiang
|
1a8eb890f6
|
Kernels community fa3 (#20796)
|
2026-04-07 12:48:44 -07:00 |
|
huangtingwei
|
0c204fbd57
|
[HiSparse] Optimize the scheduling of decode backup. (#21932)
Co-authored-by: hzh0425 <hzh0425@apache.org>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2026-04-07 10:34:58 -07:00 |
|
khalilzhk
|
6131fb5882
|
[NPU] enable mla prepare fused kernel only when being mla attn (#22024)
|
2026-04-08 00:49:16 +08:00 |
|
Ke Bao
|
be42fbbbd7
|
Support HTTP2 server (#21700)
|
2026-04-08 00:42:52 +08:00 |
|
shuwenn
|
ec5742f4ab
|
fix: Auto-correct page_size for Mamba no_buffer radix cache mode (#20538)
|
2026-04-08 00:19:31 +08:00 |
|
Henson-Zh-Ali
|
727a182067
|
[Mamba] eliminate D2H if tracking mamba states (#20522)
Co-authored-by: hzh0425 <hzh0425@apache.org>
|
2026-04-08 00:17:26 +08:00 |
|
YAMY
|
5ae00ecd48
|
[Disagg][NIXL] Support Mamba state slice transfer for heterogeneous TP (Step 2/2 for Qwen3.5) (#22240)
|
2026-04-07 23:47:31 +08:00 |
|
Mick
|
e7bc23cdab
|
[diffusion] CI: fix consistency check (#22251)
|
2026-04-07 23:43:18 +08:00 |
|
Yujun Dong
|
233f3e31bf
|
fix(pcg,mm): fix zeroing of input_embeds when replay PCG (#22229)
|
2026-04-07 20:33:17 +08:00 |
|
Xingyu Liu
|
98f38b14df
|
Add registration API for external linear attention backend (#21983)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
|
2026-04-07 02:47:40 -07:00 |
|
Nicolas Castet
|
490fa9fa44
|
[Perf] Restore torch.compile fusion for topk postprocessing (#21771)
|
2026-04-07 01:38:38 -07:00 |
|
YAMY
|
3148742ddb
|
[Disagg][NIXL] Fix heterogeneous TP KV transfer for non-MLA models (same logic with mooncake, Step 1/2 for Qwen3.5 support) (#22145)
|
2026-04-07 14:52:02 +08:00 |
|
amote-i
|
3f7dfba419
|
fix qwen2_5_math_rm_72b (#21295)
|
2026-04-07 14:36:57 +08:00 |
|
Aditya Sharma
|
f6e85676b5
|
model: support qwen3-asr (#22073)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2026-04-07 13:27:05 +08:00 |
|
Chang Min Bark
|
a757c1e3fb
|
[Apple Silicon] [MLX] Add mlx and mlx-lm dependencies (#22162)
Co-authored-by: R0CKSTAR <yeahdongcn@gmail.com>
|
2026-04-07 11:36:43 +08:00 |
|
Xinyuan Tong
|
2813cb6d9a
|
[New Model] Gemma 4 (#21952)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Pengyu Chen <pychen96@gmail.com>
Co-authored-by: kpham-sgl <khoa.pham@radixark.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Andy Luo <andy.luo@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: adarshxs <adarsh.shirawalmath@gmail.com>
|
2026-04-06 20:24:44 -07:00 |
|