| Author | Commit | Message | Date |
|---|---|---|---|
| Shu Wang | 5638d40f3a | [nvidia] Gemma4 nvfp4 fix (#22079) | 2026-04-10 08:44:34 +08:00 |
| Tarushii Goel | cebd9c2a1e | [sgl] add ability to return logprobs in MultiLayerEagleWorkerV2 (#22241) | 2026-04-09 16:20:55 -07:00 |
| Mohammad Miadh Angkad | c3833ba929 | Enable DFLASH support for additional model backends (#22358); Co-authored-by: David Wang <21328423+dcw02@users.noreply.github.com> | 2026-04-09 14:36:12 -07:00 |
| Ethan (Yusheng) Su | 28ef6de091 | [Lora] Lora quat info re-factor and support deepseekv3 mla lora (#22323) | 2026-04-09 14:19:58 -07:00 |
| Baizhou Zhang | 60acdc31f2 | [Fix] Fix several bugs on DSA models (#22430) | 2026-04-09 12:46:23 -07:00 |
| Baizhou Zhang | 606aa11ea8 | [DSA] Enable all reduce fusion for DSA models (#22390) | 2026-04-09 12:42:44 -07:00 |
| Lawrence Wu | 8eb235ab51 | fix: do not strip whitespace from GLM tool call values (#20543); Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com> | 2026-04-09 11:14:15 -07:00 |
| Lishan H | 8b991d98a1 | [feature] asr: add chunk-based streaming ASR for Qwen3-ASR (#22089) | 2026-04-10 01:49:03 +08:00 |
| billishyahao | 1df9f4e2f6 | [AMD] Add prealloc token env for mori-ep (#22329) | 2026-04-09 09:34:35 -07:00 |
| Jonah Bernard | 8216b921a1 | Add MLX profiling to bench_one_batch.py (#22159) | 2026-04-09 20:45:21 +08:00 |
| Liangsheng Yin | 9fed58805f | [Doc] Clarify SWA HybridSWAPoolConfigurator comments on all-SWA vs hybrid semantics (#22443) | 2026-04-09 03:02:16 -07:00 |
| YMbmzy | 8a67fb20ea | [Speculative] Support penalty for spec v2 overlap scheduling (#22049) | 2026-04-09 01:59:04 -07:00 |
| Thomas Wang | 628df31d08 | [AMD] Use aiter CK layernorm2d for LayerNorm to reduce NSA indexer kernel launches (#22424) | 2026-04-09 01:55:29 -07:00 |
| xutizhou | 57ffc55fb6 | feat: [1/2] [DeepEP] Fuse shared expert into MoE dispatch under EP (#20089); Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>; Co-authored-by: AichenF <aichenf@nvidia.com> | 2026-04-09 01:48:28 -07:00 |
| Mick | 9709192ce9 | [diffusion] feat: support FLUX.2-small-decoder (#22414) | 2026-04-09 15:53:14 +08:00 |
| Liangsheng Yin | de441ac6bb | [core] Introduce MemoryPoolConfigurator class hierarchy (#22389) | 2026-04-09 15:29:19 +08:00 |
| Evgueni Petrov | b9c316917b | fix AttributeError: 'LazyValue' object has no attribute 'keys' in eplb_manager.py for qwen3 moe (#21822) | 2026-04-09 00:13:29 -07:00 |
| Nicolas Castet | e379befbac | Add symmetric debug mode to print stack trace of comm ops with unregistered tensors (#18569) | 2026-04-08 22:34:58 -07:00 |
| Bingxu Chen | 6b96f8341d | [AMD] Fix multimodal diffusion test crash on ROCm by falling back to SDPA (#22335) | 2026-04-08 22:32:49 -07:00 |
| Mick | 355fcbcc17 | [diffusion] fix: fix cache dit refresh none mask (#22374) | 2026-04-09 11:58:24 +08:00 |
| jsheng_Linkedin | 6838a23226 | [Feature] Add token embedding overrides for sparse embedding replacement (#20960); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-04-08 20:51:36 -07:00 |
| Kurkur | a69be2e866 | [Feature] Support eagle3 for qwen3-vl (#22230) | 2026-04-09 11:45:36 +08:00 |
| Lianmin Zheng | ddc8ef1038 | Lazy import flash_attention_v4 to avoid loading flash_attn.cute at startup (#22306) | 2026-04-08 20:40:25 -07:00 |
| Khoa Pham | f127d67823 | [Spec][Ngram] Misc enhance support for multiple SAMs (#22294) | 2026-04-08 19:56:23 -07:00 |
| Kangrui Du | 1b7c33a5b7 | [diffusion] rl: revamp rollout Log-Prob support with SDE/CPS for RL post-training (#21204); Co-authored-by: MikukuOvO <mikukuovo@gmail.com> | 2026-04-09 09:00:00 +08:00 |
| Liangsheng Yin | 1e3f6ebea6 | [core] Extract pool sizing logic to pool_configurator.py (#22384) | 2026-04-08 16:13:21 -07:00 |
| Baizhou Zhang | 4e5b8cb041 | Fix get_version_tag.py to handle dot-separated post versions (#22385); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-04-08 15:18:22 -07:00 |
| sglang-bot | df3275bd6c | chore: bump flashinfer version to 0.6.7.post3 (#22382); Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com> | 2026-04-08 14:49:45 -07:00 |
| Yufeng He | c89afaea7c | Fix hybrid_linear_attn_backend crash with ngram speculation (#20739); Co-authored-by: kpham-sgl <khoa.pham@radixark.ai>; Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-04-08 12:52:07 -07:00 |
| YAMY | c26b8b4a4b | [GDN] Remove FlashInfer GDN decode + no_buffer guard and default to FlashInfer on SM100+ (#21861) | 2026-04-08 11:59:15 -07:00 |
| Kurt Shuster | db30a63a13 | [sgl-kernel] support > 1024 experts in moe_align_block_size kernel (#21610) | 2026-04-08 11:45:13 -07:00 |
| Mick | 4ac6fa0d87 | [diffusion] fix: fix loading multiple ckpts with different precision for a same module (#22360) | 2026-04-09 02:44:19 +08:00 |
| Yihao Wang | a5ed507a16 | [refactor] [asr] add transcription adapter for extensible ASR models support (#22181) | 2026-04-09 01:19:37 +08:00 |
| Yihao Wang | ae8da14ea3 | [fix] [whisper] ensure inputs are moved to the correct device before processing. (#22293) | 2026-04-08 23:45:42 +08:00 |
| Xiaoyu Zhang | b5b2dbe05f | [Diffusion] Add diffusion NVFP4 scaled-mm correctness test (#22127); Co-authored-by: Mick <mickjagger19@icloud.com> | 2026-04-08 22:07:24 +08:00 |
| zhaozx-cn | 33c9cc8994 | [NPU] fix qwen3.5 video processor (#22266) | 2026-04-08 21:13:29 +08:00 |
| Fergus | 413913763f | fix: wrap _import_static_state in inference_mode to fix resume on Blackwell (#21035) | 2026-04-08 02:03:39 -07:00 |
| Vladislav Nosivskoy | 79c82c5c42 | [HiCache] Fix write_backup return type when parent not backed up (#22185); Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>; Co-authored-by: hzh0425 <hzh0425@apache.org> | 2026-04-08 16:42:57 +08:00 |
| Sundara Raman Ramachandran | 712c8c5051 | [Score API] Add SequenceClassification Model support (#22118) | 2026-04-08 01:30:58 -07:00 |
| HuangJi | c3c13dd5e3 | [diffusion] fix: make warmup image initialization rank-safe (#21817) | 2026-04-08 15:51:09 +08:00 |
| Bingxu Chen | de0cfed159 | [AMD] Fix DLPack Error in Aiter flydsl GEMM by Detaching MoE Gate Weight (#22262); Co-authored-by: bingxche <binxche@amd.com> | 2026-04-07 23:42:10 -07:00 |
| Артем Савкин | cd373667cd | [Bugfix] [NPU] Qwen3.5 with quantization fix (#21692) | 2026-04-08 09:15:48 +03:00 |
| Thomas Wang | 729b74d8dd | [AMD] Fix GLM-5 fp8 KV quant path dispatch on MI300 (#22314) | 2026-04-07 21:16:02 -07:00 |
| yuefeng Wu | 4e4b4ac153 | [NPU] enable index Cache for npu (#21502) | 2026-04-08 11:45:17 +08:00 |
| Alex Nails | 493ec91cbe | [CI] Fix stage-b-test-1-gpu-large (0) timeout by reordering LoRA tests and using tokenizer from cache (#22292); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-04-07 20:00:44 -07:00 |
| Liangsheng Yin | 1c5c6dad5e | [tiny] Fix TOCTOU race in pause-aware weight update locking (#22304); Co-authored-by: maocheng23 <maocheng@berkeley.edu>; Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-04-07 18:54:28 -07:00 |
| Mick | eca62ab8f4 | UX: clean loggings (#22174) | 2026-04-08 09:46:38 +08:00 |
| maocheng23 | 6c2a759a04 | [fix] Fix writer lock deadlock in update_weights_from_ipc during pause_generation (#22290); Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> | 2026-04-07 18:32:56 -07:00 |
| Trevor Morris | 7546d04c81 | [NVIDIA] Enable FP4 flashinfer trtllm routed moe (#21240) | 2026-04-07 16:16:29 -07:00 |
| Liangsheng Yin | 0e2a0260a1 | Add fast-fail to multimodal-gen CI (#22284) | 2026-04-07 15:56:12 -07:00 |