8107 Commits

Author SHA1 Message Date
Liangsheng Yin
00d620b77d introduce arg_groups/ with nemotron_h hook (#24328) 2026-05-03 16:28:11 -07:00
Liangsheng Yin
c3b6d20a80 Register deepseek_v32 alias instead of rewriting config.json (#24295) 2026-05-03 16:02:17 -07:00
Zhangheng
9a5450ad73 [PD]: Support incremental transfer for mooncake transfer engine (#24257)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2026-05-04 00:57:59 +08:00
Chi McIsaac
62265ca7fc [diffusion] feat: initial support for dynamic batching (#18764)
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
Co-authored-by: Junhao Liu <junhaoliu2023@gmail.com>
2026-05-04 00:44:42 +08:00
Xiaoyu Zhang
f2d1390909 [Diffusion] Add Qwen Image ModelOpt FP8 support (#23155)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-05-04 00:24:22 +08:00
Mick
5925572c95 [diffusion] CI: switch CI data references to sgl-project/ci-data (#24299) 2026-05-03 23:05:12 +08:00
Zhangheng
c0f5950636 [UnifiedRadixTree]: Support HiCache Framework for UnifiedRadixTree (#23316)
Co-authored-by: JINZ <1023553676@qq.com>
Co-authored-by: diemchai <diemchai@tencent.com>
2026-05-03 22:13:22 +08:00
GXIN
e37f46fcf7 [NPU] Fix Z-Image negative-branch rotary embeddings for CFG (#23538)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2026-05-03 16:18:26 +03:00
Zhangheng
44ca2d01fc [pd]: (Bug Fix) Incorrect out_cache_loc slicing in prepare_for_prebuilt (#24230)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2026-05-03 18:35:16 +08:00
Mick
2bfc5d3bb1 [diffusion] optimize LTX2.3 HQ denoising split passes (#24298) 2026-05-03 16:37:46 +08:00
Liangsheng Yin
fcc8b7b126 Rename SGLANG_USE_JIT_ALL_REDUCE to SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2 (#24297)
Co-authored-by: DarkSharpness <2040703891@qq.com>
2026-05-02 23:59:46 -07:00
Glen Liu
76b9c8de6f [Feature] add LoRADrainer to address high P99 TTFT (#17913) 2026-05-02 16:13:43 -07:00
Brayden Zhong
88bb5dffe4 [Dependency] Upgrade to Torch 2.11.0 (#21247)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-05-02 12:25:36 -07:00
Glen Liu
e0474fdd9b throw ValueError for DoRA adapters (#22125)
Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>
2026-05-02 14:54:19 +00:00
Xiaoyu Zhang
4128f1ffe2 [SKILLS] Tiny upgrade diffusion skills (#24273) 2026-05-02 22:04:05 +08:00
Xiaoyu Zhang
b712dd48fe [codex] diffusion: enable group norm silu fuse by default (#23148) 2026-05-02 20:55:51 +08:00
Xiaoyu Zhang
1360848ee1 Optimize large GroupNorm SiLU apply (#23938) 2026-05-02 20:54:46 +08:00
egvenediktov
83bf5d6869 [NPU]TP Communications compression For Qwen3 models for NPU (#20520)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2026-05-02 14:29:11 +03:00
Elizaveta Martirosian
ebbaab5597 [NPU] Add GitHub test summary and deduplicate test code. Part 1 (#23835)
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
Co-authored-by: root <root@localhost.localdomain>
Co-authored-by: Elizaveta Martirosian <you@example.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2026-05-02 14:18:18 +03:00
Liangsheng Yin
3259a2c789 Encode routed_experts in the detokenizer, off the tokenizer hot path (#24263)
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
2026-05-02 02:44:32 -07:00
Xiaoyu Zhang
589f90b368 [diffusion] chore: use lmsys as org for modelopt checkpoints (#23924) 2026-05-02 17:18:58 +08:00
Alison Shao
f3dbadb82b fix: accept 0-indexed safetensors shard names in CI weight validator (#24237) 2026-05-02 00:58:15 -07:00
Kangyan-Zhou
2e72a36420 [CI] Restore SMG e2e on 2-gpu-h100 / 4-gpu-h100 runners (#24222)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 23:55:20 -07:00
Clay
5ec3b26799 [diffusion] model: support JoyAI-Image-Edit (#22625)
Co-authored-by: chengyusong1 <chengyusong1@jd.com>
2026-05-02 14:08:57 +08:00
Kangyan-Zhou
cd27baaffd [ci][cu13] Bump torch_memory_saver to 0.0.9.post1; restore manual tests (#23182)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:50:38 -07:00
Mick
b7d4647568 [diffusion] CI: change ground truth repo (#24219) 2026-05-01 21:25:40 -07:00
Sam Shleifer
63f225ca2e [session] fix mamba pool leak in StreamingSession.release_session + plumb idle leak check (#23496) 2026-05-02 11:38:08 +08:00
Sam Shleifer
d41e8c459d Support RunAI loading for quantized checkpoints (#23850)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Sam Shleifer <sam@thinkingmachines.ai>
2026-05-02 11:11:40 +08:00
Johnsonms
4c2ed9a254 Flux2 nvfp4 quantization correctness on Blackwell (B200) (#23625) 2026-05-02 09:57:35 +08:00
Aurick Qiao
bfccc8e504 Allow configuring NIXL backend parameters from env (#24169) 2026-05-01 18:30:43 -07:00
Mick
193b977572 [diffusion] chore: clean scheduler (#24229) 2026-05-02 09:30:06 +08:00
Liangsheng Yin
cb8fbd53fc Reserve slot 0 as padding in all req pools (#24243) 2026-05-01 16:41:36 -07:00
Cheng Wan
b47fab6f5d [bugfix] Support MIXED forward mode in TBO splitter for DP attention (#24241) 2026-05-01 16:01:23 -07:00
Lucia Fang
05de73efd1 [core/model] Use explicit model arch for Llama4 attention backend auto-selection (#24232) 2026-05-01 15:49:30 -07:00
Liangsheng Yin
8a530468fd [Bug] Size mamba mappings from req pool, not mamba pool (#24244) 2026-05-01 15:45:20 -07:00
Yuxuan Zhang
79bc2505a5 [Bug Fix] Resolve EAGLE cuda graph IMA under PD + DP + MTP with GLM-5.1 (#23037) 2026-05-01 13:53:52 -07:00
Lucia Fang
b58fa60a1f [core/attention] Add SGLANG_FLASHINFER_USE_PAGED env to force paged wrapper (#24165) 2026-05-01 12:52:46 -07:00
Lianmin Zheng
ece8a1a788 Refactor device timer, clean up metrics collector, and add fwd occupancy metric (#24197) 2026-05-01 10:25:25 -07:00
JINZ
4a50cd781e [BugFix][HiMamba] Fix host-protected node deletion in HiMamba tombstone del (#23696)
Co-authored-by: diemchai <diemchai@tencent.com>
Co-authored-by: Zhangheng <hzh0425@apache.org>
2026-05-01 21:57:47 +08:00
ishandhanani
5b7ce417d0 [P/D disagg] - support decode side radix cache (#19746) 2026-05-01 21:55:34 +08:00
Cheng Wan
d48095ba53 Bypass torch.cuda.use_mem_pool generator-CM in SymmetricMemoryContext (#24190) 2026-05-01 01:25:49 -07:00
Lianmin Zheng
d9e8a4a7f8 [SWA] Ensure we use pre-computed SWA cache location during prefill (#24138)
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Yinghai Lu <yinghai@meta.com>
2026-05-01 00:01:49 -07:00
Yanbin Jiang
8975479f87 [LoRA][MOE] Fix EP correctness in MoE LoRA slicing and virtual-experts kernels (#24171) 2026-04-30 22:42:10 -07:00
Mick
9d84268705 [diffusion] refactor: introduce component residency manager (#23771) 2026-05-01 11:10:41 +08:00
Cheng Wan
108bfd8b6a [MoE] Add Aiter MoE runner backend and purge aiter.fused_moe from quant methods (#23597)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:50:52 -07:00
Yilong Zhao
f67292539f spec: gate dp mlp sync with server args (#24177) 2026-04-30 16:29:41 -07:00
Polisetty V R K Jyothendra Varma
da7f890788 [Intel GPU] Integrate flash_mla_decode in Intel XPU attention backend (#23557)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-05-01 07:21:28 +08:00
shubham singhal
e35ac95cdc [Test] Add XPU device support to unit tests (#22236)
Co-authored-by: vshekhawat-hlab <vshekhawat@habana.ai>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-05-01 07:18:51 +08:00
Roopak Srivastava
9c5cad3914 Use device-agnostic helpers for Mamba tests and core ops (#20234)
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-05-01 07:14:53 +08:00
Kalyan Kumar
8a9e424faa Replace hardcoded CUDA device with get_device() for XPU support (#13599)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
2026-05-01 07:13:46 +08:00