Liangsheng Yin
|
00d620b77d
|
introduce arg_groups/ with nemotron_h hook (#24328)
|
2026-05-03 16:28:11 -07:00 |
|
Liangsheng Yin
|
c3b6d20a80
|
Register deepseek_v32 alias instead of rewriting config.json (#24295)
|
2026-05-03 16:02:17 -07:00 |
|
Zhangheng
|
9a5450ad73
|
[PD]: Support incremental transfer for mooncake transfer engine (#24257)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-05-04 00:57:59 +08:00 |
|
Chi McIsaac
|
62265ca7fc
|
[diffusion] feat: initial support for dynamic batching (#18764)
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
Co-authored-by: Junhao Liu <junhaoliu2023@gmail.com>
|
2026-05-04 00:44:42 +08:00 |
|
Xiaoyu Zhang
|
f2d1390909
|
[Diffusion] Add Qwen Image ModelOpt FP8 support (#23155)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-05-04 00:24:22 +08:00 |
|
Mick
|
5925572c95
|
[diffusion] CI: switch CI data references to sgl-project/ci-data (#24299)
|
2026-05-03 23:05:12 +08:00 |
|
Zhangheng
|
c0f5950636
|
[UnifiedRadixTree]: Support HiCache Framework for UnifiedRadixTree (#23316)
Co-authored-by: JINZ <1023553676@qq.com>
Co-authored-by: diemchai <diemchai@tencent.com>
|
2026-05-03 22:13:22 +08:00 |
|
GXIN
|
e37f46fcf7
|
[NPU] Fix Z-Image negative-branch rotary embeddings for CFG (#23538)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-05-03 16:18:26 +03:00 |
|
Zhangheng
|
44ca2d01fc
|
[pd]: (Bug Fix) Incorrect out_cache_loc slicing in prepare_for_prebuilt (#24230)
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2026-05-03 18:35:16 +08:00 |
|
Mick
|
2bfc5d3bb1
|
[diffusion] optimize LTX2.3 HQ denoising split passes (#24298)
|
2026-05-03 16:37:46 +08:00 |
|
Liangsheng Yin
|
fcc8b7b126
|
Rename SGLANG_USE_JIT_ALL_REDUCE to SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2 (#24297)
Co-authored-by: DarkSharpness <2040703891@qq.com>
|
2026-05-02 23:59:46 -07:00 |
|
Glen Liu
|
76b9c8de6f
|
[Feature] add LoRADrainer to address high P99 TTFT (#17913)
|
2026-05-02 16:13:43 -07:00 |
|
Brayden Zhong
|
88bb5dffe4
|
[Dependency] Upgrade to Torch 2.11.0 (#21247)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-05-02 12:25:36 -07:00 |
|
Glen Liu
|
e0474fdd9b
|
throw ValueError for DoRA adapters (#22125)
Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>
|
2026-05-02 14:54:19 +00:00 |
|
Xiaoyu Zhang
|
4128f1ffe2
|
[SKILLS] Tiny upgrade diffusion skills (#24273)
|
2026-05-02 22:04:05 +08:00 |
|
Xiaoyu Zhang
|
b712dd48fe
|
[codex] diffusion: enable group norm silu fuse by default (#23148)
|
2026-05-02 20:55:51 +08:00 |
|
Xiaoyu Zhang
|
1360848ee1
|
Optimize large GroupNorm SiLU apply (#23938)
|
2026-05-02 20:54:46 +08:00 |
|
egvenediktov
|
83bf5d6869
|
[NPU]TP Communications compression For Qwen3 models for NPU (#20520)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-05-02 14:29:11 +03:00 |
|
Elizaveta Martirosian
|
ebbaab5597
|
[NPU] Add GitHub test summary and deduplicate test code. Part 1 (#23835)
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
Co-authored-by: root <root@localhost.localdomain>
Co-authored-by: Elizaveta Martirosian <you@example.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
|
2026-05-02 14:18:18 +03:00 |
|
Liangsheng Yin
|
3259a2c789
|
Encode routed_experts in the detokenizer, off the tokenizer hot path (#24263)
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
|
2026-05-02 02:44:32 -07:00 |
|
Xiaoyu Zhang
|
589f90b368
|
[diffusion] chore: use lmsys as org for modelopt checkpoints (#23924)
|
2026-05-02 17:18:58 +08:00 |
|
Alison Shao
|
f3dbadb82b
|
fix: accept 0-indexed safetensors shard names in CI weight validator (#24237)
|
2026-05-02 00:58:15 -07:00 |
|
Kangyan-Zhou
|
2e72a36420
|
[CI] Restore SMG e2e on 2-gpu-h100 / 4-gpu-h100 runners (#24222)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-01 23:55:20 -07:00 |
|
Clay
|
5ec3b26799
|
[diffusion] model: support JoyAI-Image-Edit (#22625)
Co-authored-by: chengyusong1 <chengyusong1@jd.com>
|
2026-05-02 14:08:57 +08:00 |
|
Kangyan-Zhou
|
cd27baaffd
|
[ci][cu13] Bump torch_memory_saver to 0.0.9.post1; restore manual tests (#23182)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-01 22:50:38 -07:00 |
|
Mick
|
b7d4647568
|
[diffusion] CI: change ground truth repo (#24219)
|
2026-05-01 21:25:40 -07:00 |
|
Sam Shleifer
|
63f225ca2e
|
[session] fix mamba pool leak in StreamingSession.release_session + plumb idle leak check (#23496)
|
2026-05-02 11:38:08 +08:00 |
|
Sam Shleifer
|
d41e8c459d
|
Support RunAI loading for quantized checkpoints (#23850)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Sam Shleifer <sam@thinkingmachines.ai>
|
2026-05-02 11:11:40 +08:00 |
|
Johnsonms
|
4c2ed9a254
|
Flux2 nvfp4 quantization correctness on Blackwell (B200) (#23625)
|
2026-05-02 09:57:35 +08:00 |
|
Aurick Qiao
|
bfccc8e504
|
Allow configuring NIXL backend parameters from env (#24169)
|
2026-05-01 18:30:43 -07:00 |
|
Mick
|
193b977572
|
[diffusion] chore: clean scheduler (#24229)
|
2026-05-02 09:30:06 +08:00 |
|
Liangsheng Yin
|
cb8fbd53fc
|
Reserve slot 0 as padding in all req pools (#24243)
|
2026-05-01 16:41:36 -07:00 |
|
Cheng Wan
|
b47fab6f5d
|
[bugfix] Support MIXED forward mode in TBO splitter for DP attention (#24241)
|
2026-05-01 16:01:23 -07:00 |
|
Lucia Fang
|
05de73efd1
|
[core/model] Use explicit model arch for Llama4 attention backend auto-selection (#24232)
|
2026-05-01 15:49:30 -07:00 |
|
Liangsheng Yin
|
8a530468fd
|
[Bug] Size mamba mappings from req pool, not mamba pool (#24244)
|
2026-05-01 15:45:20 -07:00 |
|
Yuxuan Zhang
|
79bc2505a5
|
[Bug Fix] Resolve EAGLE cuda graph IMA under PD + DP + MTP with GLM-5.1 (#23037)
|
2026-05-01 13:53:52 -07:00 |
|
Lucia Fang
|
b58fa60a1f
|
[core/attention] Add SGLANG_FLASHINFER_USE_PAGED env to force paged wrapper (#24165)
|
2026-05-01 12:52:46 -07:00 |
|
Lianmin Zheng
|
ece8a1a788
|
Refactor device timer, clean up metrics collector, and add fwd occupancy metric (#24197)
|
2026-05-01 10:25:25 -07:00 |
|
JINZ
|
4a50cd781e
|
[BugFix][HiMamba] Fix host-protected node deletion in HiMamba tombstone del (#23696)
Co-authored-by: diemchai <diemchai@tencent.com>
Co-authored-by: Zhangheng <hzh0425@apache.org>
|
2026-05-01 21:57:47 +08:00 |
|
ishandhanani
|
5b7ce417d0
|
[P/D disagg] - support decode side radix cache (#19746)
|
2026-05-01 21:55:34 +08:00 |
|
Cheng Wan
|
d48095ba53
|
Bypass torch.cuda.use_mem_pool generator-CM in SymmetricMemoryContext (#24190)
|
2026-05-01 01:25:49 -07:00 |
|
Lianmin Zheng
|
d9e8a4a7f8
|
[SWA] Ensure we use pre-computed SWA cache location during prefill (#24138)
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Yinghai Lu <yinghai@meta.com>
|
2026-05-01 00:01:49 -07:00 |
|
Yanbin Jiang
|
8975479f87
|
[LoRA][MOE] Fix EP correctness in MoE LoRA slicing and virtual-experts kernels (#24171)
|
2026-04-30 22:42:10 -07:00 |
|
Mick
|
9d84268705
|
[diffusion] refactor: introduce component residency manager (#23771)
|
2026-05-01 11:10:41 +08:00 |
|
Cheng Wan
|
108bfd8b6a
|
[MoE] Add Aiter MoE runner backend and purge aiter.fused_moe from quant methods (#23597)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-30 19:50:52 -07:00 |
|
Yilong Zhao
|
f67292539f
|
spec: gate dp mlp sync with server args (#24177)
|
2026-04-30 16:29:41 -07:00 |
|
Polisetty V R K Jyothendra Varma
|
da7f890788
|
[Intel GPU] Integrate flash_mla_decode in Intel XPU attention backend (#23557)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-01 07:21:28 +08:00 |
|
shubham singhal
|
e35ac95cdc
|
[Test] Add XPU device support to unit tests (#22236)
Co-authored-by: vshekhawat-hlab <vshekhawat@habana.ai>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-01 07:18:51 +08:00 |
|
Roopak Srivastava
|
9c5cad3914
|
Use device-agnostic helpers for Mamba tests and core ops (#20234)
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-01 07:14:53 +08:00 |
|
Kalyan Kumar
|
8a9e424faa
|
Replace hardcoded CUDA device with get_device() for XPU support (#13599)
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
|
2026-05-01 07:13:46 +08:00 |
|