sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-02 21:37:11 +00:00

Author	SHA1	Message	Date
Junlin Wu	80a6014243	✨ [diffusion][npu][quant] Add MXFP8 quantization support for Wan2.2 Diffusion on Ascend NPU (#20922 ) Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2026-05-07 21:30:56 +03:00
McZyWu	7d397ad23d	[NPU]Support model Trinity-mini for Npu, accuracy 90% (#18172 ) Co-authored-by: sglang-npu-bot <sglangnpu@163.com>	2026-05-07 20:58:18 +03:00
Mick	b0225a69dc	[diffusion] optimize: precompute LTX2 guidance perturbation states (#24494 )	2026-05-08 01:18:42 +08:00
ykcai-daniel	9c41b1058f	[diffusion] refactor: refactor cfg parallelism framework to support multi-branch CFG for LTX2 (#23736 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-05-07 22:56:55 +08:00
Vladimir Serov	263cb3b222	[LoRA] Torch Native enhancement: embedding and graph optimization (#21885 ) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-05-07 17:28:38 +03:00
ovidiusm	811d138c8a	Nixl async transfer (#23967 ) Signed-off-by: Ovidiu Mara <ovidium@nvidia.com>	2026-05-07 22:05:43 +08:00
Yuxuan Zhang	ec4560304b	[Bug Fix] Preserve decode state across retract-resume of GLM-5.1 (#23346 ) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-05-07 21:37:53 +08:00
Shangming Cai	e264b5785d	[PD] Centralize per-room cleanup in common backend (#24601 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-05-07 18:47:55 +08:00
inkcherry	3b2c730320	[AMD] Enable dual-stream MoE on ROCm (#24005 ) Signed-off-by: inkcherry <mingzhi.liu@amd.com>	2026-05-07 02:27:24 -07:00
Hanming Lu	92f281f856	[Spec][trtllm] use decode kernel for draft extend (#24566 )	2026-05-07 02:25:26 -07:00
weireweire	684638e053	Fix prefill batch iter logging under overlap (#20845 )	2026-05-07 02:10:42 -07:00
Polisetty V R K Jyothendra Varma	9dfb1d2ebe	[Intel GPU] Fix flash_mla_get_workspace_size call in intel_xpu (#24372 ) Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>	2026-05-07 13:45:32 +08:00
Kangyan-Zhou	a2586f1c53	[CI] pin NeMo-Skills install to known-good SHA in accuracy_test_runner (#24581 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 22:24:48 -07:00
Yanbin Jiang	f0368a6666	[LoRA] Use deterministic lora_id for --lora-paths so multi-node ranks agree (#24555 ) Co-authored-by: gh1595 <278903827+gh1595@users.noreply.github.com>	2026-05-06 22:20:15 -07:00
Jun Liu	65ce9965ce	[Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (#22715 ) Co-authored-by: Alex Nails <alex.nails@radixark.ai>	2026-05-06 19:20:04 -07:00
Baizhou Zhang	ecb786c8d7	[Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (#24268 )	2026-05-06 18:59:01 -07:00
Liangsheng Yin	eaf074d50e	propagate pytest exit code from test __main__ entries (#24487 )	2026-05-06 18:46:52 -07:00
Yuzhen Zhou	4a279d9c36	[R3] Avoid implicit CUDA sync in routed experts DP slicing (#24550 ) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2026-05-06 18:37:36 -07:00
huangtingwei	27445f9836	Add ChatCompletionRequest-style support to /v1/tokenize (#23981 )	2026-05-06 18:35:20 -07:00
Brayden Zhong	3fe8bc987e	Support Triton MLA FP8 KV cache (#20479 ) Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>	2026-05-06 18:32:39 -07:00
Mick	2e642ea187	[diffusion] chore: align LTX-2 with official (#24313 )	2026-05-07 08:46:28 +08:00
Xiaoyu Zhang	a9a8b20a90	[codex] Optimize Z-Image packed QKV (#24117 )	2026-05-07 07:51:22 +08:00
gh1595	ece7e95b65	[LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (#24420 ) Co-authored-by: Yanbin Jiang <jybsuper@gmail.com>	2026-05-06 14:51:30 -07:00
Lianmin Zheng	b859f7ffba	Improve metrics, observability, and PD deploy tooling (#24521 )	2026-05-06 11:27:35 -07:00
Xiaoyu Zhang	d86f2916cc	Fix diffusion fallback guards and validation (#23335 )	2026-05-07 00:05:43 +08:00
Shangming Cai	32d9998b9d	[PD] Prevent update_status to Failed from cleared entries (#24539 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-05-06 23:32:04 +08:00
sky	bfc1aeae13	[CP] Register KV cache allgather buffer with symmetric memory (#24040 ) Signed-off-by: wangfakang <fakangwang@gmail.com>	2026-05-06 23:24:36 +08:00
fzyzcjy	c4c5541618	Support getting checksums in weight checker (#24537 )	2026-05-06 22:59:28 +08:00
fzyzcjy	ae5ae840f6	Refactor buffer patterns in weight checker (#24538 )	2026-05-06 22:52:07 +08:00
Ke Bao	eb5f0fbeef	Support swa HiCache for unified radix cache (#23391 ) Co-authored-by: hzh0425 <hzh0425@apache.org>	2026-05-06 22:19:25 +08:00
fzyzcjy	491051c622	Cherry pick weight_checker `_weight_fp32` buffer skip from #22663 (#24534 ) Co-authored-by: JD <jaedon.guo@gmail.com>	2026-05-06 21:21:12 +08:00
fzyzcjy	0d40931b08	Cherry pick weight_checker non-persistent buffer pattern list from #21278 (#24533 ) Co-authored-by: JD <jaedon.guo@gmail.com>	2026-05-06 21:14:01 +08:00
fzyzcjy	864f9633f2	Cherry pick weight_checker fp8 dequant fix and non-persistent buffer skip from #21494 (#24532 ) Co-authored-by: Yueming Yuan <yym022502@gmail.com>	2026-05-06 21:13:17 +08:00
Lianmin Zheng	d4d4b04d66	[PD] Fix missing update_status call in abort() across all KV backends (#24522 )	2026-05-06 05:30:11 -07:00
cctry	163bf1ba71	[PD] Fix KV transfer metrics (#24416 ) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-05-06 03:44:48 -07:00
Xiaoyu Zhang	b67df7cd1b	[Codex] Diffusion handle non-contiguous CFG communication (#24332 ) Co-authored-by: BBuf Codex <bbuf-codex@users.noreply.github.com>	2026-05-06 17:27:14 +08:00
sky	c8bc23522f	Refactor: decouple segment tracking from comm registration (#21392 ) Signed-off-by: wangfakang <fakangwang@gmail.com>	2026-05-06 17:07:58 +08:00
fzyzcjy	a858fda708	Add e2e test with log snapshot in dumper grafter (#24513 )	2026-05-06 17:00:13 +08:00
fzyzcjy	8527db0a91	Enhance diff and tensor-info logging in dumper grafter (#24512 )	2026-05-06 16:58:08 +08:00
fzyzcjy	75943cfbcf	Support per-call extras and dataclass transform input in dumper grafter (#24511 )	2026-05-06 16:57:44 +08:00
fzyzcjy	833279eb2e	Support multi-rank exchange via all_gather_object in dumper grafter (#24510 )	2026-05-06 16:57:20 +08:00
fzyzcjy	ebd64f5d40	Support user-supplied recv-side transform in dumper grafter (#24509 )	2026-05-06 16:56:52 +08:00
fzyzcjy	9a65f0ac26	Support t2b direction and overlap protection in dumper grafter (#24508 )	2026-05-06 16:56:24 +08:00
fzyzcjy	58487e68e5	Support cross-system tensor grafting in dumper (#24507 )	2026-05-06 16:55:40 +08:00
fzyzcjy	61104d7d0a	Add prefixed _log helper in dumper (#24506 )	2026-05-06 16:54:20 +08:00
Mick	fbebfdec9a	[diffusion] fix: fix diffusion FSDP sharding (#24431 )	2026-05-06 14:55:51 +08:00
cctry	660a77f221	Silence noisy health-check race log in TokenizerManager (#24466 )	2026-05-05 21:06:43 -07:00
ybyang	3da87902d7	[HiSparse] Support FP8 KV cache by routing to flashmla_kv backend (#23013 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>	2026-05-06 03:18:30 +00:00
Night	b2420d72ff	[RL] DeepEP support for `--enable-return-routed-experts` (#16859 ) Co-authored-by: hnyls2002 <lsyincs@gmail.com> Co-authored-by: Junrong Lin <33685709+ocss884@users.noreply.github.com>	2026-05-05 20:01:07 -07:00
Xiaoyu Zhang	d7385b575f	[Diffusion] Optimize Hunyuan3D shape denoising (#24287 )	2026-05-06 10:10:09 +08:00

8205 Commits