sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 19:57:52 +00:00

Author	SHA1	Message	Date
AlonKejzman	66ea0aee7f	tokenizer: Add fastokens support (#23753)	2026-04-28 11:43:10 -07:00
Mick	144038fbae	[diffusion] chore: change default seed to 42 (#23836)	2026-04-28 20:39:23 +08:00
Xiaoyu Zhang	7824903417	[SKILL] Sync SGLang skill docs (#23921)	2026-04-28 17:05:36 +08:00
Yinzuo Jiang	71160e4ddb	feat(observability): add OpenTelemetry tracing for pipeline parallelism (#23169) Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>	2026-04-28 17:05:23 +08:00
Xun Sun	9a53ab3d6d	[6/N] (Elastic EP) Recover failed ranks (#15771)	2026-04-28 00:44:26 -07:00
yaya159456	b8a2dcd300	fix: resolve tensor file overwrite between target and draft models (#21694) Co-authored-by: jiangguangya <jiangguangya@baidu.com> Co-authored-by: Khoa Pham <khoa.pham@radixark.ai>	2026-04-27 23:40:49 -07:00
Xiaoyu Zhang	6fbad22feb	Remove smoke wording from tests and comments (#23355)	2026-04-28 12:05:27 +08:00
JoyFuture	1a55646dcd	[Feature] Xiaomi MiMo-V2.5-Pro day0 support (#23808)	2026-04-28 11:43:29 +08:00
Baizhou Zhang	c1d1412333	[Flashinfer] Integrate flashinfer router gemm for sm103 (#23285)	2026-04-27 20:37:56 -07:00
chenkaiyue	3066ba8167	fix(hicache): add retry logic for MooncakeStore warmup (#17195) Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2026-04-27 18:50:43 -07:00
cen121212	b1e1fe8eee	[NPU][bugfix] accuracy fix when enable both nsa cp and prefixcache (#23268)	2026-04-28 09:08:28 +08:00
看海的人	9ffc0cc67e	[NPU] Support GLM-4.5V (#22961)	2026-04-28 09:08:19 +08:00
PiteXChen	4a04a9818e	[hicache] Optimize HiCache prefetch logic: adapt to remaining available memory when memory is insufficient (#16370) Signed-off-by: CLFutureX <chenyongqyl@163.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2026-04-27 17:18:35 -07:00
Byron Hsu	27659ea8c8	[PD+Pause] Remove redundant post processing (#23886) Co-authored-by: root <root@slurm-h200-206-011.slurm-compute.tenant-slurm.svc.cluster.local>	2026-04-27 17:01:26 -07:00
Byron Hsu	cb0429f253	[Disagg] Finalize routed_experts_output in process_batch_result_disagg_prefill (#23885) Co-authored-by: Byron Hsu <byron@periodiclabs.ai> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 16:40:05 -07:00
Bi Xue	41181b6238	[sgl] copy mm_input in piecewise cuda graph when eagle3 is on (#23613)	2026-04-27 13:35:19 -07:00
Kurt Shuster	f34c20af86	[VLM] Fix Kimi-K2.5 CPU path: rename grid_thws -> image_grid_thw (#23501) Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>	2026-04-27 20:34:28 +00:00
Vladislav Nosivskoy	28ee08c172	[HiCache] Add synchronization for context parallelism (#20460) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>	2026-04-28 02:13:07 +08:00
Jonah Bernard	4fc5ebf0b7	[Chore] Remove deadcode in prefill delayer (#23389) Co-authored-by: Jonah Bernard <96398205+Jonahcb@users.noreply.github.com>	2026-04-27 09:59:36 -07:00
ranjiewen	f2b84b90ac	[npu]fix: qwen3-next w8a8 precision bugs (#21698)	2026-04-27 18:14:33 +08:00
Lianmin Zheng	8536d4b402	Clean up noisy startup warnings from third-party deps (#23669)	2026-04-27 03:10:46 -07:00
Shenxiu Liu	a3fc982ba7	[Whisper] Automatic language detection via structured generation (#22997) Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-04-27 15:54:41 +08:00
Xiaoyu Zhang	5f47cae1a0	add H100 configs for GLM-4.7-Flash (#23719)	2026-04-27 15:07:39 +08:00
Colin Z	d49561b8ae	[AMD] Fix Kimi-K2.6 Quark MXFP4 loading prefix and packed module mapping (#23408)	2026-04-26 23:56:15 -07:00
Praneth Paruchuri	b7113cadb1	[Bug Fix] Reject pp_max_micro_batch_size=0 to prevent silent deadlock on generate() (#23799)	2026-04-27 13:36:04 +08:00
Xinyuan Tong	e5198386bd	Upgrade transformers from 5.5.4 to 5.6.0 (#23525)	2026-04-26 22:33:54 -07:00
Zheng Wengang	91825b8808	[FEAT][EPD] support encoder real health (#23343)	2026-04-27 13:21:28 +08:00
AMD-yanfeiwang	5141d8ae21	[AMD]fix: use CUDA event for targeted draft-to-verify sync in EAGLE overlap (#21940)	2026-04-26 21:58:34 -07:00
Bingxu Chen	d84470079d	[AMD] Fix Grok-2 nightly: avoid multimodal misdetection from auto-populated vision_config (#23383)	2026-04-26 21:54:36 -07:00
Jia Guo	bead2e3470	perf: optimize PCG inductor path for FP8 models (redo of #21734) (#23227) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 20:34:27 -07:00
Byron Hsu	85376a6119	refactor(moe): centralize post-experts all-reduce skip predicate (#23748) Co-authored-by: Byron Hsu <byron@periodiclabs.ai> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 20:29:59 -07:00
iridiumine	32c3513816	[NPU] Support MTP for Qwen3.5 (#20918)	2026-04-27 10:44:17 +08:00
Kangyan-Zhou	35591c7d51	fix(lora): don't assert on non-LoRA lm_head adapter weights (#23433) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:10:07 -07:00
Mick	a392ae8879	[diffusion] feat: accelerate multiple-outputs generation (#23759)	2026-04-27 01:47:33 +08:00
jianan-gu	10fd0faccd	[CPU] Add Qwen3.5 model optimization for CPU (#19484) Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com> Co-authored-by: Ma Mingfei <mingfei.ma@intel.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2026-04-26 10:12:36 -07:00
Liwansi	7d49564431	[NPU]Fix support_triton bug (#23604)	2026-04-26 21:34:56 +08:00
Cheng Wan	c7878dbb6d	[MoE] Deprecate act_and_mul_triton; fold filter_expert into JIT silu/gelu_and_mul (#23707) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 01:41:35 -07:00
Mick	d49a0377de	[diffusion] refactor: make timestep scheduler request-local (#23716)	2026-04-26 15:59:53 +08:00
sglang-bot	9003f24e2b	chore: bump sglang-kernel version to 0.4.1.post1 (#23733) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com> Co-authored-by: Kangyan Zhou <zky314343421@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 23:23:49 -07:00
Byron Hsu	ba4e9d2ac2	Apply should_use_dp_reduce_scatterv guard to remaining MoE models (follow-up to #23731) (#23732) Co-authored-by: Byron Hsu <byronhsu@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>	2026-04-25 20:36:16 -07:00
Byron Hsu	71029abd64	Fix Qwen3 MoE: also guard EP all-reduce with not use_reduce_scatter (follow-up to #23731) (#23734) Co-authored-by: Byron Hsu <byron@periodiclabs.ai> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 20:35:52 -07:00
Byron Hsu	99b59b279c	Fix Qwen3 MoE double-reduce when DP attention + EP + reduce_scatterv (#23729) (#23731) Co-authored-by: Byron Hsu <byronhsu@noreply.github.com>	2026-04-25 15:28:28 -07:00
AlbeeSo	e0a4522370	[typo] fix typo in parallel_state (#23710)	2026-04-25 09:33:33 -07:00
Mick	03849496ad	jit_kernel: tolerate FA3 kernels without out arg (#23717)	2026-04-25 23:42:33 +08:00
1874.	046c14a3ed	[NPU] Support GGUF quantization for Ascend NPU (dense + MoE) (#17883) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-04-25 17:16:47 +03:00
gjsheu	e708ea6d94	[diffusion] fix: restore cache-dit support for LTX2 (#23235) Co-authored-by: gengjinsong <gengjinsong@huawei.com>	2026-04-25 18:10:43 +08:00
Aleksi Vesanto	50ce2708ca	[diffusion] fix: Fix FLUX.1/2 graph breaks (#23648)	2026-04-25 17:54:52 +08:00
kk	393252f514	[AMD] fused qk gemma norm kernels to reduce four kernels (#23575) Co-authored-by: root <root@smci355-ccs-aus-g12-26.cs-aus.dcgpu>	2026-04-25 00:30:01 -07:00
Артем Савкин	bd523dd60d	[NPU] [Bugfix] [Diffusion] Fixed gray images at the generation output (#23266) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2026-04-25 10:20:38 +03:00
Yujing	6175946db7	[Feature]Add MSProbe dump support in SGLang (#18349)	2026-04-25 10:12:50 +03:00

7993 Commits