sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 20:27:57 +00:00

Author	SHA1	Message	Date
YAMY	cfead25bbf	[Qwen3.5] mamba slice fix (Prefill TP != Decode TP & decode TP size>1) (#20655 ) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2026-03-17 19:30:58 +08:00
AMD-yanfeiwang	966ae87d02	[AMD] avoid correction_bias_dtype dtype convert (#20692 )	2026-03-17 02:55:05 -07:00
Liangsheng Yin	5270a06488	[Disagg] Fix health check false-positive in disagg `is_fully_idle` (#20756 )	2026-03-17 17:18:54 +08:00
Duyi-Wang	385a35bd11	[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars. (#20647 )	2026-03-17 01:13:42 -07:00
Junhao Liu	ee106757df	[diffusion] fix: fix Diffusers backend ignores model-specific sampling parameter (#20080 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-17 16:10:46 +08:00
akhilg-nv	9a697ceabb	[Fix #20389 ] Illegal memory access in triton attention for large token counts (#20390 )	2026-03-17 00:42:11 -07:00
Ratish P	e3277b3be2	[diffusion]: remove stale offload-manager in LTX2 AV denoising (#20624 )	2026-03-17 15:14:00 +08:00
DefTruth	025691cd9e	[diffusion] chore: bump up cache-dit & support quant for diffusers backend (#20361 )	2026-03-17 12:51:31 +08:00
Rocky Song	079a1fd35e	[Bugfix] Fix write-through events not processed when scheduler is idle (#20560 )	2026-03-16 21:49:59 -07:00
Shangming Cai	5d5c31c6e4	[PP] Add CP pyobj broadcasting when enable dynamic CPP (#20738 )	2026-03-17 12:20:11 +08:00
MMuzzammil1	855ec7017d	Add check to provide hicache-storage-backend when enabling kv caching on Decode Side in PD Disaggregation (#20732 ) Signed-off-by: Mohd Muzzammil <me.muzzammil@samsung.com>	2026-03-17 11:25:14 +08:00
Hubert Lu	943f34f642	Add NCCL/RCCL pre-warming to reduce P99 TTFT cold-start latency (#20477 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 20:23:14 -07:00
Jay Shaik	e4d06b3db2	Fix /generate JSON serialization for non-finite top_logprobs (#20714 )	2026-03-16 20:07:12 -07:00
shuwenn	515b3a323d	feat: support human-readable suffixes (25.6k, 1M, 1Mi) for token CLI (#20577 )	2026-03-16 20:05:33 -07:00
psaab	9f56b471aa	[Network] Use `NetworkAddress` for `dist_init_method` and loopback fallbacks (#20657 ) Co-authored-by: hnyls2002 <lsyincs@gmail.com>	2026-03-16 19:59:49 -07:00
Jason Yao	4dbec2dd2b	[typo] Fix typos in comments and log messages in common.py (#20723 )	2026-03-16 19:26:59 -07:00
Qiaolin Yu	7d87a6a071	Fix spec v1 token_ids_logprobs (#20718 )	2026-03-16 19:23:28 -07:00
Mick	474a851ae3	[diffusion] fix: fix sampling params incorrectly override in cli (#20689 )	2026-03-17 08:48:10 +08:00
Mick	1eea744855	[diffusion] CI: enable UT (#20690 )	2026-03-17 07:44:04 +08:00
roikoren755	5ef5806160	[Nemotron] Small reasoning parser fix (#20284 )	2026-03-16 13:29:40 -07:00
Bruce Wu	70a6fb53af	Enable embedding lookup/lora_a logic for chunked backend (#17692 ) Co-authored-by: Bruce Wu <mogicianwu@fb.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>	2026-03-16 11:37:58 -07:00
Douglas Yang	061ec582bf	fix: adding teacache.params back to sampling params as intended (#20665 )	2026-03-16 11:27:06 -07:00
ybyang	289cbcf482	fix: support PP2+CP8+TP8 (PP with context parallelism) (#19548 )	2026-03-16 16:51:47 +00:00
Xiaoyu Zhang	6489f77733	[Diffusion] Fix compile graph broken by flashinfer rope (#20699 )	2026-03-16 23:14:27 +08:00
Du Bin	d3c0f4376a	Fix AssertionError crash in disagg prefill inflight queue with PP (#20686 )	2026-03-16 22:38:59 +08:00
Xiaoyu Zhang	15097c5c3b	Release sglang kernel 0.4.0 (#20440 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-03-16 20:34:58 +08:00
sky	3d58cd16d9	[DP Attention] Optimize dp_padding_mode selection for dp_size=1 in extend mode (#20406 ) Signed-off-by: wangfakang <fakangwang@gmail.com>	2026-03-16 18:44:42 +08:00
Xun Sun	549fbcc864	[5/N] (Elastic EP) Use GPU P2P to exchange expert weights during EPLB as much as possible (#12068 ) Co-authored-by: Hank Han <hanhan.hank@bytedance.com> Co-authored-by: Hank Han <hanhan7630@outlook.com>	2026-03-16 18:40:58 +08:00
Xiaoyu Zhang	3055b6906d	[Diffusion] Document torch.compile graph-break checks in diffusion benchmark skills (#20681 )	2026-03-16 17:41:40 +08:00
Mick	485597e651	[diffusion] fix: fix some sampling args passed via cli are omitted (#20630 )	2026-03-16 16:55:30 +08:00
Sugar920	895e56097c	Add NPU basic function testcases (#19382 ) Co-authored-by: cy <chenyang08056032@163.com> Co-authored-by: Cherry_ming <136634645@qq.com>	2026-03-16 15:09:56 +08:00
shuwenn	42f18fe560	[HiCache] fix: release write-through lock_ref during decode (#20049 )	2026-03-16 14:49:31 +08:00
Ke Bao	39336f5812	Precompute swa cache location (#20449 )	2026-03-16 14:38:08 +08:00
Zheng Wengang	135af6dc92	[EPD][VLM] support video/audio input (#17824 ) Co-authored-by: siyu <liusy58@linux.alibaba.com>	2026-03-16 14:18:21 +08:00
Shangming Cai	738cbde902	[PD] Make pending reqs resolving more robust (#20505 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-03-16 14:12:13 +08:00
pansicheng	97b2a89334	[RadixTree][8/N Refactor]: unify lock interface (#20330 )	2026-03-16 11:49:51 +08:00
Liangsheng Yin	f0458e0b49	[Utils] Move network/socket utilities from `common.py` to `network.py` (#20646 )	2026-03-15 20:35:24 -07:00
Javier Torres	afc71bae3a	feat: Add 'none' reasoning effort to ChatCompletionRequest (#20556 )	2026-03-15 20:25:48 -07:00
gaopengff	f4393bf3f6	Fix correctness test issue for bench_one_batch (#20650 )	2026-03-15 20:05:36 -07:00
Xiaoyu Zhang	e1eb25880f	[Diffusion] Add a benchmark for rmsnorm/fuse_add_rmsnorm (#20632 )	2026-03-16 09:50:33 +08:00
Zhirui	35c249b4de	[OpenAI] Log raw request payload for --log-requests (#20605 )	2026-03-15 17:45:00 -07:00
Liangsheng Yin	d852f26cb6	Fix dual-stack socket handling: `IPV6_V6ONLY`, IPv4-first, `is_port_available` all-family check (#20643 )	2026-03-15 17:17:23 -07:00
jellysnack	53f831691a	fix: propagate grammar errors and improve llguidance backend (#20467 )	2026-03-15 16:11:18 -07:00
psaab	1145805e7d	Fix socket utilities and reserve_port for IPv6 dual-stack support (#20491 ) Co-authored-by: hnyls2002 <lsyincs@gmail.com>	2026-03-15 14:29:10 -07:00
Ke Bao	e2be31824f	[CI] Add ut coverage tool (#20628 )	2026-03-15 21:13:45 +08:00
Yuhao Yang	1c456a0af5	VLM: add Conv2dLayer/Conv3dLayer to fix PyTorch 2.9.1 CuDNN Conv3d (#20282 ) Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2026-03-15 19:17:44 +08:00
Kit Fraser-Taliente	7c773ddb0a	[Fix] Slice input_embeds to extend_input_len in prepare_for_extend (#20376 )	2026-03-15 00:07:05 -07:00
Juan Muneton	7458407437	Fix InternVL and vision attention for non-CUDA backends (e.g. XPU) (#19997 ) Co-authored-by: Yang Wang <mr.yang.wang@outlook.com>	2026-03-14 23:24:41 -07:00
shuwenn	1ac6a26464	fix: Nemotron chunk size alias (#20458 )	2026-03-14 23:23:39 -07:00
Liangsheng Yin	fc7f9c1de7	Rename --stream-output to --incremental-streaming-output (#20614 )	2026-03-14 23:22:33 -07:00

7039 Commits