sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 12:17:09 +00:00

Author	SHA1	Message	Date
shuwenn	42f18fe560	[HiCache] fix: release write-through lock_ref during decode (#20049)	2026-03-16 14:49:31 +08:00
Ke Bao	39336f5812	Precompute swa cache location (#20449)	2026-03-16 14:38:08 +08:00
Zheng Wengang	135af6dc92	[EPD][VLM] support video/audio input (#17824) Co-authored-by: siyu <liusy58@linux.alibaba.com>	2026-03-16 14:18:21 +08:00
Shangming Cai	738cbde902	[PD] Make pending reqs resolving more robust (#20505) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-03-16 14:12:13 +08:00
pansicheng	97b2a89334	[RadixTree][8/N Refactor]: unify lock interface (#20330)	2026-03-16 11:49:51 +08:00
Liangsheng Yin	f0458e0b49	[Utils] Move network/socket utilities from `common.py` to `network.py` (#20646)	2026-03-15 20:35:24 -07:00
Javier Torres	afc71bae3a	feat: Add 'none' reasoning effort to ChatCompletionRequest (#20556)	2026-03-15 20:25:48 -07:00
gaopengff	f4393bf3f6	Fix correctness test issue for bench_one_batch (#20650)	2026-03-15 20:05:36 -07:00
Xiaoyu Zhang	e1eb25880f	[Diffusion] Add a benchmark for rmsnorm/fuse_add_rmsnorm (#20632)	2026-03-16 09:50:33 +08:00
Zhirui	35c249b4de	[OpenAI] Log raw request payload for --log-requests (#20605)	2026-03-15 17:45:00 -07:00
Liangsheng Yin	d852f26cb6	Fix dual-stack socket handling: `IPV6_V6ONLY`, IPv4-first, `is_port_available` all-family check (#20643)	2026-03-15 17:17:23 -07:00
jellysnack	53f831691a	fix: propagate grammar errors and improve llguidance backend (#20467)	2026-03-15 16:11:18 -07:00
psaab	1145805e7d	Fix socket utilities and reserve_port for IPv6 dual-stack support (#20491) Co-authored-by: hnyls2002 <lsyincs@gmail.com>	2026-03-15 14:29:10 -07:00
Ke Bao	e2be31824f	[CI] Add ut coverage tool (#20628)	2026-03-15 21:13:45 +08:00
Yuhao Yang	1c456a0af5	VLM: add Conv2dLayer/Conv3dLayer to fix PyTorch 2.9.1 CuDNN Conv3d (#20282) Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2026-03-15 19:17:44 +08:00
Kit Fraser-Taliente	7c773ddb0a	[Fix] Slice input_embeds to extend_input_len in prepare_for_extend (#20376)	2026-03-15 00:07:05 -07:00
Juan Muneton	7458407437	Fix InternVL and vision attention for non-CUDA backends (e.g. XPU) (#19997) Co-authored-by: Yang Wang <mr.yang.wang@outlook.com>	2026-03-14 23:24:41 -07:00
shuwenn	1ac6a26464	fix: Nemotron chunk size alias (#20458)	2026-03-14 23:23:39 -07:00
Liangsheng Yin	fc7f9c1de7	Rename --stream-output to --incremental-streaming-output (#20614)	2026-03-14 23:22:33 -07:00
shuwenn	538acb4c46	fix: Add .text property to HttpResponse to prevent AttributeError (#20518)	2026-03-14 22:59:32 -07:00
Yuhao Yang	a6ecf050be	diffusion: fix helios accuracy issue (#20036)	2026-03-15 13:55:51 +08:00
sglang-bot	93afe15b43	chore: bump flashinfer version to 0.6.6 (#20480) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2026-03-14 13:05:10 -07:00
Xiaowei Wang	574dbe23b2	Add piecewise cuda graph for Qwen3-Next FP8 flashinfer_trtllm moe backend (#18184)	2026-03-14 13:03:31 -07:00
Baizhou Zhang	39008955ff	Revert "[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars." (#20602)	2026-03-14 12:12:42 -07:00
Xiaoyu Zhang	5ab2cfe9a8	[Diffusion] Clean upstream fa3 in hopper (#20576)	2026-03-14 23:41:23 +08:00
Yuan Luo	22e67876d6	[Omni] Optimize AudioEncoder for Qwen3_Omni_Thinker (#18185) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-14 23:00:17 +08:00
Ratish P	574aa2d723	[diffusion]: remove stale offload-manager cleanup in denoising stage (#20587)	2026-03-14 22:56:57 +08:00
Xiaoyu Zhang	25e38216b6	[kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277)	2026-03-14 16:45:54 +08:00
Mohammad Miadh Angkad	75a7879fd4	[Model] Support Nemotron 3 Super NVFP4 (#20407)	2026-03-14 00:56:26 -07:00
SoluMilken	c95dc88f86	[CI] migrate ascend-gptq from `test/srt` to `test/registered` (#19628)	2026-03-14 00:28:57 -07:00
Xiaoyu Zhang	f9e4221b71	[Diffusion] add mova and hunyuanvideo to perf skills (#20563)	2026-03-14 13:49:50 +08:00
Shangming Cai	99a3b25c9b	[PP] Fix recv tensor dict potential race condition (#20341) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2026-03-14 13:35:01 +08:00
Xinyuan Tong	c330b687a1	[Bugfix] Fix GLM-4.6V vision regression in glm4v_moe and glm_ocr (#20463)	2026-03-13 21:48:28 -07:00
ziruiliu	dfd0a77a9a	[bugfix] Add prev_prefix_len parameter to HiMambaRadixCache's _insert_helper() (#20539)	2026-03-14 09:54:14 +08:00
Duyi-Wang	0eea80bc00	[AMD][MORI] Fix MTP crash with FP4/FP8 dispatch and add NEXTN dispatch env vars. (#20453)	2026-03-13 14:03:17 -07:00
YC Tseng	c37ef7f18b	[AMD] diffusion refactor: move ROCM VAE optimization to Platform abstraction (#20496)	2026-03-13 13:10:05 -07:00
Simo Lin	654fc02cf1	[gRPC] Extract gRPC servicer into standalone package (#20478) Signed-off-by: Simo Lin <linsimo.mark@gmail.com>	2026-03-13 09:13:29 -07:00
Xiaoyu Zhang	be7a0311a0	[Diffusion] Fix and validate diffusion skills benchmarking/profiling workflow (#20528)	2026-03-13 21:11:37 +08:00
Leon Gao	b1246c50f8	Fix chunked prefill and KV cache leaks for streaming sessions (#20476) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: hnyls2002 <lsyincs@gmail.com>	2026-03-13 02:36:55 -07:00
Ke Bao	287dc12b05	Fix hicache log metrics (#20504)	2026-03-13 16:29:58 +08:00
Baizhou Zhang	f8668d9e78	[Fix] Add fallback for flashinfer allreduce fusion (#20384)	2026-03-13 01:24:55 -07:00
Mick	b638b25b22	[diffusion] UX: suppress excessive logging from httpx and httpcore (#20452)	2026-03-13 14:43:09 +08:00
seungrokj	9c8777c80f	[AMD][Qwen3.5] aiter a8w8 gemm configuration (#19826) Signed-off-by: seungrokj <seungrok.jung@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com>	2026-03-12 23:23:58 -07:00
Antonin Vidon	63ecdcbb18	Expose async LoRA interface to Offline Engine (#18636)	2026-03-12 23:09:47 -07:00
StonyPort	d4e68ead1d	[quant] Ignore FP8 quantization layers (#20340) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-13 13:59:39 +08:00
Xiaoyu Zhang	e00328d1e5	[Diffusion] Opt qwen-image-edit with fuse_residual_layernorm_scale_shift_gate_select01_kernel (#20395) Co-authored-by: Yihan Chen <yingluosanqian@gmail.com>	2026-03-13 13:15:22 +08:00
hzh0425	197f807134	[RadixTree][7/N Refactor]: Refactor mamba radix tree, release dup kvcache in insert func (#19429)	2026-03-13 12:28:32 +08:00
Liangsheng Yin	f605612b87	[HTTP] Fix `/GET` HTTP route when ollama endpoint is not set. (#20494)	2026-03-12 20:54:32 -07:00
Xiaoyu Zhang	7ecf07b8f4	[jit_kernel] Temporarily Skip Flaky JIT Kernel GDN Test and Add PR Label (#20436)	2026-03-13 09:34:22 +08:00
Pai Liu	65dd08153d	Fix Test* mixin classes being collected as standalone pytest tests (#20417)	2026-03-12 18:18:45 -07:00

7008 Commits