sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 20:27:57 +00:00

Author	SHA1	Message	Date
Prozac614	e3b581ce6b	[diffusion] fix: remove num_frames in wan2_1_t2v_1_3b_lora_1gpu test (#20009 ) Co-authored-by: daiweitao <dwti614707404@163.com>	2026-03-06 21:36:43 +08:00
Kangyan-Zhou	25e678d933	[diffusion] endpoint: add /server_info and /model_info endpoints for gateway discovery (#20020 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 21:36:13 +08:00
inkcherry	84aaa69795	[AMD] Use bfloat16 for correction_bias in AITER FP8 path to avoid runtime dtype conversion for dsv3 (#19843 )	2026-03-06 00:57:12 -08:00
Clint	27053aa5ed	Fix MLA decode path returning unwritten (padded) rows (#19902 )	2026-03-06 00:54:29 -08:00
xdtbynd	0252ca8255	[Bugfix] Fix the bug blocking the startup of Llama-3.2-11b-Vision-Instruct (#19638 ) Co-authored-by: sglang-npu-bot <sglangnpu@163.com>	2026-03-06 16:21:50 +08:00
Zheng Wengang	da27d9bff6	[Bug-Fix][EPD]: skip log waiting-image-req for zmq_to_tokenzer/mooncake (#19555 )	2026-03-06 14:39:22 +08:00
Mook	be9a9e4819	refactor(multimodal/test): centralize model names and shared utilities in test_utils (#19354 ) Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>	2026-03-05 20:09:42 -08:00
Baizhou Zhang	51e5dc845a	Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005 )	2026-03-05 19:40:00 -08:00
sushil Dubey	6e5a2de354	[diffusion] fix: fix reading multiple prompts from prompt file (#19075 ) Signed-off-by: Sushil Dubey <sushil.dubey@intel.com>	2026-03-06 11:23:31 +08:00
Simo Lin	9502369488	fix(grpc): add server-side keepalive options to prevent GOAWAY (#19986 ) Signed-off-by: Simo Lin <linsimo.mark@gmail.com>	2026-03-05 18:56:35 -08:00
liupeng374	5471e4a492	[NPU][Feature] eliminate dsv3 redundant rotary embed calculation (#19842 )	2026-03-06 09:02:14 +08:00
chenxu214	b912d7ae19	[OPT]Skip the first delayer to maximize the BS of the decoding. (#19836 )	2026-03-06 08:53:19 +08:00
shadowxz109	261be85ecc	Support mrope_position_delta cache	2026-03-06 08:50:53 +08:00
Xinyuan Tong	9ebffef1ef	[FIX] NSA backend page_table overflow in speculative decoding target_verify (#19016 )	2026-03-05 16:04:58 -08:00
Ajay Anubolu	13af7cbb02	fix: use consistent time denominator for throughput metrics in bench_one_batch_server (#19223 )	2026-03-05 15:58:17 -08:00
Chang Su	dd2bbe6d62	fix(grpc): use context.abort() with proper status codes instead of in-band errors (#19972 ) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-03-05 14:53:18 -08:00
Qiaolin Yu	46dced64ea	Adjust padding size to improve triton_kernels moe performance (#19174 )	2026-03-05 14:50:40 -08:00
kpham-sgl	346a4131cf	[Spec] Refactor NaN/OOB checks to async `maybe_detect_*` with env-var control (#19899 ) Co-authored-by: hnyls2002 <lsyincs@gmail.com>	2026-03-05 13:51:05 -08:00
Xinyu Zhang	b3cfad0a80	Add Ray actor support for scheduler process management (DP=1) (#17684 ) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-03-05 13:21:23 -08:00
sglang-bot	ebb66cc1de	[misc] Priority scheduling metrics cleanup (#19927 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 12:42:42 -08:00
danielafrimi	ff6048fb9c	rename nemotron reasoning parser (#19865 ) Signed-off-by: dafrimi <dafrimi@nvidia.com>	2026-03-05 11:27:07 -08:00
Mohammad Miadh Angkad	41fd53fe37	Fix `profile_activities` parameter name in `bench_one_batch_server_internal.py` (#19954 )	2026-03-05 10:34:06 -08:00
akhilg-nv	73d272bddb	Revised fix for HybridAttnBackend forward for linear attn (#19369 )	2026-03-06 00:05:35 +08:00
Zheng Wengang	0de0d74195	[EPD][Feat]support adaptive forward (#18118 )	2026-03-05 21:12:30 +08:00
StonyPort	806d41ab65	[quant] fix fp32 downcasting (#19844 ) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>	2026-03-05 17:54:59 +08:00
Rain Jiang	472eef4071	fa4 cleanup (#19727 )	2026-03-05 17:54:25 +08:00
Chi McIsaac	c36de62bfc	[diffusion] fix images/edit with 2 images (#17520 ) Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-05 16:56:39 +08:00
xingsy97	dbc896f204	[Test] Enhance JIT kvcache store kernel test coverage (#19630 )	2026-03-05 16:17:15 +08:00
Tiwei Bie	727face6c2	[DLLM] Add initial radix cache support (#18724 )	2026-03-04 23:24:09 -08:00
Kalyan Kumar	c1df359b44	Add XPU profiler activity support in benchmark code (#12981 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-04 23:22:56 -08:00
Mohammad Miadh Angkad	2bdd89a6cd	[Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437 )	2026-03-05 15:22:28 +08:00
Yilong Zhao	1bbfed0539	[misc] add env for http keep alive timeout (#19847 )	2026-03-04 22:00:51 -08:00
Chenxi Li	86c5617787	[BUG]: fix prevent illegal memory access in Mamba SSM tracking during EAGLE speculative verification (#19415 ) Co-authored-by: ConnorLi96 <ConnorLi96@users.noreply.github.com>	2026-03-04 21:13:21 -08:00
Baizhou Zhang	10c65df48a	[Bug] Fix lora tp bug on H200 (#19769 )	2026-03-04 20:11:02 -08:00
Xinyi Song	0e6a64712a	[bugfix] Fix PPMissingLayer AttributeError when Using PP (#19804 )	2026-03-04 19:48:15 -08:00
Kangyan-Zhou	198381d9ce	Add SSL/TLS support for HTTP and gRPC servers (#18973 ) Co-authored-by: guys@spotify.com	2026-03-04 19:27:16 -08:00
Junhao Liu	9c11a7ae40	[diffusion] fix: fix the frame interpolation testcase in CI regarding number of frames (#19659 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-05 11:21:53 +08:00
R0CKSTAR	fc53307ce9	[diffusion] hardware: SiluAndMul/RMSNorm/LayerNorm MUSA implementations (custom ops, 12/N) (#18583 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Qingfu Wen <qingfu.wen@mthreads.com>	2026-03-05 11:10:57 +08:00
Xiaoyu Zhang	9795b4cd5b	[Diffusion] Open t5 encoder parallel folding for wan2.2 and mova video (#18493 ) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-05 10:18:00 +08:00
Ethan (Yusheng) Su	e555a6c171	[feat] Enhance lora_update_weight_from_tensor for RL training (#19314 )	2026-03-04 18:10:42 -08:00
Shu Wang	43bdee703e	Fix Fp8 MTP layer a2a backend without EP. (#18515 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2026-03-04 16:28:10 -08:00
Liangsheng Yin	33c92732f4	[Triton] Use dynamic loop bound in `alloc_extend_kernel` (#19898 )	2026-03-04 16:15:58 -08:00
rakesh	a710b7d791	[Sarvam] Add inference support for Sarvam MoE LLMs (#18938 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-04 15:28:00 -08:00
kpham-sgl	376dfb03f7	Fix issue 19717 by making `qo_indptr` uniform strided instead of packed (#19807 )	2026-03-04 15:27:10 -08:00
zhuxinjie-nz	28c931e1a5	feat: Priority-based scheduling optimization (including default priority, preemption toggle, priority-based metrics, etc.) (#17026 ) Co-authored-by: hnyls2002 <lsyincs@gmail.com>	2026-03-04 14:52:08 -08:00
hlu1	9457c049e1	[Qwen3.5] Enable MTP spec_v2 and add test for nvidia/Qwen3.5-397B-A17B-NVFP4 (#19391 )	2026-03-04 14:01:25 -08:00
Chang Su	0ee9d3c8e9	fix(grpc): send last chunk before completion during streaming (#19895 )	2026-03-04 13:21:21 -08:00
Bingxu Chen	329817e262	[AMD] Move get_global_server_args import out of CUDA-only block to fix NameError on AMD (#19866 )	2026-03-04 10:23:42 -08:00
Ken J	44208d2adf	[vlm][minicpm] support input formats of processor output and embedding (#19614 )	2026-03-04 12:11:12 -05:00
Kangyan-Zhou	c03deb8175	Fix disagg PD bootstrap and KV transfer metrics (#19009 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 09:08:10 -08:00

7855 Commits