sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-01 20:27:57 +00:00

Author	SHA1	Message	Date
MARATRIX	069d4c577b	Fix Kimi K2.5 PP layer range exposure for PD disaggregation (#19959) Signed-off-by: yafeng.li <yafeng.li@mthreads.com>	2026-03-06 16:14:02 -08:00
Liangsheng Yin	ddcecdea49	[Core] Unify `max_num_reqs` `dp_size` division for pool sizing (#20063) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-06 16:12:59 -08:00
Kangyan-Zhou	7a12255b6e	fix: set first_token_time before computing decode_throughput for single-batch completions (#19984) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 16:11:41 -08:00
Aurick Qiao	5c8e28698c	Add cleanup for _ATTN_TP in parallel_state.py (#19978)	2026-03-06 15:43:31 -08:00
Shu Wang	61de303f0a	Fix fallback to default tactic (flashinfer autotuner) with trtllm_fp4_block_scale_moe (#19189)	2026-03-06 15:15:04 -08:00
Kangyan-Zhou	e89069ee64	Fallback to torch.cuda.mem_get_info() when nvidia-smi is unavailable (#18957) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 15:00:08 -08:00
Liangsheng Yin	604db4471d	[Core] Clarify memory variable naming in model runner (#20060)	2026-03-06 14:00:46 -08:00
Liangsheng Yin	7a6cf0e9ba	[Core] Extract `_calculate_mamba_ratio` and `_init_pools` from `init_memory_pool` (#20058)	2026-03-06 13:37:22 -08:00
Mohammad Miadh Angkad	759700c808	Fix SM120 `triton_kernels` MXFP4 `block_k` for GPT-OSS (#20040)	2026-03-06 10:53:08 -08:00
R0CKSTAR	de1a0afcbc	[MUSA][10/N] Add GGUF support (#18357) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2026-03-06 10:50:35 -08:00
JohnHerry	e8f2b80340	[diffusion] improve: improve code readability of DenoisingStage (#20003)	2026-03-06 23:23:44 +08:00
xingsy97	54634b9a40	[Kernel] Dispatch exp/sin/cos through dtype_trait (#19798)	2026-03-06 22:57:52 +08:00
Johnsonms	2d266c73ea	Migrate renorm kernels from sgl-kernel to FlashInfer JIT (#18854)	2026-03-06 22:53:28 +08:00
Xiaoyu Zhang	6d22c9f369	[Diffusion] Move hf kernels diffusion cuda kernels skills to SGLD (#20001) Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-06 22:16:06 +08:00
Yuan Luo	f7de9375ac	[GDN][Qwen3-Next][Qwen3.5] Fuse fused_gdn_gating and fused_recurrent_gated_delta_rule_update in verify_target (#19775)	2026-03-06 21:42:44 +08:00
Prozac614	e3b581ce6b	[diffusion] fix: remove num_frames in wan2_1_t2v_1_3b_lora_1gpu test (#20009) Co-authored-by: daiweitao <dwti614707404@163.com>	2026-03-06 21:36:43 +08:00
Kangyan-Zhou	25e678d933	[diffusion] endpoint: add /server_info and /model_info endpoints for gateway discovery (#20020) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 21:36:13 +08:00
inkcherry	84aaa69795	[AMD] Use bfloat16 for correction_bias in AITER FP8 path to avoid runtime dtype conversion for dsv3 (#19843)	2026-03-06 00:57:12 -08:00
Clint	27053aa5ed	Fix MLA decode path returning unwritten (padded) rows (#19902)	2026-03-06 00:54:29 -08:00
xdtbynd	0252ca8255	[Bugfix] Fix the bug blocking the startup of Llama-3.2-11b-Vision-Instruct (#19638) Co-authored-by: sglang-npu-bot <sglangnpu@163.com>	2026-03-06 16:21:50 +08:00
Zheng Wengang	da27d9bff6	[Bug-Fix][EPD]: skip log waiting-image-req for zmq_to_tokenzer/mooncake (#19555)	2026-03-06 14:39:22 +08:00
Mook	be9a9e4819	refactor(multimodal/test): centralize model names and shared utilities in test_utils (#19354) Co-authored-by: Ratish P <114130421+Ratish1@users.noreply.github.com>	2026-03-05 20:09:42 -08:00
Baizhou Zhang	51e5dc845a	Revert "[Kernel Slimming] Migrate NVFP4 kernels to JIT" (#20005)	2026-03-05 19:40:00 -08:00
sushil Dubey	6e5a2de354	[diffusion] fix: fix reading multiple prompts from prompt file (#19075) Signed-off-by: Sushil Dubey <sushil.dubey@intel.com>	2026-03-06 11:23:31 +08:00
Simo Lin	9502369488	fix(grpc): add server-side keepalive options to prevent GOAWAY (#19986) Signed-off-by: Simo Lin <linsimo.mark@gmail.com>	2026-03-05 18:56:35 -08:00
liupeng374	5471e4a492	[NPU][Feature] eliminate dsv3 redundant rotary embed calculation (#19842)	2026-03-06 09:02:14 +08:00
chenxu214	b912d7ae19	[OPT]Skip the first delayer to maximize the BS of the decoding. (#19836)	2026-03-06 08:53:19 +08:00
shadowxz109	261be85ecc	Support mrope_position_delta cache	2026-03-06 08:50:53 +08:00
Xinyuan Tong	9ebffef1ef	[FIX] NSA backend page_table overflow in speculative decoding target_verify (#19016)	2026-03-05 16:04:58 -08:00
Ajay Anubolu	13af7cbb02	fix: use consistent time denominator for throughput metrics in bench_one_batch_server (#19223)	2026-03-05 15:58:17 -08:00
Chang Su	dd2bbe6d62	fix(grpc): use context.abort() with proper status codes instead of in-band errors (#19972) Signed-off-by: Chang Su <chang.s.su@oracle.com>	2026-03-05 14:53:18 -08:00
Qiaolin Yu	46dced64ea	Adjust padding size to improve triton_kernels moe performance (#19174)	2026-03-05 14:50:40 -08:00
kpham-sgl	346a4131cf	[Spec] Refactor NaN/OOB checks to async `maybe_detect_*` with env-var control (#19899) Co-authored-by: hnyls2002 <lsyincs@gmail.com>	2026-03-05 13:51:05 -08:00
Xinyu Zhang	b3cfad0a80	Add Ray actor support for scheduler process management (DP=1) (#17684) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-03-05 13:21:23 -08:00
sglang-bot	ebb66cc1de	[misc] Priority scheduling metrics cleanup (#19927) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 12:42:42 -08:00
danielafrimi	ff6048fb9c	rename nemotron reasoning parser (#19865) Signed-off-by: dafrimi <dafrimi@nvidia.com>	2026-03-05 11:27:07 -08:00
Mohammad Miadh Angkad	41fd53fe37	Fix `profile_activities` parameter name in `bench_one_batch_server_internal.py` (#19954)	2026-03-05 10:34:06 -08:00
akhilg-nv	73d272bddb	Revised fix for HybridAttnBackend forward for linear attn (#19369)	2026-03-06 00:05:35 +08:00
Zheng Wengang	0de0d74195	[EPD][Feat]support adaptive forward (#18118)	2026-03-05 21:12:30 +08:00
StonyPort	806d41ab65	[quant] fix fp32 downcasting (#19844) Co-authored-by: qiuxuan.lzw <qiuxuan.lzw@alibaba-inc.com>	2026-03-05 17:54:59 +08:00
Rain Jiang	472eef4071	fa4 cleanup (#19727)	2026-03-05 17:54:25 +08:00
Chi McIsaac	c36de62bfc	[diffusion] fix images/edit with 2 images (#17520) Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-03-05 16:56:39 +08:00
xingsy97	dbc896f204	[Test] Enhance JIT kvcache store kernel test coverage (#19630)	2026-03-05 16:17:15 +08:00
Tiwei Bie	727face6c2	[DLLM] Add initial radix cache support (#18724)	2026-03-04 23:24:09 -08:00
Kalyan Kumar	c1df359b44	Add XPU profiler activity support in benchmark code (#12981) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-04 23:22:56 -08:00
Mohammad Miadh Angkad	2bdd89a6cd	[Kernel Slimming] Migrate NVFP4 kernels to JIT (#19437)	2026-03-05 15:22:28 +08:00
Yilong Zhao	1bbfed0539	[misc] add env for http keep alive timeout (#19847)	2026-03-04 22:00:51 -08:00
Chenxi Li	86c5617787	[BUG]: fix prevent illegal memory access in Mamba SSM tracking during EAGLE speculative verification (#19415) Co-authored-by: ConnorLi96 <ConnorLi96@users.noreply.github.com>	2026-03-04 21:13:21 -08:00
Baizhou Zhang	10c65df48a	[Bug] Fix lora tp bug on H200 (#19769)	2026-03-04 20:11:02 -08:00
Xinyi Song	0e6a64712a	[bugfix] Fix PPMissingLayer AttributeError when Using PP (#19804)	2026-03-04 19:48:15 -08:00

6820 Commits