sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 13:57:04 +00:00

Author	SHA1	Message	Date
fzyzcjy	cc63c99f11	Enhance hook mechanism in dumper (#19073 )	2026-02-22 16:13:38 +08:00
fzyzcjy	fdf80b5031	Extract framework plugins in dumper (#19072 )	2026-02-22 16:10:43 +08:00
fzyzcjy	e32b5364a2	Auto annotate context in dumper (#19071 )	2026-02-22 16:08:48 +08:00
fzyzcjy	8bc0751376	Support enabling partial non intrusive dump in dumper (#19069 )	2026-02-22 16:07:45 +08:00
fzyzcjy	0384c459a7	Support non-intrusive dumping in dumper (#19068 )	2026-02-22 16:04:02 +08:00
fzyzcjy	5eccc3cff9	Refactor dumper and change on_forward_pass_start API (#19065 )	2026-02-22 16:03:27 +08:00
Shangming Cai	eccee4c48e	[PD] Change bootstrap_room metadata dtype from int64 to uint64 (#19141 )	2026-02-22 14:20:16 +08:00
YAMY	5995bfec63	[Qwen3-Next] Enable fused_qkvzba_split_reshape_cat also for prefill (#18917 )	2026-02-22 13:57:17 +08:00
Qiaolin Yu	8cf003c44b	Fix spec v2+dp attention in nsa backend (#19134 )	2026-02-22 13:46:15 +08:00
Bhavneek Singh	8612358ee9	[BUG] [DLLM] Missing max_running_requests value (#18740 )	2026-02-22 13:38:56 +08:00
fy1214	ae62898ffb	[diffusion] quant: adapt FP8 linear to sgld and support quant in flux (#17023 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-02-22 12:55:28 +08:00
YAMY	cef353f338	[Fix] Quick fix for int32 overflow in Mooncakes' send_kvcache_slice (#19076 )	2026-02-22 12:00:33 +08:00
Liangsheng Yin	4653939cda	Revert "[jit kernel] Support per_token_group_quant_8bit jit kernel" (#19131 )	2026-02-22 07:54:24 +08:00
Liangsheng Yin	1f2da824dd	[Benchmark] Remove re-exports from bench_serving.py (#19130 )	2026-02-21 14:30:30 -08:00
Ratish P	f158869c2c	[Refactor] Benchmark Phase 1: extract utils and datasets from bench_serving (#19077 ) Co-authored-by: Xuchun Shang <107600043+xucsh@users.noreply.github.com>	2026-02-21 13:50:11 -08:00
Mohammad Miadh Angkad	7b0fb43c7a	[FlashInfer] Switch FlashInfer allreduce fusion to unified API (#18341 )	2026-02-22 00:07:16 +08:00
Bi Xue	bf36aa4c31	[sgl] view could hold the memory too long and introduced large memory (#19109 )	2026-02-21 23:40:56 +08:00
Xinyuan Tong	677b66af80	fix KimiK2Detector regex patterns with re.DOTALL (#19120 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-02-21 22:13:08 +08:00
Xinyuan Tong	4a362a0e04	fix tool handling in OpenAIServingChat (#18996 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-02-21 22:07:09 +08:00
Xiaoyu Zhang	66497ab0aa	[Diffusion] Restruct and clean Diffusion rotary embedding (#19064 )	2026-02-21 21:41:47 +08:00
DarkSharpness	d8d0208c63	[Feature] rewrite rope kernel; remove flashinfer dependencies (#18844 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-21 21:32:40 +08:00
Mick	6503f94211	[diffusion] feat: support passing component path via server args (#19108 )	2026-02-21 21:22:47 +08:00
Xinyuan Tong	cc451671b5	[FEAT] Add Anthropic compatible API endpoint (#18630 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-02-21 19:37:38 +08:00
Mick	b89ca65789	[diffusion] refactor: reduce redundancy and improve stage api (#19060 )	2026-02-21 16:35:47 +08:00
danielafrimi	33c33a7de9	[Quantization] Support config.json quantization_config format, fix exclude_modules matching, and fix KV cache scale loading for Nemotron (#18546 ) Signed-off-by: root <dafrimi@nvidia.com>	2026-02-21 16:14:29 +08:00
Nicolas Castet	51b3ed02ca	Fix bug in symm mem pre-allocation default (#19082 )	2026-02-21 11:51:28 +08:00
Vladislav Nosivskoy	afd91e8782	[DSv32] Fix MTP and CP compatability (#19062 ) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>	2026-02-21 11:10:34 +08:00
Lianmin Zheng	463baafe10	[Auto Sync] Update batch_invariant_ops.py (20260221) (#19098 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>	2026-02-20 18:27:31 -08:00
Lianmin Zheng	2928dfb8fa	[Auto Sync] Update bench_one_batch_server_internal.py (20260221) (#19097 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2026-02-20 18:19:14 -08:00
Cheng Wan	84c67c8be0	Refactor graph input buffers (#18991 )	2026-02-20 18:09:31 -08:00
Minglei Zhu	4bffd3a232	[GPT-OSS] support fp8 online quantization for gpt-oss bf16 (#18988 ) merge it as all required CI passed	2026-02-20 14:16:57 -08:00
Qiaolin Yu	96bae2355e	Add generated-shared-prefix dataset in bench_one_batch (#18986 )	2026-02-20 13:33:10 -08:00
0xNullPath	ab18734375	[feat] feat: support swa in trtllm_mha (#18970 )	2026-02-21 01:39:29 +08:00
billishyahao	fbb6098487	[AMD] support two batch overlapping for mori ep (#17953 ) Co-authored-by: kkHuang-amd <wunhuang@amd.com> Co-authored-by: Feiyue Zhai <feiyue.zhai@amd.com> Co-authored-by: Duyi-Wang <duyi.wang@amd.com> Co-authored-by: HAI <hixiao@gmail.com>	2026-02-20 08:45:55 -08:00
Cheng Wan	38ee749dd9	Fix adjust_num_token_non_padded_for_attn_tp returning CPU tensor (#19051 )	2026-02-20 23:23:38 +08:00
Nicolas Castet	3358ba8945	[Fix] Run FlashInfer autotune on non-default stream for NCCL 2.29+ compatibility (#18987 )	2026-02-20 23:21:38 +08:00
DarkSharpness	52852404c8	[Fix] DO NOT skip save_kv_cache for dllm (#19020 )	2026-02-20 23:20:29 +08:00
Mohammad Miadh Angkad	f23a23cc05	Fix NSA FP8 KV cache path for both-trtllm MHA one-shot (#18931 ) Co-authored-by: rainj-me <96632942+rainj-me@users.noreply.github.com>	2026-02-20 22:00:09 +08:00
Mick	8d789b5c3d	[diffusion] feat: support nunchaku for Z-Image-Turbo and flux.1 (int4) (#18959 )	2026-02-20 21:16:08 +08:00
Yuan Luo	7d953440ec	[jit kernel] Support per_token_group_quant_8bit jit kernel (#18905 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-20 21:01:05 +08:00
Mick	38a69652e6	[diffusion] logging: log available mem when each stage starts in debug level (#18998 )	2026-02-20 19:57:06 +08:00
fzyzcjy	0d20cf5a66	Fix lint on main (#19054 )	2026-02-20 15:45:24 +08:00
Cheng Wan	b59a22f781	fix lint on main (#19052 )	2026-02-20 15:30:57 +08:00
Duyi-Wang	b0786cdf94	[AMD] Replace msgpack with msgspec in MORI-IO (#19007 )	2026-02-19 23:04:15 -08:00
YAMY	8541b1118d	[Fix][Qwen3.5] Pass max_mamba_cache_size to mamba pool in disaggregation decode path (#19002 )	2026-02-20 14:31:26 +08:00
chengshuang18	295bc17576	Feature/sdar support (#19044 ) Co-authored-by: root <root@gpu-lg-cmc-h-h200-3047.host.h.pjlab.org.cn> Co-authored-by: chengshuang <chengshuang@pjlab.org.cn> Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>	2026-02-19 21:58:15 -08:00
fzyzcjy	046ef0aa35	Support using SGLang port in dumper (#19038 )	2026-02-20 12:30:24 +08:00
fzyzcjy	2fecc2c075	Support resetting and enhance HTTP endpoints for dumper (#19046 ) Co-authored-by: Yueming Yuan <112649537+yueming-yuan@users.noreply.github.com>	2026-02-20 12:29:09 +08:00
fzyzcjy	503bf3047a	Enhance configure and env parsing in dumper (#19034 )	2026-02-20 12:28:10 +08:00
fzyzcjy	df995aab56	Support filtering labels in dumper (#19018 )	2026-02-20 12:27:12 +08:00

6437 Commits