sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-02 04:37:14 +00:00

Author	SHA1	Message	Date
Junrong Lin	6448de7e96	Reorganize topk logic to clean up code and expose logical experts (#16945 )	2026-02-23 00:30:57 -08:00
Yuzhen Zhou	4f3dc1ef5b	[ROCm] Use unreg path for custom all-reduce during CUDA graph capture (#19162 )	2026-02-22 23:27:31 -08:00
Changyi Yang	c3c1532820	[diffusion] feat: detect Flux2 custom VAE path from component_paths (#19170 )	2026-02-23 15:19:53 +08:00
Qiaolin Yu	42b1019881	Fix bench_one_batch_server by moving the print statements (#19175 )	2026-02-22 22:06:25 -08:00
Baizhou Zhang	2472e47d73	Revert "Refactor graph input buffers (#18991 )" (#19173 )	2026-02-23 13:09:54 +08:00
Baizhou Zhang	fa80b9beba	[CI] Skip some subtests for tool call parser (#19172 )	2026-02-23 12:20:12 +08:00
Ratish P	0e670c36d6	fix(diffusion): enforce strict input_reference validation for T2V (#14825 ) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2026-02-23 12:04:06 +08:00
Baizhou Zhang	43f83525c0	Revert "[AMD] support two batch overlapping for mori ep #17953 " (#19161 )	2026-02-23 01:19:23 +08:00
Mick	45095bac70	[diffusion] refactor: rename quantized model path server arg (#19142 )	2026-02-22 23:18:35 +08:00
Talor Abramovich	c6a99e43b9	Fix corrupted JSONL metrics file due to concurrent writes (#19011 )	2026-02-22 20:33:27 +08:00
Mick	7d4860bc5e	[diffusion] CI: relax perf check threshold (#19154 )	2026-02-22 20:02:12 +08:00
Mick	87823722b3	[diffusion] chore: minor cleanups (#19123 )	2026-02-22 19:07:25 +08:00
fzyzcjy	1a1c768d44	Support kwargs and megatron core tensor parsing in dumper (#19138 )	2026-02-22 16:24:33 +08:00
Ziang Li	eddf193292	[DSv32] [GLM5] Improve Model Quality by Avoiding FP32 Precision Loss in `weights_proj` (#19041 )	2026-02-22 16:20:51 +08:00
fzyzcjy	326b788ab4	Fix wrongly large dumped file and handle non intrusive hook reset in dumper (#19124 )	2026-02-22 16:20:08 +08:00
fzyzcjy	c1f497e20e	Enhance reset, states, http in dumper (#19095 )	2026-02-22 16:17:41 +08:00
fzyzcjy	4091b720c5	Support multi colocated dumper, named exp cleanup, argparse config (#19094 )	2026-02-22 16:16:15 +08:00
fzyzcjy	31f0c11405	Configure and call dumper in main SGLang logic (#19093 )	2026-02-22 16:14:27 +08:00
fzyzcjy	cc63c99f11	Enhance hook mechanism in dumper (#19073 )	2026-02-22 16:13:38 +08:00
fzyzcjy	fdf80b5031	Extract framework plugins in dumper (#19072 )	2026-02-22 16:10:43 +08:00
fzyzcjy	e32b5364a2	Auto annotate context in dumper (#19071 )	2026-02-22 16:08:48 +08:00
fzyzcjy	8bc0751376	Support enabling partial non intrusive dump in dumper (#19069 )	2026-02-22 16:07:45 +08:00
fzyzcjy	0384c459a7	Support non-intrusive dumping in dumper (#19068 )	2026-02-22 16:04:02 +08:00
fzyzcjy	5eccc3cff9	Refactor dumper and change on_forward_pass_start API (#19065 )	2026-02-22 16:03:27 +08:00
Shangming Cai	eccee4c48e	[PD] Change bootstrap_room metadata dtype from int64 to uint64 (#19141 )	2026-02-22 14:20:16 +08:00
YAMY	5995bfec63	[Qwen3-Next] Enable fused_qkvzba_split_reshape_cat also for prefill (#18917 )	2026-02-22 13:57:17 +08:00
Qiaolin Yu	8cf003c44b	Fix spec v2+dp attention in nsa backend (#19134 )	2026-02-22 13:46:15 +08:00
Bhavneek Singh	8612358ee9	[BUG] [DLLM] Missing max_running_requests value (#18740 )	2026-02-22 13:38:56 +08:00
fy1214	ae62898ffb	[diffusion] quant: adapt FP8 linear to sgld and support quant in flux (#17023 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Mick <mickjagger19@icloud.com>	2026-02-22 12:55:28 +08:00
YAMY	cef353f338	[Fix] Quick fix for int32 overflow in Mooncakes' send_kvcache_slice (#19076 )	2026-02-22 12:00:33 +08:00
Liangsheng Yin	4653939cda	Revert "[jit kernel] Support per_token_group_quant_8bit jit kernel" (#19131 )	2026-02-22 07:54:24 +08:00
Liangsheng Yin	1f2da824dd	[Benchmark] Remove re-exports from bench_serving.py (#19130 )	2026-02-21 14:30:30 -08:00
Ratish P	f158869c2c	[Refactor] Benchmark Phase 1: extract utils and datasets from bench_serving (#19077 ) Co-authored-by: Xuchun Shang <107600043+xucsh@users.noreply.github.com>	2026-02-21 13:50:11 -08:00
Mohammad Miadh Angkad	7b0fb43c7a	[FlashInfer] Switch FlashInfer allreduce fusion to unified API (#18341 )	2026-02-22 00:07:16 +08:00
Bi Xue	bf36aa4c31	[sgl] view could hold the memory too long and introduced large memory (#19109 )	2026-02-21 23:40:56 +08:00
Xinyuan Tong	677b66af80	fix KimiK2Detector regex patterns with re.DOTALL (#19120 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-02-21 22:13:08 +08:00
Xinyuan Tong	4a362a0e04	fix tool handling in OpenAIServingChat (#18996 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-02-21 22:07:09 +08:00
Xiaoyu Zhang	66497ab0aa	[Diffusion] Restruct and clean Diffusion rotary embedding (#19064 )	2026-02-21 21:41:47 +08:00
DarkSharpness	d8d0208c63	[Feature] rewrite rope kernel; remove flashinfer dependencies (#18844 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-21 21:32:40 +08:00
Mick	6503f94211	[diffusion] feat: support passing component path via server args (#19108 )	2026-02-21 21:22:47 +08:00
Xinyuan Tong	cc451671b5	[FEAT] Add Anthropic compatible API endpoint (#18630 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2026-02-21 19:37:38 +08:00
Mick	b89ca65789	[diffusion] refactor: reduce redundancy and improve stage api (#19060 )	2026-02-21 16:35:47 +08:00
danielafrimi	33c33a7de9	[Quantization] Support config.json quantization_config format, fix exclude_modules matching, and fix KV cache scale loading for Nemotron (#18546 ) Signed-off-by: root <dafrimi@nvidia.com>	2026-02-21 16:14:29 +08:00
Nicolas Castet	51b3ed02ca	Fix bug in symm mem pre-allocation default (#19082 )	2026-02-21 11:51:28 +08:00
Vladislav Nosivskoy	afd91e8782	[DSv32] Fix MTP and CP compatability (#19062 ) Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>	2026-02-21 11:10:34 +08:00
Lianmin Zheng	463baafe10	[Auto Sync] Update batch_invariant_ops.py (20260221) (#19098 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Jiayi Yuan <34369239+jy-yuan@users.noreply.github.com>	2026-02-20 18:27:31 -08:00
Lianmin Zheng	2928dfb8fa	[Auto Sync] Update bench_one_batch_server_internal.py (20260221) (#19097 ) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2026-02-20 18:19:14 -08:00
Cheng Wan	84c67c8be0	Refactor graph input buffers (#18991 )	2026-02-20 18:09:31 -08:00
Minglei Zhu	4bffd3a232	[GPT-OSS] support fp8 online quantization for gpt-oss bf16 (#18988 ) merge it as all required CI passed	2026-02-20 14:16:57 -08:00
Qiaolin Yu	96bae2355e	Add generated-shared-prefix dataset in bench_one_batch (#18986 )	2026-02-20 13:33:10 -08:00

7855 Commits