sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 11:48:01 +00:00

Author	SHA1	Message	Date
David Cheung	ed427e1299	Migrate all callers from /get_server_info to /server_info (#21463)	2026-04-01 21:17:50 -07:00
zwang86	5fc5c18bed	fix(security): replace unsafe pickle.loads with SafeUnpickler for CVE-2026-3989 (#20904)	2026-03-27 00:43:41 -07:00
Ratish P	ae6f6e1495	[Refactor] Benchmark: Add typed DatasetArgs/Loader registry and CPU dataset unit tests (#19147) Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2026-02-24 12:22:01 -08:00
Liangsheng Yin	1f2da824dd	[Benchmark] Remove re-exports from bench_serving.py (#19130)	2026-02-21 14:30:30 -08:00
SoluMilken	07a24f1a38	update pre-commit config (#18860)	2026-02-16 00:18:31 +08:00
shuwenn	3299c4f9c1	[CI] feat: add early exit to wait_for_server when process dies (#18602)	2026-02-13 16:46:09 -08:00
cswuyg	33c053c50c	fix(benchmark): add missing args for speculative decoding benchmark (#17974)	2026-01-29 23:05:42 -08:00
Chenxi Li	b7c7e03d93	Fix crash dump replay script for image data replay (#16277)	2026-01-02 13:42:22 -08:00
Baizhou Zhang	42fcf5438f	Revert "tiny remove deprecated endpoint call" (#14533)	2025-12-05 23:48:54 -08:00
b8zhong	ec7b2c16d9	tiny remove deprecated endpoint call (#13607)	2025-12-05 09:54:49 -08:00
Lzhang-hub	2847e5c4b4	fix bench_speculative bug (#13197)	2025-11-20 17:09:04 +08:00
Xiaoyu Zhang	8b5e2c5368	[Tiny fix] Fix bench_speculative.py run bug (#13416)	2025-11-17 18:58:19 +08:00
Liangsheng Yin	ae7698fbd5	Remove deprecated scripts (#13399)	2025-11-17 16:54:39 +08:00
Zaili Wang	50b6842b4b	fix: Add default value for backend in sample_mmmu_requests (#12256)	2025-10-31 19:31:40 +08:00
fzyzcjy	fdc4e1e570	Tiny move files to utils folder (#11166)	2025-10-03 22:40:06 +08:00
Lzhang-hub	4efe2c57c9	support vlm model spec bench (#10173)	2025-09-10 13:37:04 +08:00
Chayenne	9b08d975a0	[docs] Refactor, remove compiled results and add gpt-oss (#9613) Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>	2025-08-25 15:27:06 -07:00
Lianmin Zheng	c480a3f6ea	Minor style fixes for sgl-kernel (#9289)	2025-08-18 09:38:35 -07:00
Kay Yan	975a5ec69c	[fix] update bench_speculative.py for compatibility (#7764) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-07-04 16:32:54 +08:00
Lianmin Zheng	22352d47a9	Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632) Co-authored-by: Kan Wu <wukanustc@gmail.com>	2025-06-29 23:16:19 -07:00
Lianmin Zheng	0f218731e3	Do not run frontend_reasoning.ipynb to reduce the CI load (#7073)	2025-06-10 17:15:31 -07:00
fzyzcjy	25be63d0b2	Auto handle PD disaggregation in bench_serving (#6587) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-25 22:41:27 -07:00
Byron Hsu	2d831c6ef9	[PD] Support structured output (#6560)	2025-05-23 21:49:00 -07:00
Byron Hsu	8233cc10fd	[PD] Support logprob & Add failure test (#6558)	2025-05-23 14:29:20 -07:00
Yineng Zhang	eabcf82acb	feat: add long context example (#6391)	2025-05-18 01:45:17 -07:00
Yineng Zhang	7282ab741a	fix: update bench_speculative (#5649)	2025-04-22 16:08:15 -07:00
Byron Hsu	bf98d2e377	[PD] Support prefill overlap + Ensure no race condition (#5609)	2025-04-21 12:12:56 -07:00
Byron Hsu	deded17f38	[PD] Fix edge case and simplify large page size + chunked prefill (#5589)	2025-04-21 10:27:02 -07:00
Byron Hsu	c951d312ed	[PD] Fix large page size + chunk prefill (#5588)	2025-04-20 17:21:54 -07:00
Baizhou Zhang	6fb29ffd9e	Deprecate enable-flashinfer-mla and enable-flashmla (#5480)	2025-04-17 01:43:33 -07:00
lukec	a53fe428f9	Support FlashMLA backend (#4472) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-16 09:07:06 -07:00
Ke Bao	f1d09a6541	Update bench speculative script (#4235)	2025-03-09 12:19:01 -07:00
Adarsh Shirawalmath	19fd57bcd7	[docs] fix HF reference script command (#4148)	2025-03-06 13:21:54 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Yineng Zhang	bc6ad367c2	fix lint (#2733)	2025-01-05 14:45:42 +08:00
Ce Gao	f5d0865b25	feat: Support VLM in reference_hf (#2726) Signed-off-by: Ce Gao <gaocegege@hotmail.com>	2025-01-03 22:32:30 +08:00
Ying Sheng	e1e595d702	[feat] Refactor session control interface and add CI (#2173)	2024-11-25 12:32:51 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Byron Hsu	30af7dfb34	[router] add base_gpu_id server args & merged radix tree python reference (#2115)	2024-11-21 17:13:33 -08:00
Lianmin Zheng	56a347f7d3	Move test_session_id.py to playground (#2104)	2024-11-20 01:28:27 -08:00
Ke Bao	62832bb272	Support cuda graph for DP attention (#2061)	2024-11-17 16:29:20 -08:00
Chayenne	c77c1e05ba	fix black in pre-commit (#1940)	2024-11-08 07:42:47 +08:00
Xuehai Pan	a5e0defb5a	minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926)	2024-11-06 13:46:04 +00:00
Jani Monoses	916b3cdddc	Allow passing dtype and max_new_tokens to HF reference script (#1903)	2024-11-03 08:24:37 -08:00
Ying Sheng	c5325aba75	[Profile] Add pytorch profiler (#1604)	2024-10-07 14:37:16 -07:00
Lianmin Zheng	fb2d0680e0	[Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510)	2024-09-24 21:37:33 -07:00
Lianmin Zheng	2854a5ea9f	Fix the overhead due to penalizer in bench_latency (#1496)	2024-09-23 07:38:14 -07:00
Lianmin Zheng	167591e864	Better unit tests for adding a new model (#1488)	2024-09-22 01:50:37 -07:00
Ying Sheng	37963394aa	[Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433)	2024-09-15 12:46:04 -07:00

54 Commits