sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-07-03 22:07:12 +00:00

Author	SHA1	Message	Date
Liangsheng Yin	1f2da824dd	[Benchmark] Remove re-exports from bench_serving.py (#19130)	2026-02-21 14:30:30 -08:00
satyamk7054	355127c2e9	Fix benchmark_sglang_fused_moe_triton.py (#18940) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2026-02-17 17:25:37 -05:00
SoluMilken	07a24f1a38	update pre-commit config (#18860)	2026-02-16 00:18:31 +08:00
Ke Bao	a9d59776cc	Enhence gsm8k test (#18791)	2026-02-13 18:08:57 +08:00
Liangsheng Yin	cd90346a2b	Add cache hit rate UT (#18566)	2026-02-10 21:27:41 -08:00
Zheng Li	27c447653d	model: support Qwen3.5 (#18489) Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>	2026-02-10 00:27:59 +08:00
Xinyuan Tong	0b4d4f2838	Fix MMLU benchmark to auto-download data and resolve path issue (#18486)	2026-02-09 10:40:40 -05:00
Yuan Luo	4ea4f2a20c	[VLM] Optimize get_rope_index for GLM4v (#17420) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-01 18:59:15 +08:00
b8zhong	22498e10c0	[Fix] Triton TP MoE Dpsk V3/Qwen3 Coder with SwapAB (#17965)	2026-01-31 15:56:26 +08:00
Yuan Luo	7bb41989fa	[1/N] Optimize All Reduce - Benchmark different AR operations (#13797) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-01-26 22:44:13 +08:00
Jacob Gordon	a296c99ff4	refactor(benchmark): prevents variable shadowing (#17607)	2026-01-22 17:00:11 -05:00
Julian Huang	db2425a00b	[Fix]: correctly fetch ds32 config in tuning_fused_moe_triton (#17409) Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com>	2026-01-20 20:08:28 +08:00
Mohammad Miadh Angkad	b0701f02b3	Fix benchmark import for should_use_tensor_core (#17232)	2026-01-16 17:48:36 -05:00
Yongfei Xu	82a1b645ba	[DeepSeek V3.1/V3.2] Optimize fused moe configs for H20 & H20-3E based on swapab (#17133)	2026-01-17 00:10:52 +08:00
b8zhong	d44f09ad98	[Benchmark] Add GSM8K Platinum Eval (#14565)	2026-01-16 11:06:14 +08:00
JinYan Su	72e2f70ef7	feat(hicache): support numa detect to reduce long tail latency (#11028) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2026-01-15 14:11:49 -08:00
elvischenv	1d811094f8	[Misc] Auto download question file for benchmark/mtbench (#17019)	2026-01-13 10:34:29 -05:00
Bhavneek Singh	559ff9ecaf	Bug: fixed multi_chain_reasoning test (#16192)	2026-01-12 09:06:41 -08:00
Yuan Luo	d1ec93e3ac	Optimize layernorm_gated for Qwen3-Next (#16397) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-01-10 20:55:31 +08:00
fzyzcjy	249c356331	Super tiny update tokenizer benchmark (#16429)	2026-01-05 09:14:52 +08:00
fzyzcjy	387fad2f74	Tiny add detokenization benchmarks (#16400)	2026-01-04 22:53:38 +08:00
roikoren755	b021332339	[NemotronH] Add latent MoE support (#16227) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-01-02 22:08:58 +08:00
satyamk7054	38dd4fbb66	Add overlap scheduling for embeddings code path (#14032) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2025-12-24 18:24:18 -08:00
Lee Nau	7e027691c8	update benchmark README to use --fp8-gemm-backend instead of env var (#15689)	2025-12-23 23:23:31 -08:00
Yubo Wang	762846531f	Fix Illegal Memory Access when fa3 + spec + topk + page_size > 1 (#15469) Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2025-12-24 00:13:57 +08:00
Frank	9749d3e346	Update benchmarks to use HF token from environment. (#15421) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-18 13:47:27 -08:00
zhangheng	ea7c69ce28	[hotfix]: Add missing args for 3FS bench_client.py (#14791) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-12-14 11:13:19 -10:00
sglang-bot	5c8bd8b51b	chore: bump SGLang version to 0.5.6.post2 (#14858) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-12-11 12:29:52 -08:00
sglang-bot	9a327bdfcf	chore: bump SGLang version to 0.5.6.post1 (#14651)	2025-12-09 00:35:28 +08:00
Xiaoyu Zhang	03b835e7d1	Refactor tuning block wise kernel and opt Qwen/Qwen3-VL-32B-Instruct-FP8 (#14141)	2025-12-08 09:24:58 +08:00
Lee Nau	5f6f550af8	Update DeepSeek V3 docs to use B200 (#14447)	2025-12-06 17:22:11 -08:00
Xinyuan Tong	6d37e70883	ministral3 (#14251) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Yueming Yuan <yy28@illinois.edu>	2025-12-04 14:31:26 -08:00
Daniel Cámpora	8428078436	Add Mistral Large 3 support. (#14213) Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-12-04 20:00:05 +08:00
sglang-bot	7ae368efde	chore: bump SGLang version to 0.5.6 (#14316) Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>	2025-12-02 17:17:13 -08:00
Lianmin Zheng	bc3d2a85af	[Minor] update docs (#14212)	2025-12-01 02:33:58 -08:00
Uranus	982db4ebac	Feat: GLM-4.6 supports shared experts fusion (#13873) Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com> Co-authored-by: Kevin-XiongC <kevin_xiong1997@outlook.com> Co-authored-by: Mingyi Jin <jinmingyi1998@sina.cn>	2025-12-01 11:33:18 +08:00
Netanel Haber	082b54c689	Support nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 (and nvidia/C-RADIOv2-H) (#12277)	2025-11-26 16:28:52 -07:00
Yubo Wang	18fb51583f	Support FlashAttention3 page_size > 1 and topk > 1 case with paged attn and spec decode (#7725)	2025-11-26 11:44:41 +08:00
Zhi Yiliu	a95a38078b	[Fix] Fix uvloop get_event_loop() is not suitable for 0.22.x (#13612) Signed-off-by: lzy <tomlzy213@gmail.com> Co-authored-by: lzy <tomlzy213@gmail.com>	2025-11-25 01:20:00 +08:00
Xiaoyu Zhang	ecefc7904f	[sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819)	2025-11-24 18:26:25 +08:00
DarkSharpness	ac5505b04c	[Feature] HiCache JIT kernel (once again) (#13764)	2025-11-22 22:19:16 -08:00
roikoren755	1b48e1b974	Feat/nemotron nano v3 support (#12690)	2025-11-21 13:53:05 -08:00
Lianmin Zheng	7af9b88c6c	Revert "[Feature] Introduce JIT Kernel in sglang (with hicache JIT kernel)" (#13644)	2025-11-20 02:11:12 -08:00
DarkSharpness	b51f9bbee7	[Feature] Introduce JIT Kernel in sglang (with hicache JIT kernel) (#13453)	2025-11-20 00:03:32 -08:00
Kaixi Hou	c3c4da71fb	[NVIDIA] Add fp8 gemm benchmark on blackwell (#13528)	2025-11-19 19:35:00 -08:00
Liangsheng Yin	4ce8fb3cc2	Fix lora test (#13479)	2025-11-18 12:21:57 +08:00
sglang-bot	7b2fb3d47c	chore: bump SGLang version to 0.5.5.post3 (#13366)	2025-11-16 17:55:38 -08:00
Junlin Zhou	0779c3d148	docs: update fused MoE config path (#13211)	2025-11-13 11:14:01 -08:00
Shu Wang	6664083522	Replace [silu_and_mul_]scaled_fp4_group_quant by Flashinfer equivalent (#12376)	2025-11-13 00:26:00 -08:00
Hubert Lu	e4b2937017	[AMD] Add AITER Custom All-Reduce (#13102) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: HaiShaw <hixiao@gmail.com>	2025-11-12 21:53:44 -08:00

423 Commits