sglang

mirror of https://github.com/kvcache-ai/sglang.git synced 2026-06-30 11:48:01 +00:00

Author	SHA1	Message	Date
shuwenn	b65799cf83	[SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599 ) Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2026-04-20 14:25:04 -07:00
Cheng Wan	5f7aee726a	refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 17:05:13 -07:00
Hubert Lu	edaa5973d4	[AMD][No-Merge] Simplify fused allreduce + RMSNorm and remove hidden_dim allowlist (#21986 ) Co-authored-by: HAI <hixiao@gmail.com>	2026-04-11 23:47:08 -07:00
satyamk7054	059b287e25	Add offline auto-tuning for LoRA CSGMV kernel (#20391 ) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2026-04-10 13:10:43 -07:00
Aditya Sharma	f6e85676b5	model: support qwen3-asr (#22073 ) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2026-04-07 13:27:05 +08:00
Xinyuan Tong	2813cb6d9a	[New Model] Gemma 4 (#21952 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Pengyu Chen <pychen96@gmail.com> Co-authored-by: kpham-sgl <khoa.pham@radixark.ai> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Andy Luo <andy.luo@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: adarshxs <adarsh.shirawalmath@gmail.com>	2026-04-06 20:24:44 -07:00
Xiaoyu Zhang	f3f7711dac	Fix Python 3.11 f-string lint error in deepgemm Blackwell benchmark (#22108 )	2026-04-04 21:15:22 +08:00
harrisonlimh	9fa12d605a	Add dsv3 router gemm benchmark on blackwell (#17707 )	2026-04-04 01:18:01 -07:00
Xiaoyu Zhang	ee9d922f5a	Revert "[Kernel] Fuse temperature + softmax in sampling for decode speedup" (#22046 )	2026-04-03 21:32:08 +08:00
Mook	7a59e05dd1	[Kernel] Fuse temperature + softmax in sampling for decode speedup (#20501 )	2026-04-02 12:46:36 +08:00
Yuan Luo	03a87068ea	[KDA] Fuse scaled_dot_kkt + solve_tril + recompute_w_u for KDA (#21604 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-31 20:57:27 -07:00
Polisetty V R K Jyothendra Varma	f0303fd07e	[Intel GPU] Enable DeepSeek R1 inference on XPU (#18461 ) Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>	2026-03-29 22:35:59 -07:00
shuwenn	c34593f951	[HiCache] fix: graceful shutdown of pending async tasks in bench_mix.py (#20276 )	2026-03-29 00:46:32 -07:00
zhangxiaolei	e2b8463c80	[fix] qwen3.5 fuse_moe_triton_tune bug (#20232 )	2026-03-27 19:23:24 -04:00
Yuan Luo	f273ba1ccc	[KDA] Support CuTeDSL KDA decode kernel (#21203 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-25 09:47:09 +08:00
Jiaxin(Jackson) Deng	c4db64c16b	Add Lychee Doc Links Check to Local and CI (#19742 ) Co-authored-by: Zijie Xia <zijie_xia@icloud.com> Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com> Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>	2026-03-24 13:48:26 -07:00
hzh0425	0986bed8e2	[HiCache][HybridModel]: Support mamba state offloading & HybridCacheController (#20457 ) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: ispobock <ispobaoke@gmail.com>	2026-03-23 20:02:50 -07:00
Lianmin Zheng	104b10f70a	refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009 )	2026-03-20 05:55:36 -07:00
cs-cat	22e378af86	Fix result writer in tuning_block_wise_kernel.py, and add FP8 kernel config for L40 (#20368 ) Signed-off-by: cs-cat <118669451+cs-cat@users.noreply.github.com>	2026-03-20 09:28:54 +08:00
Xinyuan Tong	6b8a6545b2	Add Mistral Small 4 (Pixtral) support (#20708 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Alex Nails <alexnails@radixark.ai> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: dbari <dbari@users.noreply.github.com>	2026-03-18 14:15:32 -07:00
Yuan Luo	9c87e137ee	[GDN] Support GDN packed decode (#20627 )	2026-03-18 13:20:07 +08:00
Xiaoyu Zhang	25e38216b6	[kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277 )	2026-03-14 16:45:54 +08:00
Chongchong Tian	70d4aabe42	Add CLI args to conveniently support tuning more models (#12922 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-12 23:10:55 -07:00
Yuan Luo	e29305c120	[GDN] Add benchmark for sglang gdn prefill (#20428 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: Kaixi Hou <kaixih@nvidia.com>	2026-03-12 22:25:02 +08:00
Mook	abc672e717	[Benchmark] use flashinfer bench_gpu_time instead of triton do_bench (#20305 )	2026-03-12 04:04:30 +00:00
Yuan Luo	751c454099	Add DeepSeek3.2 and GlmMoeDsa into moe tune (#18876 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-03-10 17:12:58 +08:00
Mohammad Miadh Angkad	1b76eb9361	[Doc] Update version references and add automation (#18409 )	2026-03-04 09:51:46 -08:00
Kangyan-Zhou	dc92f88a21	Enhance bench_multiturn.py with OpenAI API support and richer metrics (#19724 ) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 13:48:04 -08:00
RoyWang	a1ef8e2cc0	[AMD] optimize Kimi K2.5 fused_moe_triton performance by tuning (#19228 )	2026-02-26 11:50:13 -08:00
Alison Shao	a0a8f1473c	[Benchmark] Fix generated_shared_prefix attribute naming and remove args dependency (#19363 ) Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net> Co-authored-by: sglang-bot <sglangbot@gmail.com>	2026-02-25 18:45:54 -08:00
Julian Huang	a55f658835	[Misc] Normalize `--host` parameter to use plain hostname without scheme (#19309 ) Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>	2026-02-25 00:37:24 -08:00
Hubert Lu	17b0affbdf	[AMD] Support --enable-aiter-allreduce-fusion on AMD GPUs (#13747 ) Co-authored-by: yctseng0211 <yctseng@amd.com>	2026-02-24 23:11:55 -08:00
Ratish P	ae6f6e1495	[Refactor] Benchmark: Add typed DatasetArgs/Loader registry and CPU dataset unit tests (#19147 ) Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>	2026-02-24 12:22:01 -08:00
Alec Leng	38f25e802d	Fix/deepseek readme link (#19258 )	2026-02-24 10:46:56 -08:00
Xinyuan Tong	581bf53e03	Whisper model support & `/v1/audio/transcriptions` endpoint & benchmark (#16983 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-23 17:28:37 -08:00
Liangsheng Yin	1f2da824dd	[Benchmark] Remove re-exports from bench_serving.py (#19130 )	2026-02-21 14:30:30 -08:00
satyamk7054	355127c2e9	Fix benchmark_sglang_fused_moe_triton.py (#18940 ) Co-authored-by: Satyam Kumar <satyamk@linkedin.com>	2026-02-17 17:25:37 -05:00
SoluMilken	07a24f1a38	update pre-commit config (#18860 )	2026-02-16 00:18:31 +08:00
Ke Bao	a9d59776cc	Enhence gsm8k test (#18791 )	2026-02-13 18:08:57 +08:00
Liangsheng Yin	cd90346a2b	Add cache hit rate UT (#18566 )	2026-02-10 21:27:41 -08:00
Zheng Li	27c447653d	model: support Qwen3.5 (#18489 ) Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>	2026-02-10 00:27:59 +08:00
Xinyuan Tong	0b4d4f2838	Fix MMLU benchmark to auto-download data and resolve path issue (#18486 )	2026-02-09 10:40:40 -05:00
Yuan Luo	4ea4f2a20c	[VLM] Optimize get_rope_index for GLM4v (#17420 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-02-01 18:59:15 +08:00
b8zhong	22498e10c0	[Fix] Triton TP MoE Dpsk V3/Qwen3 Coder with SwapAB (#17965 )	2026-01-31 15:56:26 +08:00
Yuan Luo	7bb41989fa	[1/N] Optimize All Reduce - Benchmark different AR operations (#13797 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2026-01-26 22:44:13 +08:00
Jacob Gordon	a296c99ff4	refactor(benchmark): prevents variable shadowing (#17607 )	2026-01-22 17:00:11 -05:00
Julian Huang	db2425a00b	[Fix]: correctly fetch ds32 config in tuning_fused_moe_triton (#17409 ) Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com>	2026-01-20 20:08:28 +08:00
Mohammad Miadh Angkad	b0701f02b3	Fix benchmark import for should_use_tensor_core (#17232 )	2026-01-16 17:48:36 -05:00
Yongfei Xu	82a1b645ba	[DeepSeek V3.1/V3.2] Optimize fused moe configs for H20 & H20-3E based on swapab (#17133 )	2026-01-17 00:10:52 +08:00
b8zhong	d44f09ad98	[Benchmark] Add GSM8K Platinum Eval (#14565 )	2026-01-16 11:06:14 +08:00

458 Commits