423 Commits

Author SHA1 Message Date
Liangsheng Yin
1f2da824dd [Benchmark] Remove re-exports from bench_serving.py (#19130) 2026-02-21 14:30:30 -08:00
satyamk7054
355127c2e9 Fix benchmark_sglang_fused_moe_triton.py (#18940)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
2026-02-17 17:25:37 -05:00
SoluMilken
07a24f1a38 update pre-commit config (#18860) 2026-02-16 00:18:31 +08:00
Ke Bao
a9d59776cc Enhence gsm8k test (#18791) 2026-02-13 18:08:57 +08:00
Liangsheng Yin
cd90346a2b Add cache hit rate UT (#18566) 2026-02-10 21:27:41 -08:00
Zheng Li
27c447653d model: support Qwen3.5 (#18489)
Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>
2026-02-10 00:27:59 +08:00
Xinyuan Tong
0b4d4f2838 Fix MMLU benchmark to auto-download data and resolve path issue (#18486) 2026-02-09 10:40:40 -05:00
Yuan Luo
4ea4f2a20c [VLM] Optimize get_rope_index for GLM4v (#17420)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-02-01 18:59:15 +08:00
b8zhong
22498e10c0 [Fix] Triton TP MoE Dpsk V3/Qwen3 Coder with SwapAB (#17965) 2026-01-31 15:56:26 +08:00
Yuan Luo
7bb41989fa [1/N] Optimize All Reduce - Benchmark different AR operations (#13797)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-01-26 22:44:13 +08:00
Jacob Gordon
a296c99ff4 refactor(benchmark): prevents variable shadowing (#17607) 2026-01-22 17:00:11 -05:00
Julian Huang
db2425a00b [Fix]: correctly fetch ds32 config in tuning_fused_moe_triton (#17409)
Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com>
2026-01-20 20:08:28 +08:00
Mohammad Miadh Angkad
b0701f02b3 Fix benchmark import for should_use_tensor_core (#17232) 2026-01-16 17:48:36 -05:00
Yongfei Xu
82a1b645ba [DeepSeek V3.1/V3.2] Optimize fused moe configs for H20 & H20-3E based on swapab (#17133) 2026-01-17 00:10:52 +08:00
b8zhong
d44f09ad98 [Benchmark] Add GSM8K Platinum Eval (#14565) 2026-01-16 11:06:14 +08:00
JinYan Su
72e2f70ef7 feat(hicache): support numa detect to reduce long tail latency (#11028)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2026-01-15 14:11:49 -08:00
elvischenv
1d811094f8 [Misc] Auto download question file for benchmark/mtbench (#17019) 2026-01-13 10:34:29 -05:00
Bhavneek Singh
559ff9ecaf Bug: fixed multi_chain_reasoning test (#16192) 2026-01-12 09:06:41 -08:00
Yuan Luo
d1ec93e3ac Optimize layernorm_gated for Qwen3-Next (#16397)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
2026-01-10 20:55:31 +08:00
fzyzcjy
249c356331 Super tiny update tokenizer benchmark (#16429) 2026-01-05 09:14:52 +08:00
fzyzcjy
387fad2f74 Tiny add detokenization benchmarks (#16400) 2026-01-04 22:53:38 +08:00
roikoren755
b021332339 [NemotronH] Add latent MoE support (#16227)
Signed-off-by: Roi Koren <roik@nvidia.com>
2026-01-02 22:08:58 +08:00
satyamk7054
38dd4fbb66 Add overlap scheduling for embeddings code path (#14032)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
2025-12-24 18:24:18 -08:00
Lee Nau
7e027691c8 update benchmark README to use --fp8-gemm-backend instead of env var (#15689) 2025-12-23 23:23:31 -08:00
Yubo Wang
762846531f Fix Illegal Memory Access when fa3 + spec + topk + page_size > 1 (#15469)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
2025-12-24 00:13:57 +08:00
Frank
9749d3e346 Update benchmarks to use HF token from environment. (#15421)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-18 13:47:27 -08:00
zhangheng
ea7c69ce28 [hotfix]: Add missing args for 3FS bench_client.py (#14791)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-12-14 11:13:19 -10:00
sglang-bot
5c8bd8b51b chore: bump SGLang version to 0.5.6.post2 (#14858)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-12-11 12:29:52 -08:00
sglang-bot
9a327bdfcf chore: bump SGLang version to 0.5.6.post1 (#14651) 2025-12-09 00:35:28 +08:00
Xiaoyu Zhang
03b835e7d1 Refactor tuning block wise kernel and opt Qwen/Qwen3-VL-32B-Instruct-FP8 (#14141) 2025-12-08 09:24:58 +08:00
Lee Nau
5f6f550af8 Update DeepSeek V3 docs to use B200 (#14447) 2025-12-06 17:22:11 -08:00
Xinyuan Tong
6d37e70883 ministral3 (#14251)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Yueming Yuan <yy28@illinois.edu>
2025-12-04 14:31:26 -08:00
Daniel Cámpora
8428078436 Add Mistral Large 3 support. (#14213)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-12-04 20:00:05 +08:00
sglang-bot
7ae368efde chore: bump SGLang version to 0.5.6 (#14316)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
2025-12-02 17:17:13 -08:00
Lianmin Zheng
bc3d2a85af [Minor] update docs (#14212) 2025-12-01 02:33:58 -08:00
Uranus
982db4ebac Feat: GLM-4.6 supports shared experts fusion (#13873)
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Co-authored-by: Kevin-XiongC <kevin_xiong1997@outlook.com>
Co-authored-by: Mingyi Jin <jinmingyi1998@sina.cn>
2025-12-01 11:33:18 +08:00
Netanel Haber
082b54c689 Support nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 (and nvidia/C-RADIOv2-H) (#12277) 2025-11-26 16:28:52 -07:00
Yubo Wang
18fb51583f Support FlashAttention3 page_size > 1 and topk > 1 case with paged attn and spec decode (#7725) 2025-11-26 11:44:41 +08:00
Zhi Yiliu
a95a38078b [Fix] Fix uvloop get_event_loop() is not suitable for 0.22.x (#13612)
Signed-off-by: lzy <tomlzy213@gmail.com>
Co-authored-by: lzy <tomlzy213@gmail.com>
2025-11-25 01:20:00 +08:00
Xiaoyu Zhang
ecefc7904f [sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819) 2025-11-24 18:26:25 +08:00
DarkSharpness
ac5505b04c [Feature] HiCache JIT kernel (once again) (#13764) 2025-11-22 22:19:16 -08:00
roikoren755
1b48e1b974 Feat/nemotron nano v3 support (#12690) 2025-11-21 13:53:05 -08:00
Lianmin Zheng
7af9b88c6c Revert "[Feature] Introduce JIT Kernel in sglang (with hicache JIT kernel)" (#13644) 2025-11-20 02:11:12 -08:00
DarkSharpness
b51f9bbee7 [Feature] Introduce JIT Kernel in sglang (with hicache JIT kernel) (#13453) 2025-11-20 00:03:32 -08:00
Kaixi Hou
c3c4da71fb [NVIDIA] Add fp8 gemm benchmark on blackwell (#13528) 2025-11-19 19:35:00 -08:00
Liangsheng Yin
4ce8fb3cc2 Fix lora test (#13479) 2025-11-18 12:21:57 +08:00
sglang-bot
7b2fb3d47c chore: bump SGLang version to 0.5.5.post3 (#13366) 2025-11-16 17:55:38 -08:00
Junlin Zhou
0779c3d148 docs: update fused MoE config path (#13211) 2025-11-13 11:14:01 -08:00
Shu Wang
6664083522 Replace [silu_and_mul_]scaled_fp4_group_quant by Flashinfer equivalent (#12376) 2025-11-13 00:26:00 -08:00
Hubert Lu
e4b2937017 [AMD] Add AITER Custom All-Reduce (#13102)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: HaiShaw <hixiao@gmail.com>
2025-11-12 21:53:44 -08:00