Liangsheng Yin
|
1f2da824dd
|
[Benchmark] Remove re-exports from bench_serving.py (#19130)
|
2026-02-21 14:30:30 -08:00 |
|
satyamk7054
|
355127c2e9
|
Fix benchmark_sglang_fused_moe_triton.py (#18940)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
|
2026-02-17 17:25:37 -05:00 |
|
SoluMilken
|
07a24f1a38
|
update pre-commit config (#18860)
|
2026-02-16 00:18:31 +08:00 |
|
Ke Bao
|
a9d59776cc
|
Enhence gsm8k test (#18791)
|
2026-02-13 18:08:57 +08:00 |
|
Liangsheng Yin
|
cd90346a2b
|
Add cache hit rate UT (#18566)
|
2026-02-10 21:27:41 -08:00 |
|
Zheng Li
|
27c447653d
|
model: support Qwen3.5 (#18489)
Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>
|
2026-02-10 00:27:59 +08:00 |
|
Xinyuan Tong
|
0b4d4f2838
|
Fix MMLU benchmark to auto-download data and resolve path issue (#18486)
|
2026-02-09 10:40:40 -05:00 |
|
Yuan Luo
|
4ea4f2a20c
|
[VLM] Optimize get_rope_index for GLM4v (#17420)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-02-01 18:59:15 +08:00 |
|
b8zhong
|
22498e10c0
|
[Fix] Triton TP MoE Dpsk V3/Qwen3 Coder with SwapAB (#17965)
|
2026-01-31 15:56:26 +08:00 |
|
Yuan Luo
|
7bb41989fa
|
[1/N] Optimize All Reduce - Benchmark different AR operations (#13797)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-01-26 22:44:13 +08:00 |
|
Jacob Gordon
|
a296c99ff4
|
refactor(benchmark): prevents variable shadowing (#17607)
|
2026-01-22 17:00:11 -05:00 |
|
Julian Huang
|
db2425a00b
|
[Fix]: correctly fetch ds32 config in tuning_fused_moe_triton (#17409)
Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com>
|
2026-01-20 20:08:28 +08:00 |
|
Mohammad Miadh Angkad
|
b0701f02b3
|
Fix benchmark import for should_use_tensor_core (#17232)
|
2026-01-16 17:48:36 -05:00 |
|
Yongfei Xu
|
82a1b645ba
|
[DeepSeek V3.1/V3.2] Optimize fused moe configs for H20 & H20-3E based on swapab (#17133)
|
2026-01-17 00:10:52 +08:00 |
|
b8zhong
|
d44f09ad98
|
[Benchmark] Add GSM8K Platinum Eval (#14565)
|
2026-01-16 11:06:14 +08:00 |
|
JinYan Su
|
72e2f70ef7
|
feat(hicache): support numa detect to reduce long tail latency (#11028)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2026-01-15 14:11:49 -08:00 |
|
elvischenv
|
1d811094f8
|
[Misc] Auto download question file for benchmark/mtbench (#17019)
|
2026-01-13 10:34:29 -05:00 |
|
Bhavneek Singh
|
559ff9ecaf
|
Bug: fixed multi_chain_reasoning test (#16192)
|
2026-01-12 09:06:41 -08:00 |
|
Yuan Luo
|
d1ec93e3ac
|
Optimize layernorm_gated for Qwen3-Next (#16397)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-01-10 20:55:31 +08:00 |
|
fzyzcjy
|
249c356331
|
Super tiny update tokenizer benchmark (#16429)
|
2026-01-05 09:14:52 +08:00 |
|
fzyzcjy
|
387fad2f74
|
Tiny add detokenization benchmarks (#16400)
|
2026-01-04 22:53:38 +08:00 |
|
roikoren755
|
b021332339
|
[NemotronH] Add latent MoE support (#16227)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-01-02 22:08:58 +08:00 |
|
satyamk7054
|
38dd4fbb66
|
Add overlap scheduling for embeddings code path (#14032)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
|
2025-12-24 18:24:18 -08:00 |
|
Lee Nau
|
7e027691c8
|
update benchmark README to use --fp8-gemm-backend instead of env var (#15689)
|
2025-12-23 23:23:31 -08:00 |
|
Yubo Wang
|
762846531f
|
Fix Illegal Memory Access when fa3 + spec + topk + page_size > 1 (#15469)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2025-12-24 00:13:57 +08:00 |
|
Frank
|
9749d3e346
|
Update benchmarks to use HF token from environment. (#15421)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-18 13:47:27 -08:00 |
|
zhangheng
|
ea7c69ce28
|
[hotfix]: Add missing args for 3FS bench_client.py (#14791)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-12-14 11:13:19 -10:00 |
|
sglang-bot
|
5c8bd8b51b
|
chore: bump SGLang version to 0.5.6.post2 (#14858)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-12-11 12:29:52 -08:00 |
|
sglang-bot
|
9a327bdfcf
|
chore: bump SGLang version to 0.5.6.post1 (#14651)
|
2025-12-09 00:35:28 +08:00 |
|
Xiaoyu Zhang
|
03b835e7d1
|
Refactor tuning block wise kernel and opt Qwen/Qwen3-VL-32B-Instruct-FP8 (#14141)
|
2025-12-08 09:24:58 +08:00 |
|
Lee Nau
|
5f6f550af8
|
Update DeepSeek V3 docs to use B200 (#14447)
|
2025-12-06 17:22:11 -08:00 |
|
Xinyuan Tong
|
6d37e70883
|
ministral3 (#14251)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Yueming Yuan <yy28@illinois.edu>
|
2025-12-04 14:31:26 -08:00 |
|
Daniel Cámpora
|
8428078436
|
Add Mistral Large 3 support. (#14213)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-12-04 20:00:05 +08:00 |
|
sglang-bot
|
7ae368efde
|
chore: bump SGLang version to 0.5.6 (#14316)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-12-02 17:17:13 -08:00 |
|
Lianmin Zheng
|
bc3d2a85af
|
[Minor] update docs (#14212)
|
2025-12-01 02:33:58 -08:00 |
|
Uranus
|
982db4ebac
|
Feat: GLM-4.6 supports shared experts fusion (#13873)
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Co-authored-by: Kevin-XiongC <kevin_xiong1997@outlook.com>
Co-authored-by: Mingyi Jin <jinmingyi1998@sina.cn>
|
2025-12-01 11:33:18 +08:00 |
|
Netanel Haber
|
082b54c689
|
Support nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 (and nvidia/C-RADIOv2-H) (#12277)
|
2025-11-26 16:28:52 -07:00 |
|
Yubo Wang
|
18fb51583f
|
Support FlashAttention3 page_size > 1 and topk > 1 case with paged attn and spec decode (#7725)
|
2025-11-26 11:44:41 +08:00 |
|
Zhi Yiliu
|
a95a38078b
|
[Fix] Fix uvloop get_event_loop() is not suitable for 0.22.x (#13612)
Signed-off-by: lzy <tomlzy213@gmail.com>
Co-authored-by: lzy <tomlzy213@gmail.com>
|
2025-11-25 01:20:00 +08:00 |
|
Xiaoyu Zhang
|
ecefc7904f
|
[sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819)
|
2025-11-24 18:26:25 +08:00 |
|
DarkSharpness
|
ac5505b04c
|
[Feature] HiCache JIT kernel (once again) (#13764)
|
2025-11-22 22:19:16 -08:00 |
|
roikoren755
|
1b48e1b974
|
Feat/nemotron nano v3 support (#12690)
|
2025-11-21 13:53:05 -08:00 |
|
Lianmin Zheng
|
7af9b88c6c
|
Revert "[Feature] Introduce JIT Kernel in sglang (with hicache JIT kernel)" (#13644)
|
2025-11-20 02:11:12 -08:00 |
|
DarkSharpness
|
b51f9bbee7
|
[Feature] Introduce JIT Kernel in sglang (with hicache JIT kernel) (#13453)
|
2025-11-20 00:03:32 -08:00 |
|
Kaixi Hou
|
c3c4da71fb
|
[NVIDIA] Add fp8 gemm benchmark on blackwell (#13528)
|
2025-11-19 19:35:00 -08:00 |
|
Liangsheng Yin
|
4ce8fb3cc2
|
Fix lora test (#13479)
|
2025-11-18 12:21:57 +08:00 |
|
sglang-bot
|
7b2fb3d47c
|
chore: bump SGLang version to 0.5.5.post3 (#13366)
|
2025-11-16 17:55:38 -08:00 |
|
Junlin Zhou
|
0779c3d148
|
docs: update fused MoE config path (#13211)
|
2025-11-13 11:14:01 -08:00 |
|
Shu Wang
|
6664083522
|
Replace [silu_and_mul_]scaled_fp4_group_quant by Flashinfer equivalent (#12376)
|
2025-11-13 00:26:00 -08:00 |
|
Hubert Lu
|
e4b2937017
|
[AMD] Add AITER Custom All-Reduce (#13102)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: HaiShaw <hixiao@gmail.com>
|
2025-11-12 21:53:44 -08:00 |
|