shuwenn
|
b65799cf83
|
[SPEC][1/N] feat: add adaptive speculative_num_steps for EAGLE topk=1 (#21599)
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
|
2026-04-20 14:25:04 -07:00 |
|
Cheng Wan
|
5f7aee726a
|
refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-17 17:05:13 -07:00 |
|
Hubert Lu
|
edaa5973d4
|
[AMD][No-Merge] Simplify fused allreduce + RMSNorm and remove hidden_dim allowlist (#21986)
Co-authored-by: HAI <hixiao@gmail.com>
|
2026-04-11 23:47:08 -07:00 |
|
satyamk7054
|
059b287e25
|
Add offline auto-tuning for LoRA CSGMV kernel (#20391)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
|
2026-04-10 13:10:43 -07:00 |
|
Aditya Sharma
|
f6e85676b5
|
model: support qwen3-asr (#22073)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2026-04-07 13:27:05 +08:00 |
|
Xinyuan Tong
|
2813cb6d9a
|
[New Model] Gemma 4 (#21952)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Pengyu Chen <pychen96@gmail.com>
Co-authored-by: kpham-sgl <khoa.pham@radixark.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Andy Luo <andy.luo@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: adarshxs <adarsh.shirawalmath@gmail.com>
|
2026-04-06 20:24:44 -07:00 |
|
Xiaoyu Zhang
|
f3f7711dac
|
Fix Python 3.11 f-string lint error in deepgemm Blackwell benchmark (#22108)
|
2026-04-04 21:15:22 +08:00 |
|
harrisonlimh
|
9fa12d605a
|
Add dsv3 router gemm benchmark on blackwell (#17707)
|
2026-04-04 01:18:01 -07:00 |
|
Xiaoyu Zhang
|
ee9d922f5a
|
Revert "[Kernel] Fuse temperature + softmax in sampling for decode speedup" (#22046)
|
2026-04-03 21:32:08 +08:00 |
|
Mook
|
7a59e05dd1
|
[Kernel] Fuse temperature + softmax in sampling for decode speedup (#20501)
|
2026-04-02 12:46:36 +08:00 |
|
Yuan Luo
|
03a87068ea
|
[KDA] Fuse scaled_dot_kkt + solve_tril + recompute_w_u for KDA (#21604)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-31 20:57:27 -07:00 |
|
Polisetty V R K Jyothendra Varma
|
f0303fd07e
|
[Intel GPU] Enable DeepSeek R1 inference on XPU (#18461)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>
|
2026-03-29 22:35:59 -07:00 |
|
shuwenn
|
c34593f951
|
[HiCache] fix: graceful shutdown of pending async tasks in bench_mix.py (#20276)
|
2026-03-29 00:46:32 -07:00 |
|
zhangxiaolei
|
e2b8463c80
|
[fix] qwen3.5 fuse_moe_triton_tune bug (#20232)
|
2026-03-27 19:23:24 -04:00 |
|
Yuan Luo
|
f273ba1ccc
|
[KDA] Support CuTeDSL KDA decode kernel (#21203)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-25 09:47:09 +08:00 |
|
Jiaxin(Jackson) Deng
|
c4db64c16b
|
Add Lychee Doc Links Check to Local and CI (#19742)
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
Co-authored-by: Zijie Xia <zijiexia@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
|
2026-03-24 13:48:26 -07:00 |
|
hzh0425
|
0986bed8e2
|
[HiCache][HybridModel]: Support mamba state offloading & HybridCacheController (#20457)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
|
2026-03-23 20:02:50 -07:00 |
|
Lianmin Zheng
|
104b10f70a
|
refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009)
|
2026-03-20 05:55:36 -07:00 |
|
cs-cat
|
22e378af86
|
Fix result writer in tuning_block_wise_kernel.py, and add FP8 kernel config for L40 (#20368)
Signed-off-by: cs-cat <118669451+cs-cat@users.noreply.github.com>
|
2026-03-20 09:28:54 +08:00 |
|
Xinyuan Tong
|
6b8a6545b2
|
Add Mistral Small 4 (Pixtral) support (#20708)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Alex Nails <alexnails@radixark.ai>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: dbari <dbari@users.noreply.github.com>
|
2026-03-18 14:15:32 -07:00 |
|
Yuan Luo
|
9c87e137ee
|
[GDN] Support GDN packed decode (#20627)
|
2026-03-18 13:20:07 +08:00 |
|
Xiaoyu Zhang
|
25e38216b6
|
[kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277)
|
2026-03-14 16:45:54 +08:00 |
|
Chongchong Tian
|
70d4aabe42
|
Add CLI args to conveniently support tuning more models (#12922)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-12 23:10:55 -07:00 |
|
Yuan Luo
|
e29305c120
|
[GDN] Add benchmark for sglang gdn prefill (#20428)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Kaixi Hou <kaixih@nvidia.com>
|
2026-03-12 22:25:02 +08:00 |
|
Mook
|
abc672e717
|
[Benchmark] use flashinfer bench_gpu_time instead of triton do_bench (#20305)
|
2026-03-12 04:04:30 +00:00 |
|
Yuan Luo
|
751c454099
|
Add DeepSeek3.2 and GlmMoeDsa into moe tune (#18876)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-03-10 17:12:58 +08:00 |
|
Mohammad Miadh Angkad
|
1b76eb9361
|
[Doc] Update version references and add automation (#18409)
|
2026-03-04 09:51:46 -08:00 |
|
Kangyan-Zhou
|
dc92f88a21
|
Enhance bench_multiturn.py with OpenAI API support and richer metrics (#19724)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-03 13:48:04 -08:00 |
|
RoyWang
|
a1ef8e2cc0
|
[AMD] optimize Kimi K2.5 fused_moe_triton performance by tuning (#19228)
|
2026-02-26 11:50:13 -08:00 |
|
Alison Shao
|
a0a8f1473c
|
[Benchmark] Fix generated_shared_prefix attribute naming and remove args dependency (#19363)
Co-authored-by: Alison Shao <alisonshao@Mac.attlocal.net>
Co-authored-by: sglang-bot <sglangbot@gmail.com>
|
2026-02-25 18:45:54 -08:00 |
|
Julian Huang
|
a55f658835
|
[Misc] Normalize --host parameter to use plain hostname without scheme (#19309)
Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
|
2026-02-25 00:37:24 -08:00 |
|
Hubert Lu
|
17b0affbdf
|
[AMD] Support --enable-aiter-allreduce-fusion on AMD GPUs (#13747)
Co-authored-by: yctseng0211 <yctseng@amd.com>
|
2026-02-24 23:11:55 -08:00 |
|
Ratish P
|
ae6f6e1495
|
[Refactor] Benchmark: Add typed DatasetArgs/Loader registry and CPU dataset unit tests (#19147)
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2026-02-24 12:22:01 -08:00 |
|
Alec Leng
|
38f25e802d
|
Fix/deepseek readme link (#19258)
|
2026-02-24 10:46:56 -08:00 |
|
Xinyuan Tong
|
581bf53e03
|
Whisper model support & /v1/audio/transcriptions endpoint & benchmark (#16983)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-23 17:28:37 -08:00 |
|
Liangsheng Yin
|
1f2da824dd
|
[Benchmark] Remove re-exports from bench_serving.py (#19130)
|
2026-02-21 14:30:30 -08:00 |
|
satyamk7054
|
355127c2e9
|
Fix benchmark_sglang_fused_moe_triton.py (#18940)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
|
2026-02-17 17:25:37 -05:00 |
|
SoluMilken
|
07a24f1a38
|
update pre-commit config (#18860)
|
2026-02-16 00:18:31 +08:00 |
|
Ke Bao
|
a9d59776cc
|
Enhance gsm8k test (#18791)
|
2026-02-13 18:08:57 +08:00 |
|
Liangsheng Yin
|
cd90346a2b
|
Add cache hit rate UT (#18566)
|
2026-02-10 21:27:41 -08:00 |
|
Zheng Li
|
27c447653d
|
model: support Qwen3.5 (#18489)
Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>
|
2026-02-10 00:27:59 +08:00 |
|
Xinyuan Tong
|
0b4d4f2838
|
Fix MMLU benchmark to auto-download data and resolve path issue (#18486)
|
2026-02-09 10:40:40 -05:00 |
|
Yuan Luo
|
4ea4f2a20c
|
[VLM] Optimize get_rope_index for GLM4v (#17420)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-02-01 18:59:15 +08:00 |
|
b8zhong
|
22498e10c0
|
[Fix] Triton TP MoE Dpsk V3/Qwen3 Coder with SwapAB (#17965)
|
2026-01-31 15:56:26 +08:00 |
|
Yuan Luo
|
7bb41989fa
|
[1/N] Optimize All Reduce - Benchmark different AR operations (#13797)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2026-01-26 22:44:13 +08:00 |
|
Jacob Gordon
|
a296c99ff4
|
refactor(benchmark): prevents variable shadowing (#17607)
|
2026-01-22 17:00:11 -05:00 |
|
Julian Huang
|
db2425a00b
|
[Fix]: correctly fetch ds32 config in tuning_fused_moe_triton (#17409)
Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com>
|
2026-01-20 20:08:28 +08:00 |
|
Mohammad Miadh Angkad
|
b0701f02b3
|
Fix benchmark import for should_use_tensor_core (#17232)
|
2026-01-16 17:48:36 -05:00 |
|
Yongfei Xu
|
82a1b645ba
|
[DeepSeek V3.1/V3.2] Optimize fused moe configs for H20 & H20-3E based on swapab (#17133)
|
2026-01-17 00:10:52 +08:00 |
|
b8zhong
|
d44f09ad98
|
[Benchmark] Add GSM8K Platinum Eval (#14565)
|
2026-01-16 11:06:14 +08:00 |
|