Cheng Wan | 5f7aee726a | 2026-04-17 17:05:13 -07:00
refactor(moe): de-duplicate triton MoE runner path into shared helpers (#23019)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Hubert Lu | edaa5973d4 | 2026-04-11 23:47:08 -07:00
[AMD][No-Merge] Simplify fused allreduce + RMSNorm and remove hidden_dim allowlist (#21986)
Co-authored-by: HAI <hixiao@gmail.com>

satyamk7054 | 059b287e25 | 2026-04-10 13:10:43 -07:00
Add offline auto-tuning for LoRA CSGMV kernel (#20391)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>

Xinyuan Tong | 2813cb6d9a | 2026-04-06 20:24:44 -07:00
[New Model] Gemma 4 (#21952)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Pengyu Chen <pychen96@gmail.com>
Co-authored-by: kpham-sgl <khoa.pham@radixark.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Andy Luo <andy.luo@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: adarshxs <adarsh.shirawalmath@gmail.com>

Xiaoyu Zhang | f3f7711dac | 2026-04-04 21:15:22 +08:00
Fix Python 3.11 f-string lint error in deepgemm Blackwell benchmark (#22108)

harrisonlimh | 9fa12d605a | 2026-04-04 01:18:01 -07:00
Add dsv3 router gemm benchmark on blackwell (#17707)

Xiaoyu Zhang | ee9d922f5a | 2026-04-03 21:32:08 +08:00
Revert "[Kernel] Fuse temperature + softmax in sampling for decode speedup" (#22046)

Mook | 7a59e05dd1 | 2026-04-02 12:46:36 +08:00
[Kernel] Fuse temperature + softmax in sampling for decode speedup (#20501)

Polisetty V R K Jyothendra Varma | f0303fd07e | 2026-03-29 22:35:59 -07:00
[Intel GPU] Enable DeepSeek R1 inference on XPU (#18461)
Signed-off-by: P V R K Jyothendra Varma <polisetty.v.r.k.jyothendra.varma@intel.com>

zhangxiaolei | e2b8463c80 | 2026-03-27 19:23:24 -04:00
[fix] qwen3.5 fuse_moe_triton_tune bug (#20232)

Lianmin Zheng | 104b10f70a | 2026-03-20 05:55:36 -07:00
refactor: consolidate is_in_ci (jit_kernel, sgl-kernel benchmarks, tests) (#21009)

cs-cat | 22e378af86 | 2026-03-20 09:28:54 +08:00
Fix result writer in tuning_block_wise_kernel.py, and add FP8 kernel config for L40 (#20368)
Signed-off-by: cs-cat <118669451+cs-cat@users.noreply.github.com>

Xiaoyu Zhang | 25e38216b6 | 2026-03-14 16:45:54 +08:00
[kernel slimming] Clean many useless sgl-kernel deprecated kernels (#20277)

Chongchong Tian | 70d4aabe42 | 2026-03-12 23:10:55 -07:00
Add CLI args to conveniently support tuning more models (#12922)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Mook | abc672e717 | 2026-03-12 04:04:30 +00:00
[Benchmark] use flashinfer bench_gpu_time instead of triton do_bench (#20305)

Yuan Luo | 751c454099 | 2026-03-10 17:12:58 +08:00
Add DeepSeek3.2 and GlmMoeDsa into moe tune (#18876)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

RoyWang | a1ef8e2cc0 | 2026-02-26 11:50:13 -08:00
[AMD] optimize Kimi K2.5 fused_moe_triton performance by tuning (#19228)

Hubert Lu | 17b0affbdf | 2026-02-24 23:11:55 -08:00
[AMD] Support --enable-aiter-allreduce-fusion on AMD GPUs (#13747)
Co-authored-by: yctseng0211 <yctseng@amd.com>

satyamk7054 | 355127c2e9 | 2026-02-17 17:25:37 -05:00
Fix benchmark_sglang_fused_moe_triton.py (#18940)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>

Zheng Li | 27c447653d | 2026-02-10 00:27:59 +08:00
model: support Qwen3.5 (#18489)
Co-authored-by: 瑀澈 <yuche.lz@alibaba-inc.com>

b8zhong | 22498e10c0 | 2026-01-31 15:56:26 +08:00
[Fix] Triton TP MoE Dpsk V3/Qwen3 Coder with SwapAB (#17965)

Yuan Luo | 7bb41989fa | 2026-01-26 22:44:13 +08:00
[1/N] Optimize All Reduce - Benchmark different AR operations (#13797)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

Julian Huang | db2425a00b | 2026-01-20 20:08:28 +08:00
[Fix]: correctly fetch ds32 config in tuning_fused_moe_triton (#17409)
Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com>

Mohammad Miadh Angkad | b0701f02b3 | 2026-01-16 17:48:36 -05:00
Fix benchmark import for should_use_tensor_core (#17232)

Yongfei Xu | 82a1b645ba | 2026-01-17 00:10:52 +08:00
[DeepSeek V3.1/V3.2] Optimize fused moe configs for H20 & H20-3E based on swapab (#17133)

roikoren755 | b021332339 | 2026-01-02 22:08:58 +08:00
[NemotronH] Add latent MoE support (#16227)
Signed-off-by: Roi Koren <roik@nvidia.com>

Xiaoyu Zhang | 03b835e7d1 | 2025-12-08 09:24:58 +08:00
Refactor tuning block wise kernel and opt Qwen/Qwen3-VL-32B-Instruct-FP8 (#14141)

Daniel Cámpora | 8428078436 | 2025-12-04 20:00:05 +08:00
Add Mistral Large 3 support. (#14213)
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

Uranus | 982db4ebac | 2025-12-01 11:33:18 +08:00
Feat: GLM-4.6 supports shared experts fusion (#13873)
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Co-authored-by: Kevin-XiongC <kevin_xiong1997@outlook.com>
Co-authored-by: Mingyi Jin <jinmingyi1998@sina.cn>

Xiaoyu Zhang | ecefc7904f | 2025-11-24 18:26:25 +08:00
[sgl-kernel Code Clean] Remove useless lightning_attention kernel (#13819)

roikoren755 | 1b48e1b974 | 2025-11-21 13:53:05 -08:00
Feat/nemotron nano v3 support (#12690)

Kaixi Hou | c3c4da71fb | 2025-11-19 19:35:00 -08:00
[NVIDIA] Add fp8 gemm benchmark on blackwell (#13528)

Junlin Zhou | 0779c3d148 | 2025-11-13 11:14:01 -08:00
docs: update fused MoE config path (#13211)

Shu Wang | 6664083522 | 2025-11-13 00:26:00 -08:00
Replace [silu_and_mul_]scaled_fp4_group_quant by Flashinfer equivalent (#12376)

Hubert Lu | e4b2937017 | 2025-11-12 21:53:44 -08:00
[AMD] Add AITER Custom All-Reduce (#13102)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: HaiShaw <hixiao@gmail.com>

Xiaoyu Zhang | f18ec927f3 | 2025-11-11 10:33:54 +08:00
fix tuning_fused_moe_triton_sep tool per_channel_quant bug (#13027)

Xiaoyu Zhang | fc84b0730c | 2025-11-06 20:54:42 +08:00
[Refactor] Refactor fused_moe_triton tuning tools: extract shared utils, add EP/MLLM support, reduce overhead (#12440)
Co-authored-by: xu-yfei <xu-yfei@users.noreply.github.com>
Co-authored-by: Yongfei Xu <xuyongfei.xyf@antgroup.com>

Yuan Luo | 819fc59123 | 2025-11-02 11:23:05 -08:00
Add prefix for torch symm mem (#12506)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

Xinyuan Tong | 82cfcd3bb8 | 2025-10-31 08:54:14 +08:00
[Refactor] tuning_fused_moe for MLLM and small refactor (#11224)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>

Chen1022 | 1ed1abfd45 | 2025-10-30 07:58:50 -07:00
feat: add EP support in tuning (#12012)

Xiaoyu Zhang | 04e5b6faa7 | 2025-10-30 07:12:06 -07:00
Revert "Triton fused_moe_kernel support ep moe tuning" (#12377)

Xiaoyu Zhang | 52694b60da | 2025-10-29 23:16:09 +08:00
Triton fused_moe_kernel support ep moe tuning (#12343)

Liana Koleva | 1357397a34 | 2025-10-29 16:12:25 +08:00
feat: preview filename from tuning_fused_moe_triton.py (#12276)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

Yongfei Xu | d2b8c4123e | 2025-10-28 14:26:17 +08:00
Opt fused triton moe: add tma for down proj kernel (#10567)
Co-authored-by: ybyang <10629930+whybeyoung@users.noreply.github.com>

Zhengyi Lai | 81fd2b0ee0 | 2025-10-22 21:20:54 -07:00
fix(deepep): resolve benchmark failure on 4×IB-card setup by aligning tuning config with DeepEP commit bdd119f8 (#11965)

Liangsheng Yin | 9d61205dac | 2025-10-22 11:32:50 +08:00
[lint] improve ruff check (#11922)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

Cheng Wan | 5b214b50b6 | 2025-10-17 18:57:54 -07:00
[Refactor] move deep_gemm_wrapper out of quantization (#11784)

Cheng Wan | 3c06b673af | 2025-10-07 21:51:41 -07:00
[8/N] MoE Refactor: deprecate EPMoE (#11211)

Yuan Luo | 4f42c8cd3e | 2025-10-07 14:31:11 +00:00
[sgl-kernel] Support float64 moe_sum_reduce cuda kernel (#11068)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

Yuan Luo | 590f2da052 | 2025-10-05 13:55:19 -07:00
[Feat] Support Torch Symm Mem AllReduce (#10571)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>