43 Commits

Author SHA1 Message Date
Yuhao Yang
2f06867128 Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (#24775)
Co-authored-by: Cheng Wan <chwan@rice.edu>
Co-authored-by: Chunan Zeng <zcnrex@gmail.com>
2026-05-10 19:03:37 +08:00
Baizhou Zhang
ef5e9f8aba [DSV4] Cherry pick missing commits from deepseek_v4 branch and enhance tests (#24793)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
2026-05-09 04:15:37 -07:00
Liangsheng Yin
ba625d5290 slash command rerun UX: emoji semantics + result writeback (#24802) 2026-05-09 03:19:24 -07:00
Alison Shao
aefd8e257f Re-land #23109: rebase-required mode + fix for grep-no-match abort (#24180) 2026-05-08 15:28:57 -07:00
Alison Shao
094b90b1ec ci: drop 1-gpu-h100-h200 shared label (#24495) 2026-05-06 01:02:31 -07:00
Liangsheng Yin
53df43d0a3 rerun-test: route deepep h200 suite to deepep runner (#24325) 2026-05-03 15:57:53 -07:00
Alison Shao
694ef516cb Revert "[ci] split stage-c-test-4-gpu-b200 to enable a low-disk runner pool" (#24163)
Co-authored-by: Alison Shao <alisonshao@radixark.ai>
2026-04-30 15:57:19 -07:00
Kangyan-Zhou
d4040e7010 [CI] Broaden stage-b-test-1-gpu-large runner pool to H100 + H200 (#24080)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:18:10 -07:00
shuwenn
03147f66b8 ci: add /rerun-group to rerun all registered tests in a group (#24023)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:24:16 -07:00
Kangyan-Zhou
c689f774a4 [CI] /rerun-stage: fix workflow-run URL lookup for sgl-kernel PRs (#23510)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:48:38 -07:00
Kangyan-Zhou
14ac14287c [CI] /rerun-stage: auto-include wheel build when PR modifies sgl-kernel/ (#23492)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 11:28:06 -07:00
Jia Guo
286fba2073 ci: use rerun_failed_jobs for skipped workflows in /rerun-failed-ci (#23008)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 23:59:15 -07:00
Alison Shao
04b1caf75b ci: enable /rerun-test for multimodal gen PR tests (#22828) 2026-04-21 21:34:14 -07:00
Kangyan-Zhou
77fd86f89e [ci] split stage-c-test-4-gpu-b200 to enable a low-disk runner pool (#23417)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:33:33 -07:00
Alison Shao
0e165ffbfc ci: enable /rerun-test for nightly test suites (#22830) 2026-04-21 18:28:10 -07:00
Mick
e95c2e73bd [diffusion] CI: refactor diffusion ci and reduce redundancy (#22810) 2026-04-15 10:12:29 +08:00
Jia Guo
bc16130a17 ci: skip full rerun when sgl-kernel wheel already built (#22534)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 20:32:55 -07:00
Ratish P
cf5ad12612 [diffusion][CI]: route multimodal component accuracy through run_suite (#21960) 2026-04-10 23:06:03 +08:00
Liangsheng Yin
5118295f7b [CI] Support CPU stage and auto-batch same-stage files in /rerun-test (#22081) 2026-04-03 15:56:54 -07:00
Mick
838f815e9f [diffusion] CI: temporarily disable accuracy ci (#22031) 2026-04-03 17:39:29 +08:00
Prozac614
24997fe42c [diffusion] CI: add initial nvfp4 ci test for b200 (#21767)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-02 11:31:08 +08:00
Ratish P
4f5b55e379 [diffusion][CI]: Add individual component accuracy CI for diffusion models (#18709)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-04-01 21:51:36 +08:00
Ke Bao
acd37d8701 [CI] Fix rerun-test suite detection to skip commented registrations (#21753) 2026-03-31 18:00:53 +08:00
Ke Bao
2456889f98 Rename rerun-ut to rerun-test (#21747) 2026-03-31 17:31:55 +08:00
Liangsheng Yin
e1ee68d0fc Release mm features on session close and support multiple /rerun-ut specs (#21501) 2026-03-26 18:31:29 -07:00
Liangsheng Yin
9dc266adb4 Fix concurrent /rerun-ut posting duplicate workflow URLs (#21495) 2026-03-26 16:26:00 -07:00
Lianmin Zheng
814202704b ci: unify PR test suite naming (#21187) 2026-03-23 00:18:45 -07:00
Liangsheng Yin
d9f5c2179c ci(slash-cmd): allow write-permission users to /rerun-ut on fork PRs (#21121) 2026-03-22 00:45:48 -07:00
Liangsheng Yin
1e97864d75 ci(slash-cmd): allow repo write-permission users to /rerun-ut (#21120) 2026-03-22 00:32:29 -07:00
Alison Shao
b7a1ae4fac Fix /rerun-stage dispatch failure for non-AMD stages (#21076)
Co-authored-by: Alison Shao <alison.shao@Mac.attlocal.net>
2026-03-20 23:48:29 -07:00
Lianmin Zheng
2d7a262ca3 ci: rename 1/2-gpu-runner labels to 1/2-gpu-h100 (#21008) 2026-03-20 06:04:15 -07:00
Lianmin Zheng
c1da420799 ci: run Stage A CUDA tests as stage-a-test-small-1-gpu on 5090 (#20988) 2026-03-20 02:55:16 -07:00
Liangsheng Yin
5f1bfb0d28 [Security] Fix /rerun-ut bypassing run-ci gate for fork PRs (#20424)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-12 02:24:29 -07:00
Liangsheng Yin
c7ffbf25e9 [CI] Fix rerun-ut workflow: add DeepEP install, RDMA env, Blackwell detection (#19803) 2026-03-03 15:17:16 -08:00
Liangsheng Yin
1135e214b3 [CI] support /rerun-ut command in slash handler (#19800) 2026-03-03 14:10:49 -08:00
Michael
1b79934d34 [AMD] Fix AMD CI test of TestToolChoiceLfm2Moe (#19113)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Co-authored-by: yctseng0211 <yctseng@amd.com>
2026-02-27 10:18:15 -08:00
Alison Shao
2c856c6d27 Allow PR authors to use /rerun-failed-ci on their own PRs (#19496)
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
2026-02-27 10:14:57 -08:00
Ke Bao
a6c4b52ac5 Cleanup unused rerun stages (#18788) 2026-02-13 17:44:42 +08:00
Alison Shao
bedade1ef0 Merge stage-c-test-large-4-gpu suites into partitioned suites (#18325) 2026-02-06 15:32:33 -08:00
Alison Shao
a0bae4c343 Migrate 4-GPU/8-GPU workflow jobs to stage-c and add CI registry decorators (#17299) 2026-01-31 22:37:22 -08:00
Alison Shao
1f75c2af4d Fix /tag-and-rerun-ci to do full rerun when PR has sgl-kernel changes (#17729) 2026-01-29 12:54:30 -08:00
YC Tseng
52bca42870 [AMD] CI - enable deepseekv3.2 on MI325-8gpu and merge perf/accuracy test suites into stage-b suites (#17633)
Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com>
2026-01-27 18:54:36 -08:00
Makcum888e
d1042e0d62 [Refactore] [CI] Remove redundant CI test runs step 2 (#17584) 2026-01-24 23:39:48 -08:00