48 Commits

Author SHA1 Message Date
Liangsheng Yin
6cc2eee50d [misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305) 2026-04-20 21:16:24 -07:00
Mick
5de89ea942 [diffusion] CI: fix auto-partition (#23076) 2026-04-17 22:37:24 +08:00
Alex Nails
43eb66028f ci: install rust toolchain in ci_install_dependency.sh (#23017) 2026-04-16 23:18:22 -07:00
Bingxu Chen
7ac337df94 [AMD] CI Job Monitor: fix queue time, utilization, and summary metrics (#22274)
Co-authored-by: bingxche <binxche@amd.com>
2026-04-16 22:03:37 -07:00
Mick
80718492dd [diffusion] CI: reset thresholds (#22854) 2026-04-15 21:11:00 +08:00
Mick
e95c2e73bd [diffusion] CI: refactor diffusion ci and reduce redundancy (#22810) 2026-04-15 10:12:29 +08:00
Mick
c5e95080d2 [diffusion] model: support Ltx 2.3 two stage ti2v (#22667) 2026-04-14 22:10:08 +08:00
Jia Guo
bc16130a17 ci: skip full rerun when sgl-kernel wheel already built (#22534)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 20:32:55 -07:00
Mick
d524f110ac [diffusion] refactor: streamline denoising stages (#22633) 2026-04-13 13:34:37 +08:00
Prozac614
45472d70cc [diffusion] CI: dynamic load-balanced partitioning for diffusion CI (#15528)
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: SGLang CI <ci@sglang.ai>
2026-04-12 13:02:43 +08:00
Ratish P
cf5ad12612 [diffusion][CI]: route multimodal component accuracy through run_suite (#21960) 2026-04-10 23:06:03 +08:00
Mick
efee62efa6 [diffusion] CI: improve diffusion comparison benchmark setting for realistic perf and auto-discover ut (#22086)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 23:20:37 +08:00
Xiaoyu Zhang
da25b471e3 Align diffusion nightly presets and broaden skill discovery (#22099) 2026-04-04 21:43:52 +08:00
Prozac614
db3d4f4b76 [diffusion] model: support two stage pipeline of LTX-2 (#20707)
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: GMI Xiao Jin <xiao.j@gmicloud.ai>
2026-04-04 09:37:28 +08:00
Liangsheng Yin
5118295f7b [CI] Support CPU stage and auto-batch same-stage files in /rerun-test (#22081) 2026-04-03 15:56:54 -07:00
Mick
838f815e9f [diffusion] CI: temporarily disable accuracy ci (#22031) 2026-04-03 17:39:29 +08:00
Prozac614
24997fe42c [diffusion] CI: add initial nvfp4 ci test for b200 (#21767)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-02 11:31:08 +08:00
Ratish P
4f5b55e379 [diffusion][CI]: Add individual component accuracy CI for diffusion models (#18709)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-04-01 21:51:36 +08:00
Ke Bao
acd37d8701 [CI] Fix rerun-test suite detection to skip commented registrations (#21753) 2026-03-31 18:00:53 +08:00
Ke Bao
2456889f98 Rename rerun-ut to rerun-test (#21747) 2026-03-31 17:31:55 +08:00
Mick
db5d9eb8ce [diffusion] CI: fix dashboard chart (nightly) display issues (#21653)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-30 12:02:01 +08:00
Bingxu Chen
6047d2c690 [AMD] Fix AMD CI monitor GitHub API rate limit exhaustion (#21527) 2026-03-27 02:55:56 -07:00
Liangsheng Yin
e1ee68d0fc Release mm features on session close and support multiple /rerun-ut specs (#21501) 2026-03-26 18:31:29 -07:00
Liangsheng Yin
9dc266adb4 Fix concurrent /rerun-ut posting duplicate workflow URLs (#21495) 2026-03-26 16:26:00 -07:00
Mick
238a4b8f8f [diffusion] CI: fix breaking import path in nightly (#21449) 2026-03-26 16:33:22 +08:00
Mick
04eb72801f [diffusion] CI: add performance tracking job to nightly (#21091) 2026-03-25 19:01:33 +08:00
Lianmin Zheng
814202704b ci: unify PR test suite naming (#21187) 2026-03-23 00:18:45 -07:00
Liangsheng Yin
d9f5c2179c ci(slash-cmd): allow write-permission users to /rerun-ut on fork PRs (#21121) 2026-03-22 00:45:48 -07:00
Liangsheng Yin
1e97864d75 ci(slash-cmd): allow repo write-permission users to /rerun-ut (#21120) 2026-03-22 00:32:29 -07:00
Alison Shao
b7a1ae4fac Fix /rerun-stage dispatch failure for non-AMD stages (#21076)
Co-authored-by: Alison Shao <alison.shao@Mac.attlocal.net>
2026-03-20 23:48:29 -07:00
Lianmin Zheng
50404e0d1f ci: refactor CUDA dependency install script (#21017) 2026-03-20 21:36:58 -07:00
Lianmin Zheng
2d7a262ca3 ci: rename 1/2-gpu-runner labels to 1/2-gpu-h100 (#21008) 2026-03-20 06:04:15 -07:00
Lianmin Zheng
c1da420799 ci: run Stage A CUDA tests as stage-a-test-small-1-gpu on 5090 (#20988) 2026-03-20 02:55:16 -07:00
Lianmin Zheng
712a48c5d2 ci: move metrics scripts under scripts/ci/utils (#20986) 2026-03-19 23:47:57 -07:00
Yuhao Yang
2ccdb7373e [diffusion] CI: fix consistency test workflow (#20704) 2026-03-17 07:42:30 +08:00
Liangsheng Yin
5f1bfb0d28 [Security] Fix /rerun-ut bypassing run-ci gate for fork PRs (#20424)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-12 02:24:29 -07:00
Liangsheng Yin
c7ffbf25e9 [CI] Fix rerun-ut workflow: add DeepEP install, RDMA env, Blackwell detection (#19803) 2026-03-03 15:17:16 -08:00
Liangsheng Yin
1135e214b3 [CI] support /rerun-ut command in slash handler (#19800) 2026-03-03 14:10:49 -08:00
Michael
1b79934d34 [AMD] Fix AMD CI test of TestToolChoiceLfm2Moe (#19113)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Co-authored-by: yctseng0211 <yctseng@amd.com>
2026-02-27 10:18:15 -08:00
Alison Shao
2c856c6d27 Allow PR authors to use /rerun-failed-ci on their own PRs (#19496)
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
2026-02-27 10:14:57 -08:00
Kangyan-Zhou
eccf875d49 [CI] Revive 8-GPU trace upload in nightly test workflow (#18820)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 08:37:08 +08:00
Ke Bao
a6c4b52ac5 Cleanup unused rerun stages (#18788) 2026-02-13 17:44:42 +08:00
Alison Shao
bedade1ef0 Merge stage-c-test-large-4-gpu suites into partitioned suites (#18325) 2026-02-06 15:32:33 -08:00
Alison Shao
a0bae4c343 Migrate 4-GPU/8-GPU workflow jobs to stage-c and add CI registry decorators (#17299) 2026-01-31 22:37:22 -08:00
Kangyan-Zhou
2cd2c3118d Add concurrency tracking to runner utilization report (#17963)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 17:31:55 -08:00
Alison Shao
1f75c2af4d Fix /tag-and-rerun-ci to do full rerun when PR has sgl-kernel changes (#17729) 2026-01-29 12:54:30 -08:00
YC Tseng
52bca42870 [AMD] CI - enable deepseekv3.2 on MI325-8gpu and merge perf/accuracy test suites into stage-b suites (#17633)
Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com>
2026-01-27 18:54:36 -08:00
Makcum888e
d1042e0d62 [Refactore] [CI] Remove redundant CI test runs step 2 (#17584) 2026-01-24 23:39:48 -08:00