Liangsheng Yin
6cc2eee50d
[misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305)
2026-04-20 21:16:24 -07:00
Mick
5de89ea942
[diffusion] CI: fix auto-partition (#23076)
2026-04-17 22:37:24 +08:00
Alex Nails
43eb66028f
ci: install rust toolchain in ci_install_dependency.sh (#23017)
2026-04-16 23:18:22 -07:00
Bingxu Chen
7ac337df94
[AMD] CI Job Monitor: fix queue time, utilization, and summary metrics (#22274)
Co-authored-by: bingxche <binxche@amd.com>
2026-04-16 22:03:37 -07:00
Mick
80718492dd
[diffusion] CI: reset thresholds (#22854)
2026-04-15 21:11:00 +08:00
Mick
e95c2e73bd
[diffusion] CI: refactor diffusion ci and reduce redundancy (#22810)
2026-04-15 10:12:29 +08:00
Mick
c5e95080d2
[diffusion] model: support Ltx 2.3 two stage ti2v (#22667)
2026-04-14 22:10:08 +08:00
Jia Guo
bc16130a17
ci: skip full rerun when sgl-kernel wheel already built (#22534)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 20:32:55 -07:00
Mick
d524f110ac
[diffusion] refactor: streamline denoising stages (#22633)
2026-04-13 13:34:37 +08:00
Prozac614
45472d70cc
[diffusion] CI: dynamic load-balanced partitioning for diffusion CI (#15528)
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: SGLang CI <ci@sglang.ai>
2026-04-12 13:02:43 +08:00
Ratish P
cf5ad12612
[diffusion][CI]: route multimodal component accuracy through run_suite (#21960)
2026-04-10 23:06:03 +08:00
Mick
efee62efa6
[diffusion] CI: improve diffusion comparison benchmark setting for realistic perf and auto-discover ut (#22086)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 23:20:37 +08:00
Xiaoyu Zhang
da25b471e3
Align diffusion nightly presets and broaden skill discovery (#22099)
2026-04-04 21:43:52 +08:00
Prozac614
db3d4f4b76
[diffusion] model: support two stage pipeline of LTX-2 (#20707)
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: GMI Xiao Jin <xiao.j@gmicloud.ai>
2026-04-04 09:37:28 +08:00
Liangsheng Yin
5118295f7b
[CI] Support CPU stage and auto-batch same-stage files in /rerun-test (#22081)
2026-04-03 15:56:54 -07:00
Mick
838f815e9f
[diffusion] CI: temporarily disable accuracy ci (#22031)
2026-04-03 17:39:29 +08:00
Prozac614
24997fe42c
[diffusion] CI: add initial nvfp4 ci test for b200 (#21767)
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-02 11:31:08 +08:00
Ratish P
4f5b55e379
[diffusion][CI]: Add individual component accuracy CI for diffusion models (#18709)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-04-01 21:51:36 +08:00
Ke Bao
acd37d8701
[CI] Fix rerun-test suite detection to skip commented registrations (#21753)
2026-03-31 18:00:53 +08:00
Ke Bao
2456889f98
Rename rerun-ut to rerun-test (#21747)
2026-03-31 17:31:55 +08:00
Mick
db5d9eb8ce
[diffusion] CI: fix dashboard chart (nightly) display issues (#21653)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-30 12:02:01 +08:00
Bingxu Chen
6047d2c690
[AMD] Fix AMD CI monitor GitHub API rate limit exhaustion (#21527)
2026-03-27 02:55:56 -07:00
Liangsheng Yin
e1ee68d0fc
Release mm features on session close and support multiple /rerun-ut specs (#21501)
2026-03-26 18:31:29 -07:00
Liangsheng Yin
9dc266adb4
Fix concurrent /rerun-ut posting duplicate workflow URLs (#21495)
2026-03-26 16:26:00 -07:00
Mick
238a4b8f8f
[diffusion] CI: fix breaking import path in nightly (#21449)
2026-03-26 16:33:22 +08:00
Mick
04eb72801f
[diffusion] CI: add performance tracking job to nightly (#21091)
2026-03-25 19:01:33 +08:00
Lianmin Zheng
814202704b
ci: unify PR test suite naming (#21187)
2026-03-23 00:18:45 -07:00
Liangsheng Yin
d9f5c2179c
ci(slash-cmd): allow write-permission users to /rerun-ut on fork PRs (#21121)
2026-03-22 00:45:48 -07:00
Liangsheng Yin
1e97864d75
ci(slash-cmd): allow repo write-permission users to /rerun-ut (#21120)
2026-03-22 00:32:29 -07:00
Alison Shao
b7a1ae4fac
Fix /rerun-stage dispatch failure for non-AMD stages (#21076)
Co-authored-by: Alison Shao <alison.shao@Mac.attlocal.net>
2026-03-20 23:48:29 -07:00
Lianmin Zheng
50404e0d1f
ci: refactor CUDA dependency install script (#21017)
2026-03-20 21:36:58 -07:00
Lianmin Zheng
2d7a262ca3
ci: rename 1/2-gpu-runner labels to 1/2-gpu-h100 (#21008)
2026-03-20 06:04:15 -07:00
Lianmin Zheng
c1da420799
ci: run Stage A CUDA tests as stage-a-test-small-1-gpu on 5090 (#20988)
2026-03-20 02:55:16 -07:00
Lianmin Zheng
712a48c5d2
ci: move metrics scripts under scripts/ci/utils (#20986)
2026-03-19 23:47:57 -07:00
Yuhao Yang
2ccdb7373e
[diffusion] CI: fix consistency test workflow (#20704)
2026-03-17 07:42:30 +08:00
Liangsheng Yin
5f1bfb0d28
[Security] Fix /rerun-ut bypassing run-ci gate for fork PRs (#20424)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-12 02:24:29 -07:00
Liangsheng Yin
c7ffbf25e9
[CI] Fix rerun-ut workflow: add DeepEP install, RDMA env, Blackwell detection (#19803)
2026-03-03 15:17:16 -08:00
Liangsheng Yin
1135e214b3
[CI] support /rerun-ut command in slash handler (#19800)
2026-03-03 14:10:49 -08:00
Michael
1b79934d34
[AMD] Fix AMD CI test of TestToolChoiceLfm2Moe (#19113)
Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Co-authored-by: yctseng0211 <yctseng@amd.com>
2026-02-27 10:18:15 -08:00
Alison Shao
2c856c6d27
Allow PR authors to use /rerun-failed-ci on their own PRs (#19496)
Co-authored-by: Alison Shao <alisonshao@MacBook-Pro-D2W773R9CD.local>
2026-02-27 10:14:57 -08:00
Kangyan-Zhou
eccf875d49
[CI] Revive 8-GPU trace upload in nightly test workflow (#18820)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 08:37:08 +08:00
Ke Bao
a6c4b52ac5
Cleanup unused rerun stages (#18788)
2026-02-13 17:44:42 +08:00
Alison Shao
bedade1ef0
Merge stage-c-test-large-4-gpu suites into partitioned suites (#18325)
2026-02-06 15:32:33 -08:00
Alison Shao
a0bae4c343
Migrate 4-GPU/8-GPU workflow jobs to stage-c and add CI registry decorators (#17299)
2026-01-31 22:37:22 -08:00
Kangyan-Zhou
2cd2c3118d
Add concurrency tracking to runner utilization report (#17963)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 17:31:55 -08:00
Alison Shao
1f75c2af4d
Fix /tag-and-rerun-ci to do full rerun when PR has sgl-kernel changes (#17729)
2026-01-29 12:54:30 -08:00
YC Tseng
52bca42870
[AMD] CI - enable deepseekv3.2 on MI325-8gpu and merge perf/accuracy test suites into stage-b suites (#17633)
Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com>
2026-01-27 18:54:36 -08:00
Makcum888e
d1042e0d62
[Refactore] [CI] Remove redundant CI test runs step 2 (#17584)
2026-01-24 23:39:48 -08:00