Yuhao Yang
2f06867128
Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (#24775)
...
Co-authored-by: Cheng Wan <chwan@rice.edu>
Co-authored-by: Chunan Zeng <zcnrex@gmail.com>
2026-05-10 19:03:37 +08:00
Baizhou Zhang
ef5e9f8aba
[DSV4] Cherry pick missing commits from deepseek_v4 branch and enhance tests (#24793)
...
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
2026-05-09 04:15:37 -07:00
Liangsheng Yin
ba625d5290
slash command rerun UX: emoji semantics + result writeback (#24802)
2026-05-09 03:19:24 -07:00
Alison Shao
aefd8e257f
Re-land #23109: rebase-required mode + fix for grep-no-match abort (#24180)
2026-05-08 15:28:57 -07:00
Alison Shao
094b90b1ec
ci: drop 1-gpu-h100-h200 shared label (#24495)
2026-05-06 01:02:31 -07:00
Kangyan-Zhou
52b4609789
[Docker] Prep for torch 2.11: cu129 fix, image validator, dep cleanup (#23593)
...
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
2026-05-04 00:37:55 -07:00
Liangsheng Yin
53df43d0a3
rerun-test: route deepep h200 suite to deepep runner (#24325)
2026-05-03 15:57:53 -07:00
Mick
5925572c95
[diffusion] CI: switch CI data references to sgl-project/ci-data (#24299)
2026-05-03 23:05:12 +08:00
Mick
b7d4647568
[diffusion] CI: change ground truth repo (#24219)
2026-05-01 21:25:40 -07:00
Alec
9d95783603
Add Docker image provenance metadata (#24090)
...
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
2026-04-30 21:40:42 -07:00
Alison Shao
694ef516cb
Revert "[ci] split stage-c-test-4-gpu-b200 to enable a low-disk runner pool" (#24163)
...
Co-authored-by: Alison Shao <alisonshao@radixark.ai>
2026-04-30 15:57:19 -07:00
Alison Shao
7bb7f6049a
ci: add per-host utilization view to runner-utilization report (#24102)
2026-04-30 10:05:16 -07:00
Mick
0b1fbdba15
[diffusion] CI: change ground truth upload path and improve publish script (#24120)
2026-04-30 12:26:10 +08:00
Alex Nails
c3ab5bec7d
ci: consolidate rust + protoc install across workflows (#23700)
...
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 13:39:02 -07:00
Kangyan-Zhou
d4040e7010
[CI] Broaden stage-b-test-1-gpu-large runner pool to H100 + H200 (#24080)
...
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:18:10 -07:00
shuwenn
03147f66b8
ci: add /rerun-group to rerun all registered tests in a group (#24023)
...
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 10:24:16 -07:00
Mick
144038fbae
[diffusion] chore: change default seed to 42 (#23836)
2026-04-28 20:39:23 +08:00
Mick
c0166355ae
[diffusion] CI: minor refactor CI (#23576)
2026-04-24 08:48:31 +08:00
Kangyan-Zhou
c689f774a4
[CI] /rerun-stage: fix workflow-run URL lookup for sgl-kernel PRs (#23510)
...
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:48:38 -07:00
Kangyan-Zhou
14ac14287c
[CI] /rerun-stage: auto-include wheel build when PR modifies sgl-kernel/ (#23492)
...
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 11:28:06 -07:00
Jia Guo
286fba2073
ci: use rerun_failed_jobs for skipped workflows in /rerun-failed-ci (#23008)
...
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 23:59:15 -07:00
Alison Shao
04b1caf75b
ci: enable /rerun-test for multimodal gen PR tests (#22828)
2026-04-21 21:34:14 -07:00
Yuhao Yang
f41f1a74a4
[diffusion] chore: support custom output folder name in GT generation workflow (#23422)
2026-04-22 11:18:21 +08:00
Kangyan-Zhou
77fd86f89e
[ci] split stage-c-test-4-gpu-b200 to enable a low-disk runner pool (#23417)
...
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:33:33 -07:00
Alison Shao
0e165ffbfc
ci: enable /rerun-test for nightly test suites (#22830)
2026-04-21 18:28:10 -07:00
Liangsheng Yin
6cc2eee50d
[misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305)
2026-04-20 21:16:24 -07:00
Mick
5de89ea942
[diffusion] CI: fix auto-partition (#23076)
2026-04-17 22:37:24 +08:00
Alex Nails
43eb66028f
ci: install rust toolchain in ci_install_dependency.sh (#23017)
2026-04-16 23:18:22 -07:00
Bingxu Chen
7ac337df94
[AMD] CI Job Monitor: fix queue time, utilization, and summary metrics (#22274)
...
Co-authored-by: bingxche <binxche@amd.com>
2026-04-16 22:03:37 -07:00
Mick
80718492dd
[diffusion] CI: reset thresholds (#22854)
2026-04-15 21:11:00 +08:00
Mick
e95c2e73bd
[diffusion] CI: refactor diffusion ci and reduce redundancy (#22810)
2026-04-15 10:12:29 +08:00
Mick
c5e95080d2
[diffusion] model: support Ltx 2.3 two stage ti2v (#22667)
2026-04-14 22:10:08 +08:00
Jia Guo
bc16130a17
ci: skip full rerun when sgl-kernel wheel already built (#22534)
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 20:32:55 -07:00
Mick
d524f110ac
[diffusion] refactor: streamline denoising stages (#22633)
2026-04-13 13:34:37 +08:00
Prozac614
45472d70cc
[diffusion] CI: dynamic load-balanced partitioning for diffusion CI (#15528)
...
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: SGLang CI <ci@sglang.ai>
2026-04-12 13:02:43 +08:00
Ratish P
cf5ad12612
[diffusion][CI]: route multimodal component accuracy through run_suite (#21960)
2026-04-10 23:06:03 +08:00
Mick
efee62efa6
[diffusion] CI: improve diffusion comparison benchmark setting for realistic perf and auto-discover ut (#22086)
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 23:20:37 +08:00
Xiaoyu Zhang
da25b471e3
Align diffusion nightly presets and broaden skill discovery (#22099)
2026-04-04 21:43:52 +08:00
Prozac614
db3d4f4b76
[diffusion] model: support two stage pipeline of LTX-2 (#20707)
...
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: GMI Xiao Jin <xiao.j@gmicloud.ai>
2026-04-04 09:37:28 +08:00
Liangsheng Yin
5118295f7b
[CI] Support CPU stage and auto-batch same-stage files in /rerun-test (#22081)
2026-04-03 15:56:54 -07:00
Mick
838f815e9f
[diffusion] CI: temporarily disable accuracy ci (#22031)
2026-04-03 17:39:29 +08:00
Prozac614
24997fe42c
[diffusion] CI: add initial nvfp4 ci test for b200 (#21767)
...
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-02 11:31:08 +08:00
Ratish P
4f5b55e379
[diffusion][CI]: Add individual component accuracy CI for diffusion models (#18709)
...
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
2026-04-01 21:51:36 +08:00
Ke Bao
acd37d8701
[CI] Fix rerun-test suite detection to skip commented registrations (#21753)
2026-03-31 18:00:53 +08:00
Ke Bao
2456889f98
Rename rerun-ut to rerun-test (#21747)
2026-03-31 17:31:55 +08:00
Mick
db5d9eb8ce
[diffusion] CI: fix dashboard chart (nightly) display issues (#21653)
...
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-30 12:02:01 +08:00
Bingxu Chen
6047d2c690
[AMD] Fix AMD CI monitor GitHub API rate limit exhaustion (#21527)
2026-03-27 02:55:56 -07:00
Liangsheng Yin
e1ee68d0fc
Release mm features on session close and support multiple /rerun-ut specs (#21501)
2026-03-26 18:31:29 -07:00
Liangsheng Yin
9dc266adb4
Fix concurrent /rerun-ut posting duplicate workflow URLs (#21495)
2026-03-26 16:26:00 -07:00
Mick
238a4b8f8f
[diffusion] CI: fix breaking import path in nightly (#21449)
2026-03-26 16:33:22 +08:00