Commit Graph

1349 Commits

Author SHA1 Message Date
Liangsheng Yin
6cc2eee50d [misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305) 2026-04-20 21:16:24 -07:00
Alison Shao
6b19e8a452 ci: reduce scheduled PR test from 4x to 3x daily (#23313) 2026-04-20 20:53:13 -07:00
Mingyi
712b01d875 Update CODEOWNERS to include new documentation paths for docs and doc… (#23293) 2026-04-20 16:48:41 -07:00
ishandhanani
90d527195b [CI] Fix nightly docker builds failing on root-owned workspace leftovers (#23279)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:56:33 -07:00
Kangyan-Zhou
4698f4cd10 [CI] Fix wait-for-jobs hanging when matrix job skipped at job level (#23277)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 11:16:10 -07:00
YC Yen-Ching Tseng
da62e90904 [AMD] Fix multimodal timeout issue: rocm7.2 PR Test (#23247) 2026-04-20 18:36:08 +08:00
YC Yen-Ching Tseng
cf4b84f839 [AMD] Update AMD workflow name (#23245)
Co-authored-by: bingxche <bingxche@amd.com>
2026-04-20 18:18:24 +08:00
Kangyan-Zhou
1ebe1c57ed [CI] Partition stage-a-test-cpu into 4 matrix shards (#23208)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 22:07:37 -07:00
Alex Nails
10e17cc55e [gRPC] Native gRPC server: proto + Rust crate scaffold + server args (#22736) 2026-04-20 12:39:35 +08:00
Kangyan-Zhou
1d252803f5 fix(ci): repair path filters regressed by #21482 (#23201)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 20:34:57 -07:00
Thomas
3063d640dd [CI] Exclude diffusion-specific paths from main_package filter (#23053)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2026-04-20 10:43:44 +08:00
Cheng Wan
ebcc2b3eec ci: run weekly est_time update on Monday using p90 of last 15 runs (#23120)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 14:39:27 -07:00
YC Yen-Ching Tseng
32b7777f6c [AMD] Fix AMD multimodal-gen-test-2-gpu timeout by adding partition for standalone test (#23130) 2026-04-19 23:16:18 +08:00
Baizhou Zhang
6ecd6f84db [CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
2026-04-19 05:32:36 -07:00
Yanbin Jiang
fd7db0eace Update CI_PERMISSIONS (#23108) 2026-04-17 16:53:03 -07:00
Xiaoyu Zhang
83c5119d01 [diffusion] CI: fix ModelOpt B200 CI artifact coverage (#22955) 2026-04-17 23:33:42 +08:00
YC Yen-Ching Tseng
f399997d2f [AMD] mirror nightly images to local registry and prefer LAN pulls (#23073)
Co-authored-by: bingxche <bingxche@amd.com>
2026-04-17 19:49:26 +08:00
YC Yen-Ching Tseng
8c13295842 [AMD] fix AMD CI gate (#22974)
Co-authored-by: bingxche <bingxche@amd.com>
2026-04-17 18:32:26 +08:00
Alison Shao
0052093178 test(4-gpu-b200): split test_qwen35_models.py + bump partitions 5→6 (#22913) 2026-04-16 18:51:59 -07:00
Makcum888e
e353630b57 [Diffusion] [NPU] Fix multimodal gen CI (#22879) 2026-04-17 04:09:44 +03:00
Qiaolin Yu
12266cf953 [misc] update .github/CODEOWNERS (#22993) 2026-04-16 14:19:41 -07:00
ishandhanani
f61c332cba ci: log analyzer (#22859) 2026-04-15 14:10:00 -07:00
Mick
e95c2e73bd [diffusion] CI: refactor diffusion ci and reduce redundancy (#22810) 2026-04-15 10:12:29 +08:00
Michael
39c6bf730c [AMD][CI] Add GLM-5-MXFP4 accuracy and perf nightly tests for MI35x (#21773) 2026-04-14 18:55:36 -07:00
ishandhanani
2c9e76d333 ci: skip approval for nightly gb200 runs, keep for manual triggers (#22768) 2026-04-14 16:34:57 -07:00
Jimmy Shong
e83560562b Update CI Permissions (#22826) 2026-04-14 15:13:31 -07:00
Michael
eab045b2b7 [AMD] Add MiniMax-M2.7 accuracy and performance nightly tests (#22722)
Co-authored-by: HaiShaw <hixiao@gmail.com>
2026-04-14 00:30:11 -07:00
Sahithi Chigurupati
7c1bde2e38 [CI] Add optional image input to GB200 nightly workflow_dispatch (#22745)
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
2026-04-13 23:57:15 -07:00
YC Yen-Ching Tseng
d44eb16ac6 [AMD] Replace push trigger with scheduled runs and enable parallel stage execution (#22489)
Co-authored-by: bingxche <bingxche@amd.com>
2026-04-14 13:33:29 +08:00
Sahithi Chigurupati
ff61b2e470 [CI] Add workflow_dispatch and environment gate to GB200 nightly pipeline (#22733) 2026-04-13 17:08:18 -07:00
Baizhou Zhang
b441317aa4 Revert "Upgrade CI default CUDA version from 12.9 to 13.0" (#22727) 2026-04-13 14:39:24 -07:00
R0CKSTAR
f51ce2c92f Update CODEOWNERS for musa/mlx (#22593)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2026-04-13 12:41:07 -07:00
Alison Shao
3f4fbc165d Upgrade CI default CUDA version from 12.9 to 13.0 (#21441) 2026-04-12 21:48:40 -07:00
Prozac614
45472d70cc [diffusion] CI: dynamic load-balanced partitioning for diffusion CI (#15528)
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: SGLang CI <ci@sglang.ai>
2026-04-12 13:02:43 +08:00
Alison Shao
870a21bf39 [CI] Remove Slack bot from CI failure monitor (#21581)
Co-authored-by: Alison Shao <alison.shao@Mac.attlocal.net>
2026-04-11 20:34:48 -07:00
Baizhou Zhang
2e70e4f4f6 [CI] Little renaming of gb200 CI workflow (#22608) 2026-04-11 17:52:42 -07:00
YC Yen-Ching Tseng
3ce72252de [AMD] Fix Timeout: stage-b-test-2-gpu-large-amd,stage-b-test-1-gpu-large-amd (#22228)
Co-authored-by: HAI <hixiao@gmail.com>
2026-04-10 22:55:44 -07:00
Qiaolin Yu
f41c810a2d [misc] update CI_PERMISSIONS.json (#22570) 2026-04-10 18:58:55 -07:00
Sahithi Chigurupati
451320596f [CI] Add GB200 nightly perf regression pipeline (#22461) 2026-04-10 15:12:24 -07:00
Cheng Wan
3f39b3d811 feat: add weekly workflow to update CI test est_time values (#22545)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-10 15:03:37 -07:00
satyamk7054
6d8330bdb7 Update CI_PERMISSIONS.json (#22465)
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
2026-04-10 13:43:50 -07:00
Ratish P
cf5ad12612 [diffusion][CI]: route multimodal component accuracy through run_suite (#21960) 2026-04-10 23:06:03 +08:00
Alison Shao
b853e2c41b [CI] Remove Slack notification from ci-auto-bisect workflow (#22483)
Co-authored-by: Alison Shao <alison.shao@Mac.lan>
2026-04-09 20:32:09 -07:00
ishandhanani
3aaaf53f59 [Docker] Fix CI docker target after Dockerfile restructure (#22478) 2026-04-09 18:53:42 -07:00
Sundara Raman Ramachandran
a64905a7b8 [CICD] [prefill-only] Consolidate prefill-only model E2E tests (#22405) 2026-04-09 00:54:34 -07:00
Michael
ef6bfc1197 [AMD] Add GLM-5.1-FP8 nightly accuracy and performance benchmarks for MI30x and MI35x (#22336) 2026-04-08 22:57:43 -07:00
Liangsheng Yin
edfddda192 Move runai model loader test to nightly suite (#22418) 2026-04-08 21:39:32 -07:00
jsheng_Linkedin
6838a23226 [Feature] Add token embedding overrides for sparse embedding replacement (#20960)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 20:51:36 -07:00
tfhddd
c431b11d8b [CI] Use UV to improve pip install speed (#22029) 2026-04-09 09:18:32 +08:00
Liangsheng Yin
2c4e113dd7 [CI] Fast-fail on lint check failure in check-stage-health (#22400) 2026-04-08 17:17:07 -07:00