Liangsheng Yin
|
6cc2eee50d
|
[misc] CI hygiene: enforce __main__ entry, drop silent-skipped tests, fix rerun-test protoc (#23305)
|
2026-04-20 21:16:24 -07:00 |
|
Cheng Wan
|
ebcc2b3eec
|
ci: run weekly est_time update on Monday using p90 of last 15 runs (#23120)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-04-19 14:39:27 -07:00 |
|
Baizhou Zhang
|
6ecd6f84db
|
[CI] Add per-job uv venv isolation and upgrade CI version to Cuda 13 (#23119)
Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-19 05:32:36 -07:00 |
|
Mick
|
5de89ea942
|
[diffusion] CI: fix auto-partition (#23076)
|
2026-04-17 22:37:24 +08:00 |
|
YC Yen-Ching Tseng
|
f399997d2f
|
[AMD] mirror nightly images to local registry and prefer LAN pulls (#23073)
Co-authored-by: bingxche <bingxche@amd.com>
|
2026-04-17 19:49:26 +08:00 |
|
Alex Nails
|
43eb66028f
|
ci: install rust toolchain in ci_install_dependency.sh (#23017)
|
2026-04-16 23:18:22 -07:00 |
|
Bingxu Chen
|
7ac337df94
|
[AMD] CI Job Monitor: fix queue time, utilization, and summary metrics (#22274)
Co-authored-by: bingxche <binxche@amd.com>
|
2026-04-16 22:03:37 -07:00 |
|
ishandhanani
|
761259448d
|
ci: re-enable fp8 nightly benchmark configs (#22910)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-15 15:57:49 -07:00 |
|
ishandhanani
|
2b0f349927
|
ci: clarify srt-slurm issue filing for incompatible flag combos (#22903)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-15 15:02:26 -07:00 |
|
ishandhanani
|
9497001b0c
|
ci: add issue filing and suspect PR identification to log analyzer (#22899)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-15 14:27:14 -07:00 |
|
ishandhanani
|
f61c332cba
|
ci: log analyzer (#22859)
|
2026-04-15 14:10:00 -07:00 |
|
Mick
|
80718492dd
|
[diffusion] CI: reset thresholds (#22854)
|
2026-04-15 21:11:00 +08:00 |
|
Mick
|
e95c2e73bd
|
[diffusion] CI: refactor diffusion ci and reduce redundancy (#22810)
|
2026-04-15 10:12:29 +08:00 |
|
Mick
|
c5e95080d2
|
[diffusion] model: support Ltx 2.3 two stage ti2v (#22667)
|
2026-04-14 22:10:08 +08:00 |
|
Baizhou Zhang
|
8fe9bbffb6
|
[CI] Reinstall flashinfer-jit-cache on CUDA version mismatch (#22741)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-13 23:04:23 -07:00 |
|
Jia Guo
|
bc16130a17
|
ci: skip full rerun when sgl-kernel wheel already built (#22534)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-13 20:32:55 -07:00 |
|
Baizhou Zhang
|
b441317aa4
|
Revert "Upgrade CI default CUDA version from 12.9 to 13.0" (#22727)
|
2026-04-13 14:39:24 -07:00 |
|
Mick
|
d524f110ac
|
[diffusion] refactor: streamline denoising stages (#22633)
|
2026-04-13 13:34:37 +08:00 |
|
Alison Shao
|
3f4fbc165d
|
Upgrade CI default CUDA version from 12.9 to 13.0 (#21441)
|
2026-04-12 21:48:40 -07:00 |
|
Mohammad Miadh Angkad
|
701a0e0c25
|
[CI/Docker] Clean up redundant flashinfer cubin downloads (#22491)
|
2026-04-12 12:30:41 -07:00 |
|
Prozac614
|
45472d70cc
|
[diffusion] CI: dynamic load-balanced partitioning for diffusion CI (#15528)
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: SGLang CI <ci@sglang.ai>
|
2026-04-12 13:02:43 +08:00 |
|
Alison Shao
|
f21d23a211
|
ci: use local NVIDIA wheels to avoid re-downloading ~2GB every CI run (#22602)
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
|
2026-04-11 21:32:51 -07:00 |
|
Alison Shao
|
870a21bf39
|
[CI] Remove Slack bot from CI failure monitor (#21581)
Co-authored-by: Alison Shao <alison.shao@Mac.attlocal.net>
|
2026-04-11 20:34:48 -07:00 |
|
Bingxu Chen
|
213027951a
|
[AMD] Upgrade Aiter (#22264)
|
2026-04-10 18:40:43 -07:00 |
|
Cheng Wan
|
b5e4ae7b1a
|
fix: match est_time updates by backend, not just suite (#22563)
|
2026-04-10 17:54:50 -07:00 |
|
Cheng Wan
|
0011d2aec0
|
fix: track est_time per suite instead of per backend (#22557)
|
2026-04-10 16:58:40 -07:00 |
|
Sahithi Chigurupati
|
451320596f
|
[CI] Add GB200 nightly perf regression pipeline (#22461)
|
2026-04-10 15:12:24 -07:00 |
|
Cheng Wan
|
3f39b3d811
|
feat: add weekly workflow to update CI test est_time values (#22545)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-10 15:03:37 -07:00 |
|
Ratish P
|
cf5ad12612
|
[diffusion][CI]: route multimodal component accuracy through run_suite (#21960)
|
2026-04-10 23:06:03 +08:00 |
|
tfhddd
|
c431b11d8b
|
[CI] Use UV to improve pip install speed (#22029)
|
2026-04-09 09:18:32 +08:00 |
|
Alison Shao
|
e41647f52b
|
[CI] Add pre-commit hook to validate test/registered/ files have CI registry (#22308)
Co-authored-by: Alison Shao <alison.shao@Mac.attlocal.net>
Co-authored-by: Alison Shao <alison.shao@MacBook-Pro-D2W773R9CD.local>
|
2026-04-08 15:59:15 -07:00 |
|
Rain Jiang
|
1a8eb890f6
|
Kernels community fa3 (#20796)
|
2026-04-07 12:48:44 -07:00 |
|
Kangyan-Zhou
|
596c34ee04
|
Update ci_auto_bisect.py to have streak 1 so that all failures will b… (#22161)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-05 10:39:19 -07:00 |
|
Kangyan-Zhou
|
edee9ae929
|
Update ci_auto_bisect.py to use correct model (#22142)
|
2026-04-04 23:57:52 -07:00 |
|
Kangyan-Zhou
|
8cbeacd783
|
feat: CI auto-bisect workflow for automated regression analysis (#22119)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-04 18:58:18 -07:00 |
|
Mick
|
efee62efa6
|
[diffusion] CI: improve diffusion comparison benchmark setting for realistic perf and auto-discover ut (#22086)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-04 23:20:37 +08:00 |
|
Xiaoyu Zhang
|
da25b471e3
|
Align diffusion nightly presets and broaden skill discovery (#22099)
|
2026-04-04 21:43:52 +08:00 |
|
Prozac614
|
db3d4f4b76
|
[diffusion] model: support two stage pipeline of LTX-2 (#20707)
Co-authored-by: daiweitao <dwti614707404@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: GMI Xiao Jin <xiao.j@gmicloud.ai>
|
2026-04-04 09:37:28 +08:00 |
|
Liangsheng Yin
|
5118295f7b
|
[CI] Support CPU stage and auto-batch same-stage files in /rerun-test (#22081)
|
2026-04-03 15:56:54 -07:00 |
|
Mick
|
838f815e9f
|
[diffusion] CI: temporarily disable accuracy ci (#22031)
|
2026-04-03 17:39:29 +08:00 |
|
Duyi-Wang
|
ac593fed90
|
[AMD][Dockerfile] Support build-arg AITER_COMMIT for rocm.Dockerfile (#21949)
|
2026-04-03 01:54:28 -07:00 |
|
monkeyLoveding
|
658a2813d8
|
[NPU] Update CI Dependency (#21578)
|
2026-04-03 16:22:11 +08:00 |
|
Liangsheng Yin
|
4cc970290d
|
[CI] Fix duplicate job names that bypass branch protection (#22001)
|
2026-04-02 23:59:35 -07:00 |
|
Feng Su
|
8732b2e9c6
|
[CI] [Tracing] Add ci for tracing and fix bugs (#21740)
|
2026-04-02 10:50:50 -07:00 |
|
David Cheung
|
ed427e1299
|
Migrate all callers from /get_server_info to /server_info (#21463)
|
2026-04-01 21:17:50 -07:00 |
|
Prozac614
|
24997fe42c
|
[diffusion] CI: add initial nvfp4 ci test for b200 (#21767)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2026-04-02 11:31:08 +08:00 |
|
Shangming Cai
|
7004df6094
|
chore: bump mooncake version to 0.3.10.post1 (#21844)
|
2026-04-02 10:54:22 +08:00 |
|
Noa Neria
|
8d9145d97e
|
Direct model loading from object storage with Runai Model Streamer (#17948)
Signed-off-by: Noa Neria <noa@run.ai>
|
2026-04-01 18:41:22 -07:00 |
|
Ratish P
|
4f5b55e379
|
[diffusion][CI]: Add individual component accuracy CI for diffusion models (#18709)
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
|
2026-04-01 21:51:36 +08:00 |
|
Douglas Yang
|
1b45d81e91
|
fix: only showing recent runners from ci failure analysis (#21015)
|
2026-03-31 20:18:17 -07:00 |
|