Author | Commit | Message | Date |
Alison Shao | f9c3def7fe | Fix CI: add flashinfer --download-cubin to install dependencies (#18887); Co-authored-by: Liangsheng Yin <lsyincs@gmail.com> | 2026-02-16 13:50:10 -08:00 |
Douglas Yang | f1efb46bdd | fix: adding performance logging for nightly diffusion (#18023) | 2026-02-16 14:09:00 +08:00 |
SoluMilken | 07a24f1a38 | update pre-commit config (#18860) | 2026-02-16 00:18:31 +08:00 |
Kangyan-Zhou | eccf875d49 | [CI] Revive 8-GPU trace upload in nightly test workflow (#18820); Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> | 2026-02-14 08:37:08 +08:00 |
Mohammad Miadh Angkad | 1be41e9036 | [FlashInfer] Bump FlashInfer version from 0.6.2 to 0.6.3 (#18448) | 2026-02-14 07:43:33 +08:00 |
Ke Bao | a6c4b52ac5 | Cleanup unused rerun stages (#18788) | 2026-02-13 17:44:42 +08:00 |
Kangyan-Zhou | 1b8f68af57 | Fix B200 installation issue (#18725) | 2026-02-12 22:06:23 +08:00 |
Alison Shao | f20b1703ce | [CI] Fix torchaudio/torchvision CUDA version mismatch (#18211) | 2026-02-11 23:47:32 -08:00 |
YC Tseng | 20554a0a4f | [AMD] rocm 7.2 image release, PR test, Nightly Test (#17799); Co-authored-by: Alan Kao <akao@amd.com>; Co-authored-by: bingxche <Bingxu.Chen@amd.com>; Co-authored-by: Michael <13900043+michaelzhang-ai@users.noreply.github.com> | 2026-02-11 21:29:25 -08:00 |
Alison Shao | 7eaf866846 | [CI] Install python3-dev for Triton JIT compilation on fresh runners (#18644) | 2026-02-11 16:28:57 -08:00 |
Alison Shao | dcc63dc545 | [CI] Guard python3 call in install script for fresh runners (#18609) | 2026-02-12 00:05:29 +08:00 |
Bingxu Chen | 316f9cbb35 | [AMD] add amd ci monitor (#17476); Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>; Co-authored-by: YC Tseng <yctseng@amd.com> | 2026-02-09 09:04:54 -08:00 |
YC Tseng | 28717e3d28 | [AMD] CI - Fix AMD daily image release and install dependency (#18452); Co-authored-by: Bingxu Chen <bingxche@amd.com> | 2026-02-08 22:20:09 -08:00 |
Bingxu Chen | 3f3c201243 | [AMD] Update aiter to v0.1.10.post2 (#18423); Co-authored-by: kkHuang-amd <wunhuang@amd.com>; Co-authored-by: YC Tseng <yctseng@amd.com> | 2026-02-08 22:08:24 -08:00 |
Shangming Cai | 52401bec1d | chore: bump mooncake version to 0.3.9 (#18316); Signed-off-by: Shangming Cai <csmthu@gmail.com> | 2026-02-07 17:30:01 +08:00 |
Alison Shao | bedade1ef0 | Merge stage-c-test-large-4-gpu suites into partitioned suites (#18325) | 2026-02-06 15:32:33 -08:00 |
Zhaoyi Li | 8e933e1914 | AMD PD/D PR ci (#17183); Co-authored-by: YC Tseng <yctseng@amd.com>; Co-authored-by: Bingxu Chen <bingxche@amd.com>; Co-authored-by: bingxche <Bingxu.Chen@amd.com> | 2026-02-02 23:29:14 -08:00 |
sunxxuns | 47592a23c7 | [CI] Fix AMD CI by inlining dummy_grok config (#18044); Co-authored-by: root <root@mi300x8-005.atl1.do.cpe.ice.amd.com>; Co-authored-by: Cursor <cursoragent@cursor.com> | 2026-02-01 00:20:57 -08:00 |
Kangyan-Zhou | e5ac6229e1 | Fix installation script for H200 runners (#18050) | 2026-01-31 23:30:51 -08:00 |
Alison Shao | a0bae4c343 | Migrate 4-GPU/8-GPU workflow jobs to stage-c and add CI registry decorators (#17299) | 2026-01-31 22:37:22 -08:00 |
Kangyan-Zhou | 2cd2c3118d | Add concurrency tracking to runner utilization report (#17963); Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> | 2026-01-29 17:31:55 -08:00 |
Alison Shao | 1f75c2af4d | Fix /tag-and-rerun-ci to do full rerun when PR has sgl-kernel changes (#17729) | 2026-01-29 12:54:30 -08:00 |
Kangyan-Zhou | c0b4dd68a2 | Add a performance dashboard server and frontend for nightly CUDA tests (#17725) | 2026-01-27 22:22:33 -08:00 |
YC Tseng | 52bca42870 | [AMD] CI - enable deepseekv3.2 on MI325-8gpu and merge perf/accuracy test suites into stage-b suites (#17633); Co-authored-by: Bingxu Chen <Bingxu.Chen@amd.com> | 2026-01-27 18:54:36 -08:00 |
Hubert Lu | 93423ff780 | [AMD] Deprecate ROCm 6.3 artifacts and standardize gfx942 on ROCm 7 (#17785) | 2026-01-27 15:58:49 -08:00 |
monkeyLoveding | d578b41bad | [NPU] Adapt cann 8.5: use sfa and lightning indexer op from cann and CI update (#17615); Co-authored-by: Kelon <kelonlu@163.com> | 2026-01-27 19:03:53 +08:00 |
Makcum888e | bba6e38ff8 | [NPU] Split pyproject npu from pyproject other (#17641) | 2026-01-26 09:45:44 -08:00 |
shaharmor98 | f6f1b6d000 | Bump FI version (#17700); Signed-off-by: Shahar Mor <smor@nvidia.com>; Co-authored-by: b8zhong <b8zhong@uwaterloo.ca> | 2026-01-26 16:50:06 +08:00 |
Alison Shao | 7b22b8ff8a | Fix sgl-kernel install: fail instead of PyPI fallback when artifacts missing (#17728) | 2026-01-26 11:46:49 +08:00 |
Kangyan-Zhou | 344eeaee90 | Upload nightly test metrics to GH artifacts (#17696) | 2026-01-25 14:35:14 -08:00 |
Makcum888e | d1042e0d62 | [Refactore] [CI] Remove redundant CI test runs step 2 (#17584) | 2026-01-24 23:39:48 -08:00 |
Alison Shao | b23470e95a | Fix CI install failure when rerunning tests via workflow_dispatch (#17612) | 2026-01-23 00:04:16 -08:00 |
YC Tseng | 04a10c9bc2 | [AMD] CI - migrate perf test and fix stage-b-test-1-gpu-amd (#17340); Co-authored-by: Bingxu Chen <bingxche@amd.com>; Co-authored-by: bingxche <Bingxu.Chen@amd.com>; Co-authored-by: michaelzhang-ai <michaelzhang.ai@users.noreply.github.com> | 2026-01-22 18:45:05 -08:00 |
Michael | a3addd6203 | [AMD] Add DeepSeek-V3.2 and VLMs model in nightly tests (#17179); Co-authored-by: michaelzhang-ai <michaelzhang-ai@users.noreply.github.com>; Co-authored-by: YC Tseng <yctseng@amd.com>; Co-authored-by: Bingxu Chen <bingxche@amd.com> | 2026-01-19 20:31:56 -08:00 |
Alison Shao | fb88fb672e | fix(ci): rate limit and permission errors in trace publishing (#17238) | 2026-01-18 23:20:22 -08:00 |
Alison Shao | 7edb06158e | Add runner utilization report workflow (#17234) | 2026-01-17 19:28:05 -08:00 |
fzyzcjy | a7b5f75d88 | Support integration tests with Redis binary (#17045) | 2026-01-17 11:59:04 +08:00 |
Alison Shao | b4fce9955a | Add CI Coverage Overview workflow with detailed test listings (#16842) | 2026-01-16 09:42:50 -08:00 |
Baizhou Zhang | a04675892e | Update flashinfer to 0.6.1 (#15551) | 2026-01-17 00:48:30 +08:00 |
YC Tseng | 968c4f55b1 | [AMD] Enable DeepseekV3.2 test for AMD CI (#16934) | 2026-01-15 21:58:46 -08:00 |
Hudson Xing | 21ee597e4a | ci: enable offline mode when local cache is complete to avoid HF Hub … (#16121) | 2026-01-15 20:15:33 -08:00 |
Alison Shao | 146b5fcc84 | [CI] Reorganize stage-b 1-GPU tests for 5090 compatibility (#16826) | 2026-01-15 15:23:35 -08:00 |
Bingxu Chen | 98096b5e02 | [AMD CI] migrate and re-enable CI tests to new CI registry (#16949); Co-authored-by: yctseng0211 <yctseng@amd.com> | 2026-01-14 21:25:25 -08:00 |
Alison Shao | b880607108 | Add 5090 dry run stage to PR test workflow (#17022) | 2026-01-13 14:12:33 -08:00 |
James | ae0baefb94 | [NPU] upgrade npu mf_apater plugin (#15853) | 2026-01-13 09:02:10 +08:00 |
Alison Shao | 17cb3c8e49 | Enable /rerun-stage workflow URL lookup for fork PRs (#16851) | 2026-01-11 23:05:37 +08:00 |
Alison Shao | 9c64a15ad4 | feat: add workflow run URL to /rerun-stage comment (#16825) | 2026-01-10 10:41:20 +08:00 |
Alison Shao | ef35d8fe4e | Migrate VLM tests and remove unit-test-backend-1-gpu job (#16679) | 2026-01-09 15:24:26 -08:00 |
Shangming Cai | 0c4e155a3c | chore: bump mooncake version to 0.3.8.post1 (#16792); Signed-off-by: Shangming Cai <csmthu@gmail.com> | 2026-01-09 18:42:27 +08:00 |
YC Tseng | ccd0fb3291 | [AMD] Change AITER package name (#16721) | 2026-01-08 21:17:20 -08:00 |