Commit Graph

9 Commits

Author SHA1 Message Date
Yi DING
eaaed3e35e [rocm-libraries] ROCm/rocm-libraries#6563 (commit 6559ac9)
[CK] Add render group to AITER and FA dockers

## Motivation

The AITER and FA test dockers (`Dockerfile.aiter`, `Dockerfile.fa`)
inherit from the `rocm/pytorch` base image. Recent updates to that base
image dropped the `render` group from `/etc/group`, so every parallel
test stage now fails on the test agents with:

```
docker: Error response from daemon: Unable to find group render:
no matching entries in group file.
```

Docker resolves the `--group-add render` argument that Jenkins passes
against the **container's** `/etc/group`, not the host's, so the lookup
fails even though the test agents list `render` in their own
`/etc/group` (GID 109).
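
The lookup described above can be pictured with a small shell sketch. It does not call Docker; it only checks whether a group name appears in a file laid out like `/etc/group`. The helper name `has_group` and the sample file contents are assumptions made for illustration (only the `render` GID of 109 comes from the text above).

```shell
#!/bin/sh
# has_group NAME FILE: succeed if NAME is listed as a group in FILE.
# FILE uses the /etc/group layout: name:password:GID:members
# (hypothetical helper, not part of Docker or Jenkins)
has_group() {
    awk -F: -v name="$1" '$1 == name { found = 1 } END { exit !found }' "$2"
}

# Two throwaway files stand in for the host and the container group files.
demo_dir=$(mktemp -d)
printf 'root:x:0:\nvideo:x:44:\nrender:x:109:\n' > "$demo_dir/host_group"
printf 'root:x:0:\nvideo:x:44:\n' > "$demo_dir/container_group"

if has_group render "$demo_dir/host_group"; then
    echo "host: render found"
fi
if ! has_group render "$demo_dir/container_group"; then
    echo "container: render missing"
fi
```

The second check is the one that matters for `--group-add`: the host entry is irrelevant when the name is missing from the file inside the image.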

This pattern affects every recent develop build
([#673](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/673),
[#674](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/674),
[#686](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/686),
[#688](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/688),
[#699](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/699),
[#708](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/708)
, 6 days in a row), where the AITER tests fail within seconds and the
cascading failure aborts all downstream Build/FMHA/TILE_ENGINE stages.

## Technical Details

Add `groupadd -f render` to both `Dockerfile.aiter` and `Dockerfile.fa`,
mirroring what the main `Dockerfile` already does (`Dockerfile:96`) and
what `Dockerfile.pytorch` does (`Dockerfile.pytorch:4`). The `-f` flag
makes the command idempotent: it exits successfully if the group
already exists.

This guarantees the `render` group is always present in the container,
regardless of whether the base image happens to ship it.
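
As a rough illustration of why an idempotent step is safe to run on any base image, the sketch below emulates the "add only if missing" behaviour on a plain file. It is not how `groupadd` is implemented; `ensure_group` is a hypothetical helper, and the explicit GID argument is an assumption (the real `groupadd -f render` chooses an unused GID itself).

```shell
#!/bin/sh
# ensure_group NAME GID FILE: append "NAME:x:GID:" to FILE unless NAME exists.
# Hypothetical stand-in for illustration only; the actual fix runs
# `groupadd -f render` inside the Dockerfile.
ensure_group() {
    if ! grep -q "^$1:" "$3"; then
        printf '%s:x:%s:\n' "$1" "$2" >> "$3"
    fi
}

# Running the step twice leaves exactly one entry in the file.
group_file=$(mktemp)
printf 'root:x:0:\n' > "$group_file"
ensure_group render 109 "$group_file"
ensure_group render 109 "$group_file"
grep -c '^render:' "$group_file"
```

Whether the base image ships the group or not, the file ends up with a single `render` entry, which is the property the Dockerfile change relies on.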

## Test Plan
Triggering AITER CI job:

## Test Result

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-04-21 05:36:37 +00:00
Illia Silin
2f98c7bbef [rocm-libraries] ROCm/rocm-libraries#5891 (commit 82563ff)
fix AITER docker setup

## Motivation

Add a new python package required to build AITER.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-27 04:36:16 +00:00
Yi DING
a8e2ec22cf [rocm-libraries] ROCm/rocm-libraries#5011 (commit b31a678)
[CK] Fix aiter tests in CI

## Motivation

Updates the CK/AITER CI Docker build to source Composable Kernel either
from `ROCm/rocm-libraries` (via sparse-checkout) or directly from
`ROCm/composable_kernel`, aiming to make aiter tests reliable in CI.

**Changes:**
- Added a build arg to toggle fetching CK from `ROCm/rocm-libraries`
(enabled by default).
- Implemented sparse-checkout + local re-init/commit flow to materialize
CK into a local `ck/` directory.
- Updated aiter’s CK vendoring step to clone from the locally prepared
`ck/` directory.
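
The sparse-checkout and re-init flow listed above might look roughly like the following. This is a sketch under stated assumptions, not the contents of the actual Dockerfile: the function name `materialize_subdir`, its arguments, and the commit identity are all hypothetical, and the monorepo subdirectory and branch are left as parameters because the description does not name them.

```shell
#!/bin/sh
# Sketch only: names, arguments, and commit identity are hypothetical.
# materialize_subdir SRC REF SUBDIR DEST
#   SRC    - URL or path of the monorepo to fetch from
#   REF    - branch or tag to check out
#   SUBDIR - directory inside the monorepo to extract
#   DEST   - directory that becomes a standalone git repository
materialize_subdir() {
    src=$1 ref=$2 subdir=$3 dest=$4
    work=$(mktemp -d) || return 1

    # Clone without populating the working tree, then restrict the
    # checkout to the one subdirectory that is needed.
    git clone --quiet --no-checkout "$src" "$work/mono" || return 1
    git -C "$work/mono" sparse-checkout set "$subdir" || return 1
    git -C "$work/mono" checkout --quiet "$ref" || return 1

    # Copy the subdirectory out and re-initialize it as its own repository,
    # so a later "git clone DEST" treats it as a normal project root.
    mkdir -p "$dest" || return 1
    cp -R "$work/mono/$subdir/." "$dest/" || return 1
    git -C "$dest" init --quiet || return 1
    git -C "$dest" add -A || return 1
    git -C "$dest" -c user.name="ci" -c user.email="ci@example.invalid" \
        commit --quiet -m "Import $subdir from $src at $ref" || return 1

    rm -rf "$work"
}
```

Once `DEST` holds a committed repository, a vendoring step can clone from that local directory instead of from the network, which is the effect the last bullet describes.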

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-02 15:36:01 +00:00
Illia Silin
2ffbf7f476 add tabulate package to aiter docker (#3519) 2026-01-06 09:36:54 -08:00
Illia Silin
3784c0e7c3 add permissions for /tmp folder (#3201) 2025-11-12 11:47:07 -08:00
Illia Silin
e02b1e7caf Fix AITER tests. (#3106)
* change base docker image for aiter

* do not add group irc to aiter docker

* add user and group jenkins

* pip install ninja

* update permissions for /home/jenkins
2025-10-27 20:59:21 -07:00
Illia Silin
b9d69d32a8 Enable FMHA and AITER tests on gfx950. (#2812)
* enable aiter and fmha test stages on gfx950

* use newer compiler for gfx950

* make sure gfx950 runs correct docker

* fix typo

* upgrade base docker for aiter

* change base docker for aiter tests

* do not add group render to ck_aiter image

* add group irc in ck_aiter docker

* do not fix the irc group id to 39

* do not set jenkins uid and gid

* skip group irc for aiter tests

* fix syntax error in dockerfile

* change the base docker for aiter tests

* add irc group back to ck_aiter docker
2025-09-12 12:20:32 -07:00
Illia Silin
7ac850ac72 Add daily AITER tests on gfx942. (#2639)
* add option to select aiter branch, add tests on gfx942
2025-08-08 09:30:46 -07:00
Illia Silin
e6104daecc Add a daily CI stage to test AITER with latest CK. (#2598)
* add a CI stage for AITER testing
2025-08-01 07:55:51 -07:00