From eaaed3e35e64a078990c7af1da2dbc64cdf2b3be Mon Sep 17 00:00:00 2001
From: Yi DING <28386673+DDEle@users.noreply.github.com>
Date: Tue, 21 Apr 2026 05:36:37 +0000
Subject: [PATCH] [rocm-libraries] ROCm/rocm-libraries#6563 (commit 6559ac9)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

[CK] Add render group to AITER and FA dockers

## Motivation

The AITER and FA test dockers (`Dockerfile.aiter`, `Dockerfile.fa`) inherit from the `rocm/pytorch` base image. Recent updates to that base image dropped the `render` group from `/etc/group`, so every parallel test stage now fails on the test agents with:

```
docker: Error response from daemon: Unable to find group render: no matching entries in group file.
```

Docker resolves the `--group-add render` argument from the Jenkins job against the **container's** `/etc/group`, not the host's, so even though the test agents have `render` in their `/etc/group` (GID 109), the container lookup fails.

This pattern affects every recent develop build ([#673](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/673), [#674](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/674), [#686](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/686), [#688](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/688), [#699](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/699), [#708](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/708); 6 days in a row). In each of them the AITER tests fail within seconds, and the cascading failure aborts all downstream Build/FMHA/TILE_ENGINE stages.

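
To illustrate the lookup failure described above, here is a minimal sketch that mimics a by-name lookup against a group file. `has_group` is a hypothetical helper written for this note, not part of Docker, Jenkins, or this repository; it only checks whether a group name appears as an entry in a file that uses the `/etc/group` format.

```shell
# Hypothetical helper (an assumption for illustration, not an existing tool).
# has_group FILE NAME: exit 0 if NAME is a group entry in FILE, else exit 1.
# FILE is expected to use the /etc/group layout: name:password:GID:members
has_group() {
    awk -F: -v g="$2" '$1 == g { found = 1 } END { exit !found }' "$1"
}

# Example: a group file that has `video` but no `render` entry.
demo_group_file=$(mktemp)
printf 'root:x:0:\nvideo:x:44:\n' > "$demo_group_file"

if has_group "$demo_group_file" render; then
    echo "render: present"
else
    echo "render: missing"   # a by-name lookup fails in this case
fi

rm -f "$demo_group_file"
```

Inside a real image, the equivalent check would be `getent group render`; if that returns nothing, a `--group-add render` by name has nothing to resolve against.
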
## Technical Details

Add `groupadd -f render` to both `Dockerfile.aiter` and `Dockerfile.fa`, mirroring what the main `Dockerfile` already does (`Dockerfile:96`) and what `Dockerfile.pytorch` does (`Dockerfile.pytorch:4`). The diff also adds `groupadd -f video` next to it. The `-f` flag makes the command idempotent: it exits successfully without changes if the group already exists.

This guarantees that the `render` group (and the `video` group) is always present in the container, regardless of whether the base image happens to ship it.

## Test Plan

Triggering AITER CI job:

## Test Result

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
---
 Dockerfile.aiter | 2 ++
 Dockerfile.fa    | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/Dockerfile.aiter b/Dockerfile.aiter
index ebfef41643..8d6e995656 100644
--- a/Dockerfile.aiter
+++ b/Dockerfile.aiter
@@ -34,6 +34,8 @@ RUN pip install pandas zmq einops ninja tabulate vcs_versioning && \
     python3 setup.py develop && \
     groupadd -g 1001 jenkins && \
     useradd -u 1001 -g 1001 -m -s /bin/bash jenkins && \
+    groupadd -f video && \
+    groupadd -f render && \
     chown -R jenkins:jenkins /home/jenkins && \
     chmod -R a+rwx /home/jenkins && \
     chown -R jenkins:jenkins /tmp && \
diff --git a/Dockerfile.fa b/Dockerfile.fa
index c5cbacfc16..47643310bd 100644
--- a/Dockerfile.fa
+++ b/Dockerfile.fa
@@ -36,6 +36,8 @@ RUN set -x ; \
     MAX_JOBS=$(nproc) GPU_ARCHS="$GPU_ARCHS" /opt/venv/bin/python3 -u -m pip install --no-build-isolation -v . && \
     groupadd -g 1001 jenkins && \
     useradd -u 1001 -g 1001 -m -s /bin/bash jenkins && \
+    groupadd -f video && \
+    groupadd -f render && \
     chown -R jenkins:jenkins /home/jenkins && \
     chmod -R a+rwx /home/jenkins && \
     chown -R jenkins:jenkins /tmp && \