mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-12 09:16:52 +00:00
[CK] Add render group to AITER and FA dockers

## Motivation

The AITER and FA test dockers (`Dockerfile.aiter`, `Dockerfile.fa`) inherit from the `rocm/pytorch` base image. Recent updates to that base image dropped the `render` group from `/etc/group`, so every parallel test stage now fails on the test agents with:

```
docker: Error response from daemon: Unable to find group render: no matching entries in group file.
```

The `--group-add render` option that Jenkins passes is resolved against the **container's** `/etc/group`, not the host's, so even though the test agents have `render` in their `/etc/group` (GID 109), the container lookup fails.

This pattern affects every recent develop build, 6 days in a row ([#673](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/673), [#674](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/674), [#686](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/686), [#688](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/688), [#699](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/699), [#708](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/708)). In each of them the AITER tests fail within seconds, and the cascading failure aborts all downstream Build/FMHA/TILE_ENGINE stages.

## Technical Details

Add `groupadd -f render` to both `Dockerfile.aiter` and `Dockerfile.fa`, mirroring what the main `Dockerfile` already does (`Dockerfile:96`) and what `Dockerfile.pytorch` does (`Dockerfile.pytorch:4`). The `-f` flag makes the command idempotent: it silently succeeds if the group already exists.

This guarantees that the `render` group is always present in the container, regardless of whether the base image happens to ship it.

## Test Plan

Triggering AITER CI job:

## Test Result

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
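
The group lookup described under Motivation can be sketched as a small shell check. This is only an illustration, not part of the change itself: `has_group` is a hypothetical helper name, and it assumes the standard `name:password:GID:members` layout of a group file.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CK repository): succeed if a group
# name has an entry in a group file laid out as name:password:GID:members.
has_group() {
    group_name="$1"
    # Default to the file of the environment the check runs in.
    group_file="${2:-/etc/group}"
    # -q: print nothing; anchor on the start of the line so that only the
    # name field is matched, never a member listed further along the line.
    grep -q "^${group_name}:" "$group_file"
}

# Example (run inside the container, since that is the file that matters):
#   has_group render || echo "render group missing from the image"
```

Run inside the image, this check reads the container's own `/etc/group`, which is the file the failing lookup uses. A host-side check says nothing about the container.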
44 lines
2.1 KiB
Docker
ARG BASE_DOCKER="rocm/pytorch:latest"
FROM $BASE_DOCKER
ARG AITER_BRANCH="main"
ARG CK_AITER_BRANCH="develop"
# CK_FROM_ROCM_LIBRARIES - 1: CK from rocm-libraries sparse-checkout; 0: direct clone from ROCm/composable_kernel
ARG CK_FROM_ROCM_LIBRARIES=1
RUN pip install pandas zmq einops ninja tabulate vcs_versioning && \
    pip install numpy==1.26.2 && \
    sudo mkdir /home/jenkins && \
    sudo mkdir /home/jenkins/workspace && \
    cd /home/jenkins/workspace && rm -rf rocm-libraries ck && \
    if [ "$CK_FROM_ROCM_LIBRARIES" = "1" ]; then \
        git clone --depth 1 -b "$CK_AITER_BRANCH" --no-checkout --filter=blob:none https://github.com/ROCm/rocm-libraries.git && \
        cd rocm-libraries && \
        git sparse-checkout init --cone && \
        git sparse-checkout set projects/composablekernel && \
        git checkout "$CK_AITER_BRANCH" && \
        ROCM_LIBRARIES_SHA=$(git rev-parse --short HEAD) && \
        mv projects/composablekernel ../ck && \
        cd ../ck && rm -rf ../rocm-libraries && \
        git init && \
        git config user.name "assistant-librarian[bot]" && \
        git config user.email "assistant-librarian[bot]@users.noreply.github.com" && \
        git branch -m "$CK_AITER_BRANCH" && git add -A && \
        git commit -m "import from ROCm/rocm-libraries@$ROCM_LIBRARIES_SHA" ; \
    else \
        git clone --depth 1 -b "$CK_AITER_BRANCH" https://github.com/ROCm/composable_kernel.git ck ; \
    fi && \
    cd /home/jenkins/workspace && rm -rf aiter && \
    git clone --depth 1 -b "$AITER_BRANCH" --recursive https://github.com/ROCm/aiter.git && \
    cd aiter && \
    rm -rf 3rdparty/composable_kernel/ && \
    git clone -b "$CK_AITER_BRANCH" ../ck 3rdparty/composable_kernel/ && \
    python3 setup.py develop && \
    groupadd -g 1001 jenkins && \
    useradd -u 1001 -g 1001 -m -s /bin/bash jenkins && \
    groupadd -f video && \
    groupadd -f render && \
    chown -R jenkins:jenkins /home/jenkins && \
    chmod -R a+rwx /home/jenkins && \
    chown -R jenkins:jenkins /tmp && \
    chmod -R a+rwx /tmp && \
    sudo usermod -aG irc jenkins