mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-12 17:26:00 +00:00
[CK] Add flash_attn tests

## Motivation

Add CI support for running [flash-attention](https://github.com/ROCm/flash-attention) tests against CK, similar to existing AITER and PyTorch downstream test pipelines.

## Technical Details

### New: `Dockerfile.fa`

A new Dockerfile that builds a flash-attention test image on top of a ROCm PyTorch base image. It:

- Sparse-checkouts CK from `rocm-libraries` (or clones directly from `ROCm/composable_kernel`)
- Clones and builds `flash-attention` with CK as the backend
- Supports configurable `FA_BRANCH`, `CK_FA_BRANCH`, and `GPU_ARCHS` build args

### Updated: `Jenkinsfile`

**buildDocker refactor:**

- Extracted `buildAndPushDockerImage()` helper that handles both "check if exists, skip" and "force build, push" logic, eliminating the duplicated try/catch blocks
- Split monolithic `buildDocker()` into `buildDockerBase()`, `buildDockerPytorch()`, `buildDockerAiter()`, and new `buildDockerFa()`
- Each downstream docker build now runs unconditionally within its respective guard (`RUN_PYTORCH_TESTS`, `RUN_AITER_TESTS`, `RUN_FA_TESTS`)
- Image digests are stored in env vars (`CK_BASE_IMAGE`, `CK_PYTORCH_IMAGE`, `CK_AITER_IMAGE`, `CK_FA_IMAGE`) for use in downstream stages

**run_downstream_tests refactor:**

- Merged `run_aiter_tests()` and `run_pytorch_tests()` into a single generic `run_downstream_tests(conf)` that accepts `image`, `timeoutHours`, and `execute_cmds`
- Test commands for each downstream target are declared as top-level lists (`RUN_PYTORCH_TESTS_CMDS`, `RUN_AITER_TESTS_CMDS`, `RUN_FA_TESTS_CMDS`)

**Pipeline stages:**

- Merged "Run Pytorch Tests" and "Run AITER Tests" into a single "Run Downstream Tests" parallel stage
- Added two new FA test stages: "Run FA Tests on gfx942" and "Run FA Tests on gfx950"
- Added new pipeline parameters: `RUN_FA_TESTS`, `fa_base_docker`, `fa_branch`, `ck_fa_branch`
- `ck_pytorch_branch` and `ck_aiter_branch` now default to the current branch instead of hardcoded `develop`
- CRON schedule at 13:00 now also triggers `RUN_FA_TESTS=true`

## Test Plan

- [x] Trigger pipeline manually with `RUN_FA_TESTS=true` on gfx942 and gfx950 nodes
- [x] Verify existing AITER and PyTorch test stages are unaffected
- [x] Verify `buildAndPushDockerImage` correctly skips rebuild when image already exists (with `BUILD_DOCKER=false`)

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
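
As a rough illustration of how the `FA_BRANCH`, `CK_FA_BRANCH`, and `GPU_ARCHS` build args described above might be passed, the sketch below assembles a `docker build` command and prints it without running it. The helper name `fa_build_cmd`, the image tag `ck_fa:local`, and the assumption that `Dockerfile.fa` sits in the current directory are all hypothetical; the Jenkinsfile's real invocation is not shown in this description and may differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): prints, but does not run,
# a docker build command for Dockerfile.fa.
# Assumptions: Dockerfile.fa is in the current directory, and the image tag
# "ck_fa:local" is an arbitrary local name.
fa_build_cmd() {
    # The fallbacks for the two branches mirror the ARG defaults in the Dockerfile.
    fa_branch="${1:-tridao}"
    ck_fa_branch="${2:-develop}"
    gpu_archs="${3:-gfx942}"
    printf 'docker build -f Dockerfile.fa --build-arg FA_BRANCH=%s --build-arg CK_FA_BRANCH=%s --build-arg GPU_ARCHS="%s" -t ck_fa:local .\n' \
        "$fa_branch" "$ck_fa_branch" "$gpu_archs"
}

# Print the command for a two-architecture build.
fa_build_cmd tridao develop "gfx942;gfx950"
```

Quoting the `GPU_ARCHS` value matters because the architecture list is separated by semicolons, which an unquoted shell command line would treat as command separators.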
44 lines
2.2 KiB
Docker
ARG BASE_DOCKER="rocm/pytorch:latest"
FROM $BASE_DOCKER
ARG FA_ORIGIN="ROCm"
ARG FA_BRANCH="tridao"
ARG CK_FA_ORIGIN="ROCm"
ARG CK_FA_BRANCH="develop"
# CK_FROM_ROCM_LIBRARIES - 1: CK from rocm-libraries sparse-checkout; 0: direct clone from ROCm/composable_kernel
ARG CK_FROM_ROCM_LIBRARIES=1
ARG GPU_ARCHS="gfx90a;gfx942;gfx950"
RUN set -x ; \
    sudo mkdir /home/jenkins && \
    sudo mkdir /home/jenkins/workspace && \
    cd /home/jenkins/workspace && rm -rf rocm-libraries ck && \
    if [ "$CK_FROM_ROCM_LIBRARIES" = "1" ]; then \
        git clone --depth 1 -b "$CK_FA_BRANCH" --no-checkout --filter=blob:none https://github.com/$CK_FA_ORIGIN/rocm-libraries.git && \
        cd rocm-libraries && \
        git sparse-checkout init --cone && \
        git sparse-checkout set projects/composablekernel && \
        git checkout "$CK_FA_BRANCH" && \
        ROCM_LIBRARIES_SHA=$(git rev-parse --short HEAD) && \
        mv projects/composablekernel ../ck && \
        cd ../ck && rm -rf ../rocm-libraries && \
        git init && \
        git config user.name "assistant-librarian[bot]" && \
        git config user.email "assistant-librarian[bot]@users.noreply.github.com" && \
        git branch -m "$CK_FA_BRANCH" && git add -A && \
        git commit -m "import from ROCm/rocm-libraries@$ROCM_LIBRARIES_SHA" > /dev/null ; \
    else \
        git clone --depth 1 -b "$CK_FA_BRANCH" https://github.com/$CK_FA_ORIGIN/composable_kernel.git ck ; \
    fi && \
    cd /home/jenkins/workspace && rm -rf flash-attention && \
    git clone --depth 1 -b "$FA_BRANCH" --recursive "https://github.com/$FA_ORIGIN/flash-attention.git" && \
    cd flash-attention && \
    rm -rf csrc/composable_kernel/ && \
    git clone -b "$CK_FA_BRANCH" ../ck csrc/composable_kernel/ && git add csrc/composable_kernel && \
    MAX_JOBS=$(nproc) GPU_ARCHS="$GPU_ARCHS" /opt/venv/bin/python3 -u -m pip install --no-build-isolation -v . && \
    groupadd -g 1001 jenkins && \
    useradd -u 1001 -g 1001 -m -s /bin/bash jenkins && \
    chown -R jenkins:jenkins /home/jenkins && \
    chmod -R a+rwx /home/jenkins && \
    chown -R jenkins:jenkins /tmp && \
    chmod -R a+rwx /tmp && \
    sudo usermod -aG irc jenkins