[ck] Unify Build_CK and buildHipClangJob into buildAndTest
(#8108)
## Motivation
`projects/composablekernel/vars/ck.groovy` had two near-identical build
functions, `buildHipClangJob` (lean: static checks, FMHA, tile-engine,
conv) and `Build_CK` (main per-arch matrix). This removes the
duplication and fixes a latent GitHub-status bug that lived in both.
## Technical Details
- Merged both into one `buildAndTest(Map conf)` gated by an explicit
`is_main_build` flag (default `false` = lean path; `true` adds the GPU
check + arch-gated inductor/perf/hipTensor; only `runBuildCKAndTests`
sets it).
- Deleted the `Build_CK_and_Reboot` / `buildHipClangJobAndReboot`
wrappers (they only logged and re-threw); all 13 call sites now call
`buildAndTest` directly.
- Widened the shared `catch` to `Exception` so build / image-pull / "GPU
not found" failures report **failure** instead of leaving the check
stuck **pending** (failing stages now go red).
- Removed the dead `no_reboot` key. No change to what is built or
tested.
## Test Plan
- Jenkins linter on the `Jenkinsfile`.
- One branch run covering both paths (per-arch matrix + lean stages),
spot-checking gfx1250 and a nogpu stage.
## Test Result
- Verified statically: no `buildHipClangJob*` / `Build_CK*` references
remain; `buildAndTest` defined once, all call sites wired.
- Pending: linter + branch run before merge.
## Submission Checklist
- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
[CK] Load ck.groovy via Jenkins Shared Library
## Motivation
This allows the CI service to have a configuration source-of-truth
outside the PR under test, allowing rapid system changes. Bug fixes on
the develop branch propagate immediately to all pipelines that don't
override the parameter -- no rebase required.
A new `USE_CURRENT_BRANCH_FOR_CK_GROOVY` parameter lets contributors
test pipeline changes on their own branch without any extra
configuration.
## Technical Details
- `loadCk()` in the Jenkinsfile is updated to call
`library("ck@${branch}").ck.get()` instead of `checkout scm` + `load
"vars/ck.groovy"`. The `checkout scm` inside `loadCk()` is removed since
Jenkins now handles the library fetch internally.
- A `USE_CURRENT_BRANCH_FOR_CK_GROOVY` boolean parameter (default: off)
is added. When off, `ck.groovy` is always loaded from `develop` — all
normal PR builds are unaffected. When on, `ck.groovy` is loaded from the
current branch automatically via `env.CHANGE_BRANCH`, so contributors
testing pipeline changes just tick the box.
- `return this` is removed from the end of `ck.groovy`. This was
required by the `load` convention but is not needed (and can cause
errors) in a shared library context.
- `loadCk()` is kept at every call site rather than called once at the
top, preserving restart-from-stage safety — if a build is restarted from
a mid-pipeline stage, `ck` is still initialized correctly.
- The Jenkins Shared Library named `"ck"` must be registered in Jenkins
Global Pipeline Libraries.
## Test Plan
1. Trigger "Build with Parameters" on the PR branch with
`USE_CURRENT_BRANCH_FOR_CK_GROOVY=true`.
2. Verify the "Determine CI Execution" stage completes and the
`library()` call indicates the current branch.
3. Verify "Static checks" stage completes.
4. Trigger a second build with `USE_CURRENT_BRANCH_FOR_CK_GROOVY=false`
(default) to confirm normal builds still load from `develop`.
## Test Result
Verified both paths. The develop library is loaded by default, and the
branch library is loaded when the parameter is enabled.
## Submission Checklist
- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
[CK] Fix gfx950 AITER Sync Regressions
## Summary
Fixes three gfx950 regressions in the AITER downstream CI that surfaced
after the internal/gfx1250 re-sync (ROCm/rocm-libraries#6978):
> **Companion aiter PR:** ROCm/aiter#3392 — host-side adaptations
(`Kernel::BlockSize()` `constexpr` drops, blockscale `KBatch=1` clamp)
plus the CK submodule bump used to validate these fixes together.
- **FlyDSL MoE AOT cache miss** — the AITER MoE tests run with
`check_aot_cache=True` and fail on any FlyDSL JIT cache miss, but the CI
never pre-compiles the FlyDSL MoE kernels, so gfx950 always misses.
Pre-compile them at the start of the AITER test stage.
- **`buffer.load.lds.v4i32` link error** — ROCm/rocm-libraries#6978
reintroduced a clang-version guard mapping
`llvm.amdgcn.raw.buffer.load.lds` to a `.v4i32`-suffixed name. That name
exists in no LLVM version (the rsrc operand is a fixed, non-overloaded `<4 x
i32>`, so the intrinsic is never type-mangled), so gfx950 4-DWORD
direct-to-LDS (e.g. fp4 MoE bpreshuffle) fails to link with `lld:
undefined symbol: llvm.amdgcn.raw.buffer.load.lds.v4i32`. Use the
canonical plain name unconditionally.
- **mixed-precision flatmm warp-GEMM call** — ROCm/rocm-libraries#6978
generalized the scaled `WarpGemmImpl::operator()` from a fixed `<index_t
opselA, index_t opselB>` signature to a variadic `<typename... Params>`
one and updated the `mx_flatmm` pipeline to pass the op-selectors as
`OpSelA<>`/`OpSelB<>` types, but missed the mixed-precision flatmm
pipeline (`F8xMXF4`/`F16xMXF4`), which still passed raw integer
op-selectors. These no longer bind to `typename... Params` (`error: no
matching member function for call to 'operator()'`), breaking
compilation of the fp8/bf16 × fp4 cktile MoE gemm1 instances on gfx950
(aiter `test_moe_2stage`). Wrap the op-selectors in
`OpSelA<>`/`OpSelB<>`.
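The binding failure in the last item can be reproduced in isolation. The sketch below is a toy model, not the ck_tile code: `WarpGemmSketch`, `SelectorSum`, and the `int`-templated `OpSelA`/`OpSelB` stand-ins are hypothetical, and the real definitions may differ. It only shows that a `typename...` pack accepts selector types and rejects raw integers.

```cpp
#include <cassert>

// Hypothetical stand-ins for the op-selector tag types; the real
// ck_tile definitions may differ.
template <int Sel> struct OpSelA { static constexpr int value = Sel; };
template <int Sel> struct OpSelB { static constexpr int value = Sel; };

// Sums the ::value of every type in a pack (plain recursion, no folds).
template <typename... Params>
struct SelectorSum { static constexpr int value = 0; };

template <typename P, typename... Rest>
struct SelectorSum<P, Rest...> {
    static constexpr int value = P::value + SelectorSum<Rest...>::value;
};

// Toy functor whose operator() takes a pack of *types*, mirroring the
// generalized `<typename... Params>` signature described above.
struct WarpGemmSketch {
    template <typename... Params>
    int operator()(int a, int b) const {
        return a * b + SelectorSum<Params...>::value;
    }
};

// New-style call: the selectors are wrapped in types, so they bind to
// the pack.
inline int call_with_wrapped_selectors(int a, int b) {
    WarpGemmSketch gemm;
    return gemm.operator()<OpSelA<1>, OpSelB<0>>(a, b);
}

// Old-style call, left commented out because it does not compile:
// integers are non-type template arguments and cannot bind to
// `typename... Params`.
//   gemm.operator()<1, 0>(a, b);
```

In this toy, `call_with_wrapped_selectors(2, 3)` evaluates `2 * 3 + 1 + 0`; uncommenting the old-style call yields a "no matching member function" diagnostic of the kind quoted above.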
## Changes
- `Jenkinsfile`: pre-compile the FlyDSL MoE AOT cache (`python3
aiter/aot/flydsl/moe.py`) before the AITER tests.
- `include/ck/utility/amd_buffer_addressing_builtins.hpp` and
`include/ck_tile/core/arch/amd_buffer_addressing_builtins.hpp`: drop the
`__clang_major__` guard and always use
`__asm("llvm.amdgcn.raw.buffer.load.lds")`. The plain name is the
canonical one for all sizes including the gfx950 16-byte form, as the
upstream LLVM gfx950 tests confirm.
- `include/ck_tile/ops/flatmm/pipeline/mixed_prec_flatmm_pipeline_agmem_bgmem_creg_v1.hpp`:
wrap the warp-GEMM op-selectors in `OpSelA<>`/`OpSelB<>` at the five
call sites, matching the `mx_flatmm` pipeline.
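The `__asm("...")` label used in the buffer-addressing headers binds a declaration to a fixed symbol name, which is why a wrong name only surfaces at link time. Below is a host-side analogy, assuming a GCC- or Clang-compatible compiler on an ELF platform such as Linux; `real_impl`, `load_via_label`, and `call_through_label` are hypothetical names and none of this is the CK code.

```cpp
#include <cassert>

// The real symbol, exported under the unmangled name "real_impl".
extern "C" int real_impl() { return 42; }

// Hypothetical declaration bound to that symbol through an asm label
// (a GCC/Clang extension). If the label named a symbol that does not
// exist, the call below would still compile but fail at link time with
// an undefined-symbol error.
int load_via_label() __asm__("real_impl");

// Calls the declaration; the linker resolves it to real_impl().
inline int call_through_label() { return load_via_label(); }
```

The link error quoted in the summary has the same shape: the declaration compiled, but its label named a symbol nothing provides.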
## Test plan
Validated via CI.
[CK] Upgrade to new gfx1250 compiler and fix build issues
(#7960)
## Motivation
The Docker image we've been using to build for gfx1250 is a few months
old, so we need to upgrade. Some changes in the latest compiler version
require corresponding changes in the code. TDM is temporarily disabled
due to changes in the LDS load/store intrinsics.
## Submission Checklist
- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
[CK] Extract Jenkinsfile helpers into vars/ck.groovy shared
library (#7743)
## Motivation
The CK Jenkinsfile is a 2,215-line monolith mixing helper function
definitions with pipeline stage declarations. This makes it difficult to
review, modify, or extend CI stages without wading through unrelated
infrastructure code.
## Technical Details
Extract all helper functions from the Jenkinsfile into `vars/ck.groovy`,
loaded at runtime via `ck = load "vars/ck.groovy"` in the first stage.
The Jenkinsfile is reduced from 2,215 lines to 810 lines containing only
the pipeline structure.
- 36 helper functions moved to `ck.groovy` with no logic changes.
- 10 new stage-wrapper functions (`runBuildCKAndTests`,
`runTileEngineGemmTests`, `runClangFormat`, etc.) extract inline
`environment{}`/`steps{}` business logic from stages, eliminating the
`MethodTooLargeException` caused by CPS-transformed shell strings
exceeding the JVM 64KB bytecode limit.
- All `ck.` method calls in `steps{}` blocks are wrapped in `script{}`,
as required by Jenkins Declarative Pipeline.
- `rocmnode()` remains in the Jenkinsfile (needed for `agent{}` labels
before `ck` is loaded).
- `CRON_SETTINGS` / `POLL_SPEC` remain in the Jenkinsfile (`triggers{}`
is evaluated at parse time, before any workspace is available).
- No stage names changed.
## Test Plan
- Jenkinsfile validated against the Jenkins Pipeline Linter
(/pipeline-model-converter/validate)
- All 35 shared helper functions diffed line-by-line against develop to
verify no regressions
- Merge from develop incorporated and verified (gfx1250 stage, ROCm 7.13
default, cmake_build updates)
## Test Result
- Linter: passes
- Function diff vs develop: all 35 functions match exactly
- Awaiting Jenkins run to confirm end-to-end stage execution
## Submission Checklist
- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.