312 Commits

Author SHA1 Message Date
Brock Hargreaves
1b649a8d4b [rocm-libraries] ROCm/rocm-libraries#8332 (commit 48c389c)
[CK][CI] Retry builds on node failure with automatic
 rerouting (#8332)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Motivation

When a Jenkins node enters a bad state (missing GPU driver, dead Docker
daemon, full disk), every PR scheduled onto it fails the same way until
a human manually takes it offline. Some failures are also transient and
would pass on a simple retry. Today the pipeline does neither — every
failure goes straight to red on the same node.

## Technical Details

Two new retry behaviors based on failure type:
- **Different node** for persistent node faults (driver missing, daemon
down, disk full, container won't start)
- **Retry in place** for transient glitches (registry pull, DNS), then a
different node if retries are exhausted

Real build/compile failures and aborted builds are never retried.

**New:** `src/org/ck/NodeFault.groovy`, `TransientFault.groovy` — typed
exceptions in the shared library `src/` for stable classloader identity
under dynamic library loading.

**`vars/ck.groovy`:** adds `preflight()` (host health checks before
build), `pullImage()` (classifying pull failures at the call site,
replacing `getDockerImage()`), `runOnHealthyNode()` (outer reroute loop,
up to 3 nodes), `runInPlace()` (same-node transient retries). GitHub
failure status is only set once all retries are exhausted.

**`Jenkinsfile`:** all active `Build CK and run Tests` stages converted
to `agent none` + `ck.runOnHealthyNode(…)`.

## Test Plan

Tested on `users/brockhargreaves-amd/ck/node-failure-retry-logic` with
`USE_CURRENT_BRANCH_FOR_CK_GROOVY=true`. Verified preflight logging,
reroute on node fault, attempt counter in logs, no retry on aborts, and
single failure status report after budget exhausted.

## Test Result

Retry logic working as expected. Three bugs found and fixed during
testing: false `NodeFault` from host-level sccache probe (sccache is
in-container), `null` node name in catch logging, and `sh` calls outside
`node()` context in status reporting.

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-06-15 17:40:10 +00:00
John Afaganis
329e589840 [rocm-libraries] ROCm/rocm-libraries#8260 (commit 1139236)
[ck] Enforce LF-only line endings in C/C++ sources
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

Several CK source files carry Windows **CRLF** line endings (a trailing
carriage return on each line), introduced by editors configured for
Windows endings or copy/paste from Windows tooling. These are purely
cosmetic but they pollute diffs (whole-file churn the first time someone
makes an LF edit), confuse `clang-format`, and are inconsistent with the
LF-only convention used across the rest of the tree.

This PR (a) normalizes every existing CRLF file (6 files) to LF and (b)
adds a pre-checkin gate so new CRLF leaks are rejected before merge.

## File extensions covered

Both the cleanup scan and the new Jenkins enforcement stage use the same
predicate as the adjacent `ASCII Only Check` stage:

```
*.h  *.hpp  *.cpp  *.h.in  *.hpp.in  *.cpp.in  *.inc  *.cl
```

(excluding `*/build/*` and `*/include/rapidjson/*`). The local
pre-commit hook's `c++/inc` type filter covers the same set.

## Why no enforcement today

CK is opted out of the rocm-libraries root `.pre-commit-config.yaml`, so
the existing `pre-commit` workflow doesn't touch CK. The local CK
`.pre-commit-config.yaml` only runs for developers who installed hooks.
The **authoritative gate is therefore the new Jenkins stage** in this
PR; the local hook is convenience.

## Commit layout (bisect-friendly)

1. `[ck] Normalize CRLF line endings to LF in C/C++ sources`
Mechanical line-ending cleanup across 6 files. No content change: every
edit is purely CRLF -> LF, verified with `git diff --ignore-cr-at-eol`
reporting an empty diff.

2. `[ck] Enforce LF-only line endings in C/C++ sources`
- New `projects/composablekernel/script/check_no_crlf.sh` (modeled on
`check_ascii_only.sh`).
- New `crlf-checker` entry in
`projects/composablekernel/.pre-commit-config.yaml` under the
local-hooks block (`types_or: [c++, inc]`).
- New `CRLF Check` parallel stage in
`projects/composablekernel/Jenkinsfile`'s `Static checks` block,
mirroring the adjacent `ASCII Only Check` stage. Always-on, no
`RUN_CPPCHECK` gate.

The tree is buildable at every commit boundary. Commit 1 leaves 0 CRLF
violations; commit 2 wires the gate.

## Demo

Script output on a synthesized violation:

```
$ printf 'int main() {}\r\n' > /tmp/bad.cpp
$ projects/composablekernel/script/check_no_crlf.sh /tmp/bad.cpp
ERROR: /tmp/bad.cpp contains CRLF (Windows) line endings:
1:int main() {}<CR>
  Fix: convert to LF, e.g. 'sed -i 's/\r$//' /tmp/bad.cpp' or 'dos2unix /tmp/bad.cpp'
$ echo $?
1
```

Full repo scan after the cleanup commit:

```
$ cd projects/composablekernel && find . -type f \( -name '*.h' -o -name '*.hpp' -o -name '*.cpp' \
    -o -name '*.h.in' -o -name '*.hpp.in' -o -name '*.cpp.in' -o -name '*.inc' -o -name '*.cl' \) \
    -not -path '*/build/*' -not -path '*/include/rapidjson/*' -print0 \
  | xargs -0 -P 8 -n 64 script/check_no_crlf.sh
$ echo $?
0
```

## Test plan

- [ ] Jenkins PR build: confirm new `Static checks -> CRLF Check` stage
runs green over the full predicate and the existing `ASCII Only Check` /
`Clang Format` stages are unaffected.
- [ ] Local: `pre-commit run crlf-checker --all-files` runs cleanly
after installing CK pre-commit hooks.
- [ ] Manually inject a CRLF line ending in any `.cpp/.hpp/.inc` file,
push: confirm Jenkins fails the new stage with a clear error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-06-12 21:11:59 +00:00
Brock Hargreaves
65c395984d [rocm-libraries] ROCm/rocm-libraries#8108 (commit c620f0a)
[ck] Unify Build_CK and buildHipClangJob into buildAndTest
 (#8108)

## Motivation

`projects/composablekernel/vars/ck.groovy` had two near-identical build
functions, `buildHipClangJob` (lean: static checks, FMHA, tile-engine,
conv) and `Build_CK` (main per-arch matrix). This removes the
duplication and fixes a latent GitHub-status bug that lived in both.

  ## Technical Details

- Merged both into one `buildAndTest(Map conf)` gated by an explicit
`is_main_build` flag (default `false` = lean path; `true` adds the GPU
check + arch-gated inductor/perf/hipTensor; only `runBuildCKAndTests`
sets it).
- Deleted the `Build_CK_and_Reboot` / `buildHipClangJobAndReboot`
wrappers (they only logged and re-threw); all 13 call sites now call
`buildAndTest` directly.
- Widened the shared `catch` to `Exception` so build / image-pull / "GPU
not found" failures report **failure** instead of leaving the check
stuck **pending** (failing stages now go red).
- Removed the dead `no_reboot` key. No change to what is built or
tested.

  ## Test Plan

  - Jenkins linter on the `Jenkinsfile`.
- One branch run covering both paths (per-arch matrix + lean stages),
spot-checking gfx1250 and a nogpu stage.

  ## Test Result

- Verified statically: no `buildHipClangJob*` / `Build_CK*` references
remain; `buildAndTest` defined once, all call sites wired.
  - Pending: linter + branch run before merge.

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-06-08 23:45:42 +00:00
Brock Hargreaves
4e1296674d [rocm-libraries] ROCm/rocm-libraries#7990 (commit b8b5b43)
[CK] Load ck.groovy via Jenkins Shared Library
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Motivation

This allows the CI service to have a configuration source-of-truth
outside the PR under test, allowing rapid system changes. Bug fixes on
the develop branch propagate immediately to all pipelines that don't
override the parameter -- no rebase required.

A new `USE_CURRENT_BRANCH_FOR_CK_GROOVY` parameter lets contributors
test pipeline changes on their own branch without any extra
configuration.

## Technical Details

- `loadCk()` in the Jenkinsfile is updated to call
`library("ck@${branch}").ck.get()` instead of `checkout scm` + `load
"vars/ck.groovy"`. The `checkout scm` inside `loadCk()` is removed since
Jenkins now handles the library fetch internally.
- A `USE_CURRENT_BRANCH_FOR_CK_GROOVY` boolean parameter (default: off)
is added. When off, `ck.groovy` is always loaded from `develop` — all
normal PR builds are unaffected. When on, `ck.groovy` is loaded from the
current branch automatically via `env.CHANGE_BRANCH`, so contributors
testing pipeline changes just tick the box.
- `return this` is removed from the end of `ck.groovy`. This was
required by the `load` convention but is not needed (and can cause
errors) in a shared library context.
- `loadCk()` is kept at every call site rather than called once at the
top, preserving restart-from-stage safety — if a build is restarted from
a mid-pipeline stage, `ck` is still initialized correctly.
- The Jenkins Shared Library named `"ck"` must be registered in Jenkins
Global Pipeline Libraries

## Test Plan

1. Trigger "Build with Parameters" on the PR branch with
`USE_CURRENT_BRANCH_FOR_CK_GROOVY=true`
2. Verify "Determine CI Execution" stage completes and the library()
calls indicates the current branch
3. Verify "Static checks" stage completes.
4. Trigger a second build with `USE_CURRENT_BRANCH_FOR_CK_GROOVY=false`
(default) to confirm normal builds still load from `develop`.

## Test Result

Verified both paths. The develop library is loaded by default, the
branch library is loaded when the parameter is enabled.

## Submission Checklist

- [ X ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-06-04 22:32:37 +00:00
John Afaganis
96c39b331e [rocm-libraries] ROCm/rocm-libraries#7829 (commit 13af7da)
[ck] Enforce ASCII-only C/C++ sources for hipRTC
 compatibility (#7829)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

CK source files must be compilable via **hipRTC (HIP runtime
compilation)**, whose preprocessor does not accept non-ASCII bytes
anywhere in a translation unit — **including in comments**. Bytes that
are harmless under `hipcc` (em-dashes, smart quotes, multiplication
signs, Greek letters, box-drawing glyphs, etc.) cause hipRTC to fail at
preprocessing time. These regularly leak in via LLM-assisted authoring
or copy/paste from formatted documents and silently break hipRTC paths
that are not exercised by the default `hipcc`-based build matrix.

This PR (a) cleans every existing violation (53 files) and (b) adds a
pre-checkin gate so new violations are rejected before merge.

## File extensions covered

Both the cleanup scan and the new Jenkins enforcement stage use the same
predicate:

```
*.h  *.hpp  *.cpp  *.h.in  *.hpp.in  *.cpp.in  *.inc  *.cl
```

(excluding `*/build/*` and `*/include/rapidjson/*`). This is a strict
superset of the existing `Clang Format` stage's predicate — `*.inc` is
added so test-fixture include files are also gated. The local pre-commit
hook's `c++/inc` type filter covers the same set.

## Why no enforcement today

CK is opted out of the rocm-libraries root `.pre-commit-config.yaml`, so
the existing `pre-commit` workflow doesn't touch CK. The local CK
`.pre-commit-config.yaml` only runs for developers who installed hooks.
The **authoritative gate is therefore the new Jenkins stage** in this
PR; the local hook is convenience.

## Commit layout (bisect-friendly)

1. `79798aa6261` — **`[ck] Convert reflect/ rendering to ASCII for
hipRTC compatibility`**
Behavior change, isolated. `TreeFormatter` swaps `├─ / └─ / │ ` for `|-
/ +- / | ` (3-col width preserved so alignment is unchanged).
`conv_description.hpp` swaps `×` for `x` as the dimension separator.
`test_conv_description.cpp` expected strings updated in lockstep so the
snapshot test stays green. This is the only commit in the series with
observable runtime impact.

2. `738fdb0d81c` — **`[ck] Strip non-ASCII bytes from C++ sources for
hipRTC compatibility`**
Mechanical text cleanup across 53 files. Replacements happen in comments
or in `std::cout` strings that are not asserted on by any test. None of
the 174 `.inc` files in the tree required edits, but they were in the
scan's predicate so the enforcement stage's predicate is a superset of
what was scanned. Full replacement table in the commit message.

3. `1d7cd8ba235` — **`[ck] Enforce ASCII-only C/C++ sources for hipRTC
compatibility`**
- New `projects/composablekernel/script/check_ascii_only.sh` (modeled on
`check_copyright_year.sh`).
- New entry in `projects/composablekernel/.pre-commit-config.yaml` under
the local-hooks block (`types_or: [c++, inc]`).
- New `ASCII Only Check` parallel stage in
`projects/composablekernel/Jenkinsfile`'s `Static checks` block,
mirroring the existing `Clang Format` stage but with `*.inc` added to
the find predicate. Always-on, no `RUN_CPPCHECK` gate.

The tree is buildable at every commit boundary. Commit 1 leaves 50 known
violations; commit 2 leaves 0; commit 3 wires the gate.

## Demo

Script output on a synthesized violation:

```
$ printf '// em-dash test \xe2\x80\x94 here\n' > /tmp/bad.cpp
$ projects/composablekernel/script/check_ascii_only.sh /tmp/bad.cpp
ERROR: /tmp/bad.cpp contains non-ASCII bytes:
1:// em-dash test — here
  Fix: replace with ASCII (em-dash -> --, smart quotes -> ", arrows -> ->, etc.)
$ echo $?
1
```

Full repo scan after the cleanup commits (note the `-name '*.inc'`
clause):

```
$ cd projects/composablekernel && find . -type f \( -name '*.h' -o -name '*.hpp' -o -name '*.cpp' \
    -o -name '*.h.in' -o -name '*.hpp.in' -o -name '*.cpp.in' -o -name '*.inc' -o -name '*.cl' \) \
    -not -path '*/build/*' -not -path '*/include/rapidjson/*' -print0 \
  | xargs -0 -P 8 -n 64 script/check_ascii_only.sh
$ echo $?
0
```

## Test plan

- [ ] Jenkins PR build: confirm new `Static checks -> ASCII Only Check`
stage runs green over the full predicate (incl. `*.inc`) and existing
`Clang Format` stage is unaffected.
- [ ] `test_conv_description` passes against the ASCII tree-formatter
output (touched in commit 1).
- [ ] Local: `pre-commit run ascii-only-checker --all-files` runs
cleanly after installing CK pre-commit hooks via
`script/install_precommit.sh`.
- [ ] Manually inject a non-ASCII byte in any `.cpp/.hpp/.inc` file,
push: confirm Jenkins fails the new stage with a clear error.
- [ ] Spot-check a representative subset of touched files under hipRTC
compilation to confirm no remaining hipRTC-blocking content (optional,
since the static byte check is a sufficient condition for hipRTC
preprocessor acceptance on this dimension).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-06-04 15:00:17 +00:00
Brock Hargreaves
843d993835 [rocm-libraries] ROCm/rocm-libraries#7743 (commit 15ef85c)
[CK] Extract Jenkinsfile helpers into vars/ck.groovy shared
 library (#7743)

## Motivation
The CK Jenkinsfile is a 2,215-line monolith mixing helper function
definitions with pipeline stage declarations. This makes it difficult to
review, modify, or extend CI stages without wading through unrelated
infrastructure code.

## Technical Details
Extract all helper functions from the Jenkinsfile into vars/ck.groovy,
loaded at runtime via ck = load "vars/ck.groovy" in the first stage. The
Jenkinsfile is reduced from 2,215 lines to 810 lines containing only the
pipeline structure.

- 36 helper functions moved to ck.groovy with no logic changes
- 10 new stage-wrapper functions (runBuildCKAndTests,
runTileEngineGemmTests, runClangFormat, etc.) extract inline
environment{}/steps{} business logic from stages, eliminating the
MethodTooLargeException caused by CPS-transformed shell strings
exceeding the JVM 64KB bytecode limit
- All ck. method calls in steps{} blocks wrapped in script{} as required
by Jenkins Declarative Pipeline
- rocmnode() remains in the Jenkinsfile (needed for agent{} labels
before ck is loaded)
- CRON_SETTINGS / POLL_SPEC remain in the Jenkinsfile (triggers{}
evaluates at parse time before any workspace is available)
- No stage names changed

## Test Plan
- Jenkinsfile validated against the Jenkins Pipeline Linter
(/pipeline-model-converter/validate)
- All 35 shared helper functions diffed line-by-line against develop to
verify no regressions
- Merge from develop incorporated and verified (gfx1250 stage, ROCm 7.13
default, cmake_build updates)

## Test Result
- Linter: passes
- Function diff vs develop: all 35 functions match exactly
- Awaiting Jenkins run to confirm end-to-end stage execution

## Submission Checklist

- [ x ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-06-01 21:10:12 +00:00
Illia Silin
0edfcf06e5 [rocm-libraries] ROCm/rocm-libraries#7894 (commit 5e66689)
[CK] add credentials to docker manifest inspect call

## Motivation

This should fix an issue that we recently encountered in CI when we
exceeded the limit of accessing docker without authentication:

[2026-05-29T16:08:42.447Z] + docker manifest inspect --insecure
rocm/composable_kernel:ck_ub24.04_rocm7.13
[2026-05-29T16:08:42.833Z] toomanyrequests: You have reached your
unauthenticated pull rate limit.
https://www.docker.com/increase-rate-limit

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-05-29 19:18:57 +00:00
Illia Silin
016f8891de [rocm-libraries] ROCm/rocm-libraries#7815 (commit e34ac06)
[CK] fix daily build of CK for all supported targets.

## Motivation

Fixing the daily build of CK packages for all supported targets. In the
past, if no GPU_TARGETS was specified, we would by default build CK for
all supported targets, But recently, the MIOpen team requested to change
the default behavior to not build at all if no target is specified (for
the purposes of filtering out unsupported targets in TheRock). So just
adding the explicit list of targets to our daily builds now.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-05-28 14:52:19 +00:00
Illia Silin
e02c566795 [rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)
[CK] upgrade CI to rocm7.13 as default compiler (#7612)

## Motivation

Upgrade the default docker and compiler version in CI to rocm7.13.
In order to pass all the checks I had to also clean up a lot of
non-ascii characters in the source code comments and modify a couple of
tests that were affected by a new compiler logic.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Aviral Goel <aviral.goel@amd.com>
2026-05-22 02:43:50 +00:00
Thrupti Raj Lakshmana Gowda
c31fc4df52 [rocm-libraries] ROCm/rocm-libraries#7311 (commit 79d8cae)
[CK Tile Engine] Daily tier sampling for tile engine GEMM  (#7311)

Summary
- Replace uniform random instance sampling (random.shuffle) with
scrambled Sobol + Latin Hypercube + maximin space-filling
sampling, per the Tile Engine Benchmark Sampling RFC
- Add op-weighted budget allocation via new
TILE_ENGINE_SAMPLING_TIER=daily CMake knob that auto-distributes 8,000
instances across
ops proportional to registered weights in op_weights.json
  - Emit chosen_instances.json manifests for reproducibility tracking
- Consolidate 5 copies of sampling logic into single _apply_sampling()
method on the base class
Jenkinsfile changes
Replace per-op -D *_MAX_INSTANCES=250 with single -D
TILE_ENGINE_SAMPLING_TIER=daily in gfx942/gfx950/gfx1201 stages. Budget
  auto-distributes (8000 total per GPU target).

---------

Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>
2026-05-21 02:17:42 -05:00
Illia Silin
f01a8cb28d [rocm-libraries] ROCm/rocm-libraries#7547 (commit 7e032ad)
[CK] fix daily builds for pytorch (#7547)

## Motivation

This will restore the daily builds that test whether the latest pytorch
code can build with the latest CK code (pulled from the standalone CK
repo).

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-05-19 07:13:36 -07:00
Illia Silin
e496f94445 [rocm-libraries] ROCm/rocm-libraries#7516 (commit de93737)
[CK] fix some jenkins logic errors. (#7516)

## Motivation

After merge from internal repo got some logic errors in the internal CI
jenkinsfile.
Here are 2 fixes for 2 issues:
1. make sure .ninja_log file exists before trying to parse it
2. use the default compiler for the gfx1250 target, since it's getting
built in its own special docker, which does not have the option of
installing alternative compilers.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-05-16 01:32:29 +00:00
Jobbins
8ca5c0a8d7 [rocm-libraries] ROCm/rocm-libraries#6961 (commit 47e8768)
[CK] print hostname and $NODE_NAME to find inconsistencies (#6961)

## Motivation

We suspect that the check for amdgpu: `cat /sys/module/amdgpu/version`
sometimes gets ran on the Jenkins controller instead of the node. This
adds the `hostname` command to compare to the $NODE_NAME variable.

## Technical Details

Updated Jenkinsfile to include the `hostname` command.

## Test Plan

## Test Result

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-05-15 14:48:46 -07:00
Illia Silin
717f2efef7 [rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)
[CK] add composable kernel support on gfx1250 (#6978)

## Motivation

Add composable kernel support on gfx1250.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Qun Lin <qlin@amd.com>
Co-authored-by: jialuo12_amdeng <jia.luo@amd.com>
Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com>
Co-authored-by: hsivasun_amdeng <haresh.sivasuntharampillai@amd.com>
2026-05-15 06:46:51 -07:00
John Shumway
142ee00585 [rocm-libraries] ROCm/rocm-libraries#7090 (commit 316fded)
[CK] Add rocm_ck directory structure with feature flag (#7090)

## Summary

Adds initial rocm_ck directory structure, #7119.

- Establishes production `rocm_ck/` directory at
`composablekernel/rocm_ck/`, peer to `tile_engine/` and `dispatcher/`
- Adds `CK_ENABLE_ROCM_CK` option (default OFF) as a CK-internal feature
flag — no superbuild or TheRock changes needed
- Creates `rocm_ck` INTERFACE library, `ck_tile_headers` target, GTest
integration with builder-style convenience targets (`smoke-rocm-ck`,
`check-rocm-ck`)
- Adds Jenkins `RUN_ROCM_CK_TESTS` parameter for CI, following the
`RUN_BUILDER_TESTS` pattern
- README explains the constexpr schema model: host-device separation via
constexpr data rather than template parameters, enabling multi-arch
distribution through kpack archives

## Test plan

- [x] `cmake -DCK_ENABLE_ROCM_CK=ON` configures without errors
- [x] `ninja check-rocm-ck` passes (4 host-only index type tests)
- [x] Default build (`CK_ENABLE_ROCM_CK=OFF`) is unaffected — no rocm_ck
targets present
- [x] Jenkins `RUN_ROCM_CK_TESTS=true` enables the flag and runs
`check-rocm-ck`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>
2026-05-14 18:51:37 +00:00
Illia Silin
22b9feb40f [rocm-libraries] ROCm/rocm-libraries#7111 (commit 651947f)
[CK] Fix latest batch of staging compiler warnings (#7111)

## Motivation

Suppress the new batch of clang lifetimebound and invalidation warnings
with the latest staging compiler.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-05-08 07:14:14 -07:00
Illia Silin
263e9965f6 [rocm-libraries] ROCm/rocm-libraries#7138 (commit 70e6660)
[CK] disable tile_engine by default, limit gfx1030 CI builds to develop only. (#7138)

## Motivation

An attempt to reduce the build time and keep CI moving faster.
Disable tile_engine by default since even the cmake step may take up to
30 minutes.
Since we're down to a single gfx1030 CI node, use it only for develop
builds.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-05-08 01:39:45 +00:00
Illia Silin
1f8dbfb63e [rocm-libraries] ROCm/rocm-libraries#7046 (commit aaf7665)
[CK] fix CI git token. (#7046)

## Motivation

Fix the CI breakage due to git PAT deprecation.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-05-06 02:31:47 +00:00
Illia Silin
5df88bfe73 [rocm-libraries] ROCm/rocm-libraries#6741 (commit 0d4180f)
[CK] restore fmha performance reporting and disable c++17 in CI. (#6741)

## Motivation

This change restores monitoring of FMHA benchmarks performance in daily
builds and removes the std=c++17 flag from CI builds on gfx90a.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-04-25 02:22:11 +00:00
Yi DING
e8c36b1e65 [rocm-libraries] ROCm/rocm-libraries#6701 (commit f9a8d1c)
[CK] Fix CI Failures for PR From Forks (#6701)

## Motivation

Fork PRs fail CI when `RUN_AITER_TESTS` or `RUN_FA_TESTS` is enabled.
The docker scripts run `git clone -b "$CK_*_BRANCH"
https://github.com/ROCm/rocm-libraries.git`, but a fork's branch doesn't
exist upstream:

```
fatal: Remote branch <fork-branch> not found in upstream origin
```

Example: [PR #6529 build
#4](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/PR-6529/4/pipeline).

## Technical Details

**`Jenkinsfile`** — for PRs, use the upstream-visible PR ref instead of
the head branch name:

```groovy
CURRENT_BRANCH_NAME = env.CHANGE_ID
    ? "refs/pull/${env.CHANGE_ID}/head"
    : (env.CHANGE_BRANCH ? env.CHANGE_BRANCH : env.BRANCH_NAME)
```

**`Dockerfile.aiter` / `Dockerfile.fa`** — `git clone -b <ref>` only
accepts branches (`refs/heads/*`) and tags (`refs/tags/*`), so it can't
resolve `refs/pull/N/head`. Switch to `git fetch`, which accepts any
refspec (and still works for plain branch names):

```sh
mkdir rocm-libraries && cd rocm-libraries
git init -q
git remote add origin https://github.com/ROCm/rocm-libraries.git
git fetch --depth 1 --filter=blob:none origin "$CK_*_BRANCH"
git sparse-checkout init --cone
git sparse-checkout set projects/composablekernel
git checkout FETCH_HEAD
```

`git checkout FETCH_HEAD` lands in detached HEAD, which breaks the
existing `git branch -m "$CK_*_BRANCH"` (and that name isn't a valid
local branch anyway). Decouple the local branch name from the upstream
ref:

- Replace `git init` + `git branch -m` with `git init -b
"$LOCAL_BRANCH"` (requires git ≥ 2.28, satisfied by base images)
- `LOCAL_BRANCH="ck-import-${ROCM_LIBRARIES_SHA}"` in the rocm-libraries
path; `LOCAL_BRANCH="$CK_*_BRANCH"` in the fallback
- Downstream `git clone -b ... ../ck` uses `$LOCAL_BRANCH`

## Test Plan

Manually trigger a build on this PR with `RUN_AITER_TESTS=true` and
`RUN_FA_TESTS=true`; both docker images should build end-to-end.

## Test Result
[jenkins / rocm-libraries-folder/Composable Kernel / PR-6701 /
#3](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/PR-6701/3/pipeline/)

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-04-24 16:22:28 +08:00
Max Podkorytov
8c238fe875 [rocm-libraries] ROCm/rocm-libraries#6434 (commit 87aae5c)
Fix ck4inductor conv instance parsing for NumGroupsToMerge parameter (#6434)

## Summary
- Add `num_groups_to_merge` field to `CKGroupedConvFwdOp` dataclass to
match the new (#4273) `NumGroupsToMerge` template parameter added to
`DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3`
- Enable inductor tests by default in Jenkins CI

## Test plan
- [x] Built wheel without patch: `test_gen_conv_instances` fails with
`TypeError: takes from 47 to 50 positional arguments but 51 were given`
- [x] Built wheel with patch: `test_gen_conv_instances` passes
2026-04-22 11:05:11 -07:00
arai713
12f3d646a0 [rocm-libraries] ROCm/rocm-libraries#4769 (commit 72ae66e)
[CK_TILE] Restructure Tile Engine's benchmarking and profiling (#4769)

## Motivation
This PR introduces a restructure for the benchmarking and profiling
aspects of CK Tile's Tile Engine, expanding on the groundwork from this
previous https://github.com/ROCm/composable_kernel/pull/3434 and
outlined in this [design
document](https://amdcloud-my.sharepoint.com/:w:/r/personal/astharai_amd_com/Documents/Restructuring%20Tile%20Engine.docx?d=w14ea28a30718416988ed5ebb759bd3b2&csf=1&web=1&e=l3VBuX).
In PR 3434, to reduce repeated code we implemented:

- Base class that centralizes common functionality and provides a
default implementation (Universal GEMM)
- Child classes for GEMM variants override virtual functions to handle
variant-specific behavior

This refactoring in this PR follows the same process and should greatly
reduce the duplicated code present in Tile Engine and make it simpler to
add in new operations, increasing scalability.

## Technical Details
The files have been refactored around new base structs for benchmarks,
profiling and problem descriptions. The new base structs are:

- GemmProblem
- GemmBenchmark
- GemmProfiler

Universal GEMM, Preshuffle GEMM, and Multi-D GEMM all have child classes
that will inherit from these base structs overriding only what differs
per variant.
All common functions across the benchmarking and profiling files have
been moved into newly added common utility files under the commons/
directory. The new utility files are:

- utils.hpp: common functions for the benchmarking and profiling process
- benchmark_utils.py: common utility functions for the benchmark
generation

## Test Plan
I tested using the existing tests for Tile Engine.
## Test Result
All tests passed.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-04-14 10:50:24 -07:00
Yi DING
92dc99b713 [rocm-libraries] ROCm/rocm-libraries#5329 (commit 9c43062)
[CK] Add flash_attn tests (#5329)

## Motivation

Add CI support for running
[flash-attention](https://github.com/ROCm/flash-attention) tests against
CK, similar to existing AITER and PyTorch downstream test pipelines.

## Technical Details

### New: `Dockerfile.fa`
A new Dockerfile that builds a flash-attention test image on top of a
ROCm PyTorch base image. It:
- Sparse-checkouts CK from `rocm-libraries` (or clones directly from
`ROCm/composable_kernel`)
- Clones and builds `flash-attention` with CK as the backend
- Supports configurable `FA_BRANCH`, `CK_FA_BRANCH`, and `GPU_ARCHS`
build args

### Updated: `Jenkinsfile`

**buildDocker refactor:**
- Extracted `buildAndPushDockerImage()` helper that handles both "check
if exists, skip" and "force build, push" logic, eliminating the
duplicated try/catch blocks
- Split monolithic `buildDocker()` into `buildDockerBase()`,
`buildDockerPytorch()`, `buildDockerAiter()`, and new `buildDockerFa()`
- Each downstream docker build now runs unconditionally within its
respective guard (`RUN_PYTORCH_TESTS`, `RUN_AITER_TESTS`,
`RUN_FA_TESTS`)
- Image digests are stored in env vars (`CK_BASE_IMAGE`,
`CK_PYTORCH_IMAGE`, `CK_AITER_IMAGE`, `CK_FA_IMAGE`) for use in
downstream stages

**run_downstream_tests refactor:**
- Merged `run_aiter_tests()` and `run_pytorch_tests()` into a single
generic `run_downstream_tests(conf)` that accepts `image`,
`timeoutHours`, and `execute_cmds`
- Test commands for each downstream target are declared as top-level
lists (`RUN_PYTORCH_TESTS_CMDS`, `RUN_AITER_TESTS_CMDS`,
`RUN_FA_TESTS_CMDS`)

**Pipeline stages:**
- Merged "Run Pytorch Tests" and "Run AITER Tests" into a single "Run
Downstream Tests" parallel stage
- Added two new FA test stages: "Run FA Tests on gfx942" and "Run FA
Tests on gfx950"
- Added new pipeline parameters: `RUN_FA_TESTS`, `fa_base_docker`,
`fa_branch`, `ck_fa_branch`
- `ck_pytorch_branch` and `ck_aiter_branch` now default to the current
branch instead of hardcoded `develop`
- CRON schedule at 13:00 now also triggers `RUN_FA_TESTS=true`

## Test Plan

- [x] Trigger pipeline manually with `RUN_FA_TESTS=true` on gfx942 and
gfx950 nodes
- [x] Verify existing AITER and PyTorch test stages are unaffected
- [x] Verify `buildAndPushDockerImage` correctly skips rebuild when
image already exists (with `BUILD_DOCKER=false`)

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-04-10 09:23:10 +08:00
Illia Silin
eb1508015d [rocm-libraries] ROCm/rocm-libraries#6147 (commit 8035856)
[CK] Replace daily CI builds with mainline compiler with TheRock compiler. (#6147)

## Motivation

Since the compiler team has deprecated the amd-mainline branch and
switched to TheRock, we'll start building a docker image with TheRock
artifacts and building/testing Ck with that.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-04-03 11:04:12 -06:00
Illia Silin
cd7e17c837 [rocm-libraries] ROCm/rocm-libraries#6103 (commit c74e44d)
Use ck_pytorch docker from private repo. (#6103)

## Motivation

Move the pytorch docker image used for CK testing into private repo.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-04-02 09:06:44 -07:00
Jobbins
db382efaf7 [rocm-libraries] ROCm/rocm-libraries#6107 (commit e69d1b2)
[CK] poll every 6 hours as workaround (#6107)
2026-04-01 15:52:45 -04:00
Jobbins
ef2a63047f [rocm-libraries] ROCm/rocm-libraries#6064 (commit cce30ab)
[CK] poll develop every 15 minutes for changes (#6064)
2026-04-01 10:34:48 -04:00
Illia Silin
70e4696f01 [rocm-libraries] ROCm/rocm-libraries#5921 (commit 032ac1b)
[CK] fix clang lifetimebound errors with staging compiler (#5921)

## Motivation

The ROCm staging compiler (newer Clang) enforces
`[[clang::lifetimebound]]` annotations on methods that return references
or pointers to internal object data. Without these annotations, the
staging compiler emits compilation errors for container accessor methods
across the CK and CK Tile namespaces.

  ## Technical Details

Adds `[[clang::lifetimebound]]` to all reference/pointer-returning
accessors in core container types:

  **`ck::` namespace:**
  - `Array` -- `At()`, `operator[]`, `operator()`, `begin()`, `end()`
  - `index_array` -- `operator[]`
  - `StaticallyIndexedArray_v2` -- `At()`, `operator[]`, `operator()`
  - `IndexLookupTable` -- `operator[]`

  **`ck_tile::` namespace:**
  - `array` -- `get(i)`, `at()`, `operator[]`, `operator()`
  - `static_array` -- `operator[]`
  - `thread_buffer` -- `get(i)`, `at()`, `operator[]`, `operator()`
  - `make_kernel()` -- parameter pack

Also removes the unused `instance_index` variable from
`batched_gemm_reduce_fp16.cpp` and simplifies its argument parsing
  accordingly.

  ## Test Plan

- Compile with the staging compiler to verify all lifetimebound errors
are resolved
- Existing tests pass unchanged -- the attribute is a compile-time
annotation with no runtime effect

 ## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-30 07:19:32 -07:00
Illia Silin
689c46ff53 [rocm-libraries] ROCm/rocm-libraries#5764 (commit f3c1232)
Re-enable daily builds with staging compiler (#5764)

## Motivation

This should help us catch and fix any new compilation issues early on.

## Technical Details

We now have three compiler profiles:
* **develop**: slightly stabilized version of amd-staging with some of
the obvious offending PRs reverted, 1-2 weeks behind amd-staging;
* **amd-mainline**: more stable version of compiler, the baseline for
all other branches, e.g., release, npi, etc. 2-4 weeks behind
amd-staging.
* **amd-staging**: latest compiler version where all new PRs land, often
broken;

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: kensclin <lshyhchy@amd.com>
2026-03-25 16:37:00 +00:00
Eiden Yoshida
1b4f091908 [rocm-libraries] ROCm/rocm-libraries#5691 (commit 2fbb1fc)
[CK] MICI: Revert "add self healing to ref repo" (#5691)

The check may not be working as intended, causing premature deletion of
reference repositories
2026-03-23 07:15:51 -07:00
andrew clark
fe1f90100d [rocm-libraries] ROCm/rocm-libraries#5464 (commit debfc96)
Improved CI infrastructure failure detection (#5464)

## Motivation

This PR re-enables CI infrastructure failure detection and notification,
which had been disabled due to performance issues caused by loading
large build logs (~80k lines) into memory for pattern scanning. The goal
is to reliably detect known infrastructure failures (GPU errors, Docker
authentication issues, disk space errors, etc.) and send actionable
Teams notifications without hanging on large logs.

## Technical Details

- Replaced full build log loading and Groovy-based pattern scanning with
a streaming wget | grep -E pipe. grep scans natively so the full log is
never loaded into Groovy, resolving the hang on large logs.
- Combined all failure patterns into a single grep -E call to avoid
multiple log fetches.
- The node name is now tracked with the observed failure.
- Added a new failure pattern for device's running out of space.

## Test Plan

- Forced failures in the "Determine CI Execution" stage with all 9
failure patterns echoed to the build log.
- Simulated large log sizes (~80k lines of dummy output) to validate
pattern detection and node name extraction at realistic log scales,
including patterns placed both before and after large blocks of dummy
output.

## Test Result

All 9 failure patterns detected correctly. Teams notifications sent with
accurate log context, node name, and job links. No hangs observed on 80k
line simulated logs.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-20 19:16:58 +00:00
Jobbins
f28fed5f72 [rocm-libraries] ROCm/rocm-libraries#5630 (commit 14cd617)
add self healing to ref repo (#5630)

## Motivation

Check for when mirror repo gets corrupted in CI

## Technical Details

We detect broken ref objects and rebuild the local mirror in that case
of corruption

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-20 09:42:53 -07:00
Yaswanth Raparti
d782cc344d [rocm-libraries] ROCm/rocm-libraries#5614 (commit 32933df)
[CK][CK TILE] Fix smart-build to run install target for client examples (#5614)

How ninja install works:
- Builds library dependencies (device_operations, etc.)
- Installs them to CMAKE_INSTALL_PREFIX
- Skips building test executables (not install dependencies)

Affected stages (8):
- gfx942/gfx950/gfx908/gfx90a CK Client Examples
- gfx10-1/gfx10-3/gfx11/gfx12 CK Client Examples

## Motivation

Problem: When smart-build is enabled (runAllUnitTests=false), the build
step is skipped entirely. This causes client example stages to fail
because they depend on the CK library being installed to ../install.

Error seen:
  Target "client_gemm" links to:
    composable_kernel::device_other_operations
  but the target was not found.

## Technical Details
Root cause: Line 712 only checked runAllUnitTests, so when building with
config_targets="install", the install target was never built, leaving
the install directory empty.

Fix: Added condition to always build when config_targets contains
'install'. The install target automatically builds its dependencies (the
CK libraries) but skips building tests, which aligns with smart-build
philosophy.

## Test Plan

Should be tested on CI

## Test Result

Should be tested on CI

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2026-03-19 22:00:29 +00:00
Yaswanth Raparti
3910c9c2a1 [rocm-libraries] ROCm/rocm-libraries#5249 (commit 2a114bb)
[CK] [CK_TILE] Improve build and test time of CI with smart dependency parser (#5249)

## Motivation

Existing dependency parser needs full build of tests to determine which
tests are affected by code changes in a PR. This still takes 2-4 hours
for building the tests which slows down the CI as the number of tests
grow. To resolve this issue we implemented a smart dependency parser
which uses CMake Configure to parse dependencies and build only the
affected test cases. We have ensured that two approaches are available
1) CMake pre-build analysis for each PR to ensure fast build and test.
2) Ninja post-build analysis to enable full build for nightly tests.

## Technical Details

```bash
### 1. Configure the project with CMake
cmake -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ..

### 2. Analyze dependencies (no build required!)
python3 ../script/dependency-parser/main.py cmake-parse compile_commands.json build.ninja \
  --workspace-root .. --output cmake_dependency_mapping.json --parallel 8

### 3. Find tests affected by changes
python3 ../script/dependency-parser/main.py select cmake_dependency_mapping.json origin/develop \
  HEAD --test-prefix --output tests_to_run.json

### 4. Build only affected tests
ninja $(jq -r '.executables[]' tests_to_run.json | tr '\n' ' ')

### 5. Run affected tests
ctest -R "$(jq -r '.regex' tests_to_run.json)"
```

### Jenkins Integration
- Added `buildMode` to jenkinsfile to integrate both `selective` and
`full` build methods

### Known Limitations

### 1. Build-Time Generated Headers (HIGH RISK)

**Problem:** Files generated during the build process (e.g., via
`add_custom_command`) cannot be analyzed before building.

**Example:**
```cmake
add_custom_command(
  OUTPUT ${CMAKE_BINARY_DIR}/generated/config.hpp
  COMMAND generate_config.sh
  DEPENDS template.hpp.in
)
```

**Impact:** If a source file includes `generated/config.hpp`, the
dependency won't be detected until after building.

**Mitigation:**
- CK analysis shows **no generated headers** currently used
- If generated headers are added in the future, they must be built first
- Recommendation: Generate headers in CMake configure phase (not build
phase) when possible

## Test Plan
**1. Modified Files:**
```
include/ck_tile/ops/common.hpp
include/ck_tile/ops/gemm.hpp
include/ck_tile/ops/gemm/warp/warp_gemm.hpp
```
**2. Compare tests selected between `build.ninja` and `cmake-parse`
methods**

## Test Result
- 1. The test completed in 5-6 minutes finding about 8000+ executables
that should be built.
- 2. We selected a commit 5ccc1387ea which resulted in same 7 tests with
both legacy and new methods.
-

PR | Legacy tests | Smart tests | Notes
-- | -- | -- | --
5261 | 453 | 455 | Only 2 tests (test_amdgcn_mma and
test_amdgcn_sparse_mma)
5168 | 0 | 0 | Changes in   dispatcher only. No CK tests invoked.
5249 | 0 | 0 | Changes to   dependency parser. No CK tests invoked
5260 | 0 | 0 | Changes in   dispatcher only. No CK tests invoked.
5174 | 1 | 1 | One test from FMHA   affected by this PR in both cases
5383 | 0 | 0 | Changes are only in benchmark files. Did not trigger any
tests
5445 | 1 | 1 | Changes are only to tests/ck_tile/gemm_streamk. Only
triggered one streamk test in both cases.
5454 | 3 | 3 | Both methods identified same test_grouped_conv_bwd tests
5427 | 234 | 234 | Core infrastructure header changes. Detected exactly
same tests
5388 | 85 | 85 | modifies warp-level GEMM operations (warp_gemm.hpp,
warp_gemm_dispatcher.hpp). Correctly identified all the streamK gemm
tests

## Submission Checklist

- [x ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2026-03-19 05:30:19 +00:00
Thrupti Raj Lakshmana Gowda
bdb589448d [rocm-libraries] ROCm/rocm-libraries#4996 (commit 0a47fbe)
[CK TILE ENGINE] Add grouped_gemm operator to Tile Engine (gfx942/gfx950) (#4996)

## Motivation

The grouped_gemm CK Tile kernel exists (e.g.,
`example/17_grouped_gemm/`) but has no Tile Engine wrapper. Grouped GEMM
handles multiple independent GEMM problems with varying M/N/K dimensions
in a single kernel launch. This PR adds the Tile Engine infrastructure
for automated kernel generation, benchmarking, and profiling of grouped
GEMM kernels.

Jira: AICK-809

## Technical Details

- Created Tile Engine wrapper under `tile_engine/ops/gemm/grouped_gemm/`
following the `gemm_universal` template
- Files added: `CMakeLists.txt`, `grouped_gemm_common.hpp`,
`grouped_gemm_benchmark.hpp`, `grouped_gemm_profiler.hpp`,
`grouped_gemm_benchmark.py`, `grouped_gemm_benchmark_single.cpp`,
`grouped_gemm_instance_builder.py`, `configs/`
- Supported datatypes: fp16, fp8, bf16, bf8
- Supported layouts: rcr, rrr, ccr, crr
- Target GPUs: gfx942, gfx950
- CK Tile kernel: `ck_tile::GroupedGemmKernel` from
`include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp`
- Instance builder extends `GemmKernelBuilder` base class
- Registered in `tile_engine/ops/gemm/CMakeLists.txt`
- Updated Jenkinsfile to build and benchmark grouped_gemm targets in CI
- Benchmark infrastructure includes JSON output, CSV export, and
verification support

## Test Plan

- CMake configure succeeds for grouped_gemm targets
- Kernel instance builder generates valid kernel headers for all
(datatype, layout) combinations
- At least one kernel binary compiles and runs per datatype/layout
combination
- Correctness passes with `--verify 1` on gfx942/gfx950

## Test Result

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-10 18:58:37 -05:00
Bartłomiej Kocot
e262252c4c [rocm-libraries] ROCm/rocm-libraries#5115 (commit a21861e)
[CK][CK Tile] Add grouped conv backward weight tile test and fix tr load in BASE_V1 pipeline (#5115)

## Motivation

Test grouped conv backward weight from ck tile and fix incorrect values.

## Technical Details

- Add test for CI
- Add daily tests
- Fix transpose load in BASE_V1 pipeline

## Test Plan

test_grouped_convnd_backward_weight_tile

## Test Result

in progress

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

AICK-783
2026-03-10 03:03:04 +00:00
andrew clark
b7320a2b19 [rocm-libraries] ROCm/rocm-libraries#5063 (commit 01abf3d)
CI Skip Testing Fix (#5063)

## Motivation

While testing the Skip CI functionality, it revealed a minor issue where
the CI skip check fails when a branch is built at the exact commit where
it diverged from develop. CI is still run by default if a failure is
detected.

When git log <merge-base>..HEAD returns no files (because HEAD equals
merge-base), the command grep -v '^$' exits with error code 1, causing
the skip check to fail.

## Technical Details

Added || true to the grep commands so empty output is handled gracefully
instead of causing a script failure.

## Test Plan

- Simulate the failures and ensure the grep failure is handled
gracefully.

## Test Result

- Simulated grep failures using an empty string. The script handles the
error correctly.
- Verified the CI skip functionality skips CI when non-relevant file
changes are made.
- Verified the CI skip functionality does not skip CI when relevant file
changes are made.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-04 22:01:25 -07:00
Thrupti Raj Lakshmana Gowda
94dd8b6955 [rocm-libraries] ROCm/rocm-libraries#4958 (commit 713881f)
bf8 and bf16 support for Universal GEMM in Tile Engine (#4958)

## Motivation

Currently we have only fp8 and fp16 datatype support for universal GEMM
in Tile Engine with this PR support for bf8 and bf16 datatype will be
added during the CI phase

## Technical Details

Adding bf8 and bf16 support

## Test Plan

NA

## Test Result

NA

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-03 15:29:13 -08:00
andrew clark
f798c36fdd [rocm-libraries] ROCm/rocm-libraries#4943 (commit ea40212)
[CK] Updating CI skip logic (#4943)

## Motivation

The CI skip logic has two issues that prevented it from working
correctly:

1. **Incorrect file patterns**: After migrating from standalone repo to
`rocm-libraries`, file paths now include the
`projects/composablekernel/` prefix (e.g.,
`projects/composablekernel/docs/README.md`). The skip patterns were
still checking for paths starting with `docs/`, which never matched.

2. **Incomplete build type support**: Jenkins multibranch pipelines
provide different environment variables for PR builds (`$CHANGE_TARGET`,
`$CHANGE_ID`) vs branch builds (`$BRANCH_NAME`). The previous logic only
compared `HEAD~1..HEAD` for branch builds, which missed changes from
multi-commit pushes and didn't properly handle feature branch builds.

When CI skipped or ran, there was no visibility into which files
triggered the decision, making it difficult to diagnose issues. You can
now see which files triggered the CI run.

## Technical Details

PR builds: Compares all commits against origin/$CHANGE_TARGET.
Feature branch builds: Uses git merge-base to find divergence point from
develop and checks all touched files since then.
Scheduled develop builds are unaffected. These builds are forced to run
from the pipeline parameters.

Example log output for PR Builds:
<img width="647" height="260" alt="image"
src="https://github.com/user-attachments/assets/c8673a81-acb2-4fb2-acbb-1c07b5ab3b69"
/>

Example log output for Branch Builds:
<img width="488" height="287" alt="image"
src="https://github.com/user-attachments/assets/fbb17ba7-eb2c-42a4-b820-b2a8b9e479c4"
/>

## Test Plan

Pre-PR validation (branch builds):

Push commits with only documentation changes → CI should skip. I will
have to verify this after this PR is merged!
Push commits with code changes → CI should run
Push commits that modify then revert code → CI should run (catching
reverts)
Verify debug output clearly shows skip/run decision

Post-PR validation (PR builds):

Create PR with only doc changes → CI should skip. I will have to verify
this after this PR is merged!
Create PR with mixed doc + code changes → CI should run and log which
files triggered it
Verify debug output clearly shows skip/run decision

## Test Result

All branch build checks succeeded.
All PR build checks succeeded.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-03 07:49:34 -08:00
Illia Silin
c754ee3df0 [rocm-libraries] ROCm/rocm-libraries#5036 (commit 0bee213)
[CK] Switch compiler branch from staging to develop and upgrade sccache. (#5036)

## Motivation

Upgrade to official sccache version 0.14, since it now supports hip.
Also, switching daily builds from amd-staging to develop compiler
branch, since it should be more stable.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-03 07:32:24 -08:00
Thrupti Raj Lakshmana Gowda
720d7fa02b [rocm-libraries] ROCm/rocm-libraries#4592 (commit 45f76cb)
Tile Engine support for gfx950 (#4592)

## Motivation

This PR adds support for the gfx950 GPU architecture to the Tile Engine
in Composable Kernel library, focusing on GEMM operations with FP8 and
BF8 data types.

## Technical Details

Added gfx950-specific MFMA warp GEMM implementations with conditional
compilation.
Updated default GEMM configuration parameters for tile sizes and warp
configurations.
Added Jenkins CI pipeline stage for testing TILE_ENGINE_GEMM on gfx950
hardware.

## Test Plan

Tile engine itself is a benchmarking utility, so if it passes the CI it
will be tested automatically.

## Test Result

Tile engine itself is a benchmarking utility, so if it passes the CI it
will be tested automatically.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Thrupti Raj Lakshmana Gowda<ThruptiRaj.LakshmanaGowda@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2026-02-26 10:14:40 -06:00
Eiden Yoshida
e16789b609 [rocm-libraries] ROCm/rocm-libraries#4373 (commit 1c29275)
[CK] MICI: Disable failure pattern checking

## Motivation

- ck mici jobs hanging at end, possibly at failure pattern checking

## Technical Details

- Disable failure pattern checking to see if hanging goes away

## Test Plan

- Observe behavior after merge
2026-02-09 15:25:01 +00:00
spolifroni-amd
d2f1541976 [rocm-libraries] ROCm/rocm-libraries#4300 (commit 07e9d56)
[CK] add inter/intrawave scheduling concept doc
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Proposed changes

Adding information about inter/intrawave scheduling
2026-02-07 00:11:11 +00:00
Illia Silin
1ddb38f098 [rocm-libraries] ROCm/rocm-libraries#4375 (commit 45b616b)
[CK] fix path for build filter

## Motivation

Fix the filter that determines whether CI builds are necessary.

## Technical Details

A script checks the files list returned by git diff and checks whether
any code source was modified. If not, if only documentation was changed,
it will allow skipping the builds. We make sure we only look at the
changes in projects/composablekernel/ folder.
2026-02-06 18:18:14 +00:00
Illia Silin
4dd4869fbf [rocm-libraries] ROCm/rocm-libraries#4361 (commit 37a74ef)
[CK]  a bunch of CI fixes.

## Motivation

Fixing some of the CK CI issues

## Technical Details

fixing paths to dockerfiles and scripts;
moving codegen tests to separate stage (collides with main build since
you must call cmake from same folder but different options);
fixing a couple of clang compilation issues with staging compiler;
2026-02-06 01:07:34 +00:00
Eiden Yoshida
e96beb1f3e [rocm-libraries] ROCm/rocm-libraries#4352 (commit 3c9beb3)
[CK] MICI: Fix git diff in selective_test_filter.py

## Motivation

- git diff needs access to reference repo

## Technical Details

- mount reference repo path into docker for selective_test_filter.py to
access

## Test Plan

- tested in MICI

## Test Result

- launch_tests.sh ran successfully
2026-02-05 22:57:20 +00:00
Jobbins
344d98781b [rocm-libraries] ROCm/rocm-libraries#4351 (commit 3b98c98)
[composablekernel] fix failure status

## Motivation

Pipelines were failing on Math CI status check.

## Technical Details

For the success case, we just changed the config in Jenkins to use a
proper app token and no code changes were required. However, the failure
case would not have worked as coded, so we needed to move that outside
of the `rocmnode()` block.

## Test Plan

I removed all of the CI in one of the commits to quickly test, and then
added it back.  Got a successful "success" message and "failure" message
produced
2026-02-05 15:57:21 +00:00
Eiden Yoshida
3a02862241 [rocm-libraries] ROCm/rocm-libraries#4349 (commit 9bb7f5c)
[CK] MICI: Correct path for build trace script

## Motivation

- Corrects path to script due to superrepo migration
- Forces all tests to run by default

## Technical Details

- now in /projects/composablekernel
2026-02-05 15:56:52 +00:00
Eiden Yoshida
3f42f76b45 [rocm-libraries] ROCm/rocm-libraries#4336 (commit d26a782)
[CK] MICI: Use reference repo for checkout operations

## Motivation

- Maintain a reference repo on slave nodes that speeds up any
clone/checkout operations

## Technical Details

- clone a ref repo if it does not exist
- update ref repo if it does exist
- checkout after ref repo is updated
- eliminates double clone

## Test Result

- Initial checkouts succeeded
2026-02-05 02:44:29 +00:00
Jeff Huang
7b18f5fed2 [rocm-libraries] ROCm/rocm-libraries#4263 (commit f34aec2)
[CK] Add FP8 KV_BLOCKSCALE support for batch prefill
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implement per-page K/V quantization for paged attention:
  - Add KV_BLOCKSCALE enum to BlockAttentionQuantScaleEnum
  - Use exp2 shift trick to eliminate explicit P scaling overhead
- Prefetch physical pages offset for KV cache, overlaps with
computations

## Proposed changes

Please describe the motivation behind the pull request, whether it
enables a new feature or fixes a bug. If there are associated pull
requests or issues, please link them to the pull request.

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [ ] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [ ] I have added inline documentation which enables the maintainers
with understanding the motivation
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [ ] I have run `clang-format` on all changed files
- [ ] Any dependent changes have been merged

## Discussion

If this is a relatively large or complex change, feel free to start a
discussion by explaining why you chose the solution you did and what
alternatives you considered
2026-02-04 23:26:20 +00:00