composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-28 18:56:59 +00:00

Author	SHA1	Message	Date
Brock Hargreaves	1b649a8d4b	[rocm-libraries] ROCm/rocm-libraries#8332 (commit 48c389c) [CK][CI] Retry builds on node failure with automatic rerouting (#8332) ## Motivation When a Jenkins node enters a bad state (missing GPU driver, dead Docker daemon, full disk), every PR scheduled onto it fails the same way until a human manually takes it offline. Some failures are also transient and would pass on a simple retry. Today the pipeline does neither — every failure goes straight to red on the same node. ## Technical Details Two new retry behaviors based on failure type: - Different node for persistent node faults (driver missing, daemon down, disk full, container won't start) - Retry in place for transient glitches (registry pull, DNS), then a different node if retries are exhausted Real build/compile failures and aborted builds are never retried. New: `src/org/ck/NodeFault.groovy`, `TransientFault.groovy` — typed exceptions in the shared library `src/` for stable classloader identity under dynamic library loading. `vars/ck.groovy`: adds `preflight()` (host health checks before build), `pullImage()` (classifying pull failures at the call site, replacing `getDockerImage()`), `runOnHealthyNode()` (outer reroute loop, up to 3 nodes), `runInPlace()` (same-node transient retries). GitHub failure status is only set once all retries are exhausted. `Jenkinsfile`: all active `Build CK and run Tests` stages converted to `agent none` + `ck.runOnHealthyNode(…)`. ## Test Plan Tested on `users/brockhargreaves-amd/ck/node-failure-retry-logic` with `USE_CURRENT_BRANCH_FOR_CK_GROOVY=true`. Verified preflight logging, reroute on node fault, attempt counter in logs, no retry on aborts, and single failure status report after budget exhausted. ## Test Result Retry logic working as expected. 
Three bugs found and fixed during testing: false `NodeFault` from host-level sccache probe (sccache is in-container), `null` node name in catch logging, and `sh` calls outside `node()` context in status reporting. ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-06-15 17:40:10 +00:00
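The retry policy described in the commit above can be modeled in a few lines of Python. This is an illustrative sketch only, not the Groovy code in `vars/ck.groovy`: the exception names mirror `NodeFault` and `TransientFault` from the commit message, while `run_in_place`, `run_on_healthy_node`, the `max_transient_retries` default, and the node list are assumptions made for the example.

```python
class NodeFault(Exception):
    """Persistent host problem (driver missing, daemon down, disk full)."""


class TransientFault(Exception):
    """Short-lived glitch (registry pull, DNS) worth retrying in place."""


def run_in_place(build, node, max_transient_retries=2):
    """Retry transient faults on the same node; escalate once exhausted."""
    for attempt in range(max_transient_retries + 1):
        try:
            return build(node)
        except TransientFault as err:
            if attempt == max_transient_retries:
                # Out of in-place retries: treat the node as unhealthy.
                raise NodeFault(f"transient retries exhausted on {node}") from err


def run_on_healthy_node(build, nodes, max_nodes=3):
    """Outer reroute loop: move to a different node on NodeFault."""
    last_fault = None
    for node in nodes[:max_nodes]:
        try:
            return run_in_place(build, node)
        except NodeFault as fault:
            last_fault = fault  # remember the fault and try the next node
    # Failure is reported once, after the whole node budget is spent.
    raise RuntimeError("all candidate nodes failed") from last_fault
```

Any other exception, such as a genuine compile error, is not caught by either loop and so propagates on the first attempt, which matches the stated rule that real build failures are never retried.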
John Afaganis	329e589840	[rocm-libraries] ROCm/rocm-libraries#8260 (commit 1139236) [ck] Enforce LF-only line endings in C/C++ sources ## Summary Several CK source files carry Windows CRLF line endings (a trailing carriage return on each line), introduced by editors configured for Windows endings or copy/paste from Windows tooling. These are purely cosmetic but they pollute diffs (whole-file churn the first time someone makes an LF edit), confuse `clang-format`, and are inconsistent with the LF-only convention used across the rest of the tree. This PR (a) normalizes every existing CRLF file (6 files) to LF and (b) adds a pre-checkin gate so new CRLF leaks are rejected before merge. ## File extensions covered Both the cleanup scan and the new Jenkins enforcement stage use the same predicate as the adjacent `ASCII Only Check` stage: ``` .h .hpp .cpp .h.in .hpp.in .cpp.in .inc .cl ``` (excluding `/build/` and `/include/rapidjson/`). The local pre-commit hook's `c++/inc` type filter covers the same set. ## Why no enforcement today CK is opted out of the rocm-libraries root `.pre-commit-config.yaml`, so the existing `pre-commit` workflow doesn't touch CK. The local CK `.pre-commit-config.yaml` only runs for developers who installed hooks. The authoritative gate is therefore the new Jenkins stage in this PR; the local hook is convenience. ## Commit layout (bisect-friendly) 1. `[ck] Normalize CRLF line endings to LF in C/C++ sources` Mechanical line-ending cleanup across 6 files. No content change: every edit is purely CRLF -> LF, verified with `git diff --ignore-cr-at-eol` reporting an empty diff. 2. `[ck] Enforce LF-only line endings in C/C++ sources` - New `projects/composablekernel/script/check_no_crlf.sh` (modeled on `check_ascii_only.sh`). - New `crlf-checker` entry in `projects/composablekernel/.pre-commit-config.yaml` under the local-hooks block (`types_or: [c++, inc]`). 
- New `CRLF Check` parallel stage in `projects/composablekernel/Jenkinsfile`'s `Static checks` block, mirroring the adjacent `ASCII Only Check` stage. Always-on, no `RUN_CPPCHECK` gate. The tree is buildable at every commit boundary. Commit 1 leaves 0 CRLF violations; commit 2 wires the gate. ## Demo Script output on a synthesized violation: ``` $ printf 'int main() {}\r\n' > /tmp/bad.cpp $ projects/composablekernel/script/check_no_crlf.sh /tmp/bad.cpp ERROR: /tmp/bad.cpp contains CRLF (Windows) line endings: 1:int main() {}<CR> Fix: convert to LF, e.g. 'sed -i 's/\r$//' /tmp/bad.cpp' or 'dos2unix /tmp/bad.cpp' $ echo $? 1 ``` Full repo scan after the cleanup commit: ``` $ cd projects/composablekernel && find . -type f \( -name '*.h' -o -name '*.hpp' -o -name '*.cpp' \ -o -name '*.h.in' -o -name '*.hpp.in' -o -name '*.cpp.in' -o -name '*.inc' -o -name '*.cl' \) \ -not -path '*/build/*' -not -path '*/include/rapidjson/*' -print0 \ | xargs -0 -P 8 -n 64 script/check_no_crlf.sh $ echo $? 0 ``` ## Test plan - [ ] Jenkins PR build: confirm new `Static checks -> CRLF Check` stage runs green over the full predicate and the existing `ASCII Only Check` / `Clang Format` stages are unaffected. - [ ] Local: `pre-commit run crlf-checker --all-files` runs cleanly after installing CK pre-commit hooks. - [ ] Manually inject a CRLF line ending in any `.cpp/.hpp/.inc` file, push: confirm Jenkins fails the new stage with a clear error. 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-06-12 21:11:59 +00:00
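The byte-level test behind the CRLF gate is simple enough to sketch in Python. The following is a hypothetical equivalent written for illustration, not the contents of `check_no_crlf.sh`; the function names `find_crlf_lines` and `check_files` and the exact message wording are assumptions.

```python
from pathlib import Path


def find_crlf_lines(data: bytes):
    """Return 1-based numbers of the lines that end in CRLF."""
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if line.endswith(b"\r")  # a carriage return left before the LF
    ]


def check_files(paths):
    """Report offending files; return 1 if any contains CRLF, else 0."""
    status = 0
    for path in paths:
        bad_lines = find_crlf_lines(Path(path).read_bytes())
        if bad_lines:
            status = 1
            print(f"ERROR: {path} contains CRLF line endings on lines {bad_lines}")
    return status
```

Reading the file as bytes matters here: opening it in text mode would let Python's universal-newline handling translate `\r\n` to `\n` before the check runs.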
Brock Hargreaves	65c395984d	[rocm-libraries] ROCm/rocm-libraries#8108 (commit c620f0a) [ck] Unify Build_CK and buildHipClangJob into buildAndTest (#8108) ## Motivation `projects/composablekernel/vars/ck.groovy` had two near-identical build functions, `buildHipClangJob` (lean: static checks, FMHA, tile-engine, conv) and `Build_CK` (main per-arch matrix). This removes the duplication and fixes a latent GitHub-status bug that lived in both. ## Technical Details - Merged both into one `buildAndTest(Map conf)` gated by an explicit `is_main_build` flag (default `false` = lean path; `true` adds the GPU check + arch-gated inductor/perf/hipTensor; only `runBuildCKAndTests` sets it). - Deleted the `Build_CK_and_Reboot` / `buildHipClangJobAndReboot` wrappers (they only logged and re-threw); all 13 call sites now call `buildAndTest` directly. - Widened the shared `catch` to `Exception` so build / image-pull / "GPU not found" failures report failure instead of leaving the check stuck pending (failing stages now go red). - Removed the dead `no_reboot` key. No change to what is built or tested. ## Test Plan - Jenkins linter on the `Jenkinsfile`. - One branch run covering both paths (per-arch matrix + lean stages), spot-checking gfx1250 and a nogpu stage. ## Test Result - Verified statically: no `buildHipClangJob` / `Build_CK` references remain; `buildAndTest` defined once, all call sites wired. - Pending: linter + branch run before merge. ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-06-08 23:45:42 +00:00
Brock Hargreaves	4e1296674d	[rocm-libraries] ROCm/rocm-libraries#7990 (commit b8b5b43) [CK] Load ck.groovy via Jenkins Shared Library ## Motivation This allows the CI service to have a configuration source-of-truth outside the PR under test, allowing rapid system changes. Bug fixes on the develop branch propagate immediately to all pipelines that don't override the parameter -- no rebase required. A new `USE_CURRENT_BRANCH_FOR_CK_GROOVY` parameter lets contributors test pipeline changes on their own branch without any extra configuration. ## Technical Details - `loadCk()` in the Jenkinsfile is updated to call `library("ck@${branch}").ck.get()` instead of `checkout scm` + `load "vars/ck.groovy"`. The `checkout scm` inside `loadCk()` is removed since Jenkins now handles the library fetch internally. - A `USE_CURRENT_BRANCH_FOR_CK_GROOVY` boolean parameter (default: off) is added. When off, `ck.groovy` is always loaded from `develop` — all normal PR builds are unaffected. When on, `ck.groovy` is loaded from the current branch automatically via `env.CHANGE_BRANCH`, so contributors testing pipeline changes just tick the box. - `return this` is removed from the end of `ck.groovy`. This was required by the `load` convention but is not needed (and can cause errors) in a shared library context. - `loadCk()` is kept at every call site rather than called once at the top, preserving restart-from-stage safety — if a build is restarted from a mid-pipeline stage, `ck` is still initialized correctly. - The Jenkins Shared Library named `"ck"` must be registered in Jenkins Global Pipeline Libraries. ## Test Plan 1. Trigger "Build with Parameters" on the PR branch with `USE_CURRENT_BRANCH_FOR_CK_GROOVY=true` 2. Verify "Determine CI Execution" stage completes and the `library()` call indicates the current branch 3. Verify "Static checks" stage completes. 4. 
Trigger a second build with `USE_CURRENT_BRANCH_FOR_CK_GROOVY=false` (default) to confirm normal builds still load from `develop`. ## Test Result Verified both paths. The develop library is loaded by default, the branch library is loaded when the parameter is enabled. ## Submission Checklist - [ X ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-06-04 22:32:37 +00:00
John Afaganis	96c39b331e	[rocm-libraries] ROCm/rocm-libraries#7829 (commit 13af7da) [ck] Enforce ASCII-only C/C++ sources for hipRTC compatibility (#7829) ## Summary CK source files must be compilable via hipRTC (HIP runtime compilation), whose preprocessor does not accept non-ASCII bytes anywhere in a translation unit — including in comments. Bytes that are harmless under `hipcc` (em-dashes, smart quotes, multiplication signs, Greek letters, box-drawing glyphs, etc.) cause hipRTC to fail at preprocessing time. These regularly leak in via LLM-assisted authoring or copy/paste from formatted documents and silently break hipRTC paths that are not exercised by the default `hipcc`-based build matrix. This PR (a) cleans every existing violation (53 files) and (b) adds a pre-checkin gate so new violations are rejected before merge. ## File extensions covered Both the cleanup scan and the new Jenkins enforcement stage use the same predicate: ``` .h .hpp .cpp .h.in .hpp.in .cpp.in .inc .cl ``` (excluding `/build/` and `/include/rapidjson/`). This is a strict superset of the existing `Clang Format` stage's predicate — `.inc` is added so test-fixture include files are also gated. The local pre-commit hook's `c++/inc` type filter covers the same set. ## Why no enforcement today CK is opted out of the rocm-libraries root `.pre-commit-config.yaml`, so the existing `pre-commit` workflow doesn't touch CK. The local CK `.pre-commit-config.yaml` only runs for developers who installed hooks. The authoritative gate is therefore the new Jenkins stage in this PR; the local hook is convenience. ## Commit layout (bisect-friendly) 1. `79798aa6261` — `[ck] Convert reflect/ rendering to ASCII for hipRTC compatibility` Behavior change, isolated. `TreeFormatter` swaps `├─ / └─ / │ ` for `|- / +- / | ` (3-col width preserved so alignment is unchanged). 
`conv_description.hpp` swaps `×` for `x` as the dimension separator. `test_conv_description.cpp` expected strings updated in lockstep so the snapshot test stays green. This is the only commit in the series with observable runtime impact. 2. `738fdb0d81c` — `[ck] Strip non-ASCII bytes from C++ sources for hipRTC compatibility` Mechanical text cleanup across 53 files. Replacements happen in comments or in `std::cout` strings that are not asserted on by any test. None of the 174 `.inc` files in the tree required edits, but they were in the scan's predicate so the enforcement stage's predicate is a superset of what was scanned. Full replacement table in the commit message. 3. `1d7cd8ba235` — `[ck] Enforce ASCII-only C/C++ sources for hipRTC compatibility` - New `projects/composablekernel/script/check_ascii_only.sh` (modeled on `check_copyright_year.sh`). - New entry in `projects/composablekernel/.pre-commit-config.yaml` under the local-hooks block (`types_or: [c++, inc]`). - New `ASCII Only Check` parallel stage in `projects/composablekernel/Jenkinsfile`'s `Static checks` block, mirroring the existing `Clang Format` stage but with `.inc` added to the find predicate. Always-on, no `RUN_CPPCHECK` gate. The tree is buildable at every commit boundary. Commit 1 leaves 50 known violations; commit 2 leaves 0; commit 3 wires the gate. ## Demo Script output on a synthesized violation: ``` $ printf '// em-dash test \xe2\x80\x94 here\n' > /tmp/bad.cpp $ projects/composablekernel/script/check_ascii_only.sh /tmp/bad.cpp ERROR: /tmp/bad.cpp contains non-ASCII bytes: 1:// em-dash test — here Fix: replace with ASCII (em-dash -> --, smart quotes -> ", arrows -> ->, etc.) $ echo $? 1 ``` Full repo scan after the cleanup commits (note the `-name '*.inc'` clause): ``` $ cd projects/composablekernel && find . 
-type f \( -name '*.h' -o -name '*.hpp' -o -name '*.cpp' \ -o -name '*.h.in' -o -name '*.hpp.in' -o -name '*.cpp.in' -o -name '*.inc' -o -name '*.cl' \) \ -not -path '*/build/*' -not -path '*/include/rapidjson/*' -print0 \ | xargs -0 -P 8 -n 64 script/check_ascii_only.sh $ echo $? 0 ``` ## Test plan - [ ] Jenkins PR build: confirm new `Static checks -> ASCII Only Check` stage runs green over the full predicate (incl. `*.inc`) and existing `Clang Format` stage is unaffected. - [ ] `test_conv_description` passes against the ASCII tree-formatter output (touched in commit 1). - [ ] Local: `pre-commit run ascii-only-checker --all-files` runs cleanly after installing CK pre-commit hooks via `script/install_precommit.sh`. - [ ] Manually inject a non-ASCII byte in any `.cpp/.hpp/.inc` file, push: confirm Jenkins fails the new stage with a clear error. - [ ] Spot-check a representative subset of touched files under hipRTC compilation to confirm no remaining hipRTC-blocking content (optional, since the static byte check is a sufficient condition for hipRTC preprocessor acceptance on this dimension). 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-06-04 15:00:17 +00:00
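The two ingredients of the ASCII gate, the file predicate and the byte scan, can be sketched in Python as below. This is an assumed re-implementation for illustration rather than the actual `check_ascii_only.sh`; the names `is_checked_source` and `find_non_ascii` are hypothetical, and the predicate assumes paths in the form printed by `find .` (for example `./include/ck/foo.hpp`).

```python
CHECKED_SUFFIXES = (".h", ".hpp", ".cpp", ".h.in", ".hpp.in", ".cpp.in", ".inc", ".cl")
EXCLUDED_PARTS = ("/build/", "/include/rapidjson/")


def is_checked_source(path: str) -> bool:
    """Apply the extension list and the two directory exclusions."""
    if any(part in path for part in EXCLUDED_PARTS):
        return False
    return path.endswith(CHECKED_SUFFIXES)


def find_non_ascii(data: bytes):
    """Return 1-based (line, column) pairs for every byte above 0x7F."""
    hits = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        for column, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_number, column))
    return hits
```

Scanning raw bytes rather than decoded text keeps the check independent of the file's encoding: every byte of a multi-byte UTF-8 sequence is above 0x7F, so the em-dash in the demo above is reported as three consecutive byte positions on line 1.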
Brock Hargreaves	843d993835	[rocm-libraries] ROCm/rocm-libraries#7743 (commit 15ef85c) [CK] Extract Jenkinsfile helpers into vars/ck.groovy shared library (#7743) ## Motivation The CK Jenkinsfile is a 2,215-line monolith mixing helper function definitions with pipeline stage declarations. This makes it difficult to review, modify, or extend CI stages without wading through unrelated infrastructure code. ## Technical Details Extract all helper functions from the Jenkinsfile into vars/ck.groovy, loaded at runtime via ck = load "vars/ck.groovy" in the first stage. The Jenkinsfile is reduced from 2,215 lines to 810 lines containing only the pipeline structure. - 36 helper functions moved to ck.groovy with no logic changes - 10 new stage-wrapper functions (runBuildCKAndTests, runTileEngineGemmTests, runClangFormat, etc.) extract inline environment{}/steps{} business logic from stages, eliminating the MethodTooLargeException caused by CPS-transformed shell strings exceeding the JVM 64KB bytecode limit - All ck. method calls in steps{} blocks wrapped in script{} as required by Jenkins Declarative Pipeline - rocmnode() remains in the Jenkinsfile (needed for agent{} labels before ck is loaded) - CRON_SETTINGS / POLL_SPEC remain in the Jenkinsfile (triggers{} evaluates at parse time before any workspace is available) - No stage names changed ## Test Plan - Jenkinsfile validated against the Jenkins Pipeline Linter (/pipeline-model-converter/validate) - All 35 shared helper functions diffed line-by-line against develop to verify no regressions - Merge from develop incorporated and verified (gfx1250 stage, ROCm 7.13 default, cmake_build updates) ## Test Result - Linter: passes - Function diff vs develop: all 35 functions match exactly - Awaiting Jenkins run to confirm end-to-end stage execution ## Submission Checklist - [ x ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-06-01 21:10:12 +00:00
Illia Silin	0edfcf06e5	[rocm-libraries] ROCm/rocm-libraries#7894 (commit 5e66689) [CK] add credentials to docker manifest inspect call ## Motivation This should fix an issue that we recently encountered in CI when we exceeded the limit of accessing docker without authentication: [2026-05-29T16:08:42.447Z] + docker manifest inspect --insecure rocm/composable_kernel:ck_ub24.04_rocm7.13 [2026-05-29T16:08:42.833Z] toomanyrequests: You have reached your unauthenticated pull rate limit. https://www.docker.com/increase-rate-limit ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-05-29 19:18:57 +00:00
Illia Silin	016f8891de	[rocm-libraries] ROCm/rocm-libraries#7815 (commit e34ac06) [CK] fix daily build of CK for all supported targets. ## Motivation Fixing the daily build of CK packages for all supported targets. In the past, if no GPU_TARGETS was specified, we would by default build CK for all supported targets. But recently, the MIOpen team requested to change the default behavior to not build at all if no target is specified (for the purposes of filtering out unsupported targets in TheRock). So we are now adding the explicit list of targets to our daily builds. ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-05-28 14:52:19 +00:00
Illia Silin	e02c566795	[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24) [CK] upgrade CI to rocm7.13 as default compiler (#7612) ## Motivation Upgrade the default docker and compiler version in CI to rocm7.13. In order to pass all the checks I had to also clean up a lot of non-ascii characters in the source code comments and modify a couple of tests that were affected by a new compiler logic. ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Aviral Goel <aviral.goel@amd.com>	2026-05-22 02:43:50 +00:00
Thrupti Raj Lakshmana Gowda	c31fc4df52	[rocm-libraries] ROCm/rocm-libraries#7311 (commit 79d8cae) [CK Tile Engine] Daily tier sampling for tile engine GEMM (#7311) Summary - Replace uniform random instance sampling (random.shuffle) with scrambled Sobol + Latin Hypercube + maximin space-filling sampling, per the Tile Engine Benchmark Sampling RFC - Add op-weighted budget allocation via new TILE_ENGINE_SAMPLING_TIER=daily CMake knob that auto-distributes 8,000 instances across ops proportional to registered weights in op_weights.json - Emit chosen_instances.json manifests for reproducibility tracking - Consolidate 5 copies of sampling logic into single _apply_sampling() method on the base class Jenkinsfile changes Replace per-op -D *_MAX_INSTANCES=250 with single -D TILE_ENGINE_SAMPLING_TIER=daily in gfx942/gfx950/gfx1201 stages. Budget auto-distributes (8000 total per GPU target). --------- Co-authored-by: Claude Sonnet 4 <noreply@anthropic.com>	2026-05-21 02:17:42 -05:00
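The op-weighted budget allocation mentioned above can be illustrated with a small proportional split. The commit message does not show how the PR rounds fractional shares, so the largest-remainder rounding below is an assumption, as are the function name `allocate_budget` and the op names and weights used in the usage example; the real weights live in `op_weights.json`.

```python
def allocate_budget(weights, total=8000):
    """Split `total` instances across ops in proportion to their weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    exact = {op: total * w / weight_sum for op, w in weights.items()}
    budget = {op: int(share) for op, share in exact.items()}
    # Hand out the instances lost to truncation, largest fractional part first.
    leftover = total - sum(budget.values())
    by_remainder = sorted(exact, key=lambda op: exact[op] - budget[op], reverse=True)
    for op in by_remainder[:leftover]:
        budget[op] += 1
    return budget


# Hypothetical weights, for illustration only.
example = allocate_budget({"gemm": 3, "gemm_multi_d": 2, "gemm_preshuffle": 1})
assert sum(example.values()) == 8000
```

Whatever rounding rule is used, the useful invariant is that the per-op budgets always sum to the configured total, which is what the final assertion checks.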
Illia Silin	f01a8cb28d	[rocm-libraries] ROCm/rocm-libraries#7547 (commit 7e032ad) [CK] fix daily builds for pytorch (#7547) ## Motivation This will restore the daily builds that test whether the latest pytorch code can build with the latest CK code (pulled from the standalone CK repo). ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-05-19 07:13:36 -07:00
Illia Silin	e496f94445	[rocm-libraries] ROCm/rocm-libraries#7516 (commit de93737) [CK] fix some jenkins logic errors. (#7516) ## Motivation After the merge from the internal repo, the internal CI Jenkinsfile had some logic errors. Here are 2 fixes for 2 issues: 1. make sure the .ninja_log file exists before trying to parse it 2. use the default compiler for the gfx1250 target, since it's getting built in its own special docker, which does not have the option of installing alternative compilers. ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-05-16 01:32:29 +00:00
Jobbins	8ca5c0a8d7	[rocm-libraries] ROCm/rocm-libraries#6961 (commit 47e8768) [CK] print hostname and $NODE_NAME to find inconsistencies (#6961) ## Motivation We suspect that the check for amdgpu: `cat /sys/module/amdgpu/version` sometimes gets run on the Jenkins controller instead of the node. This adds the `hostname` command so its output can be compared with the $NODE_NAME variable. ## Technical Details Updated Jenkinsfile to include the `hostname` command. ## Test Plan ## Test Result ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-05-15 14:48:46 -07:00
Illia Silin	717f2efef7	[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d) [CK] add composable kernel support on gfx1250 (#6978) ## Motivation Add composable kernel support on gfx1250. ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Qun Lin <qlin@amd.com> Co-authored-by: jialuo12_amdeng <jia.luo@amd.com> Co-authored-by: Andriy Roshchenko <andriy.roshchenko@amd.com> Co-authored-by: hsivasun_amdeng <haresh.sivasuntharampillai@amd.com>	2026-05-15 06:46:51 -07:00
John Shumway	142ee00585	[rocm-libraries] ROCm/rocm-libraries#7090 (commit 316fded) [CK] Add rocm_ck directory structure with feature flag (#7090) ## Summary Adds initial rocm_ck directory structure, #7119. - Establishes production `rocm_ck/` directory at `composablekernel/rocm_ck/`, peer to `tile_engine/` and `dispatcher/` - Adds `CK_ENABLE_ROCM_CK` option (default OFF) as a CK-internal feature flag — no superbuild or TheRock changes needed - Creates `rocm_ck` INTERFACE library, `ck_tile_headers` target, GTest integration with builder-style convenience targets (`smoke-rocm-ck`, `check-rocm-ck`) - Adds Jenkins `RUN_ROCM_CK_TESTS` parameter for CI, following the `RUN_BUILDER_TESTS` pattern - README explains the constexpr schema model: host-device separation via constexpr data rather than template parameters, enabling multi-arch distribution through kpack archives ## Test plan - [x] `cmake -DCK_ENABLE_ROCM_CK=ON` configures without errors - [x] `ninja check-rocm-ck` passes (4 host-only index type tests) - [x] Default build (`CK_ENABLE_ROCM_CK=OFF`) is unaffected — no rocm_ck targets present - [x] Jenkins `RUN_ROCM_CK_TESTS=true` enables the flag and runs `check-rocm-ck` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>	2026-05-14 18:51:37 +00:00
Illia Silin	22b9feb40f	[rocm-libraries] ROCm/rocm-libraries#7111 (commit 651947f) [CK] Fix latest batch of staging compiler warnings (#7111) ## Motivation Suppress the new batch of clang lifetimebound and invalidation warnings with the latest staging compiler. ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-05-08 07:14:14 -07:00
Illia Silin	263e9965f6	[rocm-libraries] ROCm/rocm-libraries#7138 (commit 70e6660) [CK] disable tile_engine by default, limit gfx1030 CI builds to develop only. (#7138) ## Motivation An attempt to reduce the build time and keep CI moving faster. Disable tile_engine by default since even the cmake step may take up to 30 minutes. Since we're down to a single gfx1030 CI node, use it only for develop builds. ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-05-08 01:39:45 +00:00
Illia Silin	1f8dbfb63e	[rocm-libraries] ROCm/rocm-libraries#7046 (commit aaf7665) [CK] fix CI git token. (#7046) ## Motivation Fix the CI breakage due to git PAT deprecation. ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-05-06 02:31:47 +00:00
Illia Silin	5df88bfe73	[rocm-libraries] ROCm/rocm-libraries#6741 (commit 0d4180f) [CK] restore fmha performance reporting and disable c++17 in CI. (#6741) ## Motivation This change restores monitoring of FMHA benchmarks performance in daily builds and removes the std=c++17 flag from CI builds on gfx90a. ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-04-25 02:22:11 +00:00
Yi DING	e8c36b1e65	[rocm-libraries] ROCm/rocm-libraries#6701 (commit f9a8d1c) [CK] Fix CI Failures for PR From Forks (#6701) ## Motivation Fork PRs fail CI when `RUN_AITER_TESTS` or `RUN_FA_TESTS` is enabled. The docker scripts run `git clone -b "$CK__BRANCH" https://github.com/ROCm/rocm-libraries.git`, but a fork's branch doesn't exist upstream: ``` fatal: Remote branch <fork-branch> not found in upstream origin ``` Example: [PR #6529 build #4](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/PR-6529/4/pipeline). ## Technical Details `Jenkinsfile` — for PRs, use the upstream-visible PR ref instead of the head branch name: ```groovy CURRENT_BRANCH_NAME = env.CHANGE_ID ? "refs/pull/${env.CHANGE_ID}/head" : (env.CHANGE_BRANCH ? env.CHANGE_BRANCH : env.BRANCH_NAME) ``` `Dockerfile.aiter` / `Dockerfile.fa` — `git clone -b <ref>` only accepts branches (`refs/heads/`) and tags (`refs/tags/`), so it can't resolve `refs/pull/N/head`. Switch to `git fetch`, which accepts any refspec (and still works for plain branch names): ```sh mkdir rocm-libraries && cd rocm-libraries git init -q git remote add origin https://github.com/ROCm/rocm-libraries.git git fetch --depth 1 --filter=blob:none origin "$CK__BRANCH" git sparse-checkout init --cone git sparse-checkout set projects/composablekernel git checkout FETCH_HEAD ``` `git checkout FETCH_HEAD` lands in detached HEAD, which breaks the existing `git branch -m "$CK__BRANCH"` (and that name isn't a valid local branch anyway). Decouple the local branch name from the upstream ref: - Replace `git init` + `git branch -m` with `git init -b "$LOCAL_BRANCH"` (requires git ≥ 2.28, satisfied by base images) - `LOCAL_BRANCH="ck-import-${ROCM_LIBRARIES_SHA}"` in the rocm-libraries path; `LOCAL_BRANCH="$CK_*_BRANCH"` in the fallback - Downstream `git clone -b ... 
../ck` uses `$LOCAL_BRANCH` ## Test Plan Manually trigger a build on this PR with `RUN_AITER_TESTS=true` and `RUN_FA_TESTS=true`; both docker images should build end-to-end. ## Test Result [jenkins / rocm-libraries-folder/Composable Kernel / PR-6701 / #3](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/PR-6701/3/pipeline/) ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-04-24 16:22:28 +08:00
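The ref-selection rule from the Groovy snippet in the commit above translates directly into Python. This is a sketch for illustration: the function name `select_ref` and the use of a plain dictionary in place of Jenkins' `env` object are assumptions.

```python
def select_ref(env):
    """Pick the ref to fetch: the PR ref for pull requests, else a branch name."""
    change_id = env.get("CHANGE_ID")
    if change_id:
        # Resolvable in the upstream repository even when the PR comes from a fork.
        return f"refs/pull/{change_id}/head"
    return env.get("CHANGE_BRANCH") or env.get("BRANCH_NAME")


# Illustrative inputs; the values are made up for the example.
assert select_ref({"CHANGE_ID": "6701", "CHANGE_BRANCH": "fix"}) == "refs/pull/6701/head"
assert select_ref({"BRANCH_NAME": "develop"}) == "develop"
```

The PR ref takes priority because it is the only one of the three names guaranteed to exist upstream for a fork, which is the failure the commit addresses.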
Max Podkorytov	8c238fe875	[rocm-libraries] ROCm/rocm-libraries#6434 (commit 87aae5c) Fix ck4inductor conv instance parsing for NumGroupsToMerge parameter (#6434) ## Summary - Add `num_groups_to_merge` field to `CKGroupedConvFwdOp` dataclass to match the new (#4273) `NumGroupsToMerge` template parameter added to `DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3` - Enable inductor tests by default in Jenkins CI ## Test plan - [x] Built wheel without patch: `test_gen_conv_instances` fails with `TypeError: takes from 47 to 50 positional arguments but 51 were given` - [x] Built wheel with patch: `test_gen_conv_instances` passes	2026-04-22 11:05:11 -07:00
arai713	12f3d646a0	[rocm-libraries] ROCm/rocm-libraries#4769 (commit 72ae66e) [CK_TILE] Restructure Tile Engine's benchmarking and profiling (#4769) ## Motivation This PR introduces a restructure for the benchmarking and profiling aspects of CK Tile's Tile Engine, expanding on the groundwork from this previous https://github.com/ROCm/composable_kernel/pull/3434 and outlined in this [design document](https://amdcloud-my.sharepoint.com/:w:/r/personal/astharai_amd_com/Documents/Restructuring%20Tile%20Engine.docx?d=w14ea28a30718416988ed5ebb759bd3b2&csf=1&web=1&e=l3VBuX). In PR 3434, to reduce repeated code we implemented: - Base class that centralizes common functionality and provides a default implementation (Universal GEMM) - Child classes for GEMM variants override virtual functions to handle variant-specific behavior This refactoring in this PR follows the same process and should greatly reduce the duplicated code present in Tile Engine and make it simpler to add in new operations, increasing scalability. ## Technical Details The files have been refactored around new base structs for benchmarks, profiling and problem descriptions. The new base structs are: - GemmProblem - GemmBenchmark - GemmProfiler Universal GEMM, Preshuffle GEMM, and Multi-D GEMM all have child classes that will inherit from these base structs overriding only what differs per variant. All common functions across the benchmarking and profiling files have been moved into newly added common utility files under the commons/ directory. The new utility files are: - utils.hpp: common functions for the benchmarking and profiling process - benchmark_utils.py: common utility functions for the benchmark generation ## Test Plan I tested using the existing tests for Tile Engine. ## Test Result All tests passed. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-04-14 10:50:24 -07:00
Yi DING	92dc99b713	[rocm-libraries] ROCm/rocm-libraries#5329 (commit 9c43062) [CK] Add flash_attn tests (#5329) ## Motivation Add CI support for running [flash-attention](https://github.com/ROCm/flash-attention) tests against CK, similar to existing AITER and PyTorch downstream test pipelines. ## Technical Details ### New: `Dockerfile.fa` A new Dockerfile that builds a flash-attention test image on top of a ROCm PyTorch base image. It: - Sparse-checkouts CK from `rocm-libraries` (or clones directly from `ROCm/composable_kernel`) - Clones and builds `flash-attention` with CK as the backend - Supports configurable `FA_BRANCH`, `CK_FA_BRANCH`, and `GPU_ARCHS` build args ### Updated: `Jenkinsfile` buildDocker refactor: - Extracted `buildAndPushDockerImage()` helper that handles both "check if exists, skip" and "force build, push" logic, eliminating the duplicated try/catch blocks - Split monolithic `buildDocker()` into `buildDockerBase()`, `buildDockerPytorch()`, `buildDockerAiter()`, and new `buildDockerFa()` - Each downstream docker build now runs unconditionally within its respective guard (`RUN_PYTORCH_TESTS`, `RUN_AITER_TESTS`, `RUN_FA_TESTS`) - Image digests are stored in env vars (`CK_BASE_IMAGE`, `CK_PYTORCH_IMAGE`, `CK_AITER_IMAGE`, `CK_FA_IMAGE`) for use in downstream stages run_downstream_tests refactor: - Merged `run_aiter_tests()` and `run_pytorch_tests()` into a single generic `run_downstream_tests(conf)` that accepts `image`, `timeoutHours`, and `execute_cmds` - Test commands for each downstream target are declared as top-level lists (`RUN_PYTORCH_TESTS_CMDS`, `RUN_AITER_TESTS_CMDS`, `RUN_FA_TESTS_CMDS`) Pipeline stages: - Merged "Run Pytorch Tests" and "Run AITER Tests" into a single "Run Downstream Tests" parallel stage - Added two new FA test stages: "Run FA Tests on gfx942" and "Run FA Tests on gfx950" - Added new pipeline parameters: `RUN_FA_TESTS`, `fa_base_docker`, `fa_branch`, `ck_fa_branch` - `ck_pytorch_branch` and `ck_aiter_branch` now default 
to the current branch instead of hardcoded `develop` - CRON schedule at 13:00 now also triggers `RUN_FA_TESTS=true` ## Test Plan - [x] Trigger pipeline manually with `RUN_FA_TESTS=true` on gfx942 and gfx950 nodes - [x] Verify existing AITER and PyTorch test stages are unaffected - [x] Verify `buildAndPushDockerImage` correctly skips rebuild when image already exists (with `BUILD_DOCKER=false`) ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-04-10 09:23:10 +08:00
Illia Silin	eb1508015d	[rocm-libraries] ROCm/rocm-libraries#6147 (commit 8035856) [CK] Replace daily CI builds with mainline compiler with TheRock compiler. (#6147) ## Motivation Since the compiler team has deprecated the amd-mainline branch and switched to TheRock, we'll start building a docker image with TheRock artifacts and building/testing CK with that. ## Technical Details <!-- Explain the changes along with any relevant GitHub links. --> ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-04-03 11:04:12 -06:00
Illia Silin	cd7e17c837	[rocm-libraries] ROCm/rocm-libraries#6103 (commit c74e44d) Use ck_pytorch docker from private repo. (#6103) ## Motivation Move the pytorch docker image used for CK testing into private repo. ## Technical Details <!-- Explain the changes along with any relevant GitHub links. --> ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-04-02 09:06:44 -07:00
Jobbins	db382efaf7	[rocm-libraries] ROCm/rocm-libraries#6107 (commit e69d1b2) [CK] poll every 6 hours as workaround (#6107)	2026-04-01 15:52:45 -04:00
Jobbins	ef2a63047f	[rocm-libraries] ROCm/rocm-libraries#6064 (commit cce30ab) [CK] poll develop every 15 minutes for changes (#6064)	2026-04-01 10:34:48 -04:00
Illia Silin	70e4696f01	[rocm-libraries] ROCm/rocm-libraries#5921 (commit 032ac1b) [CK] fix clang lifetimebound errors with staging compiler (#5921) ## Motivation The ROCm staging compiler (newer Clang) enforces `[[clang::lifetimebound]]` annotations on methods that return references or pointers to internal object data. Without these annotations, the staging compiler emits compilation errors for container accessor methods across the CK and CK Tile namespaces. ## Technical Details Adds `[[clang::lifetimebound]]` to all reference/pointer-returning accessors in core container types: `ck::` namespace: - `Array` -- `At()`, `operator[]`, `operator()`, `begin()`, `end()` - `index_array` -- `operator[]` - `StaticallyIndexedArray_v2` -- `At()`, `operator[]`, `operator()` - `IndexLookupTable` -- `operator[]` `ck_tile::` namespace: - `array` -- `get(i)`, `at()`, `operator[]`, `operator()` - `static_array` -- `operator[]` - `thread_buffer` -- `get(i)`, `at()`, `operator[]`, `operator()` - `make_kernel()` -- parameter pack Also removes the unused `instance_index` variable from `batched_gemm_reduce_fp16.cpp` and simplifies its argument parsing accordingly. ## Test Plan - Compile with the staging compiler to verify all lifetimebound errors are resolved - Existing tests pass unchanged -- the attribute is a compile-time annotation with no runtime effect ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-03-30 07:19:32 -07:00
Illia Silin	689c46ff53	[rocm-libraries] ROCm/rocm-libraries#5764 (commit f3c1232) Re-enable daily builds with staging compiler (#5764) ## Motivation This should help us catch and fix any new compilation issues early on. ## Technical Details We now have three compiler profiles: * develop: slightly stabilized version of amd-staging with some of the obvious offending PRs reverted, 1-2 weeks behind amd-staging; * amd-mainline: more stable version of compiler, the baseline for all other branches, e.g., release, npi, etc. 2-4 weeks behind amd-staging. * amd-staging: latest compiler version where all new PRs land, often broken; ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. Co-authored-by: kensclin <lshyhchy@amd.com>	2026-03-25 16:37:00 +00:00
Eiden Yoshida	1b4f091908	[rocm-libraries] ROCm/rocm-libraries#5691 (commit 2fbb1fc) [CK] MICI: Revert "add self healing to ref repo" (#5691) The check may not be working as intended, causing premature deletion of reference repositories	2026-03-23 07:15:51 -07:00
andrew clark	fe1f90100d	[rocm-libraries] ROCm/rocm-libraries#5464 (commit debfc96) Improved CI infrastructure failure detection (#5464) ## Motivation This PR re-enables CI infrastructure failure detection and notification, which had been disabled due to performance issues caused by loading large build logs (~80k lines) into memory for pattern scanning. The goal is to reliably detect known infrastructure failures (GPU errors, Docker authentication issues, disk space errors, etc.) and send actionable Teams notifications without hanging on large logs. ## Technical Details - Replaced full build log loading and Groovy-based pattern scanning with a streaming wget | grep -E pipe. grep scans natively so the full log is never loaded into Groovy, resolving the hang on large logs. - Combined all failure patterns into a single grep -E call to avoid multiple log fetches. - The node name is now tracked with the observed failure. - Added a new failure pattern for devices running out of space. ## Test Plan - Forced failures in the "Determine CI Execution" stage with all 9 failure patterns echoed to the build log. - Simulated large log sizes (~80k lines of dummy output) to validate pattern detection and node name extraction at realistic log scales, including patterns placed both before and after large blocks of dummy output. ## Test Result All 9 failure patterns detected correctly. Teams notifications sent with accurate log context, node name, and job links. No hangs observed on 80k line simulated logs. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-03-20 19:16:58 +00:00
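The idea behind the change above (one combined pattern, applied line by line, with the log never held in memory as a whole) can be sketched in Python. This is an analogue for illustration only: the commit itself uses a `wget | grep -E` pipe, and the three patterns and the `scan_log` helper below are hypothetical, not the nine patterns from the PR.

```python
import io
import re

# Hypothetical failure patterns; the real list lives in the CI scripts.
FAILURE_PATTERNS = [
    r"No space left on device",
    r"docker login failed",
    r"GPU not found",
]

# One alternation, so each line is scanned once (like a single grep -E call).
COMBINED = re.compile("|".join(f"(?:{p})" for p in FAILURE_PATTERNS))


def scan_log(stream):
    """Yield (line_number, line) for every line that matches a pattern.

    The stream is consumed lazily, one line at a time.
    """
    for number, line in enumerate(stream, start=1):
        if COMBINED.search(line):
            yield number, line.rstrip("\n")


# Simulate a large log with one failure near the end.
log = io.StringIO(
    "step ok\n" * 80000
    + "write failed: No space left on device\n"
    + "step ok\n"
)
hits = list(scan_log(log))
```

Any iterable of lines works as `stream`, including an open file or a streamed HTTP response.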
Jobbins	f28fed5f72	[rocm-libraries] ROCm/rocm-libraries#5630 (commit 14cd617) add self healing to ref repo (#5630) ## Motivation Check for when mirror repo gets corrupted in CI ## Technical Details We detect broken ref objects and rebuild the local mirror in case of corruption ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-03-20 09:42:53 -07:00
Yaswanth Raparti	d782cc344d	[rocm-libraries] ROCm/rocm-libraries#5614 (commit 32933df) [CK][CK TILE] Fix smart-build to run install target for client examples (#5614) How ninja install works: - Builds library dependencies (device_operations, etc.) - Installs them to CMAKE_INSTALL_PREFIX - Skips building test executables (not install dependencies) Affected stages (8): - gfx942/gfx950/gfx908/gfx90a CK Client Examples - gfx10-1/gfx10-3/gfx11/gfx12 CK Client Examples ## Motivation Problem: When smart-build is enabled (runAllUnitTests=false), the build step is skipped entirely. This causes client example stages to fail because they depend on the CK library being installed to ../install. Error seen: Target "client_gemm" links to: composable_kernel::device_other_operations but the target was not found. ## Technical Details Root cause: Line 712 only checked runAllUnitTests, so when building with config_targets="install", the install target was never built, leaving the install directory empty. Fix: Added condition to always build when config_targets contains 'install'. The install target automatically builds its dependencies (the CK libraries) but skips building tests, which aligns with smart-build philosophy. ## Test Plan Should be tested on CI ## Test Result Should be tested on CI ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2026-03-19 22:00:29 +00:00
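The fix described above amounts to widening the build condition. A minimal sketch, assuming `config_targets` is a whitespace-separated string of target names (the function name and the token-based match are assumptions; the actual check lives in the Jenkins Groovy code):

```python
def should_build(run_all_unit_tests: bool, config_targets: str) -> bool:
    """Decide whether the build step runs.

    Before the fix only run_all_unit_tests was consulted, so smart-build
    (run_all_unit_tests=False) skipped the install target as well.
    """
    # Assumption: targets are separated by whitespace.
    wants_install = "install" in config_targets.split()
    return run_all_unit_tests or wants_install


client_examples = should_build(False, "install")  # built despite smart-build
tests_only = should_build(False, "check")         # still skipped
```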
Yaswanth Raparti	3910c9c2a1	[rocm-libraries] ROCm/rocm-libraries#5249 (commit 2a114bb) [CK] [CK_TILE] Improve build and test time of CI with smart dependency parser (#5249) ## Motivation Existing dependency parser needs full build of tests to determine which tests are affected by code changes in a PR. This still takes 2-4 hours for building the tests which slows down the CI as the number of tests grows. To resolve this issue we implemented a smart dependency parser which uses CMake Configure to parse dependencies and build only the affected test cases. We have ensured that two approaches are available 1) CMake pre-build analysis for each PR to ensure fast build and test. 2) Ninja post-build analysis to enable full build for nightly tests. ## Technical Details ```bash ### 1. Configure the project with CMake cmake -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON .. ### 2. Analyze dependencies (no build required!) python3 ../script/dependency-parser/main.py cmake-parse compile_commands.json build.ninja \ --workspace-root .. --output cmake_dependency_mapping.json --parallel 8 ### 3. Find tests affected by changes python3 ../script/dependency-parser/main.py select cmake_dependency_mapping.json origin/develop \ HEAD --test-prefix --output tests_to_run.json ### 4. Build only affected tests ninja $(jq -r '.executables[]' tests_to_run.json | tr '\n' ' ') ### 5. Run affected tests ctest -R "$(jq -r '.regex' tests_to_run.json)" ``` ### Jenkins Integration - Added `buildMode` to Jenkinsfile to integrate both `selective` and `full` build methods ### Known Limitations ### 1. Build-Time Generated Headers (HIGH RISK) Problem: Files generated during the build process (e.g., via `add_custom_command`) cannot be analyzed before building. Example: ```cmake add_custom_command( OUTPUT ${CMAKE_BINARY_DIR}/generated/config.hpp COMMAND generate_config.sh DEPENDS template.hpp.in ) ``` Impact: If a source file includes `generated/config.hpp`, the dependency won't be detected until after building. 
Mitigation: - CK analysis shows no generated headers currently used - If generated headers are added in the future, they must be built first - Recommendation: Generate headers in CMake configure phase (not build phase) when possible ## Test Plan 1. Modified Files: ``` include/ck_tile/ops/common.hpp include/ck_tile/ops/gemm.hpp include/ck_tile/ops/gemm/warp/warp_gemm.hpp ``` 2. Compare tests selected between `build.ninja` and `cmake-parse` methods ## Test Result - 1. The test completed in 5-6 minutes finding about 8000+ executables that should be built. - 2. We selected a commit 5ccc1387ea which resulted in same 7 tests with both legacy and new methods. - PR \| Legacy tests \| Smart tests \| Notes -- \| -- \| -- \| -- 5261 \| 453 \| 455 \| Only 2 tests (test_amdgcn_mma and test_amdgcn_sparse_mma) 5168 \| 0 \| 0 \| Changes in dispatcher only. No CK tests invoked. 5249 \| 0 \| 0 \| Changes to dependency parser. No CK tests invoked 5260 \| 0 \| 0 \| Changes in dispatcher only. No CK tests invoked. 5174 \| 1 \| 1 \| One test from FMHA affected by this PR in both cases 5383 \| 0 \| 0 \| Changes are only in benchmark files. Did not trigger any tests 5445 \| 1 \| 1 \| Changes are only to tests/ck_tile/gemm_streamk. Only triggered one streamk test in both cases. 5454 \| 3 \| 3 \| Both methods identified same test_grouped_conv_bwd tests 5427 \| 234 \| 234 \| Core infrastructure header changes. Detected exactly same tests 5388 \| 85 \| 85 \| modifies warp-level GEMM operations (warp_gemm.hpp, warp_gemm_dispatcher.hpp). Correctly identified all the streamK gemm tests ## Submission Checklist - [x ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2026-03-19 05:30:19 +00:00
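The selection step in the workflow above can be pictured as a lookup from changed files to the executables that depend on them. A minimal sketch, assuming the dependency mapping is a dict from executable name to its source files (the real format of `cmake_dependency_mapping.json` is not shown in the commit, and the executable names and `conv.hpp` path below are invented); only the `executables` and `regex` keys of the result are taken from the `jq` calls in the commit message:

```python
import re


def select_tests(dependency_mapping, changed_files):
    """Return the executables whose sources intersect the changed files."""
    changed = set(changed_files)
    executables = sorted(
        exe
        for exe, sources in dependency_mapping.items()
        if changed.intersection(sources)
    )
    # Anchored alternation for `ctest -R`. It is empty when nothing is
    # affected, in which case the caller should skip ctest entirely.
    regex = ""
    if executables:
        regex = "^(" + "|".join(re.escape(e) for e in executables) + ")$"
    return {"executables": executables, "regex": regex}


# Hypothetical mapping and change set.
mapping = {
    "test_gemm": ["include/ck_tile/ops/gemm.hpp", "test/gemm.cpp"],
    "test_conv": ["include/ck_tile/ops/conv.hpp", "test/conv.cpp"],
}
result = select_tests(mapping, ["include/ck_tile/ops/gemm.hpp"])
```

Headers generated at build time would be missing from such a mapping, which is the limitation the commit flags as high risk.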
Thrupti Raj Lakshmana Gowda	bdb589448d	[rocm-libraries] ROCm/rocm-libraries#4996 (commit 0a47fbe) [CK TILE ENGINE] Add grouped_gemm operator to Tile Engine (gfx942/gfx950) (#4996) ## Motivation The grouped_gemm CK Tile kernel exists (e.g., `example/17_grouped_gemm/`) but has no Tile Engine wrapper. Grouped GEMM handles multiple independent GEMM problems with varying M/N/K dimensions in a single kernel launch. This PR adds the Tile Engine infrastructure for automated kernel generation, benchmarking, and profiling of grouped GEMM kernels. Jira: AICK-809 ## Technical Details - Created Tile Engine wrapper under `tile_engine/ops/gemm/grouped_gemm/` following the `gemm_universal` template - Files added: `CMakeLists.txt`, `grouped_gemm_common.hpp`, `grouped_gemm_benchmark.hpp`, `grouped_gemm_profiler.hpp`, `grouped_gemm_benchmark.py`, `grouped_gemm_benchmark_single.cpp`, `grouped_gemm_instance_builder.py`, `configs/` - Supported datatypes: fp16, fp8, bf16, bf8 - Supported layouts: rcr, rrr, ccr, crr - Target GPUs: gfx942, gfx950 - CK Tile kernel: `ck_tile::GroupedGemmKernel` from `include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp` - Instance builder extends `GemmKernelBuilder` base class - Registered in `tile_engine/ops/gemm/CMakeLists.txt` - Updated Jenkinsfile to build and benchmark grouped_gemm targets in CI - Benchmark infrastructure includes JSON output, CSV export, and verification support ## Test Plan - CMake configure succeeds for grouped_gemm targets - Kernel instance builder generates valid kernel headers for all (datatype, layout) combinations - At least one kernel binary compiles and runs per datatype/layout combination - Correctness passes with `--verify 1` on gfx942/gfx950 ## Test Result ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. 
--------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-10 18:58:37 -05:00
Bartłomiej Kocot	e262252c4c	[rocm-libraries] ROCm/rocm-libraries#5115 (commit a21861e) [CK][CK Tile] Add grouped conv backward weight tile test and fix tr load in BASE_V1 pipeline (#5115) ## Motivation Test grouped conv backward weight from ck tile and fix incorrect values. ## Technical Details - Add test for CI - Add daily tests - Fix transpose load in BASE_V1 pipeline ## Test Plan test_grouped_convnd_backward_weight_tile ## Test Result in progress ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. AICK-783	2026-03-10 03:03:04 +00:00
andrew clark	b7320a2b19	[rocm-libraries] ROCm/rocm-libraries#5063 (commit 01abf3d) CI Skip Testing Fix (#5063) ## Motivation Testing the Skip CI functionality revealed a minor issue: the CI skip check fails when a branch is built at the exact commit where it diverged from develop. CI is still run by default if a failure is detected. When git log <merge-base>..HEAD returns no files (because HEAD equals merge-base), the command grep -v '^$' exits with error code 1, causing the skip check to fail. ## Technical Details Added || true to the grep commands so empty output is handled gracefully instead of causing a script failure. ## Test Plan - Simulate the failures and ensure the grep failure is handled gracefully. ## Test Result - Simulated grep failures using an empty string. The script handles the error correctly. - Verified the CI skip functionality skips CI when non-relevant file changes are made. - Verified the CI skip functionality does not skip CI when relevant file changes are made. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-03-04 22:01:25 -07:00
Thrupti Raj Lakshmana Gowda	94dd8b6955	[rocm-libraries] ROCm/rocm-libraries#4958 (commit 713881f) bf8 and bf16 support for Universal GEMM in Tile Engine (#4958) ## Motivation Currently, Tile Engine supports only the fp8 and fp16 datatypes for universal GEMM. With this PR, support for the bf8 and bf16 datatypes will be added during the CI phase. ## Technical Details Adding bf8 and bf16 support ## Test Plan NA ## Test Result NA ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-03-03 15:29:13 -08:00
andrew clark	f798c36fdd	[rocm-libraries] ROCm/rocm-libraries#4943 (commit ea40212) [CK] Updating CI skip logic (#4943) ## Motivation The CI skip logic has two issues that prevented it from working correctly: 1. Incorrect file patterns: After migrating from standalone repo to `rocm-libraries`, file paths now include the `projects/composablekernel/` prefix (e.g., `projects/composablekernel/docs/README.md`). The skip patterns were still checking for paths starting with `docs/`, which never matched. 2. Incomplete build type support: Jenkins multibranch pipelines provide different environment variables for PR builds (`$CHANGE_TARGET`, `$CHANGE_ID`) vs branch builds (`$BRANCH_NAME`). The previous logic only compared `HEAD~1..HEAD` for branch builds, which missed changes from multi-commit pushes and didn't properly handle feature branch builds. When CI skipped or ran, there was no visibility into which files triggered the decision, making it difficult to diagnose issues. You can now see which files triggered the CI run. ## Technical Details PR builds: Compares all commits against origin/$CHANGE_TARGET. Feature branch builds: Uses git merge-base to find divergence point from develop and checks all touched files since then. Scheduled develop builds are unaffected. These builds are forced to run from the pipeline parameters. Example log output for PR Builds: <img width="647" height="260" alt="image" src="https://github.com/user-attachments/assets/c8673a81-acb2-4fb2-acbb-1c07b5ab3b69" /> Example log output for Branch Builds: <img width="488" height="287" alt="image" src="https://github.com/user-attachments/assets/fbb17ba7-eb2c-42a4-b820-b2a8b9e479c4" /> ## Test Plan Pre-PR validation (branch builds): Push commits with only documentation changes → CI should skip. I will have to verify this after this PR is merged! 
Push commits with code changes → CI should run Push commits that modify then revert code → CI should run (catching reverts) Verify debug output clearly shows skip/run decision Post-PR validation (PR builds): Create PR with only doc changes → CI should skip. I will have to verify this after this PR is merged! Create PR with mixed doc + code changes → CI should run and log which files triggered it Verify debug output clearly shows skip/run decision ## Test Result All branch build checks succeeded. All PR build checks succeeded. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-03-03 07:49:34 -08:00
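The path-prefix problem described in this commit can be sketched as a small filter over the changed-file list. This is an illustration under stated assumptions: the skip patterns, the function name, and the choice to ignore files outside the project folder are hypothetical, and the real check runs as shell inside the Jenkins pipeline.

```python
PROJECT_PREFIX = "projects/composablekernel/"

# Hypothetical skip patterns, relative to the project folder.
SKIP_PATTERNS = ("docs/", "README.md")


def files_requiring_ci(changed_files):
    """Return the changed files that should trigger a CI run."""
    relevant = []
    for path in changed_files:
        if not path.startswith(PROJECT_PREFIX):
            continue  # assumption: changes outside the project are ignored
        relative = path[len(PROJECT_PREFIX):]
        # Matching on the relative path is the point of the fix: a pattern
        # such as "docs/" never matches the full super-repo path.
        if relative.startswith(SKIP_PATTERNS):
            continue
        relevant.append(path)
    return relevant


docs_only = files_requiring_ci(["projects/composablekernel/docs/README.md"])
mixed = files_requiring_ci([
    "projects/composablekernel/docs/README.md",
    "projects/composablekernel/include/ck/ck.hpp",
])
skip_ci = not docs_only  # True: nothing relevant changed
```

Returning the triggering files, instead of a bare boolean, keeps the skip/run decision visible in the log, which is the visibility the commit asks for.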
Illia Silin	c754ee3df0	[rocm-libraries] ROCm/rocm-libraries#5036 (commit 0bee213) [CK] Switch compiler branch from staging to develop and upgrade sccache. (#5036) ## Motivation Upgrade to official sccache version 0.14, since it now supports hip. Also, switching daily builds from amd-staging to develop compiler branch, since it should be more stable. ## Technical Details <!-- Explain the changes along with any relevant GitHub links. --> ## Test Plan <!-- Explain any relevant testing done to verify this PR. --> ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-03-03 07:32:24 -08:00
Thrupti Raj Lakshmana Gowda	720d7fa02b	[rocm-libraries] ROCm/rocm-libraries#4592 (commit 45f76cb) Tile Engine support for gfx950 (#4592) ## Motivation This PR adds support for the gfx950 GPU architecture to the Tile Engine in Composable Kernel library, focusing on GEMM operations with FP8 and BF8 data types. ## Technical Details Added gfx950-specific MFMA warp GEMM implementations with conditional compilation. Updated default GEMM configuration parameters for tile sizes and warp configurations. Added Jenkins CI pipeline stage for testing TILE_ENGINE_GEMM on gfx950 hardware. ## Test Plan Tile engine itself is a benchmarking utility, so if it passes the CI it will be tested automatically. ## Test Result Tile engine itself is a benchmarking utility, so if it passes the CI it will be tested automatically. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --------- Co-authored-by: Thrupti Raj Lakshmana Gowda<ThruptiRaj.LakshmanaGowda@amd.com> Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>	2026-02-26 10:14:40 -06:00
Eiden Yoshida	e16789b609	[rocm-libraries] ROCm/rocm-libraries#4373 (commit 1c29275) [CK] MICI: Disable failure pattern checking ## Motivation - ck mici jobs hanging at end, possibly at failure pattern checking ## Technical Details - Disable failure pattern checking to see if hanging goes away ## Test Plan - Observe behavior after merge	2026-02-09 15:25:01 +00:00
spolifroni-amd	d2f1541976	[rocm-libraries] ROCm/rocm-libraries#4300 (commit 07e9d56) [CK] add inter/intrawave scheduling concept doc ## Proposed changes Adding information about inter/intrawave scheduling	2026-02-07 00:11:11 +00:00
Illia Silin	1ddb38f098	[rocm-libraries] ROCm/rocm-libraries#4375 (commit 45b616b) [CK] fix path for build filter ## Motivation Fix the filter that determines whether CI builds are necessary. ## Technical Details A script checks the files list returned by git diff and checks whether any code source was modified. If not, if only documentation was changed, it will allow skipping the builds. We make sure we only look at the changes in projects/composablekernel/ folder.	2026-02-06 18:18:14 +00:00
Illia Silin	4dd4869fbf	[rocm-libraries] ROCm/rocm-libraries#4361 (commit 37a74ef) [CK] a bunch of CI fixes. ## Motivation Fixing some of the CK CI issues ## Technical Details fixing paths to dockerfiles and scripts; moving codegen tests to separate stage (collides with main build since you must call cmake from same folder but different options); fixing a couple of clang compilation issues with staging compiler;	2026-02-06 01:07:34 +00:00
Eiden Yoshida	e96beb1f3e	[rocm-libraries] ROCm/rocm-libraries#4352 (commit 3c9beb3) [CK] MICI: Fix git diff in selective_test_filter.py ## Motivation - git diff needs access to reference repo ## Technical Details - mount reference repo path into docker for selective_test_filter.py to access ## Test Plan - tested in MICI ## Test Result - launch_tests.sh ran successfully	2026-02-05 22:57:20 +00:00
Jobbins	344d98781b	[rocm-libraries] ROCm/rocm-libraries#4351 (commit 3b98c98) [composablekernel] fix failure status ## Motivation Pipelines were failing on Math CI status check. ## Technical Details For the success case, we just changed the config in Jenkins to use a proper app token and no code changes were required. However, the failure case would not have worked as coded, so we needed to move that outside of the `rocmnode()` block. ## Test Plan I removed all of the CI in one of the commits to quickly test, and then added it back. Got a successful "success" message and "failure" message produced	2026-02-05 15:57:21 +00:00
Eiden Yoshida	3a02862241	[rocm-libraries] ROCm/rocm-libraries#4349 (commit 9bb7f5c) [CK] MICI: Correct path for build trace script ## Motivation - Corrects path to script due to superrepo migration - Forces all tests to run by default ## Technical Details - now in /projects/composablekernel	2026-02-05 15:56:52 +00:00
Eiden Yoshida	3f42f76b45	[rocm-libraries] ROCm/rocm-libraries#4336 (commit d26a782) [CK] MICI: Use reference repo for checkout operations ## Motivation - Maintain a reference repo on slave nodes that speeds up any clone/checkout operations ## Technical Details - clone a ref repo if it does not exist - update ref repo if it does exist - checkout after ref repo is updated - eliminates double clone ## Test Result - Initial checkouts succeeded	2026-02-05 02:44:29 +00:00
Jeff Huang	7b18f5fed2	[rocm-libraries] ROCm/rocm-libraries#4263 (commit f34aec2) [CK] Add FP8 KV_BLOCKSCALE support for batch prefill Implement per-page K/V quantization for paged attention: - Add KV_BLOCKSCALE enum to BlockAttentionQuantScaleEnum - Use exp2 shift trick to eliminate explicit P scaling overhead - Prefetch physical pages offset for KV cache, overlaps with computations	2026-02-04 23:26:20 +00:00

312 Commits