Commit Graph

292 Commits

Author SHA1 Message Date
Max Podkorytov
58e802dcf1 Fix ck4inductor conv instance parsing for NumGroupsToMerge parameter (#6434)
## Summary
- Add `num_groups_to_merge` field to `CKGroupedConvFwdOp` dataclass to
match the new (#4273) `NumGroupsToMerge` template parameter added to
`DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3`
- Enable inductor tests by default in Jenkins CI

## Test plan
- [x] Built wheel without patch: `test_gen_conv_instances` fails with
`TypeError: takes from 47 to 50 positional arguments but 51 were given`
- [x] Built wheel with patch: `test_gen_conv_instances` passes
2026-04-22 11:05:11 -07:00
arai713
c810a01ec6 [CK_TILE] Restructure Tile Engine's benchmarking and profiling (#4769)
## Motivation
This PR introduces a restructure for the benchmarking and profiling
aspects of CK Tile's Tile Engine, expanding on the groundwork from this
previous https://github.com/ROCm/composable_kernel/pull/3434 and
outlined in this [design
document](https://amdcloud-my.sharepoint.com/:w:/r/personal/astharai_amd_com/Documents/Restructuring%20Tile%20Engine.docx?d=w14ea28a30718416988ed5ebb759bd3b2&csf=1&web=1&e=l3VBuX).
In PR 3434, to reduce repeated code we implemented:

- Base class that centralizes common functionality and provides a
default implementation (Universal GEMM)
- Child classes for GEMM variants override virtual functions to handle
variant-specific behavior

This refactoring in this PR follows the same process and should greatly
reduce the duplicated code present in Tile Engine and make it simpler to
add in new operations, increasing scalability.

## Technical Details
The files have been refactored around new base structs for benchmarks,
profiling and problem descriptions. The new base structs are:

- GemmProblem
- GemmBenchmark
- GemmProfiler

Universal GEMM, Preshuffle GEMM, and Multi-D GEMM all have child classes
that will inherit from these base structs overriding only what differs
per variant.
All common functions across the benchmarking and profiling files have
been moved into newly added common utility files under the commons/
directory. The new utility files are:

- utils.hpp: common functions for the benchmarking and profiling process
- benchmark_utils.py: common utility functions for the benchmark
generation

## Test Plan
I tested using the existing tests for Tile Engine.
## Test Result
All tests passed.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-04-14 10:50:24 -07:00
Yi DING
914f4e47a2 [CK] Add flash_attn tests (#5329)
## Motivation

Add CI support for running
[flash-attention](https://github.com/ROCm/flash-attention) tests against
CK, similar to existing AITER and PyTorch downstream test pipelines.

## Technical Details

### New: `Dockerfile.fa`
A new Dockerfile that builds a flash-attention test image on top of a
ROCm PyTorch base image. It:
- Sparse-checkouts CK from `rocm-libraries` (or clones directly from
`ROCm/composable_kernel`)
- Clones and builds `flash-attention` with CK as the backend
- Supports configurable `FA_BRANCH`, `CK_FA_BRANCH`, and `GPU_ARCHS`
build args

### Updated: `Jenkinsfile`

**buildDocker refactor:**
- Extracted `buildAndPushDockerImage()` helper that handles both "check
if exists, skip" and "force build, push" logic, eliminating the
duplicated try/catch blocks
- Split monolithic `buildDocker()` into `buildDockerBase()`,
`buildDockerPytorch()`, `buildDockerAiter()`, and new `buildDockerFa()`
- Each downstream docker build now runs unconditionally within its
respective guard (`RUN_PYTORCH_TESTS`, `RUN_AITER_TESTS`,
`RUN_FA_TESTS`)
- Image digests are stored in env vars (`CK_BASE_IMAGE`,
`CK_PYTORCH_IMAGE`, `CK_AITER_IMAGE`, `CK_FA_IMAGE`) for use in
downstream stages

**run_downstream_tests refactor:**
- Merged `run_aiter_tests()` and `run_pytorch_tests()` into a single
generic `run_downstream_tests(conf)` that accepts `image`,
`timeoutHours`, and `execute_cmds`
- Test commands for each downstream target are declared as top-level
lists (`RUN_PYTORCH_TESTS_CMDS`, `RUN_AITER_TESTS_CMDS`,
`RUN_FA_TESTS_CMDS`)

**Pipeline stages:**
- Merged "Run Pytorch Tests" and "Run AITER Tests" into a single "Run
Downstream Tests" parallel stage
- Added two new FA test stages: "Run FA Tests on gfx942" and "Run FA
Tests on gfx950"
- Added new pipeline parameters: `RUN_FA_TESTS`, `fa_base_docker`,
`fa_branch`, `ck_fa_branch`
- `ck_pytorch_branch` and `ck_aiter_branch` now default to the current
branch instead of hardcoded `develop`
- CRON schedule at 13:00 now also triggers `RUN_FA_TESTS=true`

## Test Plan

- [x] Trigger pipeline manually with `RUN_FA_TESTS=true` on gfx942 and
gfx950 nodes
- [x] Verify existing AITER and PyTorch test stages are unaffected
- [x] Verify `buildAndPushDockerImage` correctly skips rebuild when
image already exists (with `BUILD_DOCKER=false`)

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-04-10 09:23:10 +08:00
Illia Silin
148ae9f2ef [CK] Replace daily CI builds with mainline compiler with TheRock compiler. (#6147)
## Motivation

Since the compiler team has deprecated the amd-mainline branch and
switched to TheRock, we'll start building a docker image with TheRock
artifacts and building/testing Ck with that.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-04-03 11:04:12 -06:00
Illia Silin
9ee89abffe Use ck_pytorch docker from private repo. (#6103)
## Motivation

Move the pytorch docker image used for CK testing into private repo.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-04-02 09:06:44 -07:00
Jobbins
f3375e4b79 [CK] poll every 6 hours as workaround (#6107) 2026-04-01 15:52:45 -04:00
Jobbins
6ab0e34ed9 [CK] poll develop every 15 minutes for changes (#6064) 2026-04-01 10:34:48 -04:00
Illia Silin
3873bf3b91 [CK] fix clang lifetimebound errors with staging compiler (#5921)
## Motivation

The ROCm staging compiler (newer Clang) enforces
`[[clang::lifetimebound]]` annotations on methods that return references
or pointers to internal object data. Without these annotations, the
staging compiler emits compilation errors for container accessor methods
across the CK and CK Tile namespaces.

  ## Technical Details

Adds `[[clang::lifetimebound]]` to all reference/pointer-returning
accessors in core container types:

  **`ck::` namespace:**
  - `Array` -- `At()`, `operator[]`, `operator()`, `begin()`, `end()`
  - `index_array` -- `operator[]`
  - `StaticallyIndexedArray_v2` -- `At()`, `operator[]`, `operator()`
  - `IndexLookupTable` -- `operator[]`

  **`ck_tile::` namespace:**
  - `array` -- `get(i)`, `at()`, `operator[]`, `operator()`
  - `static_array` -- `operator[]`
  - `thread_buffer` -- `get(i)`, `at()`, `operator[]`, `operator()`
  - `make_kernel()` -- parameter pack

Also removes the unused `instance_index` variable from
`batched_gemm_reduce_fp16.cpp` and simplifies its argument parsing
  accordingly.

  ## Test Plan

- Compile with the staging compiler to verify all lifetimebound errors
are resolved
- Existing tests pass unchanged -- the attribute is a compile-time
annotation with no runtime effect

 ## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-30 07:19:32 -07:00
Illia Silin
fea325336c Re-enable daily builds with staging compiler (#5764)
## Motivation

This should help us catch and fix any new compilation issues early on.

## Technical Details

We now have three compiler profiles:
* **develop**: slightly stabilized version of amd-staging with some of
the obvious offending PRs reverted, 1-2 weeks behind amd-staging;
* **amd-mainline**: more stable version of compiler, the baseline for
all other branches, e.g., release, npi, etc. 2-4 weeks behind
amd-staging.
* **amd-staging**: latest compiler version where all new PRs land, often
broken;

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: kensclin <lshyhchy@amd.com>
2026-03-25 16:37:00 +00:00
Eiden Yoshida
39a340d60f [CK] MICI: Revert "add self healing to ref repo" (#5691)
The check may not be working as intended, causing premature deletion of
reference repositories
2026-03-23 07:15:51 -07:00
andrew clark
ba959369e3 Improved CI infrastructure failure detection (#5464)
## Motivation

This PR re-enables CI infrastructure failure detection and notification,
which had been disabled due to performance issues caused by loading
large build logs (~80k lines) into memory for pattern scanning. The goal
is to reliably detect known infrastructure failures (GPU errors, Docker
authentication issues, disk space errors, etc.) and send actionable
Teams notifications without hanging on large logs.

## Technical Details

- Replaced full build log loading and Groovy-based pattern scanning with
a streaming wget | grep -E pipe. grep scans natively so the full log is
never loaded into Groovy, resolving the hang on large logs.
- Combined all failure patterns into a single grep -E call to avoid
multiple log fetches.
- The node name is now tracked with the observed failure.
- Added a new failure pattern for device's running out of space.

## Test Plan

- Forced failures in the "Determine CI Execution" stage with all 9
failure patterns echoed to the build log.
- Simulated large log sizes (~80k lines of dummy output) to validate
pattern detection and node name extraction at realistic log scales,
including patterns placed both before and after large blocks of dummy
output.

## Test Result

All 9 failure patterns detected correctly. Teams notifications sent with
accurate log context, node name, and job links. No hangs observed on 80k
line simulated logs.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-20 19:16:58 +00:00
Jobbins
8892f83890 add self healing to ref repo (#5630)
## Motivation

Check for when mirror repo gets corrupted in CI

## Technical Details

We detect broken ref objects and rebuild the local mirror in that case
of corruption

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-20 09:42:53 -07:00
Yaswanth Raparti
a61238b69f [CK][CK TILE] Fix smart-build to run install target for client examples (#5614)
How ninja install works:
- Builds library dependencies (device_operations, etc.)
- Installs them to CMAKE_INSTALL_PREFIX
- Skips building test executables (not install dependencies)

Affected stages (8):
- gfx942/gfx950/gfx908/gfx90a CK Client Examples
- gfx10-1/gfx10-3/gfx11/gfx12 CK Client Examples

## Motivation

Problem: When smart-build is enabled (runAllUnitTests=false), the build
step is skipped entirely. This causes client example stages to fail
because they depend on the CK library being installed to ../install.

Error seen:
  Target "client_gemm" links to:
    composable_kernel::device_other_operations
  but the target was not found.

## Technical Details
Root cause: Line 712 only checked runAllUnitTests, so when building with
config_targets="install", the install target was never built, leaving
the install directory empty.

Fix: Added condition to always build when config_targets contains
'install'. The install target automatically builds its dependencies (the
CK libraries) but skips building tests, which aligns with smart-build
philosophy.

## Test Plan

Should be tested on CI 

## Test Result

Should be tested on CI

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2026-03-19 22:00:29 +00:00
Yaswanth Raparti
1333922f04 [CK] [CK_TILE] Improve build and test time of CI with smart dependency parser (#5249)
## Motivation

Existing dependency parser needs full build of tests to determine which
tests are affected by code changes in a PR. This still takes 2-4 hours
for building the tests which slows down the CI as the number of tests
grow. To resolve this issue we implemented a smart dependency parser
which uses CMake Configure to parse dependencies and build only the
affected test cases. We have ensured that two approaches are available
1) CMake pre-build analysis for each PR to ensure fast build and test.
2) Ninja post-build analysis to enable full build for nightly tests.

## Technical Details

```bash
### 1. Configure the project with CMake
cmake -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ..

### 2. Analyze dependencies (no build required!)
python3 ../script/dependency-parser/main.py cmake-parse compile_commands.json build.ninja \
  --workspace-root .. --output cmake_dependency_mapping.json --parallel 8

### 3. Find tests affected by changes
python3 ../script/dependency-parser/main.py select cmake_dependency_mapping.json origin/develop \
  HEAD --test-prefix --output tests_to_run.json

### 4. Build only affected tests
ninja $(jq -r '.executables[]' tests_to_run.json | tr '\n' ' ')

### 5. Run affected tests
ctest -R "$(jq -r '.regex' tests_to_run.json)"
```

### Jenkins Integration
- Added `buildMode` to jenkinsfile to integrate both `selective` and
`full` build methods

### Known Limitations

### 1. Build-Time Generated Headers (HIGH RISK)

**Problem:** Files generated during the build process (e.g., via
`add_custom_command`) cannot be analyzed before building.

**Example:**
```cmake
add_custom_command(
  OUTPUT ${CMAKE_BINARY_DIR}/generated/config.hpp
  COMMAND generate_config.sh
  DEPENDS template.hpp.in
)
```

**Impact:** If a source file includes `generated/config.hpp`, the
dependency won't be detected until after building.

**Mitigation:**
- CK analysis shows **no generated headers** currently used
- If generated headers are added in the future, they must be built first
- Recommendation: Generate headers in CMake configure phase (not build
phase) when possible


## Test Plan
**1. Modified Files:**
```
include/ck_tile/ops/common.hpp
include/ck_tile/ops/gemm.hpp
include/ck_tile/ops/gemm/warp/warp_gemm.hpp
```
**2. Compare tests selected between `build.ninja` and `cmake-parse`
methods**


## Test Result
- 1. The test completed in 5-6 minutes finding about 8000+ executables
that should be built.
- 2. We selected a commit 5ccc1387ea which resulted in same 7 tests with
both legacy and new methods.
- 


PR | Legacy tests | Smart tests | Notes
-- | -- | -- | --
5261 | 453 | 455 | Only 2 tests (test_amdgcn_mma and
test_amdgcn_sparse_mma)
5168 | 0 | 0 | Changes in   dispatcher only. No CK tests invoked.
5249 | 0 | 0 | Changes to   dependency parser. No CK tests invoked
5260 | 0 | 0 | Changes in   dispatcher only. No CK tests invoked.
5174 | 1 | 1 | One test from FMHA   affected by this PR in both cases
5383 | 0 | 0 | Changes are only in benchmark files. Did not trigger any
tests
5445 | 1 | 1 | Changes are only to tests/ck_tile/gemm_streamk. Only
triggered one streamk test in both cases.
5454 | 3 | 3 | Both methods identified same test_grouped_conv_bwd tests
5427 | 234 | 234 | Core infrastructure header changes. Detected exactly
same tests
5388 | 85 | 85 | modifies warp-level GEMM operations (warp_gemm.hpp,
warp_gemm_dispatcher.hpp). Correctly identified all the streamK gemm
tests







## Submission Checklist

- [x ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2026-03-19 05:30:19 +00:00
Thrupti Raj Lakshmana Gowda
791ebfb517 [CK TILE ENGINE] Add grouped_gemm operator to Tile Engine (gfx942/gfx950) (#4996)
## Motivation

The grouped_gemm CK Tile kernel exists (e.g.,
`example/17_grouped_gemm/`) but has no Tile Engine wrapper. Grouped GEMM
handles multiple independent GEMM problems with varying M/N/K dimensions
in a single kernel launch. This PR adds the Tile Engine infrastructure
for automated kernel generation, benchmarking, and profiling of grouped
GEMM kernels.

Jira: AICK-809

## Technical Details

- Created Tile Engine wrapper under `tile_engine/ops/gemm/grouped_gemm/`
following the `gemm_universal` template
- Files added: `CMakeLists.txt`, `grouped_gemm_common.hpp`,
`grouped_gemm_benchmark.hpp`, `grouped_gemm_profiler.hpp`,
`grouped_gemm_benchmark.py`, `grouped_gemm_benchmark_single.cpp`,
`grouped_gemm_instance_builder.py`, `configs/`
- Supported datatypes: fp16, fp8, bf16, bf8
- Supported layouts: rcr, rrr, ccr, crr
- Target GPUs: gfx942, gfx950
- CK Tile kernel: `ck_tile::GroupedGemmKernel` from
`include/ck_tile/ops/gemm/kernel/grouped_gemm_kernel.hpp`
- Instance builder extends `GemmKernelBuilder` base class
- Registered in `tile_engine/ops/gemm/CMakeLists.txt`
- Updated Jenkinsfile to build and benchmark grouped_gemm targets in CI
- Benchmark infrastructure includes JSON output, CSV export, and
verification support

## Test Plan

- CMake configure succeeds for grouped_gemm targets
- Kernel instance builder generates valid kernel headers for all
(datatype, layout) combinations
- At least one kernel binary compiles and runs per datatype/layout
combination
- Correctness passes with `--verify 1` on gfx942/gfx950

## Test Result



## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-10 18:58:37 -05:00
Bartłomiej Kocot
c749ef9da3 [CK][CK Tile] Add grouped conv backward weight tile test and fix tr load in BASE_V1 pipeline (#5115)
## Motivation

Test grouped conv backward weight from ck tile and fix incorrect values.

## Technical Details

- Add test for CI
- Add daily tests
- Fix transpose load in BASE_V1 pipeline

## Test Plan

test_grouped_convnd_backward_weight_tile

## Test Result

in progress

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

AICK-783
2026-03-10 03:03:04 +00:00
andrew clark
7fcb0d84a8 CI Skip Testing Fix (#5063)
## Motivation

While testing the Skip CI functionality, it revealed a minor issue where
the CI skip check fails when a branch is built at the exact commit where
it diverged from develop. CI is still run by default if a failure is
detected.

When git log <merge-base>..HEAD returns no files (because HEAD equals
merge-base), the command grep -v '^$' exits with error code 1, causing
the skip check to fail.

## Technical Details

Added || true to the grep commands so empty output is handled gracefully
instead of causing a script failure.

## Test Plan

- Simulate the failures and ensure the grep failure is handled
gracefully.

## Test Result

- Simulated grep failures using an empty string. The script handles the
error correctly.
- Verified the CI skip functionality skips CI when non-relevant file
changes are made.
- Verified the CI skip functionality does not skip CI when relevant file
changes are made.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-04 22:01:25 -07:00
Thrupti Raj Lakshmana Gowda
eb126780a2 bf8 and bf16 support for Universal GEMM in Tile Engine (#4958)
## Motivation

Currently we have only fp8 and fp16 datatype support for universal GEMM
in Tile Engine with this PR support for bf8 and bf16 datatype will be
added during the CI phase

## Technical Details

Adding bf8 and bf16 support

## Test Plan

NA

## Test Result

NA

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-03 15:29:13 -08:00
andrew clark
e50594f27f [CK] Updating CI skip logic (#4943)
## Motivation

The CI skip logic has two issues that prevented it from working
correctly:

1. **Incorrect file patterns**: After migrating from standalone repo to
`rocm-libraries`, file paths now include the
`projects/composablekernel/` prefix (e.g.,
`projects/composablekernel/docs/README.md`). The skip patterns were
still checking for paths starting with `docs/`, which never matched.

2. **Incomplete build type support**: Jenkins multibranch pipelines
provide different environment variables for PR builds (`$CHANGE_TARGET`,
`$CHANGE_ID`) vs branch builds (`$BRANCH_NAME`). The previous logic only
compared `HEAD~1..HEAD` for branch builds, which missed changes from
multi-commit pushes and didn't properly handle feature branch builds.

When CI skipped or ran, there was no visibility into which files
triggered the decision, making it difficult to diagnose issues. You can
now see which files triggered the CI run.

## Technical Details

PR builds: Compares all commits against origin/$CHANGE_TARGET.
Feature branch builds: Uses git merge-base to find divergence point from
develop and checks all touched files since then.
Scheduled develop builds are unaffected. These builds are forced to run
from the pipeline parameters.

Example log output for PR Builds:
<img width="647" height="260" alt="image"
src="https://github.com/user-attachments/assets/c8673a81-acb2-4fb2-acbb-1c07b5ab3b69"
/>

Example log output for Branch Builds:
<img width="488" height="287" alt="image"
src="https://github.com/user-attachments/assets/fbb17ba7-eb2c-42a4-b820-b2a8b9e479c4"
/>

## Test Plan

Pre-PR validation (branch builds):

Push commits with only documentation changes → CI should skip. I will
have to verify this after this PR is merged!
Push commits with code changes → CI should run
Push commits that modify then revert code → CI should run (catching
reverts)
Verify debug output clearly shows skip/run decision

Post-PR validation (PR builds):

Create PR with only doc changes → CI should skip. I will have to verify
this after this PR is merged!
Create PR with mixed doc + code changes → CI should run and log which
files triggered it
Verify debug output clearly shows skip/run decision

## Test Result

All branch build checks succeeded.
All PR build checks succeeded.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-03 07:49:34 -08:00
Illia Silin
40abad15a4 [CK] Switch compiler branch from staging to develop and upgrade sccache. (#5036)
## Motivation

Upgrade to official sccache version 0.14, since it now supports hip.
Also, switching daily builds from amd-staging to develop compiler
branch, since it should be more stable.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->

## Test Plan

<!-- Explain any relevant testing done to verify this PR. -->

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-03 07:32:24 -08:00
Thrupti Raj Lakshmana Gowda
f7c2b42170 Tile Engine support for gfx950 (#4592)
## Motivation

This PR adds support for the gfx950 GPU architecture to the Tile Engine
in Composable Kernel library, focusing on GEMM operations with FP8 and
BF8 data types.

## Technical Details

Added gfx950-specific MFMA warp GEMM implementations with conditional
compilation.
Updated default GEMM configuration parameters for tile sizes and warp
configurations.
Added Jenkins CI pipeline stage for testing TILE_ENGINE_GEMM on gfx950
hardware.

## Test Plan

Tile engine itself is a benchmarking utility, so if it passes the CI it
will be tested automatically.

## Test Result

Tile engine itself is a benchmarking utility, so if it passes the CI it
will be tested automatically.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Thrupti Raj Lakshmana Gowda<ThruptiRaj.LakshmanaGowda@amd.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2026-02-26 10:14:40 -06:00
Eiden Yoshida
0a9935043a [CK] MICI: Disable failure pattern checking (#4373)
## Motivation

- ck mici jobs hanging at end, possibly at failure pattern checking


## Technical Details

- Disable failure pattern checking to see if hanging goes away

## Test Plan

- Observe behavior after merge
2026-02-09 07:23:47 -08:00
assistant-librarian[bot]
0b4203702c [CK] add inter/intrawave scheduling concept doc (#4300)
## Proposed changes

Adding information about inter/intrawave scheduling

---
🔁 Imported from
[ROCm/composable_kernel#3660](https://github.com/ROCm/composable_kernel/pull/3660)
🧑‍💻 Originally authored by @spolifroni-amd

---------

Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: assistant-librarian[bot] <assistant-librarian[bot]@users.noreply.github.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2026-02-06 16:10:23 -08:00
Illia Silin
b5d4754abf [CK] fix path for build filter (#4375)
## Motivation

Fix the filter that determines whether CI builds are necessary.

## Technical Details

A script checks the files list returned by git diff and checks whether
any code source was modified. If not, if only documentation was changed,
it will allow skipping the builds. We make sure we only look at the
changes in projects/composablekernel/ folder.
2026-02-06 13:17:02 -05:00
Illia Silin
2a054fc767 [CK] a bunch of CI fixes. (#4361)
## Motivation

Fixing some of the CK CI issues

## Technical Details

fixing paths to dockerfiles and scripts;
moving codegen tests to separate stage (collides with main build since
you must call cmake from same folder but different options);
fixing a couple of clang compilation issues with staging compiler;
2026-02-05 20:06:57 -05:00
Eiden Yoshida
f382d48125 [CK] MICI: Fix git diff in selective_test_filter.py (#4352)
## Motivation

- git diff needs access to reference repo

## Technical Details

- mount reference repo path into docker for selective_test_filter.py to
access

## Test Plan

- tested in MICI

## Test Result

- launch_tests.sh ran successfully
2026-02-05 17:56:12 -05:00
Jobbins
9ac0ae7cba [composablekernel] fix failure status (#4351)
## Motivation

Pipelines were failing on Math CI status check.

## Technical Details

For the success case, we just changed the config in Jenkins to use a
proper app token and no code changes were required. However, the failure
case would not have worked as coded, so we needed to move that outside
of the `rocmnode()` block.

## Test Plan

I removed all of the CI in one of the commits to quickly test, and then
added it back.  Got a successful "success" message and "failure" message
produced
2026-02-05 08:56:42 -07:00
Eiden Yoshida
ece63708f2 [CK] MICI: Correct path for build trace script (#4349)
## Motivation

- Corrects path to script due to superrepo migration
- Forces all tests to run by default

## Technical Details

- now in /projects/composablekernel

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2026-02-05 10:55:44 -05:00
Eiden Yoshida
e3f6354c67 [CK] MICI: Use reference repo for checkout operations (#4336)
## Motivation

- Maintain a reference repo on slave nodes that speeds up any
clone/checkout operations

## Technical Details

- clone a ref repo if it does not exist
- update ref repo if it does exist
- checkout after ref repo is updated
- eliminates double clone

## Test Result

- Initial checkouts succeeded
2026-02-04 21:43:22 -05:00
assistant-librarian[bot]
9c0d4114ae [CK] Add FP8 KV_BLOCKSCALE support for batch prefill (#4263)
Implement per-page K/V quantization for paged attention:
  - Add KV_BLOCKSCALE enum to BlockAttentionQuantScaleEnum
  - Use exp2 shift trick to eliminate explicit P scaling overhead
- Prefetch physical pages offset for KV cache, overlaps with
computations

## Proposed changes

Please describe the motivation behind the pull request, whether it
enables a new feature or fixes a bug. If there are associated pull
requests or issues, please link them to the pull request.

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [ ] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [ ] I have added inline documentation which enables the maintainers
with understanding the motivation
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [ ] I have run `clang-format` on all changed files
- [ ] Any dependent changes have been merged

## Discussion

If this is a relatively large or complex change, feel free to start a
discussion by explaining why you chose the solution you did and what
alternatives you considered



---
🔁 Imported from
[ROCm/composable_kernel#3696](https://github.com/ROCm/composable_kernel/pull/3696)
🧑‍💻 Originally authored by @Jeff-Huang

---------

Co-authored-by: Jeff Huang <chiachi.huang@amd.com>
Co-authored-by: Illia Silin <Illia.Silin@amd.com>
2026-02-04 18:25:31 -05:00
Illia Silin
170d49eb2c CK CI migration. (#4310)
## Motivation

Enable the CK CI after migration from standalone repo.

## Technical Details

Modify the jenkinsfile in projects/composablekernel to update the CI
workflow.

## Test Plan

This is for CK internal testing only.

## Test Result

Set up new CK CI pipeline/dashboard.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Andrew Clark <andrew.clark@amd.com>
2026-02-04 12:34:38 -05:00
andrew clark
8ecb21e120 Adding Additional Failure Patterns for Alerts (#3663)
* Added two new failure patterns to detect. Including test function to verify if the patterns are detected

* Modifying pattern match to detect docker login failure. Removed passing tests.

* Removing passing tests. Modifying docker pattern to detect failure

* Removed passing tests

* Removing test logging function

[ROCm/composable_kernel commit: 421b714f13]
2026-02-03 10:23:07 -08:00
Bartłomiej Kocot
39ea2de1d7 Fix path to ck tile conv fwd instance generator (#3699)
* Fix path to ck tile conv fwd instance generator

* fixes

[ROCm/composable_kernel commit: f2b9b3a3a6]
2026-02-02 18:07:33 -08:00
Bartłomiej Kocot
cb58ffa4b8 Enable Grouped Conv Tile Fwd Tests daily (#3680)
[ROCm/composable_kernel commit: 1ae83137eb]
2026-01-31 15:55:25 -07:00
Illia Silin
ab24c0ffe9 remove builds on legacy OSs from CI (#3693)
[ROCm/composable_kernel commit: 63df1c0af2]
2026-01-30 09:15:09 -08:00
Andrew Clark
daa6cae5f5 Finished testing failure types. Removed testing code.
[ROCm/composable_kernel commit: 8654c0628f]
2026-01-26 15:09:49 -07:00
Andrew Clark
858387034e Removed working tests. Validating remaining tests.
[ROCm/composable_kernel commit: 402f21d0a6]
2026-01-26 15:09:49 -07:00
Andrew Clark
b555c06b8d Removed working tests. Validating remaining tests.
[ROCm/composable_kernel commit: 1397924c21]
2026-01-26 15:09:49 -07:00
Andrew Clark
8136cf5c72 Testing a pattern to support all text variations
[ROCm/composable_kernel commit: 6c596b9553]
2026-01-26 15:09:49 -07:00
Andrew Clark
b87853431e Removing working cases to test other failure examples
[ROCm/composable_kernel commit: 58e1d03244]
2026-01-26 15:09:49 -07:00
Andrew Clark
8c91ce81bf Adding forcing failure to test notifications
[ROCm/composable_kernel commit: 95768d1b22]
2026-01-26 15:09:49 -07:00
Andrew Clark
18e95f26aa Fixing Jenkinsfile too large error
[ROCm/composable_kernel commit: 786965b95e]
2026-01-26 15:09:49 -07:00
Andrew Clark
e2f587ad01 Updating failure patterns to be more reliable and adding tests to verify they are caught in the logs
[ROCm/composable_kernel commit: 42a731b791]
2026-01-26 15:09:49 -07:00
andrew clark
5a27de45e5 Sanitizing URL-encoded characters from the image file name (#3622)
[ROCm/composable_kernel commit: 0fbb3bb8c4]
2026-01-21 11:00:53 -07:00
Yi DING
0bb1c90674 Add CMakePresets.json (#3284)
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

[ROCm/composable_kernel commit: f41f37da96]
2026-01-21 08:04:24 -08:00
Bartłomiej Kocot
85c5741492 [CK_BUILDER] Add grouped conv fwd ck tile profiler (#3518)
* [BULDER] Add grouped conv fwd ck tile profiler

* [CK TILE] Fix grouped conv kernels splitk and double lds

* Updates

* Fixes

* Move to ckProfiler

* Fixes

* fix

* fix

* Change instances to empty list by default

* fix

* fix

* Update grouped_convolution_signatures.hpp

* Update grouped_convolution_forward_tile_algs.hpp

* [CK TILE] Add grouped convolution forward tests (#3556)

* [CK TILE] Add grouped convolution forward tests

* fix jenkins

* fixes

* comments fixes

* unit test

* unit test fix

* Move instances outside builder

* fix includes

* clang format fix

* readme fix

* fix includes

* fixes

[ROCm/composable_kernel commit: 0727e85e52]
2026-01-19 22:29:01 -07:00
Max Podkorytov
8bd33c4a35 Optimize clang-format check in Jenkins CI (#3597)
This change improves the clang-format CI check to be faster and not
depend on git being available in the build environment.

Changes:
- Use `find` instead of `git ls-files` (no git dependency)
- Check all C++ files: *.h, *.hpp, *.cpp, *.h.in, *.hpp.in, *.cpp.in, *.cl
- Exclude build/ and include/rapidjson directories
- Use parallel processing with 8 cores (-P 8) for ~8x speedup
- Show only errors with unified diff format (-u)
- Clear error messages: "ERROR: <file> needs formatting"
- Preserve original logic: run clang-format only when RUN_CPPCHECK=false,
  or run both clang-format and cppcheck when RUN_CPPCHECK=true

Performance:
- Sequential processing: ~93 seconds for 5,899 files
- Parallel with 8 cores: ~12 seconds for 5,899 files
- Per-file processing time: ~15ms

This reduces CI time while maintaining code formatting standards.

[ROCm/composable_kernel commit: 98abfa4ade]
2026-01-19 12:23:06 -08:00
John Shumway
0b3ee64c89 Disable CK Builder for SLES15 in Jenkins CI (#3581)
1. Added `-DCK_EXPERIMENTAL_BUILDER=OFF` to the `setup_args` to explicitly disable the experimental builder

2. Added a detailed comment explaining why this is necessary:

   - SLES15 is a legacy platform with limited C++20 ecosystem support
   - While the ROCm compiler supports C++20, the older system libraries and standard library implementation on SLES15 does not reliably support all C++20 features required by the experimental CK Builder

[ROCm/composable_kernel commit: 2d233c838a]
2026-01-16 10:36:23 -08:00
Illia Silin
3827441343 add aiter test_batch_prefill and simplify jenkins file a bit (#3570)
[ROCm/composable_kernel commit: 8705fdcb0c]
2026-01-14 14:07:47 -08:00
Thrupti Raj Lakshmana Gowda
e231cfb3dc [CK TILE ENGINE] CI fix for Basic Tile Engine (#3554)
* memory op changes

* memory op changes

* Fixing TILE_ENGINE_BASIC in Tile Engine

* Removing gfx90a from Tile Engine Run

* [CK TILE ENGINE] increasing ci configs for BASIC case

* Setting RUN_TILE_ENGINE_BASIC_TESTS to ON by default

---------

Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com>

[ROCm/composable_kernel commit: 51027474af]
2026-01-13 16:20:30 -08:00