Commit Graph

3863 Commits

Author SHA1 Message Date
dependabot[bot]
9ecc13eecd Bump pillow from 11.2.1 to 11.3.0 in /projects/composablekernel/docs/sphinx (#475)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 11.2.1 to
11.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/python-pillow/Pillow/releases">pillow's
releases</a>.</em></p>
<blockquote>
<h2>11.3.0</h2>
<p><a
href="https://pillow.readthedocs.io/en/stable/releasenotes/11.3.0.html">https://pillow.readthedocs.io/en/stable/releasenotes/11.3.0.html</a></p>
<h2>Deprecations</h2>
<ul>
<li>Deprecate fromarray mode argument <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9018">#9018</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Deprecate saving I mode images as PNG <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9023">#9023</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
</ul>
<h2>Documentation</h2>
<ul>
<li>Added release notes for <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9041">#9041</a>
<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9042">#9042</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Add release notes for <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8912">#8912</a>
and <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8969">#8969</a>
<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9019">#9019</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>ImageFont does not handle multiline text <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9000">#9000</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Updated Ubuntu CI targets <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8988">#8988</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update MinGW package names <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8987">#8987</a>
[<a href="https://github.com/H4M5TER"><code>@​H4M5TER</code></a>]</li>
<li>Updated docstring <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8943">#8943</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Mention that tobytes() with the raw encoder uses Pack.c <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8878">#8878</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Refactor docs <code>Makefile</code> <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8933">#8933</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Add template for quarterly release issue <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8932">#8932</a>
[<a
href="https://github.com/aclark4life"><code>@​aclark4life</code></a>]</li>
<li>Add list of third party plugins <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8910">#8910</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update redirected URL <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8919">#8919</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Docs: use sentence case for headers <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8914">#8914</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Docs: remove unused Makefile targets <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8917">#8917</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Remove indentation from lists <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8915">#8915</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Python 3.13 is tested on Arch <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8894">#8894</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Move XV Thumbnails to read only section <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8893">#8893</a>
[<a
href="https://github.com/aclark4life"><code>@​aclark4life</code></a>]</li>
<li>Updated macOS tested Pillow versions <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8890">#8890</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
</ul>
<h2>Dependencies</h2>
<ul>
<li>Add AVIF to wheels using only aomenc and dav1d AVIF codecs for
reduced size <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8858">#8858</a>
[<a href="https://github.com/fdintino"><code>@​fdintino</code></a>]</li>
<li>Use same AVIF URL when fetching dependency <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8871">#8871</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update dependency mypy to v1.16.1 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9026">#9026</a>
[@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li>
<li>Update libpng to 1.6.49 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9014">#9014</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update dependency cibuildwheel to v3 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9010">#9010</a>
[@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li>
<li>Updated libjpeg-turbo to 3.1.1 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9009">#9009</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update dependency mypy to v1.16.0 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8991">#8991</a>
[@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li>
<li>Updated libpng to 1.6.48 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8940">#8940</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Updated Ghostscript to 10.5.1 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8939">#8939</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Updated harfbuzz to 11.2.1 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8937">#8937</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Updated libavif to 1.3.0 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8949">#8949</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update dependency cibuildwheel to v2.23.3 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8931">#8931</a>
[@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li>
<li>Updated harfbuzz to 11.1.0 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8904">#8904</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
</ul>
<h2>Testing</h2>
<ul>
<li>Add <code>match</code> parameter to <code>pytest.warns()</code> <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9038">#9038</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Increase pytest verbosity <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9040">#9040</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Improve SgiImagePlugin test coverage <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8896">#8896</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update ruff pre-commit ID <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8994">#8994</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="89f1f4626a"><code>89f1f46</code></a>
11.3.0 version bump</li>
<li><a
href="f2de251c76"><code>f2de251</code></a>
Updated check script paths (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9052">#9052</a>)</li>
<li><a
href="84855d11c8"><code>84855d1</code></a>
Raise FileNotFoundError when opening an empty path (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9048">#9048</a>)</li>
<li><a
href="204d11d4da"><code>204d11d</code></a>
Raise FileNotFoundError when opening an empty path</li>
<li><a
href="2b39f7581e"><code>2b39f75</code></a>
Handle IPTC TIFF tags with incorrect type (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/8925">#8925</a>)</li>
<li><a
href="e7a53ba19b"><code>e7a53ba</code></a>
Do not update palette for L mode GIF frame (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/8924">#8924</a>)</li>
<li><a
href="c22230b761"><code>c22230b</code></a>
Use save parameters as encoderinfo defaults (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9001">#9001</a>)</li>
<li><a
href="da10ed1cf3"><code>da10ed1</code></a>
Add support for iOS (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9030">#9030</a>)</li>
<li><a
href="be2b4e7864"><code>be2b4e7</code></a>
Fix qtables and quality scaling (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/8879">#8879</a>)</li>
<li><a
href="d4162f8505"><code>d4162f8</code></a>
Updated return type</li>
<li>Additional commits viewable in <a
href="https://github.com/python-pillow/Pillow/compare/11.2.1...11.3.0">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2026-02-09 19:49:39 -07:00
Bartłomiej Kocot
0b9fa702ac [CK] CK Tile grouped convolution direct load (#4406)
## Motivation

CK Tile grouped convolution forward direct load support.

## Technical Details

Adds a basic direct-load pipeline and new forward instances for the v1
and v4 pipelines.

## Test Plan

test_grouped_convnd_fwd_tile

## Test Result

CI pending

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
AICK-130
2026-02-09 22:08:57 +01:00
assistant-librarian[bot]
bb8c746cbc Enable group mode (varlen) kernel generation for PyTorch integration (#4292)
## Proposed changes

This PR enables group mode (variable-length attention) kernel generation
for PyTorch's CK SDPA backend.

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [X] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [ ] I have added inline documentation which enables the maintainers
with understanding the motivation
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [X] I have run `clang-format` on all changed files
- [ ] Any dependent changes have been merged

## Discussion

The change is minimal (single line deletion) but enables a significant
feature: variable-length attention support for ROCm users via PyTorch's
torch.nn.attention.varlen API.



---
🔁 Imported from
[ROCm/composable_kernel#3553](https://github.com/ROCm/composable_kernel/pull/3553)
🧑‍💻 Originally authored by @chinmaydk99

Co-authored-by: Chinmay_Kuchinad <ChinmayDattanand.Kuchinad@amd.com>
2026-02-09 20:58:57 +00:00
Bartłomiej Kocot
72016e355e [CK] Fix grouped conv fwd transform for merged groups (#4399)
## Motivation

[CK] Fix grouped conv fwd transform for merged groups for 1d and 3d.

## Technical Details

After the 2D optimizations, the 1D and 3D cases were left without an
implementation.

## Test Plan

test_grouped_convnd_fwd

## Test Result

pending CI

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-02-09 09:36:52 -06:00
Eiden Yoshida
02e6550609 [CK] MICI: Disable failure pattern checking (#4373)
## Motivation

- CK MICI jobs hang at the end of the run, possibly during failure pattern checking


## Technical Details

- Disable failure pattern checking to see if hanging goes away

## Test Plan

- Observe behavior after merge
2026-02-09 07:23:47 -08:00
assistant-librarian[bot]
6c58796a52 [CK_TILE] Add blockscale GEMM support for EightWarps on gfx950 (#4280)
## Proposed changes

Add blockscale GEMM support for the EightWarps configuration.

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [ ] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [ ] I have added inline documentation which enables the maintainers
with understanding the motivation
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [x] I have run `clang-format` on all changed files
- [x] Any dependent changes have been merged



---
🔁 Imported from
[ROCm/composable_kernel#3650](https://github.com/ROCm/composable_kernel/pull/3650)
🧑‍💻 Originally authored by @kensclin

---------

Co-authored-by: KenSCLin <lshyhchy@amd.com>
Co-authored-by: Ding, Yi <yi.ding@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2026-02-09 11:54:54 +08:00
jakpiase
71cc990ffd [CK_TILE] Add support and tests for V6 pipeline in conv fwd (#4357)
Added support for the conv v6 pipeline in CK Tile's convolution forward
kernel. The CK Tile v6 pipeline is equivalent to old CK's V5 pipeline
and should be faster than the other pipelines in some cases. This PR also
adds tests to the profiler, which currently lives in the experimental
directory, so regressions should now be easier to detect.

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: subhajitdchow <sduttach@amd.com>
2026-02-08 20:57:14 +01:00
assistant-librarian[bot]
f38cd21b9e [CK] Add fwd conv group merging to v3 conv instances (#4273)
## Proposed changes

Added conv group merging to the (universal) V3 fwd conv pipeline. The
new instance improves fwd conv performance when the number of
input/output channels per group is low.

On MI300 (`gfx942`) we get

| CK prof command | Baseline (TFLOPS) | V3 group merging (TFLOPS) |
|:-----|:------:|------:|
| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 4 4 3 3 200 200 1 1 1 1 1 1 1 1 | 3.86035 | 8.36796 |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 8 8 3 3 200 200 2 2 1 1 1 1 1 1 | 10.1867 | 13.4677 |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 8 8 3 3 100 100 1 2 1 1 1 1 1 1 | 11.7875 | 16.3657 |



---
🔁 Imported from
[ROCm/composable_kernel#3675](https://github.com/ROCm/composable_kernel/pull/3675)
🧑‍💻 Originally authored by @vpietila-amd

---------

Co-authored-by: Ville Pietilä <>
Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
2026-02-08 12:34:59 +01:00
Emily Martins
5c31eeeddb [CK_TILE] Fix MMA concepts compiler error (#4381)
## Motivation

CK Tile is required to support certain older OSs, and on these OSs C++20
is not fully supported. For ROCm 7.2, compiler errors occur on one of
these older OSs. An example of this error is as follows:

```bash
/composable_kernel/include/ck_tile/core/arch/mma/amdgcn_mma.hpp:34:28: error: expected concept name with optional arguments
   34 |     { MmaOp::kAMBlock } -> std::convertible_to<unsigned int>;
      |           
```

The goal of this PR is to resolve these compiler errors.

## Technical Details

The existing guards around the mma concepts only check if the concepts
language feature is supported, as follows:

```cpp
#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
// ...
template <typename CtrlFlags>
concept CtrlFlagsGfx9I = requires(CtrlFlags ctrlFlags) {
    // Flag members for Gfx9 MFMA instructions
    { CtrlFlags::Cbsz } -> std::convertible_to<int>;
    { CtrlFlags::Abid } -> std::convertible_to<int>;
    { CtrlFlags::Blgp } -> std::convertible_to<int>;
};

#endif // defined(__cpp_concepts) && __cpp_concepts >= 201907L
```
That said, in cases where functionality from the `<concepts>` header is
used (e.g., `std::convertible_to`), this guard fails to check whether
the `<concepts>` header is available.

This change adds an additional check to the concepts that make use of
functionality from the `<concepts>` header to ensure the header is
available.
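
A minimal sketch of what such a stricter guard could look like, assuming the library feature-test macro `__cpp_lib_concepts` is used to detect a usable `<concepts>` header. The macro `EXAMPLE_HAS_STD_CONCEPTS`, the concept `HasCbsz`, and the `ExampleFlags` type are hypothetical names for illustration and are not taken from the PR; the actual check in `amdgcn_mma.hpp` may differ.

```cpp
#include <type_traits>

// <version> carries the library feature-test macros when it exists.
#ifdef __has_include
#if __has_include(<version>)
#include <version>
#endif
#endif

// Require both the language feature and the library support before
// touching anything from <concepts>.
#if defined(__cpp_concepts) && __cpp_concepts >= 201907L && \
    defined(__cpp_lib_concepts) && __cpp_lib_concepts >= 201907L
#include <concepts>
#define EXAMPLE_HAS_STD_CONCEPTS 1 // hypothetical macro name
#else
#define EXAMPLE_HAS_STD_CONCEPTS 0
#endif

#if EXAMPLE_HAS_STD_CONCEPTS
// Hypothetical concept, modeled on the guarded code shown above.
template <typename CtrlFlags>
concept HasCbsz = requires {
    { CtrlFlags::Cbsz } -> std::convertible_to<int>;
};
#endif

// Hypothetical flags type used only to exercise the check.
struct ExampleFlags
{
    static constexpr int Cbsz = 0;
};

constexpr bool example_flags_ok()
{
#if EXAMPLE_HAS_STD_CONCEPTS
    return HasCbsz<ExampleFlags>;
#else
    // Fallback for toolchains without <concepts>: a plain type trait.
    return std::is_convertible_v<decltype(ExampleFlags::Cbsz), int>;
#endif
}

static_assert(example_flags_ok(), "ExampleFlags::Cbsz must convert to int");
```

With this shape the translation unit compiles on both kinds of toolchain: the concept is used where the library provides `std::convertible_to`, and a type-trait check stands in for it elsewhere.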

## Test Plan

I tested the changes on the relevant docker for gfx90a, gfx950, and
gfx942 and the compiler issue is not present.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-02-06 16:26:57 -08:00
Aviral Goel
92fbf5a880 Increase tolerance for FP16 GEMM tests to handle non-deterministic ro… (#4335)
…unding

Three tests were failing intermittently with small errors (0.01-1.5%)
due to non-deterministic FP16 accumulation order from GPU thread
scheduling:
- test_ck_tile_batched_gemm
- test_ck_tile_grouped_gemm_preshuffle
- test_ck_tile_grouped_gemm_multi_d

These tests use kbatch=1 (no split-K), so errors are from
order-dependent rounding, not atomics. Increased tolerances from 1e-3 to
2e-3 (0.2%) to account for FP16 precision limits while still catching
real bugs.
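
To illustrate the effect of the wider threshold, here is a small sketch assuming a purely relative comparison. The function name `within_relative_tolerance` and the sample values are hypothetical; the real test harness may combine relative and absolute thresholds.

```cpp
#include <cmath>

// Returns true when |actual - expected| <= rtol * |expected|.
// Assumption: a purely relative check, not the CK test code itself.
bool within_relative_tolerance(double actual, double expected, double rtol)
{
    return std::fabs(actual - expected) <= rtol * std::fabs(expected);
}
```

Under this assumption, a result of 1.0015 against a reference of 1.0 (a 0.15% deviation) is rejected at `rtol = 1e-3` but accepted at `rtol = 2e-3`.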


- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2026-02-06 16:14:28 -08:00
assistant-librarian[bot]
a62115aad1 [CK] add inter/intrawave scheduling concept doc (#4300)
## Proposed changes

Adding information about inter/intrawave scheduling

---
🔁 Imported from
[ROCm/composable_kernel#3660](https://github.com/ROCm/composable_kernel/pull/3660)
🧑‍💻 Originally authored by @spolifroni-amd

---------

Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: assistant-librarian[bot] <assistant-librarian[bot]@users.noreply.github.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2026-02-06 16:10:23 -08:00
Enrico Degregori
442c3097ee [CK] Workaround blockscale wp test failure (#4372)
## Motivation

Workaround to fix blockscale wp test failure for pipeline v3

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-02-06 16:09:08 -08:00
Illia Silin
8cd3f55a72 [CK] fix path for build filter (#4375)
## Motivation

Fix the filter that determines whether CI builds are necessary.

## Technical Details

A script inspects the file list returned by git diff and checks whether
any source code was modified. If only documentation changed, the builds
can be skipped. The filter now considers only changes under the
projects/composablekernel/ folder.
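
The filtering logic described above can be sketched as follows. This is only an illustration of the decision rule, not the CI script itself (which works on `git diff` output); the function names and the definition of "documentation" (anything under `docs/`, or files ending in `.md` or `.rst`) are assumptions.

```cpp
#include <string>
#include <string_view>
#include <vector>

namespace {

constexpr std::string_view kProjectPrefix = "projects/composablekernel/";

bool starts_with(std::string_view s, std::string_view prefix)
{
    return s.substr(0, prefix.size()) == prefix;
}

bool ends_with(std::string_view s, std::string_view suffix)
{
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

// Assumed definition of a documentation-only file.
bool is_documentation(std::string_view path)
{
    return starts_with(path, "projects/composablekernel/docs/") ||
           ends_with(path, ".md") || ends_with(path, ".rst");
}

} // namespace

// Returns true when at least one changed file inside the project folder
// is not documentation, i.e. when the CI build cannot be skipped.
bool build_required(const std::vector<std::string>& changed_files)
{
    for (const std::string& path : changed_files)
    {
        if (!starts_with(path, kProjectPrefix))
            continue; // changes outside the project folder are ignored
        if (!is_documentation(path))
            return true;
    }
    return false;
}
```

Here `changed_files` stands for the paths reported by the diff; a change that touches only other projects or only documentation yields `false`.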
2026-02-06 13:17:02 -05:00
Geo Min
8d236a8ff7 [ci] Adding mi350 required group ID (#4378)
After updating mi325 group-id, we are noticing errors for mi350.

Tested here for mi350:
https://github.com/ROCm/TheRock/actions/runs/21733399385/job/62692971370
Tested here for mi325:
https://github.com/ROCm/TheRock/actions/runs/21759203211/job/62778060417

With both group IDs added, the jobs work properly.
2026-02-06 09:59:29 -08:00
Illia Silin
4dc5f52f57 [CK] a bunch of CI fixes. (#4361)
## Motivation

Fixing some of the CK CI issues

## Technical Details

- Fix paths to Dockerfiles and scripts.
- Move codegen tests to a separate stage (they collide with the main
build, since CMake must be called from the same folder with different
options).
- Fix a couple of Clang compilation issues with the staging compiler.
2026-02-05 20:06:57 -05:00
Eiden Yoshida
41fd407963 [CK] MICI: Fix git diff in selective_test_filter.py (#4352)
## Motivation

- git diff needs access to reference repo

## Technical Details

- mount reference repo path into docker for selective_test_filter.py to
access

## Test Plan

- tested in MICI

## Test Result

- launch_tests.sh ran successfully
2026-02-05 17:56:12 -05:00
Geo Min
01302d22b5 [ci] Updating variable group-id for OSSCI (#4360)
OSSCI migrated the mi325 machines, so a new group ID is needed.

Sanity works here:
https://github.com/ROCm/TheRock/actions/runs/21723540679/job/62659665907
normal run works here:
https://github.com/ROCm/TheRock/actions/runs/21723540679/job/62659791422

I experimented with organization variables, but they do not work for
forks, so for now we will update the value manually.
2026-02-05 11:01:53 -08:00
Jobbins
ec787e6fa2 [composablekernel] fix failure status (#4351)
## Motivation

Pipelines were failing on Math CI status check.

## Technical Details

For the success case, we just changed the config in Jenkins to use a
proper app token and no code changes were required. However, the failure
case would not have worked as coded, so we needed to move that outside
of the `rocmnode()` block.

## Test Plan

I removed all of the CI in one of the commits to test quickly, and then
added it back. Both the "success" and the "failure" status messages were
produced correctly.
2026-02-05 08:56:42 -07:00
Eiden Yoshida
9e00e291dc [CK] MICI: Correct path for build trace script (#4349)
## Motivation

- Corrects path to script due to superrepo migration
- Forces all tests to run by default

## Technical Details

- now in /projects/composablekernel

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2026-02-05 10:55:44 -05:00
Eiden Yoshida
606d2aaf31 [CK] MICI: Use reference repo for checkout operations (#4336)
## Motivation

- Maintain a reference repo on slave nodes that speeds up any
clone/checkout operations

## Technical Details

- clone a ref repo if it does not exist
- update ref repo if it does exist
- checkout after ref repo is updated
- eliminates double clone

## Test Result

- Initial checkouts succeeded
2026-02-04 21:43:22 -05:00
assistant-librarian[bot]
4231c8d673 [CK] Add FP8 KV_BLOCKSCALE support for batch prefill (#4263)
Implement per-page K/V quantization for paged attention:
  - Add KV_BLOCKSCALE enum to BlockAttentionQuantScaleEnum
  - Use exp2 shift trick to eliminate explicit P scaling overhead
  - Prefetch physical page offsets for the KV cache so the lookup overlaps
    with computation
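
One plausible reading of the "exp2 shift trick" in the list above is the identity `scale * 2^x == 2^(x + log2(scale))`, which lets a positive scale factor be folded into the exponent instead of being applied as a separate multiply. The sketch below only demonstrates that identity; the function names are hypothetical and this is an assumption about the technique, not the kernel code.

```cpp
#include <cmath>

// Explicit form: exponentiate, then multiply by the scale.
float scaled_explicit(float x, float scale)
{
    return scale * std::exp2(x);
}

// Shifted form: fold log2(scale) into the exponent. Requires scale > 0.
float scaled_shifted(float x, float scale)
{
    return std::exp2(x + std::log2(scale));
}
```

If the scale is constant over a block, `log2(scale)` can be computed once and added to the exponent, so the per-element multiply disappears.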

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [ ] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [ ] I have added inline documentation which enables the maintainers
with understanding the motivation
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [ ] I have run `clang-format` on all changed files
- [ ] Any dependent changes have been merged



---
🔁 Imported from
[ROCm/composable_kernel#3696](https://github.com/ROCm/composable_kernel/pull/3696)
🧑‍💻 Originally authored by @Jeff-Huang

---------

Co-authored-by: Jeff Huang <chiachi.huang@amd.com>
Co-authored-by: Illia Silin <Illia.Silin@amd.com>
2026-02-04 18:25:31 -05:00
Illia Silin
2df84787b6 CK CI migration. (#4310)
## Motivation

Enable the CK CI after migration from standalone repo.

## Technical Details

Modify the jenkinsfile in projects/composablekernel to update the CI
workflow.

## Test Plan

This is for CK internal testing only.

## Test Result

Set up new CK CI pipeline/dashboard.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Andrew Clark <andrew.clark@amd.com>
2026-02-04 12:34:38 -05:00
Ameya Keshava Mallya
ab1efa0334 Merge remote-tracking branch 'origin/develop' into preserved/composablekernel 2026-02-03 22:50:14 +00:00
assistant-librarian[bot]
db2688820e Merge commit '421b714f139fda3361eb4d83a3a87fd8cc1cf169' into develop 2026-02-03 18:31:31 +00:00
andrew clark
dc0dc337a6 Adding Additional Failure Patterns for Alerts (#3663)
* Added two new failure patterns to detect. Including test function to verify if the patterns are detected

* Modifying pattern match to detect docker login failure. Removed passing tests.

* Removing passing tests. Modifying docker pattern to detect failure

* Removed passing tests

* Removing test logging function

[ROCm/composable_kernel commit: 421b714f13]
2026-02-03 10:23:07 -08:00
Illia Silin
ea1f04464b Revert "Implement device grouped gemm fixed nk multi abd for rdna4 (#3619)" (#3705)
This reverts commit 1a8bd3d34b.

[ROCm/composable_kernel commit: 569640dc70]
2026-02-03 09:52:14 -08:00
assistant-librarian[bot]
fc1ff7a1f8 Merge commit '8cbd09c84a3010b4b3dbe2604875772363e2396b' into develop 2026-02-03 16:29:00 +00:00
Emily Martins
1bc181c33f [CK_TILE] Stream-K Tile Engine Test Config File Generation (#3662)
* Stream-K smoke test config file generation

This change converts the stream-k smoke tests to use tile engine. Since
the m, n, and k values depend on the CU count of a device, the
configs are generated during the Configuration Phase.

* Compute GEMM reference on GPU

* Remove redundant Stream-K tests

Removing redundant tests that are now run via tile engine.

* Fix relative and absolute tolerance calculation

This change updates the Stream-K tile engine interface to ensure that
num_wgs_per_tile is propagated and passed into the compare_results
function to calculate the rel and abs tolerance. Before, split-k was
used, which is incorrect for Stream-K since the split-k value is
always 1.

* Cleanup imports, types, and other misc items

This commit makes the following changes:
- Uses Typing module for nested type hints
- Uses quotes around cu_count_arg argument in generate_configs.cmake in
  if statements
- Adds explicit include for tuple in test_gemm_streamk_simple.cpp
- Adds a type for the tiles argument in argparser to check argument
  validity

* Use CU count as return value for better parsing

* Add reduction tests for bf16, fp8, and bf8

[ROCm/composable_kernel commit: 8cbd09c84a]
2026-02-03 09:12:15 -07:00
assistant-librarian[bot]
ef6ce49698 Merge commit '3f04d27b687365332d2f1654f169444cab192927' into develop 2026-02-03 11:22:52 +00:00
Max Podkorytov
dcb0e63334 Remove concrete performance numbers from BUILD_TIME_OPTIMIZATION.md (#3702)
Replace specific benchmark numbers with qualitative descriptions since
measurements vary across environments and may become outdated.

Co-authored-by: Claude <noreply@anthropic.com>

[ROCm/composable_kernel commit: 3f04d27b68]
2026-02-03 03:54:18 -07:00
assistant-librarian[bot]
f27120c60e Merge commit '8b56ffb6aea4dd5e3c531912ee6b2258398606ee' into develop 2026-02-03 03:12:05 +00:00
Illia Silin
8d79fb88eb Fix one more lifetimebound error. (#3703)
* fix staging compiler errors

* fix clang format

[ROCm/composable_kernel commit: 8b56ffb6ae]
2026-02-02 18:25:56 -08:00
Bartłomiej Kocot
117abb6af4 Fix path to ck tile conv fwd instance generator (#3699)
* Fix path to ck tile conv fwd instance generator

* fixes

[ROCm/composable_kernel commit: f2b9b3a3a6]
2026-02-02 18:07:33 -08:00
assistant-librarian[bot]
9c38bf0527 Merge commit '3e777217551c82a47eb9540791fb5542f2704e63' into develop 2026-02-02 23:16:03 +00:00
Aviral Goel
4ecc7da10e feat: add split_k support for block scale gemm bquant mode. (#3653)
* WIP: add splitk to bquant

* feat: add support for bf8i4 and fp8i4 by calculating correct stride for packed data types

* chore: remove temporary test script

* fix: incorrect tile window length for split bq tensor window

* chore: improve comments

* test: add unit tests to cover bquant splitk functionality

* fix: conflict resolution by renaming variables

[ROCm/composable_kernel commit: 3e77721755]
2026-02-02 14:41:53 -08:00
Zoltán Lakatos
1a8bd3d34b Implement device grouped gemm fixed nk multi abd for rdna4 (#3619)
* device struct implementation

* added xdl grouped multi abd fixed nk testing

* wmma implementation fixed

* avoid unnecessary device mem allocation and code cleanups

* cleanup instances definitions

* wmma examples added

* code cleanups

* fix clang format

* typo and compilation fixes related to reference gemm

* fix compilation error due to std::remove_cvref_t

* added missing hip_check_error includes

* correction to example instances

* review comments addressed

* removed split-k from testing

* code formatting

---------

Co-authored-by: Zoltán Lakatos <zoltan.lakatos@streamhpc.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 301eb5cf08]
2026-02-02 13:58:11 -08:00
assistant-librarian[bot]
b0b7f95d6e Merge commit '069500464de6a55b80e8341c79239b13ac8ef379' into develop 2026-02-02 18:23:47 +00:00
Jan Patrick Lehr
4dece9c549 [Compiler] Addressing new compiler warnings (#3640)
* [Compiler] Addressing new compiler warnings

Clang enables new lifetime warnings in production and we see build
errors due to this with the staging compiler.

The attributes added in this PR are suggested by the compiler. However,
I'm not very familiar with the code base, so the changes may be
incorrect.

* Update some more instances

* Adds file-level ignores via clang diagnostic pragma

The number of instances was large, so I decided to use file-level scope
to disable the warning via pragma clang diagnostic ignored.

It also showed this warning coming from the gtest dependency. For that,
I did add the respective command line flag to the CMake variables. I
don't know if this is acceptable or not.

* This adds the remaining instances

For a build on gfx90a.

* fix clang format

* Adding couple more instances from gfx1200 build

* Fixed another few instances

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 069500464d]
2026-02-02 09:39:48 -08:00
assistant-librarian[bot]
6b3e501d83 Merge commit 'e6bcd192d432561642d45ea5b1c759d6f80ace2a' into develop 2026-02-02 08:25:21 +00:00
ZheWang
418ee44844 Mx fp6 flatmm (#3601)
* add fp6 data-type and support sync/async dwordx3 load/store

* clang-format

* pre-commit

* 1st commit

* default mnk pass ut

* fix a distribution

* fix

* fix bdram distr

* update

* pass ut

* improve perf

* update

* clean code

* resolve copilot comment

* resolve comment

* clang-format

---------

Co-authored-by: ZheWang <zhewan@amd.com>

[ROCm/composable_kernel commit: e6bcd192d4]
2026-02-02 16:04:40 +08:00
assistant-librarian[bot]
2d624e5a9f Merge commit '1ae83137eb444bba1ba8b064eb77c2e486d90d7d' into develop 2026-01-31 23:13:17 +00:00
Bartłomiej Kocot
c8d112deb5 Enable Grouped Conv Tile Fwd Tests daily (#3680)
[ROCm/composable_kernel commit: 1ae83137eb]
2026-01-31 15:55:25 -07:00
assistant-librarian[bot]
19d77f522e Merge commit '8c1788757a88ee03bc8dbeb69704832c99fa719c' into develop 2026-01-30 20:16:06 +00:00
Po Yen Chen
4947f0306c [CK_TILE] Fix incompatible vector type arguments for the intrinsic calls (#3672)
* Change call to the intrinsics

* fix clang format

* Undo changes under include/ck/utility

* Use named variable as vector size

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>

[ROCm/composable_kernel commit: 8c1788757a]
2026-01-30 12:02:49 -08:00
assistant-librarian[bot]
8bcc4bcacf Merge commit '70d71b1514cc650ef7808d8757097f2d8617d313' into develop 2026-01-30 18:22:07 +00:00
ApoorvaKalyani
629573e3e3 Test fix for gemm_b_scale_xdl_v3. (#3674)
[ROCm/composable_kernel commit: 70d71b1514]
2026-01-30 10:34:54 -07:00
assistant-librarian[bot]
1559a473a8 Merge commit '63df1c0af2b559a6129afb5392fc560d99980926' into develop 2026-01-30 17:22:22 +00:00
Illia Silin
7fbe9af19d remove builds on legacy OSs from CI (#3693)
[ROCm/composable_kernel commit: 63df1c0af2]
2026-01-30 09:15:09 -08:00
jiangyon.ren
f6d2ca82b7 [CK_TILE][FMHA] Add sparse attention VSA (#3341)
* add sparse attention VSA

* fix the pre-commit

* Add jenga test and pre-commit

* add bf16 for vsa

* add jenga support bf16

* remove lse arg

* split kernel code to block & kernel

* fix the pre-commit

* fix the pre-commit

* fix the copyrights

* fix the copyright

* fix the copyright & rename block to pipeline

* fix the copyright and pipeline

* remove lse & dropout & add fmt

* fix the jenga&VSA code review

* remove the useless code & resolved the comments

* remove useless code

* remove useless code

* Clean up code

* Remove more unused code

* Re-format .hpp

* Refactor codegen scripts

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>

[ROCm/composable_kernel commit: 4d2f8c111e]
2026-01-31 00:59:47 +08:00
assistant-librarian[bot]
6d1282b943 Merge commit '2377a628373f2c4dd8b92ae9f853b1fb14c55953' into develop 2026-01-30 16:20:17 +00:00