Commit Graph

3037 Commits

Author SHA1 Message Date
Johannes Graner
40cec769ce [rocm-libraries] ROCm/rocm-libraries#4266 (commit 1d8094d)
[CK Conv] Add bwd weight instance for large-k shape

## Proposed changes

This instance improves the shape used in `./bin/ckProfiler
grouped_conv_bwd_weight 1 2 0 2 0 1 2 1 32 2376 256 3 3 100 100 1 1 1 1
1 1 1 1 all` from 10.3 ms to 6.6 ms.
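
As a rough illustration, the two timings quoted above can be turned into a speedup factor and a relative runtime reduction. This is only arithmetic on the reported numbers; the variable names are made up for this sketch and nothing here is re-measured.

```python
# Timings reported in the description above (milliseconds).
before_ms = 10.3
after_ms = 6.6

# Speedup factor: how many times faster the new instance is.
speedup = before_ms / after_ms

# Relative reduction in runtime, as a percentage of the old time.
reduction_pct = (before_ms - after_ms) / before_ms * 100

print(f"speedup: {speedup:.2f}x, runtime reduction: {reduction_pct:.1f}%")
# prints "speedup: 1.56x, runtime reduction: 35.9%"
```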

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [ ] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [ ] I have added inline documentation which helps the maintainers
understand the motivation
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [ ] I have run `clang-format` on all changed files
- [ ] Any dependent changes have been merged
2026-02-10 16:58:04 +00:00
Erwin Terpstra
b41bfece83 [rocm-libraries] ROCm/rocm-libraries#4268 (commit d2fca53)
[CK_TILE]: PreshuffleB + PreshuffleBQuant for ABQuant pipeline (#4268)

## Proposed changes

Implement BQuantPreshuffle option for the ABQuant PreshuffleB pipeline.

## Checklist

Please put an `x` into the boxes that apply. You can also fill these out
after creating the PR. If you're not sure, please don't hesitate to ask.

- [X] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [X] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [X] I have added inline documentation which helps the maintainers
understand the motivation
- [X] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [X] I have run `clang-format` on all changed files
- [X] Any dependent changes have been merged
2026-02-10 13:59:03 +00:00
Yi DING
d5acfd8d52 [rocm-libraries] ROCm/rocm-libraries#4451 (commit 091bf0f)
[CK_TILE] Blockscale Gemm Fix Multi-Arch Compilation

## Motivation
This PR updates CK_TILE blockscale GEMM-quant kernels and launch helpers
to compile across multiple GPU architectures by introducing compile-time
availability gating and a new attribute tag mechanism for kernel
symbol/attribute specialization.

## Technical Details
- Add an architecture-guarded `kIsAvailable` flag to the gfx950 pipeline
and propagate availability handling into `QuantGemmKernel`.
- Extend `make_kernel`/`kentry` to accept an `Attr` tag enabling
per-kernel compile-time attributes (e.g., `no-packed-fp32-ops`) and
unique symbols.
- Update the blockscale GEMM quant example to pass kernel attributes and
adjust gfx950 gating.

## Test Plan
- CI
- Local test: `cmake .. --preset dev -DGPU_TARGETS='gfx942;gfx950'
-GNinja && ninja tile_example_gemm_quant`
- Local test with ROCm/aiter#1954

## Test Result

<!-- Briefly summarize test outcomes. -->

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-02-10 12:42:19 +00:00
dependabot[bot]
6a6cd05dbb [rocm-libraries] ROCm/rocm-libraries#3090 (commit 728d3a3)
Bump fonttools from 4.57.0 to 4.61.0 in /projects/composablekernel/docs/sphinx (#3090)

Bumps [fonttools](https://github.com/fonttools/fonttools) from 4.57.0 to
4.61.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/fonttools/fonttools/releases">fonttools's
releases</a>.</em></p>
<blockquote>
<h2>4.61.0</h2>
<ul>
<li>[varLib.main]: <strong>SECURITY</strong> Only use
basename(vf.filename) to prevent path traversal attacks when running
<code>fonttools varLib</code> command-line script, or code which invokes
<code>fonttools.varLib.main()</code>. Fixes CVE-2025-66034, see: <a
href="https://github.com/fonttools/fonttools/security/advisories/GHSA-768j-98cg-p3fv">https://github.com/fonttools/fonttools/security/advisories/GHSA-768j-98cg-p3fv</a>.</li>
<li>[feaLib] Sort BaseLangSysRecords by tag (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3986">#3986</a>).</li>
<li>Drop support for EOL Python 3.9 (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3982">#3982</a>).</li>
<li>[instancer] Support --remove-overlaps for fonts with CFF2 table (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3975">#3975</a>).</li>
<li>[CFF2ToCFF] Add --remove-overlaps option (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3976">#3976</a>).</li>
<li>[feaLib] Raise an error for rsub with NULL target (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3979">#3979</a>).</li>
<li>[bezierTools] Fix logic bug in curveCurveIntersections (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3963">#3963</a>).</li>
<li>[feaLib] Error when condition sets have the same name (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3958">#3958</a>).</li>
<li>[cu2qu.ufo] skip processing empty glyphs to support sparse kerning
masters (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3956">#3956</a>).</li>
<li>[unicodedata] Update to Unicode 17. Require <code>unicodedata2 &gt;=
17.0.0</code> when installed with 'unicode' extra.</li>
</ul>
<h2>4.60.1</h2>
<ul>
<li>[ufoLib] Reverted accidental method name change in
<code>UFOReader.getKerningGroupConversionRenameMaps</code>
that broke compatibility with downstream projects like defcon (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3948">#3948</a>,
<a
href="https://redirect.github.com/fonttools/fonttools/issues/3947">#3947</a>,
<a
href="https://redirect.github.com/robotools/defcon/issues/478">robotools/defcon#478</a>).</li>
<li>[ufoLib] Added test coverage for
<code>getKerningGroupConversionRenameMaps</code> method (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3950">#3950</a>).</li>
<li>[subset] Don't try to subset BASE table; pass it through by default
instead (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3949">#3949</a>).</li>
<li>[subset] Remove empty BaseRecord entries in MarkBasePos lookups (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3897">#3897</a>,
<a
href="https://redirect.github.com/fonttools/fonttools/issues/3892">#3892</a>).</li>
<li>[subset] Add pruning for MarkLigPos and MarkMarkPos lookups (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3946">#3946</a>).</li>
<li>[subset] Remove duplicate features when subsetting (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3945">#3945</a>).</li>
<li>[Docs] Added documentation for the visitor module (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3944">#3944</a>).</li>
</ul>
<h2>4.60.0</h2>
<ul>
<li>
<p>[pointPen] Allow <code>reverseFlipped</code> parameter of
<code>DecomposingPointPen</code> to take a <code>ReverseFlipped</code>
enum value to control whether/how to reverse contour direction of
flipped components, in addition to the existing True/False. This allows
setting <code>ReverseFlipped.ON_CURVE_FIRST</code> to ensure that the
decomposed outline starts with an on-curve point before being reversed,
for better consistency with other segment-oriented contour
transformations. The change is backward compatible, and the default
behavior hasn't changed (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3934">#3934</a>).</p>
</li>
<li>
<p>[filterPen] Added <code>ContourFilterPointPen</code>, base pen for
buffered contour operations, and <code>OnCurveStartPointPen</code>
filter to ensure contours start with an on-curve point (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3934">#3934</a>).</p>
</li>
<li>
<p>[cu2qu] Fixed difference in cython vs pure-python complex division by
real number (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3930">#3930</a>).</p>
</li>
<li>
<p>[varLib.avar] Refactored and added some new sub-modules and scripts
(<a
href="https://redirect.github.com/fonttools/fonttools/issues/3926">#3926</a>).</p>
<ul>
<li><code>varLib.avar.build</code> module to build avar (and a missing
fvar) binaries into a possibly empty TTFont,</li>
<li><code>varLib.avar.unbuild</code> module to print a .designspace
snippet that would generate the same avar binary,</li>
<li><code>varLib.avar.map</code> module to take TTFont and do the
mapping, in user/normalized space,</li>
<li><code>varLib.avar.plan</code> module moved from
<code>varLib.avarPlanner</code>.</li>
</ul>
<p>The bare <code>fonttools varLib.avar</code> script is deprecated, in
favour of <code>fonttools varLib.avar.build</code> (or
<code>unbuild</code>).</p>
</li>
<li>
<p>[interpolatable] Clarify <code>linear_sum_assignment</code> backend
options and minimal dependency usage (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3927">#3927</a>).</p>
</li>
<li>
<p>[post] Speed up <code>build_psNameMapping</code> (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3923">#3923</a>).</p>
</li>
<li>
<p>[ufoLib] Added typing annotations to fontTools.ufoLib (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3875">#3875</a>).</p>
</li>
</ul>
<h2>4.59.2</h2>
<ul>
<li>[varLib] Clear <code>USE_MY_METRICS</code> component flags when
inconsistent across masters (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3912">#3912</a>).</li>
<li>[varLib.instancer] Avoid negative advance width/height values when
instantiating HVAR/VVAR (unlikely in well-behaved fonts) (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3918">#3918</a>).</li>
<li>[subset] Fix shaping behaviour when pruning empty mark sets (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3915">#3915</a>,
<a
href="https://redirect.github.com/harfbuzz/harfbuzz/issues/5499">harfbuzz/harfbuzz#5499</a>).</li>
<li>[cu2qu] Fixed <code>dot()</code> product of perpendicular vectors
not always returning exactly 0.0 in all Python implementations (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3911">#3911</a>)</li>
<li>[varLib.instancer] Implemented fully-instantiating
<code>avar2</code> fonts (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3909">#3909</a>).</li>
<li>[feaLib] Allow float values in <code>VariableScalar</code>'s axis
locations (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3906">#3906</a>,
<a
href="https://redirect.github.com/fonttools/fonttools/issues/3907">#3907</a>).</li>
<li>[cu2qu] Handle special case in <code>calc_intersect</code> for
degenerate cubic curves where 3 to 4 control points are equal (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3904">#3904</a>).</li>
</ul>
<h2>4.59.1</h2>
<ul>
<li>[featureVars] Update OS/2.usMaxContext if possible after
addFeatureVariationsRaw (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3894">#3894</a>).</li>
<li>[vhmtx] raise TTLibError('not enough data...') when hmtx/vmtx are
truncated (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3843">#3843</a>,
<a
href="https://redirect.github.com/fonttools/fonttools/issues/3901">#3901</a>).</li>
<li>[feaLib] Combine duplicate features that have the same set of
lookups regardless of the order in which those lookups are added to the
feature (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3895">#3895</a>).</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/fonttools/fonttools/blob/main/NEWS.rst">fonttools's
changelog</a>.</em></p>
<blockquote>
<h2>4.61.0 (released 2025-11-28)</h2>
<ul>
<li>[varLib.main]: <strong>SECURITY</strong> Only use
basename(vf.filename) to prevent path traversal attacks when
running <code>fonttools varLib</code> command, or code which invokes
<code>fonttools.varLib.main()</code>.
Fixes CVE-2025-66034, see:
<a
href="https://github.com/fonttools/fonttools/security/advisories/GHSA-768j-98cg-p3fv">https://github.com/fonttools/fonttools/security/advisories/GHSA-768j-98cg-p3fv</a>.</li>
<li>[feaLib] Sort BaseLangSysRecords by tag (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3986">#3986</a>).</li>
<li>Drop support for EOL Python 3.9 (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3982">#3982</a>).</li>
<li>[instancer] Support --remove-overlaps for fonts with CFF2 table (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3975">#3975</a>).</li>
<li>[CFF2ToCFF] Add --remove-overlaps option (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3976">#3976</a>).</li>
<li>[feaLib] Raise an error for rsub with NULL target (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3979">#3979</a>).</li>
<li>[bezierTools] Fix logic bug in curveCurveIntersections (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3963">#3963</a>).</li>
<li>[feaLib] Error when condition sets have the same name (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3958">#3958</a>).</li>
<li>[cu2qu.ufo] skip processing empty glyphs to support sparse kerning
masters (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3956">#3956</a>).</li>
<li>[unicodedata] Update to Unicode 17. Require <code>unicodedata2 &gt;=
17.0.0</code> when installed with 'unicode' extra.</li>
</ul>
<h2>4.60.1 (released 2025-09-29)</h2>
<ul>
<li>[ufoLib] Reverted accidental method name change in
<code>UFOReader.getKerningGroupConversionRenameMaps</code>
that broke compatibility with downstream projects like defcon (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3948">#3948</a>,
<a
href="https://redirect.github.com/fonttools/fonttools/issues/3947">#3947</a>,
<a
href="https://redirect.github.com/robotools/defcon/issues/478">robotools/defcon#478</a>).</li>
<li>[ufoLib] Added test coverage for
<code>getKerningGroupConversionRenameMaps</code> method (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3950">#3950</a>).</li>
<li>[subset] Don't try to subset BASE table; pass it through by default
instead (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3949">#3949</a>).</li>
<li>[subset] Remove empty BaseRecord entries in MarkBasePos lookups (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3897">#3897</a>,
<a
href="https://redirect.github.com/fonttools/fonttools/issues/3892">#3892</a>).</li>
<li>[subset] Add pruning for MarkLigPos and MarkMarkPos lookups (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3946">#3946</a>).</li>
<li>[subset] Remove duplicate features when subsetting (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3945">#3945</a>).</li>
<li>[Docs] Added documentation for the visitor module (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3944">#3944</a>).</li>
</ul>
<h2>4.60.0 (released 2025-09-17)</h2>
<ul>
<li>[pointPen] Allow <code>reverseFlipped</code> parameter of
<code>DecomposingPointPen</code> to take a <code>ReverseFlipped</code>
enum value to control whether/how to reverse contour direction of
flipped components, in addition to
the existing True/False. This allows setting
<code>ReverseFlipped.ON_CURVE_FIRST</code> to ensure that
the decomposed outline starts with an on-curve point before being
reversed, for better consistency
with other segment-oriented contour transformations. The change is
backward compatible, and the
default behavior hasn't changed (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3934">#3934</a>).</li>
<li>[filterPen] Added <code>ContourFilterPointPen</code>, base pen for
buffered contour operations, and
<code>OnCurveStartPointPen</code> filter to ensure contours start with
an on-curve point (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3934">#3934</a>).</li>
<li>[cu2qu] Fixed difference in cython vs pure-python complex division
by real number (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3930">#3930</a>).</li>
<li>[varLib.avar] Refactored and added some new sub-modules and scripts
(<a
href="https://redirect.github.com/fonttools/fonttools/issues/3926">#3926</a>).
<ul>
<li><code>varLib.avar.build</code> module to build avar (and a missing
fvar) binaries into a possibly empty TTFont,</li>
<li><code>varLib.avar.unbuild</code> module to print a .designspace
snippet that would generate the same avar binary,</li>
<li><code>varLib.avar.map</code> module to take TTFont and do the
mapping, in user/normalized space,</li>
<li><code>varLib.avar.plan</code> module moved from
<code>varLib.avarPlanner</code>.
The bare <code>fonttools varLib.avar</code> script is deprecated, in
favour of <code>fonttools varLib.avar.build</code> (or
<code>unbuild</code>).</li>
</ul>
</li>
<li>[interpolatable] Clarify <code>linear_sum_assignment</code> backend
options and minimal dependency
usage (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3927">#3927</a>).</li>
<li>[post] Speed up <code>build_psNameMapping</code> (<a
href="https://redirect.github.com/fonttools/fonttools/issues/3923">#3923</a>).</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e691e3bef9"><code>e691e3b</code></a>
Release 4.61.0</li>
<li><a
href="c2d540f4ad"><code>c2d540f</code></a>
Update NEWS.rst</li>
<li><a
href="3859753a05"><code>3859753</code></a>
Update NEWS.rst</li>
<li><a
href="26eb070a55"><code>26eb070</code></a>
black</li>
<li><a
href="5ff73af326"><code>5ff73af</code></a>
Merge commit from fork</li>
<li><a
href="a696d5ba93"><code>a696d5b</code></a>
varLib: only use the basename(vf.filename)</li>
<li><a
href="b00bc459ef"><code>b00bc45</code></a>
varLib_test: test path traversal in variable-font filename</li>
<li><a
href="066512e4f3"><code>066512e</code></a>
Merge pull request <a
href="https://redirect.github.com/fonttools/fonttools/issues/3986">#3986</a>
from cmyr/base-minmax-sorting</li>
<li><a
href="ce78973e97"><code>ce78973</code></a>
[feaLib] Sort BasLangSysRecords by tag</li>
<li><a
href="5bb37dc201"><code>5bb37dc</code></a>
Merge pull request <a
href="https://redirect.github.com/fonttools/fonttools/issues/3983">#3983</a>
from fonttools/dependabot/pip/brotli-1.2.0</li>
<li>Additional commits viewable in <a
href="https://github.com/fonttools/fonttools/compare/4.57.0...4.61.0">compare
view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=fonttools&package-manager=pip&previous-version=4.57.0&new-version=4.61.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
2026-02-10 07:08:05 +00:00
Aviral Goel
06ad66b3e4 [rocm-libraries] ROCm/rocm-libraries#4265 (commit 0f9b3b0)
[CK Tools] Auto-enable unbuffered output for Python commands (#4265)

ck-docker exec and ck-exec now automatically detect Python commands and
set PYTHONUNBUFFERED=1 to enable live output streaming. This eliminates
the need to manually set the environment variable when running Python
scripts that print progress updates.

The detection matches python, python3, or any .py file argument.

This makes it possible to watch live terminal output while a Python
script runs inside the container.
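
The detection rule described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual ck-docker or ck-exec code: the function names `is_python_command` and `build_env` are hypothetical, and the real tools may match the command differently (for example with a pattern over the whole command string).

```python
import os
import shlex


def is_python_command(argv):
    """Return True if any argument is python, python3, or a .py file.

    Hypothetical helper: mirrors the rule stated in the commit message.
    """
    for arg in argv:
        name = os.path.basename(arg)  # so /usr/bin/python3 also matches
        if name in ("python", "python3") or name.endswith(".py"):
            return True
    return False


def build_env(command, base_env=None):
    """Return the environment to use for `command` (hypothetical helper).

    PYTHONUNBUFFERED=1 is added only when the command looks like Python,
    so output from other commands is left untouched.
    """
    env = dict(base_env or {})
    if is_python_command(shlex.split(command)):
        env["PYTHONUNBUFFERED"] = "1"
    return env
```

With this sketch, `build_env("python3 train.py")` contains `PYTHONUNBUFFERED`, while `build_env("ninja -j8")` does not.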
2026-02-10 03:00:40 +00:00
dependabot[bot]
b688665d79 [rocm-libraries] ROCm/rocm-libraries#475 (commit cabe79b)
Bump pillow from 11.2.1 to 11.3.0 in /projects/composablekernel/docs/sphinx (#475)

Bumps [pillow](https://github.com/python-pillow/Pillow) from 11.2.1 to
11.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/python-pillow/Pillow/releases">pillow's
releases</a>.</em></p>
<blockquote>
<h2>11.3.0</h2>
<p><a
href="https://pillow.readthedocs.io/en/stable/releasenotes/11.3.0.html">https://pillow.readthedocs.io/en/stable/releasenotes/11.3.0.html</a></p>
<h2>Deprecations</h2>
<ul>
<li>Deprecate fromarray mode argument <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9018">#9018</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Deprecate saving I mode images as PNG <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9023">#9023</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
</ul>
<h2>Documentation</h2>
<ul>
<li>Added release notes for <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9041">#9041</a>
<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9042">#9042</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Add release notes for <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8912">#8912</a>
and <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8969">#8969</a>
<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9019">#9019</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>ImageFont does not handle multiline text <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9000">#9000</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Updated Ubuntu CI targets <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8988">#8988</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update MinGW package names <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8987">#8987</a>
[<a href="https://github.com/H4M5TER"><code>@​H4M5TER</code></a>]</li>
<li>Updated docstring <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8943">#8943</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Mention that tobytes() with the raw encoder uses Pack.c <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8878">#8878</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Refactor docs <code>Makefile</code> <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8933">#8933</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Add template for quarterly release issue <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8932">#8932</a>
[<a
href="https://github.com/aclark4life"><code>@​aclark4life</code></a>]</li>
<li>Add list of third party plugins <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8910">#8910</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update redirected URL <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8919">#8919</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Docs: use sentence case for headers <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8914">#8914</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Docs: remove unused Makefile targets <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8917">#8917</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Remove indentation from lists <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8915">#8915</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Python 3.13 is tested on Arch <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8894">#8894</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Move XV Thumbnails to read only section <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8893">#8893</a>
[<a
href="https://github.com/aclark4life"><code>@​aclark4life</code></a>]</li>
<li>Updated macOS tested Pillow versions <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8890">#8890</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
</ul>
<h2>Dependencies</h2>
<ul>
<li>Add AVIF to wheels using only aomenc and dav1d AVIF codecs for
reduced size <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8858">#8858</a>
[<a href="https://github.com/fdintino"><code>@​fdintino</code></a>]</li>
<li>Use same AVIF URL when fetching dependency <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8871">#8871</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update dependency mypy to v1.16.1 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9026">#9026</a>
[@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li>
<li>Update libpng to 1.6.49 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9014">#9014</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update dependency cibuildwheel to v3 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9010">#9010</a>
[@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li>
<li>Updated libjpeg-turbo to 3.1.1 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9009">#9009</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update dependency mypy to v1.16.0 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8991">#8991</a>
[@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li>
<li>Updated libpng to 1.6.48 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8940">#8940</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Updated Ghostscript to 10.5.1 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8939">#8939</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Updated harfbuzz to 11.2.1 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8937">#8937</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Updated libavif to 1.3.0 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8949">#8949</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update dependency cibuildwheel to v2.23.3 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8931">#8931</a>
[@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li>
<li>Updated harfbuzz to 11.1.0 <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8904">#8904</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
</ul>
<h2>Testing</h2>
<ul>
<li>Add <code>match</code> parameter to <code>pytest.warns()</code> <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9038">#9038</a>
[<a href="https://github.com/hugovk"><code>@​hugovk</code></a>]</li>
<li>Increase pytest verbosity <a
href="https://redirect.github.com/python-pillow/Pillow/issues/9040">#9040</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Improve SgiImagePlugin test coverage <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8896">#8896</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
<li>Update ruff pre-commit ID <a
href="https://redirect.github.com/python-pillow/Pillow/issues/8994">#8994</a>
[<a
href="https://github.com/radarhere"><code>@​radarhere</code></a>]</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="89f1f4626a"><code>89f1f46</code></a>
11.3.0 version bump</li>
<li><a
href="f2de251c76"><code>f2de251</code></a>
Updated check script paths (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9052">#9052</a>)</li>
<li><a
href="84855d11c8"><code>84855d1</code></a>
Raise FileNotFoundError when opening an empty path (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9048">#9048</a>)</li>
<li><a
href="204d11d4da"><code>204d11d</code></a>
Raise FileNotFoundError when opening an empty path</li>
<li><a
href="2b39f7581e"><code>2b39f75</code></a>
Handle IPTC TIFF tags with incorrect type (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/8925">#8925</a>)</li>
<li><a
href="e7a53ba19b"><code>e7a53ba</code></a>
Do not update palette for L mode GIF frame (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/8924">#8924</a>)</li>
<li><a
href="c22230b761"><code>c22230b</code></a>
Use save parameters as encoderinfo defaults (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9001">#9001</a>)</li>
<li><a
href="da10ed1cf3"><code>da10ed1</code></a>
Add support for iOS (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/9030">#9030</a>)</li>
<li><a
href="be2b4e7864"><code>be2b4e7</code></a>
Fix qtables and quality scaling (<a
href="https://redirect.github.com/python-pillow/Pillow/issues/8879">#8879</a>)</li>
<li><a
href="d4162f8505"><code>d4162f8</code></a>
Updated return type</li>
<li>Additional commits viewable in <a
href="https://github.com/python-pillow/Pillow/compare/11.2.1...11.3.0">compare
view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pillow&package-manager=pip&previous-version=11.2.1&new-version=11.3.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
2026-02-10 02:50:35 +00:00
Bartłomiej Kocot
27e0a34e0f [rocm-libraries] ROCm/rocm-libraries#4406 (commit 61f9f90)
[CK] CK Tile grouped convolution direct load

## Motivation

Add direct load support to CK Tile grouped convolution forward.

## Technical Details

Adds a basic pipeline for direct load and new forward instances for the
v1 and v4 pipelines.

## Test Plan

test_grouped_convnd_fwd_tile

## Test Result

CI pending

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
AICK-130
2026-02-09 21:09:42 +00:00
Chinmay Dattanand Kuchinad
0cafa68b6f [rocm-libraries] ROCm/rocm-libraries#4292 (commit b7f1367)
Enable group mode (varlen) kernel generation for PyTorch integration (#4292)

## Proposed changes

This PR enables group mode (variable-length attention) kernel generation
for PyTorch's CK SDPA backend.

## Checklist

- [X] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [ ] I have added inline documentation which enables the maintainers
with understanding the motivation
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [X] I have run `clang-format` on all changed files
- [ ] Any dependent changes have been merged

## Discussion

The change is minimal (single line deletion) but enables a significant
feature: variable-length attention support for ROCm users via PyTorch's
torch.nn.attention.varlen API.
2026-02-09 20:59:55 +00:00
Bartłomiej Kocot
ea6363ad78 [rocm-libraries] ROCm/rocm-libraries#4399 (commit 331512e)
[CK] Fix grouped conv fwd transform for merged groups

## Motivation

[CK] Fix grouped conv fwd transform for merged groups for 1d and 3d.

## Technical Details

The 2d case was optimized earlier, but the matching implementation for 1d
and 3d was still missing.

## Test Plan

test_grouped_convnd_fwd

## Test Result

pending CI

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-02-09 15:37:36 +00:00
Eiden Yoshida
e16789b609 [rocm-libraries] ROCm/rocm-libraries#4373 (commit 1c29275)
[CK] MICI: Disable failure pattern checking

## Motivation

- ck mici jobs hanging at end, possibly at failure pattern checking

## Technical Details

- Disable failure pattern checking to see if hanging goes away

## Test Plan

- Observe behavior after merge
2026-02-09 15:25:01 +00:00
kensclin
5b3e527c88 [rocm-libraries] ROCm/rocm-libraries#4280 (commit b7de1e1)
[CK_TILE] Add blockscale GEMM support for EightWarps on
 gfx950 (#4280)

## Proposed changes

Add blockscale GEMM support for the EightWarps configuration on gfx950.

## Checklist

- [ ] I have added tests relevant to the introduced functionality, and
the unit tests are passing locally
- [ ] I have added the test to REGRESSION_TESTS list defined at the top
of CMakeLists.txt in tests/CMakeLists.txt, **IF** the test takes more
than 30 seconds to run.
- [ ] I have added inline documentation which enables the maintainers
with understanding the motivation
- [ ] I have removed the stale documentation which is no longer relevant
after this pull request
- [ ] (If this change is user-facing) I have added release notes which
provide the end users with a brief summary of the improvement from this
pull request
- [x] I have run `clang-format` on all changed files
- [x] Any dependent changes have been merged

2026-02-09 03:55:52 +00:00
jakpiase
731afe535a [rocm-libraries] ROCm/rocm-libraries#4357 (commit ff3e982)
[CK_TILE] Add support and tests for V6 pipeline in conv fwd
 (#4357)

Added support for the conv v6 pipeline in CK Tile's convolution forward
kernel. The CK Tile v6 pipeline is equivalent to old CK's V5 pipeline
and should be faster than the other pipelines in some cases. This PR also
adds tests to the profiler, which currently lives in the experimental
directory, so regressions should now be easier to detect.
2026-02-08 19:57:53 +00:00
Ville Pietilä
57d26db844 [rocm-libraries] ROCm/rocm-libraries#4273 (commit 591f504)
[CK] Add fwd conv group merging to v3 conv instances

## Proposed changes

Added conv group merging to the (universal) V3 fwd conv pipeline. The
new instance improves fwd conv performance when the number of
input/output channels per group is low.

On MI300 (`gfx942`) we get

| CK prof command | Baseline (TFLOPS) | V3 group merging (TFLOPS) |
|:-----|:------:|------:|
| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 4 4 3 3 200 200 1 1 1 1 1 1 1 1 | 3.86035 | 8.36796 |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 8 8 3 3 200 200 2 2 1 1 1 1 1 1 | 10.1867 | 13.4677 |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 8 8 3 3 100 100 1 2 1 1 1 1 1 1 | 11.7875 | 16.3657 |
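
The relative gain per row follows from dividing the two TFLOPS columns. A minimal sketch of that arithmetic, using only the numbers reported in the table (the `row N` labels are shorthand introduced here, not part of the profiler output):

```python
# (baseline TFLOPS, V3 group merging TFLOPS) per table row, in order.
results = {
    "row 1": (3.86035, 8.36796),
    "row 2": (10.1867, 13.4677),
    "row 3": (11.7875, 16.3657),
}


def speedup(baseline_tflops: float, merged_tflops: float) -> float:
    """Return how many times faster the group-merging instance is."""
    return merged_tflops / baseline_tflops


for label, (baseline, merged) in results.items():
    print(f"{label}: {speedup(baseline, merged):.2f}x")
```

This works out to roughly 2.2x, 1.3x and 1.4x for the three shapes.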
2026-02-08 11:35:56 +00:00
Emily Martins
4266f867d6 [rocm-libraries] ROCm/rocm-libraries#4381 (commit 5df3343)
[CK_TILE] Fix MMA concepts compiler error

## Motivation

CK Tile is required to support certain older OSs; on these OSs, C++20
is not fully supported. For ROCm 7.2, compiler errors occur on one of
these older OSs. An example of this error is as follows:

```bash
/composable_kernel/include/ck_tile/core/arch/mma/amdgcn_mma.hpp:34:28: error: expected concept name with optional arguments
   34 |     { MmaOp::kAMBlock } -> std::convertible_to<unsigned int>;
      |
```

The goal of this PR is to resolve these compiler errors.

## Technical Details

The existing guards around the mma concepts only check if the concepts
language feature is supported, as follows:

```cpp
#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
// ...
template <typename CtrlFlags>
concept CtrlFlagsGfx9I = requires(CtrlFlags ctrlFlags) {
    // Flag members for Gfx9 MFMA instructions
    { CtrlFlags::Cbsz } -> std::convertible_to<int>;
    { CtrlFlags::Abid } -> std::convertible_to<int>;
    { CtrlFlags::Blgp } -> std::convertible_to<int>;
};

#endif // defined(__cpp_concepts) && __cpp_concepts >= 201907L
```
That said, in cases where functionality from the `<concepts>` header is
used (e.g., `std::convertible_to`), this guard fails to check whether
the `<concepts>` header is available.

This change adds an additional check to the concepts that make use of
functionality from the `<concepts>` header to ensure the header is
available.

## Test Plan

I tested the changes on the relevant docker for gfx90a, gfx950, and
gfx942 and the compiler issue is not present.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-02-07 00:28:06 +00:00
Aviral Goel
4237aedf9a [rocm-libraries] ROCm/rocm-libraries#4335 (commit 06976b3)
Increase tolerance for FP16 GEMM tests to handle non-deterministic rounding (#4335)

Three tests were failing intermittently with small errors (0.01-1.5%)
due to non-deterministic FP16 accumulation order from GPU thread
scheduling:
- test_ck_tile_batched_gemm
- test_ck_tile_grouped_gemm_preshuffle
- test_ck_tile_grouped_gemm_multi_d

These tests use kbatch=1 (no split-K), so errors are from
order-dependent rounding, not atomics. Increased tolerances from 1e-3 to
2e-3 (0.2%) to account for FP16 precision limits while still catching
real bugs.
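
To illustrate why accumulation order alone can move an FP16 result, here is a small CPU-side sketch. It is not the GPU kernel or the test code; it only emulates rounding to half precision after every addition, using Python's `struct` format `"e"`. The helper names and the input values are chosen here for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def fp16_sum(values):
    """Accumulate left to right, rounding to FP16 after every addition."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return acc


# Same three numbers, two accumulation orders.
big_first = fp16_sum([2048.0, 1.0, 1.0])    # 2048 + 1 rounds back to 2048, twice
small_first = fp16_sum([1.0, 1.0, 2048.0])  # 1 + 1 = 2, then 2 + 2048 = 2050

rel_diff = abs(big_first - small_first) / small_first
print(big_first, small_first, f"{rel_diff:.4%}")
```

The two orders differ by about 0.1% here, which is the kind of gap a 1e-3 tolerance can reject and a 2e-3 tolerance accepts.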

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2026-02-07 00:15:34 +00:00
spolifroni-amd
d2f1541976 [rocm-libraries] ROCm/rocm-libraries#4300 (commit 07e9d56)
[CK] add inter/intrawave scheduling concept doc

## Proposed changes

Adding information about inter/intrawave scheduling
2026-02-07 00:11:11 +00:00
Enrico Degregori
984a3d1828 [rocm-libraries] ROCm/rocm-libraries#4372 (commit 738ffd7)
[CK] Workaround blockscale wp test failure

## Motivation

Workaround to fix blockscale wp test failure for pipeline v3

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-02-07 00:09:58 +00:00
Illia Silin
1ddb38f098 [rocm-libraries] ROCm/rocm-libraries#4375 (commit 45b616b)
[CK] fix path for build filter

## Motivation

Fix the filter that determines whether CI builds are necessary.

## Technical Details

A script takes the file list returned by git diff and checks whether
any source code was modified. If only documentation was changed, it
allows the builds to be skipped. We make sure we only look at
changes in the projects/composablekernel/ folder.
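
The filter logic described above can be sketched roughly as follows. This is not the actual CI script: the function name and the set of suffixes treated as documentation are assumptions made for illustration; only the folder prefix comes from the description.

```python
from typing import Iterable

# Folder named in the description above.
PROJECT_PREFIX = "projects/composablekernel/"
# Assumption for this sketch: these suffixes count as documentation.
DOC_SUFFIXES = (".md", ".rst")


def needs_build(changed_files: Iterable[str]) -> bool:
    """Return True if any non-documentation file under the project changed."""
    for path in changed_files:
        if not path.startswith(PROJECT_PREFIX):
            continue  # changes outside the project folder are ignored
        if not path.endswith(DOC_SUFFIXES):
            return True  # a source file changed, so the build must run
    return False


print(needs_build(["projects/composablekernel/README.md"]))
print(needs_build(["projects/composablekernel/include/ck/ck.hpp"]))
```

In practice the input list would come from something like `git diff --name-only` against the target branch.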
2026-02-06 18:18:14 +00:00
Geo Min
41353c8f3c [rocm-libraries] ROCm/rocm-libraries#4378 (commit d8e2826)
[ci] Adding mi350 required group ID

After updating mi325 group-id, we are noticing errors for mi350.

Tested here for mi350:
https://github.com/ROCm/TheRock/actions/runs/21733399385/job/62692971370
Tested here for mi325:
https://github.com/ROCm/TheRock/actions/runs/21759203211/job/62778060417

With both group IDs added, both runs work properly.
2026-02-06 18:00:27 +00:00
Illia Silin
4dd4869fbf [rocm-libraries] ROCm/rocm-libraries#4361 (commit 37a74ef)
[CK]  a bunch of CI fixes.

## Motivation

Fixing some of the CK CI issues

## Technical Details

- fixing paths to dockerfiles and scripts
- moving codegen tests to a separate stage (they collide with the main
build, since cmake must be called from the same folder but with
different options)
- fixing a couple of clang compilation issues with the staging compiler
2026-02-06 01:07:34 +00:00
Eiden Yoshida
e96beb1f3e [rocm-libraries] ROCm/rocm-libraries#4352 (commit 3c9beb3)
[CK] MICI: Fix git diff in selective_test_filter.py

## Motivation

- git diff needs access to reference repo

## Technical Details

- mount reference repo path into docker for selective_test_filter.py to
access

## Test Plan

- tested in MICI

## Test Result

- launch_tests.sh ran successfully
2026-02-05 22:57:20 +00:00
Geo Min
58549aa787 [rocm-libraries] ROCm/rocm-libraries#4360 (commit 5aa1f1d)
[ci] Updating variable group-id for OSSCI

OSSCI migrated the mi325s, so a new group ID is needed.

Sanity works here:
https://github.com/ROCm/TheRock/actions/runs/21723540679/job/62659665907
normal run works here:
https://github.com/ROCm/TheRock/actions/runs/21723540679/job/62659791422

I've dabbled with organization variables; however, they do not work
for forks, so for now we will do the manual update.
2026-02-05 19:02:46 +00:00
Jobbins
344d98781b [rocm-libraries] ROCm/rocm-libraries#4351 (commit 3b98c98)
[composablekernel] fix failure status

## Motivation

Pipelines were failing on Math CI status check.

## Technical Details

For the success case, we just changed the config in Jenkins to use a
proper app token and no code changes were required. However, the failure
case would not have worked as coded, so we needed to move that outside
of the `rocmnode()` block.

## Test Plan

I removed all of the CI in one of the commits to quickly test, and then
added it back. Both the "success" message and the "failure" message were
produced as expected.
2026-02-05 15:57:21 +00:00
Eiden Yoshida
3a02862241 [rocm-libraries] ROCm/rocm-libraries#4349 (commit 9bb7f5c)
[CK] MICI: Correct path for build trace script

## Motivation

- Corrects path to script due to superrepo migration
- Forces all tests to run by default

## Technical Details

- now in /projects/composablekernel
2026-02-05 15:56:52 +00:00
Eiden Yoshida
3f42f76b45 [rocm-libraries] ROCm/rocm-libraries#4336 (commit d26a782)
[CK] MICI: Use reference repo for checkout operations

## Motivation

- Maintain a reference repo on slave nodes that speeds up any
clone/checkout operations

## Technical Details

- clone a ref repo if it does not exist
- update ref repo if it does exist
- checkout after ref repo is updated
- eliminates double clone
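
The steps above can be sketched as a small planner that returns the git commands to run. This is only an illustration: the function name, the directory arguments and the choice of `--mirror` for the reference repo are assumptions, and the real MICI scripts may do this differently.

```python
from typing import List


def checkout_plan(url: str, ref_dir: str, work_dir: str,
                  ref_exists: bool) -> List[List[str]]:
    """Return the git commands for a checkout backed by a reference repo."""
    plan = []
    if ref_exists:
        # Reference repo already on the node: just refresh it.
        plan.append(["git", "-C", ref_dir, "fetch", "--prune"])
    else:
        # First run on this node: create the reference repo once.
        plan.append(["git", "clone", "--mirror", url, ref_dir])
    # Clone the working copy, borrowing objects from the reference repo.
    plan.append(["git", "clone", "--reference", ref_dir, url, work_dir])
    return plan


for cmd in checkout_plan("https://example.com/repo.git", "/ref/repo.git",
                         "/work/repo", ref_exists=False):
    print(" ".join(cmd))
```

Because the working clone borrows objects from the local reference repo, only objects missing from it need to be transferred over the network.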

## Test Result

- Initial checkouts succeeded
2026-02-05 02:44:29 +00:00
Jeff Huang
7b18f5fed2 [rocm-libraries] ROCm/rocm-libraries#4263 (commit f34aec2)
[CK] Add FP8 KV_BLOCKSCALE support for batch prefill

Implement per-page K/V quantization for paged attention:
- Add KV_BLOCKSCALE enum to BlockAttentionQuantScaleEnum
- Use exp2 shift trick to eliminate explicit P scaling overhead
- Prefetch physical page offsets for the KV cache, overlapping with
computations

2026-02-04 23:26:20 +00:00
Illia Silin
62fbda4d1e [rocm-libraries] ROCm/rocm-libraries#4310 (commit 7f63aa1)
CK CI migration.

## Motivation

Enable the CK CI after migration from standalone repo.

## Technical Details

Modify the jenkinsfile in projects/composablekernel to update the CI
workflow.

## Test Plan

This is for CK internal testing only.

## Test Result

Set up new CK CI pipeline/dashboard.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-02-04 17:35:17 +00:00
andrew clark
421b714f13 Adding Additional Failure Patterns for Alerts (#3663)
* Added two new failure patterns to detect. Including test function to verify if the patterns are detected

* Modifying pattern match to detect docker login failure. Removed passing tests.

* Removing passing tests. Modifying docker pattern to detect failure

* Removed passing tests

* Removing test logging function
2026-02-03 10:23:07 -08:00
Illia Silin
569640dc70 Revert "Implement device grouped gemm fixed nk multi abd for rdna4 (#3619)" (#3705)
This reverts commit 301eb5cf08.
2026-02-03 09:52:14 -08:00
Emily Martins
8cbd09c84a [CK_TILE] Stream-K Tile Engine Test Config File Generation (#3662)
* Stream-K smoke test config file generation

This change converts the stream-k smoke tests to use tile engine. Since
the m, n, and k values depend on the CU count of a device, the
configs are generated during the Configuration Phase.

* Compute GEMM reference on GPU

* Remove redundant Stream-K tests

Removing redundant tests that are now run via tile engine.

* Fix relative and absolute tolerance calculation

This change updates the Stream-K tile engine interface to ensure that
num_wgs_per_tile is propagated and passed into the compare_results
function to calculate the rel and abs tolerance. Before, split-k was
used, which is incorrect for Stream-K since the split-k value is
always 1.

* Cleanup imports, types, and other misc items

This commit makes the following changes:
- Uses Typing module for nested type hints
- Uses quotes around cu_count_arg argument in generate_configs.cmake in
  if statements
- Adds explicit include for tuple in test_gemm_streamk_simple.cpp
- Adds a type for the tiles argument in argparser to check argument
  validity

* Use CU count as return value for better parsing

* Add reduction tests for bf16, fp8, and bf8
2026-02-03 09:12:15 -07:00
Max Podkorytov
3f04d27b68 Remove concrete performance numbers from BUILD_TIME_OPTIMIZATION.md (#3702)
Replace specific benchmark numbers with qualitative descriptions since
measurements vary across environments and may become outdated.

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-03 03:54:18 -07:00
Illia Silin
8b56ffb6ae Fix one more lifetimebound error. (#3703)
* fix staging compiler errors

* fix clang format
2026-02-02 18:25:56 -08:00
Bartłomiej Kocot
f2b9b3a3a6 Fix path to ck tile conv fwd instance generator (#3699)
* Fix path to ck tile conv fwd instance generator

* fixes
2026-02-02 18:07:33 -08:00
Aviral Goel
3e77721755 feat: add split_k support for block scale gemm bquant mode. (#3653)
* WIP: add splitk to bquant

* feat: add support for bf8i4 and fp8i4 by calculating correct stride for packed data types

* chore: remove temporary test script

* fix: incorrect tile window length for split bq tensor window

* chore: improve comments

* test: add unit tests to cover bquant splitk functionality

* fix: conflict resolution by renaming variables
2026-02-02 14:41:53 -08:00
Zoltán Lakatos
301eb5cf08 Implement device grouped gemm fixed nk multi abd for rdna4 (#3619)
* device struct implementation

* added xdl grouped multi abd fixed nk testing

* wmma implementation fixed

* avoid unnecessary device mem allocation and code cleanups

* cleanup instances definitions

* wmma examples added

* code cleanups

* fix clang format

* typo and compilation fixes related to reference gemm

* fix compilation error due to std::remove_cvref_t

* added missing hip_check_error includes

* correction to example instances

* review comments addressed

* removed split-k from testing

* code formatting

---------

Co-authored-by: Zoltán Lakatos <zoltan.lakatos@streamhpc.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2026-02-02 13:58:11 -08:00
Jan Patrick Lehr
069500464d [Compiler] Addressing new compiler warnings (#3640)
* [Compiler] Addressing new compiler warnings

Clang enables new lifetime warnings in production and we see build
errors due to this with the staging compiler.

The attributes added in this PR are suggested by the compiler. However,
I'm not very familiar with the code base, so the changes may be
incorrect.

* Update some more instances

* Adds file-level ignores via clang diagnostic pragma

The number of instances was large, so I decided to use file-level scope
to disable the warning via pragma clang diagnostic ignored.

It also showed this warning coming from the gtest dependency. For that,
I did add the respective command line flag to the CMake variables. I
don't know if this is acceptable or not.

* This adds the remaining instances

For a build on gfx90a.

* fix clang format

* Adding couple more instances from gfx1200 build

* Fixed another few instances

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2026-02-02 09:39:48 -08:00
ZheWang
e6bcd192d4 Mx fp6 flatmm (#3601)
* add fp6 data-type and support sync/async dwordx3 load/store

* clang-format

* pre-commit

* 1st commit

* default mnk pass ut

* fix a distribution

* fix

* fix bdram distr

* update

* pass ut

* improve perf

* update

* clean code

* resolve copilot comment

* resolve comment

* clang-format

---------

Co-authored-by: ZheWang <zhewan@amd.com>
2026-02-02 16:04:40 +08:00
Bartłomiej Kocot
1ae83137eb Enable Grouped Conv Tile Fwd Tests daily (#3680) 2026-01-31 15:55:25 -07:00
Po Yen Chen
8c1788757a [CK_TILE] Fix incompatible vector type arguments for the intrinsic calls (#3672)
* Change call to the intrinsics

* fix clang format

* Undo changes under include/ck/utility

* Use named variable as vector size

---------

Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>
2026-01-30 12:02:49 -08:00
ApoorvaKalyani
70d71b1514 Test fix for gemm_b_scale_xdl_v3. (#3674) 2026-01-30 10:34:54 -07:00
Illia Silin
63df1c0af2 remove builds on legacy OSs from CI (#3693) 2026-01-30 09:15:09 -08:00
jiangyon.ren
4d2f8c111e [CK_TILE][FMHA] Add sparse attention VSA (#3341)
* add sparse attention VSA

* fix the pre-commit

* Add jenga test and pre-commit

* add bf16 for vsa

* add jenga support bf16

* remove lse arg

* split kernel code to block & kernel

* fix the pre-commit

* fix the pre-commit

* fix the copyrights

* fix the copyright

* fix the copyright & rename block to pipeline

* fix the copyright and pipeline

* remove lse & dropout & add fmt

* fix the jenga&VSA code review

* remove the useless code & resolved the comments

* remove useless code

* remove useless code

* Clean up code

* Remove more unused code

* Re-format .hpp

* Refactor codegen scripts

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: asleepzzz <hanwen.chang@amd.com>
2026-01-31 00:59:47 +08:00
Kiefer van Teutem
2377a62837 Adding remaining conv, dynamic_op, and scaleadd_scaleadd_relu flavors for grouped conv fwd (#3529)
* Adding remaining flavors for grouped conv fwd

As titled. Following variants are added:
- grouped_conv2d_fwd_dynamic_op
- grouped_conv3d_fwd_dynamic_op
- grouped_conv3d_fwd_bilinear
- grouped_conv3d_fwd_convscale
- grouped_conv3d_fwd_convinvscale
- grouped_conv3d_fwd_convscale_add
- grouped_conv3d_fwd_convscale_relu
- grouped_conv3d_fwd_scale
- grouped_conv3d_fwd_combconvscale
- grouped_conv3d_fwd_scaleadd_scaleadd_relu

* Fix incomplete parsing of types from source names in add_instance_library() cmakelists function so we don't build f8 on RDNA3.

* Do not build f8 / bf8 only flavor tests on RDNA3

* Make sure we have proper generic instances for all instance lists related to the post-ces extra flavors, with scalarPerVector = 1. Then disable all but one generic instance per instance list to reduce compile time.

* Post rebase fix: Template parameters for Grouped Conv Fwd Device Impl got tweaked upstream.

* adding int8 and fp16 overloads to the elementwise operations

* fixed copilot nits

* Addressing review comments:

- removed unnecessary examples for dynamic op
- removed unnecessary conv specializations for all the flavors
- removed spurious bilinear and scale source files

* clang-format

* reduced no of tests

---------

Co-authored-by: Wojciech Laskowski <wojciech.laskowski@streamhpc.com>
2026-01-30 17:02:14 +01:00
Erwin Terpstra
6a6177a246 [CK_Tile] Support for a4w4 (fp4) in block scale gemm AB quant (#3603)
* chore: split block scale example instances in more separate files to speed up compile times

* wip: fp4 scaffolding for abquant

* feat: add fp4 decoding-while-loading to abquant pipeline

* feat: add support for fp4 CPU verification in abquant

* chore: add time tracking to reference calculation

* feat: add a4w4 test for blockscale gemm

* feat: optimize reference calculation by preconverting values to AccType

* feat: add fp4 to fp8 look-up table

* fix: reference to wrong ComputeDataType field in QuantProblem

* feat: type utilities for determining MFMA compute types

* feat: packed fp4 for abquant weight preshuffle

* feat: add separate tests for a4w4 base case, padding and preshuffleB

* fix: fp4 conversion on gfx950 attempting to use non-supported method

* fix: test case was using quant group sizes which don't work on gfx950 due to larger mfma tile size

* chore: add fp4 preshuffleb mode to block scale example

* chore: sanity check for packed types being 1 byte

* chore: clarify tensor dimension indices with constants

* chore: replace traits check with specialized check for packed types

* style: some minor refactoring and cleanup

* fix: correct conversion table for FNUZ fp8

* chore: add fp4 instances to main abquant instances again

* chore: use same initialization branch for int4 and fp4

* chore: add missing initialization for fp4 in block scale gemm example
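
As an illustration of the table-lookup decoding and packed storage mentioned in the list above, here is a minimal sketch. It assumes the FP4 values use the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) and that the low nibble of each byte holds the first element; both are assumptions for this example, and it decodes to plain Python floats instead of fp8.

```python
# Magnitudes of the eight non-negative E2M1 code points, indexed by the
# low three bits of the code.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_fp4(code: int) -> float:
    """Decode one 4-bit E2M1 code (0..15) via table lookup."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def unpack_fp4_byte(byte: int):
    """Split one byte into two FP4 values (low nibble first, by assumption)."""
    return decode_fp4(byte & 0xF), decode_fp4((byte >> 4) & 0xF)


print(unpack_fp4_byte(0x72))
```

Two values share one byte, which is why a packed type must be exactly 1 byte and strides have to be computed in packed elements.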

---------

Co-authored-by: Thomas Ning <Thomas.Ning@amd.com>
2026-01-30 04:40:50 -07:00
Zoltán Lakatos
565fea2645 fix undefined behaviour in softmax kernel (#3683)
Co-authored-by: root <zoltan.lakatos@streamhpc.com>
2026-01-30 15:22:54 +08:00
vivienfanghuagood
f3d8b7210f Extend CK fmha_batch_prefill kernel coverage to head_dim=256 (#3328)
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2026-01-30 11:18:20 +08:00
MHYangAMD
6ff0737843 Fix redundant cast in model sensitive rmsnorm (#3681)
* Fix redundant cast

* Fix linting
2026-01-30 10:52:19 +08:00
Max Podkorytov
83b6155354 Add ck-rocprof: GPU profiling tool for rocprof-compute (#3627)
* Decouple configure/build/test tools from Docker

Create a two-layer tool architecture:
- Core tools (ck-configure, ck-build, ck-test): Environment-agnostic,
  work on any system with ROCm - no Docker dependency
- Container tools (ck-docker): Manage Docker containers and delegate
  to core tools via docker exec

Changes:
- Add ck-configure: New CMake configuration tool with preset support,
  native GPU detection, and flexible options
- Refactor ck-build: Remove Docker dependency, add --configure and
  --list options, call ninja directly
- Refactor ck-test: Remove Docker dependency, add CTest integration
  with --smoke/--regression/--all options
- Enhance common.sh: Add native GPU detection, build directory utils,
  and output helpers
- Update ck-docker: Add configure/build/test/exec commands that
  delegate to core tools inside container

This enables:
- Native development on ROCm hosts without Docker
- Simpler CI/CD integration
- Consistent behavior inside and outside containers

Co-Authored-By: Claude <noreply@anthropic.com>

* Add ck-rocprof: GPU profiling tool for rocprof-compute

Adds a command-line profiling tool to simplify GPU performance
analysis workflow using AMD rocprof-compute.

Features:
- Easy setup with automatic Python venv configuration
- Simple CLI: setup, run, analyze, compare, list
- Automatic GPU architecture detection
- Focus on LDS metrics (Block 12) for bank conflict analysis
- Comprehensive documentation with examples and troubleshooting

Usage:
  ck-rocprof setup                    # One-time environment setup
  ck-rocprof run <name> <executable>  # Profile executable
  ck-rocprof analyze <name> [block]   # Analyze metrics
  ck-rocprof compare <name1> <name2>  # Compare two runs
  ck-rocprof list                     # List available runs

* Make ck-rocprof documentation concise and improve Docker integration

- Streamlined documentation from 416 to 157 lines (62% reduction)
- Focused on essential commands, metrics, and workflows
- Enhanced script to run all operations inside Docker containers
- Fixed workload directory path and improved container management
- Added automatic rocprofiler-compute installation and dependency handling

* Add --no-roof flag to ck-rocprof profile command

Skip roofline analysis by default to speed up profiling. Roofline
analysis can add significant time to profiling runs but is not
needed for most LDS bank conflict analysis workflows.

* Make ck-rocprof work independently of Docker

Add native execution mode that runs rocprof-compute directly on the host
system when available, falling back to Docker mode when not.

Key changes:
- Auto-detect native mode when rocprof-compute is in PATH or common locations
- Add execution mode wrappers (exec_cmd, file_exists, dir_exists, etc.)
- Native mode stores venv at .ck-rocprof-venv in project root
- Native mode stores workloads at build/workloads/
- Support user-installed rocprofiler-compute (e.g., ~/.local/rocprofiler-compute)
- Add CK_FORCE_DOCKER env var to force Docker mode
- Update help message to show current execution mode
- Maintain full backward compatibility with existing Docker workflow
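
The mode selection described above can be sketched like this. The real tool is a shell script; this Python version only mirrors the decision order. The function name is hypothetical, and treating any non-empty `CK_FORCE_DOCKER` value as "force Docker" is an assumption.

```python
import os
import shutil
from typing import Iterable, Mapping, Optional


def find_rocprof(candidates: Iterable[str],
                 env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a rocprof-compute path for native mode, or None for Docker mode."""
    if env.get("CK_FORCE_DOCKER"):
        return None  # the user explicitly asked for Docker mode
    # First choice: rocprof-compute found on PATH.
    on_path = shutil.which("rocprof-compute", path=env.get("PATH"))
    if on_path:
        return on_path
    # Otherwise try well-known install locations, e.g. under ~/.local.
    for candidate in candidates:
        expanded = os.path.expanduser(candidate)
        if os.path.isfile(expanded) and os.access(expanded, os.X_OK):
            return expanded
    return None  # nothing found: fall back to Docker mode
```

A caller would pass the list of common locations and switch to the Docker workflow whenever the result is `None`.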

Tested successfully with rocprofiler-compute 3.4.0 installed from source
on MI300X GPU in native mode.

Co-Authored-By: Claude <noreply@anthropic.com>

* Add clean/status commands and improve ck-rocprof robustness

- Add 'clean' command to remove profiling runs (supports --all)
- Add 'status' command to show configuration and environment info
- Add workload name validation to prevent path traversal attacks
- Fix uv installation to use pip instead of curl for reliability
- Add cross-platform stat support for macOS compatibility
- Consolidate ROCPROF_CANDIDATES to avoid code duplication
- Expand help documentation with all profiling block descriptions
- Fix Docker wrapper script escaping issues
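
A workload name check of the kind listed above might look like the following. The allowed character set and the function name are assumptions for this sketch, not the rules the script actually enforces.

```python
import re

# Assumed policy: letters, digits, dot, underscore and hyphen only.
_VALID_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_workload_name(name: str) -> bool:
    """Reject names that could escape the workloads directory."""
    if name in (".", ".."):
        return False  # these would resolve to the directory itself or its parent
    # fullmatch rejects empty names and anything containing a path separator.
    return _VALID_NAME.fullmatch(name) is not None


print(is_valid_workload_name("baseline_run-1"))
print(is_valid_workload_name("../../etc/passwd"))
```

Validating the name before it is joined onto the workloads path keeps commands such as `clean` from touching anything outside that directory.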

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix analyze command to use correct workload path

rocprof-compute stores results directly in the workload directory
(pmc_perf.csv) rather than in a GPU architecture subdirectory.
Updated find_workload_path to detect this correctly.

Co-Authored-By: Claude <noreply@anthropic.com>

* Address PR review security and robustness issues

Security fixes:
- Escape executable path in cmd_run to prevent shell injection
- Add workload name validation to cmd_analyze and cmd_compare
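
For the shell-injection fix, the general idea is to quote every user-supplied part before it reaches a shell. A minimal sketch using Python's `shlex.quote` (the script itself is shell; `build_run_command` is a hypothetical helper for illustration):

```python
import shlex
from typing import List


def build_run_command(executable: str, args: List[str]) -> str:
    """Build a shell command line with every user-supplied part quoted."""
    return " ".join(shlex.quote(part) for part in [executable, *args])


# A path containing shell metacharacters stays a single, inert argument.
print(build_run_command("./bin/my test; rm -rf /", ["--size", "128"]))
```

Without the quoting, the `;` in the path would end the intended command and start a second one.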

Robustness improvements:
- Add error checking for uv package manager installation
- Use consistent project root detection (find_project_root || get_project_root)
- Use /opt/rocm instead of hardcoded /opt/rocm-7.0.1 in Docker mode
- Derive ROCM_REQUIREMENTS path from ROCPROF_BIN for flexibility
- Use gfx950 as fallback GPU consistent with common.sh

Documentation updates:
- Fix env var name GPU_TARGET -> CK_GPU_TARGET
- Update storage layout to reflect current structure (workloads/<name>/)
- Document clean and status commands
- Clarify native vs Docker default paths

Co-Authored-By: Claude <noreply@anthropic.com>

* Simplify ck-rocprof to native-only mode

Remove Docker mode from ck-rocprof. Docker users should run the tool
via `ck-docker exec ck-rocprof ...` instead.

This simplification:
- Removes ~210 lines of Docker-specific code
- Eliminates mode detection complexity
- Makes the script easier to maintain
- Provides clearer error messages when rocprof-compute is not found

The setup command now lists all searched locations when rocprof-compute
is not found, helping users understand how to install it.

Co-Authored-By: Claude <noreply@anthropic.com>

* Add rocprofiler-compute source installation fallback

When rocprof-compute is not found in system locations, automatically
install rocprofiler-compute 3.4.0 from source as a fallback. This
eliminates the hard dependency on system ROCm packages.

Implementation details:
- Clone rocprofiler-compute from GitHub to ~/.local/
- Install dependencies via requirements.txt (not editable install)
- Create wrapper that sets PYTHONPATH to source directory
- Execute source script directly rather than importing as module

This approach matches the project's development workflow and works
around the incomplete pyproject.toml that prevents editable installs.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-01-29 17:20:22 -08:00
Illia Silin
05ef93a69d Add a flag to build CK libs required for HipTensor. (#3684)
* create a filter to build only libs required by hiptensor

* allow building libs for miopen and hiptensor at the same time

* tweak the lib filtering logic one more time
2026-01-29 16:12:49 -08:00
Enrico Degregori
f16d9100e4 Multi AB support for wave transfer (#3578)
* Add multi AB support to wave transfer

* Improvements to multi ABD examples

* Add instances and use intrawave v1 instead of interwave

* Apply changes to other transfers

* Wave transfer: add support for multiple internal vgpr buffers

* Fix compilation error gfx11
2026-01-29 10:29:40 -08:00