composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-04-19 22:39:03 +00:00

Author	SHA1	Message	Date
Christopher Millette	e1e2f7ac2e	[rocm-libraries] ROCm/rocm-libraries#4447 (commit 6d08a99) [CK] Optimize multi-dimensional static for loop decomposition (#4447) ## Motivation Recursive template implementations might initially seem attractive because they minimize the amount of code to write. Unfortunately, this style often hurts readability and requires significant resources from the compiler to generate instantiation chains. In "high-traffic" code (e.g., code used in many places and compilation units), this generally does not scale well and can bloat overall compile times unnecessarily. The aim of this PR is to take some of the most high-traffic utility code and try our best to eliminate recursive templates in favor of fold expansions and constexpr function helpers. In local tests with clang build analyzer, device_grouped_conv2d_fwd_xdl_ngchw_gkcyx_ngkhw_f16_16x16_instance.cpp showed high hit-rates on slow template instantiations in static_for and dimensional static_for (static_ford), which are in turn affected by the implementation of the Sequence class and associated transforms. Example: ** Templates that took longest to instantiate: 70111 ms: ck::detail::applier<int, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1... (372 times, avg 188 ms) // 70 seconds!** The above is part of the implementation of static_for, which uses Sequence classes. 
## Technical Details ### Summary of Optimization Techniques \| Technique \| Used In \| Benefit \| \|-----------\|---------\|---------\| \| __Constexpr for-loop computation__ \| sequence_reverse_inclusive_scan, sequence_map_inverse \| Moves O(N) work from template instantiation to constexpr evaluation \| \| __Pack expansion with indexing__ \| sequence_reverse, Sequence::Modify \| Single template instantiation instead of recursive \| \| __Flat iteration + decomposition__ \| ford, static_ford \| O(1) template depth instead of O(N^D) \| \| __Pre-computed strides__ \| index_decomposer \| Enables O(1) linear-to-multi-index conversion \| ### Impact on Compile Time These optimizations reduce template instantiation depth from O(N) or O(N^D) to O(1), which: 1. Reduces compiler memory usage 2. Reduces compile time exponentially for deep instantiation chains 3. Enables larger iteration spaces without hitting template depth limits ## Test Plan * Existing tests for Sequence are re-used to affirm correctness * Unit tests for ford and static_ford are added (dimensional looping) * 8 new regression tests specifically verify the fixes for the PR feedback: - `NonTrivialOrder3D_201` - Tests Orders<2,0,1> for static_ford - `NonTrivialOrder3D_201_Runtime` - Tests Orders<2,0,1> for ford - `ConsistencyWithNonTrivialOrder_201` - Verifies static_ford and ford consistency - `NonTrivialOrder3D_120` - Tests Orders<1,2,0> for static_ford - `NonTrivialOrder3D_120_Runtime` - Tests Orders<1,2,0> for ford - `NonTrivialOrder4D` - Tests 4D with Orders<3,1,0,2> for static_ford - `NonTrivialOrder4D_Runtime` - Tests 4D with Orders<3,1,0,2> for ford - `AsymmetricDimensionsWithOrder` - Tests asymmetric dimensions with non-trivial ordering ## Test Result ### Compile Time Comparison: `8b72bc8` (base) → `477e0686` (optimized) #### Commits in Range (8 commits) 1. `fd4ca17f48` - Optimize sequence_reverse_inclusive_scan and sequence_reverse 2. `7a7e3fdeef` - Optimize sequence_map_inverse 3. 
`92855c9913` - Optimize ford and static_ford calls to eliminate nested template recursion 4. `88a564032b` - Add unit tests for ford and static_ford 5. `1a0fb22217` - Fix clang-format 6. `8a0d26bddf` - Increase template recursion depth to 1024 7. `dc53bb6e20` - Address copilot feedback and add regression tests 8. `477e06861d` - Increase bracket depth to 1024 #### Build Timing Results \| File \| Base (8b72bc8759d9) \| HEAD (a0438bd398) \| Improvement \| \|------\|------\|------\|-------------\| \| grouped_conv2d_fwd (f16) -j1 \| 313.31s \| 272.93s \| __12.9% faster__ \| \| grouped_conv1d_fwd (bf16) -j1 \| 79.33s \| 68.61s \| __13.5% faster__ \| \| grouped_conv1d_bwd_weight (f16) -j1 \| 15.77s \| 14.31s \| __9.2% faster__ \| \| device_grouped_conv2d_fwd_instance -j64 \| s \| s \| __% faster__ \| #### Key Optimizations 1. __sequence_reverse_inclusive_scan/sequence_reverse__: O(N) → O(1) template depth 2. __sequence_map_inverse__: O(N) → O(1) template depth 3. __ford/static_ford__: O(N^D) → O(1) template depth using flat iteration with index decomposition 4. __Copilot feedback fixes__: Corrected New2Old mapping for non-trivial orderings ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-02-11 22:13:15 +00:00
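The "flat iteration + decomposition" idea described in this commit can be sketched in plain C++17 as follows. This is only an illustration under stated assumptions, not the CK implementation: the names `make_strides`, `decompose`, `flat_for` and `demo_sum` are hypothetical, the layout is assumed to be row-major, and the `Orders` permutation that `ford`/`static_ford` support is left out. Strides are computed once with an ordinary constexpr loop, and a single pack expansion over the flat index range replaces nested recursive instantiations.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical helper: row-major strides computed by a plain constexpr loop
// (strides[d] = product of lengths[d+1 .. N-1]) instead of template recursion.
template <std::size_t N>
constexpr std::array<std::size_t, N> make_strides(const std::array<std::size_t, N>& lengths)
{
    std::array<std::size_t, N> strides{};
    std::size_t running = 1;
    for(std::size_t d = N; d-- > 0;)
    {
        strides[d] = running;
        running *= lengths[d];
    }
    return strides;
}

// Hypothetical helper: converts a flat index into a multi-index using the
// pre-computed strides.
template <std::size_t N>
constexpr std::array<std::size_t, N> decompose(std::size_t flat,
                                               const std::array<std::size_t, N>& lengths,
                                               const std::array<std::size_t, N>& strides)
{
    std::array<std::size_t, N> idx{};
    for(std::size_t d = 0; d < N; ++d)
        idx[d] = (flat / strides[d]) % lengths[d];
    return idx;
}

// One fold expression visits every flat index; the template depth does not
// grow with the number of dimensions.
template <typename F, std::size_t... Lengths, std::size_t... Flat>
constexpr void flat_for_impl(F& f, std::index_sequence<Lengths...>, std::index_sequence<Flat...>)
{
    constexpr std::array<std::size_t, sizeof...(Lengths)> lengths{Lengths...};
    constexpr auto strides = make_strides(lengths);
    (f(decompose(Flat, lengths, strides)), ...);
}

template <std::size_t... Lengths, typename F>
constexpr void flat_for(F&& f)
{
    flat_for_impl(f,
                  std::index_sequence<Lengths...>{},
                  std::make_index_sequence<(Lengths * ... * std::size_t{1})>{});
}

// Usage: accumulate i * 10 + j over a 2 x 3 iteration space at compile time.
constexpr std::size_t demo_sum()
{
    std::size_t sum = 0;
    flat_for<2, 3>([&](auto idx) { sum += idx[0] * 10 + idx[1]; });
    return sum;
}
static_assert(demo_sum() == 36, "2x3 space: 3 * (0 + 10) + 2 * (0 + 1 + 2)");
```

The sketch trades one recursive instantiation per loop level for a single `std::index_sequence` of the total iteration count, which is the trade-off the summary table above attributes to `ford` and `static_ford`.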
Bartłomiej Kocot	ea4942cd02	[rocm-libraries] ROCm/rocm-libraries#4506 (commit d9ccef7) Revert "[CK Conv] Add bwd weight instance for large-k shape" (#4506) Reverts ROCm/rocm-libraries#4266 due to CI failures. Should be investigated by @johannes-graner	2026-02-11 21:37:50 +00:00
Christopher Millette	04eddbc5ce	[rocm-libraries] ROCm/rocm-libraries#4471 (commit 10fa702) [CK] Optimize vector type build times Supersedes https://github.com/ROCm/rocm-libraries/pull/4281 due to CI issues on import ## Proposed changes Build times can be affected by many different things and depend heavily on the way we write and use the code. Two critical areas of the builds are frontend parsing and backend codegen and compilation. ### Frontend Parsing The length of the code, the include header tree and macro expansions all affect the front-end parsing time. This PR seeks to reduce the parsing time of the dtype_vector.hpp vector_type class by generalizing away redundant code. * Partial specializations of vector_type for native and non-native datatypes have been generalized to one single class, consolidating all of the data initialization and AsType casting requirements into one place. * The nnvb_data_t_selector class (i.e., Non-native vector base dataT selector) has been removed and replaced with scalar_type instantiations as they have the same purpose. The purpose of the scalar_type class is already to map generalized datatypes to native types compatible with ext_vector_t. ### Backend Codegen Template instantiation behavior can also affect build times. Recursive instantiations are very slow versus concrete instantiations. The compiler must make multiple passes to expand template instantiations so we need to be careful about how they are used. * Previous vector_type classes declared a union storage class, which aliases StaticallyIndexedArray<T,N>. ``` template <typename T> struct vector_type<T, 4, typename ck::enable_if_t<is_native_type<T>()>> { using d1_t = T; typedef T d2_t __attribute__((ext_vector_type(2))); typedef T d4_t __attribute__((ext_vector_type(4))); using type = d4_t; union { d4_t d4_; StaticallyIndexedArray<d1_t, 4> d1x4_; StaticallyIndexedArray<d2_t, 2> d2x2_; StaticallyIndexedArray<d4_t, 1> d4x1_; } data_; ... 
}; ``` * Upon further inspection, StaticallyIndexedArray is built on top of a recursive Tuple concatenation. ``` template <typename T, index_t N> struct StaticallyIndexedArrayImpl { using type = typename tuple_concat<typename StaticallyIndexedArrayImpl<T, N / 2>::type, typename StaticallyIndexedArrayImpl<T, N - N / 2>::type>::type; }; ``` This union storage has been removed from the vector_type storage class. * Further references to StaticallyIndexedArray have been replaced with StaticallyIndexedArray_v2, which is a concrete implementation using C-style arrays. ``` template <typename T, index_t N> struct StaticallyIndexedArray_v2 { ... T data_[N]; }; ``` ### Fixes * Using the bool datatype with vector_type was previously error prone. Bool, as a native datatype, would be stored as bool ext_vector_type(N), which is a packed datatype. For example, sizeof(bool ext_vector_type(4)) == 1, which does not equal sizeof(StaticallyIndexedArray<bool ext_vector_type(1), 4>) == 4. The union of these datatypes slices the data incorrectly: the bit locations of the packed bools do not match those of the StaticallyIndexedArray member. As such, vector_type will use C-style array storage for the bool type instead of ext_vector_type. ``` template <typename T, index_t Rank> using NativeVectorT = T __attribute__((ext_vector_type(Rank))); sizeof(NativeVectorT<bool, 4>) == 1 (1 byte per 4 bool - packed) element0 = bit 0 of byte 0 element1 = bit 1 of byte 0 element2 = bit 2 of byte 0 element3 = bit 3 of byte 0 sizeof(StaticallyIndexedArray<NativeVectorT<bool, 1>, 4>) == 4 (1 byte per bool) element0 = bit 0 of byte 0 element1 = bit 0 of byte 1 element2 = bit 0 of byte 2 element3 = bit 0 of byte 3 union{ NativeVectorT<bool, 4> d1_t; ... StaticallyIndexedArray<NativeVectorT<bool,1>, 4> d4x1; }; // union size == 4 which means invalid slicing! 
``` * Math utilities such as next_power_of_two now handle the invalid case of X < 2 * Removed the redundant implementation of next_pow2 ### Additions * integer_log2_floor to math.hpp * is_power_of_two_integer to math.hpp ### Build Time Analysis Machine: banff-cyxtera-s78-2 Target: gfx942 \| Build Target \| Threads \| Frontend Parse Time (s) \| Backend Codegen Time (s) \| Total Time (s) \| commitId \| \|---------------\|---------\|-------------------------\|--------------------------\|---------------\|	2026-02-11 19:01:05 +00:00
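The kind of integer helpers this commit mentions can be sketched as below. These are stand-ins written for illustration only: the names `log2_floor_sketch`, `is_power_of_two_sketch` and `round_up_to_power_of_two_sketch` are hypothetical, the signatures are not taken from math.hpp, and the choice to map inputs below 2 to 1 is this sketch's own assumption, since the commit message does not say how CK treats that case.

```cpp
#include <cstdint>

// Hypothetical stand-in: true when x has exactly one bit set.
constexpr bool is_power_of_two_sketch(std::uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical stand-in: floor(log2(x)) for x >= 1.
// By this sketch's convention (an assumption), x == 0 also yields 0.
constexpr std::uint32_t log2_floor_sketch(std::uint32_t x)
{
    std::uint32_t result = 0;
    while(x > 1)
    {
        x >>= 1;
        ++result;
    }
    return result;
}

// Hypothetical stand-in: smallest power of two that is >= x.
// Inputs below 2 map to 1 (an assumption); only valid for x <= 2^31,
// because larger inputs would shift past the width of the type.
constexpr std::uint32_t round_up_to_power_of_two_sketch(std::uint32_t x)
{
    if(x < 2)
        return 1;
    return is_power_of_two_sketch(x) ? x : (std::uint32_t{1} << (log2_floor_sketch(x) + 1));
}

// Usage: everything is evaluated at compile time.
static_assert(log2_floor_sketch(9) == 3, "floor(log2(9))");
static_assert(round_up_to_power_of_two_sketch(5) == 8, "5 rounds up to 8");
static_assert(round_up_to_power_of_two_sketch(0) == 1, "small inputs are clamped");
```

Writing the helpers as loops inside constexpr functions keeps the work in constant evaluation rather than in template instantiation, in line with the build-time theme of this commit.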
Bartłomiej Kocot	2dd2f114b3	[rocm-libraries] ROCm/rocm-libraries#4407 (commit adde219) [CK][CK TILE] Add has hot loop check for pipeline v1 ## Motivation Add has hot loop check for pipeline v1 (v1 basic and v1 basic async). Enable more tests which have been fixed by this change. ## Technical Details Hot loop has been executed without num loop check. ## Test Plan test_grouped_convnd_fwd_tile ## Test Result Passed ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. AICK-651 AICK-663	2026-02-11 13:43:01 +00:00
Johannes Graner	e88f139c6c	[rocm-libraries] ROCm/rocm-libraries#4271 (commit 6fce58e) [Conv] Add NumGroupsToMerge to BwdWeight type string ## Proposed changes Add parameter to bwd weight V3 type string showing the number of groups to merge. This is required for MIOpen to be properly tuned since it uses type strings for performance database entries. In order to not break existing tuning databases, the parameter is added as a named suffix and only when group merging is enabled. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [ ] I have added tests relevant to the introduced functionality, and the unit tests are passing locally - [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run. - [ ] I have added inline documentation which enables the maintainers with understanding the motivation - [ ] I have removed the stale documentation which is no longer relevant after this pull request - [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request - [ ] I have run `clang-format` on all changed files - [ ] Any dependent changes have been merged	2026-02-11 09:08:38 +00:00
Cong Ma	d06f35027a	[rocm-libraries] ROCm/rocm-libraries#4354 (commit d41f08a) [CK TILE] fix numerical errors of preshuffle_b This pull request introduces several improvements and fixes related to quantized grouped GEMM (General Matrix Multiply) pipelines and their supporting utilities. # The numerical issue ## Steps to reproduce ```bash Run ./bin/tile_example_gemm_weight_preshuffle -prec=fp8 ./bin/tile_example_gemm_weight_preshuffle -prec=int4 ``` # Solution The main changes address type correctness, improve data layout and shuffling logic, and expand test coverage to better validate different GEMM configurations. Key changes include: ### Data layout and shuffling logic * Refactored the logic in `shuffle_b_permuteN` to use `constexpr` variables for `KLane` and `ItemsPerAccess`, simplifying tile view construction and correcting the permutation order for improved efficiency and correctness (`tensor_shuffle_utils.hpp`). * Fixed the calculation of `KLaneBytes` in weight preshuffle pipeline policies to account for internal data type conversion (e.g., from `pk_int4_t` to `fp8`), ensuring accurate memory access and alignment in quantized GEMM policies (`wp_pipeline_agmem_bgmem_creg_base_policy.hpp`, `gemm_wp_abquant_pipeline_ag_bg_cr_base_policy.hpp`). [[1]](diffhunk://#diff-93f16cd76e6e24404777e682a5ac8e039913ddd6a438c7efd61fdda42276e4efL274-R275) [[2]](diffhunk://#diff-9c3d0fc3c014feed435bfd93ba1f8f9fb3e054dcc322deada3addf70bee5a58cL100-R105) ### Test infrastructure enhancements * Unit tests did not catch this issue since there were no tests for fp8. Added new configuration structs (`config_mn_16x16`, `config_mn_32x32`) to support additional GEMM tile shapes and updated tests to run with these configurations for broader coverage (`test_gemm_pipeline_util.hpp`). 
[[1]](diffhunk://#diff-5a5962b2c4aa7f6a87d1d6201ad383135e30df13b42654e997d870d57420d5b8R86-R103) [[2]](diffhunk://#diff-5a5962b2c4aa7f6a87d1d6201ad383135e30df13b42654e997d870d57420d5b8L255-R269) Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2026-02-11 07:05:46 +00:00
Thomas Ning	807efa703a	[rocm-libraries] ROCm/rocm-libraries#4274 (commit 7c380df) Add padding to cshuffle epilogue to avoid bank conflict (#4274) ## Proposed changes Added padding to the CShuffle epilogue to avoid bank conflicts of 64. Synced up with and learned from the internal repo. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [ ] I have added tests relevant to the introduced functionality, and the unit tests are passing locally - [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run. - [ ] I have added inline documentation which enables the maintainers with understanding the motivation - [ ] I have removed the stale documentation which is no longer relevant after this pull request - [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request - [ ] I have run `clang-format` on all changed files - [ ] Any dependent changes have been merged	2026-02-11 05:52:42 +00:00
Bartłomiej Kocot	6d6ee8f023	[rocm-libraries] ROCm/rocm-libraries#4457 (commit 258a459) [CK][CK Tile] Temporary disable grouped conv fwd tile comp async instances (#4457) ## Motivation Temporarily disable the grouped conv fwd tile comp async instances because of failures. ## Technical Details Disable the configs so that these instances are not compiled. ## Test Plan test_grouped_convnd_fwd_Tile ## Test Result pending ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-02-11 01:52:59 +00:00
Joseph Macaranas	9c94c2294a	[rocm-libraries] ROCm/rocm-libraries#4460 (commit ba5ef82) [Azure External CI] Disable Azure CI on rocm-libraries (#4460) - Deleting all pipeline trigger files tied to Azure External CI from top-level and project-level.	2026-02-10 23:11:31 +00:00
John Shumway	1af75d290e	[rocm-libraries] ROCm/rocm-libraries#4277 (commit 4348901) Add a README.md file to ck/library/util I'm collecting information about our current testing (#3664). As part of this work I added a README to the directory to emphasize the GPU-first testing strategy and our support for type-specific tolerances. This README contains internal code comments for CK developers and does not need ROCm documentation review.	2026-02-10 21:27:27 +00:00
Randy Spaulding	d546ec0a53	[rocm-libraries] ROCm/rocm-libraries#4269 (commit 209f62f) Adapt parser to monorepo ## Proposed changes Addresses issues found while trying to run the dependency parser on MIOpen: - Ninja is recording the full path, e.g.: [json] ``` "file_to_executables": { "/home/rspauldi/repos/rocm-libraries/projects/miopen/include/miopen/miopen.h": [ ``` - Running git in monorepo reports the full _relative_ path, e.g.: ``` "projects/miopen/include/miopen/miopen.h" ``` Of course, `git diff` also returns all files modified in every other project's commits. These are filtered out as early as possible. This solution searches for `rocm-libraries` in the `parsing` step, and if found extracts the project name and stores it in `enhanced_dependency_mapping.json`. Leading folders are truncated from each file path, up to and including the project name. This allows `_is_project_file` to remain unchanged. The `selection` step then retrieves the project name from the json if it is defined, and truncates the project folder from the `git diff` output so the filenames exactly match the json entries. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [ ] I have added tests relevant to the introduced functionality, and the unit tests are passing locally - [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run. 
- [X] I have added inline documentation which enables the maintainers with understanding the motivation - [ ] I have removed the stale documentation which is no longer relevant after this pull request - [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request - [ ] I have run `clang-format` on all changed files - [ ] Any dependent changes have been merged ## Discussion Successfully runs on rocm-libraries MIOpen PRs and produces a list of tests. I haven't verified the results yet. This version is not applicable to CI since it operates on a per-executable level and MIOpen CI uses the single gtest binary. I'll be working towards that in future PRs over the next few weeks. ``` /home/rspauldi/repos/rocm-libraries/projects/miopen# git checkout miopen/sgundabo_enable_ck_bwd_wrw_navi <run CMake with TEST_DISCRETE=ON> # ninja tests # root@rjs1:/home/rspauldi/repos/rocm-libraries/projects/miopen# python3 /dep/main.py parse build/build.ninja Parsing ninja dependencies from: build/build.ninja Parsing ninja build file... Found 312 executables Found 820 object-to-source mappings Found 820 object files Extracting detailed dependencies for all object files... Processed 100/820 object files... Processed 200/820 object files... Processed 300/820 object files... Processed 400/820 object files... Processed 500/820 object files... Processed 600/820 object files... Processed 700/820 object files... Processed 800/820 object files... Completed dependency extraction for 820 object files Building file-to-executable mapping... 
Found rocm-libraries project: 'miopen' Built mapping for 608 files Files used by multiple executables: 216 Sample files with multiple dependencies: build/include/miopen/config.h: 306 executables build/include/miopen/export.h: 306 executables build/include/miopen/export_internals.h: 304 executables driver/InputFlags.hpp: 2 executables driver/driver.hpp: 2 executables === Enhanced Dependency Mapping Summary === Total executables: 312 Total files mapped: 608 Total object files processed: 820 File types: .cpp files: 310 .hpp files: 292 .h files: 6 Files used by multiple executables: 216 Top files with most dependencies: build/include/miopen/config.h: 306 executables build/include/miopen/export.h: 306 executables include/miopen/miopen.h: 304 executables src/include/miopen/config.hpp: 304 executables build/include/miopen/export_internals.h: 304 executables src/include/miopen/rank.hpp: 303 executables src/include/miopen/errors.hpp: 302 executables src/include/miopen/object.hpp: 302 executables src/include/miopen/returns.hpp: 302 executables src/include/miopen/sysinfo_utils.hpp: 302 executables Exporting mapping to build/enhanced_file_executable_mapping.csv Exporting complete mapping to build/enhanced_dependency_mapping.json Results exported to: CSV: build/enhanced_file_executable_mapping.csv JSON: build/enhanced_dependency_mapping.json root@rjs1:/home/rspauldi/repos/rocm-libraries/projects/miopen# python3 /dep/main.py select build/enhanced_dependency_mapping.json 1b13d8b72d54e34bdc7ae70dd2b6e809dca8b10e 09e5965d55ebbfacfd1ed18e5092580c2ffae748 Identified 30 files modified in project 'miopen' Exported 304 tests to run to tests_to_run.json ``` I don't know if clang-format applies to scripts. If so, could someone show me how to run it in CK?	2026-02-10 18:38:21 +00:00
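The path handling described in this commit (find `rocm-libraries`, extract the project name, then drop everything up to and including the project folder) can be sketched as follows. The real parser is a Python script whose code is not shown here, so this C++ version is purely illustrative: `project_name_sketch` and `strip_project_sketch` are hypothetical names, and the assumption that projects live under a `projects/` folder is inferred from the example paths in the message.

```cpp
#include <string_view>

// Assumption taken from the example paths in the commit message:
// monorepo projects live under "projects/<name>/".
constexpr std::string_view projects_marker = "projects/";

// Hypothetical helper: returns the project name from an absolute path that
// contains "rocm-libraries/.../projects/<name>/", or an empty view otherwise.
constexpr std::string_view project_name_sketch(std::string_view path)
{
    const auto repo = path.find("rocm-libraries/");
    if(repo == std::string_view::npos)
        return {};
    const auto marker = path.find(projects_marker, repo);
    if(marker == std::string_view::npos)
        return {};
    const auto name_begin = marker + projects_marker.size();
    const auto name_end   = path.find('/', name_begin);
    if(name_end == std::string_view::npos)
        return {};
    return path.substr(name_begin, name_end - name_begin);
}

// Hypothetical helper: truncates leading folders up to and including
// "projects/<project>/". Paths that belong to another project, or that do
// not contain the marker, are returned unchanged.
constexpr std::string_view strip_project_sketch(std::string_view path, std::string_view project)
{
    const auto marker = path.find(projects_marker);
    if(marker == std::string_view::npos)
        return path;
    const auto rest = path.substr(marker + projects_marker.size());
    if(rest.size() <= project.size() || rest.substr(0, project.size()) != project ||
       rest[project.size()] != '/')
        return path;
    return rest.substr(project.size() + 1);
}

// Usage: the ninja-style absolute path and the git-style relative path
// normalise to the same project-relative name.
static_assert(strip_project_sketch("/repos/rocm-libraries/projects/miopen/include/miopen/miopen.h",
                                   "miopen") == "include/miopen/miopen.h",
              "absolute path");
static_assert(strip_project_sketch("projects/miopen/include/miopen/miopen.h", "miopen") ==
                  "include/miopen/miopen.h",
              "relative path");
```

Normalising both sources to the same project-relative form is what lets the file names from `git diff` match the entries stored in the JSON mapping.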
Johannes Graner	40cec769ce	[rocm-libraries] ROCm/rocm-libraries#4266 (commit 1d8094d) [CK Conv] Add bwd weight instance for large-k shape ## Proposed changes This instance improves the shape used in `./bin/ckProfiler grouped_conv_bwd_weight 1 2 0 2 0 1 2 1 32 2376 256 3 3 100 100 1 1 1 1 1 1 1 1 all` from 10.3 ms to 6.6 ms. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [ ] I have added tests relevant to the introduced functionality, and the unit tests are passing locally - [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run. - [ ] I have added inline documentation which enables the maintainers with understanding the motivation - [ ] I have removed the stale documentation which is no longer relevant after this pull request - [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request - [ ] I have run `clang-format` on all changed files - [ ] Any dependent changes have been merged	2026-02-10 16:58:04 +00:00
Erwin Terpstra	b41bfece83	[rocm-libraries] ROCm/rocm-libraries#4268 (commit d2fca53) [CK_TILE]: PreshuffleB + PreshuffleBQuant for ABQuant pipeline (#4268) ## Proposed changes Implement BQuantPreshuffle option for the ABQuant PreshuffleB pipeline. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [X] I have added tests relevant to the introduced functionality, and the unit tests are passing locally - [X] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run. - [X] I have added inline documentation which enables the maintainers with understanding the motivation - [X] I have removed the stale documentation which is no longer relevant after this pull request - [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request - [X] I have run `clang-format` on all changed files - [X] Any dependent changes have been merged	2026-02-10 13:59:03 +00:00
Yi DING	d5acfd8d52	[rocm-libraries] ROCm/rocm-libraries#4451 (commit 091bf0f) [CK_TILE] Blockscale Gemm Fix Multi-Arch Compilation ## Motivation This PR updates CK_TILE blockscale GEMM-quant kernels and launch helpers to compile across multiple GPU architectures by introducing compile-time availability gating and a new attribute tag mechanism for kernel symbol/attribute specialization. ## Technical Details - Add an architecture-guarded `kIsAvailable` flag to the gfx950 pipeline and propagate availability handling into `QuantGemmKernel`. - Extend `make_kernel`/`kentry` to accept an `Attr` tag enabling per-kernel compile-time attributes (e.g., `no-packed-fp32-ops`) and unique symbols. - Update the blockscale GEMM quant example to pass kernel attributes and adjust gfx950 gating. ## Test Plan - CI - Local test: `cmake .. --preset dev -DGPU_TARGETS='gfx942;gfx950' -GNinja && ninja tile_example_gemm_quant` - Local test with ROCm/aiter#1954 ## Test Result <!-- Briefly summarize test outcomes. --> ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-02-10 12:42:19 +00:00
dependabot[bot]	6a6cd05dbb	[rocm-libraries] ROCm/rocm-libraries#3090 (commit 728d3a3) Bump fonttools from 4.57.0 to 4.61.0 in /projects/composablekernel/docs/sphinx (#3090) Bumps [fonttools](https://github.com/fonttools/fonttools) from 4.57.0 to 4.61.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/fonttools/fonttools/releases">fonttools's releases</a>.</em></p> <blockquote> <h2>4.61.0</h2> <ul> <li>[varLib.main]: <strong>SECURITY</strong> Only use basename(vf.filename) to prevent path traversal attacks when running <code>fonttools varLib</code> command-line script, or code which invokes <code>fonttools.varLib.main()</code>. Fixes CVE-2025-66034, see: <a href="https://github.com/fonttools/fonttools/security/advisories/GHSA-768j-98cg-p3fv">https://github.com/fonttools/fonttools/security/advisories/GHSA-768j-98cg-p3fv</a>.</li> <li>[feaLib] Sort BaseLangSysRecords by tag (<a href="https://redirect.github.com/fonttools/fonttools/issues/3986">#3986</a>).</li> <li>Drop support for EOL Python 3.9 (<a href="https://redirect.github.com/fonttools/fonttools/issues/3982">#3982</a>).</li> <li>[instancer] Support --remove-overlaps for fonts with CFF2 table (<a href="https://redirect.github.com/fonttools/fonttools/issues/3975">#3975</a>).</li> <li>[CFF2ToCFF] Add --remove-overlaps option (<a href="https://redirect.github.com/fonttools/fonttools/issues/3976">#3976</a>).</li> <li>[feaLib] Raise an error for rsub with NULL target (<a href="https://redirect.github.com/fonttools/fonttools/issues/3979">#3979</a>).</li> <li>[bezierTools] Fix logic bug in curveCurveIntersections (<a href="https://redirect.github.com/fonttools/fonttools/issues/3963">#3963</a>).</li> <li>[feaLib] Error when condition sets have the same name (<a href="https://redirect.github.com/fonttools/fonttools/issues/3958">#3958</a>).</li> <li>[cu2qu.ufo] skip processing empty glyphs to support sparse kerning masters (<a 
href="https://redirect.github.com/fonttools/fonttools/issues/3956">#3956</a>).</li> <li>[unicodedata] Update to Unicode 17. Require <code>unicodedata2 >= 17.0.0</code> when installed with 'unicode' extra.</li> </ul> <h2>4.60.1</h2> <ul> <li>[ufoLib] Reverted accidental method name change in <code>UFOReader.getKerningGroupConversionRenameMaps</code> that broke compatibility with downstream projects like defcon (<a href="https://redirect.github.com/fonttools/fonttools/issues/3948">#3948</a>, <a href="https://redirect.github.com/fonttools/fonttools/issues/3947">#3947</a>, <a href="https://redirect.github.com/robotools/defcon/issues/478">robotools/defcon#478</a>).</li> <li>[ufoLib] Added test coverage for <code>getKerningGroupConversionRenameMaps</code> method (<a href="https://redirect.github.com/fonttools/fonttools/issues/3950">#3950</a>).</li> <li>[subset] Don't try to subset BASE table; pass it through by default instead (<a href="https://redirect.github.com/fonttools/fonttools/issues/3949">#3949</a>).</li> <li>[subset] Remove empty BaseRecord entries in MarkBasePos lookups (<a href="https://redirect.github.com/fonttools/fonttools/issues/3897">#3897</a>, <a href="https://redirect.github.com/fonttools/fonttools/issues/3892">#3892</a>).</li> <li>[subset] Add pruning for MarkLigPos and MarkMarkPos lookups (<a href="https://redirect.github.com/fonttools/fonttools/issues/3946">#3946</a>).</li> <li>[subset] Remove duplicate features when subsetting (<a href="https://redirect.github.com/fonttools/fonttools/issues/3945">#3945</a>).</li> <li>[Docs] Added documentation for the visitor module (<a href="https://redirect.github.com/fonttools/fonttools/issues/3944">#3944</a>).</li> </ul> <h2>4.60.0</h2> <ul> <li> <p>[pointPen] Allow <code>reverseFlipped</code> parameter of <code>DecomposingPointPen</code> to take a <code>ReverseFlipped</code> enum value to control whether/how to reverse contour direction of flipped components, in addition to the existing True/False. 
This allows to set <code>ReverseFlipped.ON_CURVE_FIRST</code> to ensure that the decomposed outline starts with an on-curve point before being reversed, for better consistency with other segment-oriented contour transformations. The change is backward compatible, and the default behavior hasn't changed (<a href="https://redirect.github.com/fonttools/fonttools/issues/3934">#3934</a>).</p> </li> <li> <p>[filterPen] Added <code>ContourFilterPointPen</code>, base pen for buffered contour operations, and <code>OnCurveStartPointPen</code> filter to ensure contours start with an on-curve point (<a href="https://redirect.github.com/fonttools/fonttools/issues/3934">#3934</a>).</p> </li> <li> <p>[cu2qu] Fixed difference in cython vs pure-python complex division by real number (<a href="https://redirect.github.com/fonttools/fonttools/issues/3930">#3930</a>).</p> </li> <li> <p>[varLib.avar] Refactored and added some new sub-modules and scripts (<a href="https://redirect.github.com/fonttools/fonttools/issues/3926">#3926</a>).</p> <ul> <li><code>varLib.avar.build</code> module to build avar (and a missing fvar) binaries into a possibly empty TTFont,</li> <li><code>varLib.avar.unbuild</code> module to print a .designspace snippet that would generate the same avar binary,</li> <li><code>varLib.avar.map</code> module to take TTFont and do the mapping, in user/normalized space,</li> <li><code>varLib.avar.plan</code> module moved from <code>varLib.avarPlanner</code>.</li> </ul> <p>The bare <code>fonttools varLib.avar</code> script is deprecated, in favour of <code>fonttools varLib.avar.build</code> (or <code>unbuild</code>).</p> </li> <li> <p>[interpolatable] Clarify <code>linear_sum_assignment</code> backend options and minimal dependency usage (<a href="https://redirect.github.com/fonttools/fonttools/issues/3927">#3927</a>).</p> </li> <li> <p>[post] Speed up <code>build_psNameMapping</code> (<a href="https://redirect.github.com/fonttools/fonttools/issues/3923">#3923</a>).</p> </li> 
<li> <p>[ufoLib] Added typing annotations to fontTools.ufoLib (<a href="https://redirect.github.com/fonttools/fonttools/issues/3875">#3875</a>).</p> </li> </ul> <h2>4.59.2</h2> <ul> <li>[varLib] Clear <code>USE_MY_METRICS</code> component flags when inconsistent across masters (<a href="https://redirect.github.com/fonttools/fonttools/issues/3912">#3912</a>).</li> <li>[varLib.instancer] Avoid negative advance width/height values when instatiating HVAR/VVAR, (unlikely in well-behaved fonts) (<a href="https://redirect.github.com/fonttools/fonttools/issues/3918">#3918</a>).</li> <li>[subset] Fix shaping behaviour when pruning empty mark sets (<a href="https://redirect.github.com/fonttools/fonttools/issues/3915">#3915</a>, <a href="https://redirect.github.com/harfbuzz/harfbuzz/issues/5499">harfbuzz/harfbuzz#5499</a>).</li> <li>[cu2qu] Fixed <code>dot()</code> product of perpendicular vectors not always returning exactly 0.0 in all Python implementations (<a href="https://redirect.github.com/fonttools/fonttools/issues/3911">#3911</a>)</li> <li>[varLib.instancer] Implemented fully-instantiating <code>avar2</code> fonts (<a href="https://redirect.github.com/fonttools/fonttools/issues/3909">#3909</a>).</li> <li>[feaLib] Allow float values in <code>VariableScalar</code>'s axis locations (<a href="https://redirect.github.com/fonttools/fonttools/issues/3906">#3906</a>, <a href="https://redirect.github.com/fonttools/fonttools/issues/3907">#3907</a>).</li> <li>[cu2qu] Handle special case in <code>calc_intersect</code> for degenerate cubic curves where 3 to 4 control points are equal (<a href="https://redirect.github.com/fonttools/fonttools/issues/3904">#3904</a>).</li> </ul> <h2>4.59.1</h2> <ul> <li>[featureVars] Update OS/2.usMaxContext if possible after addFeatureVariationsRaw (<a href="https://redirect.github.com/fonttools/fonttools/issues/3894">#3894</a>).</li> <li>[vhmtx] raise TTLibError('not enough data...') when hmtx/vmtx are truncated (<a 
href="https://redirect.github.com/fonttools/fonttools/issues/3843">#3843</a>, <a href="https://redirect.github.com/fonttools/fonttools/issues/3901">#3901</a>).</li> <li>[feaLib] Combine duplicate features that have the same set of lookups regardless of the order in which those lookups are added to the feature (<a href="https://redirect.github.com/fonttools/fonttools/issues/3895">#3895</a>).</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/fonttools/fonttools/blob/main/NEWS.rst">fonttools's changelog</a>.</em></p> <blockquote> <h2>4.61.0 (released 2025-11-28)</h2> <ul> <li>[varLib.main]: <strong>SECURITY</strong> Only use basename(vf.filename) to prevent path traversal attacks when running <code>fonttools varLib</code> command, or code which invokes <code>fonttools.varLib.main()</code>. Fixes CVE-2025-66034, see: <a href="https://github.com/fonttools/fonttools/security/advisories/GHSA-768j-98cg-p3fv">https://github.com/fonttools/fonttools/security/advisories/GHSA-768j-98cg-p3fv</a>.</li> <li>[feaLib] Sort BaseLangSysRecords by tag (<a href="https://redirect.github.com/fonttools/fonttools/issues/3986">#3986</a>).</li> <li>Drop support for EOL Python 3.9 (<a href="https://redirect.github.com/fonttools/fonttools/issues/3982">#3982</a>).</li> <li>[instancer] Support --remove-overlaps for fonts with CFF2 table (<a href="https://redirect.github.com/fonttools/fonttools/issues/3975">#3975</a>).</li> <li>[CFF2ToCFF] Add --remove-overlaps option (<a href="https://redirect.github.com/fonttools/fonttools/issues/3976">#3976</a>).</li> <li>[feaLib] Raise an error for rsub with NULL target (<a href="https://redirect.github.com/fonttools/fonttools/issues/3979">#3979</a>).</li> <li>[bezierTools] Fix logic bug in curveCurveIntersections (<a href="https://redirect.github.com/fonttools/fonttools/issues/3963">#3963</a>).</li> <li>[feaLib] Error when condition sets 
have the same name (<a href="https://redirect.github.com/fonttools/fonttools/issues/3958">#3958</a>).</li> <li>[cu2qu.ufo] skip processing empty glyphs to support sparse kerning masters (<a href="https://redirect.github.com/fonttools/fonttools/issues/3956">#3956</a>).</li> <li>[unicodedata] Update to Unicode 17. Require <code>unicodedata2 >= 17.0.0</code> when installed with 'unicode' extra.</li> </ul> <h2>4.60.1 (released 2025-09-29)</h2> <ul> <li>[ufoLib] Reverted accidental method name change in <code>UFOReader.getKerningGroupConversionRenameMaps</code> that broke compatibility with downstream projects like defcon (<a href="https://redirect.github.com/fonttools/fonttools/issues/3948">#3948</a>, <a href="https://redirect.github.com/fonttools/fonttools/issues/3947">#3947</a>, <a href="https://redirect.github.com/robotools/defcon/issues/478">robotools/defcon#478</a>).</li> <li>[ufoLib] Added test coverage for <code>getKerningGroupConversionRenameMaps</code> method (<a href="https://redirect.github.com/fonttools/fonttools/issues/3950">#3950</a>).</li> <li>[subset] Don't try to subset BASE table; pass it through by default instead (<a href="https://redirect.github.com/fonttools/fonttools/issues/3949">#3949</a>).</li> <li>[subset] Remove empty BaseRecord entries in MarkBasePos lookups (<a href="https://redirect.github.com/fonttools/fonttools/issues/3897">#3897</a>, <a href="https://redirect.github.com/fonttools/fonttools/issues/3892">#3892</a>).</li> <li>[subset] Add pruning for MarkLigPos and MarkMarkPos lookups (<a href="https://redirect.github.com/fonttools/fonttools/issues/3946">#3946</a>).</li> <li>[subset] Remove duplicate features when subsetting (<a href="https://redirect.github.com/fonttools/fonttools/issues/3945">#3945</a>).</li> <li>[Docs] Added documentation for the visitor module (<a href="https://redirect.github.com/fonttools/fonttools/issues/3944">#3944</a>).</li> </ul> <h2>4.60.0 (released 2025-09-17)</h2> <ul> <li>[pointPen] Allow 
<code>reverseFlipped</code> parameter of <code>DecomposingPointPen</code> to take a <code>ReverseFlipped</code> enum value to control whether/how to reverse contour direction of flipped components, in addition to the existing True/False. This allows to set <code>ReverseFlipped.ON_CURVE_FIRST</code> to ensure that the decomposed outline starts with an on-curve point before being reversed, for better consistency with other segment-oriented contour transformations. The change is backward compatible, and the default behavior hasn't changed (<a href="https://redirect.github.com/fonttools/fonttools/issues/3934">#3934</a>).</li> <li>[filterPen] Added <code>ContourFilterPointPen</code>, base pen for buffered contour operations, and <code>OnCurveStartPointPen</code> filter to ensure contours start with an on-curve point (<a href="https://redirect.github.com/fonttools/fonttools/issues/3934">#3934</a>).</li> <li>[cu2qu] Fixed difference in cython vs pure-python complex division by real number (<a href="https://redirect.github.com/fonttools/fonttools/issues/3930">#3930</a>).</li> <li>[varLib.avar] Refactored and added some new sub-modules and scripts (<a href="https://redirect.github.com/fonttools/fonttools/issues/3926">#3926</a>). <ul> <li><code>varLib.avar.build</code> module to build avar (and a missing fvar) binaries into a possibly empty TTFont,</li> <li><code>varLib.avar.unbuild</code> module to print a .designspace snippet that would generate the same avar binary,</li> <li><code>varLib.avar.map</code> module to take TTFont and do the mapping, in user/normalized space,</li> <li><code>varLib.avar.plan</code> module moved from <code>varLib.avarPlanner</code>. 
The bare <code>fonttools varLib.avar</code> script is deprecated, in favour of <code>fonttools varLib.avar.build</code> (or <code>unbuild</code>).</li> </ul> </li> <li>[interpolatable] Clarify <code>linear_sum_assignment</code> backend options and minimal dependency usage (<a href="https://redirect.github.com/fonttools/fonttools/issues/3927">#3927</a>).</li> <li>[post] Speed up <code>build_psNameMapping</code> (<a href="https://redirect.github.com/fonttools/fonttools/issues/3923">#3923</a>).</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`e691e3bef9`"><code>e691e3b</code></a> Release 4.61.0</li> <li><a href="`c2d540f4ad`"><code>c2d540f</code></a> Update NEWS.rst</li> <li><a href="`3859753a05`"><code>3859753</code></a> Update NEWS.rst</li> <li><a href="`26eb070a55`"><code>26eb070</code></a> black</li> <li><a href="`5ff73af326`"><code>5ff73af</code></a> Merge commit from fork</li> <li><a href="`a696d5ba93`"><code>a696d5b</code></a> varLib: only use the basename(vf.filename)</li> <li><a href="`b00bc459ef`"><code>b00bc45</code></a> varLib_test: test path traversal in variable-font filename</li> <li><a href="`066512e4f3`"><code>066512e</code></a> Merge pull request <a href="https://redirect.github.com/fonttools/fonttools/issues/3986">#3986</a> from cmyr/base-minmax-sorting</li> <li><a href="`ce78973e97`"><code>ce78973</code></a> [feaLib] Sort BasLangSysRecords by tag</li> <li><a href="`5bb37dc201`"><code>5bb37dc</code></a> Merge pull request <a href="https://redirect.github.com/fonttools/fonttools/issues/3983">#3983</a> from fonttools/dependabot/pip/brotli-1.2.0</li> <li>Additional commits viewable in <a href="https://github.com/fonttools/fonttools/compare/4.57.0...4.61.0">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=fonttools&package-manager=pip&previous-version=4.57.0&new-version=4.61.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) You can trigger a rebase of this PR by commenting `@dependabot rebase`.	2026-02-10 07:08:05 +00:00
Aviral Goel	06ad66b3e4	[rocm-libraries] ROCm/rocm-libraries#4265 (commit 0f9b3b0) [CK Tools] Auto-enable unbuffered output for Python commands (#4265) ck-docker exec and ck-exec now automatically detect Python commands and set PYTHONUNBUFFERED=1 to enable live output streaming. This eliminates the need to manually set the environment variable when running Python scripts that print progress updates. The detection matches python, python3, or any .py file argument. This helps in watching live terminal output when a Python script is running inside the container.	2026-02-10 03:00:40 +00:00
dependabot[bot]	b688665d79	[rocm-libraries] ROCm/rocm-libraries#475 (commit cabe79b) Bump pillow from 11.2.1 to 11.3.0 in /projects/composablekernel/docs/sphinx (#475) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bumps [pillow](https://github.com/python-pillow/Pillow) from 11.2.1 to 11.3.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/python-pillow/Pillow/releases">pillow's releases</a>.</em></p> <blockquote> <h2>11.3.0</h2> <p><a href="https://pillow.readthedocs.io/en/stable/releasenotes/11.3.0.html">https://pillow.readthedocs.io/en/stable/releasenotes/11.3.0.html</a></p> <h2>Deprecations</h2> <ul> <li>Deprecate fromarray mode argument <a href="https://redirect.github.com/python-pillow/Pillow/issues/9018">#9018</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Deprecate saving I mode images as PNG <a href="https://redirect.github.com/python-pillow/Pillow/issues/9023">#9023</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> </ul> <h2>Documentation</h2> <ul> <li>Added release notes for <a href="https://redirect.github.com/python-pillow/Pillow/issues/9041">#9041</a> <a href="https://redirect.github.com/python-pillow/Pillow/issues/9042">#9042</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Add release notes for <a href="https://redirect.github.com/python-pillow/Pillow/issues/8912">#8912</a> and <a href="https://redirect.github.com/python-pillow/Pillow/issues/8969">#8969</a> <a href="https://redirect.github.com/python-pillow/Pillow/issues/9019">#9019</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>ImageFont does not handle multiline text <a href="https://redirect.github.com/python-pillow/Pillow/issues/9000">#9000</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Updated Ubuntu CI targets <a 
href="https://redirect.github.com/python-pillow/Pillow/issues/8988">#8988</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Update MinGW package names <a href="https://redirect.github.com/python-pillow/Pillow/issues/8987">#8987</a> [<a href="https://github.com/H4M5TER"><code>@H4M5TER</code></a>]</li> <li>Updated docstring <a href="https://redirect.github.com/python-pillow/Pillow/issues/8943">#8943</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Mention that tobytes() with the raw encoder uses Pack.c <a href="https://redirect.github.com/python-pillow/Pillow/issues/8878">#8878</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Refactor docs <code>Makefile</code> <a href="https://redirect.github.com/python-pillow/Pillow/issues/8933">#8933</a> [<a href="https://github.com/hugovk"><code>@hugovk</code></a>]</li> <li>Add template for quarterly release issue <a href="https://redirect.github.com/python-pillow/Pillow/issues/8932">#8932</a> [<a href="https://github.com/aclark4life"><code>@aclark4life</code></a>]</li> <li>Add list of third party plugins <a href="https://redirect.github.com/python-pillow/Pillow/issues/8910">#8910</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Update redirected URL <a href="https://redirect.github.com/python-pillow/Pillow/issues/8919">#8919</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Docs: use sentence case for headers <a href="https://redirect.github.com/python-pillow/Pillow/issues/8914">#8914</a> [<a href="https://github.com/hugovk"><code>@hugovk</code></a>]</li> <li>Docs: remove unused Makefile targets <a href="https://redirect.github.com/python-pillow/Pillow/issues/8917">#8917</a> [<a href="https://github.com/hugovk"><code>@hugovk</code></a>]</li> <li>Remove indentation from lists <a href="https://redirect.github.com/python-pillow/Pillow/issues/8915">#8915</a> [<a 
href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Python 3.13 is tested on Arch <a href="https://redirect.github.com/python-pillow/Pillow/issues/8894">#8894</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Move XV Thumbnails to read only section <a href="https://redirect.github.com/python-pillow/Pillow/issues/8893">#8893</a> [<a href="https://github.com/aclark4life"><code>@aclark4life</code></a>]</li> <li>Updated macOS tested Pillow versions <a href="https://redirect.github.com/python-pillow/Pillow/issues/8890">#8890</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> </ul> <h2>Dependencies</h2> <ul> <li>Add AVIF to wheels using only aomenc and dav1d AVIF codecs for reduced size <a href="https://redirect.github.com/python-pillow/Pillow/issues/8858">#8858</a> [<a href="https://github.com/fdintino"><code>@fdintino</code></a>]</li> <li>Use same AVIF URL when fetching dependency <a href="https://redirect.github.com/python-pillow/Pillow/issues/8871">#8871</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Update dependency mypy to v1.16.1 <a href="https://redirect.github.com/python-pillow/Pillow/issues/9026">#9026</a> [@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li> <li>Update libpng to 1.6.49 <a href="https://redirect.github.com/python-pillow/Pillow/issues/9014">#9014</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Update dependency cibuildwheel to v3 <a href="https://redirect.github.com/python-pillow/Pillow/issues/9010">#9010</a> [@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li> <li>Updated libjpeg-turbo to 3.1.1 <a href="https://redirect.github.com/python-pillow/Pillow/issues/9009">#9009</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Update dependency mypy to v1.16.0 <a href="https://redirect.github.com/python-pillow/Pillow/issues/8991">#8991</a> 
[@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li> <li>Updated libpng to 1.6.48 <a href="https://redirect.github.com/python-pillow/Pillow/issues/8940">#8940</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Updated Ghostscript to 10.5.1 <a href="https://redirect.github.com/python-pillow/Pillow/issues/8939">#8939</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Updated harfbuzz to 11.2.1 <a href="https://redirect.github.com/python-pillow/Pillow/issues/8937">#8937</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Updated libavif to 1.3.0 <a href="https://redirect.github.com/python-pillow/Pillow/issues/8949">#8949</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Update dependency cibuildwheel to v2.23.3 <a href="https://redirect.github.com/python-pillow/Pillow/issues/8931">#8931</a> [@<a href="https://github.com/apps/renovate">renovate[bot]</a>]</li> <li>Updated harfbuzz to 11.1.0 <a href="https://redirect.github.com/python-pillow/Pillow/issues/8904">#8904</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> </ul> <h2>Testing</h2> <ul> <li>Add <code>match</code> parameter to <code>pytest.warns()</code> <a href="https://redirect.github.com/python-pillow/Pillow/issues/9038">#9038</a> [<a href="https://github.com/hugovk"><code>@hugovk</code></a>]</li> <li>Increase pytest verbosity <a href="https://redirect.github.com/python-pillow/Pillow/issues/9040">#9040</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Improve SgiImagePlugin test coverage <a href="https://redirect.github.com/python-pillow/Pillow/issues/8896">#8896</a> [<a href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> <li>Update ruff pre-commit ID <a href="https://redirect.github.com/python-pillow/Pillow/issues/8994">#8994</a> [<a 
href="https://github.com/radarhere"><code>@radarhere</code></a>]</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`89f1f4626a`"><code>89f1f46</code></a> 11.3.0 version bump</li> <li><a href="`f2de251c76`"><code>f2de251</code></a> Updated check script paths (<a href="https://redirect.github.com/python-pillow/Pillow/issues/9052">#9052</a>)</li> <li><a href="`84855d11c8`"><code>84855d1</code></a> Raise FileNotFoundError when opening an empty path (<a href="https://redirect.github.com/python-pillow/Pillow/issues/9048">#9048</a>)</li> <li><a href="`204d11d4da`"><code>204d11d</code></a> Raise FileNotFoundError when opening an empty path</li> <li><a href="`2b39f7581e`"><code>2b39f75</code></a> Handle IPTC TIFF tags with incorrect type (<a href="https://redirect.github.com/python-pillow/Pillow/issues/8925">#8925</a>)</li> <li><a href="`e7a53ba19b`"><code>e7a53ba</code></a> Do not update palette for L mode GIF frame (<a href="https://redirect.github.com/python-pillow/Pillow/issues/8924">#8924</a>)</li> <li><a href="`c22230b761`"><code>c22230b</code></a> Use save parameters as encoderinfo defaults (<a href="https://redirect.github.com/python-pillow/Pillow/issues/9001">#9001</a>)</li> <li><a href="`da10ed1cf3`"><code>da10ed1</code></a> Add support for iOS (<a href="https://redirect.github.com/python-pillow/Pillow/issues/9030">#9030</a>)</li> <li><a href="`be2b4e7864`"><code>be2b4e7</code></a> Fix qtables and quality scaling (<a href="https://redirect.github.com/python-pillow/Pillow/issues/8879">#8879</a>)</li> <li><a href="`d4162f8505`"><code>d4162f8</code></a> Updated return type</li> <li>Additional commits viewable in <a href="https://github.com/python-pillow/Pillow/compare/11.2.1...11.3.0">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pillow&package-manager=pip&previous-version=11.2.1&new-version=11.3.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.	2026-02-10 02:50:35 +00:00
Bartłomiej Kocot	27e0a34e0f	[rocm-libraries] ROCm/rocm-libraries#4406 (commit 61f9f90) [CK] CK Tile grouped convolution direct load ## Motivation CK Tile grouped convolution forward direct load support. ## Technical Details Basic pipeline for direct load and new instances for forward for v1 and v4 pipelines. ## Test Plan test_grouped_convnd_fwd_tile ## Test Result CI pending ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. AICK-130	2026-02-09 21:09:42 +00:00
Chinmay Dattanand Kuchinad	0cafa68b6f	[rocm-libraries] ROCm/rocm-libraries#4292 (commit b7f1367) Enable group mode (varlen) kernel generation for PyTorch integration (#4292) ## Proposed changes This PR enables group mode (variable-length attention) kernel generation for PyTorch's CK SDPA backend. ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [X] I have added tests relevant to the introduced functionality, and the unit tests are passing locally - [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run. - [ ] I have added inline documentation which enables the maintainers with understanding the motivation - [ ] I have removed the stale documentation which is no longer relevant after this pull request - [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request - [X] I have run `clang-format` on all changed files - [ ] Any dependent changes have been merged ## Discussion The change is minimal (single line deletion) but enables a significant feature: variable-length attention support for ROCm users via PyTorch's torch.nn.attention.varlen API.	2026-02-09 20:59:55 +00:00
Bartłomiej Kocot	ea6363ad78	[rocm-libraries] ROCm/rocm-libraries#4399 (commit 331512e) [CK] Fix grouped conv fwd transform for merged groups ## Motivation [CK] Fix grouped conv fwd transform for merged groups for 1d and 3d. ## Technical Details After optimizations for 2d there is a lack of implementation for 1d and 3d ## Test Plan test_grouped_convnd_fwd ## Test Result pending CI ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-02-09 15:37:36 +00:00
Eiden Yoshida	e16789b609	[rocm-libraries] ROCm/rocm-libraries#4373 (commit 1c29275) [CK] MICI: Disable failure pattern checking ## Motivation - ck mici jobs hanging at end, possibly at failure pattern checking ## Technical Details - Disable failure pattern checking to see if hanging goes away ## Test Plan - Observe behavior after merge	2026-02-09 15:25:01 +00:00
kensclin	5b3e527c88	[rocm-libraries] ROCm/rocm-libraries#4280 (commit b7de1e1) [CK_TILE] Add blockscale GEMM support for EightWarps on gfx950 (#4280) ## Proposed changes gemm blockscale eightwarps support ## Checklist Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask. - [ ] I have added tests relevant to the introduced functionality, and the unit tests are passing locally - [ ] I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run. - [ ] I have added inline documentation which enables the maintainers with understanding the motivation - [ ] I have removed the stale documentation which is no longer relevant after this pull request - [ ] (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request - [x] I have run `clang-format` on all changed files - [x] Any dependent changes have been merged	2026-02-09 03:55:52 +00:00
jakpiase	731afe535a	[rocm-libraries] ROCm/rocm-libraries#4357 (commit ff3e982) [CK_TILE] Add support and tests for V6 pipeline in conv fwd (#4357) Added support for the conv V6 pipeline in CK Tile's convolution forward kernel. The CK Tile V6 pipeline is the equivalent of old CK's V5 pipeline and should be faster than other pipelines in some cases. This PR also adds tests to the profiler, which currently lives in the experimental directory, so regressions should now be easier to detect.	2026-02-08 19:57:53 +00:00
Ville Pietilä	57d26db844	[rocm-libraries] ROCm/rocm-libraries#4273 (commit 591f504) [CK] Add fwd conv group merging to v3 conv instances ## Proposed changes Added conv group merging to the (universal) V3 fwd conv pipeline. The new instance improves fwd conv performance when the number of input/output channels per group is low. On MI300 (`gfx942`) we get \| CK prof command \| Baseline (TFLOPS) \| V3 group merging (TFLOPS) \| \|:-----\|:------:\|------:\| \| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 4 4 3 3 200 200 1 1 1 1 1 1 1 1 \| 3.86035 \| 8.36796 \| \| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 8 8 3 3 200 200 2 2 1 1 1 1 1 1 \| 10.1867 \| 13.4677 \| \| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 8 8 3 3 100 100 1 2 1 1 1 1 1 1 \| 11.7875 \| 16.3657 \|	2026-02-08 11:35:56 +00:00
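To put the table in the commit above into relative terms, the speedup of each row is simply the merged-group throughput divided by the baseline. The sketch below recomputes it from the three TFLOPS pairs quoted in the message; the struct and function names are illustrative and not part of CK.

```cpp
// Illustrative only: names below are hypothetical, the numbers are the
// TFLOPS values quoted in the commit message for MI300 (gfx942).
struct BenchRow
{
    double baseline_tflops;      // "Baseline (TFLOPS)" column
    double group_merging_tflops; // "V3 group merging (TFLOPS)" column
};

constexpr BenchRow kRows[] = {
    {3.86035, 8.36796},
    {10.1867, 13.4677},
    {11.7875, 16.3657},
};

// Speedup = new throughput / old throughput.
constexpr double speedup(const BenchRow& row)
{
    return row.group_merging_tflops / row.baseline_tflops;
}
```

Evaluated on the quoted figures, this gives roughly 2.17x, 1.32x, and 1.39x for the three profiler commands, so the largest gain is on the case with the fewest channels per group.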
Emily Martins	4266f867d6	[rocm-libraries] ROCm/rocm-libraries#4381 (commit 5df3343) [CK_TILE] Fix MMA concepts compiler error ## Motivation CK Tile is required to support certain older OSs; on these OSs, C++20 is not fully supported. For ROCm 7.2, compiler errors occur on one of these older OSs. An example of this error is as follows: ```bash /composable_kernel/include/ck_tile/core/arch/mma/amdgcn_mma.hpp:34:28: error: expected concept name with optional arguments 34 \| { MmaOp::kAMBlock } -> std::convertible_to<unsigned int>; \| ``` The goal of this PR is to resolve these compiler errors. ## Technical Details The existing guards around the mma concepts only check if the concepts language feature is supported, as follows: ```cpp #if defined(__cpp_concepts) && __cpp_concepts >= 201907L // ... template <typename CtrlFlags> concept CtrlFlagsGfx9I = requires(CtrlFlags ctrlFlags) { // Flag members for Gfx9 MFMA instructions { CtrlFlags::Cbsz } -> std::convertible_to<int>; { CtrlFlags::Abid } -> std::convertible_to<int>; { CtrlFlags::Blgp } -> std::convertible_to<int>; }; #endif // defined(__cpp_concepts) && __cpp_concepts >= 201907L ``` That said, in cases where functionality from the `<concepts>` header is used (e.g., `std::convertible_to`), this guard fails to check whether the `<concepts>` header is available. This change adds an additional check to the concepts that make use of functionality from the `<concepts>` header to ensure the header is available. ## Test Plan I tested the changes on the relevant docker for gfx90a, gfx950, and gfx942 and the compiler issue is not present. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-02-07 00:28:06 +00:00
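The commit above does not show the exact check that was added, so the following is only a sketch of one common way to guard both the language feature and the library header. It assumes the standard feature-test macros `__cpp_concepts` and `__cpp_lib_concepts`; the macro `DEMO_HAS_STD_CONCEPTS`, the concept `CtrlFlagsLike`, and the demo structs are hypothetical names introduced for illustration.

```cpp
#include <type_traits>

// <version> defines the library feature-test macros when it exists.
#if defined(__has_include)
#if __has_include(<version>)
#include <version>
#endif
#endif

// Require the concepts language feature AND the <concepts> library support.
#if defined(__cpp_concepts) && __cpp_concepts >= 201907L && defined(__cpp_lib_concepts)
#include <concepts>
#define DEMO_HAS_STD_CONCEPTS 1
#else
#define DEMO_HAS_STD_CONCEPTS 0
#endif

#if DEMO_HAS_STD_CONCEPTS
// Concept form, only compiled when std::convertible_to is really available.
template <typename CtrlFlags>
concept CtrlFlagsLike = requires {
    { CtrlFlags::Cbsz } -> std::convertible_to<int>;
    { CtrlFlags::Abid } -> std::convertible_to<int>;
    { CtrlFlags::Blgp } -> std::convertible_to<int>;
};

template <typename T>
inline constexpr bool is_ctrl_flags_like = CtrlFlagsLike<T>;
#else
// C++17 fallback with the same meaning, written with SFINAE.
template <typename T, typename = void>
struct is_ctrl_flags_like_impl : std::false_type
{
};

template <typename T>
struct is_ctrl_flags_like_impl<
    T,
    std::void_t<decltype(T::Cbsz), decltype(T::Abid), decltype(T::Blgp)>>
    : std::bool_constant<std::is_convertible_v<decltype(T::Cbsz), int> &&
                         std::is_convertible_v<decltype(T::Abid), int> &&
                         std::is_convertible_v<decltype(T::Blgp), int>>
{
};

template <typename T>
inline constexpr bool is_ctrl_flags_like = is_ctrl_flags_like_impl<T>::value;
#endif

// Hypothetical types used only to exercise the check.
struct DemoFlags
{
    static constexpr int Cbsz = 0;
    static constexpr int Abid = 0;
    static constexpr int Blgp = 0;
};

struct NotFlags
{
};
```

With this layout the same translation unit builds on toolchains where the compiler understands concepts but the standard library does not ship `<concepts>`, which is the situation the error message in the commit points to.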
Aviral Goel	4237aedf9a	[rocm-libraries] ROCm/rocm-libraries#4335 (commit 06976b3) Increase tolerance for FP16 GEMM tests to handle non-deterministic rounding (#4335) Three tests were failing intermittently with small errors (0.01-1.5%) due to non-deterministic FP16 accumulation order from GPU thread scheduling: - test_ck_tile_batched_gemm - test_ck_tile_grouped_gemm_preshuffle - test_ck_tile_grouped_gemm_multi_d These tests use kbatch=1 (no split-K), so errors are from order-dependent rounding, not atomics. Increased tolerances from 1e-3 to 2e-3 (0.2%) to account for FP16 precision limits while still catching real bugs. - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>	2026-02-07 00:15:34 +00:00
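The order-dependent rounding mentioned in the commit above can be reproduced in miniature on the host. The sketch below is an illustration under stated assumptions, not the CK test code: it uses `float` instead of FP16 (standard C++ before C++23 has no portable half type), and `sum_in_order`, `within_rel_tol`, and the sample arrays are hypothetical names.

```cpp
#include <cstddef>

// Accumulate strictly left to right in float, so the result depends on order.
template <std::size_t N>
constexpr float sum_in_order(const float (&xs)[N])
{
    float acc = 0.0f;
    for(std::size_t i = 0; i < N; ++i)
    {
        acc += xs[i];
    }
    return acc;
}

constexpr float abs_f(float v) { return v < 0.0f ? -v : v; }

// Relative comparison: |actual - expected| <= rtol * max(|actual|, |expected|).
constexpr bool within_rel_tol(float actual, float expected, float rtol)
{
    const float a     = abs_f(actual);
    const float e     = abs_f(expected);
    const float scale = a > e ? a : e;
    return abs_f(actual - expected) <= rtol * scale;
}

// Same three values, two accumulation orders.
constexpr float kCancelFirst[] = {1.0e8f, -1.0e8f, 1.0f}; // large terms cancel, then add 1
constexpr float kAbsorbFirst[] = {1.0e8f, 1.0f, -1.0e8f}; // the 1 is rounded away first
```

The two arrays hold the same numbers, yet the sums differ because the small term is lost when it is added to the large one first. A relative tolerance check such as `within_rel_tol(result, reference, 2e-3f)` accepts a 0.15% deviation that a 1e-3 tolerance would reject, which mirrors the 1e-3 to 2e-3 change described in the message.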
spolifroni-amd	d2f1541976	[rocm-libraries] ROCm/rocm-libraries#4300 (commit 07e9d56) [CK] add inter/intrawave scheduling concept doc ## Proposed changes Adding information about inter/intrawave scheduling	2026-02-07 00:11:11 +00:00
Enrico Degregori	984a3d1828	[rocm-libraries] ROCm/rocm-libraries#4372 (commit 738ffd7) [CK] Workaround blockscale wp test failure ## Motivation Workaround to fix blockscale wp test failure for pipeline v3 ## Submission Checklist - [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-02-07 00:09:58 +00:00
Illia Silin	1ddb38f098	[rocm-libraries] ROCm/rocm-libraries#4375 (commit 45b616b) [CK] fix path for build filter ## Motivation Fix the filter that determines whether CI builds are necessary. ## Technical Details A script checks the file list returned by git diff to see whether any source code was modified. If only documentation was changed, it allows skipping the builds. We make sure to look only at changes in the projects/composablekernel/ folder.	2026-02-06 18:18:14 +00:00
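The filter logic described in the commit above can be sketched as a pure function over the changed-file list. This is a hypothetical illustration, not the actual CI script: the rule for what counts as documentation (`.md`, `.rst`, or a `/docs/` path component) and all names and sample paths are assumptions made for the example.

```cpp
#include <cstddef>
#include <string_view>

constexpr std::string_view kProjectPrefix = "projects/composablekernel/";

constexpr bool starts_with(std::string_view s, std::string_view prefix)
{
    return s.size() >= prefix.size() && s.substr(0, prefix.size()) == prefix;
}

constexpr bool ends_with(std::string_view s, std::string_view suffix)
{
    return s.size() >= suffix.size() && s.substr(s.size() - suffix.size()) == suffix;
}

// Assumed rule: documentation lives under a docs/ folder or is .md/.rst.
constexpr bool is_doc_only(std::string_view path)
{
    return ends_with(path, ".md") || ends_with(path, ".rst") ||
           path.find("/docs/") != std::string_view::npos;
}

// A build is needed if any non-documentation file inside the project changed.
template <std::size_t N>
constexpr bool build_needed(const std::string_view (&changed)[N])
{
    for(std::string_view path : changed)
    {
        if(!starts_with(path, kProjectPrefix))
        {
            continue; // changes outside the project folder are ignored
        }
        if(!is_doc_only(path))
        {
            return true;
        }
    }
    return false;
}

// Illustrative file lists, as git diff --name-only might report them.
constexpr std::string_view kDocsOnly[] = {
    "projects/composablekernel/docs/index.rst",
    "projects/composablekernel/README.md",
    "projects/other/src/main.cpp",
};

constexpr std::string_view kWithSource[] = {
    "projects/composablekernel/docs/index.rst",
    "projects/composablekernel/include/ck/utility/sequence.hpp",
};
```

Restricting the check to the project prefix first is what keeps unrelated changes elsewhere in the super-repo from triggering a CK build.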
Geo Min	41353c8f3c	[rocm-libraries] ROCm/rocm-libraries#4378 (commit d8e2826) [ci] Adding mi350 required group ID After updating mi325 group-id, we are noticing errors for mi350. Tested here for mi350: https://github.com/ROCm/TheRock/actions/runs/21733399385/job/62692971370 Tested here for mi325: https://github.com/ROCm/TheRock/actions/runs/21759203211/job/62778060417 Adding both work properly	2026-02-06 18:00:27 +00:00
Illia Silin	4dd4869fbf	[rocm-libraries] ROCm/rocm-libraries#4361 (commit 37a74ef) [CK] a bunch of CI fixes. ## Motivation Fixing some of the CK CI issues ## Technical Details Fix paths to Dockerfiles and scripts; move codegen tests to a separate stage (they collide with the main build because cmake must be called from the same folder with different options); fix a couple of clang compilation issues with the staging compiler.	2026-02-06 01:07:34 +00:00
Eiden Yoshida	e96beb1f3e	[rocm-libraries] ROCm/rocm-libraries#4352 (commit 3c9beb3) [CK] MICI: Fix git diff in selective_test_filter.py ## Motivation - git diff needs access to reference repo ## Technical Details - mount reference repo path into docker for selective_test_filter.py to access ## Test Plan - tested in MICI ## Test Result - launch_tests.sh ran successfully	2026-02-05 22:57:20 +00:00
Geo Min	58549aa787	[rocm-libraries] ROCm/rocm-libraries#4360 (commit 5aa1f1d) [ci] Updating variable group-id for OSSCI OSSCI migrated mi325s, so need a new groupID Sanity works here: https://github.com/ROCm/TheRock/actions/runs/21723540679/job/62659665907 normal run works here: https://github.com/ROCm/TheRock/actions/runs/21723540679/job/62659791422 I've dabbled with organization variables, however, this does not work for forks so for now, we will do the manual update	2026-02-05 19:02:46 +00:00
Jobbins	344d98781b	[rocm-libraries] ROCm/rocm-libraries#4351 (commit 3b98c98) [composablekernel] fix failure status ## Motivation Pipelines were failing on Math CI status check. ## Technical Details For the success case, we just changed the config in Jenkins to use a proper app token and no code changes were required. However, the failure case would not have worked as coded, so we needed to move that outside of the `rocmnode()` block. ## Test Plan I removed all of the CI in one of the commits to quickly test, and then added it back. Got a successful "success" message and "failure" message produced	2026-02-05 15:57:21 +00:00
Eiden Yoshida	3a02862241	[rocm-libraries] ROCm/rocm-libraries#4349 (commit 9bb7f5c) [CK] MICI: Correct path for build trace script ## Motivation - Corrects path to script due to superrepo migration - Forces all tests to run by default ## Technical Details - now in /projects/composablekernel	2026-02-05 15:56:52 +00:00
Eiden Yoshida	3f42f76b45	[rocm-libraries] ROCm/rocm-libraries#4336 (commit d26a782) [CK] MICI: Use reference repo for checkout operations ## Motivation - Maintain a reference repo on slave nodes that speeds up any clone/checkout operations ## Technical Details - clone a ref repo if it does not exist - update ref repo if it does exist - checkout after ref repo is updated - eliminates double clone ## Test Result - Initial checkouts succeeded	2026-02-05 02:44:29 +00:00
Jeff Huang	7b18f5fed2	[rocm-libraries] ROCm/rocm-libraries#4263 (commit f34aec2) [CK] Add FP8 KV_BLOCKSCALE support for batch prefill Implement per-page K/V quantization for paged attention: - Add KV_BLOCKSCALE enum to BlockAttentionQuantScaleEnum - Use exp2 shift trick to eliminate explicit P scaling overhead - Prefetch physical pages offset for KV cache, overlaps with computations	2026-02-04 23:26:20 +00:00
Illia Silin	62fbda4d1e	[rocm-libraries] ROCm/rocm-libraries#4310 (commit 7f63aa1) CK CI migration. ## Motivation Enable the CK CI after migration from standalone repo. ## Technical Details Modify the jenkinsfile in projects/composablekernel to update the CI workflow. ## Test Plan This is for CK internal testing only. ## Test Result Set up new CK CI pipeline/dashboard. ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.	2026-02-04 17:35:17 +00:00
andrew clark	421b714f13	Adding Additional Failure Patterns for Alerts (#3663) * Added two new failure patterns to detect. Including test function to verify if the patterns are detected * Modifying pattern match to detect docker login failure. Removed passing tests. * Removing passing tests. Modifying docker pattern to detect failure * Removed passing tests * Removing test logging function	2026-02-03 10:23:07 -08:00
Illia Silin	569640dc70	Revert "Implement device grouped gemm fixed nk multi abd for rdna4 (#3619)" (#3705) This reverts commit `301eb5cf08`.	2026-02-03 09:52:14 -08:00
Emily Martins	8cbd09c84a	[CK_TILE] Stream-K Tile Engine Test Config File Generation (#3662) * Stream-K smoke test config file generation This change converts the stream-k smoke tests to use tile engine. Since the m, n, and k values depend on the CU count of a device, the configs are generated during the Configuration Phase. * Compute GEMM reference on GPU * Remove redundant Stream-K tests Removing redundant tests that are now run via tile engine. * Fix relative and absolute tolerance calculation This change updates the Stream-K tile engine interface to ensure that num_wgs_per_tile is propagated and passed into the compare_results function to calculate the rel and abs tolerance. Before, split-k was used, which is incorrect for Stream-K since the split-k value is always 1. * Cleanup imports, types, and other misc items This commit makes the following changes: - Uses Typing module for nested type hints - Uses quotes around cu_count_arg argument in generate_configs.cmake in if statements - Adds explicit include for tuple in test_gemm_streamk_simple.cpp - Adds a type for the tiles argument in argparser to check argument validity * Use CU count as return value for better parsing * Add reduction tests for bf16, fp8, and bf8	2026-02-03 09:12:15 -07:00
Max Podkorytov	3f04d27b68	Remove concrete performance numbers from BUILD_TIME_OPTIMIZATION.md (#3702) Replace specific benchmark numbers with qualitative descriptions since measurements vary across environments and may become outdated. Co-authored-by: Claude <noreply@anthropic.com>	2026-02-03 03:54:18 -07:00
Illia Silin	8b56ffb6ae	Fix one more lifetimebound error. (#3703) * fix staging compiler errors * fix clang format	2026-02-02 18:25:56 -08:00
Bartłomiej Kocot	f2b9b3a3a6	Fix path to ck tile conv fwd instance generator (#3699) * Fix path to ck tile conv fwd instance generator * fixes	2026-02-02 18:07:33 -08:00
Aviral Goel	3e77721755	feat: add split_k support for block scale gemm bquant mode. (#3653) * WIP: add splitk to bquant * feat: add support for bf8i4 and fp8i4 by calculating correct stride for packed data types * chore: remove temporary test script * fix: incorrect tile window length for splitted bq tensor window * chore: improve comments * test: add unit tests to cover bquant splitk functionality * fix: conflict resolution by renaming variables	2026-02-02 14:41:53 -08:00
Zoltán Lakatos	301eb5cf08	Implement device grouped gemm fixed nk multi abd for rdna4 (#3619) * device struct implementation * added xdl grouped multi abd fixed nk testing * wmma implementation fixed * avoid unnecessary device mem allocation and code cleanups * cleanup instances definitions * wmma examples added * code cleanups * fix clang format * typo and compilation fixes related to reference gemm * fix compilation error due to std::remove_cvref_t * added missing hip_check_error includes * correction to example instances * review comments addressed * removed split-k from testing * code formatting --------- Co-authored-by: Zoltán Lakatos <zoltan.lakatos@streamhpc.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2026-02-02 13:58:11 -08:00
Jan Patrick Lehr	069500464d	[Compiler] Addressing new compiler warnings (#3640) * [Compiler] Addressing new compiler warnings Clang enables new lifetime warnings in production and we see build errors due to this with the staging compiler. The attributes added in this PR are suggested by the compiler. However, I'm not very familiar with the code base, so the changes may be incorrect. * Update some more instances * Adds file-level ignores via clang diagnostic pragma The number of instances was large, so I decided to use file-level scope to disable the warning via pragma clang diagnostic ignored. It also showed this warning coming from the gtest dependency. For that, I did add the respective command line flag to the CMake variables. I don't know if this is acceptable or not. * This adds the remaining instances For a build on gfx90a. * fix clang format * Adding couple more instances from gfx1200 build * Fixed another few instances --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2026-02-02 09:39:48 -08:00
ZheWang	e6bcd192d4	Mx fp6 flatmm (#3601) * add fp6 data-type and support sync/async dwordx3 load/store * clang-format * pre-commit * 1st commit * default mnk pass ut * fix a distribution * fix * fix bdram distr * update * pass ut * improve perf * update * clean code * resolve copilot comment * resolve comment * clang-format --------- Co-authored-by: ZheWang <zhewan@amd.com>	2026-02-02 16:04:40 +08:00
Bartłomiej Kocot	1ae83137eb	Enable Grouped Conv Tile Fwd Tests daily (#3680)	2026-01-31 15:55:25 -07:00
Po Yen Chen	8c1788757a	[CK_TILE] Fix incompatible vector type arguments for the intrinsic calls (#3672) * Change call to the intrinsics * fix clang format * Undo changes under include/ck/utility * Use named variable as vector size --------- Co-authored-by: illsilin_amdeng <Illia.Silin@amd.com>	2026-01-30 12:02:49 -08:00

3048 Commits