Merge pull request #3078 from spolifroni-amd/spolifroni-amd/cherry-pick-changhelog-changes

updated the changelog for 7.1 and beyond
2026-06-30 11:47:48 +00:00 · 2025-11-04 13:26:22 -08:00
parent 833ae1d051 6490427c89
commit 57d8d66258
1 changed files with 38 additions and 20 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,40 @@

 Documentation for Composable Kernel available at [https://rocm.docs.amd.com/projects/composable_kernel/en/latest/](https://rocm.docs.amd.com/projects/composable_kernel/en/latest/).

+## (Unreleased) Composable Kernel for ROCm
+
+### Added
+
+* Added a compute async pipeline to the CK_TILE universal GEMM on gfx950.
+* Added support for the `pk_int4_t` B tensor type in the CK_TILE weight preshuffle GEMM.
+* Added a new API to load different memory sizes into SGPRs.
+* Added support for B tensor preshuffle in CK_TILE Grouped GEMM.
+* Added a basic copy kernel example and supporting documentation for new CK Tile developers.
+* Added support for `grouped_gemm` kernels to perform Multi-D elementwise operations.
+* Added support for Multiple ABD GEMM.
+* Added benchmarking support for tile engine GEMM Multi D.
+* Added block scaling support in CK_TILE GEMM, allowing flexible use of quantization matrices from either the A or the B operand.
+* Added row-wise and column-wise quantization for CK_TILE GEMM and CK_TILE Grouped GEMM.
+* Added f32 support to FMHA (fwd/bwd).
+* Added tensor-wise quantization for CK_TILE GEMM.
+* Added support for batched contraction kernels.
+* Added a pooling kernel to CK_TILE.
+
+### Changed
+
+* Removed `BlockSize` from `make_kernel` and `CShuffleEpilogueProblem` to support Wave32 in CK_TILE (#2594)
+
+## Composable Kernel 1.1.0 for ROCm 7.1.0
+
+### Added
+
+* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd)
+* Added support for elementwise kernels.
+
+### Upcoming changes
+
+* Non-grouped convolutions are deprecated. Their functionality is supported by grouped convolution.
+
 ## Composable Kernel 1.1.0 for ROCm 7.0.0

 ### Added
@@ -19,26 +53,18 @@ Documentation for Composable Kernel available at [https://rocm.docs.amd.com/proj
 * Added support for Split K for grouped convolution backward data.
 * Added logit soft-capping support for fMHA forward kernels.
 * Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv)
-* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd)
 * Added benchmarking support for tile engine GEMM.
 * Added Ping-pong scheduler support for GEMM operation along the K dimension.
 * Added rotating buffer feature for CK_Tile GEMM.
 * Added int8 support for CK_TILE GEMM.
-* Added support for elementwise kernel.

 ### Optimized

+* Optimized the GEMM multiply-multiply preshuffle and LDS bypass with KGroup packing and a better instruction layout.
+* Added the Vectorize Transpose optimization for CK_TILE.
+* Added asynchronous copy for gfx950.

-* Optimize the gemm multiply multiply preshuffle & lds bypass with Pack of KGroup and better instruction layout. (#2166)
-* Added Vectorize Transpose optimization for CK Tile (#2131)
-* Added the asynchronous copy for gfx950 (#2425)
-
-
-### Fixes
-
-None
-
-### Changes
+### Changed

 * Removed support for gfx940 and gfx941 targets (#1944)
 * Replaced the raw buffer load/store intrinsics with Clang20 built-ins (#1876)
@@ -47,14 +73,6 @@ None
 * Number of instances in instance factory for grouped convolution backward weight NGCHW/GKYXC/NGKHW has been reduced.
 * Number of instances in instance factory for grouped convolution backward data NGCHW/GKYXC/NGKHW has been reduced.

-### Known issues
-
-None
-
-### Upcoming changes
-
-* Non-grouped convolutions are deprecated. All of their functionality is supported by grouped convolution.
-
 ## Composable Kernel 1.1.0 for ROCm 6.1.0

 ### Additions