Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-05-14 02:02:46 +00:00)
updated the changelog with 7.1 and beyond info
[ROCm/composable_kernel commit: 1b95803431]
committed by John Afaganis
parent 0329d71fb9
commit 7c14d97d0e
84
CHANGELOG.md
@@ -2,35 +2,17 @@
Documentation for Composable Kernel available at [https://rocm.docs.amd.com/projects/composable_kernel/en/latest/](https://rocm.docs.amd.com/projects/composable_kernel/en/latest/).

## Composable Kernel 1.2.0 for ROCm 7.0.0
## (Unreleased) Composable Kernel for ROCm

### Added

### Added
* Added a compute async pipeline in the CK TILE universal GEMM on gfx950
* Added support for B Tensor type pk_int4_t in the CK TILE weight preshuffle GEMM.
* Added the new api to load different memory sizes to SGPR.
* Added support for B Tensor Preshuffle in CK TILE Grouped GEMM.
* Added a basic copy kernel example and supporting documentation for new CK Tile developers.
* Added support for grouped_gemm kernels to perform multi_d elementwise operation.
* Added support for bf16, f32, and f16 for 2D and 3D NGCHW grouped convolution backward data
* Added a fully asynchronous HOST (CPU) arguments copy flow for CK grouped GEMM kernels.
* Added support GKCYX layout for grouped convolution forward (NGCHW/GKCYX/NGKHW, number of instances in instance factory for NGCHW/GKYXC/NGKHW has been reduced).
* Added support for GKCYX layout for grouped convolution forward (NGCHW/GKCYX/NGKHW).
* Added support for GKCYX layout for grouped convolution backward weight (NGCHW/GKCYX/NGKHW).
* Added support for GKCYX layout for grouped convolution backward data (NGCHW/GKCYX/NGKHW).
* Added support for Stream-K version of mixed fp8/bf16 GEMM
* Added support for Multiple D GEMM
* Added support for Multiple ABD GEMM
* Added GEMM pipeline for microscaling (MX) FP8/FP6/FP4 data types
* Added support for FP16 2:4 structured sparsity to universal GEMM.
* Added support for Split K for grouped convolution backward data.
* Added logit soft-capping support for fMHA forward kernels.
* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv)
* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd)
* Added benchmarking support for tile engine GEMM.
* Added Ping-pong scheduler support for GEMM operation along the K dimension.
* Added rotating buffer feature for CK_Tile GEMM.
* Added int8 support for CK_TILE GEMM.
* Added support for elementwise kernel.
* Added benchmarking support for tile engine GEMM Multi D.
* Added block scaling support in CK_TILE GEMM, allowing flexible use of quantization matrices from either A or B operands.
* Added the row-wise column-wise quantization for CK_TILE GEMM & CK_TILE Grouped GEMM.
@@ -39,19 +21,50 @@ Documentation for Composable Kernel available at [https://rocm.docs.amd.com/proj
* Added support for batched contraction kernel.
* Added pooling kernel in CK_TILE

### Changed

* Removed `BlockSize` in `make_kernel` and `CShuffleEpilogueProblem` to support Wave32 in CK_TILE (#2594)

## Composable Kernel 1.1.0 for ROCm 7.1.0

### Added

* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd)
* Added support for elementwise kernel.

### Upcoming changes

* Non-grouped convolutions are deprecated. Their functionality is supported by grouped convolution.

## Composable Kernel 1.1.0 for ROCm 7.0.0

### Added

* Added support for bf16, f32, and f16 for 2D and 3D NGCHW grouped convolution backward data
* Added a fully asynchronous HOST (CPU) arguments copy flow for CK grouped GEMM kernels.
* Added support GKCYX layout for grouped convolution forward (NGCHW/GKCYX/NGKHW, number of instances in instance factory for NGCHW/GKYXC/NGKHW has been reduced).
* Added support for GKCYX layout for grouped convolution forward (NGCHW/GKCYX/NGKHW).
* Added support for GKCYX layout for grouped convolution backward weight (NGCHW/GKCYX/NGKHW).
* Added support for GKCYX layout for grouped convolution backward data (NGCHW/GKCYX/NGKHW).
* Added support for Stream-K version of mixed fp8/bf16 GEMM
* Added support for Multiple D GEMM
* Added GEMM pipeline for microscaling (MX) FP8/FP6/FP4 data types
* Added support for FP16 2:4 structured sparsity to universal GEMM.
* Added support for Split K for grouped convolution backward data.
* Added logit soft-capping support for fMHA forward kernels.
* Added support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv)
* Added benchmarking support for tile engine GEMM.
* Added Ping-pong scheduler support for GEMM operation along the K dimension.
* Added rotating buffer feature for CK_Tile GEMM.
* Added int8 support for CK_TILE GEMM.

### Optimized

* Optimize the gemm multiply multiply preshuffle & lds bypass with Pack of KGroup and better instruction layout.
* Added Vectorize Transpose optimization for CK Tile
* Added the asynchronous copy for gfx950

* Optimize the gemm multiply multiply preshuffle & lds bypass with Pack of KGroup and better instruction layout. (#2166)
* Added Vectorize Transpose optimization for CK Tile (#2131)
* Added the asynchronous copy for gfx950 (#2425)

### Fixes

None

### Changes
### Changed

* Removed support for gfx940 and gfx941 targets (#1944)
* Replaced the raw buffer load/store intrinsics with Clang20 built-ins (#1876)
@@ -59,15 +72,6 @@ None
* Number of instances in instance factory for grouped convolution forward NGCHW/GKYXC/NGKHW has been reduced.
* Number of instances in instance factory for grouped convolution backward weight NGCHW/GKYXC/NGKHW has been reduced.
* Number of instances in instance factory for grouped convolution backward data NGCHW/GKYXC/NGKHW has been reduced.
* Removed `BlockSize` in `make_kernel` and `CShuffleEpilogueProblem` to support Wave32 in CK_TILE (#2594)

### Known issues

None

### Upcoming changes

* Non-grouped convolutions are deprecated. All of their functionality is supported by grouped convolution.

## Composable Kernel 1.1.0 for ROCm 6.1.0