mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 02:02:46 +00:00
Co-authored-by: Damien Lejeune <damien.lejeune@amd.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
[ROCm/composable_kernel commit: 4216d43da8]
Reduction with CK Tile
This example demonstrates parallel reduction (sum, max, etc.) with the CK Tile programming model. Reduction is a core operation for normalization, statistics, and aggregation in deep learning.
Algorithm and Math
Given a tensor X and a reduction axis, compute:
- Sum: $Y = \sum_i X_i$
- Max: $Y = \max_i X_i$
- Mean: $Y = \frac{1}{N} \sum_i X_i$
- Tilewise Reduction: Each thread block reduces a tile (block) of the input, using shared memory and register accumulation for efficiency.
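As a point of comparison for the formulas above, here is a minimal host-side reference sketch in plain C++. It assumes a row-major M x N matrix reduced along its last axis. The names `ReduceKind` and `reduce_rows` are hypothetical and are not part of the CK Tile API; a reference like this is only useful for checking a GPU kernel's output on small inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not a CK Tile identifier.
enum class ReduceKind { Sum, Max, Mean };

// Reduces each row of a row-major m x n matrix to a single value.
// Assumes n > 0 so that every row has at least one element.
std::vector<float> reduce_rows(const std::vector<float>& x,
                               std::size_t m,
                               std::size_t n,
                               ReduceKind kind)
{
    assert(n > 0 && x.size() == m * n);
    std::vector<float> y(m);
    for(std::size_t r = 0; r < m; ++r)
    {
        const float* row = x.data() + r * n;
        float acc        = row[0]; // start from the first element of the row
        for(std::size_t i = 1; i < n; ++i)
        {
            // Max keeps the larger value; Sum and Mean both accumulate a sum.
            acc = (kind == ReduceKind::Max) ? std::max(acc, row[i]) : acc + row[i];
        }
        // Mean is the sum scaled by 1/N, matching Y = (1/N) * sum_i X_i.
        y[r] = (kind == ReduceKind::Mean) ? acc / static_cast<float>(n) : acc;
    }
    return y;
}
```

Calling `reduce_rows(x, 2, 4, ReduceKind::Mean)` on a 2 x 4 input returns two values, one mean per row.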
Tile Programming Model
- Tiles: Each thread block processes a tile (block) of the input tensor.
- Pipeline: Modular, can be extended for fused reductions or post-processing.
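The tile idea can be illustrated on the host with a two-stage reduction: each tile is reduced to a partial result, then the partials are combined with the same operator. The sketch below is plain C++ and only mimics the structure; `tiled_reduce`, `tile_size`, and the `Op` parameter are hypothetical names, and the actual kernel in `reduce.hpp` may organize the work differently. It assumes the operator is associative and that `identity` is its neutral element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch; none of these names are CK Tile identifiers.
// Stage 1: each tile of up to tile_size elements is reduced to one partial
//          (on a GPU, this is the work a single thread block would do).
// Stage 2: the partial results are combined with the same operator.
template <typename Op>
float tiled_reduce(const std::vector<float>& x, std::size_t tile_size, float identity, Op op)
{
    assert(tile_size > 0);
    std::vector<float> partials;
    for(std::size_t begin = 0; begin < x.size(); begin += tile_size)
    {
        const std::size_t end = std::min(begin + tile_size, x.size());
        float acc             = identity; // per-tile accumulator
        for(std::size_t i = begin; i < end; ++i)
            acc = op(acc, x[i]);
        partials.push_back(acc);
    }

    float result = identity;
    for(float p : partials)
        result = op(result, p);
    return result;
}
```

Passing the operator as a parameter is one way to keep the loop structure independent of the reduction itself: `tiled_reduce(x, 256, 0.f, add)` gives a sum, while using `-std::numeric_limits<float>::infinity()` as the identity with a max operator gives the maximum. For floating-point sums, the result can differ slightly from a single sequential pass because the additions are grouped differently.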
Build & Run
mkdir build && cd build
sh ../script/cmake-ck-dev.sh ../ <arch>
make tile_example_reduce -j
./bin/tile_example_reduce -?
Source Structure
- Kernel: reduce.hpp (tile-programming kernel template)
- Executable: reduce.cpp (argument parsing, kernel launch)
- Build: CMakeLists.txt
Related CK Tile Examples
- 03_gemm: GEMM with tiles
- 04_img2col: im2col transformation
- 06_permute: Permutation with tiles
For details on tile distribution, see include/ck_tile/tile_program/tile_distribution/.