Reduction with CK Tile

This example demonstrates parallel reduction (sum, max, etc.) with the CK Tile programming model. Reduction is a core operation for normalization, statistics, and aggregation in deep learning.


Algorithm and Math

Given a tensor X and a reduction axis of length N, with i indexing the elements along that axis, compute:

  • Sum: Y = \sum_i X_i

  • Max: Y = \max_i X_i

  • Mean: Y = \frac{1}{N} \sum_i X_i

  • Tilewise Reduction: Each thread block reduces one tile (block) of the input, accumulating in registers and using shared memory to combine partial results.
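
The formulas above, together with the tilewise scheme, can be illustrated with a host-only C++ sketch. It is not the CK Tile kernel and uses no CK Tile API: `SumOp`, `MaxOp`, `reduce_rows_tiled`, and `mean_rows_tiled` are hypothetical names chosen for this illustration, and the input is assumed to be a row-major 2D matrix reduced along its columns. The inner loop plays the role of per-tile accumulation, and the outer combine step relies on the reduction operator being associative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reduction operators (not CK Tile types). Each one provides
// the identity element of the operation and a binary combine step.
struct SumOp
{
    static float identity() { return 0.0f; }
    float operator()(float a, float b) const { return a + b; }
};

struct MaxOp
{
    static float identity() { return -std::numeric_limits<float>::infinity(); }
    float operator()(float a, float b) const { return std::max(a, b); }
};

// Reduce every row of a row-major (rows x cols) matrix along its columns.
// The columns are walked in tiles of tile_cols elements: each tile is folded
// into a partial result, and the partial results are folded into the row output.
template <typename Op>
std::vector<float> reduce_rows_tiled(const std::vector<float>& x,
                                     std::size_t rows,
                                     std::size_t cols,
                                     std::size_t tile_cols,
                                     Op op = Op{})
{
    assert(x.size() == rows * cols);
    assert(tile_cols > 0);

    std::vector<float> y(rows, Op::identity());
    for(std::size_t r = 0; r < rows; ++r)
    {
        for(std::size_t c0 = 0; c0 < cols; c0 += tile_cols)
        {
            const std::size_t c1 = std::min(c0 + tile_cols, cols);
            float partial        = Op::identity(); // per-tile accumulator
            for(std::size_t c = c0; c < c1; ++c)
                partial = op(partial, x[r * cols + c]);
            y[r] = op(y[r], partial); // combine partial results across tiles
        }
    }
    return y;
}

// Mean = (1 / N) * sum, where N is the length of the reduced axis (cols).
inline std::vector<float> mean_rows_tiled(const std::vector<float>& x,
                                          std::size_t rows,
                                          std::size_t cols,
                                          std::size_t tile_cols)
{
    assert(cols > 0);
    std::vector<float> y = reduce_rows_tiled<SumOp>(x, rows, cols, tile_cols);
    for(float& v : y)
        v /= static_cast<float>(cols);
    return y;
}
```

For example, `reduce_rows_tiled<MaxOp>(x, rows, cols, 64)` would return one maximum per row. On a GPU the tiles would be processed by separate threads or thread blocks rather than by a sequential loop, so this sketch only mirrors the arithmetic, not the parallel execution.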


Tile Programming Model

  • Tiles: Each thread block processes one tile (block) of the input tensor.
  • Pipeline: The kernel pipeline is modular and can be extended with fused reductions or post-processing steps.
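
To make the idea of a modular pipeline more concrete, here is a minimal host-side C++ sketch. It assumes a pipeline built from three pluggable stages: an elementwise pre-operation, a binary reduction, and a post-processing step. The names `reduce_pipeline`, `l2_norm`, and `abs_max` are hypothetical and do not come from CK Tile; the actual kernel composes its stages through its own templates in reduce.hpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical three-stage pipeline (not a CK Tile API):
//   pre    : applied to every element before it is reduced
//   reduce : associative binary operation; init must be its identity element
//   post   : applied once to the final reduced value
template <typename PreOp, typename ReduceOp, typename PostOp>
float reduce_pipeline(const std::vector<float>& x,
                      std::size_t tile_size,
                      float init,
                      PreOp pre,
                      ReduceOp reduce,
                      PostOp post)
{
    assert(tile_size > 0);

    float acc = init;
    for(std::size_t t0 = 0; t0 < x.size(); t0 += tile_size)
    {
        const std::size_t t1 = std::min(t0 + tile_size, x.size());
        float partial        = init; // per-tile accumulator
        for(std::size_t i = t0; i < t1; ++i)
            partial = reduce(partial, pre(x[i]));
        acc = reduce(acc, partial); // combine the tile result
    }
    return post(acc);
}

// L2 norm: square each element, sum, then take the square root.
inline float l2_norm(const std::vector<float>& x, std::size_t tile_size)
{
    return reduce_pipeline(
        x,
        tile_size,
        0.0f,
        [](float v) { return v * v; },
        [](float a, float b) { return a + b; },
        [](float s) { return std::sqrt(s); });
}

// Absolute maximum: take |x| of each element, then reduce with max.
// 0 is a valid identity here because every pre-processed value is >= 0.
inline float abs_max(const std::vector<float>& x, std::size_t tile_size)
{
    return reduce_pipeline(
        x,
        tile_size,
        0.0f,
        [](float v) { return std::fabs(v); },
        [](float a, float b) { return std::max(a, b); },
        [](float m) { return m; });
}
```

Swapping any one of the three callables changes the computed quantity without touching the tiling loop, which is the property the pipeline bullet describes.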

Build & Run

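# Run from the root of the repository checkout
mkdir build && cd build
# Replace <arch> with the target GPU architecture (for example gfx90a or gfx942)
sh ../script/cmake-ck-dev.sh ../ <arch>
make tile_example_reduce -j
# Print the available command-line options
./bin/tile_example_reduce -?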

Source Structure

  • Kernel: reduce.hpp (tile-programming kernel template)
  • Executable: reduce.cpp (argument parsing, kernel launch)
  • Build: CMakeLists.txt

For the tile distribution code, see include/ck_tile/tile_program/tile_distribution/.

