# Reduction with CK Tile

This example demonstrates parallel reduction (sum, max, etc.) using the CK Tile programming model. Reduction is a core operation for normalization, statistics, and aggregation in deep learning.

---

## Algorithm and Math

Given a tensor $X$ and a reduction axis, compute:

- **Sum**: $Y = \sum_i X_i$
- **Max**: $Y = \max_i X_i$
- **Mean**: $Y = \frac{1}{N} \sum_i X_i$

Here $i$ runs over the $N$ elements along the reduction axis, and $Y$ keeps the remaining axes.

- **Tilewise Reduction**: Each thread block reduces one tile of the input, using register accumulation and shared memory for efficiency.

---

## Tile Programming Model

- **Tiles**: Each thread block processes one tile of the input tensor.
- **Pipeline**: The pipeline is modular and can be extended for fused reductions or post-processing.

---

## Build & Run

```bash
mkdir build && cd build
sh ../script/cmake-ck-dev.sh ../
make tile_example_reduce -j
./bin/tile_example_reduce -?
```

---

## Source Structure

- **Kernel**: `reduce.hpp` (tile-programming kernel template)
- **Executable**: `reduce.cpp` (argument parsing, kernel launch)
- **Build**: `CMakeLists.txt`

---

## Related CK Tile Examples

- [03_gemm](../03_gemm/README.md): GEMM with tiles
- [04_img2col](../04_img2col/README.md): im2col transformation
- [06_permute](../06_permute/README.md): Permutation with tiles

For how tiles are distributed across threads, see `include/ck_tile/tile_program/tile_distribution/`.

---

[Back to CK Tile Examples](../README.md)
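As a host-side reference for the tilewise reduction described under Algorithm and Math, the sketch below reduces each row of a row-major `[M, N]` tensor along its last axis. It assumes a CPU-only setting. `SumOp`, `MaxOp`, `tile_reduce`, `tile_mean`, and the tile size of 4 are all hypothetical and are not the API in `reduce.hpp`. Each tile is reduced into one partial accumulator, which stands in for one thread block's work, and the per-tile partials are then combined. In a real GPU kernel, the inner loop would instead be split across threads.

```cpp
// CPU-only sketch of a tilewise reduction along the last axis.
// All names here are hypothetical; this is not the CK Tile reduce API.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// A reduction op supplies an identity value and a binary combine step.
struct SumOp {
    static float identity() { return 0.0f; }
    static float combine(float a, float b) { return a + b; }
};

struct MaxOp {
    static float identity() { return -std::numeric_limits<float>::infinity(); }
    static float combine(float a, float b) { return std::max(a, b); }
};

// Reduce each of the M rows of x (row-major, length N per row) to one value.
// Each row is split into tiles of kTileSize elements. Each tile is reduced
// into a local accumulator (standing in for one thread block's work), and
// the per-tile partial results are then combined into the final output.
template <typename Op, std::size_t kTileSize = 4>
std::vector<float> tile_reduce(const std::vector<float>& x, std::size_t M, std::size_t N) {
    assert(x.size() == M * N);
    std::vector<float> y(M, Op::identity());
    for (std::size_t m = 0; m < M; ++m) {
        const float* row = x.data() + m * N;
        std::vector<float> partials;  // one partial result per tile
        for (std::size_t start = 0; start < N; start += kTileSize) {
            const std::size_t end = std::min(start + kTileSize, N);
            float acc = Op::identity();  // "register" accumulator for this tile
            for (std::size_t i = start; i < end; ++i) {
                acc = Op::combine(acc, row[i]);
            }
            partials.push_back(acc);
        }
        // Second stage: combine the per-tile partials for this row.
        for (float p : partials) {
            y[m] = Op::combine(y[m], p);
        }
    }
    return y;
}

// Mean = Sum / N along the reduction axis.
std::vector<float> tile_mean(const std::vector<float>& x, std::size_t M, std::size_t N) {
    assert(N > 0);
    std::vector<float> y = tile_reduce<SumOp>(x, M, N);
    for (float& v : y) {
        v /= static_cast<float>(N);
    }
    return y;
}
```

For example, with the 2×6 input `{1, 2, 3, 4, 5, 6, -1, 7, 0, 2, 9, 3}`, each row splits into a tile of 4 elements and a tile of 2. `tile_reduce<SumOp>(x, 2, 6)` returns `{21, 20}`, `tile_reduce<MaxOp>(x, 2, 6)` returns `{6, 9}`, and `tile_mean(x, 2, 6)[0]` is `3.5`. Separating the identity value from the combine step lets sum and max share the same two-stage structure, and mean reuses sum with a final scale by $1/N$.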