mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-05 22:22:27 +00:00
54 lines
1.4 KiB
Markdown
54 lines
1.4 KiB
Markdown
# Reduction with CK Tile
|
|
|
|
This example demonstrates parallel reduction (sum, max, etc.) using the CK Tile programming model, a core operation for normalization, statistics, and aggregation in deep learning.
|
|
|
|
---
|
|
|
|
## Algorithm and Math
|
|
|
|
Given a tensor $X$ and a reduction axis, compute:
|
|
- **Sum**: $Y = \sum_i X_i$
|
|
- **Max**: $Y = \max_i X_i$
|
|
- **Mean**: $Y = \frac{1}{N} \sum_i X_i$
|
|
|
|
- **Tilewise Reduction**: Each thread block reduces a tile (block) of the input, using shared memory and register accumulation for efficiency.
|
|
|
|
---
|
|
|
|
## Tile Programming Model
|
|
|
|
- **Tiles**: Each thread block processes a tile (block) of the input tensor.
|
|
- **Pipeline**: Modular, can be extended for fused reductions or post-processing.
|
|
|
|
---
|
|
|
|
## Build & Run
|
|
|
|
```bash
|
|
mkdir build && cd build
|
|
sh ../script/cmake-ck-dev.sh ../ <arch>
|
|
make tile_example_reduce -j
|
|
./bin/tile_example_reduce -?
|
|
```
|
|
|
|
---
|
|
|
|
## Source Structure
|
|
|
|
- **Kernel**: `reduce.hpp` (tile-programming kernel template)
|
|
- **Executable**: `reduce.cpp` (argument parsing, kernel launch)
|
|
- **Build**: `CMakeLists.txt`
|
|
|
|
---
|
|
|
|
## Related CK Tile Examples
|
|
|
|
- [03_gemm](../03_gemm/README.md): GEMM with tiles
|
|
- [04_img2col](../04_img2col/README.md): im2col transformation
|
|
- [06_permute](../06_permute/README.md): Permutation with tiles
|
|
|
|
For distribution, see `include/ck_tile/tile_program/tile_distribution/`.
|
|
|
|
---
|
|
[Back to CK Tile Examples](../README.md)
|