# Reduction with CK Tile

This example demonstrates parallel reduction (sum, max, etc.) using the CK Tile programming model. Reduction is a core operation for normalization, statistics, and aggregation in deep learning.

---

## Algorithm and Math

Given a tensor $X$ and a reduction axis with $N$ elements indexed by $i$, compute:

- **Sum**: $Y = \sum_i X_i$
- **Max**: $Y = \max_i X_i$
- **Mean**: $Y = \frac{1}{N} \sum_i X_i$

- **Tilewise Reduction**: Each thread block reduces one tile of the input, using shared memory and register accumulation for efficiency.
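
The tilewise idea above can be illustrated on the CPU with plain C++. This is only a minimal sketch under stated assumptions, not the CK Tile kernel: `kTileSize`, `reduce_tile`, `reduce_tiled`, and the `tiled_*` helpers are hypothetical names introduced here, and the sequential loop over tiles stands in for thread blocks that would run in parallel on the GPU. It relies on the reduction operator being associative, so that per-tile partial results can be combined into the same answer as a single pass, and on `init` being the identity of the operator (0 for sum, the lowest float for max).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical tile size, chosen only for illustration.
constexpr std::size_t kTileSize = 4;

// Reduce one tile [begin, end) into a single partial result.
// The local accumulator plays the role of register accumulation.
template <typename Op>
float reduce_tile(const std::vector<float>& x, std::size_t begin, std::size_t end,
                  float init, Op op)
{
    float acc = init;
    for (std::size_t i = begin; i < end; ++i)
        acc = op(acc, x[i]);
    return acc;
}

// Reduce the whole input by combining per-tile partial results.
// `init` must be the identity element of `op`.
template <typename Op>
float reduce_tiled(const std::vector<float>& x, float init, Op op)
{
    float result = init;
    for (std::size_t begin = 0; begin < x.size(); begin += kTileSize)
    {
        const std::size_t end = std::min(begin + kTileSize, x.size());
        result = op(result, reduce_tile(x, begin, end, init, op));
    }
    return result;
}

inline float tiled_sum(const std::vector<float>& x)
{
    return reduce_tiled(x, 0.0f, [](float a, float b) { return a + b; });
}

inline float tiled_max(const std::vector<float>& x)
{
    return reduce_tiled(x, std::numeric_limits<float>::lowest(),
                        [](float a, float b) { return std::max(a, b); });
}

// Mean is the sum followed by a division by N.
inline float tiled_mean(const std::vector<float>& x)
{
    assert(!x.empty());
    return tiled_sum(x) / static_cast<float>(x.size());
}
```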

---

## Tile Programming Model

- **Tiles**: Each thread block processes one tile of the input tensor.
- **Pipeline**: The reduction pipeline is modular and can be extended for fused reductions or post-processing.
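
To make "fused reductions or post-processing" concrete, the following CPU-side sketch accumulates two reductions (sum and sum of squares) in one pass over the data and then applies a post-processing step that turns them into mean and variance. It is an illustration only: `Moments`, `MeanVar`, `fused_moments`, and `finalize` are hypothetical names, and none of this is the CK Tile pipeline API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Partial results carried through the reduction.
struct Moments
{
    float sum    = 0.0f;
    float sum_sq = 0.0f;
};

// Final values produced by the post-processing step.
struct MeanVar
{
    float mean;
    float var;
};

// Fused reduction: a single pass over the input updates both accumulators.
inline Moments fused_moments(const std::vector<float>& x)
{
    Moments m;
    for (float v : x)
    {
        m.sum += v;
        m.sum_sq += v * v;
    }
    return m;
}

// Post-processing: convert the reduced moments into mean and
// (population) variance, using Var = E[x^2] - (E[x])^2.
inline MeanVar finalize(const Moments& m, std::size_t n)
{
    assert(n > 0);
    const float inv_n = 1.0f / static_cast<float>(n);
    const float mean  = m.sum * inv_n;
    return MeanVar{mean, m.sum_sq * inv_n - mean * mean};
}
```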

---

## Build & Run

```bash
# Run from the root of the composable_kernel checkout
mkdir build && cd build
# Replace <arch> with the target GPU architecture (for example gfx90a or gfx942)
sh ../script/cmake-ck-dev.sh ../ <arch>
make tile_example_reduce -j
# Print the supported command-line arguments
./bin/tile_example_reduce -?
```

---

## Source Structure

- **Kernel**: `reduce.hpp` (tile-programming kernel template)
- **Executable**: `reduce.cpp` (argument parsing, kernel launch)
- **Build**: `CMakeLists.txt`

---

## Related CK Tile Examples

- [03_gemm](../03_gemm/README.md): GEMM with tiles
- [04_img2col](../04_img2col/README.md): im2col transformation
- [06_permute](../06_permute/README.md): Permutation with tiles

For details on tile distribution, see `include/ck_tile/tile_program/tile_distribution/`.

---

[Back to CK Tile Examples](../README.md)