mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-15 18:42:06 +00:00
65 lines
2.0 KiB
Markdown
# Client Example: Parallel Reduction (NHWC)

## Theory

This client example demonstrates **parallel reduction operations** over NHWC tensors. Reduction is a fundamental operation in deep learning for computing statistics (such as batch mean/variance), loss aggregation, and normalization.

**Mathematical Formulation:**
Given a tensor $X[N, H, W, C]$ and a reduction axis (e.g., channel $C$):
- **Sum**: $Y_{n,h,w} = \sum_c X_{n,h,w,c}$
- **Max**: $Y_{n,h,w} = \max_c X_{n,h,w,c}$
- **Mean**: $Y_{n,h,w} = \frac{1}{C} \sum_c X_{n,h,w,c}$

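The three formulas above can be sketched as a host-side reference in plain C++. This is a minimal illustration, not the Composable Kernel implementation: the function name `reduce_over_c` and the `ReduceOp` enum are hypothetical, the tensor is assumed to be stored contiguously in NHWC order (so element $(n, h, w, c)$ sits at offset $((n \cdot H + h) \cdot W + w) \cdot C + c$), and $C > 0$ is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical names for illustration; not part of the Composable Kernel API.
enum class ReduceOp { Sum, Max, Mean };

// Reduces a contiguous NHWC tensor over its channel axis (assumes C > 0).
// Returns N*H*W values, one per (n, h, w) position.
std::vector<float> reduce_over_c(const std::vector<float>& x,
                                 std::size_t N, std::size_t H,
                                 std::size_t W, std::size_t C,
                                 ReduceOp op)
{
    std::vector<float> y(N * H * W);
    for(std::size_t i = 0; i < N * H * W; ++i)
    {
        // In NHWC layout the C channels of position i are adjacent in memory.
        const float* row = x.data() + i * C;
        float acc = (op == ReduceOp::Max) ? std::numeric_limits<float>::lowest() : 0.0f;
        for(std::size_t c = 0; c < C; ++c)
            acc = (op == ReduceOp::Max) ? std::max(acc, row[c]) : acc + row[c];
        // Mean is the sum divided by the number of reduced elements.
        y[i] = (op == ReduceOp::Mean) ? acc / static_cast<float>(C) : acc;
    }
    return y;
}
```

Because the reduced axis is the innermost one in NHWC, each output element reads one contiguous run of `C` values, which is why the inner loop walks a plain pointer.
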
**Algorithmic Background:**
- Reductions are implemented using parallel tree or segmented reduction algorithms.
- Efficient reductions require careful memory access (for example, coalesced loads), synchronization between the threads that combine partial results, and sometimes numerically stable algorithms.

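To make the tree-reduction idea concrete, the sketch below emulates on the host the pairwise combining pattern that a GPU workgroup typically performs in shared memory: at each step, element `i` is combined with element `i + stride`, and the stride halves until one value remains. It is a sequential illustration under assumptions: the name `tree_reduce_sum` is hypothetical, and the Composable Kernel kernels may organize the work differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; not part of the Composable Kernel API.
// Sums `values` with a pairwise (tree) combining pattern.
float tree_reduce_sum(std::vector<float> values)
{
    if(values.empty())
        return 0.0f;

    // Pad with the identity of addition up to the next power of two.
    std::size_t n = 1;
    while(n < values.size())
        n *= 2;
    values.resize(n, 0.0f);

    // Halve the active range each step: log2(n) steps in total.
    for(std::size_t stride = n / 2; stride > 0; stride /= 2)
    {
        // On a GPU, each i would be handled by a separate thread.
        for(std::size_t i = 0; i < stride; ++i)
            values[i] += values[i + stride];
        // A workgroup barrier would go here before the next step.
    }
    return values[0];
}
```

Besides exposing parallelism, pairwise combining tends to accumulate less floating-point rounding error than a left-to-right running sum, which is one reason it is often the default choice for large sums.
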
## How to Run

### Prerequisites

Please follow the instructions in the main [Build Guide](../../README.md#building-ck) section as a prerequisite to building and running this example.

### Build and run
```bash
cd composable_kernel/client_example/26_reduce
mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
make -j

# Example run (reduce over channel dimension)
./reduce_nhwc_c
```

## Source Code Structure

### Directory Layout
```
client_example/26_reduce/
├── reduce_nhwc_c.cpp         # Main client example: reduction over NHWC tensors (channel axis)
└── CMakeLists.txt            # Build configuration for the example
```

### Key Functions

- **main()** (in `reduce_nhwc_c.cpp`):
  Sets up input tensors, configures reduction parameters, launches the reduction kernel, and verifies the result.
- **Reduction kernel invocation**:
  Uses the Composable Kernel device API to launch the reduction operation.

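The verification step mentioned above is commonly done by comparing the device output against a host reference within a tolerance, since a parallel floating-point sum may differ from a sequential one in the last bits. The following is a hedged sketch of such a check. `all_close` and its default tolerances are hypothetical and are not taken from `reduce_nhwc_c.cpp`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration; the tolerances are assumed values.
// Returns true when every element satisfies |out - ref| <= atol + rtol * |ref|.
bool all_close(const std::vector<float>& out,
               const std::vector<float>& ref,
               float rtol = 1e-5f,
               float atol = 1e-6f)
{
    if(out.size() != ref.size())
        return false;
    for(std::size_t i = 0; i < out.size(); ++i)
    {
        // Written as a negated <= so that NaN values are rejected as well.
        if(!(std::fabs(out[i] - ref[i]) <= atol + rtol * std::fabs(ref[i])))
            return false;
    }
    return true;
}
```
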
---

## Additional Details

- Supports sum, max, mean, and other reductions over NHWC tensors.
- Example parameters can be adjusted in the source for different workloads.

---

## Related Examples

- [12_reduce](../../example/12_reduce/README.md): Parallel reduction in the main example directory

---
[Back to Client Examples](../README.md)