composable_kernel/client_example/26_reduce
Aviral Goel c8563f2101 chore(copyright): update copyright header for test directory (#3252)
* chore(copyright): update copyright header for test directory

* chore(copyright): update copyright header for client_example directory

2025-11-20 20:36:57 -05:00

Client Example: Parallel Reduction (NHWC)

Theory

This client example demonstrates parallel reduction operations over NHWC tensors. Reduction is a fundamental operation in deep learning for computing statistics (such as batch mean/variance), loss aggregation, and normalization.

Mathematical Formulation: Given a tensor X[N, H, W, C] and a reduction axis (e.g., channel C):

  • Sum: Y_{n,h,w} = \sum_c X_{n,h,w,c}
  • Max: Y_{n,h,w} = \max_c X_{n,h,w,c}
  • Mean: Y_{n,h,w} = \frac{1}{C} \sum_c X_{n,h,w,c}
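The three formulas above can be checked on the host with a small reference routine. The following is a minimal sketch, not the example's actual code: `ReduceOp` and `reduce_over_c` are hypothetical names, and the tensor is assumed to be a packed NHWC buffer of `float` in which element (n, h, w, c) sits at offset ((n*H + h)*W + w)*C + c.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference; not part of the Composable Kernel API.
enum class ReduceOp { Sum, Max, Mean };

// Reduces a packed NHWC tensor over its channel axis C.
// Returns a packed NHW tensor with one value per (n, h, w) position.
std::vector<float> reduce_over_c(const std::vector<float>& x,
                                 std::size_t N, std::size_t H,
                                 std::size_t W, std::size_t C,
                                 ReduceOp op)
{
    assert(C > 0 && x.size() == N * H * W * C);
    const std::size_t rows = N * H * W; // C is innermost, so each row is contiguous
    std::vector<float> y(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = x.data() + r * C;
        float acc = row[0];
        for (std::size_t c = 1; c < C; ++c) {
            // Max keeps the running maximum; Sum and Mean accumulate a sum.
            acc = (op == ReduceOp::Max) ? std::max(acc, row[c]) : acc + row[c];
        }
        // Mean divides the accumulated sum by the channel count.
        y[r] = (op == ReduceOp::Mean) ? acc / static_cast<float>(C) : acc;
    }
    return y;
}
```

Because C is the innermost dimension in NHWC, reducing over it turns the tensor into N*H*W independent contiguous rows, which is why the sketch can treat the input as a 2D array.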

Algorithmic Background:

  • Reductions are implemented using parallel tree or segmented reduction algorithms.
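  • Efficient reductions depend on coalesced memory access, synchronization between the threads that combine partial results, and, for floating-point sums, numerically stable accumulation (for example, accumulating in a wider type than the input).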
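To make the tree-reduction idea concrete, here is a minimal CPU model of the stride-halving pattern. It is an illustrative sketch only: `tree_reduce_sum` is a hypothetical name, the inner loop stands in for threads that would run in parallel on a GPU, and it is not how the Composable Kernel reduction is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of a block-level tree reduction (sum).
float tree_reduce_sum(std::vector<float> buf)
{
    if (buf.empty()) return 0.0f;

    // Pad to the next power of two with the identity element of addition.
    std::size_t n = 1;
    while (n < buf.size()) n *= 2;
    buf.resize(n, 0.0f);
    assert((n & (n - 1)) == 0); // n is a power of two

    // Each pass halves the number of active elements.
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // On a GPU, every i would be handled by its own thread, and a
        // barrier would separate one stride from the next.
        for (std::size_t i = 0; i < stride; ++i) {
            buf[i] += buf[i + stride];
        }
    }
    return buf[0];
}
```

For an input of length n, the loop runs about log2(n) passes instead of n - 1 sequential additions. Pairwise combination like this also tends to accumulate less floating-point rounding error than a left-to-right sum.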

How to Run

Prerequisites

Follow the instructions in the main Build Guide section before building and running this example.

Build and run

cd composable_kernel/client_example/26_reduce
mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
make -j

# Example run (reduce over channel dimension)
./reduce_nhwc_c

Source Code Structure

Directory Layout

client_example/26_reduce/
├── reduce_nhwc_c.cpp         # Main client example: reduction over NHWC tensors (channel axis)
└── CMakeLists.txt            # Build configuration for the example

Key Functions

  • main() (in reduce_nhwc_c.cpp):
    Sets up input tensors, configures reduction parameters, launches the reduction kernel, and verifies the result.
  • Reduction kernel invocation:
    Uses the Composable Kernel device API to launch the reduction operation.
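The verification step mentioned for main() usually means comparing the device output against a host reference within a tolerance. The sketch below shows one way that check could look. `all_close` and its default tolerances are assumptions made for illustration and are not taken from reduce_nhwc_c.cpp.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true when every element of `got` is within
// atol + rtol * |ref| of the matching element of `ref`.
// The default tolerances are illustrative, not values used by the example.
bool all_close(const std::vector<float>& got, const std::vector<float>& ref,
               float rtol = 1e-5f, float atol = 1e-6f)
{
    if (got.size() != ref.size()) return false;
    for (std::size_t i = 0; i < got.size(); ++i) {
        const float diff = std::fabs(got[i] - ref[i]);
        // Written as a negated <= so that a NaN difference also fails.
        if (!(diff <= atol + rtol * std::fabs(ref[i]))) return false;
    }
    return true;
}
```

A tolerance is needed because a parallel reduction adds elements in a different order than a sequential host loop, so floating-point sums can differ in the last few bits.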

Additional Details

  • Supports sum, max, mean, and other reductions over NHWC tensors.
  • Example parameters can be adjusted in the source for different workloads.

Related Examples

  • 12_reduce: Parallel reduction in the main example directory
