mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-13 17:55:48 +00:00

Client Example: Parallel Reduction (NHWC)

Theory

This client example demonstrates parallel reduction operations over NHWC tensors. Reduction is a fundamental operation in deep learning for computing statistics (such as batch mean/variance), loss aggregation, and normalization.

Mathematical Formulation: Given a tensor X[N, H, W, C] and a reduction axis (e.g., channel C), the reduction produces an output Y[N, H, W] with one value per (n, h, w) position:

Sum: Y_{n,h,w} = \sum_c X_{n,h,w,c}
Max: Y_{n,h,w} = \max_c X_{n,h,w,c}
Mean: Y_{n,h,w} = \frac{1}{C} \sum_c X_{n,h,w,c}
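
The three formulas above can be written as a small host-side reference. This is a minimal sketch, assuming a packed NHWC layout with the channel axis innermost; `ReduceOp` and `reduce_nhwc_c_reference` are hypothetical names chosen for illustration and are not part of the Composable Kernel API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative host-side reference; the names here are hypothetical, not CK API.
enum class ReduceOp { Sum, Max, Mean };

// Reduces the channel axis of a packed NHWC tensor.
// x holds N*H*W*C floats with C innermost; the result holds N*H*W floats.
std::vector<float> reduce_nhwc_c_reference(const std::vector<float>& x,
                                           std::size_t N, std::size_t H,
                                           std::size_t W, std::size_t C,
                                           ReduceOp op)
{
    assert(C > 0 && x.size() == N * H * W * C);
    const std::size_t rows = N * H * W; // one output element per (n, h, w)
    std::vector<float> y(rows);
    for(std::size_t r = 0; r < rows; ++r)
    {
        const float* row = x.data() + r * C; // C contiguous channel values
        float acc = row[0];
        for(std::size_t c = 1; c < C; ++c)
            acc = (op == ReduceOp::Max) ? std::max(acc, row[c]) : acc + row[c];
        // Mean is the sum divided by the channel count.
        y[r] = (op == ReduceOp::Mean) ? acc / static_cast<float>(C) : acc;
    }
    return y;
}
```

For a 1x1x2x4 input holding the values 1 through 8, the sum over C is {10, 26}, the max is {4, 8}, and the mean is {2.5, 6.5}.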

Algorithmic Background:

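Reductions are implemented using parallel tree or segmented reduction algorithms. In a tree reduction, partial results are combined pairwise, so C elements are reduced in about log2(C) parallel steps rather than C sequential ones.
Efficient reductions require coalesced memory access (in NHWC the channel axis is contiguous in memory), synchronization between the threads that share partial results, and sometimes numerically stable accumulation, for example accumulating low-precision inputs in a wider type.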
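
To illustrate the tree pattern, the sketch below simulates it sequentially on the host. It is not the kernel used by this example: `tree_reduce_sum` is a hypothetical name, and each pass of the outer loop stands in for one synchronized step in which all pairs would be combined concurrently on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: sequential simulation of a parallel tree reduction.
// Each pass of the outer loop corresponds to one synchronized step in which
// every "thread" i would combine its pair at the same time on a GPU.
float tree_reduce_sum(std::vector<float> data) // taken by value: used as scratch
{
    assert(!data.empty());
    std::size_t n = data.size(); // number of live partial results
    while(n > 1)
    {
        const std::size_t stride = (n + 1) / 2; // ceil(n / 2) handles odd n
        for(std::size_t i = 0; i + stride < n; ++i)
            data[i] += data[i + stride]; // thread i combines one pair
        n = stride; // a barrier would separate steps in a real kernel
    }
    return data[0];
}
```

The number of live partial results roughly halves at every step, which is where the logarithmic step count comes from. Pairwise combination also tends to accumulate less rounding error than a single running sum.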

How to Run

Prerequisites

Follow the instructions in the main Build Guide section before building and running this example.

Build and run

cd composable_kernel/client_example/26_reduce
mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc ..
make -j

# Example run (reduce over channel dimension)
./reduce_nhwc_c

Source Code Structure

Directory Layout

client_example/26_reduce/
├── reduce_nhwc_c.cpp         # Main client example: reduction over NHWC tensors (channel axis)
└── CMakeLists.txt            # Build configuration for the example

Key Functions

main() (in reduce_nhwc_c.cpp):
Sets up input tensors, configures reduction parameters, launches the reduction kernel, and verifies the result.
Reduction kernel invocation:
Uses the Composable Kernel device API to launch the reduction operation.
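
The host-side bookkeeping around the kernel launch can be sketched as follows. This is an illustration under stated assumptions, not code from reduce_nhwc_c.cpp: `packed_strides` and `all_close` are hypothetical helpers, the tolerances are arbitrary defaults, and the kernel launch itself is omitted because the Composable Kernel call sequence is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; none of these are CK API names.

// Packed (row-major) strides for the given lengths, innermost dimension last.
// For NHWC lengths {N, H, W, C} this yields {H*W*C, W*C, C, 1}.
std::vector<std::size_t> packed_strides(const std::vector<std::size_t>& lengths)
{
    std::vector<std::size_t> strides(lengths.size(), 1);
    for(std::size_t i = lengths.size(); i-- > 1;)
        strides[i - 1] = strides[i] * lengths[i];
    return strides;
}

// Element-wise check of a computed result against a reference, using a
// combined absolute and relative tolerance. NaN differences count as failures.
bool all_close(const std::vector<float>& result,
               const std::vector<float>& reference,
               float rtol = 1e-5f, float atol = 1e-6f)
{
    if(result.size() != reference.size())
        return false;
    for(std::size_t i = 0; i < result.size(); ++i)
    {
        const float diff = std::fabs(result[i] - reference[i]);
        const float tol  = atol + rtol * std::fabs(reference[i]);
        if(!(diff <= tol))
            return false;
    }
    return true;
}
```

A tolerance-based comparison is used instead of exact equality because a parallel reduction may add elements in a different order than a sequential reference, which changes floating-point rounding.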

Additional Details

Supports sum, max, mean, and other reductions over NHWC tensors.
Example parameters can be adjusted in the source for different workloads.

Related Examples

12_reduce: Parallel reduction in the main example directory

Back to Client Examples
