mirror of https://github.com/ROCm/composable_kernel.git synced 2026-04-19 22:39:03 +00:00

Files

Yashvardhan Agarwal ea10a78203 [ck_tile] refactor reduce kernel (#3257)

* refactor reduce kernel

- Rename Reduce kernel as per convention

- Move kept_dim and reduce_dims from runtime to compile-time parameters

- Update Reduce2dProblem template to include KeptDim, ReduceDims, and
Rank

- Remove the IsSupportedArgument validation function, which is no longer
necessary: the tensor view and descriptor are now created without
GuaranteedLastDimensionVectorStride, which removes the bounds enforced
earlier. The vector size is still calculated and used.

- Update reduce example to demonstrate NCHW->NHW reduction with
non-contiguous support

- Update tests

The kernel now handles both contiguous and non-contiguous memory layouts.

* fix compile errors

2025-12-17 21:46:08 +02:00

CMakeLists.txt

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

README.md

[DOCS] Documentation Addition (Readme updates) (#2495)

2025-10-16 03:10:57 -07:00

reduce.cpp

[ck_tile] refactor reduce kernel (#3257)

2025-12-17 21:46:08 +02:00

README.md

Reduction with CK Tile

This example demonstrates parallel reduction (sum, max, etc.) using the CK Tile programming model. Reduction is a core operation for normalization, statistics, and aggregation in deep learning.

Algorithm and Math

Given a tensor X and a reduction axis, compute one of the following, where i indexes the N elements along that axis:

Sum: $Y = \sum_i X_i$
Max: $Y = \max_i X_i$
Mean: $Y = \frac{1}{N} \sum_i X_i$
Tilewise Reduction: Each thread block reduces a tile (block) of the input, using shared memory and register accumulation for efficiency.
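
As a point of comparison for the formulas above, here is a minimal host-side reference in plain C++. It is only a sketch, assuming a row-major [rows x cols] tensor reduced along its last axis. The names reduce_last_axis, reduce_sum, reduce_max, and reduce_mean are hypothetical helpers written for this illustration and are not part of CK Tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference helper (not CK Tile API): reduces the last axis of a
// row-major [rows x cols] tensor with a binary op whose identity is `init`.
template <typename T, typename ReduceOp>
std::vector<T> reduce_last_axis(
    const std::vector<T>& x, std::size_t rows, std::size_t cols, T init, ReduceOp op)
{
    assert(x.size() == rows * cols);
    std::vector<T> y(rows, init);
    for(std::size_t r = 0; r < rows; ++r)
        for(std::size_t i = 0; i < cols; ++i)
            y[r] = op(y[r], x[r * cols + i]);
    return y;
}

// Sum: Y = sum_i X_i
inline std::vector<float>
reduce_sum(const std::vector<float>& x, std::size_t rows, std::size_t cols)
{
    return reduce_last_axis(x, rows, cols, 0.0f, [](float a, float b) { return a + b; });
}

// Max: Y = max_i X_i
inline std::vector<float>
reduce_max(const std::vector<float>& x, std::size_t rows, std::size_t cols)
{
    return reduce_last_axis(x,
                            rows,
                            cols,
                            std::numeric_limits<float>::lowest(),
                            [](float a, float b) { return std::max(a, b); });
}

// Mean: Y = (1 / N) * sum_i X_i, with N = cols
inline std::vector<float>
reduce_mean(const std::vector<float>& x, std::size_t rows, std::size_t cols)
{
    assert(cols > 0);
    std::vector<float> y = reduce_sum(x, rows, cols);
    for(float& v : y)
        v /= static_cast<float>(cols);
    return y;
}
```

A reference like this can be handy for checking a GPU result on small inputs, since sum, max, and mean differ only in the binary op, its identity value, and an optional final scaling.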

Tile Programming Model

Tiles: Each thread block processes a tile (block) of the input tensor.
Pipeline: Modular, can be extended for fused reductions or post-processing.
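
The tile idea can be modeled on the host without any GPU code. The sketch below is an assumption-laden illustration, not the CK Tile kernel: it splits a 1D input into fixed-size tiles, reduces each tile into its own partial (standing in for the per-block register and shared-memory accumulation), combines the partials, and then applies a post-processing functor to mimic a fused step. The function tilewise_reduce and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a tilewise reduction (not CK Tile API).
//   op   : associative binary reduction, with identity `init`
//   post : functor applied once to the final value (fused post-processing)
template <typename ReduceOp, typename PostOp>
float tilewise_reduce(
    const std::vector<float>& x, std::size_t tile_size, float init, ReduceOp op, PostOp post)
{
    assert(tile_size > 0);
    const std::size_t n         = x.size();
    const std::size_t num_tiles = (n + tile_size - 1) / tile_size; // ceiling division

    // Pass 1: each tile accumulates into its own partial result.
    std::vector<float> partials(num_tiles, init);
    for(std::size_t t = 0; t < num_tiles; ++t)
    {
        const std::size_t begin = t * tile_size;
        const std::size_t end   = std::min(begin + tile_size, n); // last tile may be short
        for(std::size_t i = begin; i < end; ++i)
            partials[t] = op(partials[t], x[i]);
    }

    // Pass 2: combine the per-tile partials into one value.
    float y = init;
    for(float p : partials)
        y = op(y, p);

    return post(y);
}
```

Passing an addition functor with a post step that divides by the element count gives a mean, while a max functor with an identity post step gives a plain max. For floating-point sums, the tiled accumulation order can differ slightly from a single sequential pass.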

Build & Run

mkdir build && cd build
sh ../script/cmake-ck-dev.sh ../ <arch>
make tile_example_reduce -j
./bin/tile_example_reduce -?

Source Structure

Kernel: reduce.hpp (tile-programming kernel template)
Executable: reduce.cpp (argument parsing, kernel launch)
Build: CMakeLists.txt

Related CK Tile Examples

03_gemm: GEMM with tiles
04_img2col: im2col transformation
06_permute: Permutation with tiles

For details on tile distribution, see include/ck_tile/tile_program/tile_distribution/.

