composable_kernel/example/ck_tile/36_pooling
Yashvardhan Agarwal 3052d7c9e6 [CK_TILE] Add indexing to pooling operator (Lwpck 3892) (#3013)
* Add indexing support to pooling operator

- Add IndexDataType template parameter to pooling problem and kernel
definitions

- Enable pooling kernel to output indices of selected elements during
max/absmax pooling

- Add overloaded operators for Max and AbsMax that track when values
change using bool changed parameter

- Support optional index buffer allocation and management in device
memory

- Modify BlockReduce2d classes to handle index tensors alongside value
tensors

- Add separate shared memory allocation for index data in cross-warp
reductions

- Create validate_pool_indices function to verify index correctness

- Modify pool3d.cpp example to demonstrate index output functionality

- Add tests for index output

* fixes

* Refactor BlockReduce2D functions to get rid of auxiliary private types.

* comment resolutions and some changes to block_reduce2d

- improved the index reference implementation
- cleaned up reduce_operator.hpp
- updated block_reduce2d.hpp to have index calculation for
BlockReduce2dLinearCrossWarpSync as well

* conditionally used variable declaration improvement

- the conditionally used variables are needed only when indexing is
enabled. Inform the compiler that they may be unused and declare them
with the smallest possible size. This may allow them to be optimized
better than the previous declarations

* comment resolutions

* lexical ordering of the indices

- introduced accumulate methods that handle the intermediate steps if
needed to order the indices

* add reduce_operator_accumulate.hpp to core.hpp

---------

Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
2025-10-29 09:58:04 +02:00

Pooling Operator

This folder contains an example of the pooling operator using the ck_tile tile-programming implementation. Currently, the pooling kernel supports only 2D and 3D pooling.
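
The commit notes above mention Max and AbsMax overloads that report a change through a bool changed parameter, so that the index of the selected element can be recorded. The following is a minimal host-side sketch of that idea, not the ck_tile implementation: MaxWithFlag and max_with_index are hypothetical names, and the strict comparison (which keeps the lowest index on ties) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a max operator that reports whether the
// accumulator was replaced by the new value. Not the ck_tile operator.
struct MaxWithFlag
{
    float operator()(float acc, float x, bool& changed) const
    {
        changed = x > acc; // strict: on ties the earlier element is kept
        return changed ? x : acc;
    }
};

// Reduce one pooling window on the host. Returns {max value, position of
// that value inside `window`}; the position is -1 for an empty window.
inline std::pair<float, std::int32_t> max_with_index(const std::vector<float>& window)
{
    float acc        = std::numeric_limits<float>::lowest();
    std::int32_t idx = -1;
    const MaxWithFlag op{};
    for(std::size_t i = 0; i < window.size(); ++i)
    {
        bool changed = false;
        acc          = op(acc, window[i], changed);
        if(changed)
        {
            idx = static_cast<std::int32_t>(i); // remember where the max came from
        }
    }
    return {acc, idx};
}
```

For the window {1, 5, 3, 5} this returns the value 5 with index 1, because the second 5 does not strictly exceed the running maximum.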

build

# in the root of ck_tile
mkdir build && cd build
# you can replace <arch> with the appropriate architecture (for example gfx90a or gfx942) or leave it blank
../script/cmake-ck-dev.sh  ../ <arch>
# The 3D pooling example
make tile_example_pool3d -j`nproc`

This produces the executable build/bin/tile_example_pool3d.

example

args:
          -N    batch size (default:2)
          -D    depth dimension (default:30)
          -H    height dimension (default:30)
          -W    width dimension (default:30)
          -C    channel dimension (default:32)
          -Z    pooling window depth (default:2)
          -Y    pooling window height (default:2)
          -X    pooling window width (default:2)
         -Sz    window stride depth (default:2)
         -Sy    window stride height (default:2)
         -Sx    window stride width (default:2)
         -Dz    window dilation depth (default:1)
         -Dy    window dilation height (default:1)
         -Dx    window dilation width (default:1)
     -LeftPz    left padding depth (default:1)
     -LeftPy    left padding height (default:1)
     -LeftPx    left padding width (default:1)
    -RightPz    right padding depth (default:1)
    -RightPy    right padding height (default:1)
    -RightPx    right padding width (default:1)
          -v    0: No validation, 1: CPU validation (default:1)
     -warmup    number of iterations before benchmark (default:0)
     -repeat    number of iterations to benchmark (default:1)
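
The window, stride, dilation, and padding arguments above jointly determine the output size along each pooled dimension. The sketch below uses the usual floor-mode pooling formula; that the example applies exactly this formula is an assumption, and pool_output_length is a hypothetical helper, not a ck_tile function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of ck_tile): output extent along one pooled
// dimension, using the common floor-mode formula
//   out = (in + left_pad + right_pad - effective_window) / stride + 1
// where effective_window = (window - 1) * dilation + 1.
inline std::int64_t pool_output_length(std::int64_t in_len,
                                       std::int64_t window,
                                       std::int64_t stride,
                                       std::int64_t dilation,
                                       std::int64_t left_pad,
                                       std::int64_t right_pad)
{
    assert(window > 0 && stride > 0 && dilation > 0);
    const std::int64_t effective_window = (window - 1) * dilation + 1;
    const std::int64_t padded           = in_len + left_pad + right_pad;
    assert(padded >= effective_window); // the window must fit at least once
    return (padded - effective_window) / stride + 1;
}
```

With the defaults listed above (D=30, Z=2, Sz=2, Dz=1, LeftPz=1, RightPz=1), pool_output_length(30, 2, 2, 1, 1, 1) evaluates to (30 + 1 + 1 - 2) / 2 + 1 = 16. The same holds for H and W under this formula.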