# Pooling Operator
This folder contains an example of the pooling operator built with the ck_tile tile-programming implementation. The pooling kernel currently supports only 2D and 3D pooling.
## Tensor Descriptor Transformations
The pooling kernel transforms the input tensor into a 2D format suitable for reduction. This section explains the transformation pipeline for both 2D and 3D pooling operations.
### 3D Pooling Transformations
For 3D pooling, the input tensor has shape `(N, D, H, W, C)` where:
- `N`: batch size
- `D`: depth dimension
- `H`: height dimension
- `W`: width dimension
- `C`: channel dimension
The transformations convert this 5D tensor into a 2D tensor where rows represent output positions (M) and columns represent pooling window elements (K).
```mermaid
graph TD
    %% Input Tensor: (N, D, H, W, C)
    Input["Input Tensor<br/>(N, D, H, W, C)"]
    style Input fill:#e1f5fe

    %% Pass-through N dimension
    PassN["Pass-through N<br/>(batch size)"]
    style PassN fill:#f3e5f5
    Input --> PassN

    %% Pad spatial dimensions
    PadD["Pad D<br/>(depth with left/right padding)"]
    style PadD fill:#fff9c4
    Input --> PadD
    PadH["Pad H<br/>(height with left/right padding)"]
    style PadH fill:#fff9c4
    Input --> PadH
    PadW["Pad W<br/>(width with left/right padding)"]
    style PadW fill:#fff9c4
    Input --> PadW

    %% Pass-through C dimension
    PassC["Pass-through C<br/>(channels)"]
    style PassC fill:#f3e5f5
    Input --> PassC

    %% Embed sliding windows
    EmbedD["Embed D<br/>window(Z) × output_positions(Dₒ)"]
    style EmbedD fill:#fff3e0
    PadD --> EmbedD
    EmbedH["Embed H<br/>window(Y) × output_positions(Hₒ)"]
    style EmbedH fill:#fff3e0
    PadH --> EmbedH
    EmbedW["Embed W<br/>window(X) × output_positions(Wₒ)"]
    style EmbedW fill:#fff3e0
    PadW --> EmbedW

    %% Merge into 2D matrix
    MergeM["Merge M<br/>(N, Dₒ, Hₒ, Wₒ, C)<br/>→ output positions"]
    style MergeM fill:#e8f5e9
    PassN --> MergeM
    EmbedD --> MergeM
    EmbedH --> MergeM
    EmbedW --> MergeM
    PassC --> MergeM
    MergeK["Merge K<br/>(Z, Y, X)<br/>→ window elements"]
    style MergeK fill:#e8f5e9
    EmbedD --> MergeK
    EmbedH --> MergeK
    EmbedW --> MergeK

    %% Final padding for block alignment
    PadM["Right-pad M<br/>(for block alignment)"]
    style PadM fill:#fff9c4
    MergeM --> PadM
    PadK["Right-pad K<br/>(for block alignment)"]
    style PadK fill:#fff9c4
    MergeK --> PadK

    %% Result
    Result["2D Matrix<br/>(M × K)"]
    style Result fill:#c8e6c9
    PadM --> Result
    PadK --> Result
```
**Transformation Steps:**
1. **Padding**: Apply left and right padding to spatial dimensions (D, H, W) to handle boundary conditions
2. **Sliding Windows**: Use embed transforms to create sliding windows across each spatial dimension, expanding each dimension into (window_size, output_positions)
3. **Reshaping**: Merge all dimensions into a 2D matrix where:
   - M dimension = N × Dₒ × Hₒ × Wₒ × C (total output positions)
   - K dimension = Z × Y × X (elements per pooling window)
4. **Block Alignment**: Apply right padding to ensure M and K dimensions are aligned to block size
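
To make the resulting matrix sizes concrete, here is a minimal Python sketch of the shape arithmetic behind the steps above. It is not the kernel's code. It assumes the conventional output-length formula `(size + left_pad + right_pad - dilation * (window - 1) - 1) / stride + 1`, and the names `pooled_length`, `round_up`, `pool3d_matrix_shape`, `block_m`, and `block_k` are hypothetical. The block sizes 128 and 64 are placeholders, not the tile sizes the example actually uses. The tensor and window sizes are the defaults listed in the example arguments.

```python
def pooled_length(size, window, stride, dilation, left_pad, right_pad):
    """Output positions along one spatial dimension (conventional formula, assumed)."""
    effective_window = dilation * (window - 1) + 1
    return (size + left_pad + right_pad - effective_window) // stride + 1


def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value (models the right padding)."""
    return (value + multiple - 1) // multiple * multiple


def pool3d_matrix_shape(n, d, h, w, c, window, stride, dilation,
                        left_pad, right_pad, block_m, block_k):
    """Return (M, K, padded M, padded K) for an (N, D, H, W, C) input."""
    do, ho, wo = (
        pooled_length(size, window[i], stride[i], dilation[i],
                      left_pad[i], right_pad[i])
        for i, size in enumerate((d, h, w))
    )
    m = n * do * ho * wo * c                  # Merge M: (N, Do, Ho, Wo, C)
    k = window[0] * window[1] * window[2]     # Merge K: (Z, Y, X)
    return m, k, round_up(m, block_m), round_up(k, block_k)


# Defaults from the example arguments; block sizes are hypothetical.
shape = pool3d_matrix_shape(
    2, 30, 30, 30, 32,
    window=(2, 2, 2), stride=(2, 2, 2), dilation=(1, 1, 1),
    left_pad=(1, 1, 1), right_pad=(1, 1, 1),
    block_m=128, block_k=64,
)
print(shape)  # (262144, 8, 262144, 64)
```

Under these assumptions each spatial dimension yields 16 output positions, so M = 2 × 16 × 16 × 16 × 32 = 262144 and K = 8. Only K needs right padding for the placeholder block sizes.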
### 2D Pooling Transformations
2D pooling follows the same transformation pipeline but operates on 4D tensors with shape `(N, H, W, C)`. The process is identical except:
- Only H and W dimensions are padded and embedded
- K dimension merges only (Y, X) window elements
- M dimension merges (N, Hₒ, Wₒ, C)
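
The 2D case is small enough to trace by hand. The following pure-Python sketch builds the (M, K) matrix for a tiny NHWC input and reduces each row with `max`. It illustrates the pad, embed, and merge steps only; it does not use ck_tile, fixes dilation at 1, and skips block alignment. The function name `pool2d_matrix` and the choice of negative infinity as the pad value (which suits max pooling only) are assumptions made for this illustration.

```python
NEG_INF = float("-inf")  # assumed pad value; appropriate for max pooling only


def pool2d_matrix(tensor, window, stride, left_pad, right_pad):
    """Build the (M, K) matrix for an NHWC nested list; dilation is fixed at 1."""
    n_len, h_len, w_len, c_len = (len(tensor), len(tensor[0]),
                                  len(tensor[0][0]), len(tensor[0][0][0]))
    (win_y, win_x), (stride_y, stride_x) = window, stride
    out_h = (h_len + left_pad[0] + right_pad[0] - win_y) // stride_y + 1
    out_w = (w_len + left_pad[1] + right_pad[1] - win_x) // stride_x + 1

    def read_padded(n, h, w, c):
        # Padding: coordinates that fall outside the input read the pad value.
        h, w = h - left_pad[0], w - left_pad[1]
        if 0 <= h < h_len and 0 <= w < w_len:
            return tensor[n][h][w][c]
        return NEG_INF

    rows = []
    # Row (M) order follows the merge (N, Ho, Wo, C); column (K) order follows (Y, X).
    for n in range(n_len):
        for oh in range(out_h):
            for ow in range(out_w):
                for c in range(c_len):
                    # Embed: padded index = output_position * stride + window_offset.
                    rows.append([read_padded(n, oh * stride_y + y, ow * stride_x + x, c)
                                 for y in range(win_y) for x in range(win_x)])
    return rows


image = [[[[1], [2]], [[3], [4]]]]  # N=1, H=2, W=2, C=1
matrix = pool2d_matrix(image, window=(2, 2), stride=(2, 2),
                       left_pad=(1, 1), right_pad=(1, 1))
pooled = [max(row) for row in matrix]  # reduce along K
print(pooled)  # [1, 2, 3, 4]
```

With one element of padding on every side, each 2 × 2 window covers exactly one real input element, so every row of the 4 × 4 matrix holds three pad values and one input value.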
### Output Tensor Transformations
The output tensor transformations are simpler:
- Merge all output dimensions, (N, Dₒ, Hₒ, Wₒ, C) for 3D pooling or (N, Hₒ, Wₒ, C) for 2D pooling, into a single M dimension
- Apply right padding for block alignment
- The result is a 1D tensor that maps directly to the M dimension of the computation matrix
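
As a rough illustration of how a position in the merged 1D output relates to the original output coordinates, the sketch below splits a linear M index back into (N, Hₒ, Wₒ, C) and computes the right-padded length. It assumes row-major ordering with the last listed dimension varying fastest, which matches the merge order given above but is not confirmed against the kernel. The helper names and the block size of 64 are hypothetical.

```python
def unmerge_output_index(m, lengths):
    """Split a linear M index into coordinates, last dimension fastest (assumed)."""
    coords = []
    for length in reversed(lengths):
        coords.append(m % length)
        m //= length
    return tuple(reversed(coords))


def padded_output_length(lengths, block_m):
    """Return (M, M right-padded to a multiple of block_m)."""
    total = 1
    for length in lengths:
        total *= length
    return total, (total + block_m - 1) // block_m * block_m


lengths = (2, 3, 3, 4)  # hypothetical (N, Ho, Wo, C)
print(unmerge_output_index(41, lengths))           # (1, 0, 1, 1)
print(padded_output_length(lengths, block_m=64))   # (72, 128)
```

Indices from 72 up to 127 in this example fall in the padded region and correspond to no real output element.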
## Build
```
# in the root of ck_tile
mkdir build && cd build
# you can replace <arch> with the appropriate architecture (for example gfx90a or gfx942) or leave it blank
../script/cmake-ck-dev.sh ../ <arch>
# The 3D pooling example
make tile_example_pool3d -j`nproc`
```
This produces the executable `build/bin/tile_example_pool3d`.
## Example
```
args:
  -N         batch size (default:2)
  -D         depth dimension (default:30)
  -H         height dimension (default:30)
  -W         width dimension (default:30)
  -C         channel dimension (default:32)
  -Z         pooling window depth (default:2)
  -Y         pooling window height (default:2)
  -X         pooling window width (default:2)
  -Sz        window stride depth (default:2)
  -Sy        window stride height (default:2)
  -Sx        window stride width (default:2)
  -Dz        window dilation depth (default:1)
  -Dy        window dilation height (default:1)
  -Dx        window dilation width (default:1)
  -LeftPz    left padding depth (default:1)
  -LeftPy    left padding height (default:1)
  -LeftPx    left padding width (default:1)
  -RightPz   right padding depth (default:1)
  -RightPy   right padding height (default:1)
  -RightPx   right padding width (default:1)
  -v         0: No validation, 1: CPU validation (default:1)
  -warmup    number of iterations before benchmark (default:0)
  -repeat    number of iterations to benchmark (default:1)
```