diff --git a/example/ck_tile/36_pooling/README.md b/example/ck_tile/36_pooling/README.md
index ab49b57095..4417e03734 100644
--- a/example/ck_tile/36_pooling/README.md
+++ b/example/ck_tile/36_pooling/README.md
@@ -2,6 +2,116 @@
 
 This folder contains example for the pooling operator using ck_tile tile-programming implementation. Currently the pooling kernel only supports 2D and 3D pooling.
 
+## Tensor Descriptor Transformations
+
+The pooling kernel transforms the input tensor descriptor into a 2D view suitable for reduction. This section explains the transformation pipeline for both 2D and 3D pooling.
+
+### 3D Pooling Transformations
+
+For 3D pooling, the input tensor has shape `(N, D, H, W, C)` where:
+- `N`: batch size
+- `D`: depth dimension
+- `H`: height dimension
+- `W`: width dimension
+- `C`: channel dimension
+
+The transformations convert this 5D tensor into a 2D (M × K) view: each row (M) is one output element and each column (K) is one element of the pooling window.
+
+```mermaid
+graph TD
+    %% Input Tensor: (N, D, H, W, C)
+    Input["Input Tensor<br/>(N, D, H, W, C)"]
+    style Input fill:#e1f5fe
+
+    %% Pass-through N dimension
+    PassN["Pass-through N<br/>(batch size)"]
+    style PassN fill:#f3e5f5
+    Input --> PassN
+
+    %% Pad spatial dimensions
+    PadD["Pad D<br/>(depth with left/right padding)"]
+    style PadD fill:#fff9c4
+    Input --> PadD
+
+    PadH["Pad H<br/>(height with left/right padding)"]
+    style PadH fill:#fff9c4
+    Input --> PadH
+
+    PadW["Pad W<br/>(width with left/right padding)"]
+    style PadW fill:#fff9c4
+    Input --> PadW
+
+    %% Pass-through C dimension
+    PassC["Pass-through C<br/>(channels)"]
+    style PassC fill:#f3e5f5
+    Input --> PassC
+
+    %% Embed sliding windows
+    EmbedD["Embed D<br/>window(Z) × output_positions(Dₒ)"]
+    style EmbedD fill:#fff3e0
+    PadD --> EmbedD
+
+    EmbedH["Embed H<br/>window(Y) × output_positions(Hₒ)"]
+    style EmbedH fill:#fff3e0
+    PadH --> EmbedH
+
+    EmbedW["Embed W<br/>window(X) × output_positions(Wₒ)"]
+    style EmbedW fill:#fff3e0
+    PadW --> EmbedW
+
+    %% Merge into 2D matrix
+    MergeM["Merge M<br/>(N, Dₒ, Hₒ, Wₒ, C)<br/>→ output positions"]
+    style MergeM fill:#e8f5e9
+    PassN --> MergeM
+    EmbedD --> MergeM
+    EmbedH --> MergeM
+    EmbedW --> MergeM
+    PassC --> MergeM
+
+    MergeK["Merge K<br/>(Z, Y, X)<br/>→ window elements"]
+    style MergeK fill:#e8f5e9
+    EmbedD --> MergeK
+    EmbedH --> MergeK
+    EmbedW --> MergeK
+
+    %% Final padding for block alignment
+    PadM["Right-pad M<br/>(for block alignment)"]
+    style PadM fill:#fff9c4
+    MergeM --> PadM
+
+    PadK["Right-pad K<br/>(for block alignment)"]
+    style PadK fill:#fff9c4
+    MergeK --> PadK
+
+    %% Result
+    Result["2D Matrix<br/>(M × K)"]
+    style Result fill:#c8e6c9
+    PadM --> Result
+    PadK --> Result
+```
+
+**Transformation Steps:**
+1. **Padding**: Apply left and right padding to the spatial dimensions (D, H, W) so that windows overlapping the tensor boundary are handled.
+2. **Sliding Windows**: Use embed transforms to split each padded spatial dimension into a (window, output position) pair: (Z, Dₒ), (Y, Hₒ) and (X, Wₒ).
+3. **Merging**: Merge the resulting dimensions into a 2D matrix where:
+   - M dimension = N × Dₒ × Hₒ × Wₒ × C (total number of output elements)
+   - K dimension = Z × Y × X (elements per pooling window)
+4. **Block Alignment**: Right-pad M and K up to multiples of the corresponding block tile sizes.
+
+### 2D Pooling Transformations
+
+2D pooling follows the same transformation pipeline but operates on a 4D input of shape `(N, H, W, C)`. The differences are:
+- Only the H and W dimensions are padded and embedded
+- The K dimension merges only the (Y, X) window elements
+- The M dimension merges (N, Hₒ, Wₒ, C)
+
+### Output Tensor Transformations
+
+The output tensor transformations are simpler:
+- Merge all output dimensions, (N, Dₒ, Hₒ, Wₒ, C) for 3D or (N, Hₒ, Wₒ, C) for 2D, into a single M dimension
+- Apply right padding for block alignment
+- The result is a 1D tensor in which each element corresponds to one row of the (M × K) computation matrix
+
 ## build
 ```
 # in the root of ck_tile